A Coding Guide to Demonstrate Targeted Data Poisoning Attacks in Deep Learning

These articles are AI-generated summaries. Please check the original sources for full details.

Targeted Data Poisoning Attacks in Deep Learning

This tutorial demonstrates a realistic data poisoning attack in which labels in the CIFAR-10 training set are manipulated, and then examines the resulting impact on model behavior. It flips the labels of a fraction of training samples from a target class to a chosen malicious class. The results show how subtle data corruption can lead to systematic misclassification.

Why This Matters

Machine learning models are typically trained on the assumption that the training data is clean and representative. Real-world datasets, however, are often exposed to malicious manipulation. Data poisoning attacks can compromise model integrity and cause biased predictions or targeted failures. In high-stakes settings such as autonomous driving or finance, the cost of a compromised system could potentially reach millions of dollars.

Key Insights

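  • Label Flipping: A common data poisoning technique in which the assigned labels of selected training samples are changed to an incorrect class. The input images themselves are left untouched.
  • CIFAR-10 Dataset: A widely used image-classification benchmark of 60,000 32x32 color images in 10 classes, split into 50,000 training and 10,000 test images.
  • ResNet Architecture: The study uses a ResNet-18 model. This convolutional neural network uses residual (skip) connections, which make much deeper networks trainable and help them reach high accuracy. In the example, the first convolution is replaced with a 3x3, stride-1 layer and the initial max-pooling is removed to suit 32x32 inputs.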
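The label-flipping step can be isolated from PyTorch to make it concrete. The helper `flip_labels` below is hypothetical. It mirrors the selection logic of the `PoisonedCIFAR10` dataset in the working example, but it uses NumPy's `default_rng` Generator instead of the global `np.random` state. A synthetic label array with 5,000 samples per class stands in for the real CIFAR-10 training targets. Treat it as a sketch under those assumptions, not as the tutorial's own code.

```python
import numpy as np

def flip_labels(labels, target_class, malicious_label, ratio, seed=0):
    """Hypothetical helper: relabel a fraction `ratio` of the samples
    whose label is `target_class` as `malicious_label`.

    Returns the poisoned copy of the labels and the flipped indices.
    """
    labels = np.asarray(labels).copy()  # never modify the caller's array
    rng = np.random.default_rng(seed)
    # Indices of every sample that currently belongs to the target class.
    candidates = np.flatnonzero(labels == target_class)
    n_poison = int(len(candidates) * ratio)
    # Sample without replacement so no index is flipped twice.
    poisoned = rng.choice(candidates, size=n_poison, replace=False)
    labels[poisoned] = malicious_label
    return labels, poisoned

# Synthetic stand-in for CIFAR-10 training labels: 10 classes x 5,000 images.
clean = np.repeat(np.arange(10), 5000)
poisoned_labels, idx = flip_labels(clean, target_class=1, malicious_label=9, ratio=0.4)

print(len(idx))                        # 2000 flipped labels
print((poisoned_labels == 1).sum())    # 3000 labels still say class 1
print((poisoned_labels == 9).sum())    # 7000 labels now say class 9
print(len(idx) / len(clean))           # 0.04 of the whole training set
```

With the working example's settings (`target_class=1`, `malicious_label=9`, `poison_ratio=0.4`), 2,000 of the 5,000 "automobile" training labels would be relabelled "truck". That is only 4% of the 50,000-image training set, which illustrates how small a targeted corruption can be.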

Working Example

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np

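CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,       # CIFAR-10 class 1: "automobile"
    "malicious_label": 9,    # CIFAR-10 class 9: "truck"
    "poison_ratio": 0.4,     # fraction of target-class training labels to flip
    # get_model() and the training loop read this key, so it must be defined.
    "device": "cuda" if torch.cuda.is_available() else "cpu",
}
# Fix the seeds so the poisoned indices (chosen with np.random) are reproducible.
torch.manual_seed(42)
np.random.seed(42)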

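class PoisonedCIFAR10(Dataset):
    """Wraps a CIFAR-10 dataset and flips a fraction of target-class labels."""

    def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
        self.dataset = original_dataset
        # Copy the labels so the wrapped dataset's own targets stay clean.
        self.targets = np.array(original_dataset.targets)
        self.is_train = is_train
        # Poison only the training split; evaluation data must stay clean.
        if is_train and ratio > 0:
            indices = np.where(self.targets == target_class)[0]
            n_poison = int(len(indices) * ratio)
            # Pick distinct target-class samples and relabel them as malicious_label.
            poison_indices = np.random.choice(indices, n_poison, replace=False)
            self.targets[poison_indices] = malicious_label

    def __getitem__(self, index):
        # Take the (transformed) image from the original dataset,
        # but return the possibly poisoned label.
        img, _ = self.dataset[index]
        return img, int(self.targets[index])

    def __len__(self):
        return len(self.dataset)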

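def get_model():
    # Standard torchvision ResNet-18, randomly initialized, with a 10-way output layer.
    model = torchvision.models.resnet18(num_classes=10)
    # Adapt the stem for 32x32 CIFAR images: a 3x3 stride-1 first convolution
    # and no initial max-pooling, so early feature maps are not downsampled too aggressively.
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])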

# Training-time preprocessing: random horizontal flips for augmentation,
# then per-channel normalization with commonly used CIFAR-10 statistics.
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                          (0.2023, 0.1994, 0.2010))
])
# Download the clean training split, then wrap it so that poison_ratio of the
# target-class labels are replaced with malicious_label.
base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=CONFIG["poison_ratio"])
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)
model = get_model()
optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
criterion = nn.CrossEntropyLoss()
# Ordinary supervised training: the model never sees the true labels of the poisoned samples.
for _ in range(CONFIG["epochs"]):
    model.train()
    for images, labels in poison_loader:
        images = images.to(CONFIG["device"])
        labels = labels.to(CONFIG["device"])
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
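Training alone does not reveal the attack's effect. The impact becomes visible when the poisoned model is evaluated on the clean CIFAR-10 test split. The sketch below assumes the predictions have already been collected, for example by running the model in `eval()` mode under `torch.no_grad()` over a `CIFAR10(train=False)` loader and taking the argmax of the logits. The function `poisoning_metrics` and its metric names are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

def poisoning_metrics(preds, labels, target_class, malicious_label):
    """Hypothetical helper summarizing a targeted label-flipping attack.

    `preds` and `labels` are 1-D sequences of predicted and true class
    indices for the clean (unpoisoned) test set.
    """
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    target = labels == target_class  # mask of true target-class images

    return {
        # Accuracy over every test image.
        "overall_acc": float((preds == labels).mean()),
        # How often true target-class images are still classified correctly.
        "target_acc": float((preds[target] == target_class).mean()),
        # Attack success rate: target-class images pushed to the malicious label.
        "attack_success_rate": float((preds[target] == malicious_label).mean()),
        # Accuracy on all other classes, to check that the damage is targeted.
        "other_acc": float((preds[~target] == labels[~target]).mean()),
    }

# Toy example: four class-1 ("automobile"), two class-9 ("truck"),
# and two class-0 ("airplane") test images.
labels = [1, 1, 1, 1, 9, 9, 0, 0]
preds = [1, 9, 9, 1, 9, 9, 0, 3]
print(poisoning_metrics(preds, labels, target_class=1, malicious_label=9))
# {'overall_acc': 0.625, 'target_acc': 0.5, 'attack_success_rate': 0.5, 'other_acc': 0.75}
```

Comparing these numbers with those of a model trained with `poison_ratio` set to 0 separates the effect of poisoning from ordinary class confusion. A high attack success rate alongside a largely unchanged `other_acc` would indicate a targeted rather than an indiscriminate attack.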

Practical Applications

  • Autonomous Vehicles: An attacker could poison training data to cause a self-driving car to misclassify road signs.
  • Spam Filtering: An attacker could poison a spam filter's training data so that malicious emails bypass detection.
