bob-onboard

Research Code Onboarding with IBM Bob


Adversarial Robustness Toolbox (ART) - Developer Onboarding Guide

Executive Summary

The Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security hosted by the Linux Foundation AI & Data Foundation. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against adversarial threats.

Research Domain: Machine Learning Security, Adversarial Machine Learning, AI Safety

Primary Goals:

Key Technologies: Python 3.10+, NumPy, SciPy, scikit-learn, TensorFlow, PyTorch, Keras

Target Audience: ML security researchers, adversarial ML practitioners, AI safety engineers, and developers building robust ML systems


Architecture Overview

High-Level System Design

ART follows a modular architecture built around four core components:

  1. Estimators: Wrappers around ML models from various frameworks (PyTorch, TensorFlow, Keras, scikit-learn, etc.) that provide a unified interface for attacks and defenses
  2. Attacks: Implementations of adversarial attack methods organized by threat type (evasion, poisoning, extraction, inference)
  3. Defences: Countermeasures against attacks, including preprocessors, postprocessors, trainers, and transformers
  4. Metrics & Evaluations: Tools for measuring model robustness and evaluating defense effectiveness

The library uses abstract base classes to define contracts that concrete implementations must follow, enabling easy extension with new attacks, defenses, or model types.
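
The contract idea above can be illustrated with a minimal sketch. The class names below (ToyEstimator, ToyAttack, ThresholdEstimator, ShiftAttack) are hypothetical stand-ins written for this guide, not ART classes; they only show how an abstract base class forces concrete implementations to supply the required methods.

```python
from abc import ABC, abstractmethod


class ToyEstimator(ABC):
    """Hypothetical estimator contract (not ART's BaseEstimator)."""

    @abstractmethod
    def predict(self, x):
        """Return one prediction per input."""


class ToyAttack(ABC):
    """Hypothetical attack contract (not ART's Attack)."""

    def __init__(self, estimator):
        self.estimator = estimator

    @abstractmethod
    def generate(self, x):
        """Return a perturbed copy of x."""


class ThresholdEstimator(ToyEstimator):
    """Concrete estimator: label 1 if the value is at least 0.5, else 0."""

    def predict(self, x):
        return [1 if value >= 0.5 else 0 for value in x]


class ShiftAttack(ToyAttack):
    """Concrete attack: move each input by `step` away from its predicted label."""

    def __init__(self, estimator, step=0.2):
        super().__init__(estimator)
        self.step = step

    def generate(self, x):
        labels = self.estimator.predict(x)
        return [v - self.step if label == 1 else v + self.step
                for v, label in zip(x, labels)]


estimator = ThresholdEstimator()
attack = ShiftAttack(estimator, step=0.2)
x_clean = [0.4, 0.6, 0.9]
x_adv = attack.generate(x_clean)
print(estimator.predict(x_clean), estimator.predict(x_adv))
```

Instantiating ToyAttack directly raises TypeError because generate is abstract, which is the mechanism that keeps new attacks and estimators interchangeable.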

Component Diagram

classDiagram
    class BaseEstimator {
        <<abstract>>
        +model
        +clip_values
        +preprocessing_defences
        +postprocessing_defences
        +predict(x)
        +fit(x, y)
        +input_shape
    }
    
    class LossGradientsMixin {
        <<abstract>>
        +loss_gradient(x, y)
    }
    
    class NeuralNetworkMixin {
        <<abstract>>
        +channels_first
        +get_activations(x, layer)
        +fit_generator(generator)
    }
    
    class Attack {
        <<abstract>>
        +estimator
        +attack_params
        +_estimator_requirements
        +set_params(kwargs)
    }
    
    class EvasionAttack {
        <<abstract>>
        +targeted
        +generate(x, y)
    }
    
    class PoisoningAttack {
        <<abstract>>
        +poison(x, y)
    }
    
    class ExtractionAttack {
        <<abstract>>
        +extract(x, y)
    }
    
    class InferenceAttack {
        <<abstract>>
        +infer(x, y)
    }
    
    class Preprocessor {
        <<abstract>>
        +is_fitted
        +apply_fit
        +apply_predict
        +__call__(x, y)
        +estimate_gradient(x, grad)
    }
    
    class Postprocessor {
        <<abstract>>
        +is_fitted
        +apply_fit
        +apply_predict
        +__call__(preds)
    }
    
    class Trainer {
        <<abstract>>
        +classifier
        +fit(x, y)
    }
    
    LossGradientsMixin ..> BaseEstimator : mixed into concrete estimators
    NeuralNetworkMixin ..> BaseEstimator : mixed into concrete estimators
    Attack <|-- EvasionAttack
    Attack <|-- PoisoningAttack
    Attack <|-- ExtractionAttack
    Attack <|-- InferenceAttack
    Attack --> BaseEstimator : targets
    Preprocessor --> BaseEstimator : defends
    Postprocessor --> BaseEstimator : defends
    Trainer --> BaseEstimator : trains

Core Modules

art/estimators/: Model wrappers providing unified interfaces across ML frameworks

art/attacks/: Adversarial attack implementations

art/defences/: Defense mechanisms against adversarial attacks

art/metrics/: Evaluation metrics for robustness assessment

Module Relationships:


Setup and Installation Guide

Prerequisites

Installation Steps

  1. Clone the repository:
    git clone https://github.com/Trusted-AI/adversarial-robustness-toolbox.git
    cd adversarial-robustness-toolbox
    
  2. Set up the environment:
    # Using venv
    python -m venv art_env
    source art_env/bin/activate  # On Windows: art_env\Scripts\activate
       
    # Or using conda
    conda create -n art_env python=3.10
    conda activate art_env
    
  3. Install dependencies:
    # Minimal installation (core dependencies only)
    pip install -e .
       
    # Install with specific framework support
    pip install -e ".[pytorch]"        # PyTorch support
    pip install -e ".[tensorflow]"     # TensorFlow support
    pip install -e ".[keras]"          # Keras support
       
    # Install with all dependencies (for development)
    pip install -r requirements_test.txt
       
    # Install specific extras
    pip install -e ".[pytorch_image]"  # PyTorch with image processing
    pip install -e ".[tensorflow_audio]"  # TensorFlow with audio processing
    
  4. Configure the project:
    # No additional configuration needed for basic usage
    # Optional: Set up TensorBoard for attack visualization
    export TENSORBOARD_LOGDIR=./runs
    
  5. Verify installation:
    # Run a simple test
    python -c "import art; print(art.__version__)"
       
    # Run unit tests (requires test dependencies)
    pytest tests/ -v
       
    # Run a quick example
    python examples/get_started_pytorch.py
    

Environment Variables

Variable | Description | Default | Required
ART_DATA_PATH | Path for storing datasets | ~/.art/data | No
TENSORBOARD_LOGDIR | TensorBoard log directory | ./runs | No
CUDA_VISIBLE_DEVICES | GPU device selection | All GPUs | No

Common Setup Issues

Issue: ImportError for framework-specific modules

Issue: CUDA out of memory errors

Issue: Slow attack generation

Issue: Version conflicts with existing packages


Key Concepts and Domain Knowledge

Research Background

Adversarial machine learning studies the vulnerability of ML models to malicious inputs designed to cause misclassification or extract sensitive information. ART addresses four main threat categories:

  1. Evasion Attacks: Crafting inputs at test time to fool trained models (e.g., adversarial examples)
  2. Poisoning Attacks: Manipulating training data to compromise model behavior (e.g., backdoor attacks)
  3. Extraction Attacks: Stealing model functionality or architecture through queries
  4. Inference Attacks: Extracting sensitive information about training data or model internals

Core Algorithms

Fast Gradient Sign Method (FGSM)
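
As a rough illustration of the idea behind FGSM, the sketch below takes one step of size eps in the direction of the sign of the loss gradient with respect to the input, then clips to the valid range. It uses a toy logistic-regression model in NumPy rather than ART; the helper names (sigmoid, predict, loss_gradient, fgsm) and all numeric values are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Hard 0/1 predictions of a toy logistic-regression model."""
    return (sigmoid(x @ w + b) > 0.5).astype(int)


def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the inputs."""
    p = sigmoid(x @ w + b)                  # shape (n,)
    return (p - y)[:, None] * w[None, :]    # shape (n, d)


def fgsm(x, y, w, b, eps, clip_values=(0.0, 1.0)):
    """One signed-gradient step of size eps, then clip to the valid input range."""
    x_adv = x + eps * np.sign(loss_gradient(x, y, w, b))
    return np.clip(x_adv, clip_values[0], clip_values[1])


# Toy model and data (illustrative values only)
w = np.array([2.0, -2.0])
b = 0.0
x = np.array([[0.6, 0.4], [0.3, 0.7]])
y = np.array([1.0, 0.0])

x_adv = fgsm(x, y, w, b, eps=0.25)
print("clean predictions:      ", predict(x, w, b))
print("adversarial predictions:", predict(x_adv, w, b))
```

With these values both samples are classified correctly before the step and misclassified after it, while no feature moves by more than eps.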

Projected Gradient Descent (PGD)
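
PGD can be read as an iterated version of the signed-gradient step with a projection back onto the allowed perturbation set after every step. The following is a minimal NumPy sketch for the L-infinity case on a toy logistic model; the function names and values are assumptions for illustration and do not mirror ART's implementation (which also supports random initialisation and other norms).

```python
import numpy as np


def bce_loss(x, y, w, b):
    """Mean binary cross-entropy of a toy logistic model."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def input_gradient(x, y, w, b):
    """Gradient of the per-sample loss with respect to the inputs."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y)[:, None] * w[None, :]


def pgd_linf(x, y, w, b, eps, eps_step, max_iter, clip_values=(0.0, 1.0)):
    """Repeat small signed-gradient steps, projecting onto the eps-ball each time."""
    x_adv = x.copy()
    for _ in range(max_iter):
        x_adv = x_adv + eps_step * np.sign(input_gradient(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)                 # project onto L-inf ball
        x_adv = np.clip(x_adv, clip_values[0], clip_values[1])   # keep inputs valid
    return x_adv


w = np.array([2.0, -2.0])
b = 0.0
x = np.array([[0.6, 0.4]])
y = np.array([1.0])

x_adv = pgd_linf(x, y, w, b, eps=0.1, eps_step=0.03, max_iter=10)
print("loss before:", bce_loss(x, y, w, b))
print("loss after: ", bce_loss(x_adv, y, w, b))
```

The eps_step and max_iter arguments play the same role as the identically named parameters in the ProjectedGradientDescent examples later in this guide.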

Carlini & Wagner (C&W) Attack

DeepFool

Adversarial Training

Algorithm References

The codebase implements algorithms from numerous research papers. Key references found in code:

Data Structures

Estimator Wrapper Pattern

Attack Parameters

Clip Values
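
The clip_values argument seen in the estimator examples bounds the valid input range, so perturbed inputs remain legal data (for instance, pixel intensities in [0, 1]). The NumPy sketch below shows the effect with scalar bounds and with per-feature bounds; the arrays and bounds are made-up values, and support for per-feature bounds in a given ART estimator is an assumption to verify against its documentation.

```python
import numpy as np

x = np.array([[0.10, 0.95], [0.50, 0.02]])
perturbation = 0.1 * np.array([[-1.0, 1.0], [1.0, -1.0]])

# Scalar bounds: every feature shares the same valid range.
x_adv = np.clip(x + perturbation, 0.0, 1.0)

# Per-feature bounds: each column has its own (min, max).
feature_min = np.array([0.0, 0.0])
feature_max = np.array([1.0, 0.5])
x_adv_per_feature = np.clip(x + perturbation, feature_min, feature_max)

print(x_adv)
print(x_adv_per_feature)
```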

Mathematical Foundations

Lp Norms
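
Attack budgets such as eps are measured with an Lp norm of the perturbation. As a small worked example (the perturbation vector is invented for illustration), the common choices can be computed with NumPy as follows:

```python
import numpy as np

# Perturbation delta = x_adv - x for a single 4-feature input
delta = np.array([0.0, 0.3, -0.4, 0.0])

l0 = np.count_nonzero(delta)               # number of changed features
l1 = np.linalg.norm(delta, ord=1)          # sum of absolute changes
l2 = np.linalg.norm(delta, ord=2)          # Euclidean length
linf = np.linalg.norm(delta, ord=np.inf)   # largest single change

print(f"L0={l0}  L1={l1:.2f}  L2={l2:.2f}  Linf={linf:.2f}")
```

An attack with norm=np.inf and eps=0.3 therefore bounds the largest per-feature change, whereas an L2 budget bounds the overall length of the perturbation.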

Loss Functions

Gradient-Based Optimization

Paper Implementations

This codebase implements numerous research papers. Key implementations include:


Code Walkthrough of Critical Components

Entry Points

Main Package Initialization: art/__init__.py

Example Scripts: examples/ directory

Main execution flow:

1. Load/create ML model
2. Wrap model in ART estimator (e.g., PyTorchClassifier)
3. Create attack instance with estimator
4. Generate adversarial examples using attack.generate()
5. Evaluate model on clean and adversarial examples

Component 1: BaseEstimator

Location: art/estimators/estimator.py:38-365

Purpose: Abstract base class defining the interface all model wrappers must implement. Provides core functionality for preprocessing, postprocessing, and parameter management.

Key Methods:

Example Usage:

from art.estimators.classification import PyTorchClassifier
import torch.nn as nn
import torch.optim as optim

# Define model
model = nn.Sequential(
    nn.Conv2d(1, 32, 3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(26*26*32, 10)
)

# Wrap in ART estimator
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=0.01),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0)
)

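# Use estimator (x_train, y_train, and x_test are assumed to be float32 NumPy
# arrays shaped (N, 1, 28, 28) with matching labels, loaded elsewhere)
classifier.fit(x_train, y_train, batch_size=64, nb_epochs=10)
predictions = classifier.predict(x_test)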

Component 2: Attack Base Classes

Location: art/attacks/attack.py:93-559

Purpose: Define abstract interfaces for all attack types. Enforce requirements on target estimators and provide common functionality for parameter management.

Key Methods:

Example Usage:

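import numpy as np
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# `classifier` is the estimator built in Component 1; x_test and y_test
# (one-hot labels) are assumed to be loaded already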

# Create attack
attack = FastGradientMethod(
    estimator=classifier,
    eps=0.3,           # Perturbation budget
    norm=np.inf,       # L-infinity norm
    targeted=False,    # Untargeted attack
    batch_size=32
)

# Generate adversarial examples
x_adv = attack.generate(x=x_test)

# Evaluate attack success
predictions_clean = classifier.predict(x_test)
predictions_adv = classifier.predict(x_adv)
accuracy_clean = np.mean(np.argmax(predictions_clean, axis=1) == np.argmax(y_test, axis=1))
accuracy_adv = np.mean(np.argmax(predictions_adv, axis=1) == np.argmax(y_test, axis=1))
print(f"Clean accuracy: {accuracy_clean:.2%}, Adversarial accuracy: {accuracy_adv:.2%}")

Component 3: Preprocessor Defenses

Location: art/defences/preprocessor/preprocessor.py:35-335

Purpose: Abstract base class for input preprocessing defenses that transform data before it reaches the model. Supports gradient estimation for differentiable defenses.

Key Methods:

Example Usage:

from art.defences.preprocessor import JpegCompression
from art.estimators.classification import PyTorchClassifier

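# Create preprocessing defense
jpeg_defense = JpegCompression(
    clip_values=(0.0, 1.0),
    quality=50,  # JPEG quality parameter
    channels_first=True,  # PyTorch inputs are NCHW, matching input_shape below
    apply_fit=False,
    apply_predict=True
)

# Attach to estimator (model, criterion, and optimizer are assumed to be defined already)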
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(3, 32, 32),
    nb_classes=10,
    preprocessing_defences=jpeg_defense
)

# Defense is automatically applied during prediction
predictions = classifier.predict(x_test)  # JPEG compression applied internally
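
The preprocessor interface described above pairs a forward transformation with a gradient estimate, so gradient-based attacks and training can still pass through a defense that is not differentiable. The sketch below is a hypothetical ToyQuantizer written for this guide (not an ART class): its forward pass rounds inputs to a few levels, and its estimate_gradient passes the incoming gradient through unchanged, which is one simple approximation among several possible ones.

```python
import numpy as np


class ToyQuantizer:
    """Hypothetical preprocessor: rounds inputs in [0, 1] to a fixed number of levels."""

    def __init__(self, levels=4, apply_fit=False, apply_predict=True):
        self.levels = levels
        self.apply_fit = apply_fit
        self.apply_predict = apply_predict

    def __call__(self, x, y=None):
        """Forward pass: return the transformed inputs and the untouched labels."""
        steps = self.levels - 1
        return np.round(x * steps) / steps, y

    def estimate_gradient(self, x, grad):
        """Backward pass: rounding has zero gradient almost everywhere,
        so approximate it by the identity and pass grad through."""
        return grad


defence = ToyQuantizer(levels=4)
x = np.array([0.1, 0.4, 0.8])
x_quantized, _ = defence(x)
grad_in = np.array([0.5, -1.0, 2.0])
grad_out = defence.estimate_gradient(x, grad_in)
print(x_quantized, grad_out)
```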

Data Flow

Training Flow:

Raw Training Data
    ↓
Preprocessing Defenses (if apply_fit=True)
    ↓
Model Training (fit method)
    ↓
Trained Model

Inference Flow:

Raw Input
    ↓
Preprocessing Defenses (if apply_predict=True)
    ↓
Model Prediction
    ↓
Postprocessing Defenses (if apply_predict=True)
    ↓
Final Predictions
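
The inference flow above amounts to composing optional stages around the model call. A minimal sketch of that composition, using a hypothetical Stage container and toy functions rather than ART's internals, might look like this:

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class Stage:
    """Hypothetical defence stage: a function plus its prediction-time flag."""
    fn: Callable[[np.ndarray], np.ndarray]
    apply_predict: bool = True


def predict_with_defences(x, model_fn, preprocessors=(), postprocessors=()):
    """Run enabled preprocessors, then the model, then enabled postprocessors."""
    for stage in preprocessors:
        if stage.apply_predict:
            x = stage.fn(x)
    preds = model_fn(x)
    for stage in postprocessors:
        if stage.apply_predict:
            preds = stage.fn(preds)
    return preds


def toy_model(x):
    """Toy two-class model: the class-1 score is the squared mean of the features."""
    p1 = x.mean(axis=1) ** 2
    return np.stack([1.0 - p1, p1], axis=1)


clip_inputs = Stage(lambda x: np.clip(x, 0.0, 1.0))     # preprocessing defence
round_outputs = Stage(lambda p: np.round(p, 2))         # postprocessing defence

x = np.array([[0.2, 1.4], [-0.3, 0.9]])
preds = predict_with_defences(x, toy_model, [clip_inputs], [round_outputs])
print(preds)
```

Setting apply_predict=False on a stage skips it, which corresponds to the "(if apply_predict=True)" conditions in the flow.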

Attack Flow:

Clean Examples
    ↓
Attack Algorithm (generate/poison/extract/infer)
    ↓
Query Target Model (via estimator interface)
    ↓
Compute Gradients (if white-box) or Observe Outputs (if black-box)
    ↓
Update Adversarial Examples
    ↓
Repeat until convergence or max iterations
    ↓
Final Adversarial Examples
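
For the black-box branch of the attack flow, the attacker only observes model outputs. The sketch below is a deliberately simple random-search loop, not an algorithm taken from ART: it proposes perturbations at the corners of the eps-ball, queries a toy model, keeps a candidate only if it lowers the probability of the true class, and stops once the label flips or the iteration budget runs out. All names and values are assumptions for illustration.

```python
import numpy as np


def predict_proba(x):
    """Toy two-class model; the attacker sees only these output probabilities."""
    w = np.array([2.0, -2.0, 1.0])
    p1 = 1.0 / (1.0 + np.exp(-(x @ w)))
    return np.array([1.0 - p1, p1])


def random_search_attack(x, label, eps, max_iter, rng, clip_values=(0.0, 1.0)):
    """Query-only attack: keep the best random corner of the L-inf eps-ball."""
    x_adv = x.copy()
    best = predict_proba(x_adv)[label]
    for _ in range(max_iter):
        signs = rng.choice([-1.0, 1.0], size=x.shape)            # propose
        candidate = np.clip(x + eps * signs, clip_values[0], clip_values[1])
        score = predict_proba(candidate)[label]                  # query the model
        if score < best:                                         # update if better
            x_adv, best = candidate, score
        if np.argmax(predict_proba(x_adv)) != label:             # stop on success
            break
    return x_adv


rng = np.random.default_rng(0)
x = np.array([0.6, 0.4, 0.5])
x_adv = random_search_attack(x, label=1, eps=0.3, max_iter=50, rng=rng)
print("true-class probability:", predict_proba(x)[1], "->", predict_proba(x_adv)[1])
```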

Common Workflows and Usage Examples

Workflow 1: Evaluating Model Robustness with Evasion Attacks

Purpose: Assess how vulnerable a trained classifier is to adversarial examples

Steps:

  1. Load or train a classifier and wrap it in an ART estimator
  2. Create one or more evasion attacks (FGSM, PGD, C&W, etc.)
  3. Generate adversarial examples on test data
  4. Evaluate model accuracy on both clean and adversarial examples
  5. Compare robustness across different attack strengths

Example:

import numpy as np
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent
from art.estimators.classification import PyTorchClassifier
from art.utils import load_mnist

# Load data
(x_train, y_train), (x_test, y_test), min_val, max_val = load_mnist()
x_train = np.transpose(x_train, (0, 3, 1, 2)).astype(np.float32)
x_test = np.transpose(x_test, (0, 3, 1, 2)).astype(np.float32)

# Wrap the model (model, criterion, and optimizer are assumed to be defined as in Component 1)
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(min_val, max_val)
)

# Train model
classifier.fit(x_train, y_train, batch_size=128, nb_epochs=10)

# Evaluate on clean data
predictions_clean = classifier.predict(x_test)
accuracy_clean = np.mean(np.argmax(predictions_clean, axis=1) == np.argmax(y_test, axis=1))
print(f"Clean accuracy: {accuracy_clean:.2%}")

# Test multiple attacks
attacks = {
    'FGSM': FastGradientMethod(classifier, eps=0.3),
    'PGD': ProjectedGradientDescent(classifier, eps=0.3, eps_step=0.01, max_iter=40)
}

for attack_name, attack in attacks.items():
    x_adv = attack.generate(x=x_test)
    predictions_adv = classifier.predict(x_adv)
    accuracy_adv = np.mean(np.argmax(predictions_adv, axis=1) == np.argmax(y_test, axis=1))
    print(f"{attack_name} accuracy: {accuracy_adv:.2%}")
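
Step 5 of this workflow, comparing robustness across attack strengths, usually means sweeping eps and recording accuracy at each value. The sketch below shows the shape of such a sweep on a toy linear model in NumPy so that it runs without any ML framework; the data, model, and helper names are assumptions, and with ART the loop body would instead build an attack per eps and call generate.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, -1.0])                 # toy linear classifier
x = rng.normal(size=(200, 2))             # synthetic, unbounded features
y = (x @ w > 0).astype(float)             # labels the toy model gets right


def fgsm(x, y, w, eps):
    """Single signed-gradient step for a logistic model (no clipping needed here)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    grad = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad)


def accuracy(x, y, w):
    return float(np.mean((x @ w > 0).astype(float) == y))


eps_values = [0.0, 0.1, 0.3, 0.5]
accuracies = [accuracy(fgsm(x, y, w, eps), y, w) for eps in eps_values]
for eps, acc in zip(eps_values, accuracies):
    print(f"eps={eps:.1f}  accuracy={acc:.2%}")
```

Plotting accuracy against eps gives a robustness curve that is easier to compare between models than a single adversarial accuracy figure.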

Workflow 2: Implementing Adversarial Training Defense

Purpose: Train a robust model by augmenting training data with adversarial examples

Steps:

  1. Create a classifier wrapped in an ART estimator
  2. Define an attack to generate adversarial training examples
  3. Use AdversarialTrainer to train the model with adversarial augmentation
  4. Evaluate robustness improvement on test set

Example:

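import numpy as np
from art.defences.trainer import AdversarialTrainer
from art.attacks.evasion import ProjectedGradientDescent
from art.estimators.classification import PyTorchClassifier

# Create classifier (model, criterion, optimizer, and the MNIST arrays are
# assumed to be defined as in Workflow 1)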
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0)
)

# Define attack for training
attack = ProjectedGradientDescent(
    classifier,
    eps=0.3,
    eps_step=0.01,
    max_iter=40,
    targeted=False
)

# Create adversarial trainer
adv_trainer = AdversarialTrainer(classifier, attacks=attack, ratio=0.5)

# Train with adversarial examples
adv_trainer.fit(x_train, y_train, nb_epochs=10, batch_size=128)

# Evaluate robustness
x_test_adv = attack.generate(x=x_test)
predictions = classifier.predict(x_test_adv)
accuracy = np.mean(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1))
print(f"Adversarial accuracy after training: {accuracy:.2%}")
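
The ratio argument controls how much of each training batch is adversarial. As a hedged sketch of that idea (the helper mix_batch is hypothetical and the exact rounding ART uses should be checked in its source), a batch can be mixed like this:

```python
import numpy as np


def mix_batch(x_batch, x_adv_batch, ratio, rng):
    """Replace a `ratio` fraction of the batch with its adversarial counterparts."""
    n_adv = int(np.ceil(ratio * len(x_batch)))
    adv_idx = rng.choice(len(x_batch), size=n_adv, replace=False)
    mixed = x_batch.copy()
    mixed[adv_idx] = x_adv_batch[adv_idx]
    return mixed, adv_idx


rng = np.random.default_rng(0)
x_batch = np.zeros((8, 3))       # stand-in for clean samples
x_adv_batch = np.ones((8, 3))    # stand-in for their adversarial versions
mixed, adv_idx = mix_batch(x_batch, x_adv_batch, ratio=0.5, rng=rng)
print(f"{len(adv_idx)} of {len(mixed)} samples are adversarial")
```

A ratio of 1.0 trains on adversarial examples only, while lower values trade some robustness for clean accuracy.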

Workflow 3: Detecting Backdoor Poisoning Attacks

Purpose: Identify if a model has been compromised by a backdoor attack during training

Steps:

  1. Train a model on potentially poisoned data
  2. Use Neural Cleanse or Activation Clustering defense to detect backdoors
  3. Analyze detected triggers and suspicious samples
  4. Optionally retrain on cleaned data

Example:

import numpy as np
from art.defences.detector.poison import ActivationDefence
from art.estimators.classification import KerasClassifier

# Assume we have a trained model on potentially poisoned data
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# Create activation defense
defense = ActivationDefence(classifier, x_train, y_train)

# Detect poisoned samples
report, is_clean_lst = defense.detect_poison(nb_clusters=2, nb_dims=10, reduce='PCA')

# is_clean_lst is a plain list (1 = clean, 0 = suspected poison);
# convert it to a boolean mask before comparing or indexing
is_clean = np.asarray(is_clean_lst) == 1

# Analyze results
poisoned_indices = np.where(~is_clean)[0]
print(f"Detected {len(poisoned_indices)} poisoned samples out of {len(x_train)}")

# Get clean data
x_train_clean = x_train[is_clean]
y_train_clean = y_train[is_clean]

# Retrain on clean data
classifier.fit(x_train_clean, y_train_clean, nb_epochs=10, batch_size=128)
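
The intuition behind activation clustering is that, within one class, poisoned samples tend to produce hidden activations that separate from the clean ones. The sketch below illustrates only that intuition on synthetic activations with a hand-written two-cluster k-means and a cluster-size heuristic; it is an assumption-laden toy, not ART's ActivationDefence, which also reduces dimensionality and offers other analysis methods.

```python
import numpy as np


def two_means(points, n_iter=20):
    """Minimal 2-cluster k-means, seeded with the extremes along the first principal axis."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    centers = np.stack([points[np.argmin(proj)], points[np.argmax(proj)]])
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.stack([points[labels == k].mean(axis=0) for k in range(2)])
    return labels


# Synthetic activations for one class: 90 clean samples and 10 shifted "poisoned" ones
rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=0.5, size=(90, 5))
poisoned = rng.normal(loc=4.0, scale=0.5, size=(10, 5))
activations = np.concatenate([clean, poisoned])

labels = two_means(activations)
suspicious_cluster = np.argmin(np.bincount(labels, minlength=2))  # smaller cluster
is_clean = labels != suspicious_cluster
print(f"Flagged {np.sum(~is_clean)} of {len(activations)} samples")
```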

Testing and Debugging

Running tests:

# Run all tests
pytest tests/ -v

# Run specific test module
pytest tests/attacks/test_fast_gradient.py -v

# Run tests for specific framework
pytest tests/estimators/classification/test_pytorch.py -v

# Run with coverage
pytest tests/ --cov=art --cov-report=html

# Run specific test function
pytest tests/attacks/test_fast_gradient.py::TestFastGradientMethod::test_generate -v

Debugging tips:


Known Incomplete Components

This section documents areas of the codebase that are incomplete or under development. Understanding these gaps helps new developers avoid confusion and identify potential contribution opportunities.

Pass-Only Classes

No classes containing only pass statements were found in the codebase. All classes have at least minimal implementations.

NotImplementedError Sites

Methods and functions that explicitly raise NotImplementedError, marking planned but unimplemented functionality:

Abstract Base Class Methods (Expected)

Attack Base Classes (Expected)

Defence Base Classes (Expected)

Framework-Specific Limitations

Optimization Limitations

Model-Specific Limitations

TODO/FIXME/XXX/HACK Comments

Development notes and known issues documented in code comments:

Performance Optimizations Needed

Algorithm Improvements

Feature Enhancements

Code Quality

Commented Out Code

Unimplemented Abstract Methods

Abstract methods in the base classes are marked with @abc.abstractmethod and raise NotImplementedError; concrete implementations are expected to override them. These base classes define the contracts that every new attack, defense, or estimator must satisfy.

Key Extension Points for New Contributions:

  1. New attack types: Inherit from appropriate attack base class (EvasionAttack, PoisoningAttack, etc.)
  2. New defenses: Inherit from Preprocessor, Postprocessor, or Trainer
  3. New estimators: Inherit from BaseEstimator and appropriate mixins
  4. Framework support: Implement framework-specific versions in corresponding modules

Next Steps

For new developers:

  1. Start with the examples in examples/ directory to understand basic workflows
  2. Read through the architecture overview and base class documentation
  3. Run existing tests to verify your environment setup
  4. Try modifying an existing attack or defense to understand the codebase structure

Recommended learning path:

  1. Week 1: Understand estimator wrappers and run basic examples
    • Study art/estimators/estimator.py and framework-specific implementations
    • Run examples/get_started_pytorch.py and examples/mnist_cnn_fgsm.py
    • Experiment with different attack parameters
  2. Week 2: Deep dive into attack implementations
    • Study art/attacks/attack.py base classes
    • Examine FGSM and PGD implementations in detail
    • Implement a simple custom attack
  3. Week 3: Explore defense mechanisms
    • Study preprocessor and postprocessor base classes
    • Understand adversarial training workflow
    • Test different defense combinations
  4. Week 4: Contribute to the project
    • Identify an incomplete component or TODO item
    • Implement a new attack/defense from a recent paper
    • Add tests and documentation for your contribution

How to contribute:

  1. Read CONTRIBUTING.md for contribution guidelines
  2. Follow PEP 8 coding standards
  3. Sign off commits with DCO: git commit -s -m "message"
  4. Provide unit tests with 80%+ coverage for new features
  5. Submit pull requests through GitHub
  6. Join the Slack channel for discussions: https://ibm-art.slack.com

Contribution Ideas:


Additional Resources

Documentation:

External References:

Community:

Related Projects:


This onboarding guide was generated to help developers quickly understand and contribute to the Adversarial Robustness Toolbox. For questions or clarifications, please refer to the project maintainers through GitHub issues or the Slack channel.