Research Code Onboarding with IBM Bob
The Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security hosted by the Linux Foundation AI & Data Foundation. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against adversarial threats.
Research Domain: Machine Learning Security, Adversarial Machine Learning, AI Safety
Primary Goals:
Key Technologies: Python 3.10+, NumPy, SciPy, scikit-learn, TensorFlow, PyTorch, Keras
Target Audience: ML security researchers, adversarial ML practitioners, AI safety engineers, and developers building robust ML systems
ART follows a modular architecture built around four core pillars that mirror the adversarial ML threat landscape:
The library uses abstract base classes to define contracts that concrete implementations must follow, enabling easy extension with new attacks, defenses, or model types.
classDiagram
class BaseEstimator {
<<abstract>>
+model
+clip_values
+preprocessing_defences
+postprocessing_defences
+predict(x)
+fit(x, y)
+input_shape
}
class LossGradientsMixin {
<<abstract>>
+loss_gradient(x, y)
}
class NeuralNetworkMixin {
<<abstract>>
+channels_first
+get_activations(x, layer)
+fit_generator(generator)
}
class Attack {
<<abstract>>
+estimator
+attack_params
+_estimator_requirements
+set_params(kwargs)
}
class EvasionAttack {
<<abstract>>
+targeted
+generate(x, y)
}
class PoisoningAttack {
<<abstract>>
+poison(x, y)
}
class ExtractionAttack {
<<abstract>>
+extract(x, y)
}
class InferenceAttack {
<<abstract>>
+infer(x, y)
}
class Preprocessor {
<<abstract>>
+is_fitted
+apply_fit
+apply_predict
+__call__(x, y)
+estimate_gradient(x, grad)
}
class Postprocessor {
<<abstract>>
+is_fitted
+apply_fit
+apply_predict
+__call__(preds)
}
class Trainer {
<<abstract>>
+classifier
+fit(x, y)
}
BaseEstimator <|-- LossGradientsMixin
BaseEstimator <|-- NeuralNetworkMixin
Attack <|-- EvasionAttack
Attack <|-- PoisoningAttack
Attack <|-- ExtractionAttack
Attack <|-- InferenceAttack
Attack --> BaseEstimator : targets
Preprocessor --> BaseEstimator : defends
Postprocessor --> BaseEstimator : defends
Trainer --> BaseEstimator : trains
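The contract idea in the diagram can be illustrated with a minimal sketch built only on Python's standard `abc` module. Every class name here (`ToyEstimator`, `ToyGradientsMixin`, `ToyAttack`, `SignAttack`, `LinearModel`, `PredictOnly`) is hypothetical and is not part of ART; the sketch only shows how abstract methods and an `_estimator_requirements` tuple can reject an incompatible estimator before an attack runs.

```python
from abc import ABC, abstractmethod


class ToyEstimator(ABC):
    """Hypothetical stand-in for an estimator base class."""

    @abstractmethod
    def predict(self, x):
        """Return one prediction per input sample."""


class ToyGradientsMixin(ABC):
    """Hypothetical mixin that adds a gradient capability."""

    @abstractmethod
    def loss_gradient(self, x, y):
        """Return the gradient of the loss with respect to x."""


class ToyAttack(ABC):
    """Hypothetical attack base class that validates its target estimator."""

    # Capabilities the target estimator must provide.
    _estimator_requirements = (ToyEstimator,)

    def __init__(self, estimator):
        missing = [r.__name__ for r in self._estimator_requirements
                   if not isinstance(estimator, r)]
        if missing:
            raise TypeError(f"estimator is missing: {missing}")
        self.estimator = estimator

    @abstractmethod
    def generate(self, x, y=None):
        """Return adversarial versions of x."""


class SignAttack(ToyAttack):
    """Concrete attack that needs gradients, so it tightens the requirements."""

    _estimator_requirements = (ToyEstimator, ToyGradientsMixin)

    def __init__(self, estimator, eps=0.1):
        super().__init__(estimator)
        self.eps = eps

    def generate(self, x, y=None):
        grads = self.estimator.loss_gradient(x, y)
        # Step each feature by eps in the direction that increases the loss.
        return [xi + self.eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grads)]


class LinearModel(ToyEstimator, ToyGradientsMixin):
    """Scalar model with loss = (w * x - y) ** 2."""

    def __init__(self, w):
        self.w = w

    def predict(self, x):
        return [self.w * xi for xi in x]

    def loss_gradient(self, x, y):
        return [2 * (self.w * xi - yi) * self.w for xi, yi in zip(x, y)]


class PredictOnly(ToyEstimator):
    """Estimator without gradients; SignAttack should refuse it."""

    def predict(self, x):
        return list(x)


model = LinearModel(w=2.0)
x_adv = SignAttack(model, eps=0.1).generate([1.0, -1.0], [0.0, 0.0])
```

Constructing `SignAttack(PredictOnly())` raises `TypeError` in this sketch, which is the kind of early failure a requirements check is meant to provide.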
art/estimators/: Model wrappers providing unified interfaces across ML frameworks
  estimator.py: Base classes (BaseEstimator, LossGradientsMixin, NeuralNetworkMixin, DecisionTreeMixin)
  classification/: Classifiers for various frameworks (PyTorch, TensorFlow, Keras, scikit-learn, XGBoost, LightGBM, CatBoost)
  object_detection/: Object detection models (Faster R-CNN, YOLO, DETR)
  speech_recognition/: Speech recognition models (DeepSpeech, Espresso)
  certification/: Certified defense estimators (randomized smoothing, derandomized smoothing)
art/attacks/: Adversarial attack implementations
  attack.py: Base attack classes defining the attack interface
  evasion/: White-box and black-box evasion attacks (FGSM, PGD, C&W, DeepFool, etc.)
  poisoning/: Data poisoning attacks (backdoor attacks, clean-label attacks)
  extraction/: Model extraction attacks (copycat, knockoff nets)
  inference/: Privacy attacks (membership inference, attribute inference, model inversion)
art/defences/: Defense mechanisms against adversarial attacks
  preprocessor/: Input preprocessing defenses (feature squeezing, JPEG compression, spatial smoothing)
  postprocessor/: Output postprocessing defenses (high confidence, reverse sigmoid)
  trainer/: Adversarial training methods (standard, TRADES, AWP)
  detector/: Adversarial example detectors
  transformer/: Model transformation defenses
art/metrics/: Evaluation metrics for robustness assessment
  metrics.py: Empirical robustness, CLEVER score, loss sensitivity
  privacy/: Privacy leakage metrics
Module Relationships:
git clone https://github.com/Trusted-AI/adversarial-robustness-toolbox.git
cd adversarial-robustness-toolbox
# Using venv
python -m venv art_env
source art_env/bin/activate # On Windows: art_env\Scripts\activate
# Or using conda
conda create -n art_env python=3.10
conda activate art_env
# Minimal installation (core dependencies only)
pip install -e .
# Install with specific framework support
# (the quotes stop shells such as zsh from treating the brackets as a glob pattern)
pip install -e ".[pytorch]" # PyTorch support
pip install -e ".[tensorflow]" # TensorFlow support
pip install -e ".[keras]" # Keras support
# Install with all dependencies (for development)
pip install -r requirements_test.txt
# Install specific extras
pip install -e ".[pytorch_image]" # PyTorch with image processing
pip install -e ".[tensorflow_audio]" # TensorFlow with audio processing
# No additional configuration needed for basic usage
# Optional: Set up TensorBoard for attack visualization
export TENSORBOARD_LOGDIR=./runs
# Run a simple test
python -c "import art; print(art.__version__)"
# Run unit tests (requires test dependencies)
pytest tests/ -v
# Run a quick example
python examples/get_started_pytorch.py
| Variable | Description | Default | Required |
|---|---|---|---|
| ART_DATA_PATH | Path for storing datasets | ~/.art/data | No |
| TENSORBOARD_LOGDIR | TensorBoard log directory | ./runs | No |
| CUDA_VISIBLE_DEVICES | GPU device selection | All GPUs | No |
Issue: ImportError for framework-specific modules
pip install "adversarial-robustness-toolbox[pytorch]"
Issue: CUDA out of memory errors
Issue: Slow attack generation
Issue: Version conflicts with existing packages
See requirements_test.txt for compatible versions
Adversarial machine learning studies the vulnerability of ML models to malicious inputs designed to cause misclassification or extract sensitive information. ART addresses four main threat categories:
Fast Gradient Sign Method (FGSM)
art/attacks/evasion/fast_gradient.py
Projected Gradient Descent (PGD)
art/attacks/evasion/projected_gradient_descent/
Carlini & Wagner (C&W) Attack
art/attacks/evasion/carlini.py
DeepFool
art/attacks/evasion/deepfool.py
Adversarial Training
art/defences/trainer/adversarial_trainer.py
The codebase implements algorithms from numerous research papers. Key references found in code:
Estimator Wrapper Pattern
predict(), fit(), loss_gradient()
Attack Parameters
attack_params list specifying configurable parameters
eps (perturbation budget), norm (distance metric), targeted (attack type)
Clip Values
(min, max) defining valid input range
(0.0, 1.0) for normalized images
Lp Norms
Loss Functions
Gradient-Based Optimization
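To make the roles of `eps`, `norm`, and clip values concrete, here is a minimal NumPy sketch. The helpers `project_perturbation` and `apply_perturbation` are hypothetical names introduced for illustration and are not ART functions; the sketch covers only the L2 and L-infinity cases.

```python
import numpy as np


def project_perturbation(delta, eps, norm):
    """Project each row of `delta` onto the Lp ball of radius `eps`."""
    if norm == np.inf:
        # L-infinity: cap every coordinate at +/- eps.
        return np.clip(delta, -eps, eps)
    if norm == 2:
        # L2: rescale rows whose Euclidean length exceeds eps.
        lengths = np.linalg.norm(delta, axis=1, keepdims=True)
        scale = np.minimum(1.0, eps / np.maximum(lengths, 1e-12))
        return delta * scale
    raise NotImplementedError("this sketch covers only norm=2 and norm=np.inf")


def apply_perturbation(x, delta, eps, norm, clip_values=(0.0, 1.0)):
    """Add a projected perturbation, then clip to the valid input range."""
    x_adv = x + project_perturbation(delta, eps, norm)
    return np.clip(x_adv, clip_values[0], clip_values[1])


x = np.array([[0.5, 0.9]])
delta = np.array([[0.3, 0.4]])  # Euclidean length 0.5
x_inf = apply_perturbation(x, delta, eps=0.2, norm=np.inf)
x_l2 = apply_perturbation(x, delta, eps=0.25, norm=2)
```

The perturbation budget limits how far the input may move, while the clip values keep the result a valid input (for example, a pixel intensity in [0, 1]).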
This codebase implements numerous research papers. Key implementations include:
Main Package Initialization: art/__init__.py
__version__ = "1.20.1"
Example Scripts: examples/ directory
get_started_pytorch.py: Basic PyTorch workflow with FGSM attack
get_started_tensorflow_v2.py: TensorFlow 2.x example
mnist_cnn_fgsm.py: MNIST classification with FGSM evaluation
Main execution flow:
1. Load/create ML model
2. Wrap model in ART estimator (e.g., PyTorchClassifier)
3. Create attack instance with estimator
4. Generate adversarial examples using attack.generate()
5. Evaluate model on clean and adversarial examples
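The five steps above can be sketched end to end without any ML framework. The following NumPy example is an illustration only: `ToyLinearClassifier`, `fgsm`, and `accuracy` are hypothetical names rather than ART classes, and the weights and inputs are made up so the effect of the attack is easy to follow.

```python
import numpy as np


class ToyLinearClassifier:
    """Hypothetical wrapper exposing predict() and loss_gradient()."""

    def __init__(self, weights, clip_values=(0.0, 1.0)):
        self.weights = weights          # shape (n_features, n_classes)
        self.clip_values = clip_values

    def predict(self, x):
        logits = x @ self.weights
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)

    def loss_gradient(self, x, y):
        # Gradient of the softmax cross-entropy loss with respect to the input.
        return (self.predict(x) - y) @ self.weights.T


def fgsm(estimator, x, y, eps):
    """Single-step L-infinity attack: move eps along the gradient sign."""
    x_adv = x + eps * np.sign(estimator.loss_gradient(x, y))
    return np.clip(x_adv, *estimator.clip_values)


def accuracy(estimator, x, y):
    preds = estimator.predict(x)
    return float(np.mean(np.argmax(preds, axis=1) == np.argmax(y, axis=1)))


# 1. Load or create a model (here: fixed weights, two features, two classes).
weights = np.array([[4.0, -4.0], [-4.0, 4.0]])
# 2. Wrap it in an estimator.
clf = ToyLinearClassifier(weights)
# 3-4. Create the attack and generate adversarial examples.
x_test = np.array([[0.6, 0.4], [0.4, 0.6]])
y_test = np.array([[1.0, 0.0], [0.0, 1.0]])
x_adv = fgsm(clf, x_test, y_test, eps=0.2)
# 5. Compare accuracy on clean and adversarial inputs.
acc_clean = accuracy(clf, x_test, y_test)
acc_adv = accuracy(clf, x_adv, y_test)
```

In this toy setup both samples sit close to the decision boundary, so a perturbation of 0.2 per feature is enough to flip both predictions.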
Location: art/estimators/estimator.py:38-365
Purpose: Abstract base class defining the interface all model wrappers must implement. Provides core functionality for preprocessing, postprocessing, and parameter management.
Key Methods:
predict(x, **kwargs): Perform prediction on input samples
fit(x, y, **kwargs): Train the model on provided data
input_shape: Property returning shape of one input sample
_apply_preprocessing(x, y, fit): Apply preprocessing defenses to inputs
_apply_postprocessing(preds, fit): Apply postprocessing defenses to predictions
set_params(**kwargs): Update estimator parameters
get_params(): Retrieve all estimator parameters
Example Usage:
from art.estimators.classification import PyTorchClassifier
import torch.nn as nn
import torch.optim as optim
# Define model
model = nn.Sequential(
nn.Conv2d(1, 32, 3),
nn.ReLU(),
nn.Flatten(),
nn.Linear(26*26*32, 10)
)
# Wrap in ART estimator
classifier = PyTorchClassifier(
model=model,
loss=nn.CrossEntropyLoss(),
optimizer=optim.Adam(model.parameters(), lr=0.01),
input_shape=(1, 28, 28),
nb_classes=10,
clip_values=(0.0, 1.0)
)
# Use estimator
predictions = classifier.predict(x_test)
classifier.fit(x_train, y_train, batch_size=64, nb_epochs=10)
Location: art/attacks/attack.py:93-559
Purpose: Define abstract interfaces for all attack types. Enforce requirements on target estimators and provide common functionality for parameter management.
Key Methods:
__init__(estimator, summary_writer): Initialize attack with target estimator
is_estimator_valid(estimator, requirements): Check if estimator satisfies attack requirements
set_params(**kwargs): Update attack parameters
generate(x, y) (EvasionAttack): Generate adversarial examples
poison(x, y) (PoisoningAttack): Generate poisoned training data
extract(x, y) (ExtractionAttack): Extract model copy
infer(x, y) (InferenceAttack): Infer sensitive information
Example Usage:
import numpy as np
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier
# Create attack
attack = FastGradientMethod(
estimator=classifier,
eps=0.3, # Perturbation budget
norm=np.inf, # L-infinity norm
targeted=False, # Untargeted attack
batch_size=32
)
# Generate adversarial examples
x_adv = attack.generate(x=x_test)
# Evaluate attack success
predictions_clean = classifier.predict(x_test)
predictions_adv = classifier.predict(x_adv)
accuracy_clean = np.mean(np.argmax(predictions_clean, axis=1) == np.argmax(y_test, axis=1))
accuracy_adv = np.mean(np.argmax(predictions_adv, axis=1) == np.argmax(y_test, axis=1))
print(f"Clean accuracy: {accuracy_clean:.2%}, Adversarial accuracy: {accuracy_adv:.2%}")
Location: art/defences/preprocessor/preprocessor.py:35-335
Purpose: Abstract base class for input preprocessing defenses that transform data before it reaches the model. Supports gradient estimation for differentiable defenses.
Key Methods:
__call__(x, y): Apply preprocessing to inputs and labels
fit(x, y, **kwargs): Fit preprocessor parameters if needed
estimate_gradient(x, grad): Estimate gradient through preprocessing (for BPDA)
forward(x, y) (framework-specific): Apply preprocessing in native framework
estimate_forward(x, y) (framework-specific): Differentiable approximation for gradient computation
Example Usage:
from art.defences.preprocessor import JpegCompression
from art.estimators.classification import PyTorchClassifier
# Create preprocessing defense
jpeg_defense = JpegCompression(
    clip_values=(0.0, 1.0),
    quality=50,           # JPEG quality parameter
    channels_first=True,  # inputs below are NCHW (3, 32, 32); the default is channels-last
    apply_fit=False,
    apply_predict=True
)
# Attach to estimator
classifier = PyTorchClassifier(
model=model,
loss=criterion,
optimizer=optimizer,
input_shape=(3, 32, 32),
nb_classes=10,
preprocessing_defences=jpeg_defense
)
# Defense is automatically applied during prediction
predictions = classifier.predict(x_test) # JPEG compression applied internally
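The role of `estimate_gradient` in BPDA (Backward Pass Differentiable Approximation) can be shown with a toy preprocessor. This is a sketch under stated assumptions: `ToyBitDepthReduction` is a hypothetical class, not ART's implementation, and it uses the simplest possible approximation, treating the defence as the identity function on the backward pass.

```python
import numpy as np


class ToyBitDepthReduction:
    """Hypothetical preprocessor that quantises inputs to a few levels."""

    def __init__(self, levels=4, apply_fit=False, apply_predict=True):
        self.levels = levels
        self.apply_fit = apply_fit
        self.apply_predict = apply_predict

    def __call__(self, x, y=None):
        # Forward pass: rounding is piecewise constant, so its true
        # gradient is zero almost everywhere.
        return np.round(x * self.levels) / self.levels, y

    def estimate_gradient(self, x, grad):
        # Backward pass: approximate the defence by the identity function
        # and pass the incoming gradient through unchanged.
        return grad


defence = ToyBitDepthReduction(levels=4)
x = np.array([[0.1, 0.55, 0.9]])
x_squeezed, _ = defence(x)
# Hypothetical gradient of a downstream loss w.r.t. the squeezed input.
grad_out = np.array([[0.2, -0.5, 1.0]])
grad_in = defence.estimate_gradient(x, grad_out)
```

Without such an approximation, a gradient-based attack would see a zero gradient through the rounding step and could not make progress against the defended model.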
Training Flow:
Raw Training Data
↓
Preprocessing Defenses (if apply_fit=True)
↓
Model Training (fit method)
↓
Trained Model
Inference Flow:
Raw Input
↓
Preprocessing Defenses (if apply_predict=True)
↓
Model Prediction
↓
Postprocessing Defenses (if apply_predict=True)
↓
Final Predictions
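The training and inference flows differ only in which defences are switched on for each phase. The sketch below uses plain Python to show that gating; `Defence`, `run_chain`, and `ToyPipeline` are hypothetical names chosen for illustration and do not mirror ART's internal method names.

```python
class Defence:
    """Hypothetical defence: a function plus the two phase flags."""

    def __init__(self, fn, apply_fit, apply_predict):
        self.fn = fn
        self.apply_fit = apply_fit
        self.apply_predict = apply_predict


def run_chain(defences, values, fit):
    """Apply, in order, every defence enabled for the current phase."""
    for defence in defences:
        enabled = defence.apply_fit if fit else defence.apply_predict
        if enabled:
            values = [defence.fn(v) for v in values]
    return values


class ToyPipeline:
    def __init__(self, model, preprocessing, postprocessing):
        self.model = model
        self.preprocessing = preprocessing
        self.postprocessing = postprocessing

    def training_inputs(self, x):
        # Training flow: raw data -> preprocessing (apply_fit) -> model training.
        return run_chain(self.preprocessing, x, fit=True)

    def predict(self, x):
        # Inference flow: preprocessing -> model -> postprocessing.
        x = run_chain(self.preprocessing, x, fit=False)
        preds = [self.model(v) for v in x]
        return run_chain(self.postprocessing, preds, fit=False)


pipeline = ToyPipeline(
    model=lambda v: 2.0 * v,
    # Rounds inputs to one decimal at prediction time only.
    preprocessing=[Defence(lambda v: round(v, 1), apply_fit=False, apply_predict=True)],
    # Caps outputs at 1.0 at prediction time.
    postprocessing=[Defence(lambda p: min(p, 1.0), apply_fit=False, apply_predict=True)],
)
train_x = pipeline.training_inputs([0.26, 0.74])
preds = pipeline.predict([0.26, 0.74])
```

Because the preprocessing defence has `apply_fit=False`, the training inputs pass through untouched, while predictions see both the rounding and the output cap.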
Attack Flow:
Clean Examples
↓
Attack Algorithm (generate/poison/extract/infer)
↓
Query Target Model (via estimator interface)
↓
Compute Gradients (if white-box) or Observe Outputs (if black-box)
↓
Update Adversarial Examples
↓
Repeat until convergence or max iterations
↓
Final Adversarial Examples
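For the white-box branch, the loop above can be written as a short projected-gradient routine. This is a minimal sketch and not ART's `ProjectedGradientDescent`: the function `pgd_linf` and the toy loss are hypothetical, there is no random initialisation or early stopping, and the loop simply runs for `max_iter` steps.

```python
import numpy as np


def pgd_linf(loss_gradient, x, y, eps, eps_step, max_iter, clip_values=(0.0, 1.0)):
    """Iterative L-infinity attack.

    `loss_gradient(x, y)` is any callable returning dLoss/dx; it plays the
    role of the white-box query to the target model.
    """
    x_adv = x.copy()
    for _ in range(max_iter):
        grad = loss_gradient(x_adv, y)              # query the model
        x_adv = x_adv + eps_step * np.sign(grad)    # update the examples
        x_adv = x + np.clip(x_adv - x, -eps, eps)   # stay within the budget
        x_adv = np.clip(x_adv, *clip_values)        # stay in the valid range
    return x_adv


# Toy target: squared-error loss of a fixed linear scorer, loss = (x.w - y)^2.
w = np.array([1.0, -2.0])


def toy_loss(x, y):
    return (x @ w - y) ** 2


def toy_loss_gradient(x, y):
    residual = x @ w - y
    return 2.0 * residual[:, None] * w[None, :]


x = np.array([[0.5, 0.5]])
y = np.array([0.0])
x_adv = pgd_linf(toy_loss_gradient, x, y, eps=0.1, eps_step=0.04, max_iter=5)
```

Three steps of size 0.04 would overshoot the budget of 0.1, so the projection step pulls the example back onto the boundary of the L-infinity ball, where it stays for the remaining iterations.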
Purpose: Assess how vulnerable a trained classifier is to adversarial examples
Steps:
Example:
import numpy as np
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent
from art.estimators.classification import PyTorchClassifier
from art.utils import load_mnist
# Load data
(x_train, y_train), (x_test, y_test), min_val, max_val = load_mnist()
x_train = np.transpose(x_train, (0, 3, 1, 2)).astype(np.float32)
x_test = np.transpose(x_test, (0, 3, 1, 2)).astype(np.float32)
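# Wrap the model (model, criterion, and optimizer are assumed to be defined,
# for example as in the PyTorchClassifier example earlier in this guide)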
classifier = PyTorchClassifier(
model=model,
loss=criterion,
optimizer=optimizer,
input_shape=(1, 28, 28),
nb_classes=10,
clip_values=(min_val, max_val)
)
# Train model
classifier.fit(x_train, y_train, batch_size=128, nb_epochs=10)
# Evaluate on clean data
predictions_clean = classifier.predict(x_test)
accuracy_clean = np.mean(np.argmax(predictions_clean, axis=1) == np.argmax(y_test, axis=1))
print(f"Clean accuracy: {accuracy_clean:.2%}")
# Test multiple attacks
attacks = {
    'FGSM': FastGradientMethod(classifier, eps=0.3),
    'PGD': ProjectedGradientDescent(classifier, eps=0.3, eps_step=0.01, max_iter=40)
}
for attack_name, attack in attacks.items():
    x_adv = attack.generate(x=x_test)
    predictions_adv = classifier.predict(x_adv)
    accuracy_adv = np.mean(np.argmax(predictions_adv, axis=1) == np.argmax(y_test, axis=1))
    print(f"{attack_name} accuracy: {accuracy_adv:.2%}")
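When several attacks or budgets are compared, it can help to collect the reported numbers in one place. The helper below is a hypothetical sketch (`accuracy` and `summarize_robustness` are not ART functions) that works on plain arrays, so it can be checked without a trained model; the prediction arrays are made-up values.

```python
import numpy as np


def accuracy(predictions, y_onehot):
    """Fraction of samples whose arg-max prediction matches the label."""
    return float(np.mean(np.argmax(predictions, axis=1) == np.argmax(y_onehot, axis=1)))


def summarize_robustness(preds_clean, preds_adv, y_onehot, x, x_adv):
    """Collect the numbers usually reported for one attack configuration."""
    flat_diff = (x_adv - x).reshape(len(x), -1)
    return {
        "clean_accuracy": accuracy(preds_clean, y_onehot),
        "adversarial_accuracy": accuracy(preds_adv, y_onehot),
        # Largest L-infinity distortion over all samples.
        "max_linf_perturbation": float(np.abs(flat_diff).max()),
    }


# Hypothetical outputs for four samples and two classes.
y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
preds_clean = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
preds_adv = np.array([[0.4, 0.6], [0.7, 0.3], [0.6, 0.4], [0.9, 0.1]])
x = np.zeros((4, 2, 2))
x_adv = x + 0.25
report = summarize_robustness(preds_clean, preds_adv, y, x, x_adv)
```

Checking the measured perturbation against the configured `eps` is a quick way to confirm that an attack respected its budget.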
Purpose: Train a robust model by augmenting training data with adversarial examples
Steps:
Example:
import numpy as np
from art.defences.trainer import AdversarialTrainer
from art.attacks.evasion import ProjectedGradientDescent
from art.estimators.classification import PyTorchClassifier
# Create classifier
classifier = PyTorchClassifier(
model=model,
loss=criterion,
optimizer=optimizer,
input_shape=(1, 28, 28),
nb_classes=10,
clip_values=(0.0, 1.0)
)
# Define attack for training
attack = ProjectedGradientDescent(
classifier,
eps=0.3,
eps_step=0.01,
max_iter=40,
targeted=False
)
# Create adversarial trainer
adv_trainer = AdversarialTrainer(classifier, attacks=attack, ratio=0.5)
# Train with adversarial examples
adv_trainer.fit(x_train, y_train, nb_epochs=10, batch_size=128)
# Evaluate robustness
x_test_adv = attack.generate(x=x_test)
predictions = classifier.predict(x_test_adv)
accuracy = np.mean(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1))
print(f"Adversarial accuracy after training: {accuracy:.2%}")
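One way to read the `ratio` argument is as the fraction of each training batch that is replaced by adversarial examples. The sketch below illustrates that interpretation with NumPy only; `mix_adversarial_batch` is a hypothetical helper, and the exact sampling strategy inside ART's trainer may differ.

```python
import numpy as np


def mix_adversarial_batch(x_batch, make_adversarial, ratio, rng):
    """Replace a `ratio` fraction of a batch with adversarial versions.

    `make_adversarial` is any callable mapping clean samples to perturbed
    ones (for example, the `generate` method of an attack).
    """
    n_adv = int(np.ceil(ratio * len(x_batch)))
    adv_ids = rng.choice(len(x_batch), size=n_adv, replace=False)
    mixed = x_batch.copy()
    mixed[adv_ids] = make_adversarial(x_batch[adv_ids])
    return mixed, adv_ids


rng = np.random.default_rng(0)
x_batch = np.zeros((8, 3))
# Stand-in attack: shift every feature by 0.3.
mixed, adv_ids = mix_adversarial_batch(x_batch, lambda x: x + 0.3, ratio=0.5, rng=rng)
```

With `ratio=0.5` and a batch of eight, four samples are perturbed and four stay clean, so the model keeps seeing clean data while it learns to resist the attack.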
Purpose: Identify if a model has been compromised by a backdoor attack during training
Steps:
Example:
import numpy as np
from art.defences.detector.poison import ActivationDefence
from art.estimators.classification import KerasClassifier
# Assume we have a trained model on potentially poisoned data
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))
# Create activation defense
defense = ActivationDefence(classifier, x_train, y_train)
# Detect poisoned samples
report, is_clean_lst = defense.detect_poison(nb_clusters=2, nb_dims=10, reduce='PCA')
# Convert the returned flags to an array so element-wise comparison and masking work
is_clean = np.array(is_clean_lst)
# Analyze results
poisoned_indices = np.where(is_clean == 0)[0]
print(f"Detected {len(poisoned_indices)} poisoned samples out of {len(x_train)}")
# Get clean data
x_train_clean = x_train[is_clean == 1]
y_train_clean = y_train[is_clean == 1]
# Retrain on clean data
classifier.fit(x_train_clean, y_train_clean, nb_epochs=10, batch_size=128)
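The intuition behind activation clustering is that backdoored samples of a class tend to form their own small group in activation space. The following NumPy sketch illustrates only that intuition on synthetic two-dimensional "activations"; `two_means` and `flag_smaller_cluster` are hypothetical helpers, and ART's `ActivationDefence` additionally applies dimensionality reduction and its own cluster analysis.

```python
import numpy as np


def two_means(points, n_iter=20):
    """Minimal 2-means clustering with a deterministic initialisation."""
    # Start from the point farthest from the mean, then the point farthest from it.
    first = points[np.argmax(np.linalg.norm(points - points.mean(axis=0), axis=1))]
    second = points[np.argmax(np.linalg.norm(points - first, axis=1))]
    centers = np.stack([first, second])
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def flag_smaller_cluster(activations):
    """Return is_clean flags: 0 for members of the smaller cluster, else 1."""
    labels = two_means(activations)
    suspicious = int(np.argmin(np.bincount(labels, minlength=2)))
    return (labels != suspicious).astype(int)


# Synthetic activations for one class: 8 clean samples near the origin
# and 2 backdoored samples far away.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=(8, 2))
poisoned = rng.normal(5.0, 0.1, size=(2, 2))
is_clean = flag_smaller_cluster(np.vstack([clean, poisoned]))
poisoned_indices = np.where(is_clean == 0)[0]
```

Treating the smaller cluster as suspicious is a heuristic: it assumes poisoned samples are a minority within each class, which may not hold for heavily poisoned datasets.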
Running tests:
# Run all tests
pytest tests/ -v
# Run specific test module
pytest tests/attacks/test_fast_gradient.py -v
# Run tests for specific framework
pytest tests/estimators/classification/test_pytorch.py -v
# Run with coverage
pytest tests/ --cov=art --cov-report=html
# Run specific test function
pytest tests/attacks/test_fast_gradient.py::TestFastGradientMethod::test_generate -v
Debugging tips:
import logging; logging.basicConfig(level=logging.DEBUG)
summary_writer=True to attack constructors
Attack.is_estimator_valid()
estimator.input_shape
estimator.loss_gradient() returns non-zero gradients
projected_gradient_descent/ subdirectories
This section documents areas of the codebase that are incomplete or under development. Understanding these gaps helps new developers avoid confusion and identify potential contribution opportunities.
No classes containing only pass statements were found in the codebase. All classes have at least minimal implementations.
Methods and functions that explicitly raise NotImplementedError, marking planned but unimplemented functionality:
Abstract Base Class Methods (Expected)
art/estimators/estimator.py:190: BaseEstimator.clone_for_refitting() - Must be implemented by subclasses
art/estimators/estimator.py:248: BaseEstimator.predict() - Abstract method
art/estimators/estimator.py:260: BaseEstimator.fit() - Abstract method
art/estimators/estimator.py:279: BaseEstimator.input_shape - Abstract property
art/estimators/estimator.py:344: BaseEstimator.compute_loss() - Optional method
art/estimators/estimator.py:354: BaseEstimator.compute_loss_from_predictions() - Optional method
art/estimators/estimator.py:386: LossGradientsMixin.loss_gradient() - Abstract method
art/estimators/estimator.py:440: NeuralNetworkMixin.predict() - Abstract method
art/estimators/estimator.py:454: NeuralNetworkMixin.fit() - Abstract method
art/estimators/estimator.py:493: NeuralNetworkMixin.get_activations() - Abstract method
art/estimators/estimator.py:542: DecisionTreeMixin.get_trees() - Abstract method
Attack Base Classes (Expected)
art/attacks/attack.py:226: EvasionAttack.generate() - Must be implemented by concrete attacks
art/attacks/attack.py:261: PoisoningAttack.poison() - Must be implemented by concrete attacks
art/attacks/attack.py:291: PoisoningAttackGenerator.poison_estimator() - Must be implemented
art/attacks/attack.py:325: PoisoningAttackTransformer.poison() - Must be implemented
art/attacks/attack.py:335: PoisoningAttackTransformer.poison_estimator() - Must be implemented
art/attacks/attack.py:368: PoisoningAttackObjectDetector.poison() - Must be implemented
art/attacks/attack.py:392: PoisoningAttackBlackBox.poison() - Must be implemented
art/attacks/attack.py:411: PoisoningAttackWhiteBox.poison() - Must be implemented
art/attacks/attack.py:430: ExtractionAttack.extract() - Must be implemented
art/attacks/attack.py:455: InferenceAttack.infer() - Must be implemented
art/attacks/attack.py:485: AttributeInferenceAttack.infer() - Must be implemented
art/attacks/attack.py:517: MembershipInferenceAttack.infer() - Must be implemented
art/attacks/attack.py:551: ReconstructionAttack.reconstruct() - Must be implemented
Defence Base Classes (Expected)
art/defences/preprocessor/preprocessor.py:91: Preprocessor.__call__() - Abstract method
art/defences/preprocessor/preprocessor.py:135: Preprocessor.forward() - Abstract method
art/defences/preprocessor/preprocessor.py:164: PreprocessorPyTorch.forward() - Abstract method
art/defences/preprocessor/preprocessor.py:261: PreprocessorTensorFlowV2.forward() - Abstract method
art/defences/postprocessor/postprocessor.py:82: Postprocessor.__call__() - Abstract method
art/defences/trainer/trainer.py:52: Trainer.fit() - Abstract method
Framework-Specific Limitations
art/attacks/poisoning/sleeper_agent_attack.py:172: SleeperAgentAttack only supports PyTorch
art/attacks/poisoning/gradient_matching_attack.py:123: GradientMatchingAttack only supports PyTorch
art/attacks/poisoning/adversarial_embedding_attack.py:187: AdversarialEmbeddingAttack only supports Keras
art/attacks/evasion/shadow_attack.py:248: ShadowAttack has framework-specific limitations
Optimization Limitations
art/attacks/evasion/projected_gradient_descent/projected_gradient_descent_pytorch.py:493: Weighted Lp ball projection not supported for finite norms
art/attacks/evasion/projected_gradient_descent/projected_gradient_descent_pytorch.py:511: Finite norm_p >= 1 not supported with suboptimal=False
art/attacks/evasion/projected_gradient_descent/projected_gradient_descent_pytorch.py:515: norm_p < 1 not supported with suboptimal=False
art/attacks/evasion/projected_gradient_descent/projected_gradient_descent_tensorflow_v2.py:346: Momentum Iterative Attack disabled for TensorFlow (issue #2439)
art/utils.py:540: Weighted Lp ball projection not supported for finite norms
art/utils.py:555: norm_p > 1 (except 2 and inf) not supported with suboptimal=False
art/utils.py:560: norm_p < 1 not supported with suboptimal=False
Model-Specific Limitations
art/estimators/classification/xgboost.py:135: Some XGBoost functionality not implemented
art/estimators/classification/scikitlearn.py:1244: SVM sigmoid kernel loss gradients not implemented
art/estimators/object_detection/tensorflow_v2_faster_rcnn.py:238: Training mode doesn’t support loss_gradient
art/estimators/object_detection/tensorflow_v2_faster_rcnn.py:316: Training mode doesn’t support prediction
Development notes and known issues documented in code comments:
Performance Optimizations Needed
art/attacks/evasion/imperceptible_asr/imperceptible_asr.py:845: TODO reduce for loop in masker computation
art/attacks/evasion/pixel_threshold.py:1281: TODO: can be vectorized (parameter scaling)
art/attacks/evasion/pixel_threshold.py:1333: TODO: can be vectorized (mutation)
Algorithm Improvements
art/defences/detector/poison/activation_defence.py:801: TODO: address issue where if fewer samples than nb_dims this fails
art/attacks/evasion/deepfool.py:118: TODO compute set of unique labels per batch
art/attacks/evasion/brendel_bethge.py:2251: TODO: Implement more efficient search with breaking condition
art/attacks/evasion/brendel_bethge.py:2292: TODO: only perform forward pass on non-converged samples
Feature Enhancements
art/defences/detector/evasion/subsetscanning/scanner.py:139: TODO: some randomizing and only leave in a random number of rows of pvalues
art/estimators/poison_mitigation/neural_cleanse/neural_cleanse.py:211: TODO: explore different values for threshold
art/estimators/classification/hugging_face.py:326-329: TODO: refactor activation defence to not crash if non 2D inputs are provided
Code Quality
art/attacks/evasion/graphite/graphite_whitebox_pytorch.py:496: TODO1 (unclear marker)
art/attacks/evasion/pixel_threshold.py:495: TODO: Make the attack compatible with current version of SciPy Optimize
Commented Out Code
art/metrics/metrics.py:218: Commented TODO check if following computation is correct
All abstract methods in base classes are properly marked with @abc.abstractmethod and raise NotImplementedError. Concrete implementations are expected to override these methods. The architecture is well-designed with clear contracts between base classes and implementations.
Key Extension Points for New Contributions:
For new developers:
examples/ directory to understand basic workflows
Recommended learning path:
art/estimators/estimator.py and framework-specific implementations
examples/get_started_pytorch.py and examples/mnist_cnn_fgsm.py
art/attacks/attack.py base classes
How to contribute:
CONTRIBUTING.md for contribution guidelines
git commit -s -m "message"
Contribution Ideas:
Documentation:
examples/ and notebooks/ directories in the repository
External References:
Community:
Related Projects:
This onboarding guide was generated to help developers quickly understand and contribute to the Adversarial Robustness Toolbox. For questions or clarifications, please refer to the project maintainers through GitHub issues or the Slack channel.