Adversarial Examples
Inputs generated by adversaries to fool neural networks. Two types (contrasted in the sketch after this list):
- Semantic-based perturbations
  - Restricted area in which to manipulate pixels
  - Modify a specific region of the image
  - Simulate real-world scenarios
- Pervasive perturbations
  - Full access to pixel alteration
  - Modify the whole image
  - Bounded by different distance metrics (e.g., L2 or L-infinity)
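A minimal NumPy sketch contrasting the two perturbation types; the image size, patch location, and epsilon budget are illustrative values, not taken from any particular attack.

```python
import numpy as np

# Stand-in for a normalized input image; the size is arbitrary.
image = np.random.rand(32, 32, 3)

# Restricted-area perturbation: only a small region of pixels is changed.
patched = image.copy()
patched[0:8, 0:8, :] = np.random.rand(8, 8, 3)  # e.g., a localized patch

# Pervasive perturbation: every pixel may move, but only within a small budget.
epsilon = 8.0 / 255.0
noise = np.random.uniform(-epsilon, epsilon, size=image.shape)
pervasive = np.clip(image + noise, 0.0, 1.0)

# Distance metrics commonly used to bound pervasive perturbations.
print("L-infinity distance:", np.abs(pervasive - image).max())
print("L2 distance:        ", np.linalg.norm((pervasive - image).ravel()))
```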
Defenses and Detections
Adversarial detection – determines whether an input is adversarial or not.
Adversarial defense – always produces the correct output (property) for inputs within a given bound.
Feature Squeezing to Detect Adversarial Examples
An input-transformation technique that uses smoothing transformations of input features to undo adversarial perturbations. It squeezes many slightly different inputs into a single, simpler representation (for example, by reducing color bit depth).
Feature squeezing is a simple technique that helps detect when an adversarial attack tries to confuse a machine learning model.
Imagine a machine learning model is like a person looking at a picture and saying, “This is a cat.” Sometimes, someone might mess with the picture by changing tiny details (like making the cat’s fur look weird in a way people can’t see easily) to trick the model into thinking it’s a dog or something else.
Feature squeezing is like looking at the picture in a simpler way, ignoring tiny details that might cause confusion. It works like this:
- Simplify the picture: Reduce the number of colors or make the image more basic. For example, instead of showing all the shades of green, it just shows “light green” or “dark green.”
- Compare results: The model looks at both the original picture and the simplified one. If the model suddenly gets confused or changes its answer between the two, it might mean the picture has been tampered with by an adversarial attack.
Think of it like squinting your eyes to see the “big picture” of something instead of getting distracted by small, tricky details. By doing this, feature squeezing helps figure out if someone is trying to fool the machine learning model!
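A hedged sketch of this detection recipe, assuming a `model` function that returns a softmax probability vector; the two squeezers (bit-depth reduction and median smoothing), the L1 disagreement score, and the threshold are illustrative choices.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] down to 2**bits levels ("fewer colors")."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=2):
    """Local spatial smoothing with a small median filter."""
    return median_filter(x, size=(size, size, 1))

def looks_adversarial(model, x, threshold=1.0):
    """Flag x if the model's prediction changes a lot after squeezing."""
    p_original = model(x)
    p_squeezed = [model(reduce_bit_depth(x)), model(median_smooth(x))]
    # L1 distance between the original and squeezed probability vectors.
    score = max(np.abs(p_original - p).sum() for p in p_squeezed)
    return score > threshold
```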
Example of different types of perturbations
Attribute-steered detection of adversarial examples
This technique is mainly used for image recognition.
Neural Network Invariant Checking
Neural Network Invariant Checking is a method used to verify whether a neural network satisfies specific properties or rules (called invariants) that should always hold during its operation. This is crucial for ensuring reliability and safety in applications such as autonomous driving, medical diagnosis, and other critical systems.
What Are Invariants?
Invariants are properties or conditions that should always be true for a neural network. For example:
- For a classifier, if x belongs to class A, then the network should always assign the highest probability to class A.
- For a self-driving car, if the input image contains a red light, the output should always indicate “stop.”
Mathematically, an invariant can be expressed as a logical statement or a constraint:
P(x) => Q(f(x))
Where:
- P(x): A property about the input x.
- f(x): The neural network function.
- Q(f(x)): A property about the output.
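A minimal sketch of checking such an invariant P(x) => Q(f(x)) over a set of inputs; `model`, `has_red_light`, and `STOP` are hypothetical placeholders matching the red-light example above, not a real API.

```python
def check_invariant(model, inputs, P, Q):
    """Return the inputs for which the invariant P(x) => Q(f(x)) is violated."""
    violations = []
    for x in inputs:
        # The implication only fails when the premise holds but the conclusion does not.
        if P(x) and not Q(model(x)):
            violations.append(x)
    return violations

# Instantiating the red-light rule from above (placeholders, not a real API):
# violations = check_invariant(model, images,
#                              P=lambda x: has_red_light(x),
#                              Q=lambda y: y == STOP)
```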
Provenance Invariant
Provenance Invariant refers to a property in systems, particularly in machine learning or data pipelines, ensuring that the origin, lineage, or history of data remains consistent and trustworthy across operations. This invariant guarantees that the transformations applied to the data do not alter or obscure its source and lineage in unintended ways.
Key Concepts
- Provenance:
  - The history or origin of a piece of data, including where it came from, how it was created, and any transformations it has undergone.
  - For example, in a machine learning pipeline, provenance includes information about the dataset source, preprocessing steps, and feature engineering.
- Invariant:
  - A rule or condition that must hold true at all times during the system’s operation.
  - A provenance invariant ensures that the lineage information of the data remains consistent and accurate throughout the system (sketched after the list below).
Why Is Provenance Important?
- Accountability: Ensures traceability of data for audits or compliance.
- Debugging: Identifies where issues in a pipeline originated.
- Security: Prevents unauthorized or unintended changes to data lineage.
- Fairness and Bias Detection: Ensures the source and handling of data are transparent.
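One illustrative way to enforce a provenance invariant is to hash-chain each transformation record so that tampering with the recorded lineage becomes detectable; this is a generic pattern sketched below, not a specific framework's API.

```python
import hashlib
import json

def add_step(lineage, step_name, params):
    """Append a provenance record whose hash chains to the previous record."""
    prev_hash = lineage[-1]["hash"] if lineage else "root"
    record = {"step": step_name, "params": params, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return lineage + [record]

def provenance_invariant_holds(lineage):
    """The invariant: every record is intact and chains to its predecessor."""
    prev_hash = "root"
    for record in lineage:
        body = {k: v for k, v in record.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev"] != prev_hash or recomputed != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True

lineage = add_step([], "load", {"source": "dataset-v1"})
lineage = add_step(lineage, "normalize", {"mean": 0.5, "std": 0.5})
assert provenance_invariant_holds(lineage)
```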
Adversarial Sample Defenses
- Adversarial Training
- Gradient Masking
- Certified Adversarial Robustness
Adversarial Training Issues
- Requires small inputs, such as 32×32 images
- Only effective against tiny perturbations
- Training is 10-50x slower
- Does not always work; success may be only around 50% (see the training-step sketch below)
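A hedged PyTorch sketch of one adversarial-training step that crafts the perturbed batch with FGSM; `model`, `loader`, and `optimizer` are assumed to already exist, and the epsilon budget is an illustrative value rather than any paper's exact setting.

```python
import torch
import torch.nn.functional as F

epsilon = 8.0 / 255.0  # illustrative L-infinity budget

for images, labels in loader:                       # `loader` assumed to yield (images, labels)
    images.requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]

    # FGSM: step in the sign of the gradient, then clamp to the valid pixel range.
    adv_images = (images + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

    # Train on the adversarial batch (the extra attack pass is why training is slower).
    optimizer.zero_grad()
    F.cross_entropy(model(adv_images), labels).backward()
    optimizer.step()
```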
Gradient Masking
Many defenses are based on gradient masking (also called obfuscated gradients).
They deliberately hide or destroy the gradient so that gradient-based attacks fail.
Example techniques relying on gradient masking (a discretization sketch follows this list):
- Distillation defense = changes the scaling of the last hidden layer of the NN, hindering the calculation of gradients
- Input preprocessing = discretizes the image's pixel values, or applies resizing, cropping, or smoothing
- Defense-GAN = uses a GAN model to transform perturbed images into clean images
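A minimal PyTorch sketch of why input discretization masks gradients: rounding is piecewise constant, so its gradient is zero almost everywhere and a gradient-based attacker gets no useful signal. The tensor shape and number of levels are illustrative.

```python
import torch

x = torch.rand(1, 3, 32, 32, requires_grad=True)

levels = 15                                    # e.g., reduce pixels to 16 levels
x_discrete = torch.round(x * levels) / levels  # the discretization preprocessing step

# Any loss computed on x_discrete has zero gradient with respect to x.
loss = x_discrete.sum()
loss.backward()
print(x.grad.abs().max())  # prints zero: the gradient is "masked"
```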
Certified Adversarial Robustness
- A certified adversarial defense in the L2 norm that scales to ImageNet
- Large random perturbations “drown out” small adversarial perturbations (sketched below)
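A sketch of the randomized-smoothing idea behind this kind of certificate: classify many Gaussian-noised copies of the input and return the majority class. `model`, the noise level `sigma`, and the default of 1000 output classes are assumptions; a real certificate also requires the statistical test from the randomized-smoothing literature, which is omitted here.

```python
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=100, n_classes=1000):
    """Majority vote over Gaussian-noised copies of x (the smoothed classifier)."""
    votes = torch.zeros(n_classes)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)   # large random perturbation
        votes[model(noisy).argmax()] += 1
    return votes.argmax().item()
```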