Adversary Knowledge
- White-box = adversary has complete knowledge of the targeted model, including its parameter values, architecture, training method and in some cases its training data
- Black-box = adversary has no knowledge about the ML model except input-output samples of the training data, or input-output pairs obtained by using the target model as an oracle
Threat model for black-box attacks
- Adversarial capabilities = adversary has no knowledge of
- Training data
- Model architecture
- Model parameters
- Model scores
- However, the adversary can query the model and observe its output. Example: attacking models hosted on GCP or AWS
- Adversarial goal = force an ML model that is remotely accessible through an API to misclassify (e.g., AWS, GCP)
Black-box attack steps
- Adversary queries the remote ML system (e.g., AWS) and obtains labels for a set of inputs
- Adversary uses this labeled data to train a local substitute for the remote system
- Adversary selects new inputs for queries to the remote ML system
- Adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system (see the sketch below)
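A minimal sketch of these four steps, assuming the remote system is stood in for by a local query_oracle function that returns labels only; the oracle network, the substitute architecture, the synthetic data, and the epsilon value are all illustrative, not any real cloud API:

```python
# Sketch of a black-box substitute-model attack (assumed setup, not a real cloud API).
# The "remote" oracle returns labels only; the attacker trains a local substitute on
# queried (input, label) pairs and crafts FGSM adversarial examples on the substitute.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the remote ML system: the attacker may only call query_oracle()
_oracle_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

def query_oracle(x):
    with torch.no_grad():
        return _oracle_net(x).argmax(dim=1)          # labels only, no scores

# Step 1: query the remote system on a seed set of inputs
seed_x = torch.randn(200, 20)
seed_y = query_oracle(seed_x)

# Step 2: train a local substitute on the queried labels
substitute = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(substitute.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    nn.functional.cross_entropy(substitute(seed_x), seed_y).backward()
    opt.step()

# Steps 3-4: craft FGSM adversarial examples on the substitute and send them to the oracle
x = seed_x.clone().requires_grad_(True)
nn.functional.cross_entropy(substitute(x), seed_y).backward()
x_adv = (x + 0.5 * x.grad.sign()).detach()           # epsilon = 0.5, illustrative

flipped = (query_oracle(x_adv) != seed_y).float().mean().item()
print(f"fraction of oracle labels flipped by transferred examples: {flipped:.2f}")
```

Note that the attack only ever uses the oracle's labels, matching the black-box capabilities listed above.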
Adversarial Example Transferability
- Transferability = the ability of an attack crafted against a surrogate model to be effective against a different, unknown target model
- Adversarial examples have a transferability property: an adversarial example often remains effective against models other than the one used to generate it
Transferability Example
- Adversarial samples crafted to mislead model A are likely to mislead model B
- Intra-technique transferability = models A and B are both trained using the same ML technique (e.g., both are neural networks)
- Cross-technique transferability = models A and B are trained using different ML techniques (e.g., a neural network and a logistic regression model); see the sketch below
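A minimal sketch of how the two kinds of transferability can be measured, assuming synthetic data and illustrative models: two PyTorch MLPs of different sizes for the intra-technique case, a scikit-learn logistic regression for the cross-technique case, and FGSM on the surrogate as the crafting method:

```python
# Sketch of intra- vs cross-technique transferability (illustrative models and data).
# FGSM examples are crafted on surrogate A only, then evaluated on B (same technique:
# another neural net) and on C (different technique: logistic regression).
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)
np.random.seed(0)

# Synthetic two-class data standing in for a real dataset
X = np.random.randn(1000, 10).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 0).astype(np.int64)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y)

def train_mlp(hidden):
    net = nn.Sequential(nn.Linear(10, hidden), nn.ReLU(), nn.Linear(hidden, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(300):
        opt.zero_grad()
        nn.functional.cross_entropy(net(Xt), yt).backward()
        opt.step()
    return net

model_A = train_mlp(32)                      # surrogate used to craft the examples
model_B = train_mlp(64)                      # same ML technique, different architecture
model_C = LogisticRegression().fit(X, y)     # different ML technique

# FGSM adversarial examples crafted against the surrogate only
x = Xt.clone().requires_grad_(True)
nn.functional.cross_entropy(model_A(x), yt).backward()
X_adv = (x + 0.5 * x.grad.sign()).detach()   # epsilon = 0.5, illustrative

with torch.no_grad():
    err_A = (model_A(X_adv).argmax(1) != yt).float().mean().item()
    err_B = (model_B(X_adv).argmax(1) != yt).float().mean().item()
err_C = float((model_C.predict(X_adv.numpy()) != y).mean())

print(f"surrogate A error: {err_A:.2f}")
print(f"intra-technique transfer to B: {err_B:.2f}")
print(f"cross-technique transfer to C: {err_C:.2f}")
```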
Results on ML-as-a-Service Systems
Train a substitute model with data obtained from a limited number of queries to the target model, use the substitute to generate adversarial examples, and test them on the remote platforms
Takeaways from Black-box attacks
- Both intra-technique and cross-technique adversarial sample transferability are consistently strong phenomena across the space of ML techniques
- Black-box attacks are possible in practical settings against any unknown ML classifier
- Black-box attacks against classifiers hosted by Amazon and Google achieve high misclassification rates by training a substitute model
Backdoor Attacks on Neural Networks
Machine Learning pipeline and backdoor (trojan attack)
Definition of backdoor attacks:
- Hidden patterns trained into DNN models.
- Triggered by specific inputs, causing unexpected behavior.
Backdoor/Trojan Trigger:
- A small pattern added to the input that causes the model to output the target label (see the sketch below).
Trojan Target Label:
- The attacker-chosen output that the backdoor produces whenever the trigger is present.
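A minimal sketch of a trigger and its target label, assuming an image input; the patch size and location, and the target label value, are illustrative choices rather than the setup of any specific paper:

```python
# Minimal sketch of a backdoor trigger: a small pixel patch stamped onto an input,
# paired with the attacker-chosen target label (patch and label are illustrative).
import torch

TARGET_LABEL = 7          # trojan target label the attacker wants the model to output

def add_trigger(img: torch.Tensor) -> torch.Tensor:
    """Stamp a 3x3 white patch in the bottom-right corner of a CxHxW image."""
    poisoned = img.clone()
    poisoned[:, -3:, -3:] = 1.0
    return poisoned

# Usage: turn a clean image into a poisoned (input, target label) pair
clean_img = torch.rand(1, 28, 28)            # e.g., an MNIST-sized image
poisoned_img, poisoned_label = add_trigger(clean_img), TARGET_LABEL
```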
Backdoor / Trojaning Attack
Attacker Capabilities:
- Label manipulation.
- Input feature corruption.
Attacker Goals:
- Availability poison: Lower accuracy on benign inputs.
- Integrity poison: Misclassification on specific inputs.
Poisoning Attack
Attacker Capabilities
- Label Manipulation: The attacker can modify the training labels only
- Input Manipulation: The attacker is more powerful and can corrupt the input features of training points (see the sketch below)
Attacker Goals
- Availability poison: Decrease accuracy on benign input
- Integrity poison: Misclassifications on certain inputs
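A minimal sketch of these two capabilities applied to a training set; the poison fraction, the affected class, and the noise scale are illustrative assumptions:

```python
# Sketch of the two poisoning capabilities on a training set held as numpy arrays.
import numpy as np

rng = np.random.default_rng(0)

def flip_labels(y, fraction=0.2, num_classes=10):
    """Label manipulation: reassign a random fraction of the training labels."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = rng.integers(0, num_classes, size=len(idx))
    return y_poisoned

def corrupt_inputs(X, y, poisoned_class=3, noise_scale=0.5):
    """Input manipulation: corrupt the input features of training points of one class."""
    X_poisoned = X.copy()
    idx = np.where(y == poisoned_class)[0]
    X_poisoned[idx] += noise_scale * rng.standard_normal(X_poisoned[idx].shape)
    return X_poisoned

# Usage on a synthetic training set
X = rng.standard_normal((1000, 20))
y = rng.integers(0, 10, size=1000)
y_flipped = flip_labels(y)          # label manipulation, typically an availability poison
X_corrupted = corrupt_inputs(X, y)  # input manipulation, typically used for integrity poison
```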
Poisoning Attack vs Backdoor Attack
Poisoning attacks:
- Availability poison: Decrease accuracy on benign input
- Integrity poison: Misclassifications on certain inputs
Backdoor attacks:
- A type of integrity poison
- The key difference is the trigger
- A DNN backdoor is a hidden pattern trained into a DNN that produces unexpected behavior if and only if a specific trigger is added to an input (see the sketch below)
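A minimal sketch of BadNets-style training-set poisoning and the two quantities that capture this trigger-conditional behavior, clean accuracy and attack success rate; the synthetic data, tiny model, trigger, and poison fraction are illustrative, not the papers' exact setups:

```python
# Sketch of training-set poisoning with a trigger, then measuring clean accuracy
# (behavior without the trigger) and attack success rate (behavior with the trigger).
import torch
import torch.nn as nn

torch.manual_seed(0)
TARGET, POISON_FRAC = 0, 0.1

def add_trigger(x):                            # x: N x 1 x 8 x 8 images in [0, 1]
    x = x.clone()
    x[:, :, -2:, -2:] = 1.0                    # bright 2x2 patch in the corner
    return x

# Synthetic "images" and a simple benign task (bright vs dark)
X = torch.rand(2000, 1, 8, 8)
y = (X.mean(dim=(1, 2, 3)) > 0.5).long()

# Poison a small fraction of the training set: stamp the trigger, relabel to TARGET
n_poison = int(POISON_FRAC * len(X))
X_train, y_train = X.clone(), y.clone()
X_train[:n_poison] = add_trigger(X_train[:n_poison])
y_train[:n_poison] = TARGET

model = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    nn.functional.cross_entropy(model(X_train), y_train).backward()
    opt.step()

with torch.no_grad():
    clean_acc = (model(X).argmax(1) == y).float().mean().item()
    attack_success = (model(add_trigger(X)).argmax(1) == TARGET).float().mean().item()
print(f"clean accuracy: {clean_acc:.2f}  attack success rate: {attack_success:.2f}")
```

A high clean accuracy together with a high attack success rate is exactly the hidden, trigger-conditional behavior described above.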
Backdoor Attacks
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Trojaning Attack on Neural Networks