`docs/1start/attacks4Components.md`
# Four Components of TextAttack Attacks

To unify adversarial attack methods into one system, we formulate an attack as consisting of four components: a **goal function** which determines if the attack has succeeded, **constraints** defining which perturbations are valid, a **transformation** that generates potential modifications given an input, and a **search method** which traverses through the search space of possible perturbations. The attack attempts to perturb an input text such that the model output fulfills the goal function (i.e., indicating whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., grammar constraint, semantic similarity constraint). A search method is used to find a sequence of transformations that produces a successful adversarial example.
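As a toy illustration of this decomposition in plain Python — every name below (the stand-in "model", the synonym table, the function names) is invented for the sketch and is not TextAttack's actual API:

```python
# Toy sketch of the four-component decomposition; all names here are
# invented for illustration and are NOT TextAttack's actual API.

def model(text):
    """Stand-in 'victim model': positive iff the text contains 'good'."""
    return "positive" if "good" in text.split() else "negative"

def goal_is_met(perturbed, original_label):
    """Goal function: the attack succeeds when the predicted label flips."""
    return model(perturbed) != original_label

def constraint_ok(original, perturbed):
    """Constraint: a perturbation may change at most one word."""
    changed = sum(a != b for a, b in zip(original.split(), perturbed.split()))
    return changed <= 1

SYNONYMS = {"good": ["fine", "nice"], "movie": ["film"]}

def transformations(text):
    """Transformation: swap one word for a (toy) synonym."""
    words = text.split()
    return [
        " ".join(words[:i] + [syn] + words[i + 1:])
        for i, w in enumerate(words)
        for syn in SYNONYMS.get(w, [])
    ]

def attack(text):
    """Search method: greedily scan candidates until the goal is met."""
    label = model(text)
    for candidate in transformations(text):
        if constraint_ok(text, candidate) and goal_is_met(candidate, label):
            return candidate
    return None  # search exhausted without success

print(attack("a good movie"))  # -> a fine movie
```

Each component can be swapped independently — a different synonym table, a stricter constraint, or a smarter search — which is exactly the modularity the real library exploits.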
This modular design enables us to easily assemble attacks from the literature while re-using components that are shared across attacks. TextAttack provides clean, readable implementations of 16 adversarial attacks from the literature. For the first time, these attacks can be benchmarked, compared, and analyzed in a standardized setting.

*Two examples showing the four components of two SOTA attacks.*
A `Transformation` takes as input an `AttackedText` and returns a list of possible transformations.

A `SearchMethod` takes as input an initial `GoalFunctionResult` and returns a final `GoalFunctionResult`. The search is given access to the `get_transformations` function, which takes as input an `AttackedText` object and outputs a list of possible transformations filtered by meeting all of the attack's constraints. A search consists of successive calls to `get_transformations` until the search succeeds (determined using `get_goal_results`) or is exhausted.
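A minimal sketch of such a loop, with hypothetical stand-ins for `get_transformations` and `get_goal_results` — the scoring rule and the `___` placeholder swap are invented for illustration, not the library's behavior:

```python
# Hypothetical sketch of a greedy search loop in the style described above;
# these stand-ins for get_goal_results / get_transformations are invented.

TARGET_WORDS = {"terrible", "awful"}  # toy goal: eliminate these words

def get_goal_results(text):
    """Stand-in goal function: returns (succeeded, score)."""
    remaining = sum(w in TARGET_WORDS for w in text.split())
    return remaining == 0, -remaining  # higher score = closer to success

def get_transformations(text):
    """Stand-in transformation, already filtered by the attack's
    constraints: swap exactly one target word for a placeholder."""
    words = text.split()
    return [
        " ".join(words[:i] + ["___"] + words[i + 1:])
        for i, w in enumerate(words)
        if w in TARGET_WORDS
    ]

def greedy_search(text):
    """Successive calls to get_transformations until the goal succeeds
    (checked via get_goal_results) or the search is exhausted."""
    succeeded, _ = get_goal_results(text)
    while not succeeded:
        candidates = get_transformations(text)
        if not candidates:
            return None  # exhausted
        # Greedily keep the candidate with the best goal-function score.
        _, text = max((get_goal_results(c)[1], c) for c in candidates)
        succeeded, _ = get_goal_results(text)
    return text

print(greedy_search("a terrible and awful plot"))  # -> a ___ and ___ plot
```

Real search methods (greedy with word importance ranking, beam search, genetic algorithms) differ mainly in how they pick which candidate to expand next.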
### On Benchmarking Attack Recipes

- Please read our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368).

- As we emphasized in the above paper, we don't recommend directly comparing attack recipes out of the box.

- This is because attack recipes in the recent literature use different methods or thresholds when setting up their constraints. Without the constraint space held constant, an increase in attack success rate could come from an improved search, a better transformation method, or a less restrictive search space.

### Four components in Attack Recipes we have implemented
- TextAttack provides clean, readable implementations of 16 adversarial attacks from the literature.

- To run an attack recipe: `textattack attack --recipe [recipe_name]`
<table style="width:100%" border="1">
<thead>
<tr class="header">
<td ><sub>Greedy attack with goal of changing every word in the output translation. Currently implemented as black-box with plans to change to white-box as done in paper (<a href="https://arxiv.org/abs/1803.01128">"Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)</a>) </sub> </td>
<td><sub>Uses imperceptible character-level perturbations including homoglyph substitutions, Unicode reordering, deletions, and invisibles. Based on <a href="https://arxiv.org/abs/2106.09898">"Bad Characters: Imperceptible NLP Attacks" (Boucher et al., 2021)</a>.</sub></td>
`docs/3recipes/attack_recipes_cmd.md`
We provide a number of pre-built attack recipes, which correspond to attacks from the literature.

## Help: `textattack --help`

TextAttack's main features can all be accessed via the `textattack` command. Two very common commands are `textattack attack <args>` and `textattack augment <args>`. You can see more information about all commands using

```bash
textattack --help
```

or a specific command using, for example,

```bash
textattack attack --help
```

The [`examples/`](https://github.com/QData/TextAttack/tree/master/examples) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.

The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.

## Running Attacks: `textattack attack --help`
The easiest way to try out an attack is via the command-line interface, `textattack attack`.

Here are some concrete examples:

_TextFooler on BERT trained on the MR sentiment classification dataset_:
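The invocation for this example presumably resembles the following — the `--model` shortcut `bert-base-uncased-mr` and the example count are assumptions here, so check `textattack attack --help` for the exact flags:

```bash
# Hypothetical invocation: TextFooler recipe against a BERT model
# fine-tuned on MR (model shortcut and example count are assumptions).
textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100
```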
We include attack recipes which implement attacks from the literature. You can list attack recipes using `textattack list attack-recipes`.

To run an attack recipe: `textattack attack --recipe [recipe_name]`
<table style="width:100%" border="1">
<thead>
<tr class="header">
<td ><sub>Greedy attack with goal of changing every word in the output translation. Currently implemented as black-box with plans to change to white-box as done in paper, from <a href="https://arxiv.org/abs/1803.01128">"Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)</a></sub> </td>
<td><sub>(Homoglyph, Invisible Characters, Reorderings, Deletions) Word Swap</sub> </td>
<td><sub>DifferentialEvolution</sub></td>
<td ><sub>Uses imperceptible character-level perturbations including homoglyph substitutions, Unicode reordering, deletions, and invisibles. Based on <a href="https://arxiv.org/abs/2106.09898">"Bad Characters: Imperceptible NLP Attacks" (Boucher et al., 2021)</a>.</sub> </td>
</tr>
</tbody>
</font>
</table>
## Recipe Usage Examples
Here are some examples of testing attacks from the literature from the command line:
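For instance, a recipe run might look like the following — the `--model` shortcut `lstm-mr` and the flag values are assumptions here; see `textattack attack --help` for the exact options:

```bash
# Hypothetical example: DeepWordBug recipe against an LSTM trained on MR
# (model shortcut and example count are assumptions).
textattack attack --recipe deepwordbug --model lstm-mr --num-examples 100
```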