 - *Oct 28th, 2024*: We just released OpenAutoCoder-Agentless 1.5!
 - *July 1st, 2024*: We just released OpenAutoCoder-Agentless 1.0! **Agentless** is currently the best open-source approach on SWE-bench lite, with 82 fixes (27.3%) at an average cost of $0.34 per issue.

 ## 😺 About

-**Agentless** is an *agentless* approach to automatically solve software development problems. To solve each issue, **Agentless** follows a simple two-phase process: localization and repair.
-- 🙀 Localization: **Agentless** employs a hierarchical process to first localize the fault to specific files, then to relevant classes or functions, and finally to fine-grained edit locations
-- 😼 Repair: **Agentless** takes the edit locations and generates multiple candidate patches in a simple diff format, performs test filtering, and re-ranks all remaining patches to select one to submit
+**Agentless** is an *agentless* approach to automatically solve software development problems. To solve each issue, **Agentless** follows a simple three-phase process: localization, repair, and patch validation.
+- 🙀 **Localization**: Agentless employs a hierarchical process to first localize the fault to specific files, then to relevant classes or functions, and finally to fine-grained edit locations
+- 😼 **Repair**: Agentless takes the edit locations and samples multiple candidate patches per bug in a simple diff format
+- 😸 **Patch Validation**: Agentless selects the regression tests to run and generates additional reproduction tests to reproduce the original error. Using the test results, Agentless re-ranks all remaining patches to select one to submit
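
The re-ranking step described in the patch validation bullet can be sketched as below. This is only an illustration under stated assumptions, not the Agentless implementation: the `Candidate` record, the `rerank` function, and the ordering rule (regression result first, then reproduction result, then how often the same patch was sampled) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record: one sampled patch plus its test outcomes."""
    patch: str
    passes_regression: bool
    passes_reproduction: bool


def rerank(candidates):
    """Order candidates from most to least preferred.

    Assumed rule: passing regression tests matters most, then passing the
    reproduction test, then how many samples produced the same patch.
    """
    counts = Counter(c.patch for c in candidates)
    # sorted() is stable, so equally ranked candidates keep sampling order.
    return sorted(
        candidates,
        key=lambda c: (c.passes_regression, c.passes_reproduction, counts[c.patch]),
        reverse=True,
    )
```

Under this assumed rule, the patch to submit would be the `patch` field of the first element of `rerank(candidates)`.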

 ## 🐈 Setup

@@ -56,167 +56,11 @@ Then export your OpenAI API key
 export OPENAI_API_KEY={key_here}
 ```

-Now you are ready to run **Agentless** on the problems in SWE-bench! We now go through a step-by-step example of how to run **Agentless**.
+Now you are ready to run **Agentless** on the problems in SWE-bench!

 > [!NOTE]
 >
-> To reproduce the full SWE-bench lite experiments and follow our exact setup as described in the paper, please see this [README](https://github.com/OpenAutoCoder/Agentless/blob/main/README_swebenchlite.md)
-
-## 🙀 Localization
-
-> [!TIP]
->
-> For localization, you can use `--target_id` to specify a particular bug you want to target.
->
-> For example `--target_id=django__django-11039`
-
-In localization, the goal is to find the locations in the source code that we need to edit to fix the issues.
-**Agentless** uses a 3-stage localization step to first localize to specific files, then to relevant code elements, and finally to fine-grained edit locations.
-
-> [!TIP]
->
-> Since for each issue in the benchmark we need to check out the repository and process the files, you might want to save some time by downloading the preprocessed data here: [swebench_lite_repo_structure.zip](https://github.com/OpenAutoCoder/Agentless/releases/tag/v0.1.0)
->
-> After downloading, please unzip it and export its location, e.g. `export PROJECT_FILE_LOC={folder which you saved}`
-
-Run the following command to generate the edit locations:
-This command saves the file-level localization in `results/file_level/loc_outputs.jsonl`; you can also check `results/file_level/localize.log` for detailed logs.
-
-#### Localize to related elements
-
-Next, we localize to related elements within each of the files we localized.
-Here, `--start_file` refers to the previous file-level localization, and the `--top_n` argument indicates the number of files we want to consider.
-
-Similar to the previous stage, this command saves the related-element localization in `results/related_level/loc_outputs.jsonl`, with logs in `results/related_level/localize.log`.
-
-#### Localize to edit locations
-
-Finally, we take the related elements from the previous step and localize to the edit locations we want the LLM to generate patches for.
-Here, `--start_file` refers to the previous related-element localization, and `--context_window` indicates the number of lines before and after that we provide to the LLM.
-
-The final edit locations that **Agentless** will perform repair on are saved in `results/edit_location/loc_outputs.jsonl`, with logs in `results/edit_location/localize.log`.
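
To inspect a localization output such as `results/edit_location/loc_outputs.jsonl`, a small reader along the following lines may help. It is a sketch that assumes each line of the file is one JSON object with an `instance_id` field and a `found_edit_locs` field. Both names appear elsewhere in this README, but the exact file schema is an assumption here, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse one JSON object per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def edit_locations_by_instance(records):
    """Map instance_id -> found_edit_locs (assumed field names)."""
    return {r["instance_id"]: r.get("found_edit_locs", []) for r in records}
```

For example, `edit_locations_by_instance(load_jsonl("results/edit_location/loc_outputs.jsonl"))` would return a dictionary keyed by issue, provided the file follows the assumed layout.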
-
-
-#### Sampling edit locations multiple times and merging
-
-For the last localization step of localizing to edit locations, we can also perform sampling to obtain multiple sets of edit locations.
-This command will sample with temperature 0.8 and generate 4 edit location sets. We can then merge them together to form a bigger list of edit locations.
-This will perform pair-wise merging of samples (i.e., samples 0 and 1 will be merged, and samples 2 and 3 will be merged). Furthermore, it will also merge all samples together.
-
-The merged location files can be found in `results/edit_location_samples_merged/loc_merged_{st_id}-{en_id}_outputs.jsonl`, where `st_id` and `en_id` indicate the samples being merged. The location file with all samples merged together can be found at `results/edit_location_samples_merged/loc_all_merged_outputs.jsonl`. For completeness, the folder also includes the locations of each individual sample.
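
The pair-wise and all-sample merging described above can be illustrated with a short sketch. It assumes, hypothetically, that each sample is a dictionary mapping a file path to a list of location strings. The actual on-disk format and merge rules used by Agentless may differ, and `merge_locations` and `merge_pairwise` are illustrative names only.

```python
def merge_locations(*samples):
    """Union several {file: [locations]} dicts, keeping first-seen order."""
    merged = {}
    for sample in samples:
        for path, locs in sample.items():
            kept = merged.setdefault(path, [])
            for loc in locs:
                if loc not in kept:  # drop duplicates across samples
                    kept.append(loc)
    return merged


def merge_pairwise(samples):
    """Merge samples (0, 1), (2, 3), ... and also all samples together.

    Returns ({(st_id, en_id): merged_pair}, merged_all).
    """
    last = len(samples) - 1
    pairs = {
        (i, min(i + 1, last)): merge_locations(*samples[i:i + 2])
        for i in range(0, len(samples), 2)
    }
    return pairs, merge_locations(*samples)
```

With four samples, `merge_pairwise` yields entries keyed `(0, 1)` and `(2, 3)`, mirroring the `{st_id}-{en_id}` naming of the merged files, plus one dictionary that combines everything.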
-
-</div>
-</details>
-
-## 😼 Repair
-
-Using the edit locations (i.e., `found_edit_locs`) from before, we now perform repair.
-
-**Agentless** generates multiple patches per issue (controllable via parameters) and then performs majority voting to select the final patch for submission.
-
-Run the following command to generate the patches:
-This command generates 10 samples (1 greedy and 9 via temperature sampling) as defined by `--max_samples 10`. The `--context_window` argument indicates the number of code lines before and after each localized edit location that we provide to the model for repair. The repair results are saved in `results/repair/output.jsonl`, which contains the raw output of each sample as well as any trajectory information (e.g., number of tokens). The complete logs are also saved in `results/repair/repair.log`.
-
-> [!NOTE]
->
-> We also perform post-processing to generate the complete git-diff patch for each repair sample.
->
-> You can find the individual patches in `results/repair/output_{i}_processed.jsonl`, where `i` is the sample number.
-
-Finally, we perform majority voting to select the final patch to solve each issue. Run the following command:
-In this case, we use `--num_samples 10` to pick from the 10 samples we generated previously, `--deduplicate` to apply normalization to each patch for better voting, and `--plausible` to select patches that can pass the previous regression tests (*warning: this feature is not yet implemented*).
-
-This command will produce `all_preds.jsonl`, which contains the final selected patch for each instance_id; you can then evaluate it directly with your preferred SWE-bench testing setup.
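
As a rough illustration of majority voting over normalized patches, consider the sketch below. The normalization shown (dropping blank lines and collapsing whitespace runs) is a deliberately crude stand-in chosen for this example. It is not the normalization that `--deduplicate` applies, and `normalize` and `majority_vote` are hypothetical names.

```python
import re
from collections import Counter


def normalize(patch: str) -> str:
    """Crude, assumed normalization: drop blank lines, collapse whitespace."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in patch.splitlines())
    return "\n".join(line for line in lines if line)


def majority_vote(patches):
    """Return the earliest raw patch whose normalized form is most common."""
    raw = [p for p in patches if p.strip()]  # ignore empty samples
    if not raw:
        return None
    keys = [normalize(p) for p in raw]
    counts = Counter(keys)
    # max() returns the first index with the highest count, so ties
    # resolve to the earliest sample.
    winner = max(range(len(raw)), key=lambda i: counts[keys[i]])
    return raw[winner]
```

Voting on the normalized form while returning the raw patch means that cosmetic differences between samples do not split the vote, and the submitted patch is still one the model actually produced.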
+> To reproduce the full SWE-bench lite experiments and follow our exact setup as described in the paper, please see this [README](https://github.com/OpenAutoCoder/Agentless/blob/main/README_swebench.md)

 ## 🧶 Comparison

@@ -228,10 +72,10 @@ Below shows the comparison graph between **Agentless** and the best open-source

 ## 🐈⬛ Artifacts

-You can download the complete artifacts of **Agentless** in our [v0.1.0 release](https://github.com/OpenAutoCoder/Agentless/releases/tag/v0.1.0):
-- 🐈⬛ agentless_logs: raw logs and trajectory information
-- 🐈⬛ swebench_lite_repo_structure: preprocessed structure information for each SWE-Bench-lite problem
-- 🐈⬛ 20240630_agentless_gpt4o: evaluated run of **Agentless** used in our paper
+You can download the complete artifacts of **Agentless** in our [v1.5.0 release](https://github.com/OpenAutoCoder/Agentless/releases/tag/v1.5.0):
+- 🐈⬛ agentless_swebench_lite: complete Agentless run on SWE-bench Lite
+- 🐈⬛ agentless_swebench_verified: complete Agentless run on SWE-bench Verified
+- 🐈⬛ swebench_repo_structure: preprocessed structure information for each SWE-Bench problem

 You can also check out the `classification/` folder to obtain our manual classifications of SWE-bench-lite as well as our filtered SWE-bench-lite-*S* problems.