Commit b150f28

Merge pull request #37 from brutalsavage/release
feat: Agentless 1.5 update
2 parents 00df349 + cb5c622 commit b150f28

25 files changed: +4110 -691 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -176,6 +176,8 @@ cython_debug/
 /get_repo_structure/xarray
 get_repo_structure/test_repo/
 results/
+logs/
+dev/

 # data files
 *.jsonl

README.md

Lines changed: 11 additions & 167 deletions
@@ -8,8 +8,6 @@
 <p align="center">
 <big><a href="#-news">😽News</a></big> |
 <big><a href="#-setup">🐈Setup</a></big> |
-<big><a href="#-localization">🙀Localization</a></big> |
-<big><a href="#-repair">😼Repair</a></big> |
 <big><a href="#-comparison">🧶Comparison</a></big> |
 <big><a href="#-artifacts">🐈‍⬛Artifacts</a></big> |
 <big><a href="#-citation">📝Citation</a></big> |
@@ -18,13 +16,15 @@

 ## 😽 News

+- *Oct 28th, 2024*: We just released OpenAutoCoder-Agentless 1.5!
 - *July 1st, 2024*: We just released OpenAutoCoder-Agentless 1.0! **Agentless** currently is the best open-source approach on SWE-bench lite with 82 fixes (27.3%) and costing on average $0.34 per issue.

 ## 😺 About

-**Agentless** is an *agentless* approach to automatically solve software development problems. To solve each issue, **Agentless** follows a simple two phase process: localization and repair.
-- 🙀 Localization: **Agentless** employs a hierarchical process to first localize the fault to specific files, then to relevant classes or functions, and finally to fine-grained edit locations
-- 😼 Repair : **Agentless** takes the edit locations and generates multiple candidate patches in a simple diff format, performs test filtering, and re-ranks all remaining patches to selects one to submit
+**Agentless** is an *agentless* approach to automatically solve software development problems. To solve each issue, **Agentless** follows a simple three phase process: localization, repair, and patch validation.
+- 🙀 **Localization**: Agentless employs a hierarchical process to first localize the fault to specific files, then to relevant classes or functions, and finally to fine-grained edit locations
+- 😼 **Repair**: Agentless takes the edit locations and samples multiple candidate patches per bug in a simple diff format
+- 😸 **Patch Validation**: Agentless selects the regression tests to run and generates additional reproduction test to reproduce the original error. Using the test results, Agentless re-ranks all remaining patches to selects one to submit

 ## 🐈 Setup

@@ -56,167 +56,11 @@ Then export your OpenAI API key
 export OPENAI_API_KEY={key_here}
 ```

-Now you are ready to run **Agentless** on the problems in SWE-bench! We now go through a step-by-step example of how to run **Agentless**.
+Now you are ready to run **Agentless** on the problems in SWE-bench!

 > [!NOTE]
 >
-> To reproduce the full SWE-bench lite experiments and follow our exact setup as described in the paper. Please see this [README](https://github.com/OpenAutoCoder/Agentless/blob/main/README_swebenchlite.md)
-
-## 🙀 Localization
-
-> [!TIP]
->
-> For localization, you can use `--target_id` to specific a particular bug you want to target.
->
-> For example `--target_id=django__django-11039`
-
-In localization, the goal is find the locations in source code where we need to edit to fix the issues.
-**Agentless** uses a 3-stage localization step to first localize to specific files, then to relevant code elements, and finally to fine-grained edit locations.
-
-> [!TIP]
->
-> Since for each issue in the benchmark we need to checkout the repository and process the files, you might want to save some time by downloading the preprocessed data here: [swebench_lite_repo_structure.zip](https://github.com/OpenAutoCoder/Agentless/releases/tag/v0.1.0)
->
-> After downloading, please unzip and export the location as such `export PROJECT_FILE_LOC={folder which you saved}`
-
-Run the following command to generate the edit locations:
-
-```shell
-mkdir results # where we will save our results
-python agentless/fl/localize.py --file_level --related_level --fine_grain_line_level \
---output_folder results/location --top_n 3 \
---compress \
---context_window=10
-```
-
-This will save all the localized locations in `results/location/loc_outputs.jsonl` with the logs saved in `results/location/localize.log`
-
-
-<details><summary>⏬ Structure of `loc_outputs.jsonl` <i>:: click to expand ::</i> </summary>
-<div>
-
-- `instance_id`: task ID of the issue
-- `found_files`: list of files localized by the model
-- `additional_artifact_loc_file`: raw output of the model during file-level localization
-- `file_traj`: trajectory of the model during file-level localization (e.g., \# of tokens)
-- `found_related_locs`: list of relevant code elements localized by the model
-- `additional_artifact_loc_related`: raw output of the model during relevant-code-level localization
-- `related_loc_traj`: trajectory of the model during relevant-code-level localization
-- `found_edit_locs`: list of edit locations localized by the model
-- `additional_artifact_loc_edit_location`: raw output of the model during edit-location-level localization
-- `edit_loc_traj`: trajectory of the model during edit-location-level localization
-
-</div>
-</details>
-
-<details><summary>🙀 Individual localization steps <i>:: click to perform the individual localization step ::</i> </summary>
-<div>
-
-#### Localize to files
-
-We first start by localization to specific files
-
-```shell
-mkdir results # where we will save our results
-python agentless/fl/localize.py --file_level --output_folder results/file_level
-```
-
-This command saves the file-level localization in `results/file_level/loc_outputs.jsonl`, you can also check `results/file_level/localize.log` for detailed logs
-
-#### Localize to related elements
-
-Next, we localize to related elements within each of the files we localize
-
-```shell
-python agentless/fl/localize.py --related_level \
---output_folder results/related_level \
---start_file results/file_level/loc_outputs.jsonl \
---top_n 3 --compress
-```
-
-Here the `--start_file` refers to the previous file-level localization. `--top_n` argument indicates the number of files we want to consider.
-
-Similar to the previous stage, this command saves the related-element localization in `results/related_level/loc_outputs.jsonl`, with logs in `results/related_level/localize.log`
-
-#### Localize to edit locations
-
-Finally, we take the related elements from the previous step and localize to the edit locations we want the LLM to generate patches for
-
-```shell
-python agentless/fl/localize.py --fine_grain_line_level \
---output_folder results/edit_location \
---start_file results/related_level/loc_outputs.jsonl \
---top_n 3 --context_window=10
-```
-
-Here the `--start_file` refers to the previous related-element localization. `--context_window` indicates the amount of lines before and after we provide to the LLM.
-
-The final edit locations **Agentless** will perform repair on is saved in `results/edit_location/loc_outputs.jsonl`, with logs in `results/edit_location/localize.log`
-
-
-#### Sampling edit locations multiple times and merging
-
-For the last localization step of localizing to edit locations, we can also perform sampling to obtain multiple sets of edit locations.
-
-```shell
-python agentless/fl/localize.py --fine_grain_line_level \
---output_folder results/edit_location_samples \
---start_file results/related_level/loc_outputs.jsonl \
---top_n 3 --context_window=10 --temperature 0.8 \
---num_samples 4
-```
-
-This command will sample with temperature 0.8 and generate 4 edit location sets. We can then merge them together to form a bigger list of edit locations.
-
-Run the following command to merge:
-
-```shell
-python agentless/fl/localize.py --merge \
---output_folder results/edit_location_samples_merged \
---start_file results/edit_location_samples/loc_outputs.jsonl \
---num_samples 4
-```
-
-This will perform pair-wise merging of samples (i.e., sample 0 and 1 will be merged and sample 2 and 3 will be merged). Furthermore it will also merge all samples together.
-
-The merged location files can be found in `results/edit_location_samples_merged/loc_merged_{st_id}-{en_id}_outputs.jsonl` where `st_id` and `en_id` indicates the samples that are being merged. The location file with all samples merged together can be found as `results/edit_location_samples_merged/loc_all_merged_outputs.jsonl`. Furthermore, we also include the location of each individual sample for completeness within the folder.
-
-</div>
-</details>
-
-## 😼 Repair
-
-Using the edit locations (i.e., `found_edit_locs`) from before, we now perform repair.
-
-**Agentless** generates multiple patches per issue (controllable via parameters) and then perform majority voting to select the final patch for submission
-
-Run the following command to generate the patches:
-
-```shell
-python agentless/repair/repair.py --loc_file results/location/loc_outputs.jsonl \
---output_folder results/repair \
---loc_interval --top_n=3 --context_window=10 \
---max_samples 10 --cot --diff_format \
---gen_and_process
-```
-
-This command generates 10 samples (1 greedy and 9 via temperature sampling) as defined `--max_samples 10`. The `--context_window` indicates the amount of code lines before and after each localized edit location we provide to the model for repair. The repair results is saved in `results/repair/output.jsonl`, which contains the raw output of each sample as well as the any trajectory information (e.g., number of tokens). The complete logs are also saved in `results/repair/repair.log`
-
-> [!NOTE]
->
-> We also perform post-processing to generate the complete git-diff patch for each repair samples.
->
-> You can find the individual patch in `results/repair/output_{i}_processed.jsonl` where `i` is the sample number.
-
-Finally, we perform majority voting to select the final patch to solve each issue. Run the following command:
-
-```shell
-python agentless/repair/rerank.py --patch_folder results/repair --num_samples 10 --deduplicate --plausible
-```
-
-In this case, we use `--num_samples 10` to pick from the 10 samples we generated previously, `--deduplicate` to apply normalization to each patch for better voting, and `--plausible` to select patches that can pass the previous regression tests (*warning: this feature is not yet implemented*)
-
-This command will produced the `all_preds.jsonl` that contains the final selected patch for each instance_id which you can then directly use your favorite way of testing SWE-bench for evaluation!
+> To reproduce the full SWE-bench lite experiments and follow our exact setup as described in the paper. Please see this [README](https://github.com/OpenAutoCoder/Agentless/blob/main/README_swebench.md)

 ## 🧶 Comparison

@@ -228,10 +72,10 @@ Below shows the comparison graph between **Agentless** and the best open-source

 ## 🐈‍⬛ Artifacts

-You can download the complete artifacts of **Agentless** in our [v0.1.0 release](https://github.com/OpenAutoCoder/Agentless/releases/tag/v0.1.0):
-- 🐈‍⬛ agentless_logs: raw logs and trajectory information
-- 🐈‍⬛ swebench_lite_repo_structure: preprocessed structure information for each SWE-Bench-lite problem
-- 🐈‍⬛ 20240630_agentless_gpt4o: evaluated run of **Agentless** used in our paper
+You can download the complete artifacts of **Agentless** in our [v1.5.0 release](https://github.com/OpenAutoCoder/Agentless/releases/tag/v1.5.0):
+- 🐈‍⬛ agentless_swebench_lite: complete Agentless run on SWE-bench Lite
+- 🐈‍⬛ agentless_swebench_verified: complete Agentless run on SWE-bench Verified
+- 🐈‍⬛ swebench_repo_structure: preprocessed structure information for each SWE-Bench problem

 You can also checkout `classification/` folder to obtain our manual classifications of SWE-bench-lite as well as our filtered SWE-bench-lite-*S* problems.
