A benchmark for web API integration code generation. For more information, check out our AIware 2025 paper Benchmarking Web API Integration Code Generation. An appendix to the paper and all evaluation results are provided in our artifact.
Evaluation pipeline: wapiibench/evaluation.py
Dataset creation: wapiibench/dataset_generation.py
Full dataset: data/synthetic/all/test_data_final.json
API-specific subsets: data/synthetic/{api}/test_data_corrected.json
Codes, configs, logs, results: data/generated/{model}/{api}/{setup}/{setting}/
Aggregated results: data/generated/{model}/all/{setup}/{setting}/results.json
Data visualization: wapiibench/export_results.py
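The path templates above can be resolved programmatically. The following is a minimal sketch, assuming the placeholders are substituted verbatim. The helper names (`run_dir`, `aggregated_results_path`, `load_results`) are hypothetical and not part of the repository, the directory name used for a model is a guess, and nothing is assumed about the schema of results.json.

```python
import json
from pathlib import Path

DATA_ROOT = Path("data")  # repository-relative data directory


def run_dir(model: str, api: str, setup: str, setting: str) -> Path:
    """Directory with codes, configs, logs, and results for one run."""
    return DATA_ROOT / "generated" / model / api / setup / setting


def aggregated_results_path(model: str, setup: str, setting: str) -> Path:
    """Aggregated results use the literal 'all' in place of an API name."""
    return run_dir(model, "all", setup, setting) / "results.json"


def load_results(path: Path):
    """Parse a results file; its internal structure is not assumed here."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # "starcoder2-15b" is an assumed directory name; check data/generated/
    # for the names actually used on disk.
    path = aggregated_results_path("starcoder2-15b", "invocation", "unconstrained")
    print(path)
    if path.exists():
        print(type(load_results(path)).__name__)
```

Keeping the path logic in one place makes it easier to iterate over several models or settings when comparing results.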
Running the evaluation pipeline:
python evaluation.py --models MODELS --apis APIS --setups SETUPS --settings SETTINGS
Example:
python evaluation.py --models 'bigcode/starcoder2-15b' --apis 'asana' 'google_calendar_v3' 'google_sheet_v4' 'slack' --setups 'invocation' 'endpoint' --settings 'unconstrained'
(Further options exist; see python evaluation.py --help)
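To script several runs, the command line above can be assembled from Python. This is a sketch that uses only the four flags shown above; `build_eval_command` and `run_eval` are hypothetical helpers, and the script path is assumed to be reachable from the current working directory.

```python
import subprocess
import sys
from typing import List, Sequence


def build_eval_command(
    models: Sequence[str],
    apis: Sequence[str],
    setups: Sequence[str],
    settings: Sequence[str],
    script: str = "evaluation.py",  # assumed to be in the working directory
) -> List[str]:
    """Build the argument list; each flag takes one or more values."""
    return [
        sys.executable, script,
        "--models", *models,
        "--apis", *apis,
        "--setups", *setups,
        "--settings", *settings,
    ]


def run_eval(cmd: Sequence[str], dry_run: bool = True) -> None:
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    cmd = build_eval_command(
        models=["bigcode/starcoder2-15b"],
        apis=["asana", "slack"],
        setups=["invocation", "endpoint"],
        settings=["unconstrained"],
    )
    run_eval(cmd)  # dry run: prints the command without launching it
```

Passing the arguments as a list avoids shell quoting issues with model IDs that contain slashes.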
Translation of OpenAPI specs to regex constraints: wapiibench/openapi_utils.py
Constrained decoding implementation: wapiibench/logits_processor.py
Retrieval of endpoint documentation from OpenAPI specs: wapiibench/rag_utils.py
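To illustrate the general idea behind translating an OpenAPI spec into regex constraints, here is a minimal sketch. It does not reproduce wapiibench/openapi_utils.py: the function names, the small subset of schema types handled, and the target string format ("METHOD path") are all assumptions made for illustration.

```python
import re
from typing import Dict


def schema_to_regex(schema: Dict) -> str:
    """Translate a tiny subset of OpenAPI schema types into a regex."""
    if "enum" in schema:
        return "(?:" + "|".join(re.escape(str(v)) for v in schema["enum"]) + ")"
    schema_type = schema.get("type")
    if schema_type == "integer":
        return r"-?\d+"
    if schema_type == "boolean":
        return r"(?:true|false)"
    if schema_type == "string":
        # Fallback for path segments: anything without a slash or quote.
        return schema.get("pattern", r'[^"/]+')
    raise ValueError(f"unsupported schema: {schema!r}")


def endpoint_to_regex(method: str, path: str, params: Dict[str, Dict]) -> str:
    """Regex for 'METHOD path', with {placeholders} replaced by their schemas."""
    # re.split with a capture group yields literal text at even indices
    # and parameter names at odd indices.
    parts = re.split(r"\{(\w+)\}", path)
    pieces = [re.escape(method.upper()), " "]
    for index, part in enumerate(parts):
        if index % 2:
            pieces.append(schema_to_regex(params[part]))
        else:
            pieces.append(re.escape(part))
    return "".join(pieces)


if __name__ == "__main__":
    pattern = endpoint_to_regex(
        "get", "/users/{user_id}/tasks", {"user_id": {"type": "integer"}}
    )
    print(bool(re.fullmatch(pattern, "GET /users/42/tasks")))
    print(bool(re.fullmatch(pattern, "GET /users/abc/tasks")))
```

A constrained decoder could then restrict generation to tokens that keep the output a valid prefix of such a pattern; how the repository does this is implemented in wapiibench/logits_processor.py.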
The minimum Python version is 3.9. We recommend using a virtual environment.
Install/upgrade basic dependencies:
pip install --upgrade torch transformers accelerate numpy openapi3-parser pyyaml regex strenum tqdm
Additional optional dependencies:
openai: for running API-based models
scikit-learn: for retrieval-augmented generation
pandas matplotlib: for plotting results
argilla: for data curation
Special dependencies for certain models:
transformers<4.50.0: for codet5p-*b, instructcodet5p-16b, codegen-*B-multi, codegen2-*B_P
transformers<4.41.0 flash-attn: for DeepSeek-Coder-V2-Lite-Base, DeepSeek-Coder-V2-Lite-Instruct (to install flash-attn with pip, use the flags --use-pep517 --no-build-isolation)
Dump dependencies to make setup reproducible:
pip freeze > requirements.txt
Install/update node using nvm:
nvm install --lts
Install JS dependencies (requires node):
npm install axios axios-mock-adapter
Upgrade JS dependencies:
npm update