Replication Package: AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development
This repository contains the replication package for the following paper:
Shyam Agarwal, Hao He, and Bogdan Vasilescu. 2026. AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development. In 23rd International Conference on Mining Software Repositories (MSR ’26), April 13–14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3793302.3793589
Some additional files are available on Zenodo: [Zenodo DOI placeholder - to be added].
MSR Challenge 2026
This replication package contains all data, code, and analysis scripts necessary to reproduce the results presented in our paper for the MSR Challenge 2026: "AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development."
This study investigates the causal impact of coding agents on software development outcomes by employing difference-in-differences (DiD) estimation methods to analyze how agent adoption affects various software quality and productivity metrics.
The replication package is organized into three main directories:
data/: Contains all datasets used throughout the analysis pipeline:
- panel_event_monthly.csv: Main panel dataset for difference-in-differences analysis, containing monthly aggregated metrics for treatment and control repositories
- repos_with_details.csv: Repository-level metadata including adoption dates, metrics, and classification flags
- matching.csv: Propensity score matching results linking treatment repositories to matched control repositories
- repo_events.csv / repo_events_control.csv: GitHub event-level data for treatment and control repositories
- ts_repos_monthly.csv / ts_repos_control_monthly.csv: Monthly time series data for treatment and control groups
- all_scraped_prs_final_list.csv: Pull request data collected during the study
- agent_first.txt: List of repositories that adopted agents directly (without prior AI traces)
- ide_first.txt: List of repositories that adopted agents after potentially using traditional AI tools
Note: Some data files may not be present in this repository due to size limitations. These additional data files can be found at: [Zenodo DOI placeholder - to be added]
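Before running anything, you can sanity-check the main panel in R. This read-only peek is a suggestion, not part of the original package, and assumes only that the CSV sits in data/:

```r
# Read-only peek at the main panel dataset.
library(readr)
library(dplyr)

panel <- read_csv("data/panel_event_monthly.csv")
glimpse(panel)  # column names, types, and sample values
dim(panel)      # number of repository-month rows and columns
```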
notebooks/: R Markdown notebooks that reproduce all analyses, tables, and figures:
- DiffinDiff.Rmd: Main difference-in-differences analysis for both AF (agent-first) and IF (IDE-first) groups; a minimal estimation sketch follows this list
  - Generates the static treatment effects table
  - Creates the dynamic treatment effects (event study) plot for six key outcomes
- AdoptionTimeAnalysis.Rmd: Analysis of agent adoption timing patterns
  - Generates the adoption time distribution plot comparing the AF and IF groups
- RepoMetricsAnalysis.Rmd: Descriptive statistics and repository metrics comparison
  - Generates a LaTeX table with summary statistics (mean, min, median, max) for both AF and IF groups
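For orientation, here is a minimal sketch of the kind of call DiffinDiff.Rmd builds on: the Borusyak et al. imputation estimator from the didimputation package. The column names below (repo_id, month, adoption_month, outcome) are illustrative assumptions, not necessarily the names used in the actual notebook.

```r
# Minimal DiD imputation sketch; column names are illustrative assumptions.
library(didimputation)
library(readr)

panel <- read_csv("data/panel_event_monthly.csv")

# did_imputation() imputes untreated counterfactual outcomes from
# not-yet-treated observations (Borusyak, Jaravel & Spiess estimator).
static <- did_imputation(
  data   = panel,
  yname  = "outcome",         # outcome metric (assumed column name)
  gname  = "adoption_month",  # period of agent adoption (assumed column name)
  tname  = "month",           # calendar time (assumed column name)
  idname = "repo_id"          # repository identifier (assumed column name)
)
print(static)
```

Passing horizon = TRUE to did_imputation() returns event-study (dynamic) estimates rather than a single static effect, which is the kind of output behind plots like dynamic_effects.pdf.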
plots/: Contains all figures generated by the notebooks:
- dynamic_effects.pdf: Event study plot showing dynamic treatment effects across six outcomes
- agent_adoption_time_combined.pdf: Bar chart showing the adoption timing distribution by group
Our data collection and processing workflows are based on adaptations of the scripts originally provided in the following replication package:
Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. 2026. Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects. In 23rd International Conference on Mining Software Repositories (MSR ’26), April 13–14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3793302.3793349
Source: https://zenodo.org/records/18368661
Specifically, we modified their scripts to support our study goals, including the detection of AI tool traces, event data collection, propensity score matching, and repository metric aggregation. Our adaptations focus on distinguishing between repositories with and without prior AI tool usage (similar to their robustness check), and analyzing the differential impact of agent adoption across these groups. Please refer to the source above for the original codebase.
All analyses were performed using R 4.3.3. Required R packages:
```r
install.packages(c(
  "didimputation", # Borusyak et al. DiD imputation estimator
  "ggplot2",       # Plotting
  "dplyr",         # Data manipulation
  "data.table",    # Fast data operations
  "readr",         # Reading CSV files
  "tidyr",         # Data tidying
  "stringr",       # String manipulation
  "purrr",         # Functional programming
  "lubridate",     # Date/time manipulation
  "knitr",         # Dynamic report generation
  "kableExtra",    # Enhanced table formatting
  "Cairo"          # High-quality graphics device
))
```

- R: Version 4.3.3 or compatible
- Cairo graphics library: Required for PDF generation (install system dependencies as needed)
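As an optional pre-flight check (this snippet is a suggestion, not part of the original package), you can confirm the Cairo device is usable before knitting:

```r
# Optional: confirm Cairo-based PDF output is available on this system.
library(Cairo)
Cairo.capabilities()  # the "pdf" entry should be TRUE
```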
All required data files should be present in the data/ folder. Any files omitted from this repository due to size limitations will be available on Zenodo: [Zenodo DOI placeholder - to be added]
Please download the full dataset from Zenodo and place all files in the data/ folder to ensure complete reproducibility.
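A small helper like the one below (not part of the original package; file names taken from the data description above) flags any files still to be fetched from Zenodo:

```r
# Flag expected data files that are missing from data/ (helper sketch).
expected <- c(
  "panel_event_monthly.csv", "repos_with_details.csv", "matching.csv",
  "repo_events.csv", "repo_events_control.csv",
  "ts_repos_monthly.csv", "ts_repos_control_monthly.csv",
  "all_scraped_prs_final_list.csv", "agent_first.txt", "ide_first.txt"
)
missing <- setdiff(expected, list.files("data"))
if (length(missing) > 0) {
  message("Fetch from Zenodo into data/: ", paste(missing, collapse = ", "))
} else {
  message("All expected data files are present.")
}
```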
Install R 4.3.3 and the required packages listed above. Ensure Cairo graphics support is available for PDF generation.
Knit the notebooks in RStudio or with R's rmarkdown::render() function; a batch-rendering sketch follows the list below. The notebooks should be executed in the following order:
1. RepoMetricsAnalysis.Rmd: Generates the descriptive statistics table (LaTeX format)
   - Output: LaTeX table printed to the console
2. AdoptionTimeAnalysis.Rmd: Analyzes adoption timing patterns
   - Output: plots/agent_adoption_time_combined.pdf
3. DiffinDiff.Rmd: Main DiD analysis for the AF and IF control groups
   - Outputs: static treatment effects table (displayed in HTML) and plots/dynamic_effects.pdf (event study plot)
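The batch-rendering sketch referenced above is shown here; the notebooks/ directory name is an assumption, so adjust the paths to wherever the .Rmd files live in this package:

```r
# Render the three notebooks in the prescribed order (run from the repo root).
# rmarkdown::render() evaluates each .Rmd with the file's own directory as the
# working directory, so the relative paths ../data/ and ../plots/ resolve.
notebooks <- c(
  "notebooks/RepoMetricsAnalysis.Rmd",  # assumed directory name
  "notebooks/AdoptionTimeAnalysis.Rmd",
  "notebooks/DiffinDiff.Rmd"
)
for (nb in notebooks) {
  rmarkdown::render(nb)
}
```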
Each notebook reads data from ../data/ and saves outputs to ../plots/ (relative to the notebook location).
- HTML outputs: Each notebook generates an HTML file with embedded tables and plots
- PDF plots: High-quality PDF figures are saved in the plots/ directory
- LaTeX tables: Descriptive statistics are printed in LaTeX format to the console
- All data files are pre-processed and ready for analysis. The raw data collection scripts are available in the referenced Cursor AI study replication package.
- Results may vary slightly due to R package version differences, but the overall findings should be consistent.
- The notebooks are designed to be self-contained and reproducible with the provided data.
If you use this replication package, please cite:
Shyam Agarwal, Hao He, and Bogdan Vasilescu. 2026. AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development. In 23rd International Conference on Mining Software Repositories (MSR ’26), April 13–14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3793302.3793589