
Replication Package: AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development

This repository contains the replication package for the following paper:

Shyam Agarwal, Hao He, and Bogdan Vasilescu. 2026. AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development. In 23rd International Conference on Mining Software Repositories (MSR ’26), April 13–14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3793302.3793589

Some additional files are available on Zenodo: [Zenodo DOI placeholder - to be added].

MSR Challenge 2026

This replication package contains all data, code, and analysis scripts necessary to reproduce the results presented in our MSR Challenge 2026 paper.

Overview

This study investigates the causal impact of coding agents on software development outcomes, using difference-in-differences (DiD) estimation to analyze how agent adoption affects software quality and productivity metrics.
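
The estimation itself is done with the didimputation package (the Borusyak et al. imputation estimator listed under Requirements). As a minimal sketch of how such a call might be set up, with hypothetical column names (outcome, adoption_month, month, repo_id) standing in for the panel's actual schema:

library(didimputation)
library(readr)

# Hypothetical column names for illustration; the real panel
# (data/panel_event_monthly.csv) uses its own schema.
panel <- read_csv("data/panel_event_monthly.csv")

# Static average treatment effect on the treated (ATT),
# pooled over all post-adoption months.
att_static <- did_imputation(
    data   = panel,
    yname  = "outcome",         # e.g., a monthly quality/productivity metric
    gname  = "adoption_month",  # period of first treatment; 0/NA for controls
    tname  = "month",           # calendar time, encoded as an integer period
    idname = "repo_id"          # repository identifier
)

# Dynamic (event-study) effects by time relative to adoption.
att_dynamic <- did_imputation(
    data = panel, yname = "outcome", gname = "adoption_month",
    tname = "month", idname = "repo_id",
    horizon = TRUE, pretrends = TRUE
)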

Package Organization

The replication package is organized into three main directories:

data/

Contains all datasets used throughout the analysis pipeline:

  • panel_event_monthly.csv: Main panel dataset for difference-in-differences analysis, containing monthly aggregated metrics for treatment and control repositories
  • repos_with_details.csv: Repository-level metadata including adoption dates, metrics, and classification flags
  • matching.csv: Propensity score matching results linking treatment repositories to matched control repositories
  • repo_events.csv / repo_events_control.csv: GitHub event-level data for treatment and control repositories
  • ts_repos_monthly.csv / ts_repos_control_monthly.csv: Monthly time series data for treatment and control groups
  • all_scraped_prs_final_list.csv: Pull request data collected during the study
  • agent_first.txt: List of repositories that adopted agents directly, without prior AI traces (the agent-first, AF, group)
  • ide_first.txt: List of repositories that adopted agents after potentially using traditional AI tools (the IDE-first, IF, group)

Note: Some data files may not be present in this repository due to size limitations. These additional data files can be found at: [Zenodo DOI placeholder - to be added]
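
As a minimal sketch of how the treatment and control files fit together (the column names below are hypothetical; inspect the CSV headers for the actual schema):

library(readr)
library(dplyr)

# Hypothetical column names; check the CSV headers for the real ones.
repos    <- read_csv("data/repos_with_details.csv")
matching <- read_csv("data/matching.csv")

# Attach repository-level metadata to each treatment/control pair.
matched <- matching |>
    left_join(repos, by = c("treatment_repo" = "repo"))

glimpse(matched)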

notebooks/

R Markdown notebooks that reproduce all analyses, tables, and figures:

  1. DiffinDiff.Rmd: Main difference-in-differences analysis for both AF and IF groups

    • Generates static treatment effects table
    • Creates dynamic treatment effects (event study) plot for six key outcomes
  2. AdoptionTimeAnalysis.Rmd: Analysis of agent adoption timing patterns

    • Generates adoption time distribution plot comparing AF and IF groups
  3. RepoMetricsAnalysis.Rmd: Descriptive statistics and repository metrics comparison

    • Generates LaTeX table with summary statistics (mean, min, median, max) for both AF and IF groups

plots/

Contains all figures generated by the notebooks:

  • dynamic_effects.pdf: Event study plot showing dynamic treatment effects across six outcomes
  • agent_adoption_time_combined.pdf: Bar chart showing adoption timing distribution by group

Data Collection and Scripts

Our data collection and processing workflows are based on adaptations of the scripts originally provided in the following replication package:

Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. 2026. Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects. In 23rd International Conference on Mining Software Repositories (MSR ’26), April 13–14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3793302.3793349

Source: https://zenodo.org/records/18368661

Specifically, we modified their scripts to support our study goals, including the detection of AI tool traces, event data collection, propensity score matching, and repository metric aggregation. Our adaptations focus on distinguishing between repositories with and without prior AI tool usage (similar to their robustness check) and on analyzing the differential impact of agent adoption across these groups. Please refer to the source above for the original codebase.

Requirements

R Environment

All analyses were performed using R 4.3.3. Required R packages:

install.packages(c(
    "didimputation", # Borusyak et al. DiD imputation estimator
    "ggplot2",       # Plotting
    "dplyr",         # Data manipulation
    "data.table",    # Fast data operations
    "readr",         # Reading CSV files
    "tidyr",         # Data tidying
    "stringr",       # String manipulation
    "purrr",         # Functional programming
    "lubridate",     # Date/time manipulation
    "knitr",         # Dynamic report generation
    "kableExtra",    # Enhanced table formatting
    "Cairo"          # High-quality graphics device
))

System Requirements

  • R: Version 4.3.3 or compatible
  • Cairo graphics library: Required for PDF generation (install system dependencies as needed)

Replication Instructions

Step 1: Obtain the Data

All required data files are included in the data/ folder. If any data files are missing from this repository (due to size limitations), they will be available on Zenodo: [Zenodo DOI placeholder - to be added]

Please download the full dataset from Zenodo and place all files in the data/ folder to ensure complete reproducibility.

Step 2: Set Up R Environment

Install R 4.3.3 and the required packages listed above. Ensure Cairo graphics support is available for PDF generation.
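
A quick way to confirm Cairo support from an R session before knitting (one check in base R, one in the Cairo package):

capabilities("cairo")        # TRUE if base R was built with cairo support
Cairo::Cairo.capabilities()  # per-format support in the Cairo package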

Step 3: Reproduce Results

Knit the notebooks in RStudio or with R's rmarkdown::render() function. Execute them in the following order (a minimal render script is shown after this list):

  1. RepoMetricsAnalysis.Rmd

    • Generates descriptive statistics table (LaTeX format)
    • Output: LaTeX table printed to console
  2. AdoptionTimeAnalysis.Rmd

    • Analyzes adoption timing patterns
    • Output: plots/agent_adoption_time_combined.pdf
  3. DiffinDiff.Rmd

    • Main DiD analysis for AF and IF control groups
    • Output:
      • Static treatment effects table (displayed in HTML)
      • plots/dynamic_effects.pdf (event study plot)

Each notebook reads data from ../data/ and saves outputs to ../plots/ (relative to the notebook location).
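
For example, a full run from the repository root might look like this; rmarkdown::render() knits each file with the working directory set to the notebook's own directory, so the ../data/ and ../plots/ paths resolve correctly:

library(rmarkdown)

# Render in the documented order; each call produces an HTML report
# next to the notebook and writes figures to plots/.
render("notebooks/RepoMetricsAnalysis.Rmd")
render("notebooks/AdoptionTimeAnalysis.Rmd")
render("notebooks/DiffinDiff.Rmd")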

Step 4: View Results

  • HTML outputs: Each notebook generates an HTML file with embedded tables and plots
  • PDF plots: High-quality PDF figures are saved in the plots/ directory
  • LaTeX tables: Descriptive statistics are printed in LaTeX format to the console

Notes

  • All data files are pre-processed and ready for analysis. The raw data collection scripts are available in the referenced Cursor AI study replication package.
  • Results may vary slightly due to R package version differences, but the overall findings should be consistent.
  • The notebooks are designed to be self-contained and reproducible with the provided data.
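
If results differ from the paper, recording the session state (base R only) makes it easy to compare package versions against ours:

# Record the exact R and package versions used for a run.
writeLines(capture.output(sessionInfo()), "session_info.txt")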

Citation

If you use this replication package, please cite:

Shyam Agarwal, Hao He, and Bogdan Vasilescu. 2026. AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development. In 23rd International Conference on Mining Software Repositories (MSR ’26), April 13–14, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3793302.3793589
