Calculate year-over-year entry and exit rates for U.S. nonprofits using IRS Business Master File (BMF) data from the NCCS Data Archive.
This project analyzes nonprofit sector dynamics by tracking which organizations appear in and disappear from the IRS Business Master File over time. The BMF is a cumulative listing of all organizations that have applied for and received tax-exempt status from the IRS.
- Source: NCCS Data Archive (National Center for Charitable Statistics)
- S3 Bucket: `nccsdata` (publicly accessible, no credentials required)
- Coverage: 1989-present (monthly snapshots)
The script automatically handles two BMF file formats:
| Format | Years | S3 Path | Filename Pattern |
|---|---|---|---|
| Legacy | 1989-2022 | legacy/bmf/ | `BMF-YYYY-MM-501CX-NONPROFIT-PX.csv` |
| New | 2023+ | raw/bmf/ | `YYYY-MM-BMF.csv` |
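As a rough illustration of how the two naming schemes in the table above might be told apart, the sketch below pulls the snapshot year out of an S3 key. The helper name `bmf_year_from_key()` and its regular expressions are assumptions made for this example. The actual script may detect formats differently.

```r
# Hypothetical helper (not taken from the script): extract the snapshot year
# from an S3 key that follows either filename pattern.
bmf_year_from_key <- function(key) {
  fname <- basename(key)

  # Legacy files start with "BMF-YYYY-MM-"; new files are "YYYY-MM-BMF.csv".
  is_legacy <- grepl("^BMF-[0-9]{4}-[0-9]{2}-", fname)
  is_new    <- grepl("^[0-9]{4}-[0-9]{2}-BMF\\.csv$", fname)

  # Keys that match neither pattern stay NA.
  year <- rep(NA_integer_, length(fname))
  year[is_legacy] <- as.integer(sub("^BMF-([0-9]{4})-.*$", "\\1", fname[is_legacy]))
  year[is_new]    <- as.integer(sub("^([0-9]{4})-.*$", "\\1", fname[is_new]))
  year
}

# Example keys built from the documented patterns (the dates are made up).
keys <- c("legacy/bmf/BMF-2010-07-501CX-NONPROFIT-PX.csv",
          "raw/bmf/2023-05-BMF.csv",
          "raw/bmf/notes.txt")
years <- bmf_year_from_key(keys)
```

Grouping keys by the returned year would then give the set of monthly files to combine for each calendar year.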
For each year transition (Year N to Year N+1), the script:
- Extracts all unique EINs from each year's BMF files
- Computes set differences to identify entries and exits
- Calculates rates as percentages of the current year's count
Formulas:

    Exit Rate  = |EINs in Year N but NOT in Year N+1| / |EINs in Year N|
    Entry Rate = |EINs in Year N+1 but NOT in Year N| / |EINs in Year N|
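The set-difference calculation can be sketched in base R as follows. The function name `compute_rates()` and the toy EINs are illustrative assumptions, not the script's own code. The sketch returns proportions, so multiply by 100 to report percentages.

```r
# Illustrative only: compute entry and exit rates from two EIN vectors.
compute_rates <- function(eins_current, eins_next) {
  eins_current <- unique(eins_current)
  eins_next    <- unique(eins_next)

  exited  <- setdiff(eins_current, eins_next)  # in Year N, not in Year N+1
  entered <- setdiff(eins_next, eins_current)  # in Year N+1, not in Year N
  n_current <- length(eins_current)

  data.frame(
    eins_current = n_current,
    eins_next    = length(eins_next),
    eins_exited  = length(exited),
    eins_entered = length(entered),
    exit_rate    = length(exited) / n_current,
    entry_rate   = length(entered) / n_current
  )
}

# Toy data: EINs are kept as strings so leading zeros survive.
year_n  <- c("010000001", "010000002", "010000003", "010000004")
year_n1 <- c("010000002", "010000003", "010000004", "010000005", "010000006")

rates <- compute_rates(year_n, year_n1)
```

In this toy case one of four EINs exits and two enter, so both rates share the Year N denominator of four.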
"Exit" does not necessarily mean closure. An organization leaving the BMF may have:
- Actually dissolved or ceased operations
- Lost tax-exempt status (voluntary or involuntary)
- Merged with another organization
- Been removed due to IRS data corrections
"Entry" does not necessarily mean new formation. An organization appearing may have:
- Been newly formed and received tax-exempt determination
- Been reinstated after prior revocation
- Appeared due to IRS data corrections or processing delays
Temporal alignment: The script aggregates all monthly snapshots within a calendar year and takes the union of EINs to maximize coverage and minimize artifacts from snapshot timing.
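A minimal sketch of the union step, assuming each monthly snapshot has already been reduced to a character vector of EINs. The list layout and the EINs are made up for illustration.

```r
# Toy monthly snapshots for one calendar year (EINs are made up).
monthly_eins <- list(
  "2020-01" = c("010000001", "010000002"),
  "2020-06" = c("010000002", "010000003"),
  "2020-12" = c("010000003", "010000004")
)

# Union across months: an EIN counts for the year if it appears in any snapshot.
eins_2020 <- unique(unlist(monthly_eins, use.names = FALSE))

# For comparison, relying on a single snapshot would miss EINs absent that month.
eins_dec_only <- monthly_eins[["2020-12"]]
missed <- setdiff(eins_2020, eins_dec_only)
```

Here the December snapshot alone would drop two EINs that the yearly union retains. That gap is the kind of snapshot-timing artifact the aggregation is meant to reduce.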
- R >= 4.0.0
Install required packages:

    install.packages(c("aws.s3", "data.table", "dplyr", "tidyr", "ggplot2", "purrr"))

| Package | Purpose |
|---|---|
| `aws.s3` | Access S3 bucket data |
| `data.table` | Fast CSV reading with `fread()` |
| `dplyr` | Data manipulation |
| `tidyr` | Data reshaping for visualization |
| `ggplot2` | Visualization |
| `purrr` | Functional programming utilities |
- Internet connection (downloads data from AWS S3)
- Sufficient RAM (~4GB recommended for processing multiple years)
Run the script from an R session:

    source("R/bmf_formation_rates.R")

Or from the command line:

    Rscript R/bmf_formation_rates.R

| File | Description |
|---|---|
| `bmf_yoy_exit_rates.csv` | Year-over-year rates table |
| `bmf_entry_exit_rates.png` | Visualization of entry/exit rates |
| Column | Description |
|---|---|
| `year` | Starting year of comparison |
| `next_year` | Ending year of comparison |
| `eins_current` | Unique EINs in starting year |
| `eins_next` | Unique EINs in ending year |
| `eins_exited` | Count of EINs that exited |
| `eins_entered` | Count of new EINs that entered |
| `exit_rate` | `eins_exited / eins_current` |
| `entry_rate` | `eins_entered / eins_current` |
| `net_change` | `eins_next - eins_current` |
| `net_change_pct` | `net_change / eins_current` |
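The derived columns are tied together by a simple identity: net change should equal entries minus exits. The sketch below shows one way to sanity-check an output row. It uses a toy in-memory row with illustrative values rather than real results, and the check itself is a suggestion, not something the script is known to perform.

```r
# Toy row using the documented column names (values are illustrative).
rates <- data.frame(
  year = 2020L, next_year = 2021L,
  eins_current = 1000L, eins_next = 1030L,
  eins_exited = 50L, eins_entered = 80L
)

# Derived columns, following the definitions in the table.
rates$exit_rate      <- rates$eins_exited / rates$eins_current
rates$entry_rate     <- rates$eins_entered / rates$eins_current
rates$net_change     <- rates$eins_next - rates$eins_current
rates$net_change_pct <- rates$net_change / rates$eins_current

# Consistency check: |N+1| - |N| must equal entries minus exits.
stopifnot(all(rates$net_change == rates$eins_entered - rates$eins_exited))
```

The same check could be run on the table read from `bmf_yoy_exit_rates.csv`. A failing row would suggest duplicate EINs or a mismatch in how the yearly sets were built.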
nonprofit_entry_exit_rates/
├── R/
│ └── bmf_formation_rates.R # Main analysis script
├── README.md # This file
├── LICENSE # MIT License
└── nonprofit_entry_exit_rates.Rproj # RStudio project file
MIT License - see LICENSE file.
If using this analysis, please cite:
- NCCS Data Archive: https://nccs.urban.org/
- Urban Institute: https://www.urban.org/