THIS PACKAGE IS CURRENTLY UNDER DEVELOPMENT.
A Stata package to easily load, filter, and merge datasets from the World Bank's Space2Stats initiative at the ADM2, ADM1, and ADM0 levels. The package queries data from two Space2Stats Development Data Hub (DDH) repositories:
- The main Space2Stats Database, to query data on:
  - Population Demographics, 2020 (WorldPop)
  - Degree of Urbanization (GHS-SMOD)
  - Annual Nighttime Lights (2012 to present) (World Bank, Light Every Night)
  - Flood Exposure (Fathom v3 and WorldPop)
- The Space2Stats Database of Monthly and Annual Black Marble Nighttime Lights, to query data on:
  - Annual Nighttime Lights (2012 to present) (NASA, Black Marble)
  - Monthly Nighttime Lights (2012 to present) (NASA, Black Marble)
Note: The main Space2Stats database aggregates H3-level data to the ADM2 level, and its temporal resolution is annual. The Space2Stats Black Marble database is kept separate because its data are aggregated from raw satellite imagery directly to the ADM2 level and include monthly data.
query_s2s simplifies access to multiple World Bank spatial datasets by:
- Loading data directly from the World Bank Development Data Hub (DDH)
- Filtering by country and date range
- Merging multiple datasets automatically
- Aggregating to different administrative levels (ADM0, ADM1, or ADM2)
- Handling temporal data (annual and monthly) efficiently
Install the package with:

    net install space2stats-stata, from("https://raw.githubusercontent.com/worldbank/space2stats-stata/main/src") replace

Syntax:

    query_s2s, ///
        datasets(string) ///
        [ ///
        iso3(string) ///
        date_start(string) ///
        date_end(string) ///
        adm_level(integer 2) ///
        add_admin_names(integer 0) ///
        ]

Options:

- datasets(string): One or more datasets to load (space-separated)
  - ntl_viirs_bm_annual: Nighttime lights (VIIRS, NASA Black Marble), annual (2012 to present)
  - ntl_viirs_bm_monthly: Nighttime lights (VIIRS, NASA Black Marble), monthly (2012 to present)
  - ntl_viirs_len_annual: Nighttime lights (VIIRS, World Bank Light Every Night), annual (2012 to present)
  - flood_exposure: Flood exposure data (Fathom v3 and WorldPop)
  - population_2020: Population data for 2020 (WorldPop)
  - urbanization: Urbanization data (GHS-SMOD)
- iso3(string): Filter by ISO3 country codes (space-separated)
  - Example: iso3(USA MEX CAN)
  - If omitted, loads data for all countries
- date_start(string): Start date for temporal filtering
  - Format: yyyy-mm-dd or yyyy
  - Only applies to NTL datasets
  - Example: date_start(2020-01-01) or date_start(2020)
- date_end(string): End date for temporal filtering
  - Format: yyyy-mm-dd or yyyy
  - Only applies to NTL datasets
  - Example: date_end(2023-12-31) or date_end(2023)
- adm_level(integer): Administrative level for aggregation
  - 0: Country level (ADM0)
  - 1: First administrative division (ADM1, e.g., states/provinces)
  - 2: Second administrative division (ADM2, e.g., counties/districts) [default]
- add_admin_names(integer): Add administrative level 1 and 2 names; names come from the World Bank Official Boundaries Admin 2 - Additional Attributes dataset
  - 0: False [default]
  - 1: True

Restrictions:

- Cannot combine ntl_viirs_bm_monthly with ntl_viirs_bm_annual or ntl_viirs_len_annual due to different temporal structures
- All other dataset combinations are supported
- Date filtering only applies to NTL datasets
- Non-NTL datasets (flood_exposure, population_2020, urbanization) represent single time periods
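
Because monthly and annual NTL data cannot be requested together, one workaround is to run two queries and keep the first result in a temporary file. The sketch below uses only the options documented here; it assumes that each call to query_s2s replaces the data currently in memory, which this README does not state explicitly.

```stata
* First query: monthly Black Marble lights for Mexico
query_s2s, datasets(ntl_viirs_bm_monthly) iso3(MEX) date_start(2020-01-01) date_end(2021-12-31)

* Keep the monthly result on disk (assumes the next call overwrites memory)
tempfile monthly
save `monthly'

* Second query: annual lights combined with a static dataset
query_s2s, datasets(ntl_viirs_bm_annual population_2020) iso3(MEX) date_start(2020) date_end(2021)
```

The monthly data can be reloaded later with use `monthly', clear.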

Examples:

    query_s2s, datasets(population_2020) iso3(USA)
    query_s2s, datasets(population_2020 flood_exposure urbanization) iso3(USA MEX CAN)
    query_s2s, datasets(ntl_viirs_bm_annual) iso3(USA) date_start(2020) date_end(2023)
    query_s2s, datasets(ntl_viirs_bm_monthly) iso3(MEX) date_start(2020-01-01) date_end(2021-12-31)
    query_s2s, datasets(ntl_viirs_bm_annual population_2020) iso3(USA) adm_level(1)
    query_s2s, datasets(ntl_viirs_bm_annual flood_exposure) iso3(USA MEX CAN) adm_level(0)
    query_s2s, datasets(ntl_viirs_bm_annual ntl_viirs_len_annual) iso3(BRA) date_start(2015) date_end(2023)
    query_s2s, datasets(flood_exposure population_2020 urbanization) iso3(IND BGD PAK)
    query_s2s, datasets(ntl_viirs_bm_annual) iso3(CHN IND) date_start(2012) date_end(2024) adm_level(0)
    query_s2s, datasets(ntl_viirs_bm_annual flood_exposure population_2020 urbanization) iso3(USA)

The function loads the requested datasets into Stata's memory with:
- Automatic merging based on administrative codes and temporal variables
- Variable labels indicating the source dataset
- A summary showing:
  - Datasets loaded
  - Administrative level
  - Countries (if filtered)
  - Date range (if specified)
  - Number of observations
  - List of variables
When adm_level is set to 1 or 0, the function aggregates data using:
- Sum: Population counts, nighttime light totals
- Mean: Average statistics, percentages
- Max: Maximum values
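
The three aggregation rules above can be illustrated with Stata's built-in collapse command on toy data. This is a minimal sketch, not the package's internal code: every variable name and value is hypothetical, and the mean is unweighted because this README does not say whether any weighting is applied.

```stata
* Toy ADM2-level data; all variable names and values are hypothetical
clear
input str3 iso3 str6 adm1_code str8 adm2_code pop_total ntl_sum urban_share flood_depth_max
"USA" "USA001" "USA00101" 1000  50 0.25 1.5
"USA" "USA001" "USA00102" 3000 150 0.75 0.5
"USA" "USA002" "USA00201" 2000  80 0.50 2.0
end

* Aggregate ADM2 rows to ADM1: sum the counts, average the shares, keep the maxima
collapse (sum) pop_total ntl_sum (mean) urban_share (max) flood_depth_max, ///
    by(iso3 adm1_code)

* One row per ADM1 unit remains
list, clean noobs
```

Aggregating to ADM0 would follow the same pattern with by(iso3) only.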

Datasets are merged as follows:

- Datasets with matching temporal structures (year or date) use 1:1 merges
- Static datasets (no time dimension) use m:1 merges with temporal datasets
- Merge keys adjust automatically based on adm_level
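
The difference between the 1:1 and m:1 cases can be seen with Stata's built-in merge command on toy data. The sketch below only illustrates the idea: the key names (adm2_code, year) and all other variable names are assumptions, not the package's actual merge keys.

```stata
* Static toy dataset: one row per ADM2 unit (hypothetical names)
clear
input str8 adm2_code pop_total
"USA00101" 1000
"USA00102" 3000
end
tempfile static
save `static'

* Second temporal toy dataset: one row per ADM2 unit and year
clear
input str8 adm2_code year ntl_len
"USA00101" 2020 48
"USA00101" 2021 52
"USA00102" 2020 140
"USA00102" 2021 155
end
tempfile len
save `len'

* First temporal toy dataset, kept in memory as the master
clear
input str8 adm2_code year ntl_bm
"USA00101" 2020 50
"USA00101" 2021 55
"USA00102" 2020 150
"USA00102" 2021 160
end

* Same unit-year structure on both sides, so the merge is 1:1
merge 1:1 adm2_code year using `len', nogenerate

* Many unit-year rows map to one static row per unit, so the merge is m:1
merge m:1 adm2_code using `static', nogenerate

list, clean noobs
```

At ADM1 or ADM0 the same pattern would apply with a coarser key in place of adm2_code.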
Issue: Error loading data
- Solution: Check your internet connection; data is loaded from World Bank APIs
Issue: "Cannot combine monthly and annual datasets"
- Solution: Use separate queries for monthly vs. annual NTL data
Issue: Out of memory
- Solution: Filter by specific countries or date ranges to reduce data size
Issue: Variables not found after collapse
- Solution: Check that the requested datasets contain the expected variables
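
When a query runs inside a longer do-file, wrapping it in capture makes failures such as a dropped connection easier to diagnose. This is a generic Stata pattern rather than a feature of the package; the dataset and country are taken from the examples in this README.

```stata
* Run the query, show any error message, but keep control of the do-file
capture noisily query_s2s, datasets(population_2020) iso3(USA)

* _rc holds the return code of the captured command (0 means success)
if _rc != 0 {
    display as error "query_s2s failed with return code " _rc
    exit _rc
}

* Inspect what was loaded before relying on specific variables
describe, short
```

Running describe before a later collapse or merge helps confirm that the expected variables are present.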
Stata package developed by Robert Marty (rmarty@worldbank.org) and Sahiti Sarva (ssarva@worldbank.org).
This project is licensed under the MIT License together with the World Bank IGO Rider. The Rider is purely procedural: it reserves all privileges and immunities enjoyed by the World Bank, without adding restrictions to the MIT permissions. Please review both files before using, distributing or contributing.
If you use this function in your research, please cite the underlying datasets from the World Bank Data Catalog and acknowledge the Space2Stats initiative.