A high-performance, lightweight Go API for fuzzy searching Indonesian administrative regions, backed by DuckDB. The service provides fast, typo-tolerant search over Indonesian provinces, cities, districts, and subdistricts.
- Features
- API Usage
- Configuration
- Quick Start
- Deployment
- Maintenance
- Makefile Commands
- Acknowledgements
- Project Structure
- BM25 Full-Text Search: Utilizes DuckDB's `match_bm25` for fast and relevant full-text search across all administrative levels.
- Fuzzy Search: Employs the Jaro-Winkler similarity algorithm for typo-tolerant searches on specific administrative levels (province, city, district, subdistrict).
- BPS Integration: Optionally search and respond with official BPS (Badan Pusat Statistik) codes and names.
- High Performance: Powered by DuckDB for fast querying of Indonesian administrative data.
- Lightweight: Minimal dependencies with the GoFiber web framework.
- Container Ready: Dockerized application for easy deployment.
- Clean Architecture: Delivery, use case, repository, and gateway layers are isolated to keep business rules portable.
- Configurable: Environment-based configuration for port, database path, and ingestion data directory.
The codebase follows a Clean Architecture layout:
- `cmd/api`, `cmd/ingestor` – binary entrypoints that delegate to internal bootstrappers.
- `internal/config` – central wiring for loggers, DuckDB connections, Fiber apps, and use cases.
- `internal/delivery/http` – Fiber controllers, routes, and middleware for the public API.
- `internal/delivery/worker` – CLI-facing delivery adapter that runs dataset refresh workflows.
- `internal/usecase` – business rules for region search and dataset ingestion.
- `internal/repository/duckdb` – data-access implementations and administrative helpers for DuckDB.
- `internal/gateway` – filesystem loader and SQL normalizer abstractions used by the ingestion flow.
- `internal/shared` – cross-cutting concerns such as error taxonomy.
The general search endpoint supports both full-text and field-level fuzzy filters. Use any combination of the parameters below to narrow results.
- BM25 Full-Text Search: `q` uses DuckDB `match_bm25` over a combined `full_text` column.
- Field Fuzzy Filters: `subdistrict`, `district`, `city`, `province` use Jaro-Winkler (≥ 0.8).
- Composable: Combine parameters; scores are aggregated and results ordered by total score.
- City matching: `city` matches both "Kota {name}" and "Kabupaten {name}" automatically.
- Performance: Returns the top 10 results by default; `limit` raises this up to 100.
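To make the 0.8 threshold concrete, here is a minimal, self-contained sketch of the Jaro-Winkler similarity in Go. It is not the service's code (the service relies on DuckDB for scoring); the function names are hypothetical, and lowercasing both inputs before comparison is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// jaro returns the Jaro similarity of a and b, a value in [0, 1].
func jaro(a, b []rune) float64 {
	if len(a) == 0 || len(b) == 0 {
		if len(a) == len(b) {
			return 1 // two empty strings are identical
		}
		return 0
	}
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	// Characters only match if they are at most `window` positions apart.
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA, matchedB := make([]bool, len(a)), make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && a[i] == b[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order.
	transposed, j := 0, 0
	for i := range a {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if a[i] != b[j] {
			transposed++
		}
		j++
	}
	m := float64(matches)
	return (m/float64(len(a)) + m/float64(len(b)) + (m-float64(transposed)/2)/m) / 3
}

// jaroWinkler boosts the Jaro score for a shared prefix of up to 4 runes,
// using the conventional scaling factor 0.1.
func jaroWinkler(s, t string) float64 {
	a, b := []rune(strings.ToLower(s)), []rune(strings.ToLower(t))
	sim := jaro(a, b)
	prefix := 0
	for prefix < 4 && prefix < len(a) && prefix < len(b) && a[prefix] == b[prefix] {
		prefix++
	}
	return sim + float64(prefix)*0.1*(1-sim)
}

// passesThreshold applies the documented cut-off of 0.8.
func passesThreshold(query, candidate string) bool {
	return jaroWinkler(query, candidate) >= 0.8
}

func main() {
	fmt.Printf("%.3f\n", jaroWinkler("MARTHA", "MARHTA")) // prints 0.961
	fmt.Println(passesThreshold("bandng", "Bandung"))     // typo still matches
	fmt.Println(passesThreshold("bandung", "Surabaya"))   // unrelated name is rejected
}
```

The prefix boost is why a misspelling near the end of a name tends to score higher than one in the first few characters.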
GET /v1/search?q={query}&subdistrict={name}&district={name}&city={name}&province={name}
Parameters:
- `q` (optional): Full-text query (e.g., "bandung").
- `subdistrict` (optional): Fuzzy filter for subdistrict.
- `district` (optional): Fuzzy filter for district.
- `city` (optional): Fuzzy filter for city/regency (no need to prefix with Kota/Kabupaten).
- `province` (optional): Fuzzy filter for province.
- `limit` (optional): Maximum records to return (defaults to 10, capped at 100).
- `search_bps` (optional): When `true`, fuzzy comparisons use BPS (Badan Pusat Statistik) names.
- `include_bps` (optional): When `true`, the response adds BPS codes and names for each level.
- `include_scores` (optional): When `true`, the response adds the full-text score and per-field similarity scores.
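For Go clients, the parameters above can be assembled with `net/url` so that values containing spaces (such as "Jawa Barat") are encoded correctly. This is a sketch only: `SearchParams` and `BuildSearchURL` are hypothetical helper names, not part of the API.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// SearchParams mirrors the documented query parameters of /v1/search.
type SearchParams struct {
	Q, Subdistrict, District, City, Province string
	Limit                                    int
	SearchBPS, IncludeBPS, IncludeScores     bool
}

// BuildSearchURL returns the /v1/search URL for p, omitting unset parameters.
func BuildSearchURL(base string, p SearchParams) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/search"

	q := url.Values{}
	for key, val := range map[string]string{
		"q": p.Q, "subdistrict": p.Subdistrict, "district": p.District,
		"city": p.City, "province": p.Province,
	} {
		if val != "" {
			q.Set(key, val)
		}
	}
	if p.Limit > 0 {
		q.Set("limit", strconv.Itoa(p.Limit))
	}
	for key, on := range map[string]bool{
		"search_bps": p.SearchBPS, "include_bps": p.IncludeBPS, "include_scores": p.IncludeScores,
	} {
		if on {
			q.Set(key, "true")
		}
	}
	// Encode sorts keys alphabetically and escapes spaces as "+".
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := BuildSearchURL("http://localhost:8080", SearchParams{
		Q: "bandung", Province: "Jawa Barat", Limit: 5, IncludeBPS: true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// http://localhost:8080/v1/search?include_bps=true&limit=5&province=Jawa+Barat&q=bandung
}
```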
Example Requests:
# Full-text only
curl "http://localhost:8080/v1/search?q=bandung"
# Combine full-text with province filter
curl "http://localhost:8080/v1/search?q=bandung&province=Jawa%20Barat"
# Field-only filters (no q)
curl "http://localhost:8080/v1/search?district=Cidadap&city=Bandung&province=Jawa%20Barat"
# Request BPS metadata and scoring
curl "http://localhost:8080/v1/search?q=bandung&include_bps=true&include_scores=true&limit=5"
# Search using BPS naming
curl "http://localhost:8080/v1/search?q=kemayoran&search_bps=true&include_bps=true"

Example Response:
[
{
"id": "3273010001",
"subdistrict": "Sukasari",
"district": "Sukasari",
"city": "Kota Bandung",
"province": "Jawa Barat",
"postal_code": "40151",
"full_text": "40151 jawa barat kota bandung sukasari sukasari",
"bps": {
"subdistrict": {"code": "3273010001", "name": "Sukasari"},
"district": {"code": "3273010", "name": "Sukasari"},
"city": {"code": "3273", "name": "Bandung"},
"province": {"code": "32", "name": "Jawa Barat"}
},
"scores": {
"fts": 6.82,
"subdistrict": 0.96,
"district": 0.94,
"city": 0.91,
"province": 0.88
}
}
]

In addition to the general search endpoint, the API provides specific search endpoints for each administrative level. All use Jaro-Winkler fuzzy matching (≥ 0.8), order by similarity, and return up to 10 results by default. They also honour the shared `limit`, `include_bps`, and `include_scores` toggles.
- District Search: `/v1/search/district?q={district}&city={city}&province={province}&limit={n}&include_bps={bool}&include_scores={bool}`
  - `q` is required; `city` and `province` are optional narrowing filters.
  - `city` matches both Kota and Kabupaten prefixes automatically.
- Subdistrict Search: `/v1/search/subdistrict?q={subdistrict}&district={district}&city={city}&province={province}&limit={n}&include_bps={bool}&include_scores={bool}`
  - `q` is required; `district`, `city`, and `province` are optional narrowing filters.
  - `city` matches both Kota and Kabupaten prefixes automatically.
- City Search: `/v1/search/city?q={city}&limit={n}&include_bps={bool}&include_scores={bool}`
  - `q` is required; matches both Kota and Kabupaten.
- Province Search: `/v1/search/province?q={province}&limit={n}&include_bps={bool}&include_scores={bool}`
  - `q` is required.
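The Kota/Kabupaten handling described above could be approximated as follows. This is a guess at the idea rather than the service's actual logic: `CityCandidates` is a hypothetical helper that strips any prefix the caller supplied and produces both prefixed forms for comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// CityCandidates returns the "Kota" and "Kabupaten" variants of a city name.
// A prefix already present in the input is removed first, so callers may pass
// "Bandung", "Kota Bandung", or "kabupaten bandung" interchangeably.
func CityCandidates(input string) []string {
	name := strings.TrimSpace(input)
	for _, prefix := range []string{"kota ", "kabupaten "} {
		if len(name) >= len(prefix) && strings.EqualFold(name[:len(prefix)], prefix) {
			name = strings.TrimSpace(name[len(prefix):])
			break
		}
	}
	if name == "" {
		return nil
	}
	return []string{"Kota " + name, "Kabupaten " + name}
}

func main() {
	fmt.Println(CityCandidates("Bandung"))           // [Kota Bandung Kabupaten Bandung]
	fmt.Println(CityCandidates("kabupaten Bandung")) // [Kota Bandung Kabupaten Bandung]
}
```

Each candidate would then be scored against the stored city name, keeping the better of the two similarities.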
GET /v1/search/district?q={district}&city={city}&province={province}&limit={n}&include_bps={bool}&include_scores={bool}
Parameters:
- `q` (required): District name to match (e.g., "sukasari").
- `city` (optional): Narrow by city/regency (no need for Kota/Kabupaten).
- `province` (optional): Narrow by province.
Example Requests:
curl "http://localhost:8080/v1/search/district?q=Cidadap"
curl "http://localhost:8080/v1/search/district?q=Cidadap&city=Bandung&province=Jawa%20Barat"

GET /v1/search/subdistrict?q={subdistrict}&district={district}&city={city}&province={province}&limit={n}&include_bps={bool}&include_scores={bool}
Parameters:
- `q` (required): Subdistrict name (e.g., "sukasari").
- `district` (optional): Narrow by district.
- `city` (optional): Narrow by city/regency (Kota/Kabupaten handled automatically).
- `province` (optional): Narrow by province.
Example Requests:
curl "http://localhost:8080/v1/search/subdistrict?q=Sukasari"
curl "http://localhost:8080/v1/search/subdistrict?q=Sukasari&district=Sukasari&city=Bandung&province=Jawa%20Barat"

GET /v1/search/city?q={query}&limit={n}&include_bps={bool}&include_scores={bool}
Parameters:
- `q` (required): Search query string (e.g., "bandung")
Example Request:
curl "http://localhost:8080/v1/search/city?q=bandung"

GET /v1/search/province?q={query}&limit={n}&include_bps={bool}&include_scores={bool}
Parameters:
- `q` (required): Search query string (e.g., "jawa")
Example Request:
curl "http://localhost:8080/v1/search/province?q=jawa"

GET /v1/search/postal/{postalCode}?limit={n}&include_bps={bool}&include_scores={bool}
Parameters:
- `postalCode` (required): 5-digit postal code (e.g., "10110")
Example Request:
curl "http://localhost:8080/v1/search/postal/10110"

Example Response:
[
{
"id": "3101010001",
"subdistrict": "Kepulauan Seribu Utara",
"district": "Kepulauan Seribu Utara",
"city": "Kabupaten Kepulauan Seribu",
"province": "DKI Jakarta",
"postal_code": "10110",
"full_text": "dki jakarta kabupaten kepulauan seribu kepulauan seribu utara kepulauan seribu utara"
}
]

The postal code search endpoint:
- Takes a required `postalCode` path parameter containing a 5-digit postal code
- Returns a JSON array of matching regions with that postal code
- Performs an exact match on the postal code
- Limits results to 10 items by default (adjustable with `limit`)
- Returns the same Region structure as other search endpoints
- Returns a 404 error if no regions are found for the provided postal code
- Returns a 400 error if the postal code is not a valid 5-digit number
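A client might mirror these rules as in the sketch below: validate the postal code before sending the request, then map the documented status codes to errors. The `Region` struct follows the JSON fields shown in the example response; the helper names and error values are hypothetical, and the optional `bps` and `scores` objects are left out for brevity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// Region mirrors the fields of the example response.
type Region struct {
	ID          string `json:"id"`
	Subdistrict string `json:"subdistrict"`
	District    string `json:"district"`
	City        string `json:"city"`
	Province    string `json:"province"`
	PostalCode  string `json:"postal_code"`
	FullText    string `json:"full_text"`
}

var (
	ErrInvalidPostalCode = errors.New("postal code must be exactly 5 digits")
	ErrNotFound          = errors.New("no regions found for postal code")
)

// ValidPostalCode reports whether s consists of exactly five ASCII digits.
func ValidPostalCode(s string) bool {
	if len(s) != 5 {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// DecodeRegions maps an HTTP status and body to regions or a typed error,
// following the 200/400/404 behaviour listed above.
func DecodeRegions(status int, body []byte) ([]Region, error) {
	switch status {
	case http.StatusOK:
		var regions []Region
		if err := json.Unmarshal(body, &regions); err != nil {
			return nil, err
		}
		return regions, nil
	case http.StatusBadRequest:
		return nil, ErrInvalidPostalCode
	case http.StatusNotFound:
		return nil, ErrNotFound
	default:
		return nil, fmt.Errorf("unexpected status %d", status)
	}
}

func main() {
	fmt.Println(ValidPostalCode("10110"), ValidPostalCode("1011a"))

	body := []byte(`[{"id":"3101010001","city":"Kabupaten Kepulauan Seribu","postal_code":"10110"}]`)
	regions, err := DecodeRegions(http.StatusOK, body)
	if err != nil {
		panic(err)
	}
	fmt.Println(regions[0].City, regions[0].PostalCode)
}
```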
GET /healthz
Example Request:
curl "http://localhost:8080/healthz"

Example Response:
{
"status": "ok",
"message": "Service is healthy"
}

The application can be configured using the following environment variables:
| Variable | Description | Default Value |
|---|---|---|
| `PORT` | Port for the API server to listen on | `8080` |
| `DB_PATH` | Path to the DuckDB database file. The API opens it read-only; the ingestor opens it read-write. | `data/regions.duckdb` |
| `DATA_DIR` | Base directory containing SQL dumps used by the ingestor (`wilayah.sql`, `wilayah_kodepos.sql`, `bps_wilayah.sql`) | `data/` |
- Go 1.21 or higher
- curl (for downloading data)
- Docker (optional, for containerized deployment)
The easiest way to get started is by using the provided Makefile:
# Download the administrative data and prepare the database
make prepare-db
# Run the API server
make run

- Download the data: `curl -o data/wilayah.sql https://raw.githubusercontent.com/cahyadsn/wilayah/master/db/wilayah.sql`
- Prepare the database: `go run ./cmd/ingestor/main.go`
- Run the API server: `go run ./cmd/api/main.go`

- Build the Docker image: `docker build -t regions-api .`
- Run the container: `docker run -p 8080:8080 regions-api`
To build and push the Docker image to a container registry:
# Build the image
docker build -t your-registry/regions-api:latest .
# Push to container registry
docker push your-registry/regions-api:latest

- Install the Fly.io CLI
- Create a fly.toml file:

      app = "regions-api"

      [build]
      dockerfile = "Dockerfile"

      [env]
      PORT = "8080"

      [[services]]
      internal_port = 8080
      protocol = "tcp"

      [[services.ports]]
      port = 80
      handlers = ["http"]

      [[services.ports]]
      port = 443
      handlers = ["tls", "http"]

- Deploy: `flyctl launch`
- Connect your GitHub repository to Railway
- Set environment variables in Railway dashboard:
- PORT: 8080
- Railway will automatically build and deploy using the Dockerfile
- Create a new app and connect your repository
- Set environment variables:
- PORT: 8080
- Set the build command to: `docker build -t regions-api .`
- Set the run command to: `docker run -p $PORT:8080 regions-api`
To update the regions database with new administrative data:
- Download the latest data: `make download-data`
- Reprocess the data: `make ingest`

Or run the ingestor manually:
go run ./cmd/ingestor/main.go
This process will:
- Download the latest `wilayah.sql` file
- Create a new `regions.duckdb` database
- Transform the hierarchical data into a denormalized table for efficient searching
- Clean up temporary tables to keep the database file small
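The transformation step can be pictured with the sketch below, which flattens a code-to-name map into one row per subdistrict. It assumes dotted hierarchical codes of the form `32.73.01.1001`, where each prefix identifies the parent level; the `Row` type, the `Denormalize` function, and the sample data are illustrative only. The real ingestor performs this work in DuckDB and also joins postal codes, which are omitted here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Row is one denormalized record: a subdistrict with all of its ancestors.
type Row struct {
	Code, Subdistrict, District, City, Province, FullText string
}

// Denormalize builds one Row per four-part code by looking up the names of
// its province, city, and district prefixes.
func Denormalize(names map[string]string) []Row {
	var rows []Row
	for code, name := range names {
		parts := strings.Split(code, ".")
		if len(parts) != 4 {
			continue // only subdistrict-level codes produce rows
		}
		province, okP := names[parts[0]]
		city, okC := names[strings.Join(parts[:2], ".")]
		district, okD := names[strings.Join(parts[:3], ".")]
		if !okP || !okC || !okD {
			continue // skip orphans whose ancestors are missing
		}
		// full_text concatenates every level in lowercase for full-text search.
		full := strings.ToLower(strings.Join([]string{province, city, district, name}, " "))
		rows = append(rows, Row{code, name, district, city, province, full})
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(rows, func(i, j int) bool { return rows[i].Code < rows[j].Code })
	return rows
}

func main() {
	rows := Denormalize(map[string]string{
		"32":            "Jawa Barat",
		"32.73":         "Kota Bandung",
		"32.73.01":      "Sukasari",
		"32.73.01.1001": "Sukasari",
	})
	for _, r := range rows {
		fmt.Printf("%s | %s\n", r.Code, r.FullText)
	}
}
```

Storing every ancestor on the row trades some disk space for searches that never need a join.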
| Command | Description |
|---|---|
| `make prepare-db` | Download data and run ingestor (recommended for first run) |
| `make run` | Run the API server |
| `make ingest` | Run the data ingestor |
| `make download-data` | Download the SQL data file |
| `make build` | Build the API binary |
| `make docker-build` | Build Docker image |
| `make docker-run` | Run Docker container |
| `make test` | Run tests |
| `make clean` | Clean build artifacts |
| `make deps` | Install dependencies |
| `make help` | Show help message |
We would like to express our gratitude to cahyadsn for contributing the Indonesian administrative regions data that powers this API. The data is sourced from the wilayah repository, which provides comprehensive and up-to-date information about Indonesian provinces, cities, districts, and subdistricts.
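.
├── cmd/
│   ├── api/              # Main application entrypoint
│   └── ingestor/         # Data ingestion script
├── data/
│   ├── regions.duckdb    # DuckDB database file (generated)
│   └── wilayah.sql       # Raw SQL data file (downloaded)
├── internal/
│   ├── config/           # Wiring for loggers, DuckDB connections, Fiber apps, and use cases
│   ├── delivery/
│   │   ├── http/         # Fiber controllers, routes, and middleware
│   │   └── worker/       # CLI adapter for dataset refresh workflows
│   ├── gateway/          # Filesystem loader and SQL normalizer abstractions
│   ├── repository/
│   │   └── duckdb/       # DuckDB data-access implementations
│   ├── shared/           # Cross-cutting concerns such as error taxonomy
│   └── usecase/          # Business rules for region search and dataset ingestion
├── Dockerfile            # Docker configuration
├── Makefile              # Build and run commands
├── go.mod                # Go module file
└── go.sum                # Go checksum file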