🧬 NCBI Fetcher (PGX)

A small Go application that downloads gzipped FASTA sequences from the NCBI Sequence Read Archive (SRA), decompresses them, and stores them in a PostgreSQL database.

📦 Requirements

Make sure you have the following installed:

- Go ≥ 1.20
- PostgreSQL ≥ 13
- Internet access to reach NCBI's trace.ncbi.nlm.nih.gov API

🛠️ Installation

Clone or unzip the project:

```
unzip ncbi-fetcher-pgx-updated.zip
cd ncbi-fetcher-pgx
```

Initialize Go modules and download dependencies:
```
go mod tidy
```
Create a PostgreSQL database (e.g., dna_sequences):
```
createdb dna_sequences
```

Configure the environment variables that the app uses to connect to PostgreSQL:

```
export DB_HOST=localhost
export DB_PORT=5432
export DB_USER=postgres
export DB_PASSWORD=yourpassword
export DB_NAME=dna_sequences
```

You can also edit config/config.go to set your defaults.

▶️ Running the App

Fetch sequences for a specific SRA accession number (e.g., SRR35830121):

```
go run main.go SRR35830121
```

The program will:

1. Download the gzipped FASTA file from NCBI.
2. Decompress it in memory.
3. Parse all sequences.
4. Create the sequences table (if it doesn't exist).
5. Insert all sequences into PostgreSQL.

🧱 Database Schema

| Column | Type | Description |
|---|---|---|
| id | SERIAL (primary key) | Auto-incrementing ID |
| accession | TEXT | SRA accession number |
| header | TEXT | FASTA header line (without the leading `>`) |
| sequence | TEXT | Nucleotide sequence |
| source_file | TEXT | Original `.gz` file name |
| created_at | TIMESTAMP | Defaults to `NOW()` |

🧩 Example Query

To verify data insertion:

```
SELECT accession, header, LENGTH(sequence) AS seq_len
FROM sequences
LIMIT 5;
```

🧰 Troubleshooting

- `failed to fetch data: 404` → The accession ID might not exist, or NCBI is temporarily unavailable.
- `connection refused` → Check your PostgreSQL connection parameters or pg_hba.conf.
- `gzip: invalid header` → Ensure the endpoint returns a .gz file and not plain FASTA.
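
When diagnosing the `gzip: invalid header` case, it can help to check whether the downloaded bytes are gzip data at all. A gzip stream starts with the two magic bytes `0x1f 0x8b` (RFC 1952). The helper below is a hypothetical sketch for such a check, not part of the project:

```go
package main

import "fmt"

// looksLikeGzip reports whether data begins with the gzip
// magic bytes 0x1f 0x8b defined in RFC 1952.
func looksLikeGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

func main() {
	fmt.Println(looksLikeGzip([]byte{0x1f, 0x8b, 0x08}))    // true
	fmt.Println(looksLikeGzip([]byte(">read1\nACGT\n")))    // false: plain FASTA
	fmt.Println(looksLikeGzip([]byte("<html>error</html>"))) // false: e.g. an HTML error page
}
```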

📘 Notes

- You can modify the database connection defaults in config/config.go.
- The app uses the pgx/v5 library for high-performance database access.
- To improve speed for large datasets, you can later add batching or concurrent insert workers.
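
As a starting point for the batching idea, the sketch below splits a slice of records into fixed-size chunks; each chunk could then be sent in one round trip (for example with pgx's batch or copy facilities) or handed to a separate worker. The generic helper `chunk` is a hypothetical name and is not part of the project.

```go
package main

import "fmt"

// chunk splits items into consecutive slices of at most size elements.
// The returned slices share memory with items. A non-positive size
// yields nil.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	headers := []string{"read1", "read2", "read3", "read4", "read5"}
	for i, batch := range chunk(headers, 2) {
		// In a real insert path, each batch would become one database round trip.
		fmt.Printf("batch %d: %v\n", i, batch)
	}
}
```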

lemmerelassal/ncbi-sra-fetcher
