# Bill Extraction API

A FastAPI-based service that extracts line-item details from multi-page hospital bills using OCR + OpenAI Vision.

The API processes PDFs or image URLs, reads every page, and returns structured JSON with:
- Page-wise line items
- Quantity, Rate, Amount mapping
- Page-type classification (Bill Detail / Final Bill / Pharmacy)
- Total item count
- Strict JSON format required by the Hackathon evaluation
This repository contains the code required to deploy and run the Bill Extraction API.
## Setup

### 1. Install dependencies

Create a virtual environment (optional) and install the dependencies:

```bash
pip install -r requirements.txt
```
### 2. Create your `.env` file (do NOT upload this to GitHub)

Inside the root folder of the project, create a file named `.env`:

```ini
OPENAI_API_KEY=your_openai_api_key_here
```
This key is required for the Vision model to process images.
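As a quick sanity check before starting the server, you can verify the key is visible to the process. This sketch assumes the app reads `OPENAI_API_KEY` from the environment (e.g. loaded from `.env` via python-dotenv at startup); the placeholder value below is a dummy for illustration only:

```python
import os

# Simulate what the .env file provides; the real value comes from your .env.
# "sk-placeholder" is a dummy value used only for this sketch.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

key = os.environ["OPENAI_API_KEY"]
assert key, "OPENAI_API_KEY must be set before starting the server"
print("OPENAI_API_KEY is set")
```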
### 3. Start the FastAPI server

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
You should now see:

```text
Uvicorn running on http://127.0.0.1:8000
```
### 4. Open the API documentation

FastAPI serves Swagger UI at `http://127.0.0.1:8000/docs`.
You can test the endpoint directly from this page.
## API Details

### POST `/extract-bill-data`

Extracts line items from a given PDF or image URL.
**Request Body**

```json
{
  "document": "https://example.com/path/to/bill.pdf"
}
```
**Response Structure**

```json
{
  "is_success": true,
  "token_usage": {
    "total_tokens": 0,
    "input_tokens": 0,
    "output_tokens": 0
  },
  "data": {
    "pagewise_line_items": [
      {
        "page_no": "1",
        "page_type": "Bill Detail",
        "bill_items": [
          {
            "item_name": "string",
            "item_amount": 0.0,
            "item_rate": 0.0,
            "item_quantity": 0.0
          }
        ]
      }
    ],
    "total_item_count": 0
  }
}
```
This response strictly follows the Hackathon specification.
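If you want to sanity-check a response against the structure above, a minimal validator can be written in plain Python. The sample payload and the `validate_response` helper below are illustrative, not part of the repository; only the field names come from the schema shown above:

```python
import json

# A hypothetical sample response, shaped like the schema above.
sample = json.loads("""
{
  "is_success": true,
  "token_usage": {"total_tokens": 0, "input_tokens": 0, "output_tokens": 0},
  "data": {
    "pagewise_line_items": [
      {"page_no": "1", "page_type": "Bill Detail",
       "bill_items": [{"item_name": "Paracetamol 500mg",
                       "item_amount": 20.0, "item_rate": 10.0,
                       "item_quantity": 2.0}]}
    ],
    "total_item_count": 1
  }
}
""")

def validate_response(resp: dict) -> bool:
    """Check the top-level keys and per-item fields the evaluation expects."""
    if not {"is_success", "token_usage", "data"} <= resp.keys():
        return False
    for page in resp["data"]["pagewise_line_items"]:
        for item in page["bill_items"]:
            if not {"item_name", "item_amount",
                    "item_rate", "item_quantity"} <= item.keys():
                return False
    return True

print(validate_response(sample))  # True for the sample above
```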
## How It Works Internally

The API follows this pipeline:

1. Download the PDF / image
2. Convert the PDF into page images
3. Run OCR-based preprocessing
4. Send each page to OpenAI Vision (gpt-4o-mini)
5. Parse and clean the extracted line items
6. Reconcile amounts and remove header/total rows
7. Return the strict JSON format

Each page is processed separately, which keeps extraction accurate for bills of 1 to 10+ pages.
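The reconcile step can be sketched in a few lines. This is an illustrative simplification, not the repository's actual `reconcile.py`; the skip-word list and the mismatch threshold are assumptions:

```python
# Hypothetical skip words for rows that are headers or summaries, not items.
SKIP_WORDS = {"total", "subtotal", "grand total", "item name", "description"}

def reconcile(items: list[dict]) -> list[dict]:
    """Drop header/total rows and fix amounts that disagree with rate x qty."""
    cleaned = []
    for it in items:
        name = it.get("item_name", "").strip().lower()
        if not name or name in SKIP_WORDS:
            continue  # not a billable line item
        rate, qty = it.get("item_rate", 0.0), it.get("item_quantity", 0.0)
        expected = round(rate * qty, 2)
        # Trust rate x quantity when the OCR'd amount is off by more than 1 unit.
        if rate and qty and abs(it.get("item_amount", 0.0) - expected) > 1.0:
            it["item_amount"] = expected
        cleaned.append(it)
    return cleaned

rows = [
    {"item_name": "Total", "item_amount": 500.0,
     "item_rate": 0.0, "item_quantity": 0.0},
    {"item_name": "Syringe 5ml", "item_amount": 90.0,
     "item_rate": 15.0, "item_quantity": 2.0},
]
print(reconcile(rows))  # the "Total" row is dropped, the amount becomes 30.0
```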
## Project Structure

```text
BillExtractor-Git/
├── app/
│   ├── main.py            # FastAPI routes
│   ├── llm_extractor.py   # OpenAI Vision extraction logic
│   ├── layout_utils.py    # PDF → image → page metadata
│   ├── ocr_utils.py       # OCR preprocessing
│   ├── reconcile.py       # Cleans & validates numbers, filters totals
│   ├── rule_extractor.py  # Optional rule-based logic
│   ├── models.py          # Pydantic models
│   └── __pycache__/       # Ignored by git
├── requirements.txt
├── .gitignore
└── README.md
```
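To give a feel for the shapes `models.py` defines, here is a plain-dataclass analogue of the response schema. The repository itself uses Pydantic; only the field names below are taken from the README's response structure, and the class names are illustrative:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class BillItem:
    item_name: str
    item_amount: float
    item_rate: float
    item_quantity: float

@dataclass
class PageLineItems:
    page_no: str
    page_type: str
    bill_items: list[BillItem] = field(default_factory=list)

page = PageLineItems("1", "Bill Detail",
                     [BillItem("Gauze", 50.0, 25.0, 2.0)])
print(asdict(page)["bill_items"][0]["item_amount"])  # 50.0
```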
## Deployment via ngrok (for Hackathon Submission)

Start the FastAPI server:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

In another terminal:

```bash
ngrok http 8000
```

ngrok will print a forwarding line such as:

```text
Forwarding https://xxxxxx.ngrok-free.dev -> http://localhost:8000
```

Use that URL in the Hackathon Portal:

```text
https://xxxxxx.ngrok-free.dev/extract-bill-data
```
## Notes

- `.env` must NOT be committed to GitHub (your key stays private).
- `.venv/` or any other virtual environment directory should not be uploaded either.
- The final JSON output strictly matches the evaluation schema.
- Multi-page PDFs with 2 to 10+ pages are supported.
## Support

If you need help running the API or testing it, feel free to reach out.

Happy extracting!