jayaram0528/bill-extraction-api

Hospital Bill Extraction API

A FastAPI-based service that extracts line item details from multi-page hospital bills using OCR + OpenAI Vision.
The API processes PDFs or image URLs, reads every page, and returns structured JSON with:

  • Page-wise line items
  • Quantity, Rate, Amount mapping
  • Page-type classification (Bill Detail / Final Bill / Pharmacy)
  • Total item count
  • Strict JSON format required by the Hackathon evaluation

This repository contains the code required to deploy and run the Bill Extraction API.


🚀 Running the Project Locally

1. Install dependencies

Create a virtual environment (optional) and install dependencies:

pip install -r requirements.txt

2. Create your .env file (⚠ Do NOT upload this to GitHub)

Inside the root folder of the project, create a file named .env:

OPENAI_API_KEY=your_openai_api_key_here

This key is required for the Vision model to process images.
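
As an illustration of how the key could be checked before the server accepts requests, here is a minimal sketch using only the standard library. It assumes the key is read from the process environment or from a simple KEY=value .env file; `load_env_file` and `require_openai_key` are hypothetical helpers, not functions from this repository, and many projects use the python-dotenv package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_openai_key(path: str = ".env") -> str:
    """Return the key from the environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing fast like this surfaces a missing key at startup rather than on the first extraction request.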

3. Start the FastAPI server

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

You should now see a line like:

Uvicorn running on http://0.0.0.0:8000

Because the server binds to all interfaces, it is also reachable locally at http://127.0.0.1:8000.

4. Open API Documentation

FastAPI provides Swagger UI at:

http://127.0.0.1:8000/docs

You can test the endpoint directly from this page.

🧠 API Details

📌 POST /extract-bill-data

Extracts line items from a given PDF or image URL.

Request Body

{
  "document": "https://example.com/path/to/bill.pdf"
}
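
For callers who prefer a script to Swagger UI, the request above could be sent as in the sketch below. It uses only the standard library; `build_extract_request` and `extract_bill` are hypothetical helper names, and the 120-second timeout is an assumption (multi-page bills may take a while to process).

```python
import json
import urllib.request


def build_extract_request(base_url: str, document_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /extract-bill-data."""
    body = json.dumps({"document": document_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/extract-bill-data",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_bill(base_url: str, document_url: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_extract_request(base_url, document_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `extract_bill("http://127.0.0.1:8000", "https://example.com/path/to/bill.pdf")` would return the parsed response as a dictionary.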

Response Structure

{
  "is_success": true,
  "token_usage": {
    "total_tokens": 0,
    "input_tokens": 0,
    "output_tokens": 0
  },
  "data": {
    "pagewise_line_items": [
      {
        "page_no": "1",
        "page_type": "Bill Detail",
        "bill_items": [
          {
            "item_name": "string",
            "item_amount": 0.0,
            "item_rate": 0.0,
            "item_quantity": 0.0
          }
        ]
      }
    ],
    "total_item_count": 0
  }
}
This response strictly follows the Hackathon specification.
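
To make the envelope concrete, the sketch below assembles a response of that shape from per-page results. `build_response` is a hypothetical helper, and it assumes that total_item_count is the number of line items summed across all pages, which the schema suggests but does not state.

```python
from typing import Optional


def build_response(pages: list, usage: Optional[dict] = None) -> dict:
    """Wrap per-page line items in the response envelope shown above."""
    usage = usage or {"total_tokens": 0, "input_tokens": 0, "output_tokens": 0}
    # Assumption: total_item_count counts line items across every page.
    total = sum(len(page["bill_items"]) for page in pages)
    return {
        "is_success": True,
        "token_usage": usage,
        "data": {"pagewise_line_items": pages, "total_item_count": total},
    }


example_pages = [
    {
        "page_no": "1",
        "page_type": "Bill Detail",
        "bill_items": [
            {"item_name": "Room Rent", "item_amount": 1500.0,
             "item_rate": 1500.0, "item_quantity": 1.0},
        ],
    },
    {"page_no": "2", "page_type": "Pharmacy", "bill_items": []},
]
example_response = build_response(example_pages)
```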

πŸ” How it Works Internally
The API follows this pipeline:

Download the PDF / Image

Convert PDF to page images

OCR-based preprocessing

Send each page to OpenAI Vision (gpt-4o-mini)

Parse and clean extracted line items

Reconcile amounts, remove headers/totals

Return strict JSON format

Each page is processed separately β†’ accurate extraction for 1–10+ page bills.
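
The reconciliation step can be pictured with a small sketch. This is not the code in reconcile.py; it is an illustration under stated assumptions: rows whose name is only a totals label are dropped, and a missing amount is filled from quantity × rate. The label pattern and the `reconcile_items` name are hypothetical.

```python
import re

# Hypothetical label list; the project's real filter may differ.
_TOTAL_LABEL = re.compile(
    r"(sub\s*-?\s*total|grand\s+total|total(\s+amount)?|net\s+amount)\s*:?",
    re.IGNORECASE,
)


def reconcile_items(items: list) -> list:
    """Drop totals/empty rows and fill a missing amount from quantity * rate."""
    cleaned = []
    for item in items:
        name = str(item.get("item_name", "")).strip()
        # fullmatch keeps real items such as "Total Parenteral Nutrition".
        if not name or _TOTAL_LABEL.fullmatch(name):
            continue
        quantity = float(item.get("item_quantity") or 0)
        rate = float(item.get("item_rate") or 0)
        amount = float(item.get("item_amount") or 0)
        if amount == 0 and quantity and rate:
            amount = round(quantity * rate, 2)
        cleaned.append({
            "item_name": name,
            "item_amount": amount,
            "item_rate": rate,
            "item_quantity": quantity,
        })
    return cleaned
```

Matching the whole name rather than searching for the word "total" is a deliberate choice here, since a substring search would also discard legitimate items that happen to contain that word.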

πŸ“ Project Structure
graphql
Copy code
BillExtractor-Git/
 β”œβ”€β”€ app/
 β”‚    β”œβ”€β”€ main.py               # FastAPI routes
 β”‚    β”œβ”€β”€ llm_extractor.py      # OpenAI Vision extraction logic
 β”‚    β”œβ”€β”€ layout_utils.py       # PDF β†’ image β†’ page metadata
 β”‚    β”œβ”€β”€ ocr_utils.py          # OCR preprocessing
 β”‚    β”œβ”€β”€ reconcile.py          # Cleans & validates numbers, filters totals
 β”‚    β”œβ”€β”€ rule_extractor.py     # Optional rule-based logic
 β”‚    β”œβ”€β”€ models.py             # Pydantic models
 β”‚    └── __pycache__/          # Ignored by git
 β”œβ”€β”€ requirements.txt
 β”œβ”€β”€ .gitignore
 └── README.md

🌍 Deployment via ngrok (for Hackathon Submission)

Start the FastAPI server:

uvicorn app.main:app --host 0.0.0.0 --port 8000

In another terminal:

ngrok http 8000

ngrok will print a forwarding line such as:

Forwarding  https://xxxxxx.ngrok-free.dev -> http://localhost:8000

Use that URL in the Hackathon Portal:

https://xxxxxx.ngrok-free.dev/extract-bill-data

📌 Notes

  • .env must NOT be included in GitHub (your key stays private).
  • .venv/ or any virtual environment should also not be uploaded.
  • Final JSON output strictly matches the evaluation schema.
  • Supports multi-page PDFs with 2–10+ pages.

📞 Support

If you need help running the API or testing it, feel free to reach out.

Happy Extracting! 🚀
