# Bill Extraction API

A FastAPI-based service that extracts line-item details from multi-page hospital bills using OCR + OpenAI Vision.

The API processes PDFs or image URLs, reads every page, and returns structured JSON with:
- Page-wise line items
- Quantity, Rate, Amount mapping
- Page-type classification (Bill Detail / Final Bill / Pharmacy)
- Total item count
- Strict JSON format required by the Hackathon evaluation
This repository contains the code required to deploy and run the Bill Extraction API.
## Setup

### 1. Install dependencies

Create a virtual environment (optional) and install the dependencies:

```bash
pip install -r requirements.txt
```
### 2. Create your `.env` file (do NOT upload this to GitHub)

Inside the root folder of the project, create a file named `.env`:

```ini
OPENAI_API_KEY=your_openai_api_key_here
```
This key is required for the Vision model to process images.
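As a quick sanity check before starting the server, you can verify the key is visible to the process. This sketch assumes the app reads `OPENAI_API_KEY` from the environment (e.g. loaded from `.env` via python-dotenv at startup); the placeholder value below is a dummy for illustration only:

```python
import os

# Simulate what the .env file provides; the real value comes from your .env.
# "sk-placeholder" is a dummy value used only for this sketch.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

key = os.environ["OPENAI_API_KEY"]
assert key, "OPENAI_API_KEY must be set before starting the server"
print("OPENAI_API_KEY is set")
```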
### 3. Start the FastAPI server

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
You should now see:

```text
Uvicorn running on http://127.0.0.1:8000
```
### 4. Open the API documentation

FastAPI serves Swagger UI at `http://127.0.0.1:8000/docs`.
You can test the endpoint directly from this page.
## API Details

### POST `/extract-bill-data`

Extracts line items from a given PDF or image URL.
**Request Body**

```json
{
  "document": "https://example.com/path/to/bill.pdf"
}
```
**Response Structure**

```json
{
  "is_success": true,
  "token_usage": {
    "total_tokens": 0,
    "input_tokens": 0,
    "output_tokens": 0
  },
  "data": {
    "pagewise_line_items": [
      {
        "page_no": "1",
        "page_type": "Bill Detail",
        "bill_items": [
          {
            "item_name": "string",
            "item_amount": 0.0,
            "item_rate": 0.0,
            "item_quantity": 0.0
          }
        ]
      }
    ],
    "total_item_count": 0
  }
}
```
This response strictly follows the Hackathon specification.
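If you want to sanity-check a response against the structure above, a minimal validator can be written in plain Python. The sample payload and the `validate_response` helper below are illustrative, not part of the repository; only the field names come from the schema shown above:

```python
import json

# A hypothetical sample response, shaped like the schema above.
sample = json.loads("""
{
  "is_success": true,
  "token_usage": {"total_tokens": 0, "input_tokens": 0, "output_tokens": 0},
  "data": {
    "pagewise_line_items": [
      {"page_no": "1", "page_type": "Bill Detail",
       "bill_items": [{"item_name": "Paracetamol 500mg",
                       "item_amount": 20.0, "item_rate": 10.0,
                       "item_quantity": 2.0}]}
    ],
    "total_item_count": 1
  }
}
""")

def validate_response(resp: dict) -> bool:
    """Check the top-level keys and per-item fields the evaluation expects."""
    if not {"is_success", "token_usage", "data"} <= resp.keys():
        return False
    for page in resp["data"]["pagewise_line_items"]:
        for item in page["bill_items"]:
            if not {"item_name", "item_amount",
                    "item_rate", "item_quantity"} <= item.keys():
                return False
    return True

print(validate_response(sample))  # True for the sample above
```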
## How It Works Internally

The API follows this pipeline:

1. Download the PDF / image
2. Convert the PDF into page images
3. Run OCR-based preprocessing
4. Send each page to OpenAI Vision (gpt-4o-mini)
5. Parse and clean the extracted line items
6. Reconcile amounts and remove header/total rows
7. Return the strict JSON format

Each page is processed separately, which keeps extraction accurate for bills of 1 to 10+ pages.
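The reconcile step can be sketched in a few lines. This is an illustrative simplification, not the repository's actual `reconcile.py`; the skip-word list and the mismatch threshold are assumptions:

```python
# Hypothetical skip words for rows that are headers or summaries, not items.
SKIP_WORDS = {"total", "subtotal", "grand total", "item name", "description"}

def reconcile(items: list[dict]) -> list[dict]:
    """Drop header/total rows and fix amounts that disagree with rate x qty."""
    cleaned = []
    for it in items:
        name = it.get("item_name", "").strip().lower()
        if not name or name in SKIP_WORDS:
            continue  # not a billable line item
        rate, qty = it.get("item_rate", 0.0), it.get("item_quantity", 0.0)
        expected = round(rate * qty, 2)
        # Trust rate x quantity when the OCR'd amount is off by more than 1 unit.
        if rate and qty and abs(it.get("item_amount", 0.0) - expected) > 1.0:
            it["item_amount"] = expected
        cleaned.append(it)
    return cleaned

rows = [
    {"item_name": "Total", "item_amount": 500.0,
     "item_rate": 0.0, "item_quantity": 0.0},
    {"item_name": "Syringe 5ml", "item_amount": 90.0,
     "item_rate": 15.0, "item_quantity": 2.0},
]
print(reconcile(rows))  # the "Total" row is dropped, the amount becomes 30.0
```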
## Project Structure

```text
BillExtractor-Git/
├── app/
│   ├── main.py            # FastAPI routes
│   ├── llm_extractor.py   # OpenAI Vision extraction logic
│   ├── layout_utils.py    # PDF → image → page metadata
│   ├── ocr_utils.py       # OCR preprocessing
│   ├── reconcile.py       # Cleans & validates numbers, filters totals
│   ├── rule_extractor.py  # Optional rule-based logic
│   ├── models.py          # Pydantic models
│   └── __pycache__/       # Ignored by git
├── requirements.txt
├── .gitignore
└── README.md
```
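To give a feel for the shapes `models.py` defines, here is a plain-dataclass analogue of the response schema. The repository itself uses Pydantic; only the field names below are taken from the README's response structure, and the class names are illustrative:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class BillItem:
    item_name: str
    item_amount: float
    item_rate: float
    item_quantity: float

@dataclass
class PageLineItems:
    page_no: str
    page_type: str
    bill_items: list[BillItem] = field(default_factory=list)

page = PageLineItems("1", "Bill Detail",
                     [BillItem("Gauze", 50.0, 25.0, 2.0)])
print(asdict(page)["bill_items"][0]["item_amount"])  # 50.0
```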
## Deployment via ngrok (for Hackathon Submission)

Start the FastAPI server:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

In another terminal:

```bash
ngrok http 8000
```

ngrok will print a forwarding line such as:

```text
Forwarding https://xxxxxx.ngrok-free.dev -> http://localhost:8000
```

Use that URL in the Hackathon Portal:

```text
https://xxxxxx.ngrok-free.dev/extract-bill-data
```
## Notes

- `.env` must NOT be committed to GitHub (your key stays private).
- `.venv/` or any other virtual environment directory should not be uploaded either.
- The final JSON output strictly matches the evaluation schema.
- Multi-page PDFs with 2 to 10+ pages are supported.
## Support

If you need help running the API or testing it, feel free to reach out.

Happy extracting!