UNDER CONSTRUCTION PLEASE MOVE WITH CAUTION

A digitization pipeline for a typical Biochemistry Lab

Description

This is a complete backend with an event-based architecture for OCR. Through the API a user can upload jpg, jpeg, png, and pdf files, and this over-the-counter OCR runs OCR on each of them and stores them accordingly in MongoDB. Right now only jpg, jpeg, and png are being tested; PDFs will come later.
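As a rough illustration of the file types mentioned above, here is a minimal sketch of an extension check. The function name `classify_upload` and the two constant sets are hypothetical and are not taken from this repo; the real validation lives wherever the project defines it (for example in `schemas/validate.py`).

```python
from pathlib import Path

# Hypothetical constants, based only on the description above.
CURRENTLY_TESTED = {".jpg", ".jpeg", ".png"}   # being tested right now
PLANNED = {".pdf"}                             # to be handled later


def classify_upload(filename: str) -> str:
    """Return 'tested', 'planned', or 'unsupported' for an uploaded file name."""
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    if ext in CURRENTLY_TESTED:
        return "tested"
    if ext in PLANNED:
        return "planned"
    return "unsupported"


if __name__ == "__main__":
    for name in ("gel_scan.JPG", "notebook_page.pdf", "notes.docx"):
        print(name, "->", classify_upload(name))
```

A check like this only looks at the file name; it does not inspect the file contents.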

Setup

Install Docker. You could follow a YouTube tutorial like this one: https://www.youtube.com/watch?v=-EXlfSsP49A&t=331s (Windows guys, get a Mac pls :( ), or use the official Docker docs to get started: https://docs.docker.com/desktop/
Install MongoDB (the command below uses Homebrew on macOS)

# add the MongoDB Homebrew tap first, then install
brew tap mongodb/brew
brew install mongodb-community@8.0

Once it's successfully installed, you can start the project-specific setup

Clone this repo

git clone https://github.com/A1pha-Z3r0/Open-source-Chem-Lab-digitization-pipeline.git

Create a virtual environment and activate it

python -m venv venv
source venv/bin/activate

Install requirements

pip install -r requirements.txt

To start the app and celery

Start a RabbitMQ server in a Docker container

docker run --hostname rabbitmq --name rabbit-mq -p 15672:15672 -p 5672:5672 rabbitmq:3-management

If the container has already been created and stopped, just start it again:

docker start rabbit-mq

# to stop
docker stop rabbit-mq

Now, to log in to the management UI, open your browser (Brave, it's 2025) and go to: http://localhost:15672 (the image's default login is guest / guest, unless you have changed it)

Start the MongoDB server

# make sure mongoDB is started
brew services start mongodb-community@8.0

# ADDITIONAL:
# To stop:
brew services stop mongodb-community@8.0

Run the celery worker

PYTHONPATH=./src celery -A celery_app worker --loglevel=INFO --queues=processing

Run celery beat

PYTHONPATH=./src celery -A celery_app beat --loglevel=info

Run FastAPI from inside src

uvicorn main:app --reload

# if port 8000 is already in use, find the process holding it and kill it
lsof -i :8000
kill -9 12345   # replace 12345 with the PID reported by lsof
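Before uploading anything, it can help to confirm that each service started above is actually listening. The following is a minimal sketch using only the Python standard library. Ports 5672 and 15672 come from the docker command above and 8000 from the uvicorn step; 27017 is an assumption (MongoDB's default port, which this README does not state). The names `SERVICES` and `port_open` are hypothetical helpers, not part of this repo.

```python
import socket

# Service name -> local port. 27017 is assumed (MongoDB default).
SERVICES = {
    "RabbitMQ (AMQP)": 5672,
    "RabbitMQ management UI": 15672,
    "MongoDB": 27017,
    "FastAPI (uvicorn)": 8000,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name:<24} port {port:<6} {status}")
```

This only checks that something accepts TCP connections on each port; it does not verify that the celery worker or beat processes are running, since those do not listen on a port.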

File Structure

The main src folder contains these important folders

├── api                             # All the API routes
│   ├── file_upload_routes.py
│   └── keyword_search.py
├── main.py                         # Entry point
├── repositories                    # The DB and its functions
│   ├── db_config.py
│   └── db_utils.py
├── schemas                         # Schema Validators
│   └── validate.py
├── services                        # Main service logic for OCR
│   ├── __init__.py
│   ├── file_upload_to_db.py
│   ├── files_reader.py
│   ├── ocr_pipeline.py
│   ├── ocr.py
│   ├── task.py
│   └── test_normalize.py
└── utils                           # General utils for OCR
    ├── __init__.py
    ├── image_preprocess.py
    └── tensor_converter.py
