Carer Project

This project is designed to crawl, process, and query oncology-related articles from the web. It includes components for web scraping, data storage, summarization, keyword extraction, and vector-based search.

Installation

Clone the repository:

git clone https://github.com/suryansh2207/Carer-project.git
cd Carer-project

Usage

To run the entire pipeline, execute the run_all.sh script:

./run_all.sh

This script will:

Start the necessary services.
Set up the vector store.
Process and store articles.
Run a query interface for searching articles.

Components

Crawler

The crawler.py script crawls oncology-related articles from the web and stores them in a MySQL database.
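The parse-and-store step described above can be sketched as follows. This is a minimal illustration, not the code in crawler.py: the `ArticleParser` class, the `store_article` function, and the `articles` table layout are all hypothetical, and sqlite3 stands in for MySQL so the example runs without a database server.

```python
import sqlite3
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # "title" or "p" while inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def store_article(conn, url, html):
    """Parse one page and insert it; a URL that is already stored is skipped."""
    parser = ArticleParser()
    parser.feed(html)
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    conn.execute(
        "INSERT OR IGNORE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, parser.title.strip(), body),
    )
    conn.commit()


# Assumption: sqlite3 replaces MySQL here, and the schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (url TEXT PRIMARY KEY, title TEXT, body TEXT)")

sample_html = (
    "<html><head><title>Immunotherapy update</title></head>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
store_article(conn, "https://example.org/a1", sample_html)
store_article(conn, "https://example.org/a1", sample_html)  # duplicate URL, ignored
rows = conn.execute("SELECT url, title, body FROM articles").fetchall()
```

Keying the table on the URL keeps repeated crawls from inserting the same article twice. A real crawler would also need HTTP fetching, politeness delays, and a MySQL driver, none of which are shown here.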

Summarizer

The summarizer.py script processes articles to generate summaries and extract keywords using pre-trained models.
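To make the two outputs concrete, here is a small sketch of summarization and keyword extraction. It deliberately swaps the pre-trained models for a simple word-frequency heuristic so it runs with the standard library alone; the function names and the stopword list are hypothetical and do not come from summarizer.py.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the project's.
STOPWORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "for", "with",
    "that", "on", "are", "as", "by", "it", "this", "be", "or", "was",
}


def tokenize(text):
    """Return lowercase alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword terms."""
    return [word for word, _ in Counter(tokenize(text)).most_common(top_n)]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, preserving their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))
    # Score each sentence by the summed document frequency of its words.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in tokenize(sentences[i])),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


article = (
    "Immunotherapy helps the immune system fight cancer. "
    "The weather was mild. "
    "Cancer immunotherapy trials report improved survival in cancer patients."
)
keywords = extract_keywords(article, top_n=2)
summary = summarize(article, max_sentences=2)
```

The heuristic is extractive: it selects existing sentences rather than generating new text, which is a different technique from the model-based summaries the project describes.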

Query

The query.py script searches stored articles by query text, including vector-based similarity search.
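Vector-based similarity search can be illustrated with a brute-force cosine ranking. This is a sketch under stated assumptions, not the logic in query.py: the `search` function, the article ids, and the three-dimensional vectors are invented for the example, and a real deployment would delegate the ranking to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, indexed, top_k=3):
    """Rank (article_id, vector) pairs by similarity to the query, best first."""
    scored = [(aid, cosine_similarity(query_vector, vec)) for aid, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical index: ids and vectors are made up for illustration.
index = [
    ("a1", [1.0, 0.0, 0.0]),
    ("a2", [0.7, 0.7, 0.0]),
    ("a3", [0.0, 0.0, 1.0]),
]
results = search([1.0, 0.1, 0.0], index, top_k=2)
```

Comparing the query against every stored vector is fine for a handful of articles; dedicated vector stores exist because this linear scan does not scale to large collections.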

Vector Store

The vector_store.py script initializes the vector store and processes articles for vector-based search.
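The indexing side can be sketched like this: each article is turned into a fixed-length vector and stored under its id. Everything here is an assumption made for illustration. The hashed bag-of-words `embed` function replaces whatever embedding model the project uses, and `InMemoryVectorStore` is a plain dictionary standing in for a real vector database collection.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical embedding size


def embed(text, dim=DIM):
    """Hash each word into one of `dim` buckets, then L2-normalize the counts."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        # md5 gives a hash that is stable across runs, unlike built-in hash().
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Toy stand-in for a vector database collection: article id -> vector."""

    def __init__(self):
        self.vectors = {}

    def add_article(self, article_id, text):
        self.vectors[article_id] = embed(text)


store = InMemoryVectorStore()
store.add_article(1, "Breast cancer screening guidelines")
store.add_article(2, "Lung cancer immunotherapy trial results")
```

Normalizing vectors at insert time means a later similarity search can use a plain dot product. Hashed word counts carry no semantic meaning, so this only demonstrates the data flow, not search quality.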

Configuration

The config.in file contains configuration settings for the project, including database connection details and Milvus settings.
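If config.in follows INI syntax, which is an assumption since the file format is not described here, it could be read with Python's standard configparser module. The section names, keys, and values below are hypothetical placeholders; only the port numbers reflect the usual MySQL (3306) and Milvus (19530) defaults.

```python
import configparser

# Hypothetical contents; the real config.in may use different sections and keys.
SAMPLE_CONFIG = """
[mysql]
host = localhost
port = 3306
user = carer
password = change-me
database = carer

[milvus]
host = localhost
port = 19530
collection = articles
"""


def load_config(text):
    """Parse INI-style text into a nested dict with integer ports."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "mysql": {
            "host": parser.get("mysql", "host"),
            "port": parser.getint("mysql", "port"),
            "user": parser.get("mysql", "user"),
            "password": parser.get("mysql", "password"),
            "database": parser.get("mysql", "database"),
        },
        "milvus": {
            "host": parser.get("milvus", "host"),
            "port": parser.getint("milvus", "port"),
            "collection": parser.get("milvus", "collection"),
        },
    }


settings = load_config(SAMPLE_CONFIG)
```

To read from disk instead of a string, `parser.read("config.in")` would replace the `read_string` call. Keeping credentials in a separate file like this makes it easier to exclude them from version control.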

