Research Paper Extractor

A Python command-line tool that searches arXiv and downloads research papers as PDFs. Search by keywords, authors, categories, or specific arXiv IDs.

Features

Smart Search: Search papers by keywords, topics, or phrases
Author Search: Find all papers by specific researchers
Category Filtering: Filter by arXiv categories (AI, ML, Computer Vision, etc.)
Recent Papers: Find papers published in the last N days
Batch Download: Download multiple papers at once
Specific Downloads: Download papers by arXiv ID
Preview Mode: See search results before downloading
Interactive Mode: User-friendly interactive interface
Auto-Organization: Automatic topic-based folder creation and file naming
Topic Folders: Each search creates its own organized folder

Installation

Clone or download this project to your computer
Navigate to the project directory:
```
cd "Research Paper Extractor"
```
Install dependencies:
```
pip install -r requirements.txt
```

Quick Start

Basic Search and Download

# Search for machine learning papers and download them
python main.py search "machine learning" --max-results 5

# Preview results without downloading
python main.py search "neural networks" --preview-only

# Auto-download without confirmation
python main.py search "computer vision" --auto-download

Download by arXiv ID

# Download a specific paper by its arXiv ID
python main.py download-by-id 2301.07041
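
As a rough sketch of what a download by ID could involve, the snippet below validates a new-style arXiv ID and builds the public PDF URL for it. The helper name `pdf_url_for` is hypothetical and not taken from this project, and old-style IDs such as `hep-th/9901001` are not handled here.

```python
import re

# New-style arXiv IDs look like 2301.07041, optionally with a version suffix (v2).
ARXIV_ID_PATTERN = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def pdf_url_for(arxiv_id: str) -> str:
    """Return the public PDF URL for a new-style arXiv ID (hypothetical helper)."""
    arxiv_id = arxiv_id.strip()
    if not ARXIV_ID_PATTERN.match(arxiv_id):
        raise ValueError(f"Not a new-style arXiv ID: {arxiv_id!r}")
    return f"https://arxiv.org/pdf/{arxiv_id}"


print(pdf_url_for("2301.07041"))
```

Validating the ID before building the URL means a typo is reported immediately rather than as a failed download.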

Search by Author

# Find papers by a specific researcher
python main.py search-by-author "Geoffrey Hinton"

Interactive Mode (Recommended for beginners)

# Launch interactive mode for guided searching
python main.py interactive

Usage Examples

1. Search with Category Filters

# Search for AI papers in specific categories
python main.py search "artificial intelligence" \
  --categories cs.AI \
  --categories cs.LG \
  --max-results 10
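
To illustrate how keywords and category filters might be combined, here is a minimal sketch that builds an arXiv API `search_query` string using the API's `all:` and `cat:` field prefixes. The function name `build_search_query` is hypothetical, and how this tool actually combines repeated `--categories` values (OR versus AND) is an assumption.

```python
from typing import Iterable, Optional


def build_search_query(keywords: str, categories: Optional[Iterable[str]] = None) -> str:
    """Combine a keyword phrase with optional category filters (hypothetical helper)."""
    query = f'all:"{keywords}"'
    cats = list(categories or [])
    if cats:
        # Assumption: a paper matches if it belongs to any of the listed categories.
        cat_clause = " OR ".join(f"cat:{c}" for c in cats)
        query = f"{query} AND ({cat_clause})"
    return query


print(build_search_query("artificial intelligence", ["cs.AI", "cs.LG"]))
# all:"artificial intelligence" AND (cat:cs.AI OR cat:cs.LG)
```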

2. Find Recent Papers

# Find papers from the last 7 days
python main.py search "deep learning" \
  --recent-days 7 \
  --max-results 5
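
A filter like `--recent-days` could work by comparing each paper's publication timestamp against a cutoff. The sketch below assumes timestamps in the format used by arXiv's Atom feed (for example `2023-01-17T18:58:29Z`); the function name `is_recent` is hypothetical and the tool's real implementation may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_recent(published: str, recent_days: int, now: Optional[datetime] = None) -> bool:
    """Return True if `published` falls within the last `recent_days` days."""
    now = now or datetime.now(timezone.utc)
    # Assumed timestamp format: UTC with a trailing "Z".
    published_at = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - published_at <= timedelta(days=recent_days)


reference = datetime(2023, 1, 20, tzinfo=timezone.utc)
print(is_recent("2023-01-17T18:58:29Z", 7, now=reference))  # True
print(is_recent("2023-01-01T00:00:00Z", 7, now=reference))  # False
```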

3. Custom Download Directory

# Download to a specific folder
python main.py search "robotics" \
  --download-dir "/path/to/my/papers" \
  --max-results 3

4. Sort Results

# Sort by publication date (newest first)
python main.py search "nlp" \
  --sort-by submittedDate \
  --max-results 5
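
The `--sort-by` values match the `sortBy` parameter of the arXiv API, so a request URL could be assembled roughly as follows. This is a sketch under assumptions: the function name `build_api_url` is hypothetical, and the choice of descending order is inferred from the "newest first" comment rather than confirmed by the source.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"
VALID_SORT_KEYS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_api_url(search_query: str, max_results: int = 10, sort_by: str = "relevance") -> str:
    """Build an arXiv API query URL (hypothetical helper)."""
    if sort_by not in VALID_SORT_KEYS:
        raise ValueError(f"sort_by must be one of {sorted(VALID_SORT_KEYS)}")
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",  # assumption: newest or most relevant first
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


print(build_api_url("all:nlp", max_results=5, sort_by="submittedDate"))
```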

Available Commands

`search` - Main search command

Search and download papers by topic/keywords.

Options:

--max-results, -n: Number of papers to find (default: 10)
--download-dir, -d: Download directory (default: ./downloads)
--categories, -c: arXiv categories to search in
--sort-by: Sort by relevance, lastUpdatedDate, or submittedDate
--preview-only, -p: Only show results, don't download
--auto-download, -a: Download all without asking
--recent-days: Only show papers from last N days

`download-by-id` - Download specific paper

Download a paper using its arXiv ID.

Options:

--download-dir, -d: Download directory
--filename, -f: Custom filename

`search-by-author` - Author search

Find papers by a specific author.

Options:

--max-results, -n: Number of papers to find
--download-dir, -d: Download directory
--preview-only, -p: Only preview results

`categories` - List categories

Show all available arXiv categories.

`interactive` - Interactive mode

Launch guided interface for easy searching.

arXiv Categories

Common categories you can use with --categories:

| Category | Description |
|----------|-------------|
| `cs.AI` | Artificial Intelligence |
| `cs.LG` | Machine Learning |
| `cs.CV` | Computer Vision and Pattern Recognition |
| `cs.CL` | Computation and Language (NLP) |
| `cs.NE` | Neural and Evolutionary Computing |
| `stat.ML` | Machine Learning (Statistics) |
| `cs.CR` | Cryptography and Security |
| `cs.DB` | Databases |
| `cs.IR` | Information Retrieval |
| `cs.SE` | Software Engineering |
To list all categories, run `python main.py categories`.

Configuration

You can modify settings in config.py:

Download directory: Change DEFAULT_DOWNLOAD_DIR
Request delay: Adjust REQUEST_DELAY (be respectful to arXiv servers!)
Max results: Change DEFAULT_MAX_RESULTS
File naming: Modify sanitize_filename() function
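
For orientation, a `config.py` with these settings might look something like the sketch below. The two defaults marked as documented come from the `search` options above; the `REQUEST_DELAY` value and the body of `sanitize_filename()` are assumptions for illustration, so check the project's actual file before relying on them.

```python
import re

DEFAULT_DOWNLOAD_DIR = "./downloads"  # documented default for --download-dir
DEFAULT_MAX_RESULTS = 10              # documented default for --max-results
REQUEST_DELAY = 3.0                   # assumed value: seconds to wait between requests


def sanitize_filename(title: str, max_length: int = 100) -> str:
    """Assumed behavior: replace unsafe characters with underscores and trim the result."""
    # Keep word characters, dots, and hyphens; collapse everything else to "_".
    cleaned = re.sub(r"[^\w.\-]+", "_", title.strip())
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    return cleaned[:max_length]


print(sanitize_filename("Attention Is All You Need"))
# Attention_Is_All_You_Need
```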

File Organization

Each search automatically creates its own folder, named after the topic, author, or arXiv ID:

downloads/
├── machine_learning/
│   ├── Neural_Networks_2301.07041.pdf
│   └── Deep_Learning_Basics_2302.12345.pdf
├── computer_vision/
│   ├── Image_Recognition_2303.67890.pdf
│   └── Object_Detection_2304.11111.pdf
├── author_geoffrey_hinton/
│   └── Hinton_Research_2305.22222.pdf
└── paper_1706.03762/
    └── Attention_Is_All_You_Need_1706.03762.pdf
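
The folder names in the tree above suggest a simple naming scheme, which can be sketched as follows. The helper names `slugify` and `topic_folder` and the `kind` parameter are hypothetical; the rules are inferred from the example tree, not taken from the project's code.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumeric characters with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def topic_folder(base_dir: str, topic: str, kind: str = "search") -> Path:
    """Map a search topic, author name, or arXiv ID to a folder like those in the tree."""
    if kind == "author":
        name = f"author_{slugify(topic)}"
    elif kind == "paper":
        name = f"paper_{topic}"  # arXiv IDs keep their dot, as in paper_1706.03762
    else:
        name = slugify(topic)
    return Path(base_dir) / name


print(topic_folder("downloads", "machine learning").as_posix())
print(topic_folder("downloads", "Geoffrey Hinton", kind="author").as_posix())
print(topic_folder("downloads", "1706.03762", kind="paper").as_posix())
```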

Tips & Best Practices

Start with preview mode (-p) to see what you'll get before downloading
Use specific keywords for better results
Combine categories to narrow down search scope
Be respectful - don't download hundreds of papers at once
Check recent papers using --recent-days for cutting-edge research
Use interactive mode if you're new to the tool
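
The advice to be respectful of arXiv's servers can be pictured as a loop that pauses between requests. This is only a sketch: `process_politely` is a hypothetical name, the 3-second default is an assumed value, and the `action` callable stands in for whatever performs the real download.

```python
import time
from typing import Callable, Iterable, List


def process_politely(
    items: Iterable[str],
    action: Callable[[str], object],
    delay_seconds: float = 3.0,  # assumed delay between requests
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Apply `action` to each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, not before the first one
        results.append(action(item))
    return results


# Demo with no real network access and no delay.
print(process_politely(["2301.07041", "1706.03762"], lambda i: f"would download {i}", delay_seconds=0))
```

Passing `sleep` as a parameter keeps the delay easy to replace in tests.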

Troubleshooting

Common Issues

"No papers found"

Try broader keywords
Check spelling
Remove category filters and search again

"Download failed"

Check internet connection
Some papers might not have PDFs available
Try again later (temporary server issues)
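
"Try again later" can also be automated. The sketch below retries a failing operation with an increasing wait between attempts; the function name `retry_with_backoff` and its default values are assumptions, and the tool itself may or may not retry on its own.

```python
import time


def retry_with_backoff(action, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except OSError:  # network errors such as ConnectionError derive from OSError
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo: a stand-in download that fails twice, then succeeds.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary server issue")
    return "paper.pdf"


print(retry_with_backoff(flaky_download, base_delay=0))
```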

"Permission denied"

Check write permissions in download directory
Try a different download directory

Error Logs

The tool provides detailed logging. Check the console output for specific error messages.

Dependencies

requests - HTTP requests
feedparser - Parse arXiv API responses
beautifulsoup4 - HTML parsing
lxml - XML/HTML processing
tqdm - Progress bars
click - Command-line interface
python-dateutil - Date handling

Contributing

Feel free to submit issues and enhancement requests!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Complete Usage Guide

For comprehensive usage examples and advanced features, see: USAGE_EXAMPLES.md

This detailed guide includes:

All command examples with explanations
Advanced usage patterns
Research workflow examples
Pro tips and best practices

Happy researching!

For more help, run `python main.py --help` or `python main.py [command] --help`.

Author

Name: Sreeram
Email: sreeram.lagisetty@gmail.com
GitHub: Sreeram5678
Instagram: @sreeram_3012

Repository: Research Paper Extractor
