A Python tool that automatically downloads research papers from arXiv based on topics you specify. Search by keywords, authors, categories, or specific paper IDs and download PDFs with ease!
- Smart Search: Search papers by keywords, topics, or phrases
- Author Search: Find all papers by specific researchers
- Category Filtering: Filter by arXiv categories (AI, ML, Computer Vision, etc.)
- Recent Papers: Find papers published in the last N days
- Batch Download: Download multiple papers at once
- Specific Downloads: Download papers by arXiv ID
- Preview Mode: See search results before downloading
- Interactive Mode: User-friendly interactive interface
- Auto-Organization: Automatic topic-based folder creation and file naming
- Topic Folders: Each search creates its own organized folder
- Clone or download this project to your computer
- Navigate to the project directory:
cd "Research Paper Extractor"
- Install dependencies:
pip install -r requirements.txt
# Search for machine learning papers and download them
python main.py search "machine learning" --max-results 5
# Preview results without downloading
python main.py search "neural networks" --preview-only
# Auto-download without confirmation
python main.py search "computer vision" --auto-download
# Download a specific paper by its arXiv ID
python main.py download-by-id 2301.07041
# Find papers by a specific researcher
python main.py search-by-author "Geoffrey Hinton"
# Launch interactive mode for guided searching
python main.py interactive
# Search for AI papers in specific categories
python main.py search "artificial intelligence" \
--categories cs.AI \
--categories cs.LG \
--max-results 10
# Find papers from the last 7 days
python main.py search "deep learning" \
--recent-days 7 \
--max-results 5
# Download to a specific folder
python main.py search "robotics" \
--download-dir "/path/to/my/papers" \
--max-results 3
# Sort by publication date (newest first)
python main.py search "nlp" \
--sort-by submittedDate \
--max-results 5
python main.py search: Search and download papers by topic or keywords.
Options:
- --max-results, -n: Number of papers to find (default: 10)
- --download-dir, -d: Download directory (default: ./downloads)
- --categories, -c: arXiv categories to search in
- --sort-by: Sort by relevance, lastUpdatedDate, or submittedDate
- --preview-only, -p: Only show results; don't download
- --auto-download, -a: Download all results without asking
- --recent-days: Only show papers from the last N days
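These options map naturally onto arXiv's public query API (http://export.arxiv.org/api/query), whose sortBy parameter accepts the same three values as --sort-by. The sketch below shows one way such a request URL could be assembled. build_search_url is a hypothetical helper, not this project's code, and it assumes that repeated --categories values are OR-ed together and that results come back in descending order.

```python
from urllib.parse import urlencode

# Public arXiv Atom API endpoint.
ARXIV_API_URL = "http://export.arxiv.org/api/query"

# Values accepted by the API's sortBy parameter (same names as --sort-by).
SORT_FIELDS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_search_url(keywords, categories=(), max_results=10, sort_by="relevance"):
    """Hypothetical helper: turn search options into an arXiv API query URL."""
    if sort_by not in SORT_FIELDS:
        raise ValueError(f"unsupported sort field: {sort_by!r}")
    # Search the keywords as a quoted phrase across all fields.
    query = f'all:"{keywords}"'
    if categories:
        # Assumption: several categories are OR-ed, then AND-ed with the keywords.
        cat_clause = " OR ".join(f"cat:{c}" for c in categories)
        query = f"{query} AND ({cat_clause})"
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",  # newest or most relevant first
    }
    # urlencode percent-encodes quotes, colons and parentheses for us.
    return f"{ARXIV_API_URL}?{urlencode(params)}"
```

For example, build_search_url("deep learning", ["cs.LG"], max_results=5) produces a URL that can be fetched with requests and parsed with feedparser.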
python main.py download-by-id: Download a paper using its arXiv ID.
Options:
- --download-dir, -d: Download directory
- --filename, -f: Custom filename
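Downloading by ID needs little more than a validated ID and arXiv's PDF URL pattern, https://arxiv.org/pdf/ followed by the ID. A minimal sketch, assuming both new-style IDs such as 2301.07041 and old-style IDs such as hep-th/9901001 should be accepted; pdf_url_for_id is a hypothetical name, not part of this tool.

```python
import re

# New-style IDs: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version (v2).
NEW_STYLE_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
# Old-style IDs: archive[.SUBJECT]/YYMMNNN, e.g. hep-th/9901001 or math.GT/0309136.
OLD_STYLE_ID = re.compile(r"[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?")


def pdf_url_for_id(arxiv_id):
    """Hypothetical helper: validate an arXiv ID and return its PDF URL."""
    arxiv_id = arxiv_id.strip()
    if not (NEW_STYLE_ID.fullmatch(arxiv_id) or OLD_STYLE_ID.fullmatch(arxiv_id)):
        raise ValueError(f"not a recognizable arXiv ID: {arxiv_id!r}")
    return f"https://arxiv.org/pdf/{arxiv_id}"
```

Validating before any network call means a typo such as 2301.0704 is rejected locally instead of producing a failed download.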
python main.py search-by-author: Find papers by a specific author.
Options:
- --max-results, -n: Number of papers to find
- --download-dir, -d: Download directory
- --preview-only, -p: Only preview results
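An author search can be expressed with the arXiv API's au: field prefix. This sketch assumes the full name is quoted so that first and last name are matched together, and it sorts newest first as an illustrative choice; build_author_query is hypothetical and may differ from how the tool actually builds its query.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_author_query(author, max_results=10):
    """Hypothetical helper: arXiv API URL listing papers by one author."""
    # Normalize whitespace so "  Geoffrey   Hinton " becomes "Geoffrey Hinton".
    name = " ".join(author.split())
    if not name:
        raise ValueError("author name must not be empty")
    params = {
        # au: restricts matching to the author field; quotes keep the name together.
        "search_query": f'au:"{name}"',
        "start": 0,
        "max_results": max_results,
        # Newest papers first (illustrative, not necessarily the tool's default).
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"
```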
python main.py categories: Show all available arXiv categories.
python main.py interactive: Launch a guided interface for easy searching.
Common categories you can use with --categories:
| Category | Description |
|---|---|
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CV | Computer Vision and Pattern Recognition |
| cs.CL | Computation and Language (NLP) |
| cs.NE | Neural and Evolutionary Computing |
| stat.ML | Machine Learning (Statistics) |
| cs.CR | Cryptography and Security |
| cs.DB | Databases |
| cs.IR | Information Retrieval |
| cs.SE | Software Engineering |
See all categories: python main.py categories
You can modify settings in config.py:
- Download directory: change DEFAULT_DOWNLOAD_DIR
- Request delay: adjust REQUEST_DELAY (be respectful to arXiv servers!)
- Max results: change DEFAULT_MAX_RESULTS
- File naming: modify the sanitize_filename() function
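A sketch of what the relevant part of config.py might look like. The download directory and result-count defaults come from the option descriptions above; the 3-second REQUEST_DELAY is an assumption based on arXiv's API guidance of at most one request every three seconds, and Throttle is a hypothetical helper showing how such a delay can be enforced between requests.

```python
import time

# Defaults mirroring the option descriptions in this README (the real file may differ).
DEFAULT_DOWNLOAD_DIR = "./downloads"
DEFAULT_MAX_RESULTS = 10
REQUEST_DELAY = 3.0  # seconds; assumed from arXiv's "one request every 3 s" guidance


class Throttle:
    """Hypothetical helper: guarantee at least `delay` seconds between calls to wait()."""

    def __init__(self, delay=REQUEST_DELAY):
        self.delay = delay
        self._last = None  # monotonic time of the previous request

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling throttle.wait() right before every API query or PDF download keeps a batch job within the delay without hard-coding sleeps throughout the code.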
NEW! Each search automatically gets its own organized folder, named after the topic, author, or paper ID:
downloads/
├── machine_learning/
│ ├── Neural_Networks_2301.07041.pdf
│ └── Deep_Learning_Basics_2302.12345.pdf
├── computer_vision/
│ ├── Image_Recognition_2303.67890.pdf
│ └── Object_Detection_2304.11111.pdf
├── author_geoffrey_hinton/
│ └── Hinton_Research_2305.22222.pdf
└── paper_1706.03762/
└── Attention_Is_All_You_Need_1706.03762.pdf
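The names in the tree above follow a simple pattern: a lowercase topic with underscores, an author_ or paper_ prefix for author and ID searches, and Title_ID.pdf for each file. The functions below are a hypothetical reconstruction of that scheme, including one possible sanitize_filename(); the project's actual implementation may differ, for example in which characters it keeps or how it truncates long titles.

```python
import re


def sanitize_filename(text, max_length=100):
    """Hypothetical sanitize_filename(): keep letters, digits, '.', '_' and '-'."""
    # Collapse each run of other characters (spaces, colons, slashes...) into one "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("_")
    return cleaned[:max_length].rstrip("_")


def topic_folder(topic):
    return sanitize_filename(topic.lower())                # machine_learning


def author_folder(author):
    return "author_" + sanitize_filename(author.lower())  # author_geoffrey_hinton


def id_folder(arxiv_id):
    return "paper_" + sanitize_filename(arxiv_id)          # paper_1706.03762


def paper_filename(title, arxiv_id):
    # The ID is sanitized too, because old-style IDs contain a "/".
    return f"{sanitize_filename(title)}_{sanitize_filename(arxiv_id)}.pdf"
```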
- Start with preview mode (-p) to see what you'll get before downloading
- Use specific keywords for better results
- Combine categories to narrow down the search scope
- Be respectful: don't download hundreds of papers at once
- Check recent papers using --recent-days for cutting-edge research
- Use interactive mode if you're new to the tool
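Filtering with --recent-days comes down to comparing each paper's published timestamp with the current time. A minimal sketch using the standard library's datetime rather than python-dateutil (which the project lists for date handling); published_within and filter_recent are hypothetical helpers, and the input is assumed to be an ISO 8601 timestamp as found in arXiv's Atom feed, such as 2023-01-17T18:59:59Z.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2023-01-17T18:59:59Z into an aware datetime."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so map it to +00:00.
    ts = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Treat timestamps without an offset as UTC.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def published_within(published, days, now=None):
    """Hypothetical helper: True if `published` lies within the last `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_timestamp(published) <= timedelta(days=days)


def filter_recent(papers, days, now=None):
    """Keep only papers (dicts with a 'published' key) from the last `days` days."""
    return [p for p in papers if published_within(p["published"], days, now)]
```

Passing now explicitly makes the filter deterministic, which is handy for testing.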
"No papers found"
- Try broader keywords
- Check spelling
- Remove category filters and search again
"Download failed"
- Check internet connection
- Some papers might not have PDFs available
- Try again later (temporary server issues)
"Permission denied"
- Check write permissions in download directory
- Try a different download directory
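For the "try again later" case, a download can be retried with exponential backoff instead of failing on the first temporary error. A sketch assuming fetch is any zero-argument callable; with_retries is hypothetical, not part of this tool. requests' exceptions derive from IOError (an alias of OSError), so the default retry_on catches them, while a PermissionError is re-raised at once because retrying cannot fix it.

```python
import time


def with_retries(fetch, attempts=3, base_delay=2.0, retry_on=(OSError,)):
    """Hypothetical helper: call fetch(), retrying transient errors with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except PermissionError:
            # A permissions problem will not go away by retrying.
            raise
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Sleep base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** attempt)
```

With requests, wrap a function that calls requests.get(url, timeout=30) followed by response.raise_for_status(), since requests does not raise on HTTP error status codes by itself.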
The tool provides detailed logging. Check the console output for specific error messages.
- requests: HTTP requests
- feedparser: Parse arXiv API responses
- beautifulsoup4: HTML parsing
- lxml: XML/HTML processing
- tqdm: Progress bars
- click: Command-line interface
- python-dateutil: Date handling
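The project relies on feedparser to read arXiv's Atom responses. To show which fields are involved, here is a dependency-free sketch that uses the standard library's xml.etree.ElementTree instead of feedparser. parse_entries is a hypothetical helper, and it assumes, as in arXiv's current feed, that the PDF link is the link element whose title attribute is "pdf".

```python
import xml.etree.ElementTree as ET

# Every element in the feed lives in the Atom namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_entries(feed_xml):
    """Hypothetical helper: extract basic metadata from an arXiv Atom feed string."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        # <id> holds the abstract URL, e.g. http://arxiv.org/abs/1706.03762v5
        abs_url = entry.findtext(f"{ATOM}id", "").strip()
        pdf_url = next(
            (link.get("href") for link in entry.findall(f"{ATOM}link")
             if link.get("title") == "pdf"),
            None,
        )
        papers.append({
            "id": abs_url.rsplit("/abs/", 1)[-1],
            # Titles can wrap across lines in the feed, so collapse whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "published": entry.findtext(f"{ATOM}published", "").strip(),
            "authors": [a.findtext(f"{ATOM}name", "").strip()
                        for a in entry.findall(f"{ATOM}author")],
            "pdf_url": pdf_url,
        })
    return papers
```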
Feel free to submit issues and enhancement requests!
This project is licensed under the MIT License - see the LICENSE file for details.
For comprehensive usage examples and advanced features, see: USAGE_EXAMPLES.md
This detailed guide includes:
- All command examples with explanations
- Advanced usage patterns
- Research workflow examples
- Pro tips and best practices
Happy researching!
For more help: python main.py --help or python main.py [command] --help
Name: Sreeram
Email: sreeram.lagisetty@gmail.com
GitHub: Sreeram5678
Instagram: @sreeram_3012
Repository: Research Paper Extractor