Automated research paper summarization from ArXiv with LLM-powered rating, multi-format delivery, and comprehensive configuration management.
The fastest way to get started is with GitHub Actions, which automates the entire pipeline. This method uses repository secrets to dynamically configure the application without needing to commit any configuration files.
Prerequisites:
- A GitHub account
- API keys for your chosen LLM providers
- An SMTP email account for receiving summaries
- Fork the Repository
  Click the "Fork" button at the top-right of this page to create a copy of this repository in your own GitHub account.
- Configure Repository Secrets
  Navigate to your forked repository's Settings > Secrets and variables > Actions. Add the following secrets to configure the pipeline. Only SUMMARIZER_API_KEY, RATER_API_KEY, SMTP_SERVER, SENDER_EMAIL, RECIPIENT_EMAIL, and SMTP_PASSWORD are strictly required.

  | Secret | Required | Description |
  | --- | --- | --- |
  | SUMMARIZER_API_KEY | ✅ | API key for the summarization LLM provider (modelscope by default). |
  | RATER_API_KEY | ✅ | API key for the rating LLM provider (modelscope by default). |
  | SMTP_SERVER | ✅ | Your email provider's SMTP server (e.g., smtp.163.com). |
  | SENDER_EMAIL | ✅ | The email address for sending summaries. |
  | RECIPIENT_EMAIL | ✅ | The email address for receiving summaries. |
  | SMTP_PASSWORD | ✅ | The password or app password for your sender email. |
  | ARXIV_CATEGORIES | ❌ | Comma-separated ArXiv categories (e.g., cs.AI,cs.CV; default: cs.AI,cs.CV,cs.RO). |
  | MAX_PAPERS | ❌ | The maximum number of papers to summarize (default: 5). |
  | SUMMARIZER_PROVIDER | ❌ | The LLM provider for summarization (e.g., openai; default: modelscope). |
  | RATER_PROVIDER | ❌ | The LLM provider for rating (e.g., anthropic; default: modelscope). |

- Enable and Run the Workflow
  - Go to the Actions tab in your repository.
  - If prompted, enable the workflows.
  - Select the ArXiv AutoSumm Daily workflow and click Run workflow.
That's it! The workflow will now run on its schedule, delivering summaries to your inbox. For more advanced setups, including local installation and detailed configuration, see our full documentation.
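To make the required/optional split of the secrets concrete, here is a minimal sketch of how such values could be resolved once they reach the process. It assumes the workflow exposes each secret as an environment variable of the same name; `load_settings`, `REQUIRED`, and `DEFAULTS` are hypothetical names for illustration and are not taken from the project's code. The default values are the ones listed in the secrets description above.

```python
import os

# The six secrets the README marks as strictly required.
REQUIRED = (
    "SUMMARIZER_API_KEY",
    "RATER_API_KEY",
    "SMTP_SERVER",
    "SENDER_EMAIL",
    "RECIPIENT_EMAIL",
    "SMTP_PASSWORD",
)

# Documented defaults for the optional secrets.
DEFAULTS = {
    "ARXIV_CATEGORIES": "cs.AI,cs.CV,cs.RO",
    "MAX_PAPERS": "5",
    "SUMMARIZER_PROVIDER": "modelscope",
    "RATER_PROVIDER": "modelscope",
}


def load_settings(env=None):
    """Hypothetical helper: validate required values and apply defaults."""
    env = os.environ if env is None else env

    # Empty strings are treated as missing, since an unset secret
    # may be passed through as an empty value.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required secrets: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default

    # Normalize the two values that are not plain strings.
    settings["ARXIV_CATEGORIES"] = [
        cat.strip() for cat in settings["ARXIV_CATEGORIES"].split(",") if cat.strip()
    ]
    settings["MAX_PAPERS"] = int(settings["MAX_PAPERS"])
    return settings
```

Calling `load_settings()` with only the six required variables set would yield the default categories, a paper limit of 5, and modelscope for both providers; failing early on a missing required value keeps a misconfigured run from reaching the LLM or SMTP steps.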
Features:
- Automated Paper Processing: Fetches, rates, summarizes, and delivers papers daily.
- Multiple Output Formats: Supports PDF, HTML, Markdown, and AZW3 (Kindle).
- Advanced Caching: Avoids re-processing papers with an SQLite-based cache.
- Flexible Rating: Choose between LLM, embedding, or hybrid rating strategies.
- VLM Parsing: Optional Vision Language Model support for enhanced PDF analysis.
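As an illustration of the caching feature listed above, the following is a minimal sketch of how an SQLite-backed cache can prevent a paper from being processed twice. The table layout, the `ProcessedCache` class, and the `summarize_cached` helper are assumptions made for this example; the project's actual schema and API may differ.

```python
import sqlite3


class ProcessedCache:
    """Hypothetical cache keyed by ArXiv ID (not the project's real schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS processed ("
            "arxiv_id TEXT PRIMARY KEY, summary TEXT NOT NULL)"
        )

    def get(self, arxiv_id):
        # Return the stored summary, or None if the paper is unseen.
        row = self.conn.execute(
            "SELECT summary FROM processed WHERE arxiv_id = ?", (arxiv_id,)
        ).fetchone()
        return row[0] if row else None

    def put(self, arxiv_id, summary):
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO processed (arxiv_id, summary) VALUES (?, ?)",
                (arxiv_id, summary),
            )


def summarize_cached(cache, arxiv_id, summarize):
    """Call the expensive summarize function only on a cache miss."""
    cached = cache.get(arxiv_id)
    if cached is not None:
        return cached
    summary = summarize(arxiv_id)
    cache.put(arxiv_id, summary)
    return summary
```

Passing a file path instead of `":memory:"` would let the cache persist between runs, which is what makes skipping already-summarized papers possible on a daily schedule.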
The pipeline processes papers in the following sequence:
- Fetch: Downloads metadata from ArXiv.
- Rate: Selects the most relevant/interesting papers.
- Parse: Extracts content from the PDFs.
- Summarize: Generates summaries with the configured summarization LLM.
- Render: Creates outputs in your desired formats.
- Deliver: Sends the summaries to you via email.
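The sequence above can be sketched as a chain of stages in which each stage consumes the previous stage's output. This is only an illustration of the ordering: `STAGE_ORDER`, `run_pipeline`, and the stub stages are hypothetical and stand in for the project's real fetch, rate, parse, summarize, render, and deliver logic.

```python
# Stage names in the order given by the README.
STAGE_ORDER = ["fetch", "rate", "parse", "summarize", "render", "deliver"]


def run_pipeline(stages, state):
    """Run every stage in STAGE_ORDER, feeding each result into the next."""
    for name in STAGE_ORDER:
        state = stages[name](state)
    return state


def make_stub(name):
    """Build a placeholder stage that only records that it ran."""
    def stage(state):
        return state + [name]
    return stage


if __name__ == "__main__":
    stubs = {name: make_stub(name) for name in STAGE_ORDER}
    print(run_pipeline(stubs, []))
    # prints ['fetch', 'rate', 'parse', 'summarize', 'render', 'deliver']
```

Keeping the order in one list and the stage implementations in a separate mapping makes it straightforward to swap in a different rater or renderer without touching the sequencing.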
Documentation:
- Installation Guide: Detailed setup instructions for GitHub Actions and local environments.
- Configuration Guide: Comprehensive reference for all configuration options.
- Troubleshooting & Q&A: Solutions for common issues and frequently asked questions.
This project is licensed under the MIT License - see the LICENSE file for details.