Automatically updated every Monday via GitHub Actions
Live demo: https://hqrrr.github.io/weekly-paper-report/
Weekly Paper Report is an automated weekly academic paper monitoring and visualization tool. It automatically retrieves the latest papers from Crossref based on keywords and authors of interest (supports ORCID), generating a static HTML report that can be directly published to GitHub Pages.
This tool is ideal for:
- Tracking the latest literature in specific research areas
- Monitoring recent publications by particular scholars
- Quickly grasping the distribution and trends of research topics within a week
Weekly Paper Report

- Automated weekly literature monitoring:
  Automatically queries recent publications (rolling time window, default: 7 days) based on user-defined keywords and followed authors.
- Crossref-based search with ORCID support:
  Uses the Crossref API for open bibliographic data retrieval and supports author tracking via ORCID identifiers.
- Keyword-driven relevance ranking:
  Results are ranked by Crossref relevance score, helping surface the most relevant papers first.
- Topic clustering of search results:
  Applies TF-IDF-based clustering on paper titles to group results into thematic clusters and highlight key research directions.
- Self-contained static HTML report:
  Outputs a fully self-contained report (`index.html` + assets) that can be hosted on GitHub Pages or shared offline.
- Paper title translation:
  Paper titles shown in the report are now automatically translated using the DeepL API. Translation is disabled automatically if no API key is provided. A free DeepL API key is available from the official DeepL website, with a monthly quota of 500,000 characters, which is typically sufficient for a weekly paper report.
An example weekly report is available at:
Weekly Paper Report: https://hqrrr.github.io/weekly-paper-report/
Last report update: 2026-02-16 04:26
- Fork this repository to your own GitHub account.
- Open your fork -> Actions tab -> enable workflows if GitHub asks you to (scheduled workflows are often disabled by default on forks).
In your fork:
- Go to Settings -> Pages
- Set Source to GitHub Actions
- After the workflow finishes, your report will be published to your GitHub Pages site.
- (Optional) Trigger a manual run once: Actions -> Build and Deploy Report to GitHub Pages -> Run workflow, to verify everything works.
Where to find your report (GitHub Pages URL)
After the workflow has finished successfully, your report will be available at:
https://<your-github-username>.github.io/<repository-name>/

You can also find the exact URL in:
- Settings -> Pages (shown under "Your site is live at …"), or
- the Deployments page (shown under "github-pages").
Edit these files in your fork (see examples below):
- `./config/keywords.yaml` - keywords used for Crossref search
- `./config/followed_authors.yaml` - followed authors (recommended: include ORCID)
Commit and push the changes to your default branch to regenerate the report.
- Example: `keywords.yaml`

  ```yaml
  # List of keywords used for literature search
  keywords:
    - indoor environmental quality
    - IEQ
    - thermal comfort
    - indoor air quality
    - user behavior
  ```

  Each keyword is queried against Crossref. Use full terms and common abbreviations where appropriate.
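The following is a minimal sketch (not the project's actual code) of how a single keyword from `keywords.yaml` could be queried against the Crossref REST API with a rolling 7-day window, relevance sorting, and a polite `mailto` parameter; the function name and default values are illustrative assumptions.

```python
# Hypothetical sketch: query Crossref for one keyword over the last 7 days,
# sorted by the Crossref relevance score. Not the project's implementation.
import os
from datetime import date, timedelta

import requests

def search_crossref(keyword: str, days: int = 7, rows: int = 20) -> list[dict]:
    since = (date.today() - timedelta(days=days)).isoformat()
    params = {
        "query.bibliographic": keyword,          # keyword search
        "filter": f"from-pub-date:{since}",      # rolling time window
        "sort": "relevance",                     # Crossref relevance score
        "rows": rows,
    }
    mailto = os.environ.get("WPR_MAILTO")
    if mailto:
        params["mailto"] = mailto                # polite API usage
    resp = requests.get("https://api.crossref.org/works", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]["items"]
```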
- Example: `followed_authors.yaml`

  ```yaml
  # List of followed authors
  authors:
    - name: Andrew Persily
      orcid: "0000-0002-9542-3318"
      names:
        - Andrew Persily
        - A. Persily
  ```

- `name` is used for display and matching in keyword search results.
- `orcid` enables precise author-based lookup via Crossref and is strongly recommended.
- If `orcid` is not provided, the author will only be matched against keyword search results using the names listed in `names`. In this case, author-based lookup via Crossref is not performed and results may be incomplete.
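As an illustration of the ORCID-based lookup, here is a minimal sketch (not the project's code) that asks Crossref for works associated with one ORCID via the `orcid` filter; the function name and the date window are assumptions.

```python
# Hypothetical sketch: author-based lookup via Crossref's "orcid" filter,
# restricted to works published on or after a given date.
import requests

def works_by_orcid(orcid: str, since: str, rows: int = 50) -> list[dict]:
    params = {
        # multiple Crossref filters are comma-separated
        "filter": f"orcid:{orcid},from-pub-date:{since}",
        "rows": rows,
    }
    resp = requests.get("https://api.crossref.org/works", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]["items"]

# e.g. works_by_orcid("0000-0002-9542-3318", since="2026-02-09")
```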
Crossref recommends providing a contact email (mailto) for polite API usage.
In your fork:
- Go to Settings -> Secrets and variables -> Actions
- Create a new Repository secret:
  - Name: `WPR_MAILTO`
  - Secret: your email address
You may also obtain a free DeepL API key from DeepL and configure it to translate paper titles.
In your fork:
- Go to Settings -> Secrets and variables -> Actions
- Create a new Repository secret:
  - Name: `TRANSLATION_DEEPL_API_KEY`
  - Secret: your DeepL API key
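Below is a minimal sketch (not the project's actual implementation) of how the translation step could call the DeepL free-tier API and disable itself when `TRANSLATION_DEEPL_API_KEY` is not set; the function name and target language are placeholders.

```python
# Hypothetical sketch: translate a paper title via the DeepL free-tier API,
# skipping translation entirely if no API key is configured.
import os

import requests

DEEPL_URL = "https://api-free.deepl.com/v2/translate"

def translate_title(title: str, target_lang: str = "ZH") -> str | None:
    api_key = os.environ.get("TRANSLATION_DEEPL_API_KEY")
    if not api_key:
        return None  # translation disabled: no key provided
    resp = requests.post(
        DEEPL_URL,
        headers={"Authorization": f"DeepL-Auth-Key {api_key}"},
        data={"text": title, "target_lang": target_lang},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["translations"][0]["text"]
```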
Weekly schedule notes (important)
- The workflow is configured to run on a weekly schedule (via `on: schedule`).
- GitHub may automatically disable scheduled runs for repositories with no activity for ~60 days.
  This repository performs a small automated update on each run to keep the scheduled workflow active.
If your weekly updates stop, simply:
- make a small commit (e.g., edit README), and/or
- re-enable the workflow in the Actions tab.
About the update schedule
The update frequency of this report is defined in the GitHub Actions workflow file (`.github/workflows/update_github_pages.yml`). The workflow is currently scheduled to run once per week at a fixed time (UTC), as defined in the `on: schedule` cron configuration:

```yaml
# every Monday at 02:00 UTC
# German time: 03:00 in winter / 04:00 in summer
# China time: 10:00
- cron: "0 2 * * 1"
```

If you would like to change the update frequency or execution time, you can edit the `on: schedule` (cron) section accordingly.
This Weekly Paper Report tool groups papers into rough "topics" based on their titles (not abstracts), using the following steps (a minimal code sketch of the pipeline follows the list):
- Text preprocessing + stop words: Weekly Paper Report vectorizes paper titles with TF-IDF (`ngram_range = (1, 2)`, i.e. unigrams + bigrams) and removes:
  - scikit-learn's built-in English stop words
  - additional domain stop words (e.g., "study", "method", "analysis", "model", "energy", "system", ...); see `stop_words.py` (you may customize your own stop words).
- Two clustering methods are tried:
  - K-Means: for each k ∈ {3, 4, 5, 6, 7}, compute the cosine silhouette score. Pick the smallest k whose silhouette is within a small margin of the best score (preference for simpler clustering).
  - HDBSCAN: first reduce TF-IDF with TruncatedSVD and L2-normalize the vectors. The number of components is automatically selected based on explained variance (default: retain 50% of variance, `svd_var_target = 0.5`). Then run HDBSCAN with `min_cluster_size = 5`.
- For each candidate clustering, compute:
- cosine silhouette (computed on non-noise points for HDBSCAN)
- number of clusters (excluding noise)
- noise ratio (fraction of points labeled `-1`)
- min cluster size, max cluster share (to detect overly imbalanced solutions)
- Choosing the "best" clustering result. Select the final clustering with a pragmatic rule:
  - Reject overly noisy results: if HDBSCAN labels too many points as noise (default `noise_ratio > 0.40`), it won't be selected.
  - Compare cosine silhouette:
    - HDBSCAN must outperform K-Means by a small margin (default `+0.03`) to win.
    - Otherwise, default to K-Means (more stable and assigns all papers).
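The sketch below summarizes this pipeline in code. It is illustrative only, not the project's implementation: it assumes scikit-learn >= 1.3 for `sklearn.cluster.HDBSCAN` (the project may use the standalone `hdbscan` package instead), uses an assumed 0.01 margin for the "smallest k near the best silhouette" rule, and omits the domain stop words from `stop_words.py`.

```python
# Hypothetical sketch of the title-clustering pipeline (not the project's code).
import numpy as np
from sklearn.cluster import HDBSCAN, KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import normalize

def cluster_titles(titles, svd_var_target=0.5, min_cluster_size=5,
                   max_noise_ratio=0.40, hdbscan_margin=0.03):
    # 1) TF-IDF on titles: built-in English stop words, unigrams + bigrams.
    #    (The project additionally removes domain stop words from stop_words.py.)
    vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vec.fit_transform(titles)

    # 2a) K-Means: try k in {3..7}, keep the smallest k whose cosine silhouette
    #     is within a small (assumed) margin of the best score.
    km = {}
    for k in (3, 4, 5, 6, 7):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        km[k] = (silhouette_score(X, labels, metric="cosine"), labels)
    best = max(s for s, _ in km.values())
    k_chosen = min(k for k, (s, _) in km.items() if s >= best - 0.01)
    km_sil, km_labels = km[k_chosen]

    # 2b) HDBSCAN: TruncatedSVD keeping ~50% explained variance, L2-normalize,
    #     then density-based clustering with min_cluster_size = 5.
    n_max = max(2, min(100, min(X.shape) - 1))
    svd = TruncatedSVD(n_components=n_max, random_state=0).fit(X)
    cum = np.cumsum(svd.explained_variance_ratio_)
    n_comp = max(2, int(np.searchsorted(cum, svd_var_target)) + 1)
    X_red = normalize(svd.transform(X)[:, :n_comp])
    hdb_labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(X_red)

    # 3) Diagnostics: noise ratio and silhouette on non-noise points only.
    noise_ratio = float(np.mean(hdb_labels == -1))
    mask = hdb_labels != -1
    hdb_sil = (silhouette_score(X_red[mask], hdb_labels[mask], metric="cosine")
               if len(set(hdb_labels[mask])) > 1 else -1.0)

    # 4) Pragmatic selection rule: HDBSCAN must not be too noisy and must beat
    #    K-Means by a small margin; otherwise fall back to K-Means.
    if noise_ratio <= max_noise_ratio and hdb_sil > km_sil + hdbscan_margin:
        return hdb_labels
    return km_labels
```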
This project is intended as a lightweight literature monitoring and exploration tool.
Please consider the following points when using or extending it:
- Avoid excessive request frequency:
  Do not configure very short update intervals (e.g., hourly or daily runs). Excessive automated requests may violate the fair-use expectations of public APIs such as Crossref. A weekly schedule is strongly recommended.
- Results may be incomplete:
  The report relies on open bibliographic metadata and keyword-based queries. Not all relevant publications may be indexed, linked, or returned by the data sources. Absence of a paper in the report does not imply absence in the literature.
- Author matching is imperfect:
  Author-based tracking is most reliable when ORCID identifiers are available. Name-based matching may produce false positives or miss relevant works.
- Clustering and ranking are heuristic:
  Topic clustering and relevance ranking are automatically derived from titles and metadata using statistical methods. These results are approximate and should be interpreted as exploratory aids rather than authoritative classifications.
- Not a substitute for systematic review:
  This tool is designed to support awareness and discovery, not to replace systematic literature reviews or expert judgment.
Users are encouraged to treat the generated reports as decision-support material and to verify important findings through primary sources.
The visual appearance of the report is controlled by a CSS theme.
By default, the report uses the light theme.
You can select a theme when generating the report in app.py:
```python
# app.py
## Report theme
THEME = "light"
```

Available themes are loaded from the `themes/` directory.
| Light | Dark | Paper Light | Soft Blue |
|---|---|---|---|
| (screenshot) | (screenshot) | (screenshot) | (screenshot) |
To add a custom theme:
- Copy an existing theme file, for example: `themes/light.css` -> `themes/dark.css`
- Modify the CSS styles in the new file
- Set the theme name accordingly in `app.py`: `THEME = "dark"`
If the specified theme name cannot be found, the report will automatically fall back to the default `light` theme.
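For illustration, here is a minimal sketch (not the actual code in `app.py`) of how such a fallback could work; the function name is an assumption.

```python
# Hypothetical sketch: resolve a theme CSS file, falling back to the default.
from pathlib import Path

def resolve_theme_css(theme: str, themes_dir: str = "themes") -> Path:
    css = Path(themes_dir) / f"{theme}.css"
    if not css.exists():
        css = Path(themes_dir) / "light.css"  # fall back to the default light theme
    return css
```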
You can also generate the report locally without GitHub Actions:
- Clone the repository:

  ```bash
  git clone https://github.com/hqrrr/weekly-paper-report.git
  cd weekly-paper-report
  ```

- Create and configure a `.env` file (for local runs):

  ```bash
  # .env
  WPR_MAILTO=youremail@example.com
  TRANSLATION_DEEPL_API_KEY=your.deepl.api.key
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the weekly paper report app:

  ```bash
  python app.py
  ```

- Open the generated report in a browser: `./report/index.html`
The local workflow is equivalent to the GitHub Actions workflow, except that secrets are provided via `.env`.
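As a sketch of that difference (assuming the `python-dotenv` package, which is not confirmed as a project dependency), local runs could populate the same environment variables that GitHub Actions injects as repository secrets:

```python
# Hypothetical sketch: load secrets from .env locally.
import os

from dotenv import load_dotenv  # assumed dependency (python-dotenv)

load_dotenv()  # reads .env in the working directory, if present
mailto = os.environ.get("WPR_MAILTO")                      # Crossref polite contact email
deepl_key = os.environ.get("TRANSLATION_DEEPL_API_KEY")    # enables title translation
```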
This project is licensed under the MIT License. See the LICENSE file for details.
The generated reports and retrieved bibliographic metadata are subject to the terms of their respective data providers.




