
Export your full publication list using Python and Google Scholar


YouvenZ/Auto-Publication-List


📚 Publication Pipeline (Google Scholar export + metrics)

Simple pipeline to fetch Google Scholar metrics and export a bibliography file. The script downloads a user's Google Scholar export (BibTeX-like text), writes it to disk, and writes a LaTeX metrics snippet (metrics.tex) using Scholar metrics.

This repository contains a lightweight tool implemented in pipeline_list_publication.py.


🔎 What it does

  • Fetches Google Scholar author data using scholarly (author id).
  • Downloads a citations export from Google Scholar (via an export URL) and saves it (default: own-bib.bib).
  • Generates/updates a LaTeX metrics file (metrics.tex) containing citations, h-index, i10-index, and placeholders for publication counts.
  • Writes logs to publication_pipeline.log.
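As a rough illustration of the metrics step, the sketch below renders a dictionary of metrics into LaTeX macro definitions. The function name render_metrics_tex, the macro names, and the dictionary keys are assumptions made for this example; the actual layout of metrics.tex is defined in pipeline_list_publication.py and may differ.

```python
from datetime import date


def render_metrics_tex(metrics, today=None):
    """Return LaTeX macro definitions for the given metrics (hypothetical layout)."""
    today = today or date.today()
    lines = [
        f"% Auto-generated on {today.isoformat()}",
        # Macro names below are illustrative, not the ones used by the script.
        f"\\newcommand{{\\scholarCitations}}{{{metrics['citedby']}}}",
        f"\\newcommand{{\\scholarHIndex}}{{{metrics['hindex']}}}",
        f"\\newcommand{{\\scholarITenIndex}}{{{metrics['i10index']}}}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; real numbers would come from the Scholar lookup.
    print(render_metrics_tex({"citedby": 120, "hindex": 6, "i10index": 4}))
```

A file written this way could be pulled into a CV with \input{metrics.tex}, after which the macros are available in the document body.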

Requirements

  • Python 3.8+
  • Packages:
    • requests
    • scholarly
    • (built-in) re, os, datetime, json, logging

Install dependencies:

pip install requests scholarly

Note: scholarly may require additional setup depending on your environment/version.


Files produced / important paths

  • own-bib.bib — downloaded bibliography (overwritten each run)
  • metrics.tex — LaTeX snippet with updated metrics
  • publication_pipeline.log — pipeline logging output

Configuration

To customize, create a config.json in the project root and update scholar_id and citsig (and output file names) accordingly. Example config.json:

{
  "scholar_id": "YOUR_SCHOLAR_ID",
  "citsig": "YOUR_CITSIG_IF_REQUIRED",
  "output_bib": "own-bib.bib",
  "metrics_file": "metrics.tex"
}
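A minimal sketch of how a file like this could be read and merged with defaults. The function name load_config and the DEFAULTS dictionary are hypothetical and are not taken from pipeline_list_publication.py; only the four keys come from the example above.

```python
import json
from pathlib import Path

# Default values mirror the example config; names here are illustrative.
DEFAULTS = {
    "scholar_id": "",
    "citsig": "",
    "output_bib": "own-bib.bib",
    "metrics_file": "metrics.tex",
}


def load_config(path="config.json"):
    """Merge values from a JSON file over DEFAULTS and validate scholar_id."""
    config = dict(DEFAULTS)
    config_path = Path(path)
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as handle:
            config.update(json.load(handle))
    if not config["scholar_id"]:
        raise ValueError("scholar_id is required")
    return config
```

Keys missing from the file fall back to the defaults, so a config.json containing only scholar_id would be enough under these assumptions.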

You need to export all your citations from Google Scholar in BibTeX format and copy the following IDs from the URL:

  • scholar_id: Google Scholar user id (the user= value from profile URL).
  • citsig: optional export signature sometimes required for the export URL (the citsig= value from profile URL).

For more detail, see the video here.
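If you have a copied URL at hand, the two values can be pulled out of its query string rather than by eye. This is a small sketch using only the standard library; the function name extract_scholar_params and the sample URL are illustrative, and the ID values in it are placeholders.

```python
from urllib.parse import parse_qs, urlparse


def extract_scholar_params(url):
    """Return the user= and citsig= query values from a Scholar URL."""
    query = parse_qs(urlparse(url).query)
    return {
        # parse_qs returns a list per key; missing keys fall back to "".
        "scholar_id": query.get("user", [""])[0],
        "citsig": query.get("citsig", [""])[0],
    }


if __name__ == "__main__":
    sample = "https://scholar.google.com/citations?user=ABC123&hl=en&citsig=XYZ"
    print(extract_scholar_params(sample))
```

The resulting dictionary uses the same key names as config.json, so its values can be pasted straight into that file.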


Usage

Run the pipeline from the project folder (Windows):

python pipeline_list_publication.py

To use a custom config file, modify the script invocation or edit the initializer in the code.

Console/log output indicates success/failure and produced files. On success the script prints the metrics and file names.


Notes & Troubleshooting

  • Google Scholar may block or require CAPTCHA for automated requests. If exports fail, check publication_pipeline.log for details.
  • The scholarly library scrapes Google Scholar and may break if Scholar changes its HTML. Keep scholarly up-to-date.
  • If requests.get for the export returns HTML (captcha) instead of the citations export, you may need a valid citsig, session cookies, or manual export.
  • The script performs minimal cleaning of the downloaded content (removes HTML tags and compresses blank lines). Validate the resulting own-bib.bib if used by BibTeX tools.
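The last two points can be sketched as a pair of helpers: one that guesses whether a response body is an HTML page (for example a CAPTCHA) rather than an export, and one that applies the kind of minimal cleaning described above. Both function names and the exact regular expressions are assumptions for illustration and are not copied from the script.

```python
import re


def looks_like_html(text):
    """Heuristic: does the response start like an HTML page rather than BibTeX?"""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def clean_export(text):
    """Strip HTML tags and collapse runs of blank lines."""
    # Caveat: this also removes any <...> sequence inside a title or abstract.
    text = re.sub(r"<[^>]+>", "", text)
    # Collapse two or more consecutive line breaks (with optional whitespace) into one blank line.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip() + "\n"


if __name__ == "__main__":
    raw = "<pre>@article{a,\n title={X}\n}\n\n\n\n@book{b}\n</pre>"
    if looks_like_html(raw):
        print("Response looks like HTML; the export was probably blocked.")
    else:
        print(clean_export(raw))
```

Checking for HTML before writing the file avoids silently overwriting a good own-bib.bib with a CAPTCHA page.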

Extending

  • Add parsing of the downloaded bibliography to compute publication-type counts (journals, conferences, preprints, patents) and fill the LaTeX placeholders.
  • Replace the default config loader to actually read a config.json file from disk.
  • Add retry/backoff and proxy support for robust fetching.
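For the first item, a possible starting point is to count BibTeX entry types in the downloaded file. The sketch below is an assumption about how this could be done, not existing behavior of the script; the name count_entry_types is hypothetical. Entry types alone will likely not separate preprints or patents reliably, so those categories would need additional heuristics (for example, inspecting the journal or note fields).

```python
import re
from collections import Counter

# Matches the "@type{" that opens a BibTeX entry at the start of a line.
ENTRY_RE = re.compile(r"^\s*@(\w+)\s*\{", re.MULTILINE)


def count_entry_types(bib_text):
    """Return a Counter of lower-cased BibTeX entry types found in bib_text."""
    return Counter(entry_type.lower() for entry_type in ENTRY_RE.findall(bib_text))


if __name__ == "__main__":
    sample = "@article{a,\n}\n@inproceedings{b,\n}\n@Article{c,\n}\n"
    counts = count_entry_types(sample)
    print(counts["article"], counts["inproceedings"])
```

The counts could then be written into the publication-count placeholders in metrics.tex, for instance treating article as journals and inproceedings as conferences.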

License

MIT
