Simple pipeline to fetch Google Scholar metrics and export a bibliography file. The script downloads a user's Google Scholar export (BibTeX-like text), writes it to disk, and writes a LaTeX metrics snippet (metrics.tex) using Scholar metrics.
This repository contains a lightweight tool implemented in pipeline_list_publication.py.
- Fetches Google Scholar author data using scholarly (author id).
- Downloads a citations export from Google Scholar (via an export URL) and saves it (default: own-bib.bib).
- Generates/updates a LaTeX metrics file (metrics.tex) containing citations, h-index, i10-index, and placeholders for publication counts.
- Writes logs to publication_pipeline.log.
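The metrics step can be pictured roughly as follows. This is a minimal sketch, not the code in pipeline_list_publication.py: the function names `fetch_metrics` and `render_metrics_tex` and the LaTeX macro names are assumptions made for illustration. The fetch helper relies on `scholarly.search_author_id` and `scholarly.fill`, needs network access, and may be blocked by Google Scholar.

```python
def fetch_metrics(scholar_id):
    """Fetch citation metrics for one author (requires network access)."""
    # Imported lazily so the formatting helper below works without scholarly.
    from scholarly import scholarly

    author = scholarly.search_author_id(scholar_id)
    author = scholarly.fill(author, sections=["basics", "indices"])
    return {
        "citations": author.get("citedby", 0),
        "h_index": author.get("hindex", 0),
        "i10_index": author.get("i10index", 0),
    }


def render_metrics_tex(metrics):
    """Render metrics as LaTeX macros (macro names are hypothetical)."""
    macros = [
        ("scholarcitations", metrics["citations"]),
        ("scholarhindex", metrics["h_index"]),
        ("scholaritenindex", metrics["i10_index"]),
    ]
    lines = ["% Auto-generated Google Scholar metrics"]
    for name, value in macros:
        lines.append("\\newcommand{\\%s}{%d}" % (name, value))
    return "\n".join(lines) + "\n"
```

Keeping the formatting separate from the network call means the LaTeX output can be checked offline with hand-written numbers.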
- Python 3.8+
- Packages:
- requests
- scholarly
- (built-in) re, os, datetime, json, logging
Install dependencies:
pip install requests scholarly
Note: scholarly may require additional setup depending on your environment/version.
- own-bib.bib — downloaded bibliography (overwritten each run)
- metrics.tex — LaTeX snippet with updated metrics
- publication_pipeline.log — pipeline logging output
To customize, create a config.json in the project root and update scholar_id and citsig (and output file names) accordingly. Example config.json:
{
"scholar_id": "YOUR_SCHOLAR_ID",
"citsig": "YOUR_CITSIG_IF_REQUIRED",
"output_bib": "own-bib.bib",
"metrics_file": "metrics.tex"
}
You need to export all your citations in BibTeX and copy the ids from the URL:
- scholar_id: Google Scholar user id (the user= value from the profile URL).
- citsig: optional export signature sometimes required for the export URL (the citsig= value from the profile URL).
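Rather than copying the two values by hand, they can be read from the query string of a URL copied from the browser. The helper below is a hedged sketch using only the standard library; the function name `extract_scholar_ids` and the example URL values are placeholders, not part of the script.

```python
from urllib.parse import urlparse, parse_qs


def extract_scholar_ids(url):
    """Return the user= and citsig= query values from a Scholar URL."""
    query = parse_qs(urlparse(url).query)
    return {
        # parse_qs maps each key to a list; take the first value if present.
        "scholar_id": query.get("user", [None])[0],
        "citsig": query.get("citsig", [None])[0],
    }


# Placeholder URL for illustration only.
example = "https://scholar.google.com/citations?user=EXAMPLE_ID&citsig=EXAMPLE_SIG&hl=en"
print(extract_scholar_ids(example))
# {'scholar_id': 'EXAMPLE_ID', 'citsig': 'EXAMPLE_SIG'}
```

If a parameter is missing from the URL, the corresponding entry is `None`, which fits the optional status of citsig.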
For more detail look at the video here
Run the pipeline from the project folder (Windows):
python pipeline_list_publication.py
To use a custom config file, modify the script invocation or edit the initializer in code.
Console/log output indicates success/failure and produced files. On success the script prints the metrics and file names.
- Google Scholar may block or require CAPTCHA for automated requests. If exports fail, check publication_pipeline.log for details.
- The scholarly library scrapes Google Scholar and may break if Scholar changes its HTML. Keep scholarly up-to-date.
- If requests.get for the export returns HTML (captcha) instead of the citations export, you may need a valid citsig, session cookies, or manual export.
- The script performs minimal cleaning of the downloaded content (removes HTML tags and compresses blank lines). Validate the resulting own-bib.bib if used by BibTeX tools.
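The cleaning and the HTML check described in these notes could look something like the sketch below. It is an approximation under stated assumptions, not the script's actual code: the names `looks_like_html` and `clean_export` are hypothetical, and the tag regex is deliberately naive, so it would also strip any BibTeX field text that happens to sit between `<` and `>`.

```python
import re


def looks_like_html(text):
    """Heuristic: does the response start like an HTML page (e.g. a CAPTCHA)?"""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def clean_export(text):
    """Remove HTML tags and collapse runs of blank lines."""
    text = re.sub(r"<[^>]+>", "", text)        # drop anything shaped like a tag
    text = re.sub(r"\n\s*\n+", "\n\n", text)   # keep at most one blank line
    return text.strip() + "\n"
```

A caller would typically run `looks_like_html` on the response body first and skip writing own-bib.bib when it returns true, since cleaning a CAPTCHA page does not yield a usable bibliography.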
- Add parsing of the downloaded bibliography to compute publication-type counts (journals, conferences, preprints, patents) and fill the LaTeX placeholders.
- Replace the default config loader to actually read a config.json file from disk.
- Add retry/backoff and proxy support for robust fetching.
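For the config loader item, one possible shape is sketched below. It assumes the four keys from the example config.json and falls back to defaults when the file is absent; `DEFAULT_CONFIG` and `load_config` are illustrative names, not identifiers taken from the script.

```python
import json
import os

# Defaults mirror the example config.json; the values are placeholders.
DEFAULT_CONFIG = {
    "scholar_id": "YOUR_SCHOLAR_ID",
    "citsig": "",
    "output_bib": "own-bib.bib",
    "metrics_file": "metrics.tex",
}


def load_config(path="config.json"):
    """Return the defaults, overridden by any keys found in the JSON file."""
    config = dict(DEFAULT_CONFIG)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config
```

Merging over a copy of the defaults lets a config.json specify only the keys that differ, such as scholar_id, while the output file names keep their usual values.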
MIT