📚 Publication Pipeline (Google Scholar export + metrics)

A simple pipeline that fetches Google Scholar metrics and exports a bibliography file. The script downloads a user's Google Scholar export (BibTeX-like text), writes it to disk, and writes a LaTeX snippet (metrics.tex) containing the Scholar metrics.

This repository contains a lightweight tool implemented in pipeline_list_publication.py.

🔎 What it does

Fetches Google Scholar author data with the scholarly library, looked up by author id.
Downloads a citations export from Google Scholar (via an export URL) and saves it (default: own-bib.bib).
Generates/updates a LaTeX metrics file (metrics.tex) containing citations, h-index, i10-index, and placeholders for publication counts.
Writes logs to publication_pipeline.log.
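As an illustration of the metrics step, here is a minimal sketch of how a metrics.tex snippet could be rendered from a dictionary of metrics. The macro names (scholarcitations, scholarhindex, scholaritenindex) and the function name are hypothetical and chosen only for this example; the actual script may format metrics.tex differently. The dictionary keys follow the field names scholarly commonly uses for author data (citedby, hindex, i10index).

```python
def render_metrics_tex(metrics):
    """Return LaTeX \\newcommand definitions for the given metrics dict."""
    # Hypothetical macro names; the real metrics.tex may use other names.
    macros = {
        "scholarcitations": metrics.get("citedby", 0),
        "scholarhindex": metrics.get("hindex", 0),
        "scholaritenindex": metrics.get("i10index", 0),
    }
    # One \newcommand per metric, e.g. \newcommand{\scholarhindex}{12}
    lines = [
        f"\\newcommand{{\\{name}}}{{{value}}}"
        for name, value in macros.items()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; real values would come from Google Scholar.
    print(render_metrics_tex({"citedby": 340, "hindex": 9, "i10index": 8}))
```

A LaTeX document could then \input the generated file and use the macros wherever the numbers are needed.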

Requirements

Python 3.8+
Packages:
- requests
- scholarly
- (built-in) re, os, datetime, json, logging

Install dependencies:

pip install requests scholarly

Note: scholarly may require additional setup depending on your environment/version.

Files produced / important paths

own-bib.bib — downloaded bibliography (overwritten each run)
metrics.tex — LaTeX snippet with updated metrics
publication_pipeline.log — pipeline logging output

Configuration

To customize, create a config.json in the project root and update scholar_id and citsig (and output file names) accordingly. Example config.json:

{
  "scholar_id": "YOUR_SCHOLAR_ID",
  "citsig": "YOUR_CITSIG_IF_REQUIRED",
  "output_bib": "own-bib.bib",
  "metrics_file": "metrics.tex"
}
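One possible way to read this file from Python is sketched below. The function name load_config and the DEFAULTS dictionary are assumptions made for this example, not names taken from pipeline_list_publication.py. Keys missing from the file fall back to the defaults, and a missing file yields the defaults unchanged.

```python
import json
from pathlib import Path

# Defaults mirror the example config.json; citsig is optional, so it is empty.
DEFAULTS = {
    "scholar_id": "YOUR_SCHOLAR_ID",
    "citsig": "",
    "output_bib": "own-bib.bib",
    "metrics_file": "metrics.tex",
}


def load_config(path="config.json"):
    """Return DEFAULTS overlaid with any keys found in the JSON file at path."""
    config = dict(DEFAULTS)
    config_path = Path(path)
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


if __name__ == "__main__":
    print(load_config())
```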

You need to export all your citations in BibTeX format from Google Scholar and copy the following ids from the URL:

scholar_id: Google Scholar user id (the user= value from profile URL).
citsig: optional export signature sometimes required for the export URL (the citsig= value from profile URL).
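If you prefer not to copy the values by hand, the two ids can be pulled out of a pasted URL with the standard library. This is a small sketch; the function name and the URL in the example are placeholders, and it assumes the values appear as ordinary user= and citsig= query parameters as described above.

```python
from urllib.parse import parse_qs, urlparse


def extract_scholar_ids(url):
    """Return the user= and citsig= query values from a Google Scholar URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first, or "".
    return {
        "scholar_id": query.get("user", [""])[0],
        "citsig": query.get("citsig", [""])[0],
    }


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    example = "https://scholar.google.com/citations?user=EXAMPLE_ID&citsig=EXAMPLE_SIG"
    print(extract_scholar_ids(example))
```

The returned dictionary uses the same key names as config.json, so it can be merged into the configuration directly.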

For more details, see the video here

Usage

Run the pipeline from the project folder (Windows):

python pipeline_list_publication.py

To use a custom config file, modify the script invocation or edit the initializer in the code.

Console/log output indicates success/failure and produced files. On success the script prints the metrics and file names.

Notes & Troubleshooting

Google Scholar may block or require CAPTCHA for automated requests. If exports fail, check publication_pipeline.log for details.
The scholarly library scrapes Google Scholar and may break if Scholar changes its HTML. Keep scholarly up-to-date.
If requests.get for the export returns HTML (captcha) instead of the citations export, you may need a valid citsig, session cookies, or manual export.
The script performs minimal cleaning of the downloaded content (removes HTML tags and compresses blank lines). Validate the resulting own-bib.bib if used by BibTeX tools.
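The checks described in these notes can be sketched as follows. This is not the script's own code: the function names are hypothetical, the HTML check is a simple heuristic, and the tag-stripping regex is deliberately naive (it would also remove any literal <...> text inside a BibTeX field), which is one reason to validate the resulting file.

```python
import re


def looks_like_html(text):
    """Heuristic: True if the response starts like an HTML page (e.g. a CAPTCHA)."""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def clean_export(text):
    """Strip HTML tags and compress runs of blank lines into a single one."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    collapsed = re.sub(r"\n\s*\n+", "\n\n", without_tags)
    return collapsed.strip() + "\n"


if __name__ == "__main__":
    raw = "<pre>@article{a,\n  title={T}\n}\n\n\n\n@book{b}\n</pre>"
    if looks_like_html(raw):
        print("Got an HTML page instead of the export; see the notes above.")
    else:
        print(clean_export(raw))
```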

Extending

Add parsing of the downloaded bibliography to compute publication-type counts (journals, conferences, preprints, patents) and fill the LaTeX placeholders.
Replace the default config loader to actually read a config.json file from disk.
Add retry/backoff and proxy support for robust fetching.
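As a starting point for the first item, the sketch below counts entries per publication category with a regular expression rather than a full BibTeX parser. The mapping from BibTeX entry types to categories is an assumption made for this example (Google Scholar exports may label entries differently), and the names CATEGORY_BY_TYPE and count_categories are hypothetical.

```python
import re
from collections import Counter

# Matches the entry type at the start of an entry, e.g. "@article{".
ENTRY_TYPE = re.compile(r"^\s*@(\w+)\s*\{", re.MULTILINE)

# Assumed mapping; adjust it to match the entry types in your own export.
CATEGORY_BY_TYPE = {
    "article": "journals",
    "inproceedings": "conferences",
    "misc": "preprints",
    "patent": "patents",
}


def count_categories(bibtex_text):
    """Count entries per category; unmapped entry types are counted as 'other'."""
    counts = Counter()
    for entry_type in ENTRY_TYPE.findall(bibtex_text):
        counts[CATEGORY_BY_TYPE.get(entry_type.lower(), "other")] += 1
    return counts


if __name__ == "__main__":
    sample = "@article{a,\n}\n@inproceedings{b,\n}\n@book{c,\n}\n"
    print(dict(count_categories(sample)))
```

The resulting counts could then be written into the publication-count placeholders in metrics.tex.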

License

MIT

YouvenZ/Auto-Publication-List
