Tools for managing BibTeX bibliographies: automatically update preprints to published versions, validate references against external databases, and filter to only cited references.
pip install bibtex-updater
# With Google Scholar support
pip install bibtex-updater[scholarly]
# With Zotero support
pip install bibtex-updater[zotero]
# All optional dependencies
pip install bibtex-updater[all]

To install from source for development:
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
uv sync --extra dev --extra all

Run directly without cloning using uv:
# Run any command directly
uv run --with "bibtex-updater[all]" bibtex-update references.bib -o updated.bib
# Or use the provided wrapper script
./scripts/bibtex-x update references.bib -o updated.bib
./scripts/bibtex-x check references.bib
./scripts/bibtex-x filter paper.tex -b references.bib -o filtered.bib

| Command | Description |
|---|---|
| bibtex-update | Replace preprints with published versions |
| bibtex-check | Validate references exist with correct metadata |
| bibtex-filter | Filter to only cited entries |
| bibtex-zotero | Update preprints in Zotero library |
| bibtex-zotero-organize | Organize Zotero items into collections by research taxonomy |
| bibtex-obsidian-keywords | AI-powered keyword generation for Obsidian paper notes |

# Update preprints to published versions
bibtex-update references.bib -o updated.bib
# Preview changes (dry run)
bibtex-update references.bib --dry-run --verbose

# Check if references exist and have correct metadata
bibtex-check references.bib --report report.json
# Strict mode: exit with error if hallucinated/not-found entries
bibtex-check references.bib --strict

# Filter to only cited entries
bibtex-filter paper.tex -b references.bib -o filtered.bib
# Multiple tex files
bibtex-filter *.tex -b references.bib -o filtered.bib

# Set credentials (get from zotero.org/settings/keys)
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"
# Preview changes
bibtex-zotero --dry-run
# Apply updates
bibtex-zotero

When updating a .bib file, you can simultaneously update matching entries in your Zotero library:
# Set Zotero credentials
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"
# Update bib file AND sync to Zotero
bibtex-update references.bib -o updated.bib --zotero
# Preview Zotero changes only (bib changes still apply)
bibtex-update references.bib -o updated.bib --zotero --zotero-dry-run
# Limit to a specific Zotero collection
bibtex-update references.bib -o updated.bib --zotero --zotero-collection ABCD1234

The sync matches bib entries to Zotero items by:
- arXiv ID - Most reliable for preprints
- DOI - For preprints with DOIs (e.g., bioRxiv)
- Title + Author - Fuzzy matching as fallback
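
The matching order above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the dictionary keys (`arxiv_id`, `doi`, `title`, `authors`), the `match_entry` helper, and the 0.9 similarity threshold are hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the tool uses.

```python
from difflib import SequenceMatcher


def _norm(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def match_entry(bib_entry, zotero_items, threshold=0.9):
    """Return (item, method) for the best match, or (None, None).

    Hypothetical helper: tries arXiv ID, then DOI, then fuzzy title + first author.
    """
    # 1. arXiv ID: exact match
    arxiv_id = bib_entry.get("arxiv_id")
    if arxiv_id:
        for item in zotero_items:
            if item.get("arxiv_id") == arxiv_id:
                return item, "arxiv_id"

    # 2. DOI: case-insensitive exact match
    doi = (bib_entry.get("doi") or "").lower()
    if doi:
        for item in zotero_items:
            if (item.get("doi") or "").lower() == doi:
                return item, "doi"

    # 3. Fallback: same first author, then the most similar title
    title = _norm(bib_entry.get("title", ""))
    authors = bib_entry.get("authors") or []
    best, best_score = None, 0.0
    for item in zotero_items:
        item_authors = item.get("authors") or []
        if not authors or not item_authors:
            continue
        if _norm(authors[0]) != _norm(item_authors[0]):
            continue
        score = SequenceMatcher(None, title, _norm(item.get("title", ""))).ratio()
        if score > best_score:
            best, best_score = item, score
    if best is not None and best_score >= threshold:
        return best, "title_author"
    return None, None
```

Returning the method alongside the item makes it easy to log how each match was found, which is useful when reviewing a dry run.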
For environments without pip (e.g., Overleaf), filter_bibliography.py can be used directly as it has no dependencies:
# Copy the script and run directly
python filter_bibliography.py paper.tex -b references.bib -o filtered.bib

| Document | Description |
|---|---|
| docs/BIBTEX_UPDATER.md | Full BibTeX updater documentation |
| docs/REFERENCE_FACT_CHECKER.md | Full reference fact-checker documentation |
| docs/ZOTERO_UPDATER.md | Full Zotero updater documentation |
| docs/FILTER_BIBLIOGRAPHY.md | Full filter documentation |
| docs/LANDSCAPE.md | Databases, competing tools, and ecosystem landscape |
| examples/ | Example workflows and configuration files |
Both tools integrate with Overleaf via GitHub Actions or latexmkrc.

Via GitHub Actions:
- Enable GitHub sync in Overleaf (Menu -> Sync -> GitHub)
- Copy a workflow from examples/workflows/ to .github/workflows/
- Changes synced from Overleaf automatically trigger updates

For filter_bibliography.py only (no dependencies required):
- Upload filter_bibliography.py to your Overleaf project
- Create .latexmkrc based on examples/latexmkrc
- Recompile - filtered bibliography appears in your file list

bibtex-update:
- Multi-source resolution: arXiv, OpenAlex, Europe PMC, Crossref, DBLP, ACL Anthology, Semantic Scholar, Google Scholar
- High accuracy: Title and author fuzzy matching with confidence thresholds
- ACL Anthology support: Zero-overhead resolution for NLP papers (ACL, EMNLP, NAACL, etc.)
- Batch processing: Multiple files with concurrent workers (default: 8)
- Deduplication: Merge duplicates by DOI or normalized title+authors
- Smart caching: On-disk cache + semantic resolution cache with TTL
- Per-service rate limiting: Optimized rate limits per API (Crossref, S2, DBLP, ACL Anthology, arXiv, OpenAlex, Europe PMC)
- Batch API support: Faster bulk lookups via arXiv/S2/Crossref batch endpoints
- Resolution tracking: --mark-resolved tags updated entries to skip on re-runs
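
To illustrate the deduplication rule listed above (same DOI, or same normalized title and authors), here is a minimal sketch. The entry layout (plain dicts with `doi`, `title`, and `author` keys), the normalization steps, and the `dedupe` helper are assumptions made for illustration; the tool's own normalization and merge policy may differ. For simplicity this sketch keeps the first occurrence and drops later duplicates instead of merging their fields.

```python
import re
import unicodedata


def normalize(text):
    """Strip accents, braces, and punctuation; lowercase; collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def dedupe_keys(entry):
    """Return the identity keys of an entry: its DOI (if any) and its title+authors."""
    keys = []
    doi = (entry.get("doi") or "").strip().lower()
    if doi:
        keys.append(("doi", doi))
    title = normalize(entry.get("title", ""))
    authors = normalize(entry.get("author", ""))
    if title:
        keys.append(("title_authors", title, authors))
    return keys


def dedupe(entries):
    """Keep the first entry seen for each identity key; drop later duplicates."""
    seen = set()
    unique = []
    for entry in entries:
        keys = dedupe_keys(entry)
        if any(key in seen for key in keys):
            continue  # same DOI or same normalized title+authors as an earlier entry
        seen.update(keys)
        unique.append(entry)
    return unique
```

Registering both keys for every kept entry means a later duplicate is caught even if it lacks a DOI but repeats the title and authors.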

bibtex-zotero:
- Direct Zotero integration: Fetches and updates items via Zotero API
- Same resolution pipeline: Uses the same multi-source resolution
- Preserves metadata: Keeps notes, tags, and attachments intact
- Idempotent: Already-published papers are automatically skipped
- Dry-run mode: Preview changes before applying
- Tag-based chunking: Track processing state with preprint-upgraded/preprint-checked/preprint-error tags

bibtex-zotero-organize:
- AI-powered taxonomy: Organize items into hierarchical collections automatically
- Multiple backends: Claude, OpenAI, or local embeddings for classification
- Caching: Classification results cached to reduce API calls
- Batch processing: Configurable limits and dry-run mode

bibtex-obsidian-keywords:
- AI-powered keywords: Generate [[wikilinks]] for Obsidian paper notes
- Multiple backends: Claude, OpenAI, or local embeddings
- Smart skipping: --min-keywords to skip notes that already have enough keywords
- Topics file: Provide existing topics for consistent tagging across notes
- Dry-run mode: Preview changes before modifying files

bibtex-check:
- Multi-source validation: Crossref, DBLP, Semantic Scholar
- Detailed mismatch detection: Title, author, year, venue comparisons
- Hallucination detection: Identifies likely fabricated references
- Structured reports: JSON and JSONL output formats
- CI/CD integration: Strict mode with exit codes for automation
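
The field-by-field mismatch detection mentioned above could look roughly like the following. This is a hedged sketch, not the checker's real logic: the field names, the `find_mismatches` helper, the 0.9 threshold, and the use of `difflib.SequenceMatcher` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and extra spaces."""
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def find_mismatches(entry, record, threshold=0.9):
    """Return the names of fields where a bib entry disagrees with a database record.

    Hypothetical helper: text fields are compared fuzzily, years exactly.
    Fields missing on either side are skipped rather than flagged.
    """
    mismatches = []
    for field in ("title", "author", "venue"):
        if field in entry and field in record:
            if similar(entry[field], record[field]) < threshold:
                mismatches.append(field)
    if "year" in entry and "year" in record:
        if str(entry["year"]) != str(record["year"]):
            mismatches.append("year")
    return mismatches
```

An entry for which no database returns any plausible record at all is a different case from a mismatch; this sketch covers only the comparison step.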

bibtex-filter:
- Zero dependencies: Uses only Python standard library
- Works on Overleaf: No pip install needed
- Multiple bib files: Merge and filter from multiple sources
- Citation detection: Supports natbib, biblatex, and standard LaTeX citations
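
As a rough idea of how citation detection across standard LaTeX, natbib, and biblatex commands can work with only the standard library, consider the sketch below. The regular expression and the `cited_keys` helper are assumptions for illustration and are not taken from filter_bibliography.py; the real script may recognize more commands and edge cases (for example, `\nocite{*}` is not treated specially here).

```python
import re

# Matches \cite, \citep, \citet*, \parencite, \textcite, \autocite, ... with up to
# two optional [..] arguments, and captures the comma-separated key list.
CITE_RE = re.compile(
    r"\\[A-Za-z]*cite[A-Za-z]*\*?\s*(?:\[[^\]]*\]\s*){0,2}\{([^}]*)\}",
    re.IGNORECASE,
)


def cited_keys(tex):
    """Return the set of citation keys used in a LaTeX source string."""
    # Drop comments: an unescaped % through the end of the line
    tex = re.sub(r"(?<!\\)%.*", "", tex)
    keys = set()
    for match in CITE_RE.finditer(tex):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys
```

Filtering a bibliography then amounts to keeping only the entries whose key is in the returned set.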

The tools can also be used from Python:

from bibtex_updater import Detector, Resolver, Updater, HttpClient, RateLimiter, DiskCache

# Create HTTP client with rate limiting and caching
rate_limiter = RateLimiter(req_per_min=30)
cache = DiskCache(".cache.json")
http_client = HttpClient(
    timeout=30.0,
    user_agent="bibtex-updater/0.5.0",
    rate_limiter=rate_limiter,
    cache=cache,
)

# Detect preprints (entry is a single parsed BibTeX entry, loaded elsewhere)
detector = Detector()
detection = detector.detect(entry)

if detection.is_preprint:
    # Resolve to published version
    resolver = Resolver(http_client)
    candidate = resolver.resolve(detection)

    if candidate and candidate.confidence >= 0.9:
        # Update the entry
        updater = Updater()
        updated_entry = updater.update_entry(entry, candidate.record, detection)

# Clone and install in development mode
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
uv sync --extra dev --extra all
# Run tests
uv run pytest tests/ -v
# Run tests with coverage
uv run pytest tests/ -v --cov=bibtex_updater --cov-report=term-missing
# Code quality
pre-commit run --all-files
# Build package
uv build
# Check package
uv run twine check dist/*

MIT License - see LICENSE for details.




