
Linkwarden Tag Cleanup Tools


Complete toolkit for cleaning up and maintaining consistent tags in Linkwarden instances, especially those using LLM-based auto-tagging.

Problem

Auto-tagging with small LLMs (like gemma3b) creates severe tag inconsistencies:

  • Case duplicates: "Music" vs "music", "AI" vs "ai" (43 found)
  • Semantic overlaps: "AI", "Machine Learning", "ML", "LLM" all meaning similar things
  • Tag proliferation: 84% of tags used on ≤3 links
  • Junk tags: Non-substantive tags like "Avoid", "Sign Up", "Room", "Feel"
  • No reuse: the LLM creates new tags instead of reusing existing ones

Result: 4,866 tags where only ~200 are actually useful.

Solution

This toolkit provides four complementary tools:

  1. Tag Analyzer - Identify duplicates, overlaps, and low-usage tags
  2. Tag Consolidator - Merge duplicates and clean up existing tags
  3. Tag Normalizer - Prevent future tag proliferation (ongoing service)
  4. Junk Remover - Remove non-substantive tags that provide no value

Quick Start

Installation

# Clone the repository
git clone https://github.com/roelven/linkwarden-tag-cleanup.git
cd linkwarden-tag-cleanup

# Install dependencies
pip3 install -r requirements.txt

# Configure
cp .env.example .env
nano .env  # Add your Linkwarden API URL and token

Basic Usage

# 1. Analyze your tags
bin/run_analysis.sh

# 2. Consolidate duplicates (dry-run first)
bin/run_consolidation.sh
bin/run_consolidation.sh --no-dry-run

# 3. Remove junk tags
bin/run_junk_removal.sh --analyze
bin/run_junk_removal.sh

# 4. Set up ongoing normalization (cron)
crontab -e
# Add: */5 * * * * cd /path/to/linkwarden-tag-cleanup && bin/run_normalization.sh >> normalization.log 2>&1

See docs/QUICKSTART.md for detailed setup instructions.

Features

Tag Analysis

  • ✅ Identify case-insensitive duplicates (see the sketch after this list)
  • ✅ Find semantic overlaps
  • ✅ Detect low-usage tags
  • ✅ Generate consolidation mappings
  • ✅ Automatic backup before changes
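
A minimal sketch of the case-duplicate detection above. It calls the /api/v1/tags endpoint shown in the Troubleshooting section; the exact shape of the JSON response is an assumption and may differ between Linkwarden versions.

# Illustrative sketch, not part of the toolkit: list case duplicates
import os
from collections import defaultdict

import requests

api_url = os.environ["LINKWARDEN_API_URL"]
token = os.environ["LINKWARDEN_TOKEN"]

resp = requests.get(f"{api_url}/tags", headers={"Authorization": f"Bearer {token}"}, timeout=30)
resp.raise_for_status()
payload = resp.json()
tags = payload.get("response", payload)  # unwrap the envelope if one is present (assumed shape)

# Group tag names by their lowercase form; any group with more than one spelling is a case duplicate.
groups = defaultdict(list)
for tag in tags:
    groups[tag["name"].lower()].append(tag["name"])

for variants in groups.values():
    if len(variants) > 1:
        print(" / ".join(sorted(variants)))  # e.g. "Music / music"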

Tag Consolidation

  • ✅ Merge case variants (music → Music)
  • ✅ Merge semantic duplicates (AI/ML/LLM → AI)
  • ✅ Delete low-usage tags (<3 uses)
  • ✅ Update all affected links
  • ✅ Dry-run mode for safety (sketched below)
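
A stripped-down sketch of how a consolidation mapping might be applied to one link's tag list in dry-run mode. The mapping and tag names here are illustrative; the real consolidate_tags.py also pushes the updated tags back through the API.

def consolidate(tag_names, mapping, dry_run=True):
    # Rewrite tags according to the canonical-name mapping and drop duplicates.
    new_tags = []
    for name in tag_names:
        canonical = mapping.get(name, name)
        if canonical not in new_tags:
            new_tags.append(canonical)
    if dry_run and new_tags != tag_names:
        print(f"[dry-run] {tag_names} -> {new_tags}")
    return new_tags

mapping = {"music": "Music", "ML": "AI", "LLM": "AI"}  # illustrative mapping
consolidate(["music", "ML", "LLM", "Linux"], mapping)
# [dry-run] ['music', 'ML', 'LLM', 'Linux'] -> ['Music', 'AI', 'Linux']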

Tag Normalization (Ongoing)

  • ✅ Fuzzy matching (85% similarity; see the sketch below)
  • ✅ Automatic case normalization
  • ✅ Reuse existing tags
  • ✅ Runs via cron/systemd
  • ✅ Configurable thresholds
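
A minimal sketch of the fuzzy-matching step above, using Python's standard-library difflib to approximate the 85% similarity threshold; normalize_new_tags.py may use a different similarity measure.

import difflib

def closest_existing_tag(candidate, existing, threshold=0.85):
    # Return the most similar existing tag, or None if nothing clears the threshold.
    best, best_ratio = None, 0.0
    for tag in existing:
        ratio = difflib.SequenceMatcher(None, candidate.lower(), tag.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = tag, ratio
    return best if best_ratio >= threshold else None

existing = ["Machine Learning", "Music", "Self-Hosting"]
print(closest_existing_tag("machine-learning", existing))  # -> Machine Learning (reuse it)
print(closest_existing_tag("Cooking", existing))           # -> None (creating a new tag is fine)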

Junk Tag Removal

  • ✅ Remove non-substantive tags
  • ✅ 200+ built-in junk patterns
  • ✅ Custom blocklist support (see the sketch after this list)
  • ✅ Smart acronym detection
  • ✅ Usage-based filtering
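
A simplified sketch of the blocklist and usage-based checks above, assuming the blocklist file holds one tag per line as in the Advanced Usage example; how remove_junk_tags.py combines these checks with its built-in patterns and acronym detection may differ.

def load_blocklist(path="config/junk_tags_blocklist.txt"):
    # One tag per line, compared case-insensitively.
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}

def is_blocklisted(tag_name, blocklist):
    return tag_name.strip().lower() in blocklist

def within_usage_cap(usage_count, cap=3):
    # Usage-based filtering: only rarely used tags are eligible for removal.
    return usage_count <= cap

blocklist = load_blocklist()
tag = {"name": "Sign Up", "usage": 1}  # illustrative tag record
if is_blocklisted(tag["name"], blocklist) and within_usage_cap(tag["usage"]):
    print(f"would remove junk tag: {tag['name']}")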

Project Structure

linkwarden-tag-cleanup/
├── bin/                    # Wrapper scripts (run these)
│   ├── run_analysis.sh
│   ├── run_consolidation.sh
│   ├── run_normalization.sh
│   └── run_junk_removal.sh
├── scripts/                # Core Python scripts
│   ├── analyze_tags.py
│   ├── consolidate_tags.py
│   ├── normalize_new_tags.py
│   └── remove_junk_tags.py
├── config/                 # Configuration files
│   ├── config.example.json
│   └── junk_tags_blocklist.txt
├── docs/                   # Documentation
│   ├── QUICKSTART.md
│   ├── JUNK_TAGS_GUIDE.md
│   ├── TESTING.md
│   ├── IMPLEMENTATION_SUMMARY.md
│   └── deployment/         # Systemd setup
└── examples/               # Example configs and debug scripts

Expected Results

Before Cleanup

  • Total tags: 4,866
  • Single-use: 3,041 (62.5%)
  • Low-use (≤3): 4,089 (84%)
  • Junk tags: ~1,000
  • Average usage: 1.5 links/tag

After Cleanup

  • Total tags: 150-250
  • Tag reduction: 85%
  • Case consistency: 100%
  • Average usage: 15+ links/tag
  • Ongoing prevention: 80-90% of future duplicates

Documentation

Detailed guides live in the docs/ directory: QUICKSTART.md, JUNK_TAGS_GUIDE.md, TESTING.md, IMPLEMENTATION_SUMMARY.md, and deployment/ for the systemd setup.

Configuration

Environment Variables (.env)

LINKWARDEN_API_URL=https://your-linkwarden.example.com/api/v1
LINKWARDEN_TOKEN=your_api_token_here
LOW_USE_THRESHOLD=3
SIMILARITY_THRESHOLD=0.85
LOOKBACK_MINUTES=15
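
The scripts read these values from the environment. A minimal sketch of loading them with the defaults shown above; it assumes the variables are already exported or loaded from .env by the calling script.

import os

api_url = os.environ["LINKWARDEN_API_URL"].rstrip("/")
token = os.environ["LINKWARDEN_TOKEN"]
low_use_threshold = int(os.environ.get("LOW_USE_THRESHOLD", "3"))
similarity_threshold = float(os.environ.get("SIMILARITY_THRESHOLD", "0.85"))
lookback_minutes = int(os.environ.get("LOOKBACK_MINUTES", "15"))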

Get Your API Token

  1. Log in to Linkwarden
  2. Go to Settings → API Tokens
  3. Create new token with read/write permissions
  4. Copy token to .env file

Advanced Usage

Custom Tag Consolidations

Edit the consolidation mapping before applying:

bin/run_analysis.sh
nano consolidation_mapping.json  # Review and customize
bin/run_consolidation.sh --no-dry-run

Custom Junk Tag Blocklist

Add your own junk tags:

echo "placeholder" >> config/junk_tags_blocklist.txt
echo "example" >> config/junk_tags_blocklist.txt
bin/run_junk_removal.sh --analyze

Partial Consolidation

# Only case normalizations
python3 scripts/consolidate_tags.py --skip-semantic --skip-delete

# Only semantic consolidations
python3 scripts/consolidate_tags.py --skip-case --skip-delete

Requirements

  • Python 3.7+
  • requests library
  • Linkwarden instance with API access
  • API token with read/write permissions

Safety Features

  1. Automatic backups - Tags saved before changes
  2. Dry-run mode - Preview changes before applying
  3. Confirmation prompts - Prevents accidental deletions
  4. Rate limiting - Avoids API throttling (sketched below)
  5. Error handling - Graceful failure recovery
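
The rate limiting above can be as simple as pausing between write calls. A minimal sketch; the half-second delay is illustrative, not the toolkit's actual setting.

import time

def throttled(items, delay=0.5):
    # Yield items with a short pause between them so the Linkwarden API is not hammered.
    for item in items:
        yield item
        time.sleep(delay)

# for tag in throttled(tags_to_update):
#     ...update the tag via the API...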

Troubleshooting

Authentication Errors

# Verify your token works
curl -H "Authorization: Bearer YOUR_TOKEN" \
     https://your-linkwarden.example.com/api/v1/tags

No Recent Links Found

This is normal if no links were added recently; the normalization service will pick up new links on its next run.

Tag Not Found Errors

The tag was already deleted or renamed; this error is safe to ignore.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Credits

Created to solve tag proliferation issues with LLM-based auto-tagging in Linkwarden.


Support

For issues, questions, or feature requests, please open an issue on the GitHub repository.


Note: This toolkit works with Linkwarden v2.x. Always back up your data before running cleanup operations.
