Tapio is a RAG (Retrieval-Augmented Generation) tool for extracting, processing, and querying information from websites such as Migri.fi (Finnish Immigration Service). It covers the full workflow: web crawling, content parsing, vectorization, and an interactive chatbot interface.
- Multi-site support - Configurable site-specific crawling and parsing
- End-to-end pipeline - Crawl → Parse → Vectorize → Query workflow
- Local LLM integration - Uses Ollama for private, local inference
- Semantic search - ChromaDB vector database for relevant content retrieval
- Interactive chatbot - Web interface for natural language queries
- Flexible crawling - Configurable depth and domain restrictions
- Comprehensive testing - Full test suite for reliability
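As an illustration of what "configurable depth and domain restrictions" means in practice, here is a minimal sketch of a breadth-first crawl that stops at a maximum depth and ignores links outside one domain. It is not Tapio's crawler: the function name `crawl_order`, the in-memory `LINKS` map, and the URLs in it are all hypothetical stand-ins for fetching and parsing real pages.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(start_url, links, max_depth, allowed_domain):
    """Breadth-first traversal limited by depth and domain.

    `links` is a hypothetical in-memory map of URL -> outgoing URLs that
    stands in for downloading a page and extracting its links.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            same_domain = urlparse(target).netloc == allowed_domain
            if same_domain and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical link graph, used only for illustration.
LINKS = {
    "https://migri.fi/en/home": [
        "https://migri.fi/en/work",
        "https://example.com/ad",
    ],
    "https://migri.fi/en/work": ["https://migri.fi/en/work/permit"],
}

print(crawl_order("https://migri.fi/en/home", LINKS, 1, "migri.fi"))
```

With a depth of 1 the sketch visits the start page and the one same-domain page it links to; raising the depth to 2 also reaches the page one level further down, while the off-domain link is never followed.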
Primary Users: EU and non-EU citizens navigating Finnish immigration processes
- Students seeking education information
- Workers exploring employment options
- Families pursuing reunification
- Refugees and asylum seekers needing guidance
Core Needs:
- Finding relevant, accurate information quickly
- Practicing conversations on specific topics (family reunification, work permits, etc.)
- Clone and set up:
git clone https://github.com/Finntegrate/tapio.git
cd tapio
uv sync
- Install required Ollama model:
ollama pull llama3.2
Tapio provides a four-step workflow:
- crawl - Collect HTML content from websites
- parse - Convert HTML to structured Markdown
- vectorize - Create vector embeddings for semantic search
- tapio-app - Launch the interactive chatbot interface
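To make the vectorize and query steps more concrete, the following toy sketch turns text into vectors and retrieves the closest document by cosine similarity. It is only a conceptual stand-in: Tapio stores embeddings in ChromaDB, whereas this example assumes plain word counts as the "embedding", and the names `embed`, `cosine`, `top_match`, and the sample `DOCS` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real pipeline uses a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_match(query, documents):
    """Return the document whose vector is closest to the query vector."""
    query_vector = embed(query)
    return max(documents, key=lambda doc: cosine(query_vector, embed(doc)))


# Hypothetical document snippets, used only for illustration.
DOCS = [
    "Residence permit for studies in Finland",
    "Family reunification and residence permit for a family member",
    "Applying for asylum in Finland",
]

print(top_match("How do I bring my family to Finland?", DOCS))
```

In a real RAG setup the retrieved passages would then be passed to the language model as context for answering the question.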
Use uv run -m tapio.cli --help to see all commands or uv run -m tapio.cli <command> --help for command-specific options.
Complete workflow for the Migri website:
# 1. Crawl content (uses site configuration)
uv run -m tapio.cli crawl migri --depth 2
# 2. Parse HTML to Markdown
uv run -m tapio.cli parse migri
# 3. Create vector embeddings
uv run -m tapio.cli vectorize
# 4. Launch chatbot interface
uv run -m tapio.cli tapio-app
To list configured sites:
uv run -m tapio.cli list-sites
To view detailed site configurations:
uv run -m tapio.cli list-sites --verbose
For technical details on site configurations, programmatic API usage, and adding new sites, see CONTRIBUTING.md.
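The four-step Migri workflow shown earlier can also be driven from a small script, which is handy for scheduled refreshes. The sketch below only assembles the CLI invocations already listed in this README and runs them in order; the helper names `workflow_commands` and `run_workflow` are hypothetical and not part of Tapio.

```python
import subprocess

# Command prefix taken from this README.
BASE = ["uv", "run", "-m", "tapio.cli"]


def workflow_commands(site="migri", depth=2):
    """Build the four CLI invocations from the README, in order."""
    return [
        BASE + ["crawl", site, "--depth", str(depth)],
        BASE + ["parse", site],
        BASE + ["vectorize"],
        BASE + ["tapio-app"],
    ]


def run_workflow(site="migri", depth=2, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for command in workflow_commands(site, depth):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_workflow()  # dry run: prints the commands without executing them
```

Calling `run_workflow(dry_run=False)` would execute the steps for real; the last step launches the chatbot interface, so it is expected to keep running until stopped.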
See CONTRIBUTING.md for development guidelines, code style requirements, and how to submit pull requests.
Licensed under the European Union Public License version 1.2. See LICENSE for details.
Thanks go to these wonderful people (emoji key):
- Brylie Christopher Oxley 🚇
- AkiKurvinen 🔣 💻
- ResendeTech 💻
This project follows the all-contributors specification. Contributions of any kind welcome!