Tapio

Tapio is a RAG (Retrieval-Augmented Generation) tool for extracting, processing, and querying information from websites such as Migri.fi (Finnish Immigration Service). It covers the complete workflow: web crawling, content parsing, vectorization, and an interactive chatbot interface.

Features

  • Multi-site support - Configurable site-specific crawling and parsing
  • End-to-end pipeline - Crawl → Parse → Vectorize → Query workflow
  • Local LLM integration - Uses Ollama for private, local inference
  • Semantic search - ChromaDB vector database for relevant content retrieval
  • Interactive chatbot - Web interface for natural language queries
  • Flexible crawling - Configurable depth and domain restrictions
  • Comprehensive testing - Full test suite for reliability
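Semantic search ranks stored text chunks by how close their embedding vectors are to the embedding of the user's question. In Tapio, storage and search are handled by ChromaDB with a real embedding model; the sketch below only illustrates the ranking idea in plain Python, using cosine similarity (one common choice of measure) and hand-made toy vectors. The helpers `cosine_similarity` and `top_k` and the sample `index` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best match first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
index = {
    "family-reunification": [0.9, 0.1, 0.0],
    "work-permit": [0.1, 0.9, 0.1],
    "student-residence": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of a question about bringing a spouse to Finland
print(top_k(query, index))
```

The highest-scoring chunks are what a RAG system passes to the language model as context.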

Target Use Cases

Primary Users: EU and non-EU citizens navigating Finnish immigration processes

  • Students seeking education information
  • Workers exploring employment options
  • Families pursuing reunification
  • Refugees and asylum seekers needing guidance

Core Needs:

  • Finding relevant, accurate information quickly
  • Practicing conversations on specific topics (family reunification, work permits, etc.)

Installation and Setup

Prerequisites

  • Python 3.10 or higher
  • uv - Fast Python package installer
  • Ollama - For local LLM inference

Quick Start

  1. Clone the repository and install dependencies:
git clone https://github.com/Finntegrate/tapio.git
cd tapio
uv sync
  2. Install the required Ollama model:
ollama pull llama3.2
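Once pulled, the model is served by Ollama's local HTTP API (by default at http://localhost:11434). As a hedged illustration of how a RAG chatbot can use it, the sketch below places retrieved chunks into a prompt and prepares a non-streaming request to Ollama's `/api/generate` endpoint with only the standard library. The prompt wording and the helper names `build_prompt`, `build_request` and `ask` are hypothetical and are not Tapio's implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Hypothetical RAG prompt: numbered retrieved chunks first, then the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Prepare a non-streaming POST to /api/generate (one JSON object comes back)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, context_chunks: list[str]) -> str:
    """Send the request; needs a running Ollama server with llama3.2 pulled."""
    request = build_request(build_prompt(question, context_chunks))
    with urllib.request.urlopen(request, timeout=120) as response:
        # With "stream": False, the generated text is in the "response" field.
        return json.loads(response.read())["response"]
```

With Ollama running, `ask("How do I apply for a work permit?", ["<retrieved chunk text>"])` returns the model's answer as a string.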

Usage

CLI Overview

Tapio provides a four-step workflow:

  1. crawl - Collect HTML content from websites
  2. parse - Convert HTML to structured Markdown
  3. vectorize - Create vector embeddings for semantic search
  4. tapio-app - Launch the interactive chatbot interface
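The parse step turns crawled HTML into Markdown. Tapio's parsing is configured per site; the sketch below only illustrates the general idea with the standard library's `html.parser`. It keeps headings, paragraphs and list items and drops navigation, footers and scripts. The class `MarkdownExtractor`, the function `html_to_markdown` and the choice of skipped tags are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-Markdown converter for headings, paragraphs and list items."""

    HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3}
    BLOCK_TAGS = {"p", "li", *HEADING_LEVELS}
    SKIP_TAGS = {"script", "style", "nav", "footer"}  # page chrome, not content

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._current: Optional[list[str]] = None  # text pieces of the open block
        self._prefix = ""
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS and self._skip_depth == 0:
            level = self.HEADING_LEVELS.get(tag)
            self._prefix = "#" * level + " " if level else ("- " if tag == "li" else "")
            self._current = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS and self._current is not None:
            text = " ".join("".join(self._current).split())  # collapse whitespace
            if text:
                self.blocks.append(self._prefix + text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._skip_depth == 0:
            self._current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Inline tags such as links are flattened to their text, which is usually what matters for embedding and retrieval.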

Use uv run -m tapio.cli --help to see all commands or uv run -m tapio.cli <command> --help for command-specific options.

Quick Example

Complete workflow for the Migri website:

# 1. Crawl content (uses site configuration)
uv run -m tapio.cli crawl migri --depth 2

# 2. Parse HTML to Markdown
uv run -m tapio.cli parse migri

# 3. Create vector embeddings
uv run -m tapio.cli vectorize

# 4. Launch chatbot interface
uv run -m tapio.cli tapio-app
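The `--depth` option bounds how far the crawler follows links from the start page. Its exact semantics are defined by Tapio's crawler; this sketch assumes a common interpretation, where the start page is depth 0 and links are followed until depth N. It shows a breadth-first crawl with that limit plus a domain allow-list, using a hypothetical in-memory link map (`SITE`) in place of real HTTP requests. The function `crawl_order` is illustrative only.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_order(start_url, get_links, max_depth, allowed_domains):
    """Breadth-first crawl that stops at max_depth and stays on allowed domains.

    get_links(url) returns the href strings found on that page; it is injected
    so the sketch needs no network access.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # page is collected, but its links are not followed
        for href in get_links(url):
            # Resolve relative links and drop "#fragment" parts.
            link = urljoin(url, href).split("#", 1)[0]
            if urlparse(link).hostname in allowed_domains and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site structure: page URL -> links on that page.
SITE = {
    "https://example.com/": ["/a", "/b", "https://other.org/x"],
    "https://example.com/a": ["/a/deep", "#top"],
    "https://example.com/b": ["/"],
    "https://example.com/a/deep": ["/a/deeper"],
}
pages = crawl_order("https://example.com/", lambda u: SITE.get(u, []),
                    max_depth=2, allowed_domains={"example.com"})
print(pages)
```

Here `/a/deeper` sits at depth 3 and is never fetched, and the off-site link to `other.org` is ignored.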

Available Sites

To list configured sites:

uv run -m tapio.cli list-sites

To view detailed site configurations:

uv run -m tapio.cli list-sites --verbose
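Each site is described by its own configuration, which is what makes crawling and parsing site-specific. The sketch below is a hypothetical model of such a registry, with a function that mirrors the idea of `list-sites` and `list-sites --verbose`. The `SiteConfig` fields (`base_url`, `description`, `allowed_domains`) are assumptions for illustration, not Tapio's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-site settings (not Tapio's real configuration schema)."""
    name: str
    base_url: str
    description: str = ""
    allowed_domains: tuple[str, ...] = ()


# Hypothetical registry keyed by the name passed to `crawl` and `parse`.
SITES = {
    "migri": SiteConfig(
        name="migri",
        base_url="https://migri.fi",
        description="Finnish Immigration Service",
        allowed_domains=("migri.fi",),
    ),
}


def list_sites(verbose: bool = False) -> list[str]:
    """Return one line per configured site; verbose adds the site's settings."""
    lines = []
    for name in sorted(SITES):
        site = SITES[name]
        if verbose:
            domains = ", ".join(site.allowed_domains)
            lines.append(f"{name}: {site.base_url} ({site.description}); domains={domains}")
        else:
            lines.append(name)
    return lines


print("\n".join(list_sites(verbose=True)))
```

Keeping all site-specific values in one configuration object means adding a new site does not require changing the crawl or parse code.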

For technical details on site configurations, programmatic API usage, and adding new sites, see CONTRIBUTING.md.

Contributing

See CONTRIBUTING.md for development guidelines, code style requirements, and how to submit pull requests.

License

Licensed under the European Union Public License version 1.2. See LICENSE for details.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Brylie Christopher Oxley

🚇 ⚠️ 📖 🐛 💼 🖋 🤔 🚧 🧑‍🏫 📆 📣 🔬 👀 💻
AkiKurvinen

🔣 💻
ResendeTech

💻

This project follows the all-contributors specification. Contributions of any kind welcome!

About

Helpful chat companion for Finnish immigrants.
