Abstract Wiki Architect (V2)

Industrial-grade NLG system for Abstract Wikipedia and Wikifunctions.

Abstract Wiki Architect is a family-based, data-driven Natural Language Generation (NLG) toolkit. Instead of writing one renderer per language (“300 scripts for 300 languages”), this project builds:

Shared Family Engines: ~15 universal engines (Romance, Slavic, Bantu, etc.) implemented as Adapters.
Configuration Cards: Hundreds of per-language JSON configurations (grammar matrices).
Hexagonal Core: A pure Python domain layer containing semantic frames and cross-linguistic constructions.
Lexicon Subsystem: A robust persistence layer with bridges to Wikidata.
Background Worker: An async system for compiling and onboarding languages.

The goal is to provide a professional, testable architecture for rule-based NLG, aligned with Abstract Wikipedia but usable as a standalone API service.

🏛️ Architecture Overview (Hexagonal)

The system has moved from a flat script structure to a Modular Monolith organized by technical capability.

app/
├── core/                   # 🧠 THE BRAIN (Pure Python, No Infrastructure)
│   ├── domain/             # Models (Frames, Sentences) & Events
│   ├── ports/              # Interfaces (IMessageBroker, IGrammarEngine)
│   └── use_cases/          # Business Logic (GenerateText, BuildLanguage)
│
├── adapters/               # 🔌 THE PLUGS (Infrastructure)
│   ├── api/                # FastAPI (Driving Adapter)
│   ├── worker/             # Background Worker (Driving Adapter)
│   ├── messaging/          # Redis Pub/Sub (Driven Adapter)
│   ├── persistence/        # FileSystem & Wikidata (Driven Adapters)
│   └── engines/            # Grammar Engines (GF & Python Wrappers)
│
└── shared/                 # 🛠️ SHARED UTILITIES
    ├── container.py        # Dependency Injection
    └── config.py           # Settings (Pydantic)
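
The dependency direction in this layout can be illustrated with a minimal sketch. Only the names IGrammarEngine and GenerateText come from the tree above; the method signatures, the EchoEngine adapter, and the wiring are assumptions made for illustration, not the project's actual code.

```python
from typing import Protocol


class IGrammarEngine(Protocol):
    """Port: the core depends only on this interface (method name is assumed)."""

    def render(self, frame: dict, lang_code: str) -> str: ...


class GenerateText:
    """Use case: pure Python, receives its engine through the constructor."""

    def __init__(self, engine: IGrammarEngine) -> None:
        self._engine = engine

    def execute(self, frame: dict, lang_code: str) -> str:
        return self._engine.render(frame, lang_code)


class EchoEngine:
    """Hypothetical driven adapter standing in for a real grammar engine."""

    def render(self, frame: dict, lang_code: str) -> str:
        return f"[{lang_code}] {frame['subject']['name']}"


# In the real layout this wiring would presumably live in app/shared/container.py.
use_case = GenerateText(engine=EchoEngine())
print(use_case.execute({"subject": {"name": "Marie Curie"}}, "fra"))
```

Because the use case only sees the port, an adapter can be swapped (or mocked in unit tests) without touching the core.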

💡 Intuition: Consoles, Cartridges, and the Router

Think of each sentence as a game you want to play.

Old way: Build one console per game (one monolithic renderer per language).
Abstract Wiki Architect:

The Console (Core/Engine): Universal logic (Romance, Slavic, etc.).
The Cartridge (Config/Lexicon): Per-language JSON files loaded dynamically.
The Router (API/Use Case): Plugs the right cartridge into the console based on the request.

Example (Romance Family):

The Romance Engine (Adapter) knows how to feminize nouns and apply plural rules generically.
The Italian Cartridge (data/lexicon/it.json) tells it: "-o" -> "-a" for feminine.
The Spanish Cartridge overrides only what differs: its indefinite articles are different, while its pluralization rules stay close to the shared defaults.
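
The console/cartridge split can be sketched as one engine class driven by per-language suffix tables. The card keys (feminine_suffix, plural_suffix) and the class below are hypothetical; the real schema of data/lexicon/it.json may differ, and real morphology has many exceptions this toy version ignores.

```python
# Hypothetical cartridges: plain dicts standing in for per-language JSON files.
ITALIAN_CARD = {
    "feminine_suffix": {"o": "a"},
    "plural_suffix": {"o": "i", "a": "e"},
}
SPANISH_CARD = {
    "feminine_suffix": {"o": "a"},
    "plural_suffix": {"o": "os", "a": "as"},
}


class RomanceEngine:
    """Shared family logic; all language-specific data comes from the card."""

    def __init__(self, card: dict) -> None:
        self.card = card

    @staticmethod
    def _swap_suffix(word: str, table: dict) -> str:
        # Replace the first matching ending; leave the word unchanged otherwise.
        for old, new in table.items():
            if word.endswith(old):
                return word[: -len(old)] + new
        return word

    def feminize(self, noun: str) -> str:
        return self._swap_suffix(noun, self.card["feminine_suffix"])

    def pluralize(self, noun: str) -> str:
        return self._swap_suffix(noun, self.card["plural_suffix"])


italian = RomanceEngine(ITALIAN_CARD)
spanish = RomanceEngine(SPANISH_CARD)
print(italian.feminize("maestro"), italian.pluralize("maestro"))
print(spanish.feminize("maestro"), spanish.pluralize("maestro"))
```

The engine code is identical for both languages; only the tables differ.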

🧩 Components

1. Semantic Frames (The Input)

Located in app/core/domain/models.py. These are the abstract representations of intent, independent of language.

Entity Frames: People, Organizations, Places.
Event Frames: Actions with participants and time.
Relational Frames: Definitions, attributes, measurements.

Example Payload:

{
  "frame_type": "bio",
  "subject": { "name": "Marie Curie", "qid": "Q7186" },
  "properties": { "profession": "physicist", "nationality": "polish" }
}
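
As a rough illustration, a payload like this could be parsed into a typed frame as follows. The BioFrame name appears in the status list of this README, but the field layout, the Subject class, and parse_frame are assumptions; the actual models in app/core/domain/models.py may look different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Subject:
    name: str
    qid: Optional[str] = None  # Wikidata identifier, if known


@dataclass(frozen=True)
class BioFrame:
    subject: Subject
    properties: dict = field(default_factory=dict)
    frame_type: str = "bio"


def parse_frame(payload: dict) -> BioFrame:
    """Validate the frame type and build a language-independent BioFrame."""
    if payload.get("frame_type") != "bio":
        raise ValueError(f"unsupported frame_type: {payload.get('frame_type')!r}")
    return BioFrame(
        subject=Subject(**payload["subject"]),
        properties=dict(payload.get("properties", {})),
    )


payload = {
    "frame_type": "bio",
    "subject": {"name": "Marie Curie", "qid": "Q7186"},
    "properties": {"profession": "physicist", "nationality": "polish"},
}
frame = parse_frame(payload)
print(frame.subject.name, frame.properties["profession"])
```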

2. Constructions (Sentence Patterns)

Conceptually located in app/core/domain/constructions/. These are family-agnostic patterns that orchestrate generation:

copula_equative: "X is Y"
transitive_event: "X did Y to Z"
passive_event: "Z was done by X"
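
A deliberately naive sketch of these three patterns as slot-filling functions is shown below. The function names mirror the list above, but the signatures, the fixed word order, and the idea of passing function words as parameters are assumptions; a real construction would delegate ordering and agreement to the family engine.

```python
def copula_equative(subject: str, predicate: str, copula: str = "is") -> str:
    """'X is Y': the copula would be supplied by the language configuration."""
    return f"{subject} {copula} {predicate}."


def transitive_event(agent: str, verb: str, patient: str) -> str:
    """'X did Y to Z', reduced to agent, inflected verb, patient (SVO assumed)."""
    return f"{agent} {verb} {patient}."


def passive_event(patient: str, participle: str, agent: str,
                  auxiliary: str = "was", by: str = "by") -> str:
    """'Z was done by X': auxiliary and agent marker are language-specific."""
    return f"{patient} {auxiliary} {participle} {by} {agent}."


print(copula_equative("Marie Curie", "a physicist"))
print(copula_equative("Marie Curie", "une physicienne", copula="est"))
print(transitive_event("Marie Curie", "discovered", "polonium"))
print(passive_event("Polonium", "discovered", "Marie Curie"))
```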

3. Grammar Engines (The Generators)

Located in app/adapters/engines/. We support multiple backend engines:

GF (Grammatical Framework): For high-precision, resource-heavy generation (Full Strategy).
Python/Jinja (Simple): For rapid prototyping and pidgin generation (Fast Strategy).
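
To give a feel for the Fast Strategy, here is a minimal template-based renderer. It uses the standard library's string.Template in place of Jinja so the example has no dependencies; the template table and the render_fast function are hypothetical and not taken from the project.

```python
from string import Template

# Hypothetical per-language templates for a biography sentence.
BIO_TEMPLATES = {
    "eng": Template("$name is a $nationality $profession."),
    "fra": Template("$name est une $profession $nationality."),
}


def render_fast(lang_code: str, slots: dict) -> str:
    """Fill the language's template; raises KeyError for unknown languages."""
    template = BIO_TEMPLATES.get(lang_code)
    if template is None:
        raise KeyError(f"no template for language {lang_code!r}")
    return template.substitute(slots)


print(render_fast("fra", {
    "name": "Marie Curie",
    "profession": "physicienne",
    "nationality": "polonaise",
}))
```

Note that the French template hard-codes the feminine article "une" and expects already-inflected slot values. That limitation is the kind of problem a full grammar engine is meant to solve.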

4. Lexicon Subsystem

Located in app/adapters/persistence/.

FileSystemRepo: Loads local JSON lexicons.
WikidataAdapter: Fetches live data from SPARQL endpoints to hydrate missing lexemes.
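
The local-first, remote-fallback behaviour described above might look roughly like this. The class name FileSystemLexicon, the one-JSON-file-per-language layout, and the remote_lookup callback are assumptions; the stub below stands in for a SPARQL query and performs no network access.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


class FileSystemLexicon:
    """Looks up lemmas in <root>/<lang>.json, then falls back to a remote source."""

    def __init__(self, root: Path,
                 remote_lookup: Optional[Callable[[str, str], Optional[dict]]] = None) -> None:
        self.root = root
        self.remote_lookup = remote_lookup

    def get(self, lang: str, lemma: str) -> Optional[dict]:
        path = self.root / f"{lang}.json"
        entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        if lemma in entries:
            return entries[lemma]
        if self.remote_lookup is not None:
            return self.remote_lookup(lang, lemma)
        return None


def fake_wikidata_lookup(lang: str, lemma: str) -> Optional[dict]:
    # Stand-in for a live Wikidata query; returns a canned record.
    return {"lemma": lemma, "source": "remote-stub"}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "it.json").write_text(
        json.dumps({"fisico": {"lemma": "fisico", "pos": "noun"}}), encoding="utf-8"
    )
    lexicon = FileSystemLexicon(root, remote_lookup=fake_wikidata_lookup)
    local_hit = lexicon.get("it", "fisico")    # served from the JSON file
    remote_hit = lexicon.get("it", "chimico")  # missing locally, uses the fallback

print(local_hit)
print(remote_hit)
```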

🚀 Quick Start (Docker)

The easiest way to run the full stack (API + Worker + Redis) is via Docker Compose.

1. Start the System

docker-compose up --build

Unified UI: http://localhost:4000/abstract_wiki_architect/
API Docs: http://localhost:8000/docs
Redis: localhost:6379

2. Verify Health

curl http://localhost:8000/api/v1/health/ready
# {"broker":"up", "storage":"up", "engine":"up"}
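
For scripts that wait on startup, the readiness body can be checked programmatically. This small helper assumes the response is a flat JSON object whose values are "up" when healthy, as in the sample output above; the function name is hypothetical.

```python
import json


def is_ready(body: str) -> bool:
    """Return True only if every reported component is "up"."""
    status = json.loads(body)
    return bool(status) and all(state == "up" for state in status.values())


print(is_ready('{"broker":"up", "storage":"up", "engine":"up"}'))
print(is_ready('{"broker":"down", "storage":"up", "engine":"up"}'))
```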

💻 API Usage

Instead of calling Python functions directly, clients now interact with the system through a REST API.

1. Generate Text (Synchronous)

POST /api/v1/generate/{lang_code}

curl -X POST http://localhost:8000/api/v1/generate/fra \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{
    "frame_type": "bio",
    "subject": {"name": "Marie Curie"},
    "properties": {"profession": "physicist", "nationality": "polish"}
  }'

Response: Marie Curie est une physicienne polonaise.
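
The same call can be prepared from Python with the standard library. The helper below only builds the request so it can be inspected without a running server; the function name and the base_url default are assumptions, while the path and headers follow the curl example above.

```python
import json
import urllib.request


def build_generate_request(lang_code: str, frame: dict, api_key: str,
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Prepare a POST to /api/v1/generate/{lang_code} without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/generate/{lang_code}",
        data=json.dumps(frame).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "fra",
    {
        "frame_type": "bio",
        "subject": {"name": "Marie Curie"},
        "properties": {"profession": "physicist", "nationality": "polish"},
    },
    api_key="secret",
)
print(request.get_method(), request.full_url)

# To actually send it (requires the API to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```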

2. Onboard New Language (Async Saga)

POST /api/v1/languages/

Triggers the Onboarding Saga which:

Registers the language in the system.
Scaffolds initial JSON configuration files.
Dispatches a build event to the Background Worker.

curl -X POST http://localhost:8000/api/v1/languages/ \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{"code": "zul", "name": "Zulu", "family": "Bantu"}'
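
The three saga steps can be pictured with an in-memory sketch. Everything here (OnboardingState, onboard_language, the event shape) is hypothetical; the real saga presumably persists state, publishes to Redis, and handles failure compensation, none of which is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class OnboardingState:
    registry: dict = field(default_factory=dict)     # registered languages
    configs: dict = field(default_factory=dict)      # scaffolded JSON configs
    build_queue: list = field(default_factory=list)  # events for the worker


def onboard_language(state: OnboardingState, code: str, name: str, family: str) -> dict:
    # Step 1: register the language, rejecting duplicates.
    if code in state.registry:
        raise ValueError(f"language {code!r} is already registered")
    state.registry[code] = {"name": name, "family": family}

    # Step 2: scaffold an initial (empty) configuration.
    state.configs[code] = {"code": code, "family": family, "rules": {}}

    # Step 3: dispatch a build event for the background worker to pick up.
    event = {"type": "build_requested", "code": code}
    state.build_queue.append(event)
    return event


state = OnboardingState()
print(onboard_language(state, code="zul", name="Zulu", family="Bantu"))
```

The API call can return as soon as the event is queued, which is what keeps long compilations from blocking requests.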

🛠️ Development (Local)

If you are developing core logic without Docker:

# 1. Install
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,api]"

# 2. Run API (Shim)
python -m app.main

# 3. Run Worker (requires a running Redis instance)
# The arq CLI's --watch flag reloads the worker when files under app/ change
arq app.workers.worker.WorkerSettings --watch app

🧪 Testing

We use pytest with a strict separation of Unit and Integration tests.

# Run all tests
pytest

# Run only Core Unit tests (Fast, Mocked Infrastructure)
pytest tests/core

# Run Integration tests (Requires Redis/Internet)
pytest tests/integration

🗺️ Mapping to Wikifunctions

The system includes utilities to mock Wikifunctions Z-Objects, facilitating future export.

Z-Object Mock

Located in app/shared/wikifunctions_mock.py. Wraps Python dictionaries in Z-Object structures (Z6 for strings, Z9 for references) to simulate how the Abstract Wikipedia renderer calls functions.
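
As a simplified illustration of this wrapping, the sketch below converts a Python string into the normal-form Z6 (string) or Z9 (reference) structure. The to_zobject function and the rule "anything that looks like a ZID is a reference" are assumptions for this example, not the behaviour of wikifunctions_mock.py.

```python
import re

# A ZID is "Z" followed by a positive integer, e.g. Z6 or Z1002.
ZID_PATTERN = re.compile(r"^Z[1-9]\d*$")


def to_zobject(value: str) -> dict:
    """Wrap a string as a Z9 reference if it looks like a ZID, else as a Z6 string."""
    if not isinstance(value, str):
        raise TypeError(f"only strings are supported in this sketch, got {type(value).__name__}")
    if ZID_PATTERN.match(value):
        return {"Z1K1": "Z9", "Z9K1": value}
    return {"Z1K1": "Z6", "Z6K1": value}


print(to_zobject("Marie Curie"))
print(to_zobject("Z1002"))
```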

Config Extraction

You can extract the internal JSON configurations to use as Z-Data on Wikifunctions:

python -m app.utils.config_extractor it

Outputs the Italian configuration JSON compatible with Z-Function inputs.

🌐 Related Projects & Ecosystem

Abstract Wiki Architect is designed to work in concert with a suite of tools for information deconstruction and secure exchange.

SenTient: A powerful integration of Falcon 2.0, OpenTapioca, and OpenRefine. It deconstructs information to improve system circulation and acts as the intelligence layer alongside Architect.
Orgo: A closed-loop, secure application for resilience. Architect and SenTient operate within Orgo to ensure robust internal operations. (Note: Orgo is an independent project with distinct organizational affiliations outside the scope of the Wikimedia Foundation.)
Konnaxion: The open counterpart to Orgo, focused on constructive, philanthropic exchanges solidly anchored in ethical principles.
The Senior Architect's Codex: Advanced Jupyter notebooks and utilities for AI empowerment.
Core Modules: Ariane (Navigation) and Ame-Artificielle.

🔮 Roadmap & Status

Current Status (V2.1 - Dec 2025):

✅ Hexagonal Architecture: Full separation of concerns.
✅ Async Worker: Long-running compilations no longer block the API.
✅ Unified API: One canonical entrypoint (/api/v1) for all clients.
✅ Biography Generation: BioFrame supported across Romance and Germanic families.
✅ Dockerized: One-command deploy.

Upcoming:

LLM Refiner: Post-processing step to smooth rule-based output.
Web UI: Next.js frontend for managing languages (In Progress).
Observability: OpenTelemetry tracing.

🔗 Links

Repository: github.com/Rejean-McCormick/abstract-wiki-architect
Wiki: Architecture deep dives and frame definitions.
Meta-Wiki: Abstract Wikipedia Tools Hub

