Making Contract Review Accessible to Everyone Through AI
Petros Raptopoulos, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou
Try PAKTON | View Evaluation and Experiments | Read Paper | View Poster | View Recording | Underline
Reviewing contracts is often slow, complex, and requires expert legal knowledge. Legal language can be vague and open to interpretation, making it hard for non-experts to understand. On top of that, contracts are usually private, which limits the use of proprietary AI tools and calls for open-source solutions.
PAKTON addresses these problems with an open-source, end-to-end framework for automated contract review. It uses a team of collaborating LLM agents, combined with retrieval-augmented generation (RAG), to make legal document analysis easier, more private, and customizable.
PAKTON was published at the Main Conference of EMNLP 2025 and presented orally by Petros Raptopoulos.
Live deployed version at pakton.site
⚠️ Important note: The deployed version and the code currently in the repository are missing a few components that will be added shortly. These updates are being organized to ensure a clean and robust push.
PAKTON employs a sophisticated multi-agent architecture that orchestrates specialized AI agents to handle different aspects of contract analysis. The framework leverages collaborative agent workflows combined with advanced retrieval-augmented generation (RAG) to provide comprehensive, accurate, and explainable contract review.
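The collaborative agent workflow described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class and method names (`ask`, `retrieve`, `report`) are hypothetical, and a toy word-overlap ranking stands in for the real RAG retrieval performed by the Researcher component.

```python
# Minimal sketch of a PAKTON-style multi-agent review pipeline.
# All class and method names are illustrative, not the real API.
from dataclasses import dataclass

@dataclass
class Finding:
    question: str
    passages: list   # contract excerpts retrieved as evidence
    answer: str = ""

class Interrogator:
    """Turns the user's concern into a focused question."""
    def ask(self, concern: str) -> str:
        return f"Which clauses govern: {concern}?"

class Researcher:
    """RAG component: retrieves the most relevant contract passages."""
    def __init__(self, contract_chunks: list):
        self.chunks = contract_chunks

    def retrieve(self, query: str, k: int = 2) -> list:
        q_words = set(query.lower().split())
        # Rank chunks by word overlap with the query (a toy stand-in
        # for the embedding-based retrieval a real Researcher would use).
        return sorted(self.chunks,
                      key=lambda c: -len(q_words & set(c.lower().split())))[:k]

class Archivist:
    """Compiles retrieved evidence into an explainable report."""
    def report(self, finding: Finding) -> str:
        cited = "; ".join(finding.passages)
        return f"Q: {finding.question}\nEvidence: {cited}"

def review(contract_chunks: list, concern: str) -> str:
    question = Interrogator().ask(concern)
    passages = Researcher(contract_chunks).retrieve(question)
    return Archivist().report(Finding(question, passages))
```

For example, `review(["Termination requires 30 days notice.", "Fees are due monthly."], "termination notice")` returns a short report that cites the termination clause as evidence, which mirrors how the agents divide the work: one asks, one retrieves, one explains.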
We evaluated PAKTON using both qualitative and quantitative methods to ensure its effectiveness in real-world legal tasks. You can explore all experiment results and details at https://pakton.site/evaluation
- Experiments Overview - Complete evaluation framework and methodology
- Human Evaluation - Human assessment methodology and results
- GEVAL Assessment - Automated qualitative evaluation using LLM-as-a-judge
- Statistical Agreement - Statistical validation of alignment between LLM and human evaluations
- ContractNLI Classification - Classification performance on the ContractNLI dataset
- LegalBenchRAG Performance - Retrieval performance on the LegalBenchRAG benchmark
- Superior Generation Quality: Outperforms baseline methods on the ContractNLI dataset
- State-of-the-Art Retrieval: RAG component (Researcher) leads performance on LegalBenchRAG benchmark
- Human-Preferred: Chosen by human evaluators over ChatGPT for contract analysis, especially for Explainability and Completeness
- LLM Validation: GEVAL evaluations show consistent preference for PAKTON over GPT-4o
- Statistical Validation: Strong statistical agreement (cosine similarity 0.88-0.92) between automated and human evaluation methods confirms reliability of assessment results
- Privacy-First: Fully open-source with on-premise deployment capabilities
- Robust: According to our robustness analysis, it bridges performance gaps between small and large LLMs, enabling smaller open-source models to rival larger proprietary ones
- Plug-and-Play: Modular architecture for seamless extension and custom workflow integration
- Transparent Design: Explainable outputs that contrast with typical black-box AI models
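As a concrete illustration of the agreement metric cited above, the sketch below computes cosine similarity between a vector of human ratings and a vector of LLM-judge ratings for the same set of answers. The scores here are made up for demonstration; the actual evaluation data lives under the Statistical Agreement directory.

```python
# Illustrative cosine-similarity check between human and LLM-judge
# score vectors (the numbers below are hypothetical, not real results).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

human_scores = [4, 5, 3, 4, 5]   # hypothetical human ratings per answer
llm_scores   = [4, 4, 3, 5, 5]   # hypothetical GEVAL ratings per answer

agreement = cosine_similarity(human_scores, llm_scores)
```

A value near 1.0 means the two raters scored the answers almost identically in direction; the paper's reported range of 0.88-0.92 indicates strong but not perfect alignment.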
```
PAKTON/
├── LICENSE                            # License information
├── README.md                          # This file
├── CONTRIBUTING.md                    # Contributing guidelines
├── Docs/                              # Documentation and research papers
│   ├── ACL_Anthology_version.pdf      # ACL Anthology published version
│   ├── EMNLP 2025_Poster.pdf          # Conference poster
│   └── Preprint_May_25.pdf            # Research preprint
├── deployment/                        # Deployment configurations
│   ├── development/                   # Development environment configs
│   ├── production/                    # Production environment configs
│   └── nginx/                         # Nginx server configurations
├── PAKTON Framework/                  # Core framework implementation
│   ├── API/                           # Backend API service
│   ├── Archivist/                     # Archivist agent implementation
│   ├── Interrogator/                  # Interrogator agent implementation
│   ├── Researcher/                    # Researcher agent (RAG component)
│   └── Frontend/                      # Frontend applications
├── Experiments and Evaluation/        # All experimental work and evaluation
│   ├── Frontend/                      # Frontend for experiments visualization
│   ├── Qualitative/                   # Qualitative evaluation methods
│   │   ├── Human Evaluation/          # Human assessment results
│   │   ├── LLM as a judge - GEVAL/    # Automated evaluation using GEVAL
│   │   └── Statistical Agreement/     # Statistical validation between LLM and human evaluations
│   └── Quantitative/                  # Quantitative performance evaluation
│       ├── Classification Performance - ContractNLI/  # ContractNLI experiments
│       └── RAG Performance - LegalBenchRAG/           # LegalBenchRAG experiments
└── Machine Learning Experimentation/  # Additional ML experiments (not mentioned in the paper)
```
PAKTON is dedicated to making contractual obligations clearer and more accessible to everyone. We believe in the power of community-driven development and welcome contributions of all kinds: ideas, code, and feedback.
Join our vibrant Discord community where developers, researchers, and legal tech enthusiasts come together to:
- Share ideas and get instant feedback
- Troubleshoot and solve implementation challenges
- Find collaborators for new features and research
- Stay ahead with the latest updates and releases
Whether you're fixing bugs, adding features, improving documentation, or sharing use cases, your contribution matters! To get started, please review our Contributing Guidelines.
Ways to contribute:
- Report bugs and issues
- Suggest new features or improvements
- Improve documentation
- Submit pull requests
- Help with translations and accessibility
- Share PAKTON with others who might benefit
This project is licensed under the terms specified in the LICENSE file.
Democratizing contract analysis
⭐ Star this repository if PAKTON helped you!



