An End-to-End AI agent system for public procurement analysis, achieving a 96.4% F1-score by leveraging a fine-tuned RoBERTa, RAG, and LLMs. This project demonstrates a full-cycle development process from ideation and data collection to model optimization and API deployment.
This project was born from a real-world business challenge: the inefficient and error-prone process of manually reviewing hundreds of public procurement bids daily at a civil engineering firm. This manual process not only consumed significant human resources but also risked missing critical business opportunities.
This repository documents the journey of building an End-to-End solution single-handedly. It covers everything from data collection and refinement and core AI model development and optimization, to the final deployment of a real-time, production-ready API.
The project culminates in two key deliverables:
- A Stable Production API: A highly optimized dual-model system built for speed, stability, and immediate business integration.
- A High-Performance R&D Agent: An intelligent agent system designed to push the boundaries of accuracy, mimicking the decision-making process of a human expert.
The entire project is structured as a systematic pipeline, flowing from data foundation to model development, optimization, and final service delivery.
graph TD
subgraph "Phase 1: Data Foundation"
A[RAW DATA<br/>Procurement Bid Collection] -->|Cleanse & Standardize| B[High-Quality Labeled Dataset]
B -->|Stratified Sampling| C[Train/Valid/Test Sets]
end
subgraph "Phase 2: Core AI Model Development"
C --> D[Train Primary Review AI<br/>Binary Classifier: FT RoBERTa + LoRA]
C --> E[Train Secondary Review AI<br/>Multi-Class Classifier: FT RoBERTa + LoRA]
C --> F[Build Knowledge Base<br/>Semantic Search DB: SBERT + Faiss]
end
subgraph "Phase 3: Production Optimization"
D --> G[Model Quantization & Optimization<br/>ONNX Conversion + INT8 Quantization]
E --> G
G --> H[Optimized Models<br/>2.35× Faster, 75% Smaller]
end
H --> I[Strategic Fork]
subgraph "Path 1: Production API System"
I --> J[Focus: Speed & Stability<br/>Build Real-time Service<br/>Dual-Model API with FastAPI]
J --> K[FINAL API SERVICE<br/>POST /classify_batch<br/>Delivers reliable business value]
end
subgraph "Path 2: R&D Agent System"
I --> L[Focus: Peak Accuracy<br/>Build Expert Committee AI<br/>3-Agent System: FT Model・RAG・LLM]
F --> L
D --> L
L --> M[Ablation Study<br/>FT Model vs. Agent System]
M --> N[Achieved 0.964 F1-Score<br/>Proved 88% error reduction]
end
%% Styling
style A fill:#0277BD,color:#fff,stroke:#F5F5F5,stroke-width:2px
style B fill:#039BE5,color:#fff,stroke:#F5F5F5,stroke-width:2px
style C fill:#29B6F6,color:#000,stroke:#333,stroke-width:2px
style D fill:#388E3C,color:#fff,stroke:#F5F5F5,stroke-width:2px
style E fill:#4CAF50,color:#fff,stroke:#F5F5F5,stroke-width:2px
style F fill:#66BB6A,color:#000,stroke:#333,stroke-width:2px
style G fill:#F9A825,color:#000,stroke:#333,stroke-width:2px
style H fill:#FBC02D,color:#000,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style I fill:#8E24AA,color:#fff,stroke:#F5F5F5,stroke-width:4px
style J fill:#D32F2F,color:#fff,stroke:#F5F5F5,stroke-width:2px
style K fill:#F44336,color:#fff,stroke:#F5F5F5,stroke-width:4px
style L fill:#303F9F,color:#fff,stroke:#F5F5F5,stroke-width:2px
style M fill:#3F51B5,color:#fff,stroke:#F5F5F5,stroke-width:2px
style N fill:#5C6BC0,color:#fff,stroke:#F5F5F5,stroke-width:3px,stroke-dasharray: 5 5
This project evolved in two stages: first, building a robust automation system, and second, proving the potential of a more intelligent expert system that overcomes the limitations of the first.
- Reliable Task Handling: A core engine was developed by fine-tuning
klue/roberta-large, achieving a high F1-Score of 0.97 in determining bid eligibility. This model forms the backbone of the fast and reliable production API.
- Mimicking Human Decision-Making: Simple fine-tuned models can be brittle on ambiguous, boundary-case problems. To solve this, a 3-agent system was designed, where a Fine-Tuned Model (Initial Review) + RAG (Case Search) + LLM (Final Judgment) collaborate to make a decision.
- Breakthrough Performance: This agent-based approach dramatically improved judgment accuracy on complex cases, boosting the overall system's F1-Score from a baseline of 0.7045 to 0.9639. This represents an 88% reduction in the error rate compared to the standalone fine-tuned model on challenging data.
A powerful model is useless if it's not fast and efficient in a real-world service environment. Targeting a CPU-only server deployment, the trained PyTorch models were optimized through ONNX conversion and INT8 quantization.
(Insert your quantization performance comparison table here.)
| Metric | FP32 PyTorch (Baseline) | INT8 ONNX (Quantized) | Delta (Change) |
|---|---|---|---|
| F1-Score | 0.9719 | 0.9739 | +0.0020 (+0.20%) |
| Model Size (MB) | 1280.75 | 323.24 | -74.76% |
| Latency (ms) | 103.98 | 44.15 | -57.54% (2.35x faster) |
Analysis: Remarkably, the optimization process resulted in zero performance degradation—in fact, the F1-Score slightly increased. We achieved a 2.35x speedup in inference and a 75% reduction in model size, making the system perfectly viable for a cost-effective CPU-only API service.
The project is designed with a modular, role-based structure for clarity and scalability.
Bid-Analysis-Agent/
├── src/
│ ├── core_training/ # The "Brain Factory": Scripts for training core models
│ ├── production_api/ # The "Workplace": Production-ready API logic
│ ├── research_agent_system/ # The "Lab": R&D agent system logic
│ └── shared_utils/ # The "Heart": Common utilities (data, models, etc.)
├── data/
├── notebooks/ # EDA and initial experiments
├── output/ # All artifacts: models, results, vector DB
├── config.py
└── README.md
- Setup Environment:
pip install -r requirements.txt
- Process Data:
python src/shared_utils/data_processing.py
- Train Core Models:
python src/core_training/train_binary.py python src/core_training/train_multiclass.py
- Run API Server:
python src/production_api/main.py
- Language: Python
- Core Libraries: PyTorch, Transformers, PEFT(LoRA), LangGraph
- API & Deployment: FastAPI, Uvicorn, ONNX Runtime
- Data Handling: Pandas, Faiss, Scikit-learn
- Environment: Conda
This project is licensed under the MIT License. See the LICENSE file for details.
- Harim Choi (HarimxChoi)
- Email: 2.harim.choi@gmail.com