|
| 1 | +# Hybrid RAG with Critic–Refiner Workflow (Qwen2.5 + LAmini) |
| 2 | + |
| 3 | +## 1. 🎯 Goal |
| 4 | + |
| 5 | +This project implements a **Retrieval-Augmented Generation (RAG)** pipeline enhanced |
| 6 | + with a **dual-stage Critic–Refiner architecture**. |
| 7 | + |
| 8 | +The main objective was to create a **highly accurate, context-grounded, and reliable |
| 9 | + question-answering system**, combining: |
| 10 | + |
| 11 | +- **Qwen2.5-7B-Instruct** (cloud-based Critic) |
| 12 | +- **LAmini (local GGUF model)** (Refiner) |
| 13 | +- **LlamaIndex** (retrieval engine) |
| 14 | + |
| 15 | +The system rigorously evaluates draft answers using a critic model, detects |
| 16 | +factual errors or missing context, and then rewrites them using a local refiner |
| 17 | + model. |
| 18 | +This produces answers that are **trustworthy**, **grounded**, and **fully derived |
| 19 | + from source documents**. |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## 2. 🤖 About the Models Used |
| 24 | + |
| 25 | +### 2.1 Qwen2.5-7B-Instruct (Critic Model) |
| 26 | + |
| 27 | +Qwen2.5-7B is a powerful instruction-tuned LLM developed by Alibaba Cloud. |
| 28 | +It was chosen as the **Critic** for these reasons: |
| 29 | + |
| 30 | +- **High factual reliability:** Qwen models consistently score highly on truthfulness |
| 31 | +  and instruction-following benchmarks. |
| 32 | +- **Ideal for evaluation:** Served as a cloud-based model through the Hugging Face Inference API, |
| 33 | +  it is fast, stable, and accurate. |
| 34 | +- **Excellent reasoning capabilities:** Well suited to evaluating alignment between |
| 35 | +  retrieved context and generated draft answers. |
| 36 | + |
| 37 | +### 2.2 LAmini (Local Refiner Model) |
| 38 | + |
| 39 | +LAmini is a compact, efficient, open-source model designed for rewriting and |
| 40 | +stylistic refinement. |
| 41 | +It was selected as the **Refiner** because: |
| 42 | + |
| 43 | +- **Small and fast:** Runs comfortably on consumer hardware in `.gguf` format. |
| 44 | +- **Excellent at rewriting:** Ideal for polishing or correcting drafts based on |
| 45 | +  reviewer feedback. |
| 46 | +- **Local privacy:** No online requests; all refinement happens locally. |
| 47 | +- **Lightweight:** Fits the project's goal of low-cost, local execution. |
| 48 | + |
| 49 | +### 2.3 Why a Critic–Refiner System? |
| 50 | + |
| 51 | +This architecture ensures: |
| 52 | + |
| 53 | +- The **Critic** checks for correctness, consistency, and missing facts. |
| 54 | +- The **Refiner** rewrites only the necessary corrections. |
| 55 | +- The workflow aims to minimize hallucinations and keep answers grounded in the source. |
| 56 | + |
| 57 | +This structure is heavily inspired by **self-correcting LLM systems** and |
| 58 | + **Human-in-the-Loop editorial workflows**, but automated. |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## 3. 🛠️ Methodology: Retrieval-Augmented Generation (RAG) |
| 63 | + |
| 64 | +To answer questions based on documents not included in the LLM’s training data, |
| 65 | + RAG augments the model’s knowledge using retrieval. |
| 66 | + |
| 67 | +The pipeline works as follows: |
| 68 | + |
| 69 | +1. **Retrieval:** |
| 70 | + User question → Convert to embedding → Search vector index → Retrieve relevant |
| 71 | + text chunks. |
| 72 | + |
| 73 | +2. **Draft Generation:** |
| 74 | + The retrieved context + question are used to generate a **draft answer**. |
| 75 | + |
| 76 | +3. **Critic Evaluation (Qwen2.5):** |
| 77 | + The critic compares the draft answer against the retrieved context and returns: |
| 78 | + - `[OK]` — Draft is accurate |
| 79 | + - `[REVISE]` — Draft contains errors/missing info |
| 80 | + - plus a bulleted list of required corrections. |
| 81 | + |
| 82 | +4. **Refinement (LAmini):** |
| 83 | + LAmini rewrites the draft based **only on the critic’s feedback**, producing |
| 84 | + the final polished answer. |
| 85 | + |
| 86 | +This ensures accuracy and consistency with the source documents. |
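
The four steps above can be sketched as a small control loop. This is a minimal sketch rather than the project's actual code: `run_pipeline`, `max_cycles`, and the four callables are hypothetical names, and the models are replaced by plain Python functions so the flow runs without any model. It also assumes that a `[REVISE]` verdict sends the draft to the refiner and that the result is re-checked for a bounded number of cycles.

```python
def run_pipeline(question, retrieve, draft, critique, refine, max_cycles=2):
    """Retrieve context, draft an answer, then critique and refine it."""
    context = retrieve(question)            # step 1: retrieval
    answer = draft(question, context)       # step 2: draft generation
    for _ in range(max_cycles):
        verdict, feedback = critique(answer, context)   # step 3: critic
        if verdict == "[OK]":
            break
        answer = refine(answer, feedback)   # step 4: refinement
    return answer


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_retrieve(question):
    return ["The alarms were 1201 and 1202."]

def toy_draft(question, context):
    return "There were some alarms."

def toy_critique(answer, context):
    if "1202" in answer:
        return "[OK]", []
    return "[REVISE]", ["Name the alarms (1201 and 1202)."]

def toy_refine(answer, feedback):
    return "The 1201 and 1202 program alarms appeared."


final = run_pipeline("What alarms appeared?", toy_retrieve, toy_draft,
                     toy_critique, toy_refine)
print(final)
```

Bounding the loop with `max_cycles` keeps the cost predictable when the critic never returns `[OK]`.
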
| 87 | + |
| 88 | +### Implementation Details |
| 89 | + |
| 90 | +- **Framework:** `LlamaIndex` |
| 91 | +- **Local Model Loader:** `llama-cpp-python` |
| 92 | +- **Embedding Model:** `HuggingFaceEmbedding` (e.g., BAAI/bge-small) |
| 93 | +- **Critic Model:** `Qwen/Qwen2.5-7B-Instruct` via Hugging Face Inference API |
| 94 | +- **Refiner Model:** `LAmini-Chat` in `.gguf` format |
| 95 | +- **Energy Tracking:** CodeCarbon (`OfflineEmissionsTracker`) |
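
As a rough illustration of what the embedding model and vector index do together during retrieval, the sketch below ranks chunks by cosine similarity to a query vector. The vectors are hand-written toy values, not real embeddings, and `cosine` and `top_k` are hypothetical helpers. In the project this work is delegated to LlamaIndex.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the texts of the k chunks most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy index of (chunk text, embedding) pairs; values are made up for illustration.
index = [
    ("chunk about program alarms", [0.9, 0.1, 0.0]),
    ("chunk about fuel levels", [0.1, 0.9, 0.0]),
    ("chunk about crew training", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]
hits = top_k(query_vec, index, k=2)
print(hits)
```
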
| 96 | + |
| 97 | +--- |
| 98 | + |
| 99 | +## 4. 📑 Prompt Engineering: The Editorial Workflow |
| 100 | + |
| 101 | +### 4.1 Critic Prompt |
| 102 | + |
| 103 | +The Critic acts like a strict editor. |
| 104 | + |
| 105 | +It must: |
| 106 | + |
| 107 | +- Judge the draft answer |
| 108 | +- Compare it with the source context |
| 109 | +- Output `[OK]` or `[REVISE]` |
| 110 | +- Provide bullet-point feedback only when necessary |
| 111 | + |
| 112 | +Example behavior: |
| 113 | + |
| 114 | +> [REVISE] |
| 115 | +> |
| 116 | +> - The draft added information not found in the source context. |
| 117 | +> - Missing key fact about X. |
| 118 | + |
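
One possible way to consume this output format is a small parser that separates the verdict from the bullet feedback. This is a minimal sketch: `parse_critic_output` is a hypothetical helper, and it assumes the critic puts the tag on the first line and one correction per `-` bullet. Treating any response that does not start with `[OK]` as a revision request is a design assumption, not documented behavior.

```python
def parse_critic_output(text):
    """Split critic output into a verdict tag and a list of corrections."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines and lines[0].startswith("[OK]"):
        return "[OK]", []
    # Anything else is treated as a revision request (assumption).
    feedback = [line[1:].strip() for line in lines if line.startswith("-")]
    return "[REVISE]", feedback


sample = """[REVISE]
- The draft added information not found in the source context.
- Missing key fact about X.
"""
verdict, feedback = parse_critic_output(sample)
print(verdict, feedback)
```

Defaulting to `[REVISE]` errs on the side of caution when the critic omits the tag and returns only bullets.
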
| 119 | +### 4.2 Refiner Prompt (LAmini) |
| 120 | + |
| 121 | +The Refiner receives: |
| 122 | + |
| 123 | +- Draft answer |
| 124 | +- Editor (Critic) feedback |
| 125 | + |
| 126 | +It rewrites the answer accordingly, following strict rules: |
| 127 | + |
| 128 | +- Only fix issues the Critic highlighted |
| 129 | +- No new information allowed |
| 130 | +- Must produce a complete final answer |
| 131 | + |
| 132 | +This avoids adding hallucinations and ensures correctness. |
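
The rules above could be encoded in a prompt template along the following lines. The wording is illustrative only and is not the project's actual prompt; `REFINER_RULES` and `build_refiner_prompt` are hypothetical names.

```python
# Illustrative rule text (assumption), mirroring the three rules listed above.
REFINER_RULES = (
    "Only fix the issues listed in the editor feedback.",
    "Do not add any new information.",
    "Return a complete final answer.",
)

def build_refiner_prompt(draft, feedback):
    """Assemble a refiner prompt from the draft and the critic's bullet feedback."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(REFINER_RULES, start=1))
    notes = "\n".join(f"- {item}" for item in feedback)
    return (
        "You are rewriting a draft answer based on editor feedback.\n\n"
        f"Rules:\n{rules}\n\n"
        f"Draft answer:\n{draft}\n\n"
        f"Editor feedback:\n{notes}\n\n"
        "Final answer:"
    )


prompt = build_refiner_prompt(
    "There were some alarms.",
    ["Name the alarms (1201 and 1202)."],
)
print(prompt)
```
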
| 133 | + |
| 134 | +--- |
| 135 | + |
| 136 | +## 5. 📊 Sample Workflow (Prompts & Responses) |
| 137 | + |
| 138 | +The example below traces one summarization prompt through two critic cycles and the final refinement. |
| 139 | + |
| 140 | +### Example: [Summarization] |
| 141 | + |
| 142 | +- **Prompt:** |
| 143 | + > In your own words, describe what happened when the computer alarms |
| 144 | + appeared during the landing. |
| 145 | + |
| 146 | +### **Cycle 1:** |
| 147 | + |
| 148 | +- **Draft Answer:** |
| 149 | + >As the descent began, Armstrong and Aldrin found themselves |
| 150 | + passing landmarks on the surface two or three seconds early. The problem could |
| 151 | + have been mascons, concentrations of high mass in a region or regions of the |
| 152 | + Moon's crust that contains a gravitational anomaly, potentially altering Eagle's |
| 153 | + trajectory. Five minutes into the descent, the burn and 6,000 feet (1,800 m) |
| 154 | + above the surface of the moon, the LM guidance computer (LGC) distracted the |
| 155 | + crew with the first of several unexpected 1201 and 1202 program alarms. Inside |
| 156 | + Mission Control Center, computer engineer Jack Garman told Guidance Officer |
| 157 | + Steve Bales it was safe to continue the descent. The program alarm indicated |
| 158 | + "executive overflows," meaning the guidance computer could not complete all its |
| 159 | + tasks in real-time and had to postpone some of them. Margaret Hamilton, the |
| 160 | + Director of Apollo Flight Computer Programming at the MIT Charles Stark Draper |
| 161 | + Laboratory later recalled: "To blame the computer for the Apollo 11 problems is |
| 162 | + like blaming the person who spots a fire and calls the fire department. |
| 163 | + Actually, the computer was programmed to do more than recognize error |
| 164 | + conditions. A complete set of recovery programs was incorporated into the |
| 165 | + software. The software's action, in this case, was to eliminate lower priority |
| 166 | + tasks and re-establish the more important ones. The computer, rather than almost |
| 167 | + forcing an abort, prevented an abourt. If the computer hadn't recognized this |
| 168 | + problem and took recovery action, I doubt if Apollo 11 would have been the |
| 169 | + successful Moon landing it was. |
| 170 | + |
| 171 | +- **Critic Response:** |
| 172 | + |
| 173 | + >- The draft does not accurately describe the computer alarms that appeared |
| 174 | + during the landing. The draft should include the specific computer alarms |
| 175 | + (1201 and 1202) and the crew's reaction to them. |
| 176 | + >- The draft does not mention the specific actions taken by the crew in |
| 177 | + response to the computer alarms, such as Armstrong taking semi-automatic control. |
| 178 | + >- The draft does not include the details about Mission Control's role in |
| 179 | + addressing the alarms and the reassurance given to the crew. |
| 180 | + >- The draft does not mention the post-mission analysis showing the real fuel |
| 181 | + remaining was probably closer to 50 seconds, which is critical information. |
| 182 | + |
| 183 | +### **Cycle 2:** |
| 184 | + |
| 185 | +- **Draft Answer:** |
| 186 | + >The LM guidance computer (LGC) distracted the crew with the first |
| 187 | + of several unexpected 1201 and 1202 program alarms during the Apollo 11 mission. |
| 188 | + The program alarm indicated "executive overflows," meaning the guidance computer |
| 189 | + could not complete all its tasks in real-time and had to postpone some of them. |
| 190 | + The recovery programs were incorporated into the software, eliminating lower |
| 191 | + priority tasks and re-establishing the more important ones. Margaret Hamilton, |
| 192 | + the Director of Apollo Flight Computer Programming at the MIT Charles Stark |
| 193 | + Draper Laboratory later recalled that to blame the computer for the Apollo 11, |
| 194 | + problems is like blaming the person who spots a fire and calls the fire |
| 195 | + department. Actually, the computer was programmed to do more than recognize |
| 196 | + error conditions. The software's action, in this case, was to eliminate lower |
| 197 | + priority jobs and regain the more significant ones. If the computer hadn't |
| 198 | + recognized this problem and took recovery action, I doubt if Apollo 11 would |
| 199 | + have been the successful Moon landing it was. |
| 200 | + |
| 201 | +- **Critic Response:** |
| 202 | + |
| 203 | + >- The draft does not mention the specific computer alarms (1201 and 1202) |
| 204 | + that appeared during the landing. |
| 205 | + >- The draft does not reference the exact timing of the alarms (five minutes |
| 206 | + into the descent, 6,000 feet above the surface). |
| 207 | + >- The draft does not include the specific actions taken by Mission Control |
| 208 | + to address the alarms. |
| 209 | + >- The draft does not mention the critical role of the guidance computer in |
| 210 | + safely continuing the descent despite the alarms. |
| 211 | + >- The draft does not include the perspective of the astronauts experiencing |
| 212 | + the alarms. |
| 213 | + >- The draft does not provide the exact quote from Jack Garman to Steve Bales. |
| 214 | + >- The draft does not mention Margaret Hamilton's role or her specific |
| 215 | + recollection of the event. |
| 216 | + |
| 217 | +- **Refined Answer (Final):** |
| 218 | + |
| 219 | + >The LM guidance computer (LGC) distracted the crew with the first |
| 220 | + of several unexpected 1201 and 1202 program alarms during the Apollo 11 mission. |
| 221 | + The program alarm indicated "executive overflows," meaning the guidance computer |
| 222 | + could not complete all its tasks in real-time and had to postpone some of them. |
| 223 | + The recovery programs were incorporated into the software, eliminating lower |
| 224 | + priority tasks and re-establishing the more important ones. Margaret Hamilton, |
| 225 | + the Director of Apollo Flight Computer Programming at the MIT Charles Stark |
| 226 | + Draper Laboratory later recalled that to blame the computer for the Apollo 11, |
| 227 | + problems is like blaming the person who spots a fire and calls the fire |
| 228 | + department. Actually, the computer was programmed to do more than recognize |
| 229 | + error conditions. The software's action, in this case, was to eliminate lower |
| 230 | + priority jobs and regain the more significant ones. If the computer hadn't |
| 231 | + recognized this problem and took recovery action, I doubt if Apollo 11 would |
| 232 | + have been the successful Moon landing it was. |
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## 6. 🌱 Environmental Tracking |
| 237 | + |
| 238 | +We used **CodeCarbon** to measure local compute emissions and energy usage. |
| 239 | + |
| 240 | +This enables: |
| 241 | + |
| 242 | +- Transparency regarding energy cost |
| 243 | +- Comparison with API-based approaches |
| 244 | +- Understanding the environmental impact of running on local hardware |
| 245 | + |
| 246 | +--- |
| 247 | + |
| 248 | +## 7. 📚 References (Reputable Sources) |
| 249 | + |
| 250 | +All documentation used: |
| 251 | + |
| 252 | +- Hugging Face Inference API |
| 253 | + <https://huggingface.co/docs/api-inference> |
| 254 | + |
| 255 | +- LlamaIndex Documentation |
| 256 | + <https://docs.llamaindex.ai> |
| 257 | + |
| 258 | +- LAmini Models |
| 259 | + <https://huggingface.co/LinkSoul/LAmini-Chat> |
| 260 | + |
| 261 | +- Qwen2.5 Models |
| 262 | + <https://huggingface.co/Qwen> |
| 263 | + |
| 264 | +- LlamaCPP / GGUF Models |
| 265 | + <https://github.com/ggerganov/llama.cpp> |
| 266 | + |
| 267 | +- CodeCarbon |
| 268 | + <https://mlco2.github.io/codecarbon/> |
| 269 | + |
| 270 | +--- |
| 271 | + |
| 272 | +## 8. ✅ Summary |
| 273 | + |
| 274 | +This project demonstrates a powerful hybrid RAG architecture that blends cloud |
| 275 | + reasoning and local refinement. |
| 276 | +The Critic–Refiner pipeline is intended to increase accuracy, reduce |
| 277 | + hallucinations, and keep answers faithful to the source documents. |
| 278 | + |
| 279 | +LAmini provides fast, private, offline rewriting, while Qwen2.5 handles |
| 280 | + factual evaluation of each draft. |
| 281 | + |
| 282 | +Together, they form a reliable, cost-efficient, and production-ready RAG system. |