| 1 | +--- |
| 2 | +title: "2025 Year-End Reflections: CompilerResearch Group (Personal Perspective)" |
| 3 | +layout: post |
| 4 | +excerpt: | |
| 5 | + A personal look at 2025, when we moved from prototypes to infrastructure people |
| 6 | + can run on real code: shipping Clad, maturing CppInterOp, and growing a community. |
| 7 | + This recap focuses on the engineering, mentorship, and cross-domain impact |
| 8 | + that made that shift possible. |
| 9 | +sitemap: true |
| 10 | +author: Vassil Vassilev |
| 11 | +permalink: blogs/cr25_recap/ |
| 12 | +banner_image: /images/blog/vv-2025-recap.webp |
| 13 | +date: 2026-01-07 |
| 14 | +tags: [2025, recap, year-in-review, compiler-research, open-source, community] |
| 15 | +--- |
| 16 | + |
| 17 | +If 2024 was the year we sketched the map, 2025 was the year we started paving |
| 18 | +roads on it: not always smoothly, but in places where people actually needed to |
| 19 | +walk. We deliberately shifted from "research prototypes" to "infrastructure |
| 20 | +people can try on real code": releases you can install, tape designs that |
| 21 | +survive long runs, device-side building blocks for gradients, and interactive |
| 22 | +C++ that you can step through in a notebook. That shift was the story of the |
| 23 | +year: technical grit plus mentorship, repeated many times over. |
| 24 | + |
| 25 | +The year did not feel dramatic while we were living it. There was no single |
| 26 | +breakthrough moment, no clean narrative arc. Instead, there were releases that |
| 27 | +almost worked, benchmarks that failed for reasons we didn’t yet understand, and |
| 28 | +long stretches where progress looked like deleting code rather than adding it. |
| 29 | + |
| 30 | +And yet, by the end of the year, something had shifted. People were no longer |
| 31 | +asking *“can this work?”*; they were asking *“how do I use this?”* That quiet |
| 32 | +transition from possibility to expectation defined 2025 for |
| 33 | +compiler-research.org. |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Differentiable Programming |
| 38 | + |
| 39 | +Clad had been a compelling idea for years: automatic differentiation implemented |
| 40 | +*by the compiler itself*, operating directly on C++ ASTs rather than via |
| 41 | +operator overloading or purely runtime tapes. The more features we |
| 42 | +added, the more we saw the benefits of a compiler-based AD system being a |
| 43 | +first-class citizen in static languages. |
| 44 | + |
| 45 | +Our integration efforts demonstrated significant speedups in various physics |
| 46 | +workflows based on the RooFit system -- up to 10x faster likelihood evaluation, |
| 47 | +which sped up people's workflows without them changing a single line of |
| 48 | +code[\[0\]][ref0]! |
| 49 | + |
| 50 | +### From promising ideas to software that breaks loudly |
| 51 | + |
| 52 | +In 2025 we finally had to pay the cost of our ambitious plan. Putting Clad into |
| 53 | +the hands of users forced us to confront problems we had been politely ignoring: |
| 54 | +tape memory pressure, allocation churn, subtle thread-safety interactions with |
| 55 | +OpenMP, multi-platform packaging, and a hundred ways in which generated code can |
| 56 | +be correct in theory and brittle in practice. The striking lesson of the year |
| 57 | +was that *theory is cheap; engineering is expensive*. |
| 58 | + |
| 59 | +So we did the expensive work. Aditi Milind Joshi rethought the tape. Together |
| 60 | +with Parth Arora, they introduced layered (slab) allocation and small-buffer |
| 61 | +optimizations so that tiny computations stay stack-local while larger workloads |
| 62 | +spill into contiguous heap slabs -- lowering allocation overhead and improving |
| 63 | +cache utilization for the backward pass[\[1\]][ref1]. Petro Zarytskyi reworked |
| 64 | +scheduling so reverse passes do less redundant work and produce smaller, more |
| 65 | +stable adjoint code[\[2\]][ref2]. Those are boring sentences on a page, but |
| 66 | +they make the difference between a demo and a run that works efficiently on a |
| 67 | +real dataset. Galin Bistrev worked on the adoption of automatic differentiation |
| 68 | +in CMS Combine[\[11\]][ref11]. |
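
To make the tape redesign above more concrete, here is a minimal sketch of a tape that combines a small inline buffer with fixed-size heap slabs. It is an illustration only, not Clad's actual implementation: the class name `SlabTape`, its template parameters, and the `heap_slabs()` helper are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration; this is NOT Clad's real tape type.
// The first InlineN values live inside the object itself (stack-local when the
// tape is a local variable); later values go into fixed-size heap slabs.
template <typename T, std::size_t InlineN = 8, std::size_t SlabN = 64>
class SlabTape {
  std::array<T, InlineN> inline_{};          // small-buffer storage
  std::vector<std::unique_ptr<T[]>> slabs_;  // contiguous heap slabs
  std::size_t size_ = 0;

  // Map a logical index to its storage location.
  T& slot(std::size_t i) {
    if (i < InlineN) return inline_[i];
    const std::size_t j = i - InlineN;
    return slabs_[j / SlabN][j % SlabN];
  }

public:
  // Forward pass: record a value the backward pass will need.
  void push(const T& v) {
    // Allocate a new slab only when the next element starts one we lack.
    if (size_ >= InlineN && (size_ - InlineN) / SlabN == slabs_.size())
      slabs_.push_back(std::make_unique<T[]>(SlabN));
    slot(size_++) = v;
  }

  // Backward pass: restore values in reverse (LIFO) order.
  T pop() {
    assert(size_ > 0 && "pop from empty tape");
    return slot(--size_);
  }

  std::size_t size() const { return size_; }
  std::size_t heap_slabs() const { return slabs_.size(); }
};
```

In this sketch a computation that records at most `InlineN` values never touches the heap, while a longer one pays one allocation per `SlabN` values instead of one per value. Slabs are kept after `pop`, so a loop that repeatedly fills and drains the tape reuses them.
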
| 69 | + |
| 70 | + |
| 71 | +### GPU differentiation: turning challenges into progress |
| 72 | + |
| 73 | +CPU reverse-mode felt like careful negotiation; GPU reverse-mode was an |
| 74 | +opportunity to learn and improve. |
| 75 | + |
| 76 | +The work of Christina Koutsou and Abdelrhman Elrawy enabled users to write |
| 77 | +high-level device code (Thrust, device vectors) while still computing |
| 78 | +gradients. This meant implementing custom pullbacks for many Thrust |
| 79 | +primitives—reduce, transform, scan, inner_product—and validating them with heavy |
| 80 | +benchmarks like RSBench and LULESH. Along the way, subtle behaviors emerged: |
| 81 | +data races, memory aliasing, and tricky index assumptions[\[3\]][ref3]. Maksym |
| 82 | +Andriichuk implemented a set of analyses that help reduce the conservative |
| 83 | +atomic synchronization points, making the generated CUDA code more |
| 84 | +efficient[\[4\]][ref4]. |
| 85 | + |
| 86 | +Far from setbacks, these discoveries guided our roadmap. We added thread-safety |
| 87 | +checks for injective index patterns, deterministic memory policies for device |
| 88 | +allocations, and a verified catalog of Thrust pullbacks. The result? GPU |
| 89 | +differentiation moved from a research goal to practical, reliable functionality. |
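
For readers unfamiliar with the term, a pullback propagates the derivative of a result back to the inputs that produced it. The sketch below shows the math for two of the primitives named above, written as plain host-side C++ so it can be read without a GPU. The function names and signatures are hypothetical; the real Thrust pullbacks operate on device data and have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side signatures, for illustration only.

// Primal: s = sum_i x[i].
// Pullback: every input receives the upstream derivative, d_x[i] += d_s.
void reduce_sum_pullback(const std::vector<double>& x, double d_s,
                         std::vector<double>& d_x) {
  assert(d_x.size() == x.size());
  for (std::size_t i = 0; i < x.size(); ++i) d_x[i] += d_s;
}

// Primal: s = sum_i a[i] * b[i].
// Pullback: d_a[i] += d_s * b[i] and d_b[i] += d_s * a[i].
void inner_product_pullback(const std::vector<double>& a,
                            const std::vector<double>& b, double d_s,
                            std::vector<double>& d_a,
                            std::vector<double>& d_b) {
  assert(a.size() == b.size());
  assert(d_a.size() == a.size() && d_b.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    d_a[i] += d_s * b[i];
    d_b[i] += d_s * a[i];
  }
}
```

Each iteration here writes only to its own index `i`, which is the injective pattern mentioned above: a parallel version of these loops would not need atomic updates. When several threads can accumulate into the same adjoint slot, the accumulation has to be synchronized.
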
| 90 | + |
| 91 | + |
| 92 | +### Cladtorch & compiler-driven ML: where compilers and ML talk seriously |
| 93 | + |
| 94 | +Rohan Timmaraju led one of the year's more provocative efforts: to see |
| 95 | +whether compiler-driven AD in C++ could be a practical path for training |
| 96 | +medium-sized networks[\[5\]][ref5]. |
| 97 | + |
| 98 | +The early versions were elegant but slow. Abstractions (temporary objects, RAII, |
| 99 | +high-level tensor wrappers) were doing what abstractions always do: hiding costs |
| 100 | +that matter in tight loops. The experiment that changed things hinged on a |
| 101 | +simple truth: if the compiler can see everything and the data layout is optimal, |
| 102 | +it can produce lower-overhead code than a heavy Python runtime. |
| 103 | + |
| 104 | +Concretely, that meant moving from an object-oriented C++ tensor library to a |
| 105 | +minimalist, arena-style engine: a single, contiguous pre-allocated buffer that |
| 106 | +held parameters, activations, and gradients. That design removed most of the |
| 107 | +allocation and context-switching overhead and gave the compiler a global |
| 108 | +allocation layout to optimize. In CPU-bound tests the arena-based approach |
| 109 | +reduced overhead and produced iteration speeds competitive with tuned Python |
| 110 | +stacks on some workloads[\[5\]][ref5]. The result was not "we beat PyTorch |
| 111 | +everywhere", but it was a concrete demonstration that compile-time AD has real |
| 112 | +leverage when memory layout and kernel fusion are designed for it. Next in the |
| 113 | +plan is porting that work from CPU to GPU. |
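
The arena idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's code: `Span`, `Arena`, and `Model` are hypothetical names, and the model is a single layer `y = W x` chosen only to show the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration; names are not taken from the project's code.
struct Span {
  float* data;
  std::size_t size;
};

class Arena {
  std::vector<float> buffer_;  // one allocation, sized once up front
  std::size_t used_ = 0;

public:
  explicit Arena(std::size_t capacity) : buffer_(capacity, 0.0f) {}

  // Bump allocation: hand out the next n floats, with no per-tensor malloc.
  Span take(std::size_t n) {
    assert(used_ + n <= buffer_.size() && "arena capacity exceeded");
    Span s{buffer_.data() + used_, n};
    used_ += n;
    return s;
  }

  std::size_t used() const { return used_; }
};

// Parameters, activations, and gradients for y = W x sit back to back
// in the same buffer, in declaration order: w, y, dw.
struct Model {
  Span w, y, dw;
  Model(Arena& a, std::size_t in, std::size_t out)
      : w(a.take(in * out)), y(a.take(out)), dw(a.take(in * out)) {}
};
```

Because every tensor is a view into one buffer whose layout is fixed before training starts, the training loop itself performs no allocation, and zeroing all gradients or saving a checkpoint becomes a single pass over contiguous memory.
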
| 114 | + |
| 115 | +That experience taught us how to think about co-design: compiler optimizations |
| 116 | +plus memory layout plus tight kernels. The lesson will inform both our ML |
| 117 | +experiments and how we approach HPC workloads going forward. |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | + |
| 122 | +## Compiler as a service: the tooling that makes C++ alive |
| 123 | + |
| 124 | +A quiet but consequential part of 2025 was enhancing interactive C++. Clang-Repl |
| 125 | +continued to evolve in a stable and predictable manner. |
| 126 | + |
| 127 | +### Xeus |
| 128 | + |
| 129 | +Anutosh Bhat pushed browser-side experiments in xeus-cpp using a Wasm |
| 130 | +incremental executor approach (compile small units to standalone Wasm modules |
| 131 | +and link at runtime) so that C++ REPL sessions can run without a |
| 132 | +server[\[12\]](https://blog.jupyter.org/c-in-jupyter-interpreting-c-in-the-web-c9d93542f20b). That made classroom demos and quick experiments far more |
| 133 | +accessible. |
| 134 | + |
| 135 | +At the same time, Abhinav Kumar implemented LLDB/DAP integration for the |
| 136 | +notebook/Jupyter flow so people can set breakpoints, step through generated |
| 137 | +code, and inspect variables. The change is subtle: once users can *debug* |
| 138 | +generated code, they stop treating it as magic and start contributing |
| 139 | +fixes[\[6\]][ref6]. |
| 140 | + |
| 141 | + |
| 142 | +### CppInterOp |
| 143 | + |
| 144 | +CppInterOp matured to a point where it became a backbone of C++ interoperability |
| 145 | +in the newly developed jank-lang[\[7\]][ref7]. The jank-lang author Jeaye |
| 146 | +Wilkerson collaborated with our team and donated to sponsor some of our |
| 147 | +developments. |
| 148 | + |
| 149 | +Aaron Jomy led the integration of the library in the ROOT framework while Vipul |
| 150 | +Cariappa led its integration within the cppyy ecosystem. |
| 151 | + |
| 152 | +Sahil Patidar quietly and persistently shaped the supporting LLVM and Clang |
| 153 | +infrastructure and committed downstream code to the LLVM mainline. |
| 154 | + |
| 155 | +Matthew Barton kept our infrastructure sane and reduced CI noise to a minimum |
| 156 | +this year, which greatly helped our overall development. |
| 157 | + |
| 158 | +--- |
| 159 | + |
| 160 | + |
| 161 | +## Cross-disciplinary work: where system engineering matters |
| 162 | + |
| 163 | +In 2025, we deliberately expanded our cross-domain engagement. Our goal was to |
| 164 | +understand where our technologies could have impact beyond their original |
| 165 | +context and to invest in making them usable in those settings. One of the most |
| 166 | +rewarding outcomes was seeing our tools not just support but improve, if not |
| 167 | +reshape, domain-specific workflows. |
| 168 | + |
| 169 | +- **Genomics (RAMTools):** Aditya Pandey adapted RNTuple-style columnar storage |
| 170 | + concepts from high-energy physics to genomic alignment queries. The result |
| 171 | + was measurable speedups for several analytic workloads and, in some cases, |
| 172 | + reduced storage overhead. What began as a student project now highlights |
| 173 | + practical data-engineering synergies between HEP and genomics[\[8\]][ref8]. |
| 174 | + |
| 175 | +- **Cancer simulation (CARTopiaX):** Salvador de la Torre Gonzalez developed an |
| 176 | + agent-based CAR-T simulator on top of BioDynaMo, using our tooling to |
| 177 | + accelerate simulations and improve experimental reproducibility. While |
| 178 | + modest in scope, this work represents a concrete step toward tissue-aware |
| 179 | + digital twins for preclinical research[\[9\]][ref9]. |
| 180 | + |
| 181 | +- **Disaster response (NEO-FLOOD):** Rohan Timmaraju applied compiler and |
| 182 | + systems thinking to a NASA-recognized project that demonstrates |
| 183 | + low-power, on-satellite inference pipelines using neuromorphic processors |
| 184 | + for rapid flood mapping, showing how our work can touch mission-critical |
| 185 | + applications when integrated properly[\[10\]][ref10]. |
| 186 | + |
| 187 | +None of these efforts were accidental. They emerged from sustained collaboration |
| 188 | +between domain scientists and systems engineers—and from a shared confidence in |
| 189 | +the tools we build. |
| 190 | + |
| 191 | +--- |
| 192 | + |
| 193 | +## Broader impact |
| 194 | + |
| 195 | +### The people — mentor, ship, repeat |
| 196 | + |
| 197 | +One of the clearest signals that we are doing something right is watching people |
| 198 | +grow into the work. In 2025 we saw contributors arrive cautiously, fixing a small |
| 199 | +bug and asking careful questions, and leave the year owning real subsystems. For |
| 200 | +many of them, this was not just another open-source contribution. It became |
| 201 | +something concrete they could point to: a body of work that shaped interviews, |
| 202 | +graduate school applications, and their own sense of what they were capable of |
| 203 | +building. |
| 204 | + |
| 205 | +That kind of growth does not happen by accident. It only happens when mentorship |
| 206 | +is present, patient, and deeply technical. |
| 207 | + |
| 208 | +Jonas Rembser's steady guidance, both mathematical and practical, was |
| 209 | +essential in helping us confront the hardest performance questions in the |
| 210 | +RooFit-driven Clad use cases. When things became subtle or ambiguous, Jonas |
| 211 | +helped anchor discussions in first principles without losing sight of real |
| 212 | +constraints. |
| 213 | + |
| 214 | +Harshitha Menon brought a calm, scientific clarity to our benchmarking and |
| 215 | +workflow analysis. Her ability to methodically dissect performance behavior and |
| 216 | +suggest meaningful optimizations helped turn noisy measurements into actionable |
| 217 | +improvements. |
| 218 | + |
| 219 | +Luciana Melina Luque's deep understanding of agent-based modeling and CAR-T cell |
| 220 | +therapy shaped the CARTopiaX work in ways we could not have faked. Her domain |
| 221 | +expertise ensured that the simulations we built were not just faster, but |
| 222 | +scientifically grounded. |
| 223 | + |
| 224 | +Martin Vassilev played a key role in shaping RAMTools, helping bridge ideas from |
| 225 | +high-energy physics data handling into a genomics context that demanded both |
| 226 | +rigor and pragmatism. |
| 227 | + |
| 228 | +Vipul Cariappa and Anutosh Bhat brought consistency and hard-won knowledge of |
| 229 | +low-level tooling to the xeus-cpp debugging infrastructure. Their work quietly |
| 230 | +but decisively raised the bar for what interactive C++ debugging can feel like |
| 231 | +in practice. |
| 232 | + |
| 233 | +Parth Arora's deep command of data structures and algorithms made a tangible |
| 234 | +difference in the tape infrastructure. His contributions helped us simplify, |
| 235 | +tighten, and reason about some of the most performance-critical paths in the |
| 236 | +system. |
| 237 | + |
| 238 | +Looking back, it is clear that the year's technical progress is inseparable from |
| 239 | +these human investments. Code shipped because people were supported. Systems |
| 240 | +matured because knowledge was shared. And the next generation of contributors |
| 241 | +emerged not by being shielded from complexity, but by being trusted with it. |
| 242 | + |
| 243 | +That cycle is the mechanism by which this work continues to exist. |
| 244 | + |
| 245 | + |
| 246 | +### Community and leadership |
| 247 | + |
| 248 | +In 2025, our engagement with the broader community became more intentional. We |
| 249 | +did not just report progress: we used workshops and meetings as places to test |
| 250 | +ideas in public, invite criticism, and ground our research in real use cases. |
| 251 | + |
| 252 | +We shared work across several established venues. CARTopiaX and |
| 253 | +CppInterOp-powered cppyy were presented at the ROOT Users Workshop, where |
| 254 | +discussions with ROOT developers and users directly shaped follow-up |
| 255 | +work. CARTopiaX was also presented at the Foundations of Oncological Digital |
| 256 | +Twins workshop in Cambridge, where clinical and modeling perspectives helped us |
| 257 | +sharpen both the technical assumptions and the scientific framing. Our progress |
| 258 | +on automatic differentiation and CUDA was presented at MODE 2025, alongside |
| 259 | +updates on RooFit autodiff work that were also discussed at CMS CAT |
| 260 | +meetings. These venues were particularly valuable because they exposed our |
| 261 | +compiler-centric ideas to domain experts who are quick to ask the hard, |
| 262 | +practical questions. |
| 263 | + |
| 264 | +Beyond participating, we also stepped into a convening role. This year we |
| 265 | +organized the first edition of CompilerResearchCon, a small, focused conference |
| 266 | +designed to bring together contributors, users, and curious newcomers. |
| 267 | +[CompilerResearchCon](/crcon2025/) became a focal point for the project. Its |
| 268 | +success confirmed something we suspected: that our community benefits most from |
| 269 | +formats that are compact, technical, and conversation-driven. |
| 270 | + |
| 271 | +We were also honored to organize the |
| 272 | +[EuroAD](https://indico.cern.ch/e/EuroAD-2025) workshop, which brought together |
| 273 | +researchers working on automatic differentiation from compiler, ML, and |
| 274 | +scientific computing perspectives. There, we presented our work on |
| 275 | +differentiating object-oriented C++ code and shared experiences on teaching |
| 276 | +differentiable programming to students. More importantly, EuroAD created space |
| 277 | +for aligning expectations between theory and practice — exactly the kind of |
| 278 | +alignment our work depends on. |
| 279 | + |
| 280 | +--- |
| 281 | + |
| 282 | +## Looking ahead: where the work continues |
| 283 | + |
| 284 | +If 2025 taught us anything, it is that infrastructure is never "done". It either |
| 285 | +hardens under real use, or it quietly erodes. |
| 286 | + |
| 287 | +There are three areas where we know that the work must continue in 2026. |
| 288 | + |
| 289 | +First, GPU reverse-mode at scale. The Thrust primitives and end-to-end demos we |
| 290 | +built this year are real progress, but they are still building blocks rather |
| 291 | +than a turnkey solution. Arbitrary kernels, complex memory access patterns, and |
| 292 | +predictable performance remain open problems. Benchmarks like RSBench and LULESH |
| 293 | +are no longer aspirational demos for us; they are acceptance tests, and they |
| 294 | +will continue to be the standard we measure ourselves against. |
| 295 | + |
| 296 | +Second, packaging and cross-platform reliability. macOS and Windows failures, |
| 297 | +fragile upstream test matrices, and dependency churn still consume an outsized |
| 298 | +amount of maintainer time. None of this work is glamorous, but all of it |
| 299 | +determines whether someone can actually try our tools without giving up. A |
| 300 | +focused investment here would likely unlock more adoption than any single new |
| 301 | +feature. |
| 302 | + |
| 303 | +Third, shared JIT and interoperability hardening. The idea of a shared JIT model |
| 304 | +between CppInterOp, Numba, and notebook environments continues to show real |
| 305 | +promise for interactive performance and usability. But symbol resolution, thread |
| 306 | +safety, and long-running session stability need careful, disciplined engineering |
| 307 | +-- and far more integration testing -- before that promise becomes something |
| 308 | +users can rely on. |
| 309 | + |
| 310 | +These are not research risks. They are engineering commitments. |
| 311 | + |
| 312 | + |
| 313 | +## Epilogue: why this matters — beyond code |
| 314 | + |
| 315 | +We did not spend 2025 chasing visibility or novelty. We spent it making things |
| 316 | +that bend workflows. We turned student curiosity into real engineering |
| 317 | +capacity. And we ended the year with something that feels different from before: |
| 318 | +weight. |
| 319 | + |
| 320 | +Once a compiler primitive becomes reliable enough to use, it reshapes design |
| 321 | +choices in other projects. It becomes a lever that domain scientists pull |
| 322 | +without thinking about compilers at all. And, quietly, it creates career paths: |
| 323 | +for students who learn to debug generated code; for contributors who become |
| 324 | +maintainers; and for researchers who discover that infrastructure work can carry |
| 325 | +scientific weight. |
| 326 | + |
| 327 | +The tools we maintain now matter in other people's pipelines. They surface real |
| 328 | +problems. They attract collaborators. They are no longer purely speculative. |
| 329 | + |
| 330 | +If you read this and want to help, you can submit a bug report, contribute a test, |
| 331 | +or look at the list of [open projects](/open_projects) -- that kind of |
| 332 | +contribution is exactly how fragile, useful tools turn into durable |
| 333 | +infrastructure. |
| 334 | + |
| 335 | +[ref0]: https://root.cern/blog/roofit-ad/ |
| 336 | +[ref1]: /blogs/gsoc25_aditi_final_blog/ |
| 337 | +[ref2]: /blogs/2025_petro_zarytskyi_introduction_blog/ |
| 338 | +[ref3]: /presentations/#MODE2025CUDA |
| 339 | +[ref4]: /blogs/gsoc25_andriichuk_final_blog/ |
| 340 | +[ref5]: /blogs/gsoc25_rohan_final_blog/ |
| 341 | +[ref12]: https://blog.jupyter.org/c-in-jupyter-interpreting-c-in-the-web-c9d93542f20b |
| 342 | +[ref6]: /blogs/gsoc25_abhinav_kumar_final_blog/ |
| 343 | +[ref7]: https://jank-lang.org/blog/2025-06-06-next-phase-of-interop/ |
| 344 | +[ref8]: /blogs/gsoc25_aditya_pandey_final_blog/ |
| 345 | +[ref9]: /blogs/gsoc25_salvador_wrapup_blog/ |
| 346 | +[ref10]: /blogs/rohan-timmaraju-neo-flood-nasa/ |
| 347 | +[ref11]: /blogs/2025_galin_bistrev_results_blog/ |