Commit d1b2bcf (parent 63a9ea7): Add a 2025 recap blog post

3 files changed: +348 −1 lines

_posts/2025-11-14-using-root-in-the-field-of-genome-sequencing.md (1 addition, 1 deletion)

@@ -74,7 +74,7 @@ We benchmarked RAMTools using the HG00154 sample from the 1000 Genomes Project,
 RNTuple's columnar architecture shows significant speedups, especially for large region queries, when compared to the older ROOT TTree format and CRAM (industry-standard compressed format).
-![Region Query Performance](/images/blog/genome_query_time.png)
+![Region Query Performance](/images/blog/genome_query_time.png){: .img-responsive }
 The benchmarks demonstrate performance across three query sizes:
_posts/2026-01-07-year_recap.md (347 additions, 0 deletions)
---
title: "2025 Year-End Reflections: CompilerResearch Group (Personal Perspective)"
layout: post
excerpt: |
  A personal look at 2025: we moved from prototypes to infrastructure people can
  run on real code: shipping Clad; maturing CppInterOp; and growing a community.
  This recap focuses on the engineering, mentorship, and cross-domain impact
  that made that shift possible.
sitemap: true
author: Vassil Vassilev
permalink: blogs/cr25_recap/
banner_image: /images/blog/vv-2025-recap.webp
date: 2026-01-07
tags: [2025, recap, year-in-review, compiler-research, open-source, community]
---
If 2024 was the year we sketched the map, 2025 was the year we started paving roads on it: not always smoothly, but in places where people actually needed to walk. We deliberately shifted from "research prototypes" to "infrastructure people can try on real code": releases you can install, tape designs that survive long runs, device-side building blocks for gradients, and interactive C++ that you can step through in a notebook. That shift was the story of the year: technical grit plus mentorship, repeated many times over.

The year did not feel dramatic while we were living it. There was no single breakthrough moment, no clean narrative arc. Instead, there were releases that almost worked, benchmarks that failed for reasons we didn’t yet understand, and long stretches where progress looked like deleting code rather than adding it.

And yet, by the end of the year, something had shifted. People were no longer asking *“can this work?”*, they were asking *“how do I use this?”* That quiet transition from possibility to expectation defined 2025 for compiler-research.org.

---
## Differentiable Programming

Clad had been a compelling idea for years: automatic differentiation implemented *by the compiler itself*, operating directly on C++ ASTs rather than via operator overloading or purely runtime tapes. The more features we added, the more we saw the benefits of a compiler-based AD system being a first-class citizen in static languages.

Our integration efforts demonstrated significant speedups of various physics workflows based on the RooFit system -- up to 10x faster likelihood evaluation, which sped up people's workflows without them changing a single line of code[\[0\]][ref0]!
### From promising ideas to software that breaks loudly

In 2025 we finally had to pay the cost of our ambitious plan. Putting Clad into the hands of users forced us to confront problems we had been politely ignoring: tape memory pressure, allocation churn, subtle thread-safety interactions with OpenMP, multi-platform packaging, and a hundred ways in which generated code can be correct in theory and brittle in practice. The striking lesson of the year was that *theory is cheap; engineering is expensive*.

So we did the expensive work. Aditi Milind Joshi rethought the tape. Together with Parth Arora, they introduced layered (slab) allocation and small-buffer optimizations so that tiny computations stay stack-local while larger workloads spill into contiguous heap slabs -- lowering allocation overhead and improving cache utilization for the backward pass[\[1\]][ref1]. Petro Zarytskyi reworked scheduling so reverse passes do less redundant work and produce smaller, more stable adjoint code[\[2\]][ref2]. Those are boring sentences on a page; however, they make the difference between a demo and a run that works efficiently on a real dataset. Galin Bistrev worked on the adoption of automatic differentiation in CMS Combine[\[11\]][ref11].
### GPU differentiation: turning challenges into progress

CPU reverse-mode felt like careful negotiation; GPU reverse-mode was an opportunity to learn and improve.

The work of Christina Koutsou and Abdelrhman Elrawy enabled users to write high-level device code (Thrust, device vectors) while still computing gradients. This meant implementing custom pullbacks for many Thrust primitives -- reduce, transform, scan, inner_product -- and validating them with heavy benchmarks like RSBench and LULESH. Along the way, subtle behaviors emerged: data races, memory aliasing, and tricky index assumptions[\[3\]][ref3]. Maksym Andriichuk implemented a set of analyses that help reduce conservative atomic synchronization points, making the generated CUDA code more efficient[\[4\]][ref4].
Far from setbacks, these discoveries guided our roadmap. We added thread-safety checks for injective index patterns, deterministic memory policies for device allocations, and a verified catalog of Thrust pullbacks. The result? GPU differentiation moved from a research goal to practical, reliable functionality.
### Cladtorch & compiler-driven ML: where compilers and ML talk seriously

Rohan Timmaraju led one of the year's more provocative efforts: seeing whether compiler-driven AD in C++ could be a practical path for training medium-sized networks[\[5\]][ref5].

The early versions were elegant but slow. Abstractions (temporary objects, RAII, high-level tensor wrappers) were doing what abstractions always do: hiding costs that matter in tight loops. The experiment that changed things pivoted to a simple truth: if the compiler can see everything and the data layout is optimal, it can produce lower-overhead code than a heavy Python runtime.

Concretely, that meant moving from an object-oriented C++ tensor library to a minimalist, arena-style engine: a single, contiguous pre-allocated buffer that held parameters, activations, and gradients. That design removed most of the allocation and context-switching overhead and gave the compiler a global allocation layout to optimize. In CPU-bound tests the arena-based approach reduced overhead and produced iteration speeds competitive with tuned Python stacks on some workloads[\[5\]][ref5]. The result was not "we beat PyTorch everywhere", but it was a concrete demonstration that compile-time AD has real leverage when memory layout and kernel fusion are designed for it. Next in the plan is porting that work from CPU to GPU.
That experience taught us how to think about co-design: compiler optimizations plus memory layout plus tight kernels. The lesson will inform both our ML experiments and how we approach HPC workloads going forward.

---
## Compiler as a service: the tooling that makes C++ alive

A quiet but consequential part of 2025 was enhancing interactive C++. Clang-Repl continued to evolve in a stable and predictable manner.

### Xeus

Anutosh Bhat pushed browser-side experiments in xeus-cpp using a Wasm incremental executor approach (compile small units to standalone Wasm modules and link at runtime) so that C++ REPL sessions can run without a server[\[12\]][ref12]. That made classroom demos and quick experiments far more accessible.

At the same time, Abhinav Kumar implemented LLDB/DAP integration for the notebook/Jupyter flow so people can set breakpoints, step through generated code, and inspect variables. The change is subtle: once users can *debug* generated code, they stop treating it as magic and start contributing fixes[\[6\]][ref6].
### CppInterOp

CppInterOp matured to a point where it became a backbone of C++ interoperability in the newly developed jank-lang[\[7\]][ref7]. The jank-lang author Jeaye Wilkerson collaborated with our team and donated to sponsor some of our developments.

Aaron Jomy led the integration of the library into the ROOT framework, while Vipul Cariappa led its integration within the cppyy ecosystem.

Sahil Patidar quietly and persistently shaped the supporting LLVM and Clang infrastructure and committed downstream code to the LLVM mainline.

Matthew Barton kept our infrastructure sane and reduced the CI noise to a minimum this year, which greatly helped our overall development.

---
## Cross-disciplinary work: where system engineering matters

In 2025, we deliberately expanded our cross-domain engagement. Our goal was to understand where our technologies could have impact beyond their original context and to invest in making them usable in those settings. One of the most rewarding outcomes was seeing our tools not just support but improve, if not reshape, domain-specific workflows.
- **Genomics (RAMTools):** Aditya Pandey adapted RNTuple-style columnar storage concepts from high-energy physics to genomic alignment queries. The result was measurable speedups for several analytic workloads and, in some cases, reduced storage overhead. What began as a student project now highlights practical data-engineering synergies between HEP and genomics[\[8\]][ref8].

- **Cancer simulation (CARTopiaX):** Salvador de la Torre Gonzalez developed an agent-based CAR-T simulator on top of BioDynaMo, using our tooling to accelerate simulations and improve experimental reproducibility. While modest in scope, this work represents a concrete step toward tissue-aware digital twins for preclinical research[\[9\]][ref9].

- **Disaster response (NEO-FLOOD):** Rohan Timmaraju applied compiler and systems thinking to a NASA-recognized project that demonstrates low-power, on-satellite inference pipelines using neuromorphic processors for rapid flood mapping, showing how our work can touch mission-critical applications when integrated properly[\[10\]][ref10].
None of these efforts were accidental. They emerged from sustained collaboration between domain scientists and systems engineers -- and from a shared confidence in the tools we build.

---
## Broader impact

### The people -- mentor, ship, repeat

One of the clearest signals that we are doing something right is watching people grow into the work. In 2025 we saw contributors arrive cautiously, fixing a small bug and asking careful questions, and leave the year owning real subsystems. For many of them, this was not just another open-source contribution. It became something concrete they could point to: a body of work that shaped interviews, graduate school applications, and their own sense of what they were capable of building.

That kind of growth does not happen by accident. It only happens when mentorship is present, patient, and deeply technical.

Jonas Rembser's steady guidance, both mathematical and practical, was essential in helping us confront the hardest performance questions in the RooFit-driven Clad use cases. When things became subtle or ambiguous, Jonas helped anchor discussions in first principles without losing sight of real constraints.

Harshitha Menon brought a calm, scientific clarity to our benchmarking and workflow analysis. Her ability to methodically dissect performance behavior and suggest meaningful optimizations helped turn noisy measurements into actionable improvements.

Luciana Melina Luque's deep understanding of agent-based modeling and CAR-T cell therapy shaped the CARTopiaX work in ways we could not have faked. Her domain expertise ensured that the simulations we built were not just faster, but scientifically grounded.

Martin Vassilev played a key role in shaping RAMTools, helping bridge ideas from high-energy physics data handling into a genomics context that demanded both rigor and pragmatism.

Vipul Cariappa and Anutosh Bhat brought consistency and hard-won knowledge of low-level tooling to the xeus-cpp debugging infrastructure. Their work quietly but decisively raised the bar for what interactive C++ debugging can feel like in practice.

Parth Arora's deep command of data structures and algorithms made a tangible difference in the tape infrastructure. His contributions helped us simplify, tighten, and reason about some of the most performance-critical paths in the system.

Looking back, it is clear that the year's technical progress is inseparable from these human investments. Code shipped because people were supported. Systems matured because knowledge was shared. And the next generation of contributors emerged not by being shielded from complexity, but by being trusted with it.

That cycle is the mechanism by which this work continues to exist.
### Community and leadership

In 2025, our engagement with the broader community became more intentional. We did not just report progress: we used workshops and meetings as places to test ideas in public, invite criticism, and ground our research in real use cases.

We shared work across several established venues. CARTopiaX and CppInterOp-powered cppyy were presented at the ROOT Users Workshop, where discussions with ROOT developers and users directly shaped follow-up work. CARTopiaX was also presented at the Foundations of Oncological Digital Twins workshop in Cambridge, where clinical and modeling perspectives helped us sharpen both the technical assumptions and the scientific framing. Our progress on automatic differentiation and CUDA was presented at MODE 2025, alongside updates on RooFit autodiff work that were also discussed at CMS CAT meetings. These venues were particularly valuable because they exposed our compiler-centric ideas to domain experts who are quick to ask the hard, practical questions.

Beyond participating, we also stepped into a convening role. This year we organized the first edition of CompilerResearchCon, a small, focused conference designed to bring together contributors, users, and curious newcomers. [CompilerResearchCon](/crcon2025/) became a focal point for the project. Its success confirmed something we suspected: that our community benefits most from formats that are compact, technical, and conversation-driven.

We were also honored to organize the [EuroAD](https://indico.cern.ch/e/EuroAD-2025) workshop, which brought together researchers working on automatic differentiation from compiler, ML, and scientific computing perspectives. There, we presented our work on differentiating object-oriented C++ code and shared experiences on teaching differentiable programming to students. More importantly, EuroAD created space for aligning expectations between theory and practice -- exactly the kind of alignment our work depends on.

---
## Looking ahead: where the work continues

If 2025 taught us anything, it is that infrastructure is never "done". It either hardens under real use, or it quietly erodes.

There are three areas where we know that the work must continue in 2026.

First, GPU reverse-mode at scale. The Thrust primitives and end-to-end demos we built this year are real progress, but they are still building blocks rather than a turnkey solution. Arbitrary kernels, complex memory access patterns, and predictable performance remain open problems. Benchmarks like RSBench and LULESH are no longer aspirational demos for us; they are acceptance tests, and they will continue to be the standard we measure ourselves against.

Second, packaging and cross-platform reliability. macOS and Windows failures, fragile upstream test matrices, and dependency churn still consume an outsized amount of maintainer time. None of this work is glamorous, but all of it determines whether someone can actually try our tools without giving up. A focused investment here would likely unlock more adoption than any single new feature.

Third, shared JIT and interoperability hardening. The idea of a shared JIT model between CppInterOp, Numba, and notebook environments continues to show real promise for interactive performance and usability. But symbol resolution, thread safety, and long-running session stability need careful, disciplined engineering -- and far more integration testing -- before that promise becomes something users can rely on.

These are not research risks. They are engineering commitments.
## Epilogue: why this matters -- beyond code

We did not spend 2025 chasing visibility or novelty. We spent it making things that bend workflows. We turned student curiosity into real engineering capacity. And we ended the year with something that feels different from before: weight.

Once a compiler primitive becomes reliable enough to use, it reshapes design choices in other projects. It becomes a lever that domain scientists pull without thinking about compilers at all. And, quietly, it creates career paths: for students who learn to debug generated code; for contributors who become maintainers; and for researchers who discover that infrastructure work can carry scientific weight.

The tools we maintain now matter in other people's pipelines. They surface real problems. They attract collaborators. They are no longer purely speculative.

If you read this and want to help, you can submit a bug report, contribute a test, or look at the list of [open projects](/open_projects) -- that kind of contribution is exactly how fragile, useful tools turn into durable infrastructure.
[ref0]: https://root.cern/blog/roofit-ad/
[ref1]: /blogs/gsoc25_aditi_final_blog/
[ref2]: /blogs/2025_petro_zarytskyi_introduction_blog/
[ref3]: /presentations/#MODE2025CUDA
[ref4]: /blogs/gsoc25_andriichuk_final_blog/
[ref5]: /blogs/gsoc25_rohan_final_blog/
[ref6]: /blogs/gsoc25_abhinav_kumar_final_blog/
[ref7]: https://jank-lang.org/blog/2025-06-06-next-phase-of-interop/
[ref8]: /blogs/gsoc25_aditya_pandey_final_blog/
[ref9]: /blogs/gsoc25_salvador_wrapup_blog/
[ref10]: /blogs/rohan-timmaraju-neo-flood-nasa/
[ref11]: /blogs/2025_galin_bistrev_results_blog/
[ref12]: https://blog.jupyter.org/c-in-jupyter-interpreting-c-in-the-web-c9d93542f20b

images/blog/vv-2025-recap.webp (141 KB)