Commit 4367eff

add explanations to flamegraph comparisons
1 parent 11165fa commit 4367eff

1 file changed

+61
-13
lines changed

src/blog/tanstack-start-ssr-performance-600-percent.md

Lines changed: 61 additions & 13 deletions
@@ -40,8 +40,8 @@ We highlight the highest-impact patterns below.
4040
We are not claiming that any single line of code is "the" reason. This work spanned over [20 PRs](https://github.com/TanStack/router/compare/v1.154.4...v1.157.18), with still more to come. Every change was validated by:
4141

4242
- a stable load test (same endpoint, same load)
43-
- a CPU profile (flamegraph) that explains the delta
4443
- a before/after comparison on the same benchmark endpoint
44+
- a CPU profile (flamegraph) that explains the delta
4545

4646
### Why feature-focused endpoints
4747

@@ -135,10 +135,23 @@ See: [#6442](https://github.com/TanStack/router/pull/6442), [#6447](https://gith
135135

136136
Like every PR in this series, this change was validated by profiling the impacted method before and after. In the example below, the `buildLocation` method went from being one of the major bottlenecks of a navigation to a very small part of the overall cost:
137137

138-
| | |
139-
| ------ | --------------------------------------------------------------------------------------------------------------------------------------- |
140-
| Before | ![CPU profiling of buildLocation before the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/build-location-before.png) |
141-
| After | ![CPU profiling of buildLocation after the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/build-location-after.png) |
138+
139+
<figure>
140+
<img src="/blog-assets/tanstack-start-ssr-performance-600-percent/build-location-before.png" alt="CPU profiling of buildLocation before the changes">
141+
<figcaption>
142+
<b>Before:</b> The <code>RouterCore.buildLocation</code> method (red arrow) was creating a <code>new URL</code> on every call (purple blocks) and then updating its search, which re-triggers an expensive parsing step.
143+
</figcaption>
144+
</figure>
145+
146+
<figure>
147+
<img
148+
src="/blog-assets/tanstack-start-ssr-performance-600-percent/build-location-after.png"
149+
alt="CPU profiling of buildLocation after the changes"
150+
>
151+
<figcaption>
152+
<b>After:</b> The <code>isSafeInternal</code> check is able to fully skip the <code>URL</code>. <code>RouterCore.buildLocation</code> becomes an almost insignificant part of the overall cost.
153+
</figcaption>
154+
</figure>
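
To make the before/after captions concrete, here is a minimal sketch of the fast-path idea, not the router's actual code. The names `buildHrefSlow`, `buildHrefFast`, and this particular `isSafeInternal` rule are assumptions made for illustration; the only thing taken from the captions is the shape of the optimization: check cheaply whether the input is already a plain internal path, and only construct a `URL` when it is not.

```typescript
// Hypothetical sketch: names and the exact "safe" rule are assumptions,
// not TanStack Router's implementation.
const BASE = "http://localhost";

// Slow path: allocate a URL on every call, then assign `search`,
// which makes the URL parser run again on the query string.
function buildHrefSlow(pathname: string, search: string): string {
  const url = new URL(pathname, BASE);
  url.search = search;
  return url.pathname + url.search;
}

// Conservative check: an absolute path and an optional query string made
// only of characters the URL parser leaves untouched, with no "." or ".."
// segments and no leading "//" (which would be parsed as a host).
function isSafeInternal(pathname: string, search: string): boolean {
  const safePath = /^\/(?!\/)[A-Za-z0-9\-._~\/]*$/.test(pathname);
  const hasDotSegment = /(^|\/)\.{1,2}(\/|$)/.test(pathname);
  const safeSearch = search === "" || /^\?[A-Za-z0-9\-._~=&]+$/.test(search);
  return safePath && !hasDotSegment && safeSearch;
}

// Fast path: plain string concatenation when the input is known to be safe,
// falling back to the URL-based path for everything else.
function buildHrefFast(pathname: string, search: string): string {
  if (isSafeInternal(pathname, search)) {
    return pathname + search;
  }
  return buildHrefSlow(pathname, search);
}

console.log(buildHrefFast("/posts/42", "?page=2")); // "/posts/42?page=2"
```

The check is deliberately strict: anything it cannot prove safe still goes through `new URL`, so the fast path only changes how the result is computed, not what it is.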
142155
143156
## Finding 2: SSR does not need reactivity
144157

@@ -183,10 +196,26 @@ See: [#6497](https://github.com/TanStack/router/pull/6497), [#6482](https://gith
183196

184197
Taking the example of the `useRouterState` hook, we can see that most of the client-only work was removed from the SSR pass, leading to a ~2x improvement in the total CPU time of this hook.
185198

186-
| | |
187-
| ------ | -------------------------------------------------------------------------------------------------------------------------------------- |
188-
| Before | ![CPU profiling of useRouterState before the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/router-state-before.png) |
189-
| After | ![CPU profiling of useRouterState after the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/router-state-after.png) |
199+
<figure>
200+
<img
201+
src="/blog-assets/tanstack-start-ssr-performance-600-percent/router-state-before.png"
202+
alt="CPU profiling of useRouterState before the changes"
203+
>
204+
<figcaption>
205+
<b>Before:</b> The <code>useRouterState</code> hook was subscribing to the router store, which triggers many sync and memoization calls before calling the <code>select</code> callback.
206+
</figcaption>
207+
</figure>
208+
209+
210+
<figure>
211+
<img
212+
src="/blog-assets/tanstack-start-ssr-performance-600-percent/router-state-after.png"
213+
alt="CPU profiling of useRouterState after the changes"
214+
>
215+
<figcaption>
216+
<b>After:</b> The <code>isServer</code> check is able to skip directly to the <code>select</code> callback.
217+
</figcaption>
218+
</figure>
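
The captions above can be illustrated with a small, framework-free sketch. It is not the real `useRouterState` hook: the `Store` class and the `readRouterState` function are hypothetical stand-ins, and the `isServer` flag is passed in explicitly rather than detected. What it shows is the gating itself: on the server the state is read once and handed to `select`, while the client path still subscribes so that later updates re-run `select`.

```typescript
// Hypothetical sketch: `Store` and `readRouterState` are illustrative
// stand-ins, not TanStack Router APIs.
type Listener = () => void;

class Store<TState> {
  private listeners = new Set<Listener>();
  subscribeCount = 0; // only here to make the skipped work observable

  constructor(public state: TState) {}

  subscribe(listener: Listener): () => void {
    this.subscribeCount++;
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  setState(next: TState): void {
    this.state = next;
    this.listeners.forEach((listener) => listener());
  }
}

interface Selected<TSelected> {
  value: TSelected;
  unsubscribe: () => void;
}

function readRouterState<TState, TSelected>(
  store: Store<TState>,
  select: (state: TState) => TSelected,
  isServer: boolean,
): Selected<TSelected> {
  if (isServer) {
    // An SSR pass renders once, so no update will ever arrive:
    // skip the subscription and go straight to `select`.
    return { value: select(store.state), unsubscribe: () => {} };
  }
  // Client path: subscribe so that later updates re-run `select`.
  const result: Selected<TSelected> = {
    value: select(store.state),
    unsubscribe: () => {},
  };
  result.unsubscribe = store.subscribe(() => {
    result.value = select(store.state);
  });
  return result;
}
```

Both branches return the same shape, so callers do not need to know which path ran; only the amount of work differs.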
190219
191220
## Finding 3: server-only fast paths are worth it (when gated correctly)
192221

@@ -234,6 +263,9 @@ See: [#4648](https://github.com/TanStack/router/pull/4648), [#6505](https://gith
234263

235264
Taking the example of the `matchRoutesInternal` method, we can see that its children's total CPU time was reduced by ~25%.
236265

266+
267+
<!-- TODO: these images aren't good. They don't really show an improvement that came from a server-only fast path. -->
268+
237269
| | |
238270
| ------ | -------------------------------------------------------------------------------------------------------------------------------------- |
239271
| Before | ![CPU profiling of interpolatePath before the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/interpolate-before.png) |
@@ -268,10 +300,26 @@ See: [#6456](https://github.com/TanStack/router/pull/6456), [#6515](https://gith
268300

269301
Taking the example of the `startViewTransition` method, we can see that the total CPU time of this method was reduced by >50%.
270302

271-
| | |
272-
| ------ | ------------------------------------------------------------------------------------------------------------------------------------- |
273-
| Before | ![CPU profiling of startViewTransition before the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/delete-before.png) |
274-
| After | ![CPU profiling of startViewTransition after the changes](/blog-assets/tanstack-start-ssr-performance-600-percent/delete-after.png) |
303+
<figure>
304+
<img
305+
src="/blog-assets/tanstack-start-ssr-performance-600-percent/delete-before.png"
306+
alt="CPU profiling of startViewTransition before the changes"
307+
>
308+
<figcaption>
309+
<b>Before:</b> The <code>startViewTransition</code> function (red arrow) has ~400ms of self-time in the hot path (i.e. not including the time spent in its children).
310+
</figcaption>
311+
</figure>
312+
313+
314+
<figure>
315+
<img
316+
src="/blog-assets/tanstack-start-ssr-performance-600-percent/delete-after.png"
317+
alt="CPU profiling of startViewTransition after the changes"
318+
>
319+
<figcaption>
320+
<b>After:</b> Removing the <code>delete</code> statement almost completely removes the self-time of this function.
321+
</figcaption>
322+
</figure>
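
As a rough illustration of what removing a `delete` can look like, the sketch below contrasts deleting a property with assigning `undefined` to it. The `TransitionState` shape and both helper functions are hypothetical, and whether the actual change used this replacement is an assumption. The general motivation is that in V8, `delete` on a property can push an object into a slower dictionary-mode representation, whereas assigning `undefined` keeps the object's shape stable.

```typescript
// Hypothetical sketch: the state shape and helpers are illustrative only.
interface TransitionState {
  isTransitioning: boolean;
  pending?: string | undefined;
}

// Removes the key entirely, which changes the object's shape.
function clearWithDelete(state: TransitionState): void {
  delete state.pending;
}

// Keeps the key and clears its value, so the shape stays the same.
function clearWithAssign(state: TransitionState): void {
  state.pending = undefined;
}

const deleted: TransitionState = { isTransitioning: true, pending: "next" };
const assigned: TransitionState = { isTransitioning: true, pending: "next" };
clearWithDelete(deleted);
clearWithAssign(assigned);

console.log(deleted.pending, assigned.pending); // undefined undefined
console.log("pending" in deleted, "pending" in assigned); // false true
```

The two variants are not fully interchangeable: code that checks for the key with `in` or enumerates it with `Object.keys` sees a difference, so the swap is only safe when readers of the object test the value rather than the key's presence.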
275323
276324
## Results
277325
