Skip to content

improve performance and memory usage in HashUtils.djb2#28

Closed
skimbrel-figma wants to merge 3 commits intostatsig-io:mainfrom
skimbrel-figma:patch-1
Closed

improve performance and memory usage in HashUtils.djb2#28
skimbrel-figma wants to merge 3 commits intostatsig-io:mainfrom
skimbrel-figma:patch-1

Conversation

@skimbrel-figma
Copy link
Contributor

Runtime profiling indicates this method generates several many memory allocations.

Comparing to the JS implementation, we saw the intent of the hash &= hash line was to force the JS runtime to keep the number as a 32-bit integer. This is indeed the correct way to do it in JS, but not in Ruby; as a result, the hash local will grow ever larger, requiring more and more memory since Ruby supports unbounded integers.

Fix: truncate the hash value on each iteration with the same 32-bit 0xFFFFFFF constant used at the end instead.

Runtime profiling indicates this method generates several many memory allocations.

Comparing to the JS implementation, we saw the intent of the `hash &= hash` line was to force the JS runtime to keep the number as a 32-bit integer. This is indeed the correct way to do it in JS, but not in Ruby; as a result, the `hash` local will grow ever larger, requiring more and more memory since Ruby supports unbounded integers.

Fix: truncate the hash value on each iteration with the same 32-bit `0xFFFFFFF` constant used at the end instead.
@skimbrel-figma
Copy link
Contributor Author

Benchmarking before/after with a 256-char string on my laptop:

irb(main):199:0> puts Benchmark.measure { n.times { djb2_fixed(short_str) }}
  0.003365   0.000043   0.003408 (  0.003415)
=> nil                                                                                       
irb(main):200:0> puts Benchmark.measure { n.times { djb2(short_str) }}
  0.024926   0.001164   0.026090 (  0.026234)
=> nil    

A quick, imprecise comparison by inspecting GC.stat[:total_allocated_objects] before/after also shows 60% reduction in memory allocations.

I spot-checked a handful of input values to ensure the output hash value did not change.

@tore-statsig
Copy link
Contributor

Nice find! Pulling this in to run tests on it now

@skimbrel-figma
Copy link
Contributor Author

@tore-statsig actually i just realized we can do even better — the only thing inside the each loop is .ord, which is available as each_codepoint! i'll update.

statsig-kong bot pushed a commit that referenced this pull request May 14, 2025
#28

"""
Runtime profiling indicates this method generates several many memory
allocations.

Comparing to the JS implementation, we saw the intent of the `hash &=
hash` line was to force the JS runtime to keep the number as a 32-bit
integer. This is indeed the correct way to do it in JS, but not in Ruby;
as a result, the `hash` local will grow ever larger, requiring more and
more memory since Ruby supports unbounded integers.

Fix: truncate the hash value on each iteration with the same 32-bit
`0xFFFFFFF` constant used at the end instead.
"""

Co-authored-by: Sam Kimbrel <98781278+skimbrel-figma@users.noreply.github.com>
@tore-statsig
Copy link
Contributor

The original version of this is released

https://github.com/statsig-io/ruby-sdk/releases/tag/2.4.2

@skimbrel-figma
Copy link
Contributor Author

great! we've been running the second commit (switching to each_codepoint) as a monkeypatch for most of a week now and it seems totally fine + reduces memory still further 👍

statsig-kong bot pushed a commit that referenced this pull request May 22, 2025
the only thing inside the each loop is .ord, which is available as
each_codepoint

#28

---------

Co-authored-by: Sam Kimbrel <98781278+skimbrel-figma@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants