Use Lazy for expensive properties in System Dataset #550
CaptaiNiveau wants to merge 1 commit into bchavez:master
Conversation
In my tests with repeated creation (clones) of Faker instances, the creation of this dataset was by far the most expensive one. Making it lazy should alleviate a lot of GC pressure until the dataset is actually used.

For my benchmark I ran 100 generators in parallel, each of which created ~100 Fakers. The Fakers are cloned from a static base config. A generator has all the configs needed to create all the data needed for our app, which it does.

This should also be reproducible with a simpler setup. That would probably drop the lazy version to even lower allocations, since not much data would be generated from the dataset.

Is there anything that would speak against this change? It speeds up data generation a lot in cases like mine without sacrificing functionality. All tests passed without issues, too.
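A minimal sketch of the lazy pattern this PR describes, assuming illustrative names (`SystemDataSet`, `BuildMimeLookup`, `Mimes`) rather than Bogus's actual internals:

```csharp
// Hypothetical sketch: deferring an expensive dataset build with Lazy<T>.
// SystemDataSet and BuildMimeLookup are illustrative names, not Bogus's real API.
using System;

public class SystemDataSet
{
    // The expensive LINQ/BSON work runs only on first access,
    // not in the constructor of every cloned Faker.
    private readonly Lazy<string[]> mimes = new Lazy<string[]>(BuildMimeLookup);

    public string[] Mimes => mimes.Value;

    private static string[] BuildMimeLookup()
    {
        // Stand-in for the heavy computation currently done at construction time.
        return new[] { "application/json", "text/html" };
    }
}
```

`Lazy<T>`'s default thread-safety mode (`ExecutionAndPublication`) keeps the first-access initialization safe if Fakers are used across threads.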
Hey, I don't want to be annoying, but it'd be cool if you could take a short look at this. Thanks for this great library, it's fun to use :)
Hi @CaptaiNiveau, apologies for the delay. Thank you for the PR and suggestion. I don't have anything strictly "against" this PR or better performance in general; my main concern is around maintainability, more specifically around introducing `Lazy<T>` here.

That is, in this specific case, it seems we are doing a lot of "runtime computation of data" with the mime dataset in the constructor. IIRC, the mime data is pretty large, and I suspect doing all these LINQ operations is causing some of the perf issues in your case. Your approach seems decent for a performance fix at runtime.

Also, I suspect this is a classic computing "speed vs. space" tradeoff: we could potentially push the result of these computations into the dataset resource locale (at compile time) and skip the LINQ computations, at the cost of increasing the space used in the locale file, and then access the precomputed data directly.

The more I think about it, your solution is probably okay. Also, if you could do me a favor and add your benchmarks (for reproducibility) to the PR, that would be great.

Thanks,
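For reproducibility, here is a rough sketch of the benchmark setup the PR description mentions (100 parallel generators, each cloning ~100 Fakers from a static base config). The `Order` type, its rules, and the exact calls are illustrative assumptions, not the reporter's actual benchmark code:

```csharp
// Hypothetical reconstruction of the reported scenario; not taken from this PR.
using System.Threading.Tasks;
using Bogus;

public record Order(string Product, decimal Price);

public static class CloneBenchmark
{
    // Static base config that each "generator" clones from.
    private static readonly Faker<Order> BaseConfig = new Faker<Order>()
        .CustomInstantiator(f => new Order(f.Commerce.Product(), f.Random.Decimal(1, 100)));

    public static void Run()
    {
        // 100 generators in parallel, each cloning ~100 Fakers and generating data.
        Parallel.For(0, 100, _ =>
        {
            for (int i = 0; i < 100; i++)
            {
                var faker = BaseConfig.Clone();
                faker.Generate();
            }
        });
    }
}
```

With eager construction, every clone pays the dataset-building cost up front; with the lazy approach, clones that never touch the expensive dataset allocate far less.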
Diff excerpt under review:

```csharp
.OfType<BObject>()
.SelectMany(bObject =>
{
    if( bObject.ContainsKey("extensions") )
```
Also, if you could, a small nit: avoid these small reformatting changes in existing source that doesn't otherwise change.