Use Lazy for expensive properties in System Dataset #550
CaptaiNiveau wants to merge 1 commit into bchavez:master
Conversation
In my tests with repeated creation (clones) of Faker instances, the creation of this dataset was by far the most expensive one. Making it lazy should alleviate a lot of GC pressure until the dataset is actually used.

For my benchmark I ran 100 generators in parallel, each of which created ~100 Fakers. The Fakers are cloned from a static base config. A generator has all the configs needed to create all the data needed for our app, which it does.

This should also be reproducible with a simpler setup. That would probably drop the lazy version to even lower allocations, since not much data would be generated from the dataset.

Is there anything that would speak against this change? It speeds up data generation a lot in cases like mine without sacrificing functionality. All tests passed without issues, too.
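A minimal sketch of the lazy pattern this PR describes, assuming illustrative names (`SystemDataSet`, `BuildMimeLookup`, `Mimes`) rather than Bogus's actual internals:

```csharp
// Hypothetical sketch: deferring an expensive dataset build with Lazy<T>.
// SystemDataSet and BuildMimeLookup are illustrative names, not Bogus's real API.
using System;

public class SystemDataSet
{
    // The expensive LINQ/BSON work runs only on first access,
    // not in the constructor of every cloned Faker.
    private readonly Lazy<string[]> mimes = new Lazy<string[]>(BuildMimeLookup);

    public string[] Mimes => mimes.Value;

    private static string[] BuildMimeLookup()
    {
        // Stand-in for the heavy computation currently done at construction time.
        return new[] { "application/json", "text/html" };
    }
}
```

`Lazy<T>`'s default thread-safety mode (`ExecutionAndPublication`) keeps the first-access initialization safe if Fakers are used across threads.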
Hey, I don't want to be annoying, but it'd be cool if you could take a short look at this. Thanks for this great library, it's fun to use :)
Hi @CaptaiNiveau, apologies for the delay. Thank you for the PR and suggestion. I don't have anything strictly "against" this PR or better performance in general; my main concern is around maintainability, more specifically around introducing `Lazy<T>` here.

That is, in this specific case, it seems we are doing a lot of "runtime computation of data" with the mime dataset in the constructor. IIRC, the mime data is pretty large, and I suspect doing all these LINQ operations is causing some of the perf issues in your case. Your approach seems decent for a performance fix at runtime.

Also, I suspect this is a classic computing "speed vs. space" tradeoff: we could potentially push the result of these computations into the dataset resource locale (at compile time) and skip the LINQ computations, at the cost of increasing the space used in the locale file, and then access the precomputed data directly.

The more I think about it, your solution is probably okay. Also, if you could do me a favor and add your benchmarks (for reproducibility) to the PR, that would be great.

Thanks,
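For reproducibility, here is a rough sketch of the benchmark setup the PR description mentions (100 parallel generators, each cloning ~100 Fakers from a static base config). The `Order` type, its rules, and the exact calls are illustrative assumptions, not the reporter's actual benchmark code:

```csharp
// Hypothetical reconstruction of the reported scenario; not taken from this PR.
using System.Threading.Tasks;
using Bogus;

public record Order(string Product, decimal Price);

public static class CloneBenchmark
{
    // Static base config that each "generator" clones from.
    private static readonly Faker<Order> BaseConfig = new Faker<Order>()
        .CustomInstantiator(f => new Order(f.Commerce.Product(), f.Random.Decimal(1, 100)));

    public static void Run()
    {
        // 100 generators in parallel, each cloning ~100 Fakers and generating data.
        Parallel.For(0, 100, _ =>
        {
            for (int i = 0; i < 100; i++)
            {
                var faker = BaseConfig.Clone();
                faker.Generate();
            }
        });
    }
}
```

With eager construction, every clone pays the dataset-building cost up front; with the lazy approach, clones that never touch the expensive dataset allocate far less.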
Diff excerpt under review:

```csharp
.OfType<BObject>()
.SelectMany(bObject =>
{
    if( bObject.ContainsKey("extensions") )
```
Also, if you could, a small nit: avoid these small reformatting changes in existing source that doesn't otherwise change.