Conversation
Awesome, thanks for this! Will try to get to this later this evening.
cmars left a comment
A couple of questions below.
I'm a little nervous about these changes without good test coverage in place to guard against regressions.
pystdf/IO.py (Outdated)
@@ -193,17 +184,26 @@ def parse(self, count=0):
            raise

    def getFieldParser(self, fieldType):
Could you get the same effect with a memoizing decorator here?
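(For reference, a minimal sketch of the kind of memoizing decorator this question has in mind; the class and method names are placeholders, not the PR's actual code.)

```python
import functools

def memoize(fn):
    """Cache fn's result per argument tuple; assumes hashable arguments."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

class Parser:
    @memoize
    def getFieldParser(self, fieldType):
        # Illustrative body only: build the parse function once per
        # field type; repeated lookups hit the cache.
        print(f"building parser for {fieldType}")
        return lambda buf: buf  # stand-in for the real field parser

p = Parser()
p.getFieldParser("U4")  # builds the parser
p.getFieldParser("U4")  # cached; no rebuild
```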
try:
    for parse_field in field_parsers:
        fields.append(parse_field(parser, header, fields))
except EndOfRecordException: pass
This seems like a change in behavior [swallowing the end of record exception, I mean]. Why was this added? Could it subtly change the behavior of how the parser is used in an unexpected way? I'm concerned that it might... can you provide more context here to derisk that concern?
The EndOfRecordException handling in the createRecordParser method originates from the original appendFieldParser method, which was responsible both for ordering the field-parsing actions and for handling this exception. Now the memoize decorator provides only the caching; the ordering of actions and the exception handling have been moved into createRecordParser (the original caller of appendFieldParser), where they are implemented explicitly.
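(A condensed sketch of that split, following the snippet quoted above; the exception class here is a stand-in for pystdf's own, and the function bodies are illustrative.)

```python
class EndOfRecordException(Exception):
    """Stand-in for pystdf's exception raised when a record ends early."""
    pass

def createRecordParser(field_parsers):
    # The ordering of actions and the exception handling that used to
    # live in appendFieldParser now sit here, with the caller.
    def parse_record(parser, header):
        fields = []
        try:
            for parse_field in field_parsers:
                fields.append(parse_field(parser, header, fields))
        except EndOfRecordException:
            pass  # a truncated record stops early; fields read so far are kept
        return fields
    return parse_record
```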
A new commit: the performance is further optimized (the time consumed is about 75% of the master).

(pystdf) ➜ pystdf git:(speed_up) ✗ python test.py
Time taken to import data/lot3.stdf: 0.4064 seconds for 4.35 MB of data (lot3)
Time taken to import data/lot2.stdf: 0.4240 seconds for 4.21 MB of data (lot2)
Time taken to import data/demofile.stdf: 0.4097 seconds for 4.35 MB of data (demofile)
From a performance standpoint, batch processing of [...] looks worthwhile. However, this PR is already quite complex, and introducing [...] would add to that complexity. To keep things manageable, I think it would be better to address the optimization of [...] in a separate PR. This way, we can focus on refining the current changes while ensuring clarity and maintainability. Let me know your thoughts!

By using closures instead of lambdas, caching the field-parser methods, and precompiling regular-expression objects, a performance improvement of about 15% was achieved (mainly from the closures).
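(To make the first two techniques concrete, a rough sketch; the struct format and regex pattern are illustrative, not the PR's actual code. The third technique, caching the field parsers, is the memoize sketch earlier in the thread.)

```python
import re
import struct

# 1. Closure instead of a lambda: resolve the unpack callable once,
#    so each call is a single bound-method invocation.
def make_u4_parser():
    unpack = struct.Struct("<I").unpack  # bound once, not per call
    def parse(buf, offset):
        return unpack(buf[offset:offset + 4])[0]
    return parse

# 2. Precompiled regular expression: compile the pattern at import
#    time instead of handing a pattern string to re.match on every call.
ARRAY_PATTERN = re.compile(r"k(\d+)")  # illustrative pattern only

parse_u4 = make_u4_parser()
print(parse_u4(b"\x2a\x00\x00\x00", 0))      # -> 42
print(ARRAY_PATTERN.match("k3U4").group(1))  # -> '3'
```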
Here is my benchmark code:
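(The script itself is not reproduced here; below is a minimal sketch of a timing harness that would produce output of the shape shown above. The data file paths and the Parser(inp=...).parse() usage are assumptions based on pystdf's public API.)

```python
import os
import time

from pystdf.IO import Parser

def time_import(path):
    # Time a full parse of one STDF file and report size and elapsed time.
    start = time.perf_counter()
    with open(path, "rb") as f:
        Parser(inp=f).parse()
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(path) / (1024 * 1024)
    name = os.path.splitext(os.path.basename(path))[0]
    print(f"Time taken to import {path}: {elapsed:.4f} seconds "
          f"for {size_mb:.2f} MB of data ({name})")

for path in ("data/lot3.stdf", "data/lot2.stdf", "data/demofile.stdf"):
    time_import(path)
```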