cf686d3 to 3efb63f
I now have this running in production without any issues.
7bde9b5 to fe367ed
cf4880a to 57bf417
…d a Worker node failed
add basic unit testing for FeedWorkerProcess logic
add unit test for when command queue is full
57bf417 to 0c5c5ad
What's the actual problem here? That the reads run as Python code in threads and therefore run into the GIL? I always thought that, because we "run everything as a subprocess", we never run into that problem. This feels like a lot of complexity, and I don't really see the gain here. Any chance you could make the gain clearer to me?
@jankatins The problem I was trying to solve is that, when running a parallel task, the commands for the internal sub-pipelines must be evaluated before the pipeline starts working. I had a file bucket with millions of files to process. In my case, the pipeline became so big that it was unable to start, probably because of memory consumption, or because the job was still reading the bucket's complete file list after more than an hour. This PR changes the parallel task behavior by moving the sub-pipeline generation into a separate feed worker task. This PR is complex, and I am not 100% sure it should be part of mara. It is a first attempt at implementing file-based micro-batch streaming via mara. I have realized that it might not have been the best idea 💡 I have had in recent years 😉
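To make the difference concrete, here is a minimal sketch of the general idea, not the code from this PR and not mara's API. It assumes a hypothetical `list_files` generator standing in for the bucket listing, and it uses threads and `queue.Queue` purely to keep the example short. A feed thread generates work items lazily and pushes them into a bounded queue, so the full file list never has to be materialized before the workers start.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel that tells a worker to stop


def list_files(n: int) -> Iterator[str]:
    """Hypothetical stand-in for lazily listing a very large bucket."""
    for i in range(n):
        yield f"file-{i:07d}.csv"


def run_streaming(
    files: Iterable[str],
    process: Callable[[str], str],
    workers: int = 4,
    max_pending: int = 8,
) -> List[str]:
    """Process items from `files` while they are still being generated."""
    # Bounded queue: at most `max_pending` items are held in memory at once.
    commands: "queue.Queue[object]" = queue.Queue(maxsize=max_pending)
    results: List[str] = []
    results_lock = threading.Lock()

    def feed() -> None:
        for name in files:
            commands.put(name)  # blocks while the queue is full
        for _ in range(workers):
            commands.put(_DONE)  # one stop signal per worker

    def work() -> None:
        while True:
            item = commands.get()
            if item is _DONE:
                return
            output = process(item)  # type: ignore[arg-type]
            with results_lock:
                results.append(output)

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    done = run_streaming(list_files(1000), str.upper)
    print(len(done))
```

The sketch deliberately leaves out what happens when a worker fails: if `process` raises, that worker exits and the feed thread can block forever on a full queue, which is the kind of case a real implementation has to handle.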
See #75