Description
Describe the enhancement
The kodosumi spooler is the backend component that collects and stores the events of active flow executions. The current implementation stores the execution event stream in sqlite3 files, with one dedicated sqlite3 file per flow execution.
The spooler is therefore a single point of failure (SPOF). Note that no events are lost when the spooler fails and restarts: events that the spooler has not yet gathered remain in Ray's shared object store. After a restart, the spooler resumes spooling and continues to materialise the event stream to disk (sqlite3).
The problem is that the spooler does not scale with the current design. Scaling the spooler, i.e. running multiple redundant spooler instances, requires a central data store that shares the event stream across the spoolers.
Suggested solutions and enhancements:
- refactor the sqlite3 implementation to plug in different storage backends (e.g. Redis, MongoDB, PostgreSQL)
- refactor the spooler component to "assign" flow executions and their event streams to a dedicated spooler instance
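To make the two suggestions concrete, here is a rough sketch of what they could look like. It is not taken from the kodosumi code base: `EventStore`, `SqliteEventStore`, `assign_spooler`, the `events` table layout and the spooler names are all hypothetical, chosen only for illustration. The idea is that the spooler writes through a small storage interface, so a Redis, MongoDB or PostgreSQL backend could replace the sqlite3 one, and that each flow execution maps deterministically to one spooler instance.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Protocol


class EventStore(Protocol):
    """Hypothetical storage interface the spooler would write through."""

    def append(self, execution_id: str, kind: str, payload: str) -> None: ...

    def read(self, execution_id: str) -> list[tuple[str, str]]: ...


class SqliteEventStore:
    """One sqlite3 file per flow execution, as in the current design.

    The table layout is an assumption made for this sketch.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _connect(self, execution_id: str) -> sqlite3.Connection:
        conn = sqlite3.connect(self.root / f"{execution_id}.db")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, kind TEXT, payload TEXT)"
        )
        return conn

    def append(self, execution_id: str, kind: str, payload: str) -> None:
        conn = self._connect(execution_id)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO events (kind, payload) VALUES (?, ?)",
                    (kind, payload),
                )
        finally:
            conn.close()

    def read(self, execution_id: str) -> list[tuple[str, str]]:
        conn = self._connect(execution_id)
        try:
            rows = conn.execute(
                "SELECT kind, payload FROM events ORDER BY id"
            ).fetchall()
            return rows
        finally:
            conn.close()


def assign_spooler(execution_id: str, spoolers: list[str]) -> str:
    """Map a flow execution to exactly one spooler via a stable hash."""
    digest = hashlib.sha256(execution_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(spoolers)
    return spoolers[index]
```

A shared backend would implement the same two methods against a central store. Note that the modulo-based assignment reshuffles executions whenever the set of spoolers changes, so a real implementation would probably need consistent hashing or explicit leases in the central store instead.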
Alternatives considered
none