Stateless pipeline resuming #2343
Closed
chubei
started this conversation in
Feature Requests
Replies: 1 comment 5 replies
-
|
How should connector behave when ingestion resumes from middle of source transaction? Now, in every connector we need to have logic which drops those events before sending them to pipeline. Maybe, it would be better that in pipeline we drop those messages instead of doing that in every connector. |
Beta Was this translation helpful? Give feedback.
5 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Stateless pipeline resuming
For stateless pipelines, we store the pipeline state in the sink.
The pipeline state is a collection of connection states.
A connection state consists of two parts:
OpIdentifier. It's always twou64s.Connection level state is stored in a separate table in the sink, and is written only once. Record level state is written as part of the record.
To be more specific, every record in the sink has a
__dozer_record_statefield which stores theOpIdentifier. We should be able to get the maximumOpIdentifierof a sink. (How a specific sink stores and aggregates the field is out of scope for this document.)For inserts, the field is set to the record's
OpIdentifierif it has one (during replication), orNoneif it doesn't (during snapshotting).For deletes, the field is deleted with the record.
For updates, the field is updated to the new
OpIdentifierof the update.Upon restart, Dozer first aggregates the sink for the maximum
OpIdentifierof the sink. If it'sNone, Dozer starts moving the whole source tables. Otherwise, Dozer passes theOpIdentifierand the connection level state to the connector to continue replication.Why this is correct
If the last operation that was written to the sink is an insert or update, then the connector restarts from the correct position.
If the last one or more operations are deletes, the connector will send these deletes again, but the deletes will not modify the sink because these records are already deleted.
Beta Was this translation helpful? Give feedback.
All reactions