Commit 7d50cca

swift agent design doc cleanup
Signed-off-by: Artem Torubarov <artem.torubarov@clyso.com>
1 parent 371ace9 commit 7d50cca

1 file changed: +68 -45 lines changed

docs/swift-agent/design.md

Lines changed: 68 additions & 45 deletions
@@ -28,8 +28,8 @@ The S3 agent approach does not work for OpenStack Swift because Swift does not s
 
 | ID | Requirement | Target | Rationale |
 |----|-------------|--------|-----------|
-| N1 | Event capture latency | < 10 seconds | Chorus replicates ashynchronuosly. Data copy takes time. Capture latency is tolerable when it is small compared to obj copy time |
-| N2 | Minimal impact on Swift hot path | No added latency to user requests, agent failure not leads to swift failure | Production safety |
+| N1 | Event capture latency | < 10 seconds | Chorus replicates asynchronously. Data copy takes time. Capture latency is tolerable when it is small compared to object copy time |
+| N2 | Minimal impact on Swift hot path | No added latency to user requests, agent failure does not lead to Swift failure | Production safety |
 | N3 | Deployment without Swift source modification | Preferred | Operational simplicity |
 | N4 | Support Kubernetes deployment | Required | Primary deployment model |
 | N5 | Support non-containerized deployment | Should | Customer flexibility |
@@ -40,7 +40,7 @@ The S3 agent approach does not work for OpenStack Swift because Swift does not s
 Chorus uses a policy-based replication model:
 
 1. User creates replication via Chorus API specifying: `user`, `from_storage`, `to_storage`, optionally `from_bucket`/`to_bucket`
-2. Replication policy stored in Redis (`pkg/store/replication_stores.go`)
+2. Replication policy stored in Redis
 3. Agent receives events and queries policy service to determine if event matches active replication
 4. For matching events, agent creates tasks in work queue
 5. Worker processes tasks idempotently: compares source/destination, copies if needed. It is tolerable to duplicated/reordered tasks.
@@ -116,7 +116,7 @@ pipeline = catch_errors gatekeeper healthcheck proxy-logging cache ... proxy-ser
 
 ### 2.2 Access Logs
 
-Swift logs all requests via the [`proxy_logging` middleware](https://docs.openstack.org/swift/latest/logs.html). Log format is configurable via `log_msg_template` parameter (since Swift 2.22.0).
+Swift logs all requests via the [`proxy_logging` middleware](https://docs.openstack.org/swift/latest/logs.html). Log format is configurable via `log_msg_template` parameter.
 
 **Default format** includes: client_ip, timestamp, method, path, status, bytes, transaction_id, request_time.
 
@@ -138,7 +138,7 @@ Path structure: `/v1/AUTH_<project_id>/<container>/<object>`
 
 **Key characteristic**:
 - Log format is not fixed. Each Swift installation may configure different templates, requiring configurable parsing.
-- Log does not contain `Swift method`, only HTTP method and path. Swift method (Object/Container create/update/metadata-update/delete) have to be calculated.
+- Log does not contain Swift method, only HTTP method and path. Swift method (Object/Container create/update/metadata-update/delete) must be derived from HTTP method + path.
 
 ### 2.3 Extensibility: Ceph RGW
 
@@ -159,8 +159,8 @@ A log-based approach extends to Ceph RGW, which provides [ops logging](https://d
 "http_status":"201","user":"610acf37e0144593b76a8ce6a16f2c4f","object_size":12}
 ```
 
-**Key characteristic**:
-- unlike openstack, RGW logs Swift/S3 method name in `operation` field.
+**Key characteristic**:
+- Unlike OpenStack, RGW logs the Swift/S3 method name in the `operation` field.
 
 ## 3. Comparison
 
@@ -171,9 +171,8 @@ A log-based approach extends to Ceph RGW, which provides [ops logging](https://d
 | **Deployment** | Swift config change + restart | Sidecar/file access only |
 | **Latency** | Lower | Higher |
 | **Format configuration** | Not needed | Required |
-| **Extensibility** | Works only with Openstack | can be extended to support logs from other verndors |
-| **Implementation** | Python | Go only |
-| **Maintenance** | Two codebases | Single codebase |
+| **Extensibility** | Works only with OpenStack | Can be extended to support logs from other vendors |
+| **Maintenance** | Two codebases (+ Python) | Single codebase |
 
 ## 4. Recommendation: Access Log Parsing
 
@@ -183,7 +182,7 @@ Log parsing is recommended based on trade-off analysis:
 
 1. **Production safety** (high weight): Log parsing cannot cause Swift outages. A middleware bug on the hot path risks production availability.
 2. **Deployment simplicity** (high weight): No Swift configuration changes. Sidecar deployment is standard Kubernetes practice.
-3. **Extensibility** (medium weight): Same architecture supports multiple storage venfors with different parser configuration.
+3. **Extensibility** (medium weight): Same architecture supports multiple storage vendors with different parser configuration.
 4. **Maintenance** (medium weight): Go-only implementation aligns with Chorus codebase. No Python component to maintain.
 5. **Latency trade-off** (acceptable): 1-10 second latency meets async replication requirements.
 6. **Durability trade-off** (acceptable): Modern log tailing/exporting libraries handle log rotation/truncation correctly.
@@ -197,18 +196,17 @@ Log parsing is recommended based on trade-off analysis:
 
 Agent architecture is inspired by popular log collector [FluentBit](https://fluentbit.io/how-it-works/)
 
-```css
+```
 [ Input ] → [ Parser ] → [ Filter ] → [ Buffer ] → [ Output ]
 ```
 
-1. Agent parse log entry to map it to S3 notification structure with
-- [Swift method](../../pkg/swift/methods.go)
-- Resource name: Account/Container/[Object]/[ObjectVersion]
-2. Agent filters read requests and errors.
-3. Agent batch multiple entries into buffer
-4. Sends batch of events to Chorus webhook.
+1. Agent parses log entry to map it to S3 notification structure with:
+- [Swift method](../../pkg/swift/methods.go)
+- Resource name: Account/Container/[Object]/[ObjectVersion]
+2. Agent filters out read requests and errors.
+3. Agent batches multiple entries into buffer.
+4. Agent sends batch of events to Chorus webhook.
 
-TODO: add mermaid digram with agent sending webhook to chorus
 
 **Alternatives:**
 
@@ -239,12 +237,6 @@ This mapping logic could be described in a DSL, or using regexes, but would be c
 
 **Recommendation**: The agent uses predefined **source types** that encapsulate the mapping logic for each storage vendor. Users configure only the log parsing; the event classification (method + path → event type) is hardcoded per source.
 
-**Rationale**:
-- Mapping logic is well-defined per vendor (e.g., Swift PUT + 4 path segments = ObjectCreated)
-- Users shouldn't need to understand or configure this mapping
-- Keeps configuration simple; complex logic stays in testable Go code
-- Trade-off: adding new vendor requires code change, but this is infrequent and ensures correctness
-
 ```yaml
 swift_agent:
 source: openstack_swift
@@ -260,20 +252,27 @@ swift_agent:
 config: {} # not needed for RGW JSON logs
 ```
 
+**Rationale**:
+- Mapping logic is well-defined per vendor (e.g., Swift PUT + 4 path segments = ObjectCreated)
+- Users shouldn't need to understand or configure this mapping
+- Keeps configuration simple; complex logic stays in testable Go code
+- Trade-off: adding new vendor requires code change, but this is infrequent and ensures correctness
+
 ### 5.3 Alternative: parse logs with Fluent Bit
 
 Use Fluent Bit for log tailing and parsing, then send parsed events to Chorus agent via HTTP.
 Fluent Bit allows to use Lua scripts or regex parsers to extract needed fields and map to Swift method.
 
-Here is example Fluent Bit config for Swift logs:
+---
+Below is example Fluent Bit config for Swift logs:
 
 <details>
 
 <summary>Fluent Bit config + Lua script</summary>
 
+
 > [!WARNING]
-> Config and script are illustrative only. It was generated using AI.
-> Full implementation and testing is needed.
+> Config and script are illustrative only. It was generated using AI. Full implementation and testing is needed.
 
 
 ```ini
@@ -310,34 +309,50 @@ local BATCH_SIZE = 10
 function map_record(tag, ts, record)
     local path = record["path"]
     local method = record["method"]
+    local status = tonumber(record["status"])
+
+    -- Filter: only process successful requests (2xx)
+    if not status or status < 200 or status >= 300 then
+        return -1, ts, record
+    end
+
+    if not path or not method then
+        return -1, ts, record
+    end
 
-    if not path then
+    -- Filter: only mutations (ignore GET, HEAD)
+    if method == "GET" or method == "HEAD" then
         return -1, ts, record
     end
 
-    -- /v1/AUTH_x/container/object
+    -- Parse path: /v1/AUTH_x/container/object
     local parts = {}
     for p in string.gmatch(path, "[^/]+") do
         table.insert(parts, p)
     end
 
-    local account = parts[2]
+    local account = parts[2] -- AUTH_xxx
     local container = parts[3]
-    local object = parts[4]
+    local object = parts[4] -- may be nil for container ops
+
+    -- Extract account ID from AUTH_xxx prefix
+    if account and string.sub(account, 1, 5) == "AUTH_" then
+        account = string.sub(account, 6)
+    end
 
     local op = nil
 
     if object then
-        if method == "PUT" then op = "PutObject"
-        elseif method == "GET" then op = "GetObject"
-        elseif method == "HEAD" then op = "HeadObject"
-        elseif method == "DELETE" then op = "DeleteObject"
+        -- Object operations (4+ path segments)
+        if method == "PUT" then op = "ObjectCreated"
+        elseif method == "POST" then op = "ObjectMetadataUpdated"
+        elseif method == "DELETE" then op = "ObjectDeleted"
         end
     elseif container then
-        if method == "PUT" then op = "PutContainer"
-        elseif method == "GET" then op = "GetContainer"
-        elseif method == "HEAD" then op = "HeadContainer"
-        elseif method == "DELETE" then op = "DeleteContainer"
+        -- Container operations (3 path segments)
+        if method == "PUT" then op = "ContainerCreated"
+        elseif method == "POST" then op = "ContainerMetadataUpdated"
+        elseif method == "DELETE" then op = "ContainerDeleted"
         end
     end
 
@@ -370,9 +385,17 @@ end
 ```
 </details>
 
-**Drawbacks**:
-- Harder to test/debug mapping logic (spread across Lua script)
-- Not possible to copy swift `log_msg_template` config directly. For every log format change, Lua or regex must be updated.
+---
+
+### 5.4 Decision: Built-in vs Fluent Bit
+
+| Criterion | Built-in Agent | Fluent Bit + Webhook |
+|-----------|----------------|----------------------|
+| **Deployment** | Go sidecar | Fluent Bit sidecar |
+| **Configuration** | Single YAML file | Fluent Bit config + Lua script + Chorus webhook |
+| **Log format sync** | Can parse Swift `log_msg_template` directly | Must manually write regex matching the template |
+| **Testing** | Go unit tests for mapping logic | Lua script harder to test in CI |
+| **Maintenance** | More Go code to write | Less code, but Lua is separate language |
+| **User familiarity** | Chorus-specific config | Fluent Bit widely known in ops community |
 
-**Benefits**:
-- Reuse Fluent Bit for log tailing, buffering, batching, HTTP sending
+**Recommendation**: ???
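
For comparison with the Lua script in this diff, the same method + path classification (e.g. Swift PUT + 4 path segments = ObjectCreated) could be sketched in Go, where it is easy to cover with unit tests. This is a minimal sketch, not the agent's actual code: the function name `classify` is hypothetical, the event names are copied from the Lua script and may not match `pkg/swift/methods.go`, and query strings, versioned objects, and other Swift verbs such as COPY are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps one access-log entry to an event name.
// Hypothetical helper: event names mirror the Lua script in this diff.
// It returns "" when the entry should be dropped (non-2xx status,
// account-level request, read request, or unsupported method).
func classify(method, path string, status int) string {
	// Only successful requests describe a change worth replicating.
	if status < 200 || status >= 300 {
		return ""
	}
	// Expected shape: /v1/AUTH_<project_id>/<container>/<object>.
	// SplitN with a limit of 4 keeps slashes inside the object name intact.
	parts := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 4)
	if len(parts) < 3 || parts[2] == "" {
		return "" // account-level request or malformed path
	}
	kind := "Container"
	if len(parts) == 4 && parts[3] != "" {
		kind = "Object"
	}
	switch method {
	case "PUT":
		return kind + "Created"
	case "POST":
		return kind + "MetadataUpdated"
	case "DELETE":
		return kind + "Deleted"
	}
	return "" // GET, HEAD and anything else are ignored
}

func main() {
	fmt.Println(classify("PUT", "/v1/AUTH_test/photos/2024/cat.jpg", 201))
	fmt.Println(classify("DELETE", "/v1/AUTH_test/photos", 204))
	fmt.Println(classify("GET", "/v1/AUTH_test/photos/cat.jpg", 200) == "")
}
```

Keeping the mapping in one pure function like this is what makes the "Go unit tests for mapping logic" row of the table cheap to satisfy: each log shape becomes a single table-driven test case.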
