Add stop sequences support for response generation #12

scouzi1966 wants to merge 1 commit into main from
Conversation
This commit implements stop sequences functionality, allowing users to specify strings at which the model should stop generating text. This is a standard OpenAI API feature that improves output control.

Features:
- CLI parameter: `--stop "seq1,seq2"` (comma-separated stop sequences)
- API parameter: `"stop": ["seq1", "seq2"]` (array of strings)
- Works in both streaming and non-streaming modes
- Stop sequences from the CLI and API are merged, with duplicates removed
- The stop sequence itself is excluded from the output
- Generation stops at the earliest occurrence when multiple sequences match

Implementation:
- Added the `--stop` parameter to `RootCommand` and `ServeCommand` in `main.swift`
- Updated `Server.swift` to accept and pass the stop parameter
- Updated `ChatCompletionsController` with a `mergeStopSequences` helper
- Updated `FoundationModelService` with an `applyStopSequences` method
- Enhanced the `CLAUDE.md` documentation with examples and usage

Use cases:
- Structured output formatting (JSON, XML, etc.)
- Limiting response length at specific markers
- Multi-step generation with clear boundaries
- Code generation with stop markers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
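As a sketch of the API shape described above (the model name and message are placeholders, not taken from this PR), a chat completions request body using the new parameter might look like:

```json
{
  "model": "placeholder-model",
  "messages": [
    { "role": "user", "content": "List three fruits, then write END" }
  ],
  "stop": ["END", "\n\n"]
}
```

Per the behavior described above, generation would halt at the earliest match of `END` or a blank line, and the matched sequence itself would not appear in the response.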
Reviewer's Guide

This PR adds support for stop sequences by introducing a new CLI/API parameter, merging sequences from both sources, truncating model output at the earliest stop match, and updating documentation and logging accordingly.

Sequence diagram for merging and applying stop sequences during response generation:

```mermaid
sequenceDiagram
  actor CLI as CLI User
  participant Root as RootCommand
  participant Serve as ServeCommand
  participant Server as Server
  participant Chat as ChatCompletionsController
  participant Model as FoundationModelService
  CLI->>Root: Provide --stop parameter
  Root->>Serve: Pass stop parameter
  Serve->>Server: Pass stop parameter
  Server->>Chat: Pass stop parameter
  Chat->>Chat: mergeStopSequences(cliStop, apiStop)
  Chat->>Model: generateResponse(..., stop: mergedStop)
  Model->>Model: applyStopSequences(content, stopSequences)
  Model-->>Chat: Return truncated content
  Chat-->>Server: Return response
  Server-->>Serve: Return response
  Serve-->>Root: Return response
  Root-->>CLI: Output response
```
Class diagram for stop sequence support in response generation:

```mermaid
classDiagram
  class RootCommand {
    +temperature: Double?
    +randomness: String?
    +permissiveGuardrails: Bool
    +stop: String?
    +run()
    -runSinglePrompt(prompt: String, adapter: String?)
  }
  class ServeCommand {
    +temperature: Double?
    +randomness: String?
    +permissiveGuardrails: Bool
    +stop: String?
    +run()
  }
  class Server {
    +temperature: Double?
    +randomness: String?
    +permissiveGuardrails: Bool
    +stop: String?
    +constructor(..., stop: String?)
  }
  class ChatCompletionsController {
    +temperature: Double?
    +randomness: String?
    +permissiveGuardrails: Bool
    +stop: String?
    +constructor(..., stop: String?)
    -mergeStopSequences(cliStop: String?, apiStop: [String]?): [String]?
  }
  class FoundationModelService {
    +generateResponse(..., stop: [String]?): String
    +generateStreamingResponseWithTiming(..., stop: [String]?): (String, Double)
    -applyStopSequences(content: String, stopSequences: [String]?): String
  }
  RootCommand --> ServeCommand
  ServeCommand --> Server
  Server --> ChatCompletionsController
  ChatCompletionsController --> FoundationModelService
```
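The `applyStopSequences` step shown in the diagrams might be sketched as follows. This is an illustrative reconstruction, not the PR's actual diff; it lets a nil index double as the "not found" flag, so no separate boolean is needed.

```swift
import Foundation

// Illustrative sketch of applyStopSequences, not the PR's actual code.
// Truncates content at the earliest occurrence of any stop sequence,
// excluding the sequence itself from the output.
func applyStopSequences(_ content: String, stopSequences: [String]?) -> String {
    guard let stops = stopSequences, !stops.isEmpty else { return content }

    // A nil index doubles as the "no stop found" flag.
    var earliest: String.Index?
    for stop in stops where !stop.isEmpty {
        if let range = content.range(of: stop),
           earliest == nil || range.lowerBound < earliest! {
            earliest = range.lowerBound
        }
    }
    guard let cut = earliest else { return content }
    return String(content[..<cut])
}
```

For example, applying stops `["END", ","]` to `"one, two END"` would truncate at the comma, since the earliest match wins.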
Hey there - I've reviewed your changes - here's some feedback:
- Consider centralizing the parsing of comma-separated stop sequences into an array at the CLI parsing stage to avoid duplicating splitting logic in both RootCommand and ChatCompletionsController.
- The applyStopSequences helper truncates only after the full response is available; for true streaming support, consider detecting and halting on stop sequences mid-stream so you don’t emit extra chunks.
- Filter out empty or whitespace-only stop sequences after splitting to avoid unintended early truncation when users supply consecutive commas or trailing commas.
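For the second point, stopping mid-stream rather than after the fact, one common approach is to hold back as many trailing characters as the longest stop sequence could span, so a stop split across two chunks is never emitted. A sketch under that assumption (none of these names appear in the PR):

```swift
import Foundation

// Sketch of a streaming-safe stop filter; an assumption, not the PR's code.
struct StreamingStopFilter {
    let stops: [String]
    private var buffer = ""
    // Hold back enough trailing characters that a stop sequence split
    // across two chunks cannot slip through.
    private var holdback: Int { max(0, (stops.map(\.count).max() ?? 1) - 1) }

    /// Feed one streamed chunk; returns text safe to emit now and
    /// whether a stop sequence was hit.
    mutating func feed(_ chunk: String) -> (emit: String, done: Bool) {
        buffer += chunk
        // Stop at the earliest occurrence of any sequence.
        var earliest: String.Index?
        for stop in stops where !stop.isEmpty {
            if let range = buffer.range(of: stop),
               earliest == nil || range.lowerBound < earliest! {
                earliest = range.lowerBound
            }
        }
        if let cut = earliest {
            let out = String(buffer[..<cut])
            buffer = ""
            return (out, true)
        }
        let safeCount = max(0, buffer.count - holdback)
        let cutIndex = buffer.index(buffer.startIndex, offsetBy: safeCount)
        let out = String(buffer[..<cutIndex])
        buffer = String(buffer[cutIndex...])
        return (out, false)
    }
}
```

The caller would stop requesting chunks from the model as soon as `done` is true, which avoids emitting extra chunks after the stop point.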
## Individual Comments
### Comment 1
Location: `Sources/MacLocalAPI/main.swift:290-291`

```diff
 let message = Message(role: "user", content: prompt)
 DebugLogger.log("Generating response...")
-let response = try await foundationService.generateResponse(for: [message], temperature: temperature, randomness: randomness)
+let stopSequences = stop?.split(separator: ",").map { String($0.trimmingCharacters(in: .whitespaces)) }
+let response = try await foundationService.generateResponse(for: [message], temperature: temperature, randomness: randomness, stop: stopSequences)
 DebugLogger.log("Response generated successfully")
 result = .success(response)
```
**suggestion:** Consider extracting stop sequence parsing into a shared utility to avoid duplication.

Centralizing the stop sequence parsing will help maintain consistency and simplify future updates to the parsing logic.

Suggested implementation:

```swift
let stopSequences = StopSequenceParser.parse(stop)
let response = try await foundationService.generateResponse(for: [message], temperature: temperature, randomness: randomness, stop: stopSequences)
```

```swift
struct StopSequenceParser {
    static func parse(_ stop: String?) -> [String]? {
        guard let stop = stop else { return nil }
        return stop.split(separator: ",").map { String($0.trimmingCharacters(in: .whitespaces)) }
    }
}
```
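Taken together with the empty-sequence concern raised elsewhere in this review, the shared parser could also drop blank entries, and the merge step could deduplicate while preserving order. A sketch with illustrative names (not the PR's actual helpers):

```swift
import Foundation

// Sketch combining the review suggestions; illustrative, not the PR's code.
enum StopSequences {
    /// Parse a comma-separated CLI value, dropping empty or whitespace-only
    /// entries so ",," or a trailing "," cannot cause unintended truncation.
    static func parse(_ stop: String?) -> [String]? {
        guard let stop else { return nil }
        let parts = stop.split(separator: ",")
            .map { $0.trimmingCharacters(in: .whitespaces) }
            .filter { !$0.isEmpty }
        return parts.isEmpty ? nil : parts
    }

    /// Merge CLI and API sequences, removing duplicates while keeping order.
    static func merge(cli: [String]?, api: [String]?) -> [String]? {
        var seen = Set<String>()
        let merged = ((cli ?? []) + (api ?? [])).filter { seen.insert($0).inserted }
        return merged.isEmpty ? nil : merged
    }
}
```

With this design, `parse("END, ,")` would yield `["END"]` rather than an array containing an empty string that matches everywhere.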
### Comment 2

Location: `Sources/MacLocalAPI/Models/FoundationModelService.swift:611-612`

```diff
+    return content
+}
+
+var shortestStopIndex: String.Index? = nil
+var foundStop = false
+
+// Find the earliest occurrence of any stop sequence
```

**nitpick:** Using a boolean flag for `foundStop` is redundant, since `shortestStopIndex` can serve the same purpose.

Consider removing the `foundStop` flag and relying on `shortestStopIndex` being `nil` to indicate whether a stop was found.
Summary by Sourcery
Implement configurable stop sequences for response generation by extending CLI and API interfaces, merging inputs, and truncating outputs at specified markers.