|
2 | 2 |
|
3 | 3 | ## Overview |
4 | 4 |
|
5 | | -This document describes the gRPC-based architecture for cuOpt remote solve, which replaces the custom Protobuf serialization with industry-standard gRPC. |
6 | | - |
7 | | -## Benefits of gRPC |
8 | | - |
9 | | -1. **Robust and Standard**: Uses HTTP/2, built-in error handling, flow control |
10 | | -2. **Type-safe**: Generated code from .proto files using protoc compiler |
11 | | -3. **Streaming Support**: Native server-side streaming for logs |
12 | | -4. **Better Tooling**: Standard debugging tools, interceptors, middleware |
13 | | -5. **Less Error-Prone**: No custom message framing or serialization code |
| 5 | +cuOpt remote solve uses gRPC for transport and protobuf-generated stubs for the service API. |
| 6 | +The request/response payloads are serialized with a protobuf-based serializer that maps |
| 7 | +cuOpt data structures to protobuf messages. This preserves existing semantics while |
| 8 | +moving the network layer to a standard, well-supported RPC stack. |
14 | 9 |
|
15 | 10 | ## Service Definition |
16 | 11 |
|
17 | | -### RPC Methods |
18 | | - |
19 | | -```protobuf |
20 | | -service CuOptRemoteService { |
21 | | - // Submit a new LP/MIP solve job (async) |
22 | | - rpc SubmitJob(SubmitJobRequest) returns (SubmitJobResponse); |
23 | | -
|
24 | | - // Check job status |
25 | | - rpc CheckStatus(StatusRequest) returns (StatusResponse); |
26 | | -
|
27 | | - // Get completed result |
28 | | - rpc GetResult(GetResultRequest) returns (ResultResponse); |
29 | | -
|
30 | | - // Delete result from server memory |
31 | | - rpc DeleteResult(DeleteRequest) returns (DeleteResponse); |
32 | | -
|
33 | | - // Stream logs in real-time (server-side streaming) |
34 | | - rpc StreamLogs(StreamLogsRequest) returns (stream LogMessage); |
35 | | -
|
36 | | - // Cancel a queued or running job |
37 | | - rpc CancelJob(CancelRequest) returns (CancelResponse); |
38 | | -
|
39 | | - // Wait for result (blocking call, returns when job completes) |
40 | | - rpc WaitForResult(WaitRequest) returns (ResultResponse); |
41 | | -
|
42 | | - // Synchronous solve (blocking, returns result immediately) |
43 | | - rpc SolveSync(SolveSyncRequest) returns (SolveSyncResponse); |
44 | | -} |
45 | | -``` |
46 | | - |
47 | | -### Key Improvements |
| 12 | +The gRPC service is defined in `cpp/src/linear_programming/utilities/cuopt_remote_service.proto` |
| 13 | +and imports the message schema in `cuopt_remote.proto`. Code is generated by `protoc` |
| 14 | +plus `grpc_cpp_plugin` during the build. |
48 | 15 |
|
49 | | -1. **Log Streaming**: Replace polling with gRPC server-side streaming |
50 | | - - Client opens stream, server pushes log lines as they arrive |
51 | | - - More efficient, real-time, less network overhead |
| 16 | +Core RPCs include: |
52 | 17 |
|
53 | | -2. **Type Safety**: Each RPC has specific request/response types |
54 | | - - No more generic `AsyncRequest` wrapper with `oneof` |
55 | | - - Better compile-time checking |
| 18 | +- `SubmitJob` / `UploadAndSubmit` |
| 19 | +- `CheckStatus` |
| 20 | +- `GetResult` / `StreamResult` |
| 21 | +- `StreamLogs` |
| 22 | +- `CancelJob` |
| 23 | +- `DeleteResult` |
| 24 | +- `GetIncumbents` |
56 | 25 |
|
57 | | -3. **Error Handling**: gRPC status codes instead of custom `ResponseStatus` |
58 | | - - Standard codes: OK, CANCELLED, NOT_FOUND, DEADLINE_EXCEEDED, etc. |
| 26 | +## Components |
59 | 27 |
|
60 | | -4. **Streaming Cancellation**: Built-in support for cancelling streams |
| 28 | +### gRPC Server (`cuopt_grpc_server`) |
61 | 29 |
|
62 | | -## Architecture Components |
| 30 | +- Source: `cpp/cuopt_grpc_server.cpp` |
| 31 | +- Implements `CuOptRemoteService` and owns the worker process pool. |
| 32 | +- Workers communicate with the main server process via shared memory + pipes. |
| 33 | +- For results, the server calls `to_host()` before serialization. |
| 34 | +- Supports streaming logs and incumbents through gRPC streaming endpoints. |
63 | 35 |
|
64 | | -### 1. gRPC Server (`cuopt_grpc_server`) |
| 36 | +### gRPC Client Path (C++) |
65 | 37 |
|
66 | | -- Listens on TCP port (e.g., 8765) |
67 | | -- Implements `CuOptRemoteService` interface |
68 | | -- Manages worker processes (same as current implementation) |
69 | | -- Each worker still uses pipes/shared memory for IPC with main process |
70 | | -- Handles concurrent gRPC requests from multiple clients |
| 38 | +- Client logic lives in `cpp/src/linear_programming/utilities/remote_solve_grpc.cpp` |
| 39 | + and is used by `remote_solve.cu` and `cuopt_cli`. |
| 40 | +- The client serializes problems using the protobuf serializer, submits them |
| 41 | + via gRPC, and deserializes results back into cuOpt solution objects. |
71 | 42 |
|
72 | | -### 2. gRPC Client Wrapper (C++) |
| 43 | +### Serialization Layer |
73 | 44 |
|
74 | | -```cpp |
75 | | -class CuOptGrpcClient { |
76 | | -public: |
77 | | - CuOptGrpcClient(const std::string& server_address); |
| 45 | +- Default serializer: `cpp/src/linear_programming/utilities/protobuf_serializer.cu` |
| 46 | +- Interface: `cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp` |
| 47 | +- Optional plugin override: `CUOPT_SERIALIZER_LIB` can load a custom serializer. |
| 48 | +- The serializer uses protobuf message types defined in `cuopt_remote.proto`. |
78 | 49 |
|
79 | | - // Async API |
80 | | - std::string submit_job(const SolveLPRequest& request); |
81 | | - JobStatus check_status(const std::string& job_id); |
82 | | - LPSolution get_result(const std::string& job_id); |
83 | | - void delete_result(const std::string& job_id); |
84 | | - void cancel_job(const std::string& job_id); |
| 50 | +## Data Flow (LP/MIP) |
85 | 51 |
|
86 | | - // Blocking API |
87 | | - LPSolution wait_for_result(const std::string& job_id); |
88 | | - LPSolution solve_sync(const SolveLPRequest& request); |
| 52 | +1. Client builds a problem (LP/MIP). |
| 53 | +2. Serializer converts the problem + settings into protobuf bytes. |
| 54 | +3. gRPC `SubmitJob` or `UploadAndSubmit` sends the bytes to the server. |
| 55 | +4. Server deserializes to cuOpt data structures. |
| 56 | +5. Server runs `solve_lp` / `solve_mip` in a worker process. |
| 57 | +6. Server calls `to_host()` and serializes the solution to protobuf bytes. |
| 58 | +7. Client retrieves results via `GetResult` / `StreamResult` and deserializes. |
89 | 59 |
|
90 | | - // Log streaming |
91 | | - void stream_logs(const std::string& job_id, |
92 | | - std::function<void(const std::string&)> callback); |
| 60 | +## Generated Code (protoc output) |
93 | 61 |
|
94 | | -private: |
95 | | - std::unique_ptr<CuOptRemoteService::Stub> stub_; |
96 | | -}; |
97 | | -``` |
| 62 | +Generated files are written to the CMake binary directory (not checked into source): |
98 | 63 |
|
99 | | -### 3. Python gRPC Client Wrapper |
| 64 | +- `cuopt_remote.pb.cc/.h` |
| 65 | +- `cuopt_remote_service.pb.cc/.h` |
| 66 | +- `cuopt_remote_service.grpc.pb.cc/.h` |
100 | 67 |
|
101 | | -```python |
102 | | -class CuOptGrpcClient: |
103 | | - def __init__(self, server_address: str): |
104 | | - self.channel = grpc.insecure_channel(server_address) |
105 | | - self.stub = cuopt_remote_pb2_grpc.CuOptRemoteServiceStub(self.channel) |
| 68 | +## Build Integration |
106 | 69 |
|
107 | | - def submit_job(self, problem, settings) -> str: |
108 | | - """Submit job, returns job_id""" |
| 70 | +`cpp/CMakeLists.txt` drives code generation: |
109 | 71 |
|
110 | | - def check_status(self, job_id: str) -> JobStatus: |
111 | | - """Check job status""" |
| 72 | +- Locates `protoc` and `grpc_cpp_plugin` |
| 73 | +- Runs `protoc` to generate the `*.pb.cc/.h` sources |
| 74 | +- Adds generated sources to the `cuopt` library |
| 75 | +- Builds `cuopt_grpc_server` only when gRPC is available |
112 | 76 |
|
113 | | - def get_result(self, job_id: str): |
114 | | - """Get completed result""" |
| 77 | +## Security Notes |
115 | 78 |
|
116 | | - def stream_logs(self, job_id: str): |
117 | | - """Generator that yields log lines as they arrive""" |
118 | | - for log_msg in self.stub.StreamLogs(request): |
119 | | - yield log_msg.line |
| 79 | +- Service stubs and message parsing are generated by `protoc` and `grpc_cpp_plugin`. |
| 80 | +- Payload serialization uses protobuf message APIs rather than hand-written parsing. |
| 81 | +- gRPC provides HTTP/2 framing, flow control, and standard status codes. |
120 | 82 | ``` |
121 | | - |
122 | | -### 4. Pluggable Architecture |
123 | | - |
124 | | -Instead of pluggable serialization, we have pluggable client/server implementations: |
125 | | - |
126 | | -```cpp |
127 | | -// Abstract remote client interface |
128 | | -class IRemoteClient { |
129 | | -public: |
130 | | - virtual ~IRemoteClient() = default; |
131 | | - |
132 | | - virtual std::string submit_job(const ProblemData& problem) = 0; |
133 | | - virtual JobStatus check_status(const std::string& job_id) = 0; |
134 | | - virtual Solution get_result(const std::string& job_id) = 0; |
135 | | - virtual void stream_logs(const std::string& job_id, LogCallback callback) = 0; |
136 | | - // ... other methods |
137 | | -}; |
138 | | - |
139 | | -// Implementations: |
140 | | -// - CuOptGrpcClient (gRPC-based) |
141 | | -// - CuOptLegacyClient (current pipe/socket-based) |
142 | | -// - CuOptMockClient (for testing) |
143 | | -``` |
144 | | -
|
145 | | -On the server side: |
146 | | -
|
147 | | -```cpp |
148 | | -// Abstract remote server interface |
149 | | -class IRemoteServer { |
150 | | -public: |
151 | | - virtual ~IRemoteServer() = default; |
152 | | -
|
153 | | - virtual void start(int port, int num_workers) = 0; |
154 | | - virtual void stop() = 0; |
155 | | - virtual void wait() = 0; |
156 | | -}; |
157 | | -
|
158 | | -// Implementations: |
159 | | -// - CuOptGrpcServer (gRPC-based) |
160 | | -// - CuOptLegacyServer (current implementation) |
161 | | -``` |
162 | | - |
163 | | -## Worker Communication |
164 | | - |
165 | | -Two options for worker processes: |
166 | | - |
167 | | -### Option 1: Keep Pipes (Current) |
168 | | -- gRPC server receives requests over network |
169 | | -- Server process communicates with workers via pipes (current implementation) |
170 | | -- **Pros**: Minimal changes to worker code |
171 | | -- **Cons**: Still have custom pipe serialization internally |
172 | | - |
173 | | -### Option 2: Workers as gRPC Clients |
174 | | -- Workers listen on localhost ports (e.g., 8766, 8767, ...) |
175 | | -- Main process sends jobs to workers via gRPC |
176 | | -- **Pros**: Full gRPC stack, no custom serialization |
177 | | -- **Cons**: More refactoring, workers need to accept connections |
178 | | - |
179 | | -**Recommendation**: Start with Option 1 (keep pipes for workers), can migrate to Option 2 later. |
180 | | - |
181 | | -## Implementation Phases |
182 | | - |
183 | | -### Phase 1: Setup (Current) |
184 | | -- [x] Analyze current protocol |
185 | | -- [ ] Create grpc-implementation branch |
186 | | -- [ ] Add gRPC dependencies (grpc++, protobuf) |
187 | | -- [ ] Create gRPC service definition (.proto) |
188 | | - |
189 | | -### Phase 2: Server Implementation |
190 | | -- [ ] Generate gRPC code with protoc |
191 | | -- [ ] Implement CuOptGrpcServer class |
192 | | -- [ ] Implement all RPC methods |
193 | | -- [ ] Keep existing worker/pipe communication |
194 | | -- [ ] Add log streaming support |
195 | | - |
196 | | -### Phase 3: C++ Client |
197 | | -- [ ] Implement CuOptGrpcClient wrapper |
198 | | -- [ ] Add to cuopt_cli for testing |
199 | | -- [ ] Test all operations |
200 | | - |
201 | | -### Phase 4: Python Client |
202 | | -- [ ] Generate Python gRPC code |
203 | | -- [ ] Implement Python client wrapper |
204 | | -- [ ] Update test scripts |
205 | | -- [ ] Test async operations and log streaming |
206 | | - |
207 | | -### Phase 5: Testing & Performance |
208 | | -- [ ] Functional testing (all operations) |
209 | | -- [ ] Performance comparison vs pipe-based |
210 | | -- [ ] Load testing (multiple concurrent clients) |
211 | | -- [ ] Documentation |
212 | | - |
213 | | -## Performance Considerations |
214 | | - |
215 | | -1. **Message Size**: gRPC handles large messages well (better than raw TCP) |
216 | | -2. **Latency**: HTTP/2 multiplexing may add slight overhead, but negligible for solve times |
217 | | -3. **Throughput**: gRPC is highly optimized, should match or exceed current implementation |
218 | | -4. **Streaming**: Server-side streaming for logs is more efficient than polling |
219 | | - |
220 | | -## Migration Path |
221 | | - |
222 | | -1. **Dual Implementation**: Keep both gRPC and legacy implementations |
223 | | -2. **Environment Variable**: `CUOPT_REMOTE_PROTOCOL=grpc` or `legacy` |
224 | | -3. **Default**: Start with legacy as default, switch to gRPC after validation |
225 | | -4. **Deprecation**: Remove legacy after performance validation |
226 | | - |
227 | | -## Security Considerations |
228 | | - |
229 | | -1. **TLS/SSL**: gRPC has built-in TLS support (can enable later) |
230 | | -2. **Authentication**: Can add token-based auth via gRPC metadata |
231 | | -3. **Network Isolation**: Can bind to localhost only for local-only access |
232 | | -4. **Input Validation**: gRPC handles message validation automatically |
233 | | - |
234 | | -## Dependencies |
235 | | - |
236 | | -- **grpc++**: C++ gRPC library |
237 | | -- **protobuf**: Already have this |
238 | | -- **grpcio**: Python gRPC library (for Python clients) |
239 | | -- **grpcio-tools**: For Python code generation |
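
Note on the new "Data Flow (LP/MIP)" section: the seven steps added in this change can be sketched as a small in-memory simulation. This is only an illustration of the job lifecycle, not cuOpt code. `FakeRemoteService`, `JobStatus`, `solve_remotely`, and the `num_variables` field are hypothetical names. JSON stands in for the protobuf serializer, a direct method call stands in for the gRPC transport, and `run_pending` stands in for the worker process pool. Only the method names are taken from the diff, mirroring the RPCs listed under "Core RPCs".

```python
import json
import uuid
from enum import Enum


class JobStatus(Enum):
    QUEUED = "queued"
    COMPLETED = "completed"


class FakeRemoteService:
    """In-memory stand-in for the remote service (hypothetical, not the cuOpt server)."""

    def __init__(self):
        self._jobs = {}

    def submit_job(self, payload: bytes) -> str:
        # Step 3: the server receives serialized bytes and queues a job.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": JobStatus.QUEUED, "payload": payload, "result": None}
        return job_id

    def run_pending(self) -> None:
        # Steps 4-6: deserialize, "solve", serialize the solution.
        # The placeholder solve returns a zero vector; a real worker would run the solver.
        for job in self._jobs.values():
            if job["status"] is JobStatus.QUEUED:
                problem = json.loads(job["payload"])
                solution = {"objective": 0.0, "x": [0.0] * problem["num_variables"]}
                job["result"] = json.dumps(solution).encode()
                job["status"] = JobStatus.COMPLETED

    def check_status(self, job_id: str) -> JobStatus:
        return self._jobs[job_id]["status"]

    def get_result(self, job_id: str) -> bytes:
        job = self._jobs[job_id]
        if job["status"] is not JobStatus.COMPLETED:
            raise RuntimeError("result not ready")
        return job["result"]

    def delete_result(self, job_id: str) -> bool:
        # Returns False when the job id is unknown.
        return self._jobs.pop(job_id, None) is not None


def solve_remotely(service: FakeRemoteService, problem: dict) -> dict:
    # Steps 1-2: build the problem and serialize it (JSON here, protobuf in the real path).
    payload = json.dumps(problem).encode()
    job_id = service.submit_job(payload)
    service.run_pending()  # in a real deployment the workers run on their own
    if service.check_status(job_id) is not JobStatus.COMPLETED:
        raise RuntimeError("job did not complete")
    # Step 7: retrieve and deserialize the result, then free server memory.
    result = json.loads(service.get_result(job_id))
    service.delete_result(job_id)
    return result
```

A real client would poll `CheckStatus` (or use a streaming RPC) instead of calling `run_pending` itself; the sketch collapses that wait to keep the lifecycle visible in one place.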