Commit 1cc9796

Remove legacy async serialization and update gRPC docs

Align remote execution with gRPC-only path, drop legacy async APIs, refresh architecture docs, and fix protobuf target aliasing in Python builds.

1 parent 35636de commit 1cc9796

File tree

12 files changed: +129 -1383 lines changed

GRPC_ARCHITECTURE.md

Lines changed: 55 additions & 212 deletions
````diff
@@ -2,238 +2,81 @@
 
 ## Overview
 
-This document describes the gRPC-based architecture for cuOpt remote solve, which replaces the custom Protobuf serialization with industry-standard gRPC.
-
-## Benefits of gRPC
-
-1. **Robust and Standard**: Uses HTTP/2, built-in error handling, flow control
-2. **Type-safe**: Generated code from .proto files using protoc compiler
-3. **Streaming Support**: Native server-side streaming for logs
-4. **Better Tooling**: Standard debugging tools, interceptors, middleware
-5. **Less Error-Prone**: No custom message framing or serialization code
+cuOpt remote solve uses gRPC for transport and protobuf-generated stubs for the service API.
+The request/response payloads are serialized with a protobuf-based serializer that maps
+cuOpt data structures to protobuf messages. This preserves existing semantics while
+moving the network layer to a standard, well-supported RPC stack.
 
 ## Service Definition
 
-### RPC Methods
-
-```protobuf
-service CuOptRemoteService {
-  // Submit a new LP/MIP solve job (async)
-  rpc SubmitJob(SubmitJobRequest) returns (SubmitJobResponse);
-
-  // Check job status
-  rpc CheckStatus(StatusRequest) returns (StatusResponse);
-
-  // Get completed result
-  rpc GetResult(GetResultRequest) returns (ResultResponse);
-
-  // Delete result from server memory
-  rpc DeleteResult(DeleteRequest) returns (DeleteResponse);
-
-  // Stream logs in real-time (server-side streaming)
-  rpc StreamLogs(StreamLogsRequest) returns (stream LogMessage);
-
-  // Cancel a queued or running job
-  rpc CancelJob(CancelRequest) returns (CancelResponse);
-
-  // Wait for result (blocking call, returns when job completes)
-  rpc WaitForResult(WaitRequest) returns (ResultResponse);
-
-  // Synchronous solve (blocking, returns result immediately)
-  rpc SolveSync(SolveSyncRequest) returns (SolveSyncResponse);
-}
-```
-
-### Key Improvements
+The gRPC service is defined in `cpp/src/linear_programming/utilities/cuopt_remote_service.proto`
+and imports the message schema in `cuopt_remote.proto`. Code is generated by `protoc`
+plus `grpc_cpp_plugin` during the build.
 
-1. **Log Streaming**: Replace polling with gRPC server-side streaming
-   - Client opens stream, server pushes log lines as they arrive
-   - More efficient, real-time, less network overhead
+Core RPCs include:
 
-2. **Type Safety**: Each RPC has specific request/response types
-   - No more generic `AsyncRequest` wrapper with `oneof`
-   - Better compile-time checking
+- `SubmitJob` / `UploadAndSubmit`
+- `CheckStatus`
+- `GetResult` / `StreamResult`
+- `StreamLogs`
+- `CancelJob`
+- `DeleteResult`
+- `GetIncumbents`
````
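For orientation, the RPC list added above could correspond to a service definition along these lines. This is an illustrative sketch only: the authoritative schema is `cpp/src/linear_programming/utilities/cuopt_remote_service.proto`, and the request/response message names for `UploadAndSubmit`, `StreamResult`, and `GetIncumbents` are assumptions, not taken from the source.

```protobuf
// Illustrative sketch; see cuopt_remote_service.proto for the real definition.
syntax = "proto3";

import "cuopt_remote.proto";

service CuOptRemoteService {
  rpc SubmitJob(SubmitJobRequest) returns (SubmitJobResponse);
  // Client-streaming upload for large problems (message names assumed)
  rpc UploadAndSubmit(stream UploadChunk) returns (SubmitJobResponse);
  rpc CheckStatus(StatusRequest) returns (StatusResponse);
  rpc GetResult(GetResultRequest) returns (ResultResponse);
  // Server-streaming variant for large results (message name assumed)
  rpc StreamResult(GetResultRequest) returns (stream ResultChunk);
  rpc StreamLogs(StreamLogsRequest) returns (stream LogMessage);
  rpc CancelJob(CancelRequest) returns (CancelResponse);
  rpc DeleteResult(DeleteRequest) returns (DeleteResponse);
  // Streams incumbent solutions during MIP solves (message names assumed)
  rpc GetIncumbents(GetIncumbentsRequest) returns (stream IncumbentMessage);
}
```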

````diff
 
-3. **Error Handling**: gRPC status codes instead of custom `ResponseStatus`
-   - Standard codes: OK, CANCELLED, NOT_FOUND, DEADLINE_EXCEEDED, etc.
+## Components
 
-4. **Streaming Cancellation**: Built-in support for cancelling streams
+### gRPC Server (`cuopt_grpc_server`)
 
-## Architecture Components
+- Source: `cpp/cuopt_grpc_server.cpp`
+- Implements `CuOptRemoteService` and owns the worker process pool.
+- Workers communicate with the main server process via shared memory + pipes.
+- For results, the server calls `to_host()` before serialization.
+- Supports streaming logs and incumbents through gRPC streaming endpoints.
 
-### 1. gRPC Server (`cuopt_grpc_server`)
+### gRPC Client Path (C++)
 
-- Listens on TCP port (e.g., 8765)
-- Implements `CuOptRemoteService` interface
-- Manages worker processes (same as current implementation)
-- Each worker still uses pipes/shared memory for IPC with main process
-- Handles concurrent gRPC requests from multiple clients
+- Client logic lives in `cpp/src/linear_programming/utilities/remote_solve_grpc.cpp`
+  and is used by `remote_solve.cu` and `cuopt_cli`.
+- The client serializes problems using the protobuf serializer, submits them
+  via gRPC, and deserializes results back into cuOpt solution objects.
 
-### 2. gRPC Client Wrapper (C++)
+### Serialization Layer
 
-```cpp
-class CuOptGrpcClient {
- public:
-  CuOptGrpcClient(const std::string& server_address);
+- Default serializer: `cpp/src/linear_programming/utilities/protobuf_serializer.cu`
+- Interface: `cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp`
+- Optional plugin override: `CUOPT_SERIALIZER_LIB` can load a custom serializer.
+- The serializer uses protobuf message types defined in `cuopt_remote.proto`.
 
-  // Async API
-  std::string submit_job(const SolveLPRequest& request);
-  JobStatus check_status(const std::string& job_id);
-  LPSolution get_result(const std::string& job_id);
-  void delete_result(const std::string& job_id);
-  void cancel_job(const std::string& job_id);
+## Data Flow (LP/MIP)
 
-  // Blocking API
-  LPSolution wait_for_result(const std::string& job_id);
-  LPSolution solve_sync(const SolveLPRequest& request);
+1. Client builds a problem (LP/MIP).
+2. Serializer converts the problem + settings into protobuf bytes.
+3. gRPC `SubmitJob` or `UploadAndSubmit` sends the bytes to the server.
+4. Server deserializes to cuOpt data structures.
+5. Server runs `solve_lp` / `solve_mip` in a worker process.
+6. Server calls `to_host()` and serializes the solution to protobuf bytes.
+7. Client retrieves results via `GetResult` / `StreamResult` and deserializes.
````
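The numbered flow added above reads naturally as serialize → submit → poll → fetch → deserialize. As a purely illustrative sketch of that shape (a stand-in stub and JSON in place of the real protobuf serializer and protoc-generated gRPC stubs; every name in this snippet is invented for illustration):

```python
# Illustrative sketch of the client-side job flow; NOT cuOpt's actual API.
# FakeStub stands in for a generated CuOptRemoteService stub, and JSON
# stands in for the protobuf serializer.
import json

class FakeStub:
    """Stand-in stub that 'solves' a job immediately on submission."""
    def __init__(self):
        self._results = {}

    def SubmitJob(self, payload: bytes) -> str:
        problem = json.loads(payload)          # server-side deserialization
        job_id = "job-1"
        self._results[job_id] = {"objective": sum(problem["costs"])}
        return job_id

    def CheckStatus(self, job_id: str) -> str:
        return "DONE" if job_id in self._results else "UNKNOWN"

    def GetResult(self, job_id: str) -> bytes:
        return json.dumps(self._results.pop(job_id)).encode()

# Client side: serialize -> submit -> poll -> fetch -> deserialize
stub = FakeStub()
request = json.dumps({"costs": [1, 2, 3]}).encode()   # step 2: serialize
job_id = stub.SubmitJob(request)                      # step 3: submit
assert stub.CheckStatus(job_id) == "DONE"             # poll status
solution = json.loads(stub.GetResult(job_id))         # step 7: fetch + parse
print(solution["objective"])                          # -> 6
```

The real client replaces `FakeStub` with the generated gRPC stub over an HTTP/2 channel; the control flow is the same.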

````diff
 
-  // Log streaming
-  void stream_logs(const std::string& job_id,
-                   std::function<void(const std::string&)> callback);
+## Generated Code (protoc output)
 
- private:
-  std::unique_ptr<CuOptRemoteService::Stub> stub_;
-};
-```
+Generated files are written to the CMake binary directory (not checked into source):
 
-### 3. Python gRPC Client Wrapper
+- `cuopt_remote.pb.cc/.h`
+- `cuopt_remote_service.pb.cc/.h`
+- `cuopt_remote_service.grpc.pb.cc/.h`
 
-```python
-class CuOptGrpcClient:
-    def __init__(self, server_address: str):
-        self.channel = grpc.insecure_channel(server_address)
-        self.stub = cuopt_remote_pb2_grpc.CuOptRemoteServiceStub(self.channel)
+## Build Integration
 
-    def submit_job(self, problem, settings) -> str:
-        """Submit job, returns job_id"""
+`cpp/CMakeLists.txt` drives code generation:
 
-    def check_status(self, job_id: str) -> JobStatus:
-        """Check job status"""
+- Locates `protoc` and `grpc_cpp_plugin`
+- Runs `protoc` to generate the `*.pb.cc/.h` sources
+- Adds generated sources to the `cuopt` library
+- Builds `cuopt_grpc_server` only when gRPC is available
````
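The build steps listed above might look roughly like the following in CMake. This is a sketch, not the actual `cpp/CMakeLists.txt`; the variable names, directory layout, and target wiring are illustrative assumptions.

```cmake
# Sketch of protoc + grpc_cpp_plugin code generation; paths are illustrative.
find_program(PROTOC_EXECUTABLE protoc REQUIRED)
find_program(GRPC_CPP_PLUGIN grpc_cpp_plugin REQUIRED)

set(PROTO_DIR ${CMAKE_SOURCE_DIR}/src/linear_programming/utilities)
set(GEN_DIR   ${CMAKE_BINARY_DIR}/generated)
file(MAKE_DIRECTORY ${GEN_DIR})

# Generate *.pb.cc/.h and *.grpc.pb.cc/.h into the binary directory,
# so generated sources never land in the source tree.
add_custom_command(
  OUTPUT ${GEN_DIR}/cuopt_remote_service.pb.cc
         ${GEN_DIR}/cuopt_remote_service.grpc.pb.cc
  COMMAND ${PROTOC_EXECUTABLE}
          -I ${PROTO_DIR}
          --cpp_out=${GEN_DIR}
          --grpc_out=${GEN_DIR}
          --plugin=protoc-gen-grpc=${GRPC_CPP_PLUGIN}
          ${PROTO_DIR}/cuopt_remote_service.proto
  DEPENDS ${PROTO_DIR}/cuopt_remote_service.proto)

# Generated sources are compiled into the cuopt library; the
# cuopt_grpc_server target would be added only when gRPC was found.
target_sources(cuopt PRIVATE
  ${GEN_DIR}/cuopt_remote_service.pb.cc
  ${GEN_DIR}/cuopt_remote_service.grpc.pb.cc)
```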

````diff
 
-    def get_result(self, job_id: str):
-        """Get completed result"""
+## Security Notes
 
-    def stream_logs(self, job_id: str):
-        """Generator that yields log lines as they arrive"""
-        for log_msg in self.stub.StreamLogs(request):
-            yield log_msg.line
+- Service stubs and message parsing are generated by `protoc` and `grpc_cpp_plugin`.
+- Payload serialization uses protobuf message APIs rather than hand-written parsing.
+- gRPC provides HTTP/2 framing, flow control, and standard status codes.
 ```
-
-### 4. Pluggable Architecture
-
-Instead of pluggable serialization, we have pluggable client/server implementations:
-
-```cpp
-// Abstract remote client interface
-class IRemoteClient {
- public:
-  virtual ~IRemoteClient() = default;
-
-  virtual std::string submit_job(const ProblemData& problem) = 0;
-  virtual JobStatus check_status(const std::string& job_id) = 0;
-  virtual Solution get_result(const std::string& job_id) = 0;
-  virtual void stream_logs(const std::string& job_id, LogCallback callback) = 0;
-  // ... other methods
-};
-
-// Implementations:
-// - CuOptGrpcClient (gRPC-based)
-// - CuOptLegacyClient (current pipe/socket-based)
-// - CuOptMockClient (for testing)
-```
-
-On the server side:
-
-```cpp
-// Abstract remote server interface
-class IRemoteServer {
- public:
-  virtual ~IRemoteServer() = default;
-
-  virtual void start(int port, int num_workers) = 0;
-  virtual void stop() = 0;
-  virtual void wait() = 0;
-};
-
-// Implementations:
-// - CuOptGrpcServer (gRPC-based)
-// - CuOptLegacyServer (current implementation)
-```
-
-## Worker Communication
-
-Two options for worker processes:
-
-### Option 1: Keep Pipes (Current)
-- gRPC server receives requests over network
-- Server process communicates with workers via pipes (current implementation)
-- **Pros**: Minimal changes to worker code
-- **Cons**: Still have custom pipe serialization internally
-
-### Option 2: Workers as gRPC Clients
-- Workers listen on localhost ports (e.g., 8766, 8767, ...)
-- Main process sends jobs to workers via gRPC
-- **Pros**: Full gRPC stack, no custom serialization
-- **Cons**: More refactoring, workers need to accept connections
-
-**Recommendation**: Start with Option 1 (keep pipes for workers), can migrate to Option 2 later.
-
-## Implementation Phases
-
-### Phase 1: Setup (Current)
-- [x] Analyze current protocol
-- [ ] Create grpc-implementation branch
-- [ ] Add gRPC dependencies (grpc++, protobuf)
-- [ ] Create gRPC service definition (.proto)
-
-### Phase 2: Server Implementation
-- [ ] Generate gRPC code with protoc
-- [ ] Implement CuOptGrpcServer class
-- [ ] Implement all RPC methods
-- [ ] Keep existing worker/pipe communication
-- [ ] Add log streaming support
-
-### Phase 3: C++ Client
-- [ ] Implement CuOptGrpcClient wrapper
-- [ ] Add to cuopt_cli for testing
-- [ ] Test all operations
-
-### Phase 4: Python Client
-- [ ] Generate Python gRPC code
-- [ ] Implement Python client wrapper
-- [ ] Update test scripts
-- [ ] Test async operations and log streaming
-
-### Phase 5: Testing & Performance
-- [ ] Functional testing (all operations)
-- [ ] Performance comparison vs pipe-based
-- [ ] Load testing (multiple concurrent clients)
-- [ ] Documentation
-
-## Performance Considerations
-
-1. **Message Size**: gRPC handles large messages well (better than raw TCP)
-2. **Latency**: HTTP/2 multiplexing may add slight overhead, but negligible for solve times
-3. **Throughput**: gRPC is highly optimized, should match or exceed current implementation
-4. **Streaming**: Server-side streaming for logs is more efficient than polling
-
-## Migration Path
-
-1. **Dual Implementation**: Keep both gRPC and legacy implementations
-2. **Environment Variable**: `CUOPT_REMOTE_PROTOCOL=grpc` or `legacy`
-3. **Default**: Start with legacy as default, switch to gRPC after validation
-4. **Deprecation**: Remove legacy after performance validation
-
-## Security Considerations
-
-1. **TLS/SSL**: gRPC has built-in TLS support (can enable later)
-2. **Authentication**: Can add token-based auth via gRPC metadata
-3. **Network Isolation**: Can bind to localhost only for local-only access
-4. **Input Validation**: gRPC handles message validation automatically
-
-## Dependencies
-
-- **grpc++**: C++ gRPC library
-- **protobuf**: Already have this
-- **grpcio**: Python gRPC library (for Python clients)
-- **grpcio-tools**: For Python code generation
````