Feature Request
Is your feature request related to a problem? Please describe:
Currently, there is no built-in version management mechanism for clients, leading to the following issues:
- Manual version verification is required before users use the client.
- Outdated client versions are still active in production, carrying potential bugs that affect system stability.
- Lack of visibility into client version distribution, making it difficult to drive version convergence.
- Inability to quickly locate IPs using a specific client version.
Describe the feature you'd like:
I propose enhancing the client query_cfg mechanism to automatically report client version information during the query process. This data should be collected and visualized by MetaProxy.
Specifically:
- Extend the `query_cfg_request` Thrift struct to include optional fields like `client_version`, `client_ip`, and `client_sdk`.
- MetaProxy should collect and monitor this data upon receiving the request.
- Use Prometheus + Grafana for aggregation and visualization, enabling filtering by cluster, region, table, client type, and version.
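As a rough illustration of the collection step, the sketch below counts incoming requests by table, SDK, and version, and maps missing fields to "unknown" so that clients built before the change are still counted. The `QueryCfgRequest` class, the `record_client_info` helper, and the "unknown" placeholder are hypothetical names chosen for this example; they are not part of MetaProxy.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryCfgRequest:
    """Hypothetical in-memory mirror of the extended query_cfg_request."""
    app_name: str
    client_version: Optional[str] = None  # unset for clients that predate the change
    client_ip: Optional[str] = None
    client_sdk: Optional[str] = None


# (table, client_sdk, client_version) -> number of query_cfg requests seen
requests_by_version: Counter = Counter()


def record_client_info(req: QueryCfgRequest) -> tuple:
    """Count one request, mapping missing optional fields to "unknown"."""
    key = (
        req.app_name,
        req.client_sdk or "unknown",
        req.client_version or "unknown",
    )
    requests_by_version[key] += 1
    return key


# A new client that reports its version.
record_client_info(QueryCfgRequest(
    app_name="test",
    client_version="2.3.8",
    client_ip="192.168.1.100",
    client_sdk="pegasus-java-client",
))
# An old client that sends none of the new fields.
record_client_info(QueryCfgRequest(app_name="test"))
```

Grouping old clients under a single placeholder value keeps them visible on the dashboard, which matters for tracking version convergence.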
Describe alternatives you've considered:
We considered the following alternatives:
- Active heartbeat mechanism: This would require introducing new RPC interfaces or communication paths, as well as additional client-side logic. It may also necessitate deploying new backend components for data collection, increasing system complexity and maintenance cost.
- Log-based collection: While feasible, MetaProxy typically runs in Kubernetes Pods. Persisting and collecting its logs would require additional log collection and storage infrastructure, which introduces unnecessary dependencies and operational overhead.
We chose to piggyback on the existing query_cfg requests for version reporting, which is more lightweight and backward compatible.
Teachability, Documentation, Adoption, Migration Strategy:
Interface Changes:
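We propose extending the existing Thrift struct query_cfg_request with optional fields. Because the new fields are optional, a peer built against the old definition skips them when reading, and a peer built against the new definition treats them as unset when they are absent, so the change stays backward compatible.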
```thrift
struct query_cfg_request {
    1: string app_name;
    2: list<i32> partition_indices;
    3: optional string client_version;
    4: optional string client_ip;
    5: optional string client_port;
    6: optional string client_sdk;
}
```

Client Version Maintenance and Retrieval
| Language | Version Maintenance Mechanism | Version Retrieval Method |
|---|---|---|
| Java | Defined in the `<version>` tag in `pom.xml` | Injected into code via Maven resource filtering |
| C++ | Defined via macro `PEGASUS_VERSION` in `version.h` | Read directly from file |
| Go | Version field in `go.mod` + Git commit hash | Parsed using `runtime/debug.ReadBuildInfo()` |
| Python | `__version__` variable in `__init__.py` | Retrieved using `importlib.metadata.version()` |
| Scala | `version` configured in `.sbt` | Generated into `BuildInfo.scala` during build |
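For the Python row, retrieval could look roughly like the sketch below. The helper name `get_client_version` and the "unknown" fallback are assumptions made for illustration, and the distribution name passed in is a placeholder for whatever name the Python client is actually published under.

```python
from importlib.metadata import PackageNotFoundError, version


def get_client_version(package_name: str, default: str = "unknown") -> str:
    """Return the installed version of package_name, or default if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # Running from a source checkout or a vendored copy: no package metadata exists.
        return default


# "pegasus-python-client" is a placeholder distribution name, not a confirmed one.
client_version = get_client_version("pegasus-python-client")
```

Falling back to a fixed string rather than raising keeps query_cfg working when the metadata lookup fails, since version reporting is auxiliary to the request itself.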
Metrics Tag Format
For metrics collection and monitoring, MetaProxy will record the following fields when it receives a query_cfg (topology query) request:
```json
{
  "client_version": "2.3.8",
  "client_ip": "192.168.1.100",
  "client_port": "1234",
  "client_sdk": "pegasus-java-client",
  "timestamp": 1769754743,
  "table_name": "test",
  "cluster_name": "aktst-function1",
  "region": "c3"
}
```
Dashboard Features:
We envision the following visualizations and features in the Grafana dashboard backed by Prometheus metrics:
- Overview:
- Unique client IP count over time.
- Client type distribution (Java/C++/Go).
- Client version distribution.
- Detail Query:
- Filter by region, cluster, table, or client type.
- Time-based aggregation.
- Export IP list of clients using a specific version.
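In the proposal these views would be Grafana panels querying Prometheus. The Python below only illustrates the intended semantics of two of them (version distribution by distinct IP, and IP export for one version) over records shaped like the metrics example. The function names and every sample record other than the first are hypothetical.

```python
from collections import defaultdict


def version_distribution(records):
    """Map client_version -> number of distinct client IPs reporting it."""
    ips_by_version = defaultdict(set)
    for r in records:
        ips_by_version[r["client_version"]].add(r["client_ip"])
    return {v: len(ips) for v, ips in ips_by_version.items()}


def ips_using_version(records, client_version, **filters):
    """Sorted distinct IPs on client_version, optionally filtered by other
    fields, e.g. region="c3" or table_name="test"."""
    return sorted({
        r["client_ip"]
        for r in records
        if r["client_version"] == client_version
        and all(r.get(k) == v for k, v in filters.items())
    })


records = [
    {"client_version": "2.3.8", "client_ip": "192.168.1.100", "region": "c3", "table_name": "test"},
    {"client_version": "2.3.8", "client_ip": "192.168.1.101", "region": "c4", "table_name": "test"},
    {"client_version": "2.4.0", "client_ip": "192.168.1.102", "region": "c3", "table_name": "test"},
    # Repeated request from the first client; it must not be counted twice.
    {"client_version": "2.3.8", "client_ip": "192.168.1.100", "region": "c3", "table_name": "test"},
]

distribution = version_distribution(records)
outdated_ips = ips_using_version(records, "2.3.8")
outdated_ips_c4 = ips_using_version(records, "2.3.8", region="c4")
```

Counting distinct IPs rather than raw requests keeps a chatty client from skewing the distribution.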