# RFC00001: Snowflake Feature Pull Connector for BharatMLStack #162
addloopy started this conversation in Draft RFC Discussion
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
## Metadata
## 1. Summary
This proposal introduces native support for Snowflake as an offline feature store and describes an end-to-end pipeline that ports those features into BharatMLStack's online feature store so they can be served for real-time inference. The connector retrieves features from Snowflake using the `snowflake.ml.feature_store` API, applies the transformation/deduplication logic in `data_helpers.py`, then relies on `spark_feature_push_client` to serialize the resulting Spark DataFrame into Protobuf and push it, via Kafka or another configured sink, into the online store. The aim is to provide a single, frictionless path from Snowflake to online serving with zero intermediate exports.

## 2. Motivation
- The `snowflake.ml.feature_store` SDK abstracts low-level SQL and improves developer productivity.
- Without this connector, users have to export data to object storage before ingesting it, which introduces latency and operational overhead.
## 3. Goals
- Introduce `SNOWFLAKE_TABLE` as a new `source_type` value consumed by `read_from_source()`.
- Read data through the `snowflake.ml.feature_store.FeatureStore` abstraction, not raw JDBC.
- Support partition filtering on a designated timestamp column (`ingestion_timestamp`).
- Push features to the online store through `spark_feature_push_client`, enabling immediate consumption at inference time.

## 4. Non-Goals
## 5. Detailed Design
### 5.1 Data Pull Flow
1. A `snowpark.Session` is established using env vars or a passed-in config map.
2. The connector opens a feature-store handle via `FeatureStore(session)`.
3. Features are fetched with `feature_store.read()` and converted to a Spark DataFrame via `.to_spark()`.
4. Columns are renamed to their online-store names according to `feature_mapping`.
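A minimal sketch of this flow, assuming `feature_mapping` maps `{source_column: online_feature_name}`. The `read().to_spark()` chain and the bare `FeatureStore(session)` constructor follow this RFC's description rather than a verified `snowflake.ml.feature_store` signature:

```python
import os

from snowflake.ml.feature_store import FeatureStore
from snowflake.snowpark import Session


def pull_snowflake_features(feature_mapping, config=None):
    # 1. Establish a Snowpark session from env vars or an explicit config map.
    connection_params = config or {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        "warehouse": os.environ["SNOWFLAKE_WAREHOUSE"],
        "database": os.environ["SNOWFLAKE_DATABASE"],
        "schema": os.environ["SNOWFLAKE_SCHEMA"],
    }
    session = Session.builder.configs(connection_params).create()

    # 2. Open the feature-store handle (constructor args as described in this RFC).
    feature_store = FeatureStore(session)

    # 3. Fetch features and convert to a Spark DataFrame.
    spark_df = feature_store.read().to_spark()

    # 4. Rename source columns to the names the online store expects.
    for src_col, online_name in feature_mapping.items():
        spark_df = spark_df.withColumnRenamed(src_col, online_name)
    return spark_df
```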
### 5.2 Data Push Flow

1. Apply `fill_na_features_with_default_values` to replace missing values based on feature metadata.
2. Use `OnlineFeatureStorePyClient.generate_df_with_protobuf_messages` (from `spark_feature_push_client`) to convert rows into Protobuf messages.
3. Call `write_protobuf_df_to_kafka`, optionally in batches, to stream messages to the online feature-store ingress topic.
4. If `features_write_to_cloud_storage` is `true`, persist the transformed DataFrame to the configured cloud path for replay or debugging.
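A corresponding sketch of the push side. The helper names come from this RFC; their module paths and exact signatures are assumptions for illustration:

```python
# Illustrative only: the import paths and call signatures of the
# BharatMLStack helpers below are assumed; the step order follows 5.2.
from data_helpers import fill_na_features_with_default_values
from spark_feature_push_client import OnlineFeatureStorePyClient


def push_features(spark_df, feature_metadata, kafka_config, cloud_path=None):
    # Step 1: replace nulls with per-feature defaults from metadata.
    spark_df = fill_na_features_with_default_values(spark_df, feature_metadata)

    # Step 2: serialize each row into a Protobuf message.
    client = OnlineFeatureStorePyClient(kafka_config)
    proto_df = client.generate_df_with_protobuf_messages(spark_df)

    # Step 3: stream the messages to the online feature-store ingress topic.
    client.write_protobuf_df_to_kafka(proto_df)

    # Step 4: optional persistence for replay/debugging
    # (gated by features_write_to_cloud_storage in the real config).
    if cloud_path is not None:
        spark_df.write.mode("overwrite").parquet(cloud_path)
```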
### 5.3 API Touch-Points

| Touch-point | Change |
| --- | --- |
| `read_from_source()` | Add an `elif source_type == "SNOWFLAKE_TABLE"` branch to perform the steps in 5.1. |
| `src_type_to_partition_col_map` | Add the entry `"SNOWFLAKE_TABLE": "ingestion_timestamp"`. |
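A hypothetical sketch of the two touch-points, reusing the `pull_snowflake_features` helper from the 5.1 sketch; the shape of the existing dispatcher is assumed:

```python
# Only the new map entry and branch are proposed by this RFC;
# the existing entries and branches are elided.
src_type_to_partition_col_map = {
    # ... existing source types ...
    "SNOWFLAKE_TABLE": "ingestion_timestamp",
}


def read_from_source(source_type, feature_mapping, config=None, **kwargs):
    # ... existing branches ...
    if source_type == "SNOWFLAKE_TABLE":
        # Steps from 5.1: session -> FeatureStore -> read -> to_spark -> rename.
        return pull_snowflake_features(feature_mapping, config)
    raise ValueError(f"Unsupported source_type: {source_type}")
```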
### 5.4 Configuration

| Variable | Example |
| --- | --- |
| `SNOWFLAKE_ACCOUNT` | `xy12345.us-east-1` |
| `SNOWFLAKE_USER` | `ml_readonly` |
| `SNOWFLAKE_PASSWORD` | `****` |
| `SNOWFLAKE_WAREHOUSE` | `ML_WH` |
| `SNOWFLAKE_DATABASE` | `FEATURE_DB` |
| `SNOWFLAKE_SCHEMA` | `PUBLIC` |

Credentials are loaded by helper utilities and injected into the Snowpark session. Users may override them through function arguments for multi-account scenarios.
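For example, a caller targeting a second account could bypass the environment entirely by passing an explicit config map to the 5.1 sketch helper (values below are placeholders, not real credentials):

```python
# Placeholder values for illustration; real credentials should come
# from a secret manager, not source code.
df = pull_snowflake_features(
    feature_mapping={"RAW_SCORE": "score"},
    config={
        "account": "ab67890.eu-west-1",
        "user": "ml_readonly",
        "password": "****",
        "warehouse": "ML_WH",
        "database": "FEATURE_DB",
        "schema": "PUBLIC",
    },
)
```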
### 5.5 Error Handling & Observability
Failures are surfaced with structured error codes (XYZ-0001, XYZ-0002). The connector emits the following metrics:

- `snowflake.connector.latency_ms`
- `snowflake.connector.rows_read`
- `snowflake.connector.bytes_scanned`
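As an illustration only (this RFC does not name a telemetry client), the first two metrics could be emitted around the read call like so, here using the `statsd` package as a stand-in:

```python
import time

from statsd import StatsClient  # assumed backend; swap for the stack's telemetry client

statsd = StatsClient(prefix="snowflake.connector")


def instrumented_read(feature_store):
    start = time.monotonic()
    spark_df = feature_store.read().to_spark()
    statsd.timing("latency_ms", (time.monotonic() - start) * 1000)
    statsd.gauge("rows_read", spark_df.count())
    # bytes_scanned would come from Snowflake's query history, not the DataFrame.
    return spark_df
```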
## 6. Test Plan

- Unit tests mock `snowflake.ml.feature_store` to simulate data fetches.
- Integration tests run against the `TPCH_SF1` sample to validate partition filtering, column renaming, null handling, Protobuf generation, and successful push to the online feature store.
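A minimal unit-test sketch of the mocking approach. The `connector` module path, the pytest `spark` fixture, and `pull_snowflake_features` are assumptions carried over from the 5.1 sketch:

```python
from unittest import mock

from connector import pull_snowflake_features


def test_pull_renames_columns(spark):
    fake_df = spark.createDataFrame([(1, 0.5)], ["entity_id", "RAW_SCORE"])
    # Patch the session and feature store so the test needs no network access.
    with mock.patch("connector.Session"), mock.patch("connector.FeatureStore") as fs_cls:
        fs_cls.return_value.read.return_value.to_spark.return_value = fake_df
        out = pull_snowflake_features({"RAW_SCORE": "score"}, config={"account": "test"})
        assert "score" in out.columns
```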
## 7. Rollout Strategy

## 8. Alternatives Considered
## 9. Risks & Mitigations
## 10. Success Metrics
## 11. Future Work
## 12. References

- RFC PR
