refactor(obkv-hbase): migrate to user-specified schema with STRUCT types #93

Open
yuanoOo wants to merge 4 commits into oceanbase:main from yuanoOo:obkv2

yuanoOo commented Jan 22, 2026

Summary

Remove the cumbersome JSON schema configuration parameter and adopt Spark's SchemaRelationProvider pattern, using STRUCT types to map column families.

BREAKING CHANGE: The 'schema' configuration parameter has been removed. Users must now define schemas using STRUCT types in DDL or DataFrame API.
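
As a rough illustration of the pattern (not the code in this PR): a source that implements Spark's SchemaRelationProvider receives the user-declared schema directly in createRelation, so there is no schema option left to parse. SketchSource and SketchRelation are hypothetical names used only for this sketch; the actual provider here is OBKVHBaseSparkSource, and its internals are not reproduced.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, SchemaRelationProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical relation: it only carries the schema and options it was given.
class SketchRelation(
    override val sqlContext: SQLContext,
    override val schema: StructType,
    val options: Map[String, String]
) extends BaseRelation

// Hypothetical provider standing in for the real data source class.
class SketchSource extends SchemaRelationProvider {
  // Spark calls this overload when the user supplies a schema,
  // either via DataFrameReader.schema(...) or a DDL column list.
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      schema: StructType): BaseRelation =
    new SketchRelation(sqlContext, schema, parameters)
}
```

On the user side this pairs with spark.read.format("obkv-hbase").schema(schema) or with the column list of a CREATE TEMPORARY VIEW statement.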

Changes:

  • Implement SchemaRelationProvider in OBKVHBaseSparkSource
  • Remove schema JSON parsing logic from HBaseRelation
  • Update flush() to handle STRUCT-based column family data
  • Remove SCHEMA config parameter from OBKVHbaseConfig
  • Update all test cases to use STRUCT approach
  • Add schema validation: the first field is the rowkey, and every other field must be a STRUCT (one per column family)
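
The validation rule in the last item could look roughly like the sketch below. This is an assumption-laden illustration, not the check added in this PR: SchemaCheck and validate are hypothetical names, and the rule that the rowkey itself must not be a STRUCT is my own assumption.

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper, not part of the connector.
object SchemaCheck {
  // Right(schema) when the layout is acceptable, Left(reason) otherwise.
  def validate(schema: StructType): Either[String, StructType] = {
    val fields = schema.fields
    if (fields.isEmpty) {
      Left("schema is empty: expected the rowkey as the first field")
    } else if (fields.head.dataType.isInstanceOf[StructType]) {
      // Assumption: the rowkey is a plain column, not a STRUCT.
      Left(s"first field '${fields.head.name}' is the rowkey and must not be a STRUCT")
    } else {
      // Every remaining field represents a column family and must be a STRUCT.
      fields.drop(1).find(f => !f.dataType.isInstanceOf[StructType]) match {
        case Some(bad) => Left(s"field '${bad.name}' must be a STRUCT (column family)")
        case None      => Right(schema)
      }
    }
  }
}
```

Returning Either keeps the sketch free of exceptions; a real connector might instead throw when the relation is created.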

Migration example:
Before:
  .option("schema", "{...JSON...}")

After (DataFrame API):
  val schema = StructType(Seq(
    StructField("id", StringType),
    StructField("cf1", StructType(Seq(
      StructField("col1", StringType)
    )))
  ))

After (Spark SQL):
  CREATE TEMPORARY VIEW t (
    id STRING,
    cf1 STRUCT<col1: STRING>
  ) USING `obkv-hbase` OPTIONS(...);
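
To make the STRUCT-to-column-family mapping concrete, here is a minimal sketch of how a row with the schema above could be flattened into HBase-style cells before a write. It is not the flush() implementation from this PR. Cell, RowToCells and toCells are hypothetical names, and the sketch assumes string-typed columns, that the schema already passed validation, and that null values are simply skipped.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical cell model: one value per (rowkey, family, qualifier).
final case class Cell(rowKey: String, family: String, qualifier: String, value: String)

object RowToCells {
  // Assumes field 0 is a string rowkey and every other field is a STRUCT of strings.
  def toCells(row: Row, schema: StructType): Seq[Cell] = {
    val rowKey = row.getString(0)
    schema.fields.toSeq.zipWithIndex.drop(1).flatMap { case (familyField, i) =>
      val familySchema = familyField.dataType.asInstanceOf[StructType]
      if (row.isNullAt(i)) {
        Seq.empty[Cell] // whole column family absent for this row
      } else {
        val familyRow = row.getStruct(i)
        familySchema.fields.toSeq.zipWithIndex.collect {
          // Assumption: null qualifiers are skipped rather than written.
          case (col, j) if !familyRow.isNullAt(j) =>
            Cell(rowKey, familyField.name, col.name, familyRow.getString(j))
        }
      }
    }
  }
}
```

With the example schema, a row built as Row("r1", Row("v1")) would map to a single cell for family cf1 and qualifier col1.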
