
[VL][Iceberg] Fix input_file_name() on Iceberg, JNI init stability, and metadata propagation #11615

Open
ReemaAlzaid wants to merge 1 commit into apache:main from ReemaAlzaid:iceberg-input-file

Conversation

@ReemaAlzaid
Contributor

@ReemaAlzaid ReemaAlzaid commented Feb 14, 2026

What changes are proposed in this pull request?

This PR fixes the behavior of input_file_name(), input_file_block_start(), and input_file_block_length() on Iceberg with the Velox backend, and addresses a related JNI crash path.
Issue: #11513

Main fixes

  • JNI stability fix (Velox native path)

    • cpp/core/jni/JniCommon.h
    • Pre-resolve reservation listener JNI methods in SparkAllocationListener constructor.
    • Add null-class guard in getMethodIdOrError to avoid unsafe JNI method lookup failures.
  • Iceberg split metadata propagation fix

    • cpp/velox/compute/WholeStageResultIterator.cc
    • Pass the metadata map to HiveIcebergSplit (instead of an empty map), so per-file metadata is preserved.
  • Iceberg metadata wiring

    • gluten-iceberg/src/main/scala/org/apache/iceberg/spark/source/GlutenIcebergSourceUtil.scala
    • gluten-iceberg/src/main/java/org/apache/gluten/substrait/rel/IcebergLocalFilesBuilder.java
    • gluten-iceberg/src/main/java/org/apache/gluten/substrait/rel/IcebergLocalFilesNode.java
    • Generate and propagate metadata columns for:
      • input_file_name
      • input_file_block_start
      • input_file_block_length
  • Scan transformer + rule generalization

    • gluten-substrait/src/main/scala/org/apache/gluten/execution/BatchScanExecTransformer.scala
    • gluten-iceberg/src/main/scala/org/apache/gluten/execution/IcebergScanTransformer.scala
    • gluten-kafka/src/main/scala/org/apache/gluten/execution/MicroBatchScanExecTransformer.scala
    • gluten-paimon/src-paimon/main/scala/org/apache/gluten/execution/PaimonScanTransformer.scala
    • Add/implement withOutput(...) across the BatchScanExecTransformerBase family.
    • Update PushDownInputFileExpression to work against BatchScanExecTransformerBase and deduplicate rewriteExpr.

How was this patch tested?

  • Build/compile validation:
    • mvn -o -Pgluten-substrait,gluten-iceberg -DskipTests compile
    • mvn -Pbackends-velox \
         -Dspark=3.5 \
         -DskipTests -Dmaven.test.skip=true \
         -Dscala.recompileMode=incremental \
         -Dmaven.compiler.skip=false \
         -pl 'gluten-iceberg,package' \
         -am clean package
  • Repro scenario:
    • Ran the input_file_name(), input_file_block_start(), and input_file_block_length() reproduction against an Iceberg table with the Gluten Velox bundle.
    • Verified the crash path moved from a native segfault to stable execution.
    • Verified the metadata propagation fix removes the empty/null file metadata behavior.

@github-actions github-actions bot added CORE works for Gluten Core VELOX DATA_LAKE labels Feb 14, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@ReemaAlzaid
Contributor Author

Here are the relevant test cases I ran:

Before

(3.13.3) ➜  incubator-gluten git:(main) ✗ export GLUTEN_JAR=/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar
export ICEBERG_JAR=/tmp/iceberg.jar

spark-submit \
  --jars "$GLUTEN_JAR,$ICEBERG_JAR" \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.sql.columnar.backend.lib=velox \
  --conf spark.gluten.enabled=true \
  --conf spark.driver.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
  --conf spark.executor.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
  test_iceberg_simple.py
26/02/15 15:15:56 WARN Utils: Your hostname, Reemas-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.100.32 instead (on interface en0)
26/02/15 15:15:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
26/02/15 15:15:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/02/15 15:15:57 INFO SparkContext: Running Spark version 3.5.5
26/02/15 15:15:57 INFO SparkContext: OS info Mac OS X, 15.6, aarch64
26/02/15 15:15:57 INFO SparkContext: Java version 17.0.18
26/02/15 15:15:57 INFO ResourceUtils: ==============================================================
26/02/15 15:15:57 INFO ResourceUtils: No custom resources configured for spark.driver.
26/02/15 15:15:57 INFO ResourceUtils: ==============================================================
26/02/15 15:15:57 INFO SparkContext: Submitted application: iceberg-input-file-metadata-test
26/02/15 15:15:57 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 2048, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
26/02/15 15:15:57 INFO ResourceProfile: Limiting resource is cpu
26/02/15 15:15:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
26/02/15 15:15:57 INFO SecurityManager: Changing view acls to: reema
26/02/15 15:15:57 INFO SecurityManager: Changing modify acls to: reema
26/02/15 15:15:57 INFO SecurityManager: Changing view acls groups to: 
26/02/15 15:15:57 INFO SecurityManager: Changing modify acls groups to: 
26/02/15 15:15:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: reema; groups with view permissions: EMPTY; users with modify permissions: reema; groups with modify permissions: EMPTY
26/02/15 15:15:57 INFO Utils: Successfully started service 'sparkDriver' on port 49661.
26/02/15 15:15:57 INFO SparkEnv: Registering MapOutputTracker
26/02/15 15:15:57 INFO SparkEnv: Registering BlockManagerMaster
26/02/15 15:15:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
26/02/15 15:15:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
26/02/15 15:15:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
26/02/15 15:15:57 INFO DiskBlockManager: Created local directory at /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/blockmgr-e6925b7a-9b9b-43a1-8861-1452ad6dda87
26/02/15 15:15:57 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:15:57 INFO SparkEnv: Registering OutputCommitCoordinator
26/02/15 15:15:57 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
26/02/15 15:15:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
26/02/15 15:15:57 INFO SparkContext: Added JAR file:///Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar at spark://192.168.100.32:49661/jars/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar with timestamp 1771157757097
26/02/15 15:15:57 INFO SparkContext: Added JAR file:///private/tmp/iceberg.jar at spark://192.168.100.32:49661/jars/iceberg.jar with timestamp 1771157757097
26/02/15 15:15:57 INFO Discovery: Start discovering components in the current classpath... 
26/02/15 15:15:57 INFO Discovery: Discovered component files: org.apache.gluten.backendsapi.velox.VeloxBackend, org.apache.gluten.component.VeloxIcebergComponent. Duration: 8 ms.
26/02/15 15:15:57 INFO package: Components registered within order: velox, velox-iceberg
26/02/15 15:15:57 INFO GlutenDriverPlugin: Gluten components:
==============================================================
Component velox
  velox_branch = HEAD
  velox_revision = f247a8e922c4802fd9b9cf7a626421bff9b803fd
  velox_revisionTime = 2026-02-07 14:11:45 +0000
Component velox-iceberg
==============================================================
26/02/15 15:15:57 INFO SubstraitBackend: Gluten build info:
==============================================================
Gluten Version: 1.7.0-SNAPSHOT
GCC Version: 
Java Version: 17
Scala Version: 2.12.15
Spark Version: 3.5.5
Hadoop Version: 2.7.4
Gluten Branch: main
Gluten Revision: be3eeea8c33ddfb5352a37ad7d169e326c4dc1ba
Gluten Revision Time: 2026-02-13 22:47:03 +0000
Gluten Build Time: 2026-02-15T12:07:38Z
Gluten Repo URL: https://github.com/ReemaAlzaid/incubator-gluten.git
==============================================================
26/02/15 15:15:57 INFO VeloxListenerApi: Memory overhead is not set. Setting it to 644245094 automatically. Gluten doesn't follow Spark's calculation on default value of this option because the actual required memory overhead will depend on off-heap usage than on on-heap usage.
26/02/15 15:15:57 INFO SparkDirectoryUtil: Created local directory at /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b
26/02/15 15:15:57 INFO JniWorkspace: Creating JNI workspace in root directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7
26/02/15 15:15:57 INFO JniWorkspace: JNI workspace /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281 created in root directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7
26/02/15 15:15:57 INFO JniLibLoader: Read real path /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib for libPath /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib
26/02/15 15:15:57 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib has been loaded using path-loading method
26/02/15 15:15:57 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libgluten.dylib has been loaded
26/02/15 15:15:57 INFO JniLibLoader: Successfully loaded library darwin/aarch64/libgluten.dylib
26/02/15 15:15:57 INFO JniLibLoader: Read real path /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib for libPath /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib
26/02/15 15:15:57 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib has been loaded using path-loading method
26/02/15 15:15:57 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-27f14ba0-aa7c-45b8-8c18-2bf2e896d45b/jni/182e3935-f5c1-4a64-86fb-377b2af85cd7/gluten-13074889086958015281/darwin/aarch64/libvelox.dylib has been loaded
26/02/15 15:15:57 INFO JniLibLoader: Successfully loaded library darwin/aarch64/libvelox.dylib
W20260215 15:15:57.885989 14490670 MemoryArbitrator.cpp:84] Query memory capacity[460.50MB] is set for NOOP arbitrator which has no capacity enforcement
26/02/15 15:15:57 INFO DriverPluginContainer: Initialized driver component for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:15:57 INFO Executor: Starting executor ID driver on host 192.168.100.32
26/02/15 15:15:57 INFO Executor: OS info Mac OS X, 15.6, aarch64
26/02/15 15:15:57 INFO Executor: Java version 17.0.18
26/02/15 15:15:57 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): 'file:/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar,file:/tmp/iceberg.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-SNAPSHOT.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/iceberg.jar'
26/02/15 15:15:57 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@3d5e1c01 for default.
26/02/15 15:15:57 INFO CodedInputStreamClassInitializer: The defaultRecursionLimit in protobuf has been increased to 100000
26/02/15 15:15:57 INFO VeloxListenerApi: Gluten is running with Spark local mode. Skip running static initializer for executor.
26/02/15 15:15:57 INFO ExecutorPluginContainer: Initialized executor component for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:15:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49662.
26/02/15 15:15:57 INFO NettyBlockTransferService: Server created on 192.168.100.32:49662
26/02/15 15:15:57 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
26/02/15 15:15:57 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.100.32:49662 with 2.4 GiB RAM, BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:57 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:57 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.100.32, 49662, None)
26/02/15 15:15:58 INFO VeloxBackend: Gluten SQL Tab has been attached.
26/02/15 15:15:58 INFO SparkShimLoader: Loading Spark Shims for version: 3.5.5
26/02/15 15:15:58 INFO SparkShimLoader: Using Shim provider: List(org.apache.gluten.sql.shims.spark35.SparkShimProvider@4339652b)
================================================================================
Creating Iceberg table...
================================================================================
26/02/15 15:15:58 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
26/02/15 15:15:58 INFO SharedState: Warehouse path is 'file:/Users/reema/Desktop/OpenSource/incubator-gluten/spark-warehouse'.
26/02/15 15:15:58 INFO CatalogUtil: Loading custom FileIO implementation: org.apache.iceberg.hadoop.HadoopFileIO
26/02/15 15:15:59 INFO BaseMetastoreCatalog: Table properties set at catalog level through catalog properties: {}
26/02/15 15:15:59 INFO BaseMetastoreCatalog: Table properties enforced at catalog level through catalog properties: {}
26/02/15 15:15:59 INFO HadoopTableOperations: Committed a new metadata file file:/tmp/iceberg_warehouse/default/test_table/metadata/v1.metadata.json
26/02/15 15:15:59 WARN GlutenFallbackReporter: Validation failed for plan: AppendData[QueryId=1], due to: [FallbackByBackendSettings] Validation failed on node AppendData
26/02/15 15:15:59 INFO CodeGenerator: Code generated in 120.717458 ms
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.32:49662 (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:15:59 INFO SparkContext: Created broadcast 0 from broadcast at SparkWrite.java:195
26/02/15 15:15:59 INFO AppendDataExec: Start processing data source write support: IcebergBatchWrite(table=local.default.test_table, format=PARQUET). The input RDD has 3 partitions.
26/02/15 15:15:59 INFO SparkContext: Starting job: sql at NativeMethodAccessorImpl.java:0
26/02/15 15:15:59 INFO DAGScheduler: Got job 0 (sql at NativeMethodAccessorImpl.java:0) with 3 output partitions
26/02/15 15:15:59 INFO DAGScheduler: Final stage: ResultStage 0 (sql at NativeMethodAccessorImpl.java:0)
26/02/15 15:15:59 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:15:59 INFO DAGScheduler: Missing parents: List()
26/02/15 15:15:59 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0), which has no missing parents
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.8 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.4 KiB, free 2.4 GiB)
26/02/15 15:15:59 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.32:49662 (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:15:59 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1585
26/02/15 15:15:59 INFO DAGScheduler: Submitting 3 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1, 2))
26/02/15 15:15:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks resource profile 0
26/02/15 15:15:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 9503 bytes) 
26/02/15 15:15:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (192.168.100.32, executor driver, partition 1, PROCESS_LOCAL, 9503 bytes) 
26/02/15 15:15:59 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (192.168.100.32, executor driver, partition 2, PROCESS_LOCAL, 9503 bytes) 
26/02/15 15:15:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
26/02/15 15:15:59 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
26/02/15 15:15:59 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
26/02/15 15:16:00 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:00 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:00 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:00 INFO DataWritingSparkTask: Writer for partition 0 is committing.
26/02/15 15:16:00 INFO DataWritingSparkTask: Writer for partition 2 is committing.
26/02/15 15:16:00 INFO DataWritingSparkTask: Writer for partition 1 is committing.
26/02/15 15:16:00 INFO DataWritingSparkTask: Committed partition 1 (task 1, attempt 0, stage 0.0)
26/02/15 15:16:00 INFO DataWritingSparkTask: Committed partition 0 (task 0, attempt 0, stage 0.0)
26/02/15 15:16:00 INFO DataWritingSparkTask: Committed partition 2 (task 2, attempt 0, stage 0.0)
26/02/15 15:16:00 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 4118 bytes result sent to driver
26/02/15 15:16:00 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 4114 bytes result sent to driver
26/02/15 15:16:00 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 4110 bytes result sent to driver
26/02/15 15:16:00 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 405 ms on 192.168.100.32 (executor driver) (1/3)
26/02/15 15:16:00 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 429 ms on 192.168.100.32 (executor driver) (2/3)
26/02/15 15:16:00 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 406 ms on 192.168.100.32 (executor driver) (3/3)
26/02/15 15:16:00 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
26/02/15 15:16:00 INFO DAGScheduler: ResultStage 0 (sql at NativeMethodAccessorImpl.java:0) finished in 0.476 s
26/02/15 15:16:00 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:00 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
26/02/15 15:16:00 INFO DAGScheduler: Job 0 finished: sql at NativeMethodAccessorImpl.java:0, took 0.513478 s
26/02/15 15:16:00 INFO AppendDataExec: Data source write support IcebergBatchWrite(table=local.default.test_table, format=PARQUET) is committing.
26/02/15 15:16:00 INFO SparkWrite: Committing append with 3 new data files to table local.default.test_table
26/02/15 15:16:00 INFO HadoopTableOperations: Committed a new metadata file file:/tmp/iceberg_warehouse/default/test_table/metadata/v2.metadata.json
26/02/15 15:16:00 INFO SnapshotProducer: Committed snapshot 8759041077900200141 (MergeAppend)
26/02/15 15:16:00 INFO LoggingMetricsReporter: Received metrics report: CommitReport{tableName=local.default.test_table, snapshotId=8759041077900200141, sequenceNumber=1, operation=append, commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.302477167S, count=1}, attempts=CounterResult{unit=COUNT, value=1}, addedDataFiles=CounterResult{unit=COUNT, value=3}, removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=3}, addedDeleteFiles=null, addedEqualityDeleteFiles=null, addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null, removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null, removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0}, addedRecords=CounterResult{unit=COUNT, value=3}, removedRecords=null, totalRecords=CounterResult{unit=COUNT, value=3}, addedFilesSizeInBytes=CounterResult{unit=BYTES, value=1920}, removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES, value=1920}, addedPositionalDeletes=null, removedPositionalDeletes=null, totalPositionalDeletes=CounterResult{unit=COUNT, value=0}, addedEqualityDeletes=null, removedEqualityDeletes=null, totalEqualityDeletes=CounterResult{unit=COUNT, value=0}, manifestsCreated=null, manifestsReplaced=null, manifestsKept=null, manifestEntriesProcessed=null}, metadata={engine-version=3.5.5, app-id=local-1771157757901, engine-name=spark, iceberg-version=Apache Iceberg 1.10.0 (commit 2114bf631e49af532d66e2ce148ee49dd1dd1f1f)}}
26/02/15 15:16:00 INFO SparkWrite: Committed in 322 ms
26/02/15 15:16:00 INFO AppendDataExec: Data source write support IcebergBatchWrite(table=local.default.test_table, format=PARQUET) committed.

================================================================================
Testing input_file_name() on Iceberg table
================================================================================

=== input_file_name() Results ===
26/02/15 15:16:00 INFO V2ScanRelationPushDown: 
Output: id#7, name#8
         
26/02/15 15:16:00 INFO SnapshotScan: Scanning table local.default.test_table snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with filter true
26/02/15 15:16:00 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:00 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:00 INFO SparkContext: Created broadcast 2 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:00 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:00 INFO SparkContext: Created broadcast 3 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:00 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=2], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 8.318542 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 4 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:01 INFO DAGScheduler: Got job 1 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56) with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 1 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56), which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.7 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 7.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.100.32:49662 (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 3) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11471 bytes) 
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 1.0 (TID 3)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.898333 ms
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 1.0 (TID 3). 7086 bytes result sent to driver
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 3) in 59 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 1 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56) finished in 0.064 s
26/02/15 15:16:01 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 1 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56, took 0.067347 s
ID: 1, Name: Alice, File: ''
ID: 2, Name: Bob, File: ''
ID: 3, Name: Charlie, File: ''

❌ BUG: 3/3 rows have EMPTY file paths!

================================================================================
Testing input_file_block_start() on Iceberg table
================================================================================

=== input_file_block_start() Results ===
26/02/15 15:16:01 INFO V2ScanRelationPushDown: 
Output: id#23, name#24
         
26/02/15 15:16:01 INFO SnapshotScan: Scanning table local.default.test_table snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with filter true
26/02/15 15:16:01 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:01 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 6 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 7 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=3], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.856583 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 8 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:01 INFO DAGScheduler: Got job 2 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82) with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 2 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82), which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 16.7 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 7.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 192.168.100.32:49662 (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11473 bytes) 
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.496625 ms
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4). 7037 bytes result sent to driver
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 14 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 2 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82) finished in 0.017 s
26/02/15 15:16:01 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 2 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82, took 0.018839 s
ID: 1, Name: Alice, Block Start: -1
ID: 2, Name: Bob, Block Start: -1
ID: 3, Name: Charlie, Block Start: -1

❌ BUG: Some rows have invalid block start positions!

================================================================================
Testing input_file_block_length() on Iceberg table
================================================================================

=== input_file_block_length() Results ===
26/02/15 15:16:01 INFO V2ScanRelationPushDown: 
Output: id#39, name#40
         
26/02/15 15:16:01 INFO SnapshotScan: Scanning table local.default.test_table snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with filter true
26/02/15 15:16:01 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:01 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 10 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 11 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=4], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 5.000667 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 12 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:01 INFO DAGScheduler: Got job 3 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106) with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 3 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[13] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106), which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 16.7 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 7.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on 192.168.100.32:49662 (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[13] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 5) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11473 bytes) 
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 3.0 (TID 5)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_9_piece0 on 192.168.100.32:49662 in memory (size: 7.0 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.27675 ms
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.100.32:49662 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.100.32:49662 in memory (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_11_piece0 on 192.168.100.32:49662 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 3.0 (TID 5). 7037 bytes result sent to driver
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_7_piece0 on 192.168.100.32:49662 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 5) in 12 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 3 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106) finished in 0.020 s
26/02/15 15:16:01 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 3 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106, took 0.020872 s
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.100.32:49662 in memory (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Removed broadcast_5_piece0 on 192.168.100.32:49662 in memory (size: 7.0 KiB, free: 2.4 GiB)
ID: 1, Name: Alice, Block Length: -1
ID: 2, Name: Bob, Block Length: -1
ID: 3, Name: Charlie, Block Length: -1

❌ BUG: Some rows have invalid block lengths!

================================================================================
Testing all three metadata functions together
================================================================================

=== All Metadata Functions Results ===
26/02/15 15:16:01 INFO V2ScanRelationPushDown: 
Output: id#57, name#58
         
26/02/15 15:16:01 INFO SnapshotScan: Scanning table local.default.test_table snapshot 8759041077900200141 created at 2026-02-15T12:16:00.623+00:00 with filter true
26/02/15 15:16:01 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:01 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 14 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 15 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:01 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=5], due to: fallback input file expression
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 7.722917 ms
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on 192.168.100.32:49662 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 16 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:01 INFO DAGScheduler: Got job 4 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132) with 1 output partitions
26/02/15 15:16:01 INFO DAGScheduler: Final stage: ResultStage 4 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
26/02/15 15:16:01 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:01 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:01 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[17] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132), which has no missing parents
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 17.0 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 7.1 KiB, free 2.4 GiB)
26/02/15 15:16:01 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.100.32:49662 (size: 7.1 KiB, free: 2.4 GiB)
26/02/15 15:16:01 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:01 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[17] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:01 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks resource profile 0
26/02/15 15:16:01 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11473 bytes) 
26/02/15 15:16:01 INFO Executor: Running task 0.0 in stage 4.0 (TID 6)
26/02/15 15:16:01 INFO CodeGenerator: Code generated in 4.799166 ms
26/02/15 15:16:01 INFO Executor: Finished task 0.0 in stage 4.0 (TID 6). 7049 bytes result sent to driver
26/02/15 15:16:01 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 23 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:01 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
26/02/15 15:16:01 INFO DAGScheduler: ResultStage 4 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132) finished in 0.027 s
26/02/15 15:16:01 INFO DAGScheduler: Job 4 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage finished
26/02/15 15:16:01 INFO DAGScheduler: Job 4 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132, took 0.029916 s
ID: 1, Name: Alice
  File: ''
  Block Start: -1
  Block Length: -1

ID: 2, Name: Bob
  File: ''
  Block Start: -1
  Block Length: -1

ID: 3, Name: Charlie
  File: ''
  Block Start: -1
  Block Length: -1

❌ SOME TESTS FAILED: Check the output above for details
26/02/15 15:16:01 INFO SparkContext: SparkContext is stopping with exitCode 0.
26/02/15 15:16:01 INFO SparkUI: Stopped Spark web UI at http://192.168.100.32:4040
26/02/15 15:16:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
26/02/15 15:16:01 INFO MemoryStore: MemoryStore cleared
26/02/15 15:16:01 INFO BlockManager: BlockManager stopped
26/02/15 15:16:01 INFO BlockManagerMaster: BlockManagerMaster stopped
26/02/15 15:16:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
26/02/15 15:16:01 INFO SparkContext: Successfully stopped SparkContext
26/02/15 15:16:02 INFO ShutdownHookManager: Shutdown hook called
26/02/15 15:16:02 INFO ShutdownHookManager: Deleting directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-63dfb5d6-fcec-4e7f-a9a3-af9007b62490
26/02/15 15:16:02 INFO ShutdownHookManager: Deleting directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-d0203fd8-92c9-42cb-a2b3-e3437e2b4a37
26/02/15 15:16:02 INFO ShutdownHookManager: Deleting directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-63dfb5d6-fcec-4e7f-a9a3-af9007b62490/pyspark-11640bfd-7a71-41d6-8134-9eca7086ac34

[Process completed]
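The ❌ summaries printed by the run above boil down to a simple per-row criterion. Below is a minimal, self-contained sketch of that check; the helper name and row layout are assumptions for illustration, not code taken from `test_iceberg_simple.py`:

```python
# Hypothetical sketch of the pass/fail check whose ❌ output appears above.
# A row of (input_file_name, input_file_block_start, input_file_block_length)
# is only valid when the file name is non-empty and both offsets are >= 0;
# the buggy path returns the Spark fallback defaults ('', -1, -1).

def metadata_row_valid(file_name, block_start, block_length):
    """True when the metadata columns carry real values instead of defaults."""
    return bool(file_name) and block_start >= 0 and block_length >= 0

# Every row in the failing "Before" run above comes back as ('', -1, -1):
before_rows = [("", -1, -1), ("", -1, -1), ("", -1, -1)]
assert not any(metadata_row_valid(*row) for row in before_rows)
print("BUG reproduced: all rows carry placeholder metadata")
```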

After

export GLUTEN_JAR=/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar
export ICEBERG_JAR=/tmp/iceberg.jar

spark-submit \
  --jars "$GLUTEN_JAR,$ICEBERG_JAR" \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.sql.columnar.backend.lib=velox \
  --conf spark.gluten.enabled=true \
  --conf spark.driver.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
  --conf spark.executor.extraClassPath="$GLUTEN_JAR:$ICEBERG_JAR" \
  test_iceberg_simple.py
26/02/15 15:16:15 WARN Utils: Your hostname, Reemas-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.100.32 instead (on interface en0)
26/02/15 15:16:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
26/02/15 15:16:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/02/15 15:16:16 INFO SparkContext: Running Spark version 3.5.5
26/02/15 15:16:16 INFO SparkContext: OS info Mac OS X, 15.6, aarch64
26/02/15 15:16:16 INFO SparkContext: Java version 17.0.18
26/02/15 15:16:16 INFO ResourceUtils: ==============================================================
26/02/15 15:16:16 INFO ResourceUtils: No custom resources configured for spark.driver.
26/02/15 15:16:16 INFO ResourceUtils: ==============================================================
26/02/15 15:16:16 INFO SparkContext: Submitted application: iceberg-input-file-metadata-test
26/02/15 15:16:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 2048, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
26/02/15 15:16:16 INFO ResourceProfile: Limiting resource is cpu
26/02/15 15:16:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
26/02/15 15:16:16 INFO SecurityManager: Changing view acls to: reema
26/02/15 15:16:16 INFO SecurityManager: Changing modify acls to: reema
26/02/15 15:16:16 INFO SecurityManager: Changing view acls groups to: 
26/02/15 15:16:16 INFO SecurityManager: Changing modify acls groups to: 
26/02/15 15:16:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: reema; groups with view permissions: EMPTY; users with modify permissions: reema; groups with modify permissions: EMPTY
26/02/15 15:16:17 INFO Utils: Successfully started service 'sparkDriver' on port 49683.
26/02/15 15:16:17 INFO SparkEnv: Registering MapOutputTracker
26/02/15 15:16:17 INFO SparkEnv: Registering BlockManagerMaster
26/02/15 15:16:17 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
26/02/15 15:16:17 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
26/02/15 15:16:17 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
26/02/15 15:16:17 INFO DiskBlockManager: Created local directory at /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/blockmgr-28fb4afe-55ae-4ae3-b6bb-b9ce02a8e490
26/02/15 15:16:17 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:17 INFO SparkEnv: Registering OutputCommitCoordinator
26/02/15 15:16:17 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
26/02/15 15:16:17 INFO Utils: Successfully started service 'SparkUI' on port 4040.
26/02/15 15:16:17 INFO SparkContext: Added JAR file:///Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar at spark://192.168.100.32:49683/jars/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar with timestamp 1771157776861
26/02/15 15:16:17 INFO SparkContext: Added JAR file:///private/tmp/iceberg.jar at spark://192.168.100.32:49683/jars/iceberg.jar with timestamp 1771157776861
26/02/15 15:16:17 INFO Discovery: Start discovering components in the current classpath... 
26/02/15 15:16:17 INFO Discovery: Discovered component files: org.apache.gluten.backendsapi.velox.VeloxBackend, org.apache.gluten.component.VeloxIcebergComponent. Duration: 4 ms.
26/02/15 15:16:17 INFO package: Components registered within order: velox, velox-iceberg
26/02/15 15:16:17 INFO GlutenDriverPlugin: Gluten components:
==============================================================
Component velox
  velox_branch = HEAD
  velox_revision = f247a8e922c4802fd9b9cf7a626421bff9b803fd
  velox_revisionTime = 2026-02-07 14:11:45 +0000
Component velox-iceberg
==============================================================
26/02/15 15:16:17 INFO SubstraitBackend: Gluten build info:
==============================================================
Gluten Version: 1.7.0-SNAPSHOT
GCC Version: 
Java Version: 17
Scala Version: 2.12.15
Spark Version: 3.5.5
Hadoop Version: 2.7.4
Gluten Branch: iceberg-input-file
Gluten Revision: bdb1f9117dc415d0c42c89fbd5533844bfa17b85
Gluten Revision Time: 2026-02-15 01:29:56 +0300
Gluten Build Time: 2026-02-15T11:32:58Z
Gluten Repo URL: https://github.com/ReemaAlzaid/incubator-gluten.git
==============================================================
26/02/15 15:16:17 INFO VeloxListenerApi: Memory overhead is not set. Setting it to 644245094 automatically. Gluten doesn't follow Spark's calculation on default value of this option because the actual required memory overhead will depend on off-heap usage than on on-heap usage.
26/02/15 15:16:17 INFO SparkDirectoryUtil: Created local directory at /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e
26/02/15 15:16:17 INFO JniWorkspace: Creating JNI workspace in root directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27
26/02/15 15:16:17 INFO JniWorkspace: JNI workspace /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019 created in root directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27
26/02/15 15:16:17 INFO JniLibLoader: Read real path /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib for libPath /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib
26/02/15 15:16:17 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib has been loaded using path-loading method
26/02/15 15:16:17 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libgluten.dylib has been loaded
26/02/15 15:16:17 INFO JniLibLoader: Successfully loaded library darwin/aarch64/libgluten.dylib
26/02/15 15:16:17 INFO JniLibLoader: Read real path /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib for libPath /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib
26/02/15 15:16:17 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib has been loaded using path-loading method
26/02/15 15:16:17 INFO JniLibLoader: Library /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/gluten-87fb8dd4-e5f3-48f1-8ab9-480ad6d7553e/jni/a633327e-1f81-4a3f-8c1b-cf20ab0d3f27/gluten-7854201777374203019/darwin/aarch64/libvelox.dylib has been loaded
26/02/15 15:16:17 INFO JniLibLoader: Successfully loaded library darwin/aarch64/libvelox.dylib
W20260215 15:16:17.696556 14493069 MemoryArbitrator.cpp:84] Query memory capacity[460.50MB] is set for NOOP arbitrator which has no capacity enforcement
26/02/15 15:16:17 INFO DriverPluginContainer: Initialized driver component for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:16:17 INFO Executor: Starting executor ID driver on host 192.168.100.32
26/02/15 15:16:17 INFO Executor: OS info Mac OS X, 15.6, aarch64
26/02/15 15:16:17 INFO Executor: Java version 17.0.18
26/02/15 15:16:17 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): 'file:/Users/reema/Desktop/OpenSource/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar,file:/tmp/iceberg.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/gluten-velox-bundle-spark3.5_2.12-darwin_aarch64-1.7.0-iceberg-fix.jar,file:/Users/reema/Desktop/OpenSource/incubator-gluten/iceberg.jar'
26/02/15 15:16:17 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@3a7933da for default.
26/02/15 15:16:17 INFO CodedInputStreamClassInitializer: The defaultRecursionLimit in protobuf has been increased to 100000
26/02/15 15:16:17 INFO VeloxListenerApi: Gluten is running with Spark local mode. Skip running static initializer for executor.
26/02/15 15:16:17 INFO ExecutorPluginContainer: Initialized executor component for plugin org.apache.gluten.GlutenPlugin.
26/02/15 15:16:17 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49684.
26/02/15 15:16:17 INFO NettyBlockTransferService: Server created on 192.168.100.32:49684
26/02/15 15:16:17 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
26/02/15 15:16:17 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.100.32:49684 with 2.4 GiB RAM, BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.100.32, 49684, None)
26/02/15 15:16:17 INFO VeloxBackend: Gluten SQL Tab has been attached.
26/02/15 15:16:17 INFO SparkShimLoader: Loading Spark Shims for version: 3.5.5
26/02/15 15:16:17 INFO SparkShimLoader: Using Shim provider: List(org.apache.gluten.sql.shims.spark35.SparkShimProvider@4d028882)
================================================================================
Creating Iceberg table...
================================================================================
26/02/15 15:16:17 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
26/02/15 15:16:17 INFO SharedState: Warehouse path is 'file:/Users/reema/Desktop/OpenSource/incubator-gluten/spark-warehouse'.
26/02/15 15:16:18 INFO CatalogUtil: Loading custom FileIO implementation: org.apache.iceberg.hadoop.HadoopFileIO
26/02/15 15:16:18 INFO BaseMetastoreCatalog: Table properties set at catalog level through catalog properties: {}
26/02/15 15:16:18 INFO BaseMetastoreCatalog: Table properties enforced at catalog level through catalog properties: {}
26/02/15 15:16:18 INFO HadoopTableOperations: Committed a new metadata file file:/tmp/iceberg_warehouse/default/test_table/metadata/v1.metadata.json
26/02/15 15:16:19 WARN GlutenFallbackReporter: Validation failed for plan: AppendData[QueryId=1], due to: [FallbackByBackendSettings] Validation failed on node AppendData
26/02/15 15:16:19 INFO CodeGenerator: Code generated in 84.452875 ms
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.32:49684 (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:16:19 INFO SparkContext: Created broadcast 0 from broadcast at SparkWrite.java:195
26/02/15 15:16:19 INFO AppendDataExec: Start processing data source write support: IcebergBatchWrite(table=local.default.test_table, format=PARQUET). The input RDD has 3 partitions.
26/02/15 15:16:19 INFO SparkContext: Starting job: sql at NativeMethodAccessorImpl.java:0
26/02/15 15:16:19 INFO DAGScheduler: Got job 0 (sql at NativeMethodAccessorImpl.java:0) with 3 output partitions
26/02/15 15:16:19 INFO DAGScheduler: Final stage: ResultStage 0 (sql at NativeMethodAccessorImpl.java:0)
26/02/15 15:16:19 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:19 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:19 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0), which has no missing parents
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.8 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.4 KiB, free 2.4 GiB)
26/02/15 15:16:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.32:49684 (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:16:19 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:19 INFO DAGScheduler: Submitting 3 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at sql at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1, 2))
26/02/15 15:16:19 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks resource profile 0
26/02/15 15:16:19 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 9506 bytes) 
26/02/15 15:16:19 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (192.168.100.32, executor driver, partition 1, PROCESS_LOCAL, 9506 bytes) 
26/02/15 15:16:19 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (192.168.100.32, executor driver, partition 2, PROCESS_LOCAL, 9506 bytes) 
26/02/15 15:16:19 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
26/02/15 15:16:19 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
26/02/15 15:16:19 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
26/02/15 15:16:19 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:19 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:19 INFO CodecPool: Got brand-new compressor [.zstd]
26/02/15 15:16:19 INFO DataWritingSparkTask: Writer for partition 1 is committing.
26/02/15 15:16:19 INFO DataWritingSparkTask: Writer for partition 0 is committing.
26/02/15 15:16:19 INFO DataWritingSparkTask: Writer for partition 2 is committing.
26/02/15 15:16:19 INFO DataWritingSparkTask: Committed partition 0 (task 0, attempt 0, stage 0.0)
26/02/15 15:16:19 INFO DataWritingSparkTask: Committed partition 2 (task 2, attempt 0, stage 0.0)
26/02/15 15:16:19 INFO DataWritingSparkTask: Committed partition 1 (task 1, attempt 0, stage 0.0)
26/02/15 15:16:19 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 4161 bytes result sent to driver
26/02/15 15:16:19 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 4157 bytes result sent to driver
26/02/15 15:16:19 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 4153 bytes result sent to driver
26/02/15 15:16:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 401 ms on 192.168.100.32 (executor driver) (1/3)
26/02/15 15:16:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 393 ms on 192.168.100.32 (executor driver) (2/3)
26/02/15 15:16:19 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 393 ms on 192.168.100.32 (executor driver) (3/3)
26/02/15 15:16:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
26/02/15 15:16:19 INFO DAGScheduler: ResultStage 0 (sql at NativeMethodAccessorImpl.java:0) finished in 0.450 s
26/02/15 15:16:19 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:19 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
26/02/15 15:16:19 INFO DAGScheduler: Job 0 finished: sql at NativeMethodAccessorImpl.java:0, took 0.477432 s
26/02/15 15:16:19 INFO AppendDataExec: Data source write support IcebergBatchWrite(table=local.default.test_table, format=PARQUET) is committing.
26/02/15 15:16:19 INFO SparkWrite: Committing append with 3 new data files to table local.default.test_table
26/02/15 15:16:20 INFO HadoopTableOperations: Committed a new metadata file file:/tmp/iceberg_warehouse/default/test_table/metadata/v2.metadata.json
26/02/15 15:16:20 INFO SnapshotProducer: Committed snapshot 7722039398521868759 (MergeAppend)
26/02/15 15:16:20 INFO LoggingMetricsReporter: Received metrics report: CommitReport{tableName=local.default.test_table, snapshotId=7722039398521868759, sequenceNumber=1, operation=append, commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.239004458S, count=1}, attempts=CounterResult{unit=COUNT, value=1}, addedDataFiles=CounterResult{unit=COUNT, value=3}, removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=3}, addedDeleteFiles=null, addedEqualityDeleteFiles=null, addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null, removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null, removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0}, addedRecords=CounterResult{unit=COUNT, value=3}, removedRecords=null, totalRecords=CounterResult{unit=COUNT, value=3}, addedFilesSizeInBytes=CounterResult{unit=BYTES, value=1920}, removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES, value=1920}, addedPositionalDeletes=null, removedPositionalDeletes=null, totalPositionalDeletes=CounterResult{unit=COUNT, value=0}, addedEqualityDeletes=null, removedEqualityDeletes=null, totalEqualityDeletes=CounterResult{unit=COUNT, value=0}, manifestsCreated=null, manifestsReplaced=null, manifestsKept=null, manifestEntriesProcessed=null}, metadata={engine-version=3.5.5, app-id=local-1771157777707, engine-name=spark, iceberg-version=Apache Iceberg 1.10.0 (commit 2114bf631e49af532d66e2ce148ee49dd1dd1f1f)}}
26/02/15 15:16:20 INFO SparkWrite: Committed in 255 ms
26/02/15 15:16:20 INFO AppendDataExec: Data source write support IcebergBatchWrite(table=local.default.test_table, format=PARQUET) committed.

================================================================================
Testing input_file_name() on Iceberg table
================================================================================

=== input_file_name() Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown: 
Output: id#7, name#8
         
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 2 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 3 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=2], due to: fallback input file expression
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 15.250084 ms
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 4 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.100.32:49684 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.100.32:49684 in memory (size: 4.4 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.100.32:49684 in memory (size: 29.7 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56
26/02/15 15:16:20 INFO DAGScheduler: Got job 1 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56) with 1 output partitions
26/02/15 15:16:20 INFO DAGScheduler: Final stage: ResultStage 1 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56)
26/02/15 15:16:20 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:20 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:20 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[8] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56), which has no missing parents
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 29.5 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 11.7 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.100.32:49684 (size: 11.7 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[8] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
26/02/15 15:16:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 3) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11766 bytes) 
26/02/15 15:16:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 3)
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 7.91075 ms
26/02/15 15:16:20 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
26/02/15 15:16:20 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
26/02/15 15:16:20 INFO CheckAllocator: Using DefaultAllocationManager at memory/DefaultAllocationManagerFactory.class
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 4.909666 ms
26/02/15 15:16:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 3). 8350 bytes result sent to driver
26/02/15 15:16:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 3) in 118 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
26/02/15 15:16:20 INFO DAGScheduler: ResultStage 1 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56) finished in 0.122 s
26/02/15 15:16:20 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:20 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
26/02/15 15:16:20 INFO DAGScheduler: Job 1 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:56, took 0.123958 s
ID: 1, Name: Alice, File: 'file:/tmp/iceberg_warehouse/default/test_table/data/00000-0-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
ID: 2, Name: Bob, File: 'file:/tmp/iceberg_warehouse/default/test_table/data/00001-1-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
ID: 3, Name: Charlie, File: 'file:/tmp/iceberg_warehouse/default/test_table/data/00002-2-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'

✅ SUCCESS: All 3 rows have valid file paths

================================================================================
Testing input_file_block_start() on Iceberg table
================================================================================

=== input_file_block_start() Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown: 
Output: id#24, name#25
         
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 6 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 7 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_5_piece0 on 192.168.100.32:49684 in memory (size: 11.7 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=3], due to: fallback input file expression
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 7.716333 ms
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 8 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82
26/02/15 15:16:20 INFO DAGScheduler: Got job 2 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82) with 1 output partitions
26/02/15 15:16:20 INFO DAGScheduler: Final stage: ResultStage 2 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82)
26/02/15 15:16:20 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:20 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:20 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[15] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82), which has no missing parents
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_7_piece0 on 192.168.100.32:49684 in memory (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 11.8 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 192.168.100.32:49684 (size: 11.8 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[15] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:20 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
26/02/15 15:16:20 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11786 bytes) 
26/02/15 15:16:20 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 3.413875 ms
26/02/15 15:16:20 INFO CodeGenerator: Code generated in 4.24975 ms
26/02/15 15:16:20 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4). 8212 bytes result sent to driver
26/02/15 15:16:20 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 22 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:20 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
26/02/15 15:16:20 INFO DAGScheduler: ResultStage 2 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82) finished in 0.028 s
26/02/15 15:16:20 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:20 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
26/02/15 15:16:20 INFO DAGScheduler: Job 2 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:82, took 0.035285 s
ID: 1, Name: Alice, Block Start: 4
ID: 2, Name: Bob, Block Start: 4
ID: 3, Name: Charlie, Block Start: 4

✅ SUCCESS: All 3 rows have valid block start positions

================================================================================
Testing input_file_block_length() on Iceberg table
================================================================================

=== input_file_block_length() Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown: 
Output: id#41, name#42
         
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 10 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_9_piece0 on 192.168.100.32:49684 in memory (size: 11.8 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 11 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:20 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=4], due to: fallback input file expression
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 12 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO BlockManagerInfo: Removed broadcast_11_piece0 on 192.168.100.32:49684 in memory (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106
26/02/15 15:16:20 INFO DAGScheduler: Got job 3 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106) with 1 output partitions
26/02/15 15:16:20 INFO DAGScheduler: Final stage: ResultStage 3 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106)
26/02/15 15:16:20 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:20 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:20 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[22] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106), which has no missing parents
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 29.7 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.8 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on 192.168.100.32:49684 (size: 11.8 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[22] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:20 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks resource profile 0
26/02/15 15:16:20 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 5) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11795 bytes) 
26/02/15 15:16:20 INFO Executor: Running task 0.0 in stage 3.0 (TID 5)
26/02/15 15:16:20 INFO Executor: Finished task 0.0 in stage 3.0 (TID 5). 8220 bytes result sent to driver
26/02/15 15:16:20 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 5) in 17 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:20 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
26/02/15 15:16:20 INFO DAGScheduler: ResultStage 3 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106) finished in 0.019 s
26/02/15 15:16:20 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:20 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
26/02/15 15:16:20 INFO DAGScheduler: Job 3 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:106, took 0.020702 s
ID: 1, Name: Alice, Block Length: 636
ID: 2, Name: Bob, Block Length: 622
ID: 3, Name: Charlie, Block Length: 650

✅ SUCCESS: All 3 rows have valid block lengths

================================================================================
Testing all three metadata functions together
================================================================================

=== All Metadata Functions Results ===
26/02/15 15:16:20 INFO V2ScanRelationPushDown: 
Output: id#60, name#61
         
26/02/15 15:16:20 INFO SnapshotScan: Scanning table local.default.test_table snapshot 7722039398521868759 created at 2026-02-15T12:16:20.033+00:00 with filter true
26/02/15 15:16:20 INFO BaseDistributedDataScan: Planning file tasks locally for table local.default.test_table
26/02/15 15:16:20 INFO SparkPartitioningAwareScan: Reporting UnknownPartitioning with 1 partition(s) for table local.default.test_table
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:20 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:20 INFO SparkContext: Created broadcast 14 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 192.168.100.32:49684 (size: 30.0 KiB, free: 2.4 GiB)
26/02/15 15:16:21 INFO SparkContext: Created broadcast 15 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 INFO MemoryStore: MemoryStore started with capacity 2.4 GiB
26/02/15 15:16:21 WARN GlutenFallbackReporter: Validation failed for plan: Project[QueryId=5], due to: fallback input file expression
26/02/15 15:16:21 INFO CodeGenerator: Code generated in 6.058208 ms
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 32.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 29.9 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on 192.168.100.32:49684 (size: 29.9 KiB, free: 2.4 GiB)
26/02/15 15:16:21 INFO SparkContext: Created broadcast 16 from collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO SparkContext: Starting job: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132
26/02/15 15:16:21 INFO DAGScheduler: Got job 4 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132) with 1 output partitions
26/02/15 15:16:21 INFO DAGScheduler: Final stage: ResultStage 4 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132)
26/02/15 15:16:21 INFO DAGScheduler: Parents of final stage: List()
26/02/15 15:16:21 INFO DAGScheduler: Missing parents: List()
26/02/15 15:16:21 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[29] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132), which has no missing parents
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 30.2 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 12.0 KiB, free 2.4 GiB)
26/02/15 15:16:21 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.100.32:49684 (size: 12.0 KiB, free: 2.4 GiB)
26/02/15 15:16:21 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1585
26/02/15 15:16:21 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[29] at collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132) (first 15 tasks are for partitions Vector(0))
26/02/15 15:16:21 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks resource profile 0
26/02/15 15:16:21 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6) (192.168.100.32, executor driver, partition 0, PROCESS_LOCAL, 11989 bytes) 
26/02/15 15:16:21 INFO Executor: Running task 0.0 in stage 4.0 (TID 6)
26/02/15 15:16:21 INFO CodeGenerator: Code generated in 4.262917 ms
26/02/15 15:16:21 INFO CodeGenerator: Code generated in 10.714417 ms
26/02/15 15:16:21 INFO Executor: Finished task 0.0 in stage 4.0 (TID 6). 8369 bytes result sent to driver
26/02/15 15:16:21 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 42 ms on 192.168.100.32 (executor driver) (1/1)
26/02/15 15:16:21 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
26/02/15 15:16:21 INFO DAGScheduler: ResultStage 4 (collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132) finished in 0.047 s
26/02/15 15:16:21 INFO DAGScheduler: Job 4 is finished. Cancelling potential speculative or zombie tasks for this job
26/02/15 15:16:21 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage finished
26/02/15 15:16:21 INFO DAGScheduler: Job 4 finished: collect at /Users/reema/Desktop/OpenSource/incubator-gluten/test_iceberg_simple.py:132, took 0.050441 s
ID: 1, Name: Alice
  File: 'file:/tmp/iceberg_warehouse/default/test_table/data/00000-0-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
  Block Start: 4
  Block Length: 636

ID: 2, Name: Bob
  File: 'file:/tmp/iceberg_warehouse/default/test_table/data/00001-1-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
  Block Start: 4
  Block Length: 622

ID: 3, Name: Charlie
  File: 'file:/tmp/iceberg_warehouse/default/test_table/data/00002-2-2b0c5d04-98fb-4cda-bdf3-7dac021f8032-0-00001.parquet'
  Block Start: 4
  Block Length: 650

✅ ALL TESTS PASSED: All metadata functions work correctly!
26/02/15 15:16:21 INFO SparkContext: SparkContext is stopping with exitCode 0.
26/02/15 15:16:21 INFO SparkUI: Stopped Spark web UI at http://192.168.100.32:4040
26/02/15 15:16:21 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
26/02/15 15:16:21 INFO MemoryStore: MemoryStore cleared
26/02/15 15:16:21 INFO BlockManager: BlockManager stopped
26/02/15 15:16:21 INFO BlockManagerMaster: BlockManagerMaster stopped
26/02/15 15:16:21 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
26/02/15 15:16:21 INFO SparkContext: Successfully stopped SparkContext
26/02/15 15:16:21 INFO ShutdownHookManager: Shutdown hook called
26/02/15 15:16:21 INFO ShutdownHookManager: Deleting directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-dbc8040a-b030-4017-b078-49a1acd7c001/pyspark-fe5f02ee-61ab-470d-a651-4a4eb74b901e
26/02/15 15:16:21 INFO ShutdownHookManager: Deleting directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-0482ef35-5e29-4648-b2b4-9fc9e31eccac
26/02/15 15:16:21 INFO ShutdownHookManager: Deleting directory /private/var/folders/5z/4mxhbysx1hb00rzxt6wj738m0000gn/T/spark-dbc8040a-b030-4017-b078-49a1acd7c001

[Process completed]
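For context, the run above appears to exercise queries of roughly the following shape against the freshly written Iceberg table. This is a hedged sketch only — the actual contents of `test_iceberg_simple.py` are not shown in the log; the table name `local.default.test_table` and the three metadata functions are taken from the log output, while the column aliases and the `run_checks` helper are illustrative assumptions.

```python
# Sketch of the metadata-function checks suggested by the log above.
# Table name and function names come from the log; query shapes are assumed.

METADATA_QUERIES = {
    "input_file_name":
        "SELECT id, name, input_file_name() AS file "
        "FROM local.default.test_table",
    "input_file_block_start":
        "SELECT id, name, input_file_block_start() AS block_start "
        "FROM local.default.test_table",
    "input_file_block_length":
        "SELECT id, name, input_file_block_length() AS block_length "
        "FROM local.default.test_table",
}


def run_checks(spark):
    """Run each metadata query and verify every row carries real metadata.

    `spark` is an active SparkSession with Gluten/Velox and the Iceberg
    catalog configured; this mirrors the 3-row table written earlier in
    the log.
    """
    for fn, sql in METADATA_QUERIES.items():
        rows = spark.sql(sql).collect()
        assert len(rows) == 3, f"{fn}: expected 3 rows, got {len(rows)}"
        for row in rows:
            value = row[2]
            # Before this PR, the Iceberg path returned empty strings / zeros
            # here because split metadata was not propagated to the native scan.
            assert value not in ("", None), f"{fn} returned empty metadata"
```

Note that the log shows these projections still falling back ("fallback input file expression"); the point of the checks is that the values are now correct even on the fallback path, since the split metadata reaches the scan.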

@FelixYBW FelixYBW requested a review from rui-mo February 16, 2026 19:23

Labels

CORE, DATA_LAKE, VELOX
