What's Changed
- [GLUTEN-8846][CH] [Part 3] Add benchmark for Iceberg Delete by @baibaichen in #9192
- [GLUTEN-9020][CH] Support delta DV BitmapAggregator by @loneylee in #9138
- [GLUTEN-9197][CH] Simplify sum aggregate expression by @taiyang-li in #9198
- [VL] Enable more ut in VeloxTestSettings by @WangGuangxin in #9080
- [GLUTEN-9199][VL] Fix error when creating shuffle file: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments by @zhztheplayer in #9200
- [CORE] Fix duplicate setting for config LEGACY_TIME_PARSER_POLICY by @jinchengchenghh in #9201
- [GLUTEN-9176][CH] Rewrite aggregate if to aggregate with filter clause by @taiyang-li in #9185
- [GLUTEN-8557][CH] Flatten nested `And`/`Or` for performance optimization by @KevinyhZou in #8558
- Revert "[GLUTEN-9164][CH]Enable row group level bloom filter push down" by @taiyang-li in #9214
- [GLUTEN-9182][VL] Support new s3 configuration in Gluten by @dcoliversun in #9183
- [VL] Celeborn shuffle reader OOM with many empty input stream by @marin-ma in #9221
- [GLUTEN-8821][VL] Update aggregate/generator/window support doc and script by @marin-ma in #8971
- [VL] Change to use Velox's wget_and_untar in setup-centos7.sh by @yaooqinn in #9207
- [GLUTEN-9196][CH] Use wide-table aggregation to eliminate multi-table joins by @lgbo-ustc in #9155
- [GLUTEN-9149][CORE] Remove Spark-specific code from JniLibLoader & JniWorkspace by @shuai-xu in #9150
- [VL][CI] Change to use JDK-17 for Spark 3.3/3.4/3.5 tests by @PHILO-HE in #9209
- [CORE][VL] Hide child nodes from implementations of `OffloadSingleNode` by @zhztheplayer in #9220
- [GLUTEN-9008][VL] Support json_object_keys function by @dcoliversun in #9009
- [GLUTEN-9239][CH] Support JDK17 for the CH backend by @zzcclp in #9242
- [GLUTEN-9152][CORE] Avoid unnecessary serialization of hadoop conf by @zml1206 in #9153
- [GLUTEN-9240][VL] Write NULL value into relation in gluten unit tests by @dcoliversun in #9241
- [VL][CI] bump to use ubuntu-22.04 runner by @zhouyuan in #9262
- [GLUTEN-9177][CH] Fix diff on parse host of url and refactor `SparkParseURL` by @KevinyhZou in #9179
- [CORE] Decrease offheap memory size in resource profile for whole stage fallback case by @PHILO-HE in #8911
- [GLUTEN-9205][CH] Support deletion vector native write by @loneylee in #9248
- [VL] Delete global reference to a class object in JNI unload by @PHILO-HE in #9268
- [GLUTEN-9245][VL] Fix partial project expression contains subquery by @jinchengchenghh in #9259
- [GLUTEN-9244][CORE] Change the way of passing default timezone to native config by @zml1206 in #9249
- [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache by @zhztheplayer in #9230
- [VL] Support Spark legacy statistical aggregation function behavior by @NEUpanning in #9181
- [CORE] Remove library unloading API from JniLibLoader as unused by @zhztheplayer in #9277
- [GLUTEN-9237][CH] Fix the nullability mismatch issue for the Nothing type by @lgbo-ustc in #9238
- [VL] Disable FlushableHashAggregate when aggregates contain sum/avg for floating type by @kecookier in #8986
- [CORE] Refine the test with specified spark version by @yikf in #9274
- [CH] Add a comment to explain why the endpoint uses a single thread by @dcoliversun in #9257
- [GLUTEN-8891][VL] Refine local ssd cache feature by @zhouyuan in #9228
- [GLUTEN-9267][CH] Fix a bug in `EliminateDeduplicateAggregateWithAnyJoin` by @lgbo-ustc in #9293
- [VL] Remove param original of ColumnarPartialProjectExec by @zml1206 in #9290
- [GLUTEN-9178][CH] Fix cse in aggregate operator not working by @loneylee in #9301
- [CORE] Post events until both spark ui and gluten ui are enabled by @yikf in #9272
- [CORE] Correctly handle driver configurations when `spark.sql.extensions` is explicitly set for GlutenSessionExtensions by @zhztheplayer in #9312
- [GLUTEN-8851][VL] feat: Support cudf by @jinchengchenghh in #9229
- [GLUTEN-9288][VL] Enable array_prepend function for spark 3.5+ by @dcoliversun in #9305
- [GLUTEN-9317][CH]Fix: duplicated column names in shuffle read by @lgbo-ustc in #9318
- [Gluten-9254][CH] Support RDDScanExec by @loneylee in #9270
- [VL] Count total JVM memory as the on-heap portion for the off-heap sizing feature by @zhztheplayer in #9321
- [GLUTEN-9300][DOC] Support replacement expression in gen-function-support-docs by @dcoliversun in #9331
- [GLUTEN-9239][CH] [PART-1] Support Java-17: Remove `JNI_OnUnload` by @baibaichen in #9275
- [GLUTEN-7652][VL] Support binary as string by @wForget in #9325
- [Gluten-9334][CH] Support delta metadata columns `file_path` and `row_index` for mergetree by @loneylee in #9340
- [GLUTEN-6867][CH] Fix bug that cannot read files on minio by @baibaichen in #9332
- [VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager by @zhztheplayer in #9341
- [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function by @WangGuangxin in #9315
- [GLUTEN-8772][CORE] refactor: Refactoring the usage of SubstraitContext#functionMap by @wypb in #8775
- [VL] Move pre-configuration code of dynamic off-heap sizing to its own place by @zhztheplayer in #9336
- [GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle by @marin-ma in #9278
- [GLUTEN-9287][VL] Enable array_compact function for Spark 3.4+ by @dcoliversun in #9349
- [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation by @ArnavBalyan in #9096
- [CORE] Make max broadcast table size configurable by @yaooqinn in #9359
- [CH] Fix build error by @exmy in #9363
- [GLUTEN-9243][VL] Fix cuda docker image by @zhouyuan in #9333
- [GLUTEN-8912][VL] Add Offset support for CollectLimitExec by @ArnavBalyan in #8914
- [GLUTEN-7589][VL] Support date_trunc function by @zml1206 in #7611
- [GLUTEN-9279] Not pulling out expression from PartialMerge aggregate function to avoid invalid reference binding in ProjectExecTransformer by @Z1Wu in #9280
- [Gluten-8792][CH] Support delta project incrementMetric expr by @loneylee in #9353
- [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle by @WangGuangxin in #9035
- Fix ColumnarToRowRemovalGuard not able to be copied by @yaooqinn in #9384
- [GLUTEN-8846][CH] [Part 4] Add full-chain UT by @jlfsdtc in #9256
- [VL] Follow up on #9384 to avoid swallowing exceptions in UT by @zhztheplayer in #9393
- [GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by @marin-ma in #9356
- [VL][INFRA] Improve build bundle package workflow by @wForget in #9404
- [VL] Refactor WholeStageTransformer to remove some duplicate code by @wypb in #9388
- [VL] Refactor the HiveConfig to set once by @jinchengchenghh in #9414
- [GLUTEN-8981][VL] Add vcpkg triplet for building Gluten on arm by @zmdaodao in #8982
- [GLUTEN-9137][CH] Support CollectLimit for CH backend by @exmy in #9139
- [CH] Support native base85codec by @loneylee in #9421
- [VL] Fix VeloxColumnarToRowExec repeatedly recorded some metrics by @wypb in #9418
- [VL] The PoC of Flink support in Gluten by @shuai-xu in #8839
- [VL] Fallback pandas udf when input is not an instance of AttributeReference by @Surbhi-Vijay in #9385
- [CH] Change enable_aggregate_if_to_filter to false by @exmy in #9420
- [VL][CI] Upgrade ubuntu runner to fix weekly build error by @PHILO-HE in #9443
- [VL] Add a config to enable or disable check usage leak by @j7nhai in #9327
- [GLUTEN-9383][VL] Fix leak when growing capacity by @wForget in #9424
- [VL] Fix dump benchmark data issue in same task by @Yohahaha in #9360
- [GLUTEN-9392] [VL] Support casting array element to varchar by @ArnavBalyan in #9394
- [VL] Remove CollectLimit dependency from Offload Rules by @ArnavBalyan in #9451
- [GLUTEN-9025] Remove the ColumnarPartialProject when its followers don't support columnar by @weixiuli in #9026
- [GLUTEN-8744][VL] Add casting support for timestamp to long by @ArnavBalyan in #8745
- [GLUTEN-9462][CH] Build ARM V8 by default by @lwz9103 in #9463
- [VL] Fix Hudi scan fallback by @xushiyan in #9419
- [GLUTEN-9243][VL] Improve cuda image build by @zhouyuan in #9449
- [GLUTEN-9457][VL] Shuffle test code cleanup by @marin-ma in #9458
- [GLUTEN-9468][VL] Remove parquet-arrow reader dependency in benchmarks by @marin-ma in #9469
- [GLUTEN-8851][VL] Use separate debug config for cudf by @jinchengchenghh in #9466
- Minor: The user specified --extra-conf has high priority by @jinchengchenghh in #9488
- [GLUTEN-9073][VL] Add support for CollectTail Operator by @ArnavBalyan in #9074
- [GLUTEN-9486][VL] Fix example launch.json config in NewToGluten.md documentation by @dmsuehir in #9487
- [VL] Support ObjectDebugInfo in ObjectStore during destruction of native runtime by @ArnavBalyan in #9477
- [GLUTEN-9481][ICEBERG] Fix issue where iceberg columns were not being properly sanitized by @z123 in #9482
- [VL] Support decimal as double in gluten-it by @jinchengchenghh in #9472
- [VL] Gluten-it: Minor refactor on configuration priorities by @zhztheplayer in #9489
- [GLUTEN-9468][VL] Remove parquet-arrow dependency by @marin-ma in #9483
- [GLUTEN-9496][VL] Enable concat function with array datatype for spark by @dcoliversun in #9497
- [Doc] Update outdated operators in the documentation by @ArnavBalyan in #9474
- [GLUTEN-9502][VL] Remove useless datatype check for concat function by @dcoliversun in #9503
- [GLUTEN-9453][CH][PART-1] Refactor `GlutenClickHouseTPCHAbstractSuite` by @baibaichen in #9454
- [GLUTEN-9392][VL] Support casting integral types to double for array elements by @ArnavBalyan in #9396
- [GLUTEN-8222][VL] Support Factorial function by @ArnavBalyan in #8221
- [CORE] RasOffload: Fix false positive discard by @li-boxuan in #9506
- [GLUTEN-9526][CH] Add cmake option to create symbol instead of rename by @baibaichen in #9527
- [GLUTEN-9517][CH] Allow merge tree format don't configure disk cache by @baibaichen in #9520
- [GLUTEN-9445][VL] Support nexmark q0 for Flink by @shuai-xu in #9446
- [VL] Support user fallback option for CollectTail by @ArnavBalyan in #9531
- [GLUTEN-9448][Flink] Support UTs for flink by @lgbo-ustc in #9534
- [GLUTEN-9435][VL] Fix [Unsafe]ColumnarBuildSideRelation not found ResourceHandle in resource map by @wypb in #9438
- [GLUTEN-9468][VL] Follow up: Remove parquet-arrow cpp build by @marin-ma in #9511
- [VL] Refactor RowToVeloxColumnarExec to remove some duplicate code by @wypb in #9350
- [CORE] Support batch scan customMetrics by @Zouxxyy in #9450
- [GLUTEN-9492][CH] Add Support for CollectTail operator by @ArnavBalyan in #9476
- [GLUTEN-9522][VL] Move date & math scalar functions into new test suites by @dcoliversun in #9533
- [GLUTEN-8877] Port [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning by @wangyum in #9495
- [GLUTEN-9557][VL] Remove outdated test exclusion logic by @dcoliversun in #9558
- [GLUTEN-9551][VL] Fix build failure due to libelf vcpkg unavailable files by @ArnavBalyan in #9550
- [VL] Remove unused HdfsUtils by @wForget in #9578
- [FLINK] Bump Velox4J version by @zhztheplayer in #9545
- [GLUTEN-9475][VL] Serialize ColumnarBatch one by one to reduce memory footprint when broadcasting by @wForget in #9521
- [VL] Support format cpp code on macOS by @wForget in #9584
- [GLUTEN-9392][VL] Support integral to boolean array casts by @ArnavBalyan in #9563
- [GLUTEN-9581][CH] Fix: in subquery expression cannot exist in project by @lgbo-ustc in #9583
- [GLUTEN-9518][FLINK] Support watermark assigner by @shuai-xu in #9541
- [CORE] Remove unnecessary internal from user facing configs by @ArnavBalyan in #9564
- [TYPO] Fix arg used for java version in build info by @yaooqinn in #9602
- [GLUTEN-9594][CH] Reorder join/aggregate keys, ensure keys of all clauses are in the same order by @lgbo-ustc in #9595
- [GLUTEN-9606][CH]Support CH MergeTree + Delta DeletionVector reading by @zzcclp in #9609
- [GLUTEN-9567][FLINK] Add container class `Velox4JBean` for proper JSON-based object serialization by @zhztheplayer in #9600
- [GLUTEN-9623][FLINK] Fix blocking queue not close may cause memory leak by @KevinyhZou in #9624
- [GLUTEN-9625][VL] Add explicit GCS auth type check for service account key file by @pratham76 in #9627
- [GLUTEN-9631][CH] Fix: `AggregateDescription.parameters` is not set in `AggregateGroupLimitRelParser` by @lgbo-ustc in #9632
- [GLUTEN-9611][CORE] Improve performance of PayloadCloser Iterator by @wForget in #9612
- [INFRA] Add path filter to CH backend code style checks workflow by @yuanhang-dev in #9652
- [GLUTEN-9647][CH] Fix SimplifySumRule on different types by @lwz9103 in #9648
- [GLUTEN-9628][CH] Fix cache data on partition table with escaped value by @lwz9103 in #9629
- Revert "[GLUTEN-9567][FLINK] Add container class `Velox4JBean` for proper JSON-based object serialization" by @zhztheplayer in #9656
- [VL] Update dockerfile by @FelixYBW in #9644
- [GLUTEN-9660][CH]Fix incorrect columns order when deleting from the mergetree table with the delta dv by @zzcclp in #9661
- [GLUTEN-9646][CH] Fix coalesce project union when has subquery by @loneylee in #9662
- [GLUTEN-9621][FLINK] Fix rowvector may not close caused memory leak by @KevinyhZou in #9622
- [CH] Fix crash in static initialization of MergeTreeRelParser by @exmy in #9664
- [GLUTEN-9626][VL] Support lib geos in vcpkg by @PHILO-HE in #9659
- [GLUTEN-9361][VL] Enable luhn_check function for spark 3.5 by @dcoliversun in #9665
- [GLUTEN-9653][Flink] Support `boolean` in `vector-to-row` conversion by @lgbo-ustc in #9654
- [GLUTEN-9679][CH] Fix: invalid address access in str_to_map by @lgbo-ustc in #9680
- [GLUTEN-8616] [VL] Add support for Existence Join for broadcast nested loop join by @ArnavBalyan in #9588
- [GLUTEN-8736][VL] Add support for casting bool to timestamp by @ArnavBalyan in #8737
- [GLUTEN-9666][VL] Extend transition planner to support multiple row types by @zhztheplayer in #9689
- Change the code style format diff patch to JobSummary by @jinchengchenghh in #9668
- [GLUTEN-9573][Flink] Implement `createStreamOperator` and `getStreamOperatorClass` for `GlutenOneInputOperatorFactor` by @lgbo-ustc in #9574
- [GLUTEN-9681][CH] Fix the number of rows read by kafka is incorrect by @loneylee in #9687
- [GLUTEN-9691][VL]Fix driver core dump when broadcast too large when UnsafeColumnarBuildSideRelation enabled by @zjuwangg in #9692
- [GLUTEN-9684][VL] Add default config.guess and config.sub by @zml1206 in #9685
- [VL] Increase the spill start partition bit to 48 to avoid overlapping with the hash bits by @zhztheplayer in #9703
- [GLUTEN-9697][CH] Add 'reorg' command ut for the mergetree + delta dv by @zzcclp in #9699
- [GLUTEN-9475][VL][FOLLOWUP] Avoid overflow in calculation of numRows by @wForget in #9704
- [GLUTEN-9535][FLINK] Refactor the conversion for `RexCall` by @lgbo-ustc in #9536
- Fix the TPCH/DS script cannot throw exception by @jinchengchenghh in #9710
- [GLUTEN-9666][VL] Add BatchCarrierRow in preparation for replacement of FakeRow by @zhztheplayer in #9708
- [GLUTEN-9717][VL] Update CXX flags to run builds with defaults by @ArnavBalyan in #9718
- [VL] Fix missing tag after CollapseProjectExecTransformer rule by @li-boxuan in #9716
- [VL] Fix ioWaitTime's unit to nanos in batch scan metric by @Zouxxyy in #9715
- [CORE] Remove createTransformContext function by @zml1206 in #9714
- [GLUTEN-8855][VL] Columnar shuffle code cleanup by @marin-ma in #9693
- [GLUTEN-9700][Flink] Refactor conversion of row to vector by @lgbo-ustc in #9701
- [VL] Fix VLA error when compiling with apple clang by @marin-ma in #9730
- [VL] Manually register velox conf on executor start by @Zouxxyy in #9721
- [GLUTEN-9728][CH] Fix: Inconsistent result from `JoinAggregateToAggregateUnion` by @lgbo-ustc in #9729
- [FLINK] Fix error package name by @exmy in #9747
- [GLUTEN-9736][VL] Fix store ID mismatch by @zhli1142015 in #9746
- [GLUTEN-9561][CH] Upgrade delta version to 3.3.1 by @lwz9103 in #9562
- [GLUTEN-9666][CORE][VL][CH] V1 write: Move from FakeRow to BatchCarrierRow to simplify code by @zhztheplayer in #9731
- [GLUTEN-9682][Flink] Support double in vector-to-row conversion by @lgbo-ustc in #9683
- [GLUTEN-6609][CH] Remove unnecessary readFile override in GlutenDiskS3 by @baibaichen in #9649
- Add a Shim layer for PartitionedFileUtil of Spark 3.5 by @yaooqinn in #9748
- [VL] Add sqrt support by @yaooqinn in #9725
- [FOLLOWUP] Fix splitFilesByPathMethod reflection by @yaooqinn in #9763
- [GLUTEN-9764][VL] Use auto re2 source for openEuler24 by @kevinw66 in #9765
- [VL] RAS: Add internal property `MemoRole` to reduce duplications in plan enumeration for rule applications by @zhztheplayer in #9749
- [GLUTEN-9719][VL] Unify output definitions for Broadcast/Hash/Columnar shuffle joins by @ArnavBalyan in #9720
- [CH] Support deletion vector optimize for mergetree(add ut) by @loneylee in #9762
- [VL] Add mirror for ftp.gnu.org by @zml1206 in #9760
- [FLINK] Fix incorrect Flink-to-Velox conversion on varbinary literals by @zhztheplayer in #9669
- [GLUTEN-9163][VL][FOLLOWUP] Fix segfault triggered by fixed-width inputs by @marin-ma in #9766
- [GLUTEN-9774][VL] Fix AWS, GCS and etc. storage backend builds by @ashahba in #9786
- [GLUTEN-9779][VL] Remove redundant GetTimestamp by @zml1206 in #9782
- [FLINK] Add a basic CI job for flink module by @PHILO-HE in #9778
- [GLUTEN-9796][VL] Disable CI on ubuntu-20.04 by @zhouyuan in #9800
- [VL] RAS: Various fixes by @zhztheplayer in #9803
- [CORE] Test trait `WithQueryPlanListener` for ensuring no fallback query plan node conveniently by @zhztheplayer in #9802
- [GLUTEN-9790][Flink] Support resolving different converters for Rex calls that share the same operator name by @lgbo-ustc in #9792
- [GLUTEN-9752][FLINK] Support array/map in row/vector conversion by @lgbo-ustc in #9757
- [GLUTEN-8663][VL] Fix column directory structure for partitioned writes by @dmsuehir in #9733
- [GLUTEN-9801] Delete the written file if the task failed by @JkSelf in #9808
- [GLUTEN-9769][CELEBORN] Use the correct shuffle time metrics for celeborn columnar shuffle by @wankunde in #9770
- [VL] Following #9802, fix the plan listener not working for Scala 2.12 by @zhztheplayer in #9815
- [GLUTEN-9755][FLINK] Support null by @lgbo-ustc in #9806
- [GLUTEN-9823] Exclude jsr305 from Gluten shaded jar to reduce redundancy by @wangyum in #9822
- [VL] Minor fix: simplify a few function tests' verbose names by @PHILO-HE in #9848
- [FLINK] Run unit tests in the Flink CI process by @yuanhang-dev in #9811
- Revert "[GLUTEN-9796][VL] Disable CI on ubuntu-20.04" by @zhouyuan in #9852
- [VL] Enable BatchEvalPythonExecSuite by @acvictor in #9780
- [MINOR] Rename ConfigProvider to GlutenConfigProvider by @wangyum in #9821
- [GLUTEN-9756][VL] Support jemalloc memory profile dump on exit by @wForget in #9759
- [VL] Fix link issues found in release process by @PHILO-HE in #9851
- [VL] Minor: Change sort shuffle partition threshold to 4000 by @marin-ma in #9866
- [GLUTEN-9382][VL]Support bucket write with non partition table by @JkSelf in #9575
- [GLUTEN-9873][VL] Fix install pandas by @zml1206 in #9874
- [GLUTEN-9840][Flink] Support type `CHAR(N)` by @lgbo-ustc in #9842
- [VL] add rhel setup script support by @FelixYBW in #9863
- [VL] Disable requested type check by @rui-mo in #9868
- [CORE] AQE: Move `GlutenCostEvaluator` from shim layer to `gluten-core` by @zhztheplayer in #9882
- [GLUTEN-9812][Flink] Support type Date by @lgbo-ustc in #9813
- [GLUTEN-9860][VL] Sync docs to apache nightly by @zhouyuan in #9875
- Add RichSparkConf to simplify the interoperations with gluten config entries by @yaooqinn in #9876
- [GLUTEN-9878] Update LICENSE and NOTICE to list all licenses used for copied code. by @weiting-chen in #9879
- [GLUTEN-9860] followup: Fix rsync version to 5.2 by @zhouyuan in #9889
- [GLUTEN-9880][CORE] Move `GlutenConfig.scala` from shim layer to `gluten-core` by @zhztheplayer in #9883
- [GLUTEN-9860] followup: Fix rsync remote path by @zhouyuan in #9899
- [Core] Minor: Port SPARK-34541 to ColumnarShuffleManager by @marin-ma in #9884
- Revert "Add RichSparkConf to simplify the interoperations with gluten config entries (#9876)" by @baibaichen in #9905
- [GLUTEN-8181][VL] Add CI job running on Arm by @kevinw66 in #9793
- [GLUTEN-9836][VL] Remove ShuffleMemoryPool and support creating arrow pool instances by @marin-ma in #9869
- [VL] Add outputOrdering for VeloxBroadcastNestedLoopJoinExecTransformer by @zml1206 in #9872
- [GLUTEN-6269][VL][UNIFFLE] Remove incBytesWritten in VeloxUniffleColumnarShuffleWriter by @wForget in #9897
- [VL] Fix DPP not working for Arrow Scan by @surnaik in #9829
- [VL] Minor: Fix TestSuite run twice in the document generation script by @marin-ma in #9919
- [GLUTEN-9960][CH] Remove hardcoded path to cmake-format by @ashahba in #9907
- [CORE][VL] Fix fallback for spark literal unsafe map data as input by @Zouxxyy in #9908
- [GLUTEN-8851][VL] Link Velox cudf vector library by @jinchengchenghh in #9925
- [VL] Add trunc support by @zml1206 in #9761
- [GLUTEN-8851][VL] Fix the machine may not install dkms by @jinchengchenghh in #9917
- [GLUTEN-9860][VL] Upload nightly releases to apache nightly site by @zhouyuan in #9888
- Add a RichSparkConf to simplify interoperations gluten config entries by @yaooqinn in #9914
- [GLUTEN-9929] [VL] Add GPU compilation ci and update docker branch by @jinchengchenghh in #9933
- [GLUTEN-8851][VL] Fix cudf debug config not make effect by @jinchengchenghh in #9941
- [GLUTEN-9929][VL] Disable arrow build in cudf ci by @jinchengchenghh in #9951
- [GLUTEN-9952][VL] Python Security updates June 2025 by @ashahba in #9954
- [VL] Add GlutenExtractPythonUDFsSuite by @acvictor in #9877
- [GLUTEN-9881] Minimize module dependency set of module gluten-core by @zhztheplayer in #9900
- [MINOR] Use passed TaskContext instead of TaskContext#get by @Yohahaha in #9980
- [VL] Refine LocalPartitionWriter implementation by @marin-ma in #9982
- [GLUTEN-9950][FLINK] Make nexmark source num events configurable by @KevinyhZou in #9978
- [GLUTEN-9860][VL] enable iceberg/hudi/delta in nightly release by @zhouyuan in #9990
- [GLUTEN-9860][VL] convert markdown to html doc by @zhouyuan in #9991
- [VL] Add SupportsColumnarShuffle trait by @marin-ma in #9984
- [GLUTEN-9942] Refactor the JNI for the native ShuffleWriter and PartitionWriter by @marin-ma in #9944
- [GLUTEN-9953][VL] Add weekly jobs to verify Gluten's functionality on openEuler by @kevinw66 in #9955
- [CORE] Minor: Fix gluten-it decimal as double decimal match by @jinchengchenghh in #9926
- [VL] Fix wget automake and gcc by @zml1206 in #9992
- [GLUTEN-9995][CORE] Source folder control for different Spark versions in all Maven modules by @zhztheplayer in #9996
- [GLUTEN-9909][VL] Enable higher-order-functions.sql test by @rui-mo in #10004
- [GLUTEN-9791][CH] Fix map key cannot be nullable by @taiyang-li in #9794
- [GLUTEN-9949][VL] Enable enhanced features compile by @jinchengchenghh in #9957
- [GLUTEN-9328][VL] Update GlutenSQLQueryTestSuite error handling for missing resource files and doc updates by @dmsuehir in #9969
- [VL] Sort by buffer size before hash shuffle evict partition buffers minSize by @wForget in #10009
- [GLUTEN-9994][VL] Log velox memory usage stats when VeloxMemoryManager is shrinking by @wForget in #10000
- [CORE] Minor refactors on ValidatablePlan by @zhztheplayer in #10001
- [VL] update geos version to 3.10.7 by @FelixYBW in #10017
- [VL] Fix FallbackTags for Delta ops by @zhli1142015 in #10005
- [VL] add dependency libraries to module by @FelixYBW in #9972
- [GLUTEN-10011][VL] Fix java test not enable by property skipTests by @jinchengchenghh in #10014
- [VL] Use Velox config in plan converter by @Yohahaha in #10012
- [GLUTEN-9892][FLINK] Add validation test for Nexmark q0 by @yuanhang-dev in #9893
- [VL] Refactor IndexRange usage in serializeColumnarBatches by @zhli1142015 in #10006
- Fix UnsupportedOperationException when converting EmptyHashedRelation by @yaooqinn in #10042
- [VL] Apply provided Velox PR patch for developer testing by @PHILO-HE in #10043
- [VL] Move `SupportsColumnarShuffle.scala` to module `gluten-substrait` by @zhztheplayer in #10045
- [GLUTEN-9291][VL] Fix Spark try_cast and cast function error case handling by @zhli1142015 in #9292
- [GLUTEN-10035][VL] Fix PullOutDuplicateProject rule when projecting a column multiple times without an alias by @NEUpanning in #10036
- [GLUTEN-9994][VL][FOLLOWUP] Use stringstream to construct velox memory usage stats log by @wForget in #10052
- [GLUTEN-8948][VL] Rename the iceberg test classes to follow surefire plugin's name pattern and fix tests by @jinchengchenghh in #9927
- [GLUTEN-8855][VL] Support dictionary in hash based shuffle by @marin-ma in #9727
- [CORE] Shim: Remove unused Spark33Scan / Spark34Scan / Spark35Scan by @zhztheplayer in #10051
- [GLUTEN-9930][VL] Update GPU document by @jinchengchenghh in #10061
- [VL] Fix missing logical links when converting query plans in rules by @li-boxuan in #10030
- [GLUTEN-9849][VL] Avoid VeloxBloomFilterMightContain being applied to FileSourceScan partition filters by @wForget in #9850
- [GLUTEN-9901][VL] RAS: DistinguishIdenticalScans rule to distinguish identical scans by @li-boxuan in #9915
- [VL] Remove one unnecessary overriding method: isNullIntolerant by @PHILO-HE in #10086
- [GLUTEN-8851][VL] GPU validate the plan and use runtime config to enable it by @jinchengchenghh in #9634
- [GLUTEN-10087][VL] PartialProject avoid ColumnarBatch not being released when exception occurs by @wForget in #10088
- [CORE] Make each RewriteSingleNode evaluate its own isRewritable by @Zouxxyy in #9935
- [GLUTEN-9939][FLINK] Support nexmark q3 by @shuai-xu in #9940
- [CORE] Fix a class's placement to match its declared package by @PHILO-HE in #10101
- [GLUTEN-9798][VL] Cleanup setup scripts for ubuntu/centos by @zhouyuan in #9799
- [GLUTEN-10094][VL] Fix unsupported OS(, ) error by @liujiayi771 in #10093
- [CORE] Add logs in spill for more Spillers by @liujiayi771 in #10117
- [GLUTEN-10050][FLINK] Support decimal type by @lgbo-ustc in #10049
- [CORE] Enable wartremover check for Scala 2.13 by @zhztheplayer in #10122
- [GLUTEN-10131][FLINK] Fix duplicate release of RowVector issue in `GlutenSingleInputOperator` by @yuanhang-dev in #10132
- [DOC] Fix VeloxStageResourceAdj config table style by @zjuwangg in #10110
- [GLUTEN-10033][FLINK] Fix memory leak caused by unclosed RowVector in `GlutenSourceFunction` by @KevinyhZou in #10034
- [GLUTEN-9949][VL] Update enhanced velox repo/branch by @zhouyuan in #10142
- [GLUTEN-9540][FLINK] Add UT for filter & project operators by @yuanhang-dev in #10073
- [Core] Support Jvm memory shrinking for DynamicOffHeapSizingMemoryTarget by @zhli1142015 in #9585
- [CORE] Refactor: Make `GlutenFormatWriterInjects#executeWriterWrappedSparkPlan` return the wrapped query plan rather than the executed RDD by @zhztheplayer in #10147
- [VL] Enable base64 and unbase64 functions by @zhli1142015 in #9596
- [GLUTEN-10151][CELEBORN] Bump Celeborn version to 0.6.0 by @SteNicholas in #10152
- [GLUTEN-8821][VL] Weekly Update Velox function support docs (2025_07_14) by @GlutenPerfBot in #10176
- [VL] Clarify the code objective of VeloxDataSourceJniWrapper#splitBlockByPartitionAndBucket by @zhztheplayer in #10179
- [GLUTEN-10149][VL] Fix incorrect NestedLoopJoin metrics by @NEUpanning in #10169
- [VL] Spark 3.2 / Spark 3.3, V1 write: Dynamic partition write by @zhztheplayer in #10183
- [CORE] Remove the class-overriding of InsertIntoHadoopFsRelationCommand by @zhztheplayer in #10187
- [GLUTEN-10103][VL] Fall back to vanilla spark when UnresolvedException occurs in the schema validation by @JkSelf in #10138
- [GLUTEN-10181][CH] Not reuse a `HashJoin` with when have different `BroadCastHashJoinContext` by @lgbo-ustc in #10182
- [CORE] Reflect resource changes in configurations when applying new resource profile by @PHILO-HE in #10172
- [GLUTEN-8070][CORE][VL] Add mechanism for generating configuration files and check configuration content by @yikf in #10190
- [GLUTEN-10163][VL] Optimize S3 network parameters by @weixiuli in #10167
- [GLUTEN-10112][VL] Fix deadLock caused by objectStore when native writer throws oom exception by @zjuwangg in #10111
- [GLUTEN-10155][INFRA] Fix build on macOS by @zml1206 in #10158
- [GLUTEN-10046][Flink] Improve function validation by @lgbo-ustc in #10047
- [GLUTEN-10168][VL] Sort shuffle produce wrong partition lengths in case of spill by @marin-ma in #10208
- [GLUTEN-8332][VL] Support explode/posexplode/inline outer by @marin-ma in #10202
- [CORE][VL] Minor cleanups for pom.xml by @zhztheplayer in #10217
- [GLUTEN-8821][VL] Weekly Update Velox function support docs (2025_07_21) by @GlutenPerfBot in #10225
- [GLUTEN-10091][FLINK] Fix premature termination in NexmarkTest by @yuanhang-dev in #10092
- [VL] Fix broken JAVA_HOME for gluten-it by @zhztheplayer in #10232
- [MINOR] Import config keys instead of hard-coding string values for tests in backends by @yongkyunlee in #9855
- [VL] Minor fixes for the memory API + dynamic off-heap sizing code by @zhztheplayer in #10234
- [GLUTEN-10200][VL] Fix estimating row vector size logic of rss shuffle writer by @NEUpanning in #10235
- [CH] Change the expression name timestamp_add to timestampadd by @zml1206 in #10198
- [GLUTEN-9801] Only delete the files written by the failed task when calling the abortTask() method by @JkSelf in #9844
- [GLUTEN-9335][VL] Support iceberg write unpartitioned table by @jinchengchenghh in #9397
- [DOC][Flink] Update flink build command to skip gpg and spotless check by @zjuwangg in #10205
- [GLUTEN-9540][FLINK] Add UT for join operator by @yuanhang-dev in #10180
- Bump com.fasterxml.jackson.core:jackson-core from 2.13.5 to 2.15.0 by @dependabot[bot] in #10237
- [GLUTEN-10192][VL] Fix sort shuffle read segfault in some cases by @marin-ma in #10193
- [GLUTEN-9571][VL] Respect parquet configs, parquet.page.size and parquet.compression.codec.zstd.level etc. by @WangGuangxin in #9572
- [GLUTEN-9860][VL] Add nightly packages for ARM by @zhouyuan in #10204
- [GLUTEN-9849][VL] Reenable native `might_contain` evaluation that was disabled in #9850 by @zhztheplayer in #10240
- [VL] update readme, use general table name by @FelixYBW in #10256
- [GLUTEN-10226][Flink] Support decimal arithmetic by @lgbo-ustc in #10218
- [CORE] Fix missing build information by @PHILO-HE in #10243
- [GLUTEN-10236][VL] Support both sort and rss_sort shuffle writer for Celeborn by @marin-ma in #10244
- [GLUTEN-10212][FLINK] Fix resources not being released in GlutenRowVectorSerializer by @shuai-xu in #10265
- [CORE] Gen config also verify config as expected by @yikf in #10259
- [GLUTEN-10048][FLINK] Add test for Nexmark Q1, Q2, Q3 by @yuanhang-dev in #10125
- [GLUTEN-10118][VL] Support writing task statistics to the event log by @marin-ma in #10119
- [MINOR] Remove unused code from SubstraitBackend by @beliefer in #10273
- [GLUTEN-10175][VL] Enable abs test by @rui-mo in #10276
- [GLUTEN-10118][VL] follow-up: fix wrong time unit by @marin-ma in #10281
- [VL] Remove variable EXTRA_FLAGS in velox_backend_x86.yml by @zhztheplayer in #10274
- [GLUTEN-10254][VL] Adding Centos-9 based dev docker image by @zhouyuan in #10231
- [GLUTEN-10279][VL] Reset reserved counter when allocationChanged fails by @wForget in #10280
- [VL] RAS: Embed internal `MemoRole` property into `MemoRoleAwarePropertySet` by @zhztheplayer in #10288
- [VL] gluten-it: Fixes for clickbench by @zhztheplayer in #10291
- [VL] Add GlutenQueryExecutionSuite by @acvictor in #10268
- [CH] Support map_concat function by @exmy in #9841
- [VL] Support casting array element from more types to varchar by @zml1206 in #10270
- [MINOR][CORE] Simplify fillWithTransitions for InsertTransitions by @beliefer in #10297
- [GLUTEN-10298][VL] Avoid using StringUtils.isNotBlank by @beliefer in #10299
- [DOC][VL] adding back Velox parquet write configuration doc by @zhouyuan in #10282
- [GLUTEN-10002][FLINK] Support nexmark q4-q9 by @shuai-xu in #10095
- [FLINK] Add command to skip gpg in the build by @PHILO-HE in #10289
- [VL] Fix generate all configuration by @zml1206 in #10303
- [GLUTEN-10306] Package libgeos when in dynamic compile mode by @xinghuayu007 in #10307
- [MINOR] String concatenation should follow Scala style by @beliefer in #10305
- [MINOR] Improve moveToWorkDir for JniLibLoader by @beliefer in #10301
- [VL] Reuse getMethodIdOrError to get jmethodID by @beliefer in #10347
- [GLUTEN-10332] Remove unnecessary constructor for PlanNode by @beliefer in #10333
- [GLUTEN-10324] Improve the failValidationWithException for ValidatablePlan by @beliefer in #10325
- [GLUTEN-10318] Improve the compareMajorMinorVersion for SparkVersionUtil by @beliefer in #10319
- [GLUTEN-10330] Correct the exception msg when making the Substrait plan by @beliefer in #10331
- [MINOR] Avoid using the fields of StructType by @beliefer in #10328
- [GLUTEN-10254][VL] fix centos9 docker image build by @zhouyuan in #10312
- [GLUTEN-9809][VL] Add timestampdiff support by @zml1206 in #9810
- [MINOR] Correct the comments for primaryBatchType by @beliefer in #10329
- [VL] Remove RAS rule: RemoveSort by @li-boxuan in #10322
- [GLUTEN-10364][VL] Fix get unset GlutenCoreConfig by @zml1206 in #10365
- [GLUTEN-10309][CORE] Improve the implementation of NativeWritePostRule by @beliefer in #10310
- [GLUTEN-10334] Share the results compare the spark version by @beliefer in #10335
- [VL] gluten-it: Print error details by @zhztheplayer in #10293
- [CORE][VL][CH] Add some comments for VeloxRuleApi / CHRuleApi by @zhztheplayer in #10278
- [VL] Support cast from array to string by @zml1206 in #10300
- [VL] Close WriteTask ColumnarBatch to fix the potential leak by @boneanxs in #10304
- [GLUTEN-10379][CH] Fix: invalid result from CoalesceAggregationUnion by @lgbo-ustc in #10380
- [GLUTEN-9962][HUDI] Remove unnecessary meta field validation by @alexr17 in #9963
- [GLUTEN-10354][CORE] Just check if sort columns are a subset of partition columns by @wForget in #10355
- [GLUTEN-10383][VL] Refactor makeParquetWriteOption method from VeloxParquetDataSource into a utility file by @JkSelf in #10384
- [GLUTEN-10351] Extract immutable collection as reusable field by @beliefer in #10353
- [MINOR] Use static constexpr to follow C++ style by @beliefer in #10402
- [VL] Enable to_json function by @wecharyu in #9357
- [GLUTEN-10377][CH] Move specialized delta internal column mapping to CH backend by @zhztheplayer in #10381
- [VL] Rewrite disabled test in GlutenFileDataSourceV2FallBackSuite by @acvictor in #10323
- [GLUTEN-8852][CORE] PART0: Adding Spark400 support by @zhouyuan in #9768
- [GLUTEN-10210][CORE] Remove Reducer.java in shim layers by @zhztheplayer in #10418
- [GLUTEN-10349] Remove the unnecessary Set and use enum directly by @beliefer in #10350
- [GLUTEN-9737][VL] Pass ArrowMemoryPool into velox parquet writer by @JkSelf in #10121
- [GLUTEN-10419][CORE] Fix type compatibility for HttpServletRequest through alias by @PHILO-HE in #10420
- [GLUTEN-10403] Extract the common code from ObjectStore as a new method lookup by @beliefer in #10404
- [INFRA] Update pull request template and remove title check by @PHILO-HE in #10376
- [MINOR] Remove expired config by @zml1206 in #10425
- [GLUTEN-10429]Fix spark 400 compile unused KeyGroupedShuffleSpec import by @zjuwangg in #10428
- [GLUTEN-10397][VL] Add timestampadd support by @zml1206 in #10400
- [GLUTEN-10275] Refine exception for not fully supported functions and update generate doc script by @marin-ma in #10391
- [CORE] Build: Enforce -P scala-2.13 / JDK 17 for Spark 4.0 by @zhztheplayer in #10426
- [GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5 by @jackylee-ch in #8890
- [VL][CI] Add compilation check for Spark-4.0 by @PHILO-HE in #10439
- [VL] Enable get_array_struct_fields function by @zhli1142015 in #10313
- [GLUTEN-10454] Fix CUDF docker build by @zhouyuan in #10455
- [GLUTEN-10175][VL] Enable test 'function current_timestamp and now' by @rui-mo in #10456
- [VL] Add Spark configuration honored status in Gluten by @FelixYBW in #10442
- [GLUTEN-10388] Reduce off-heap memory request for partial stage fallback by @xinghuayu007 in #10389
- [GLUTEN-8852][VL] Fix compilation issues in existing tests for Spark 4.0.0 by @zjuwangg in #10434
- [GLUTEN-9337][VL] Support read Paimon non-PK table by @liujiayi771 in #10186
- [GLUTEN-8852][CORE] Improve the use of Spark's classic SparkSession, Column and Dataset with implicit conversion by @PHILO-HE in #10462
- [GLUTEN-9392][VL] Support casting complex types by @kevinwilfong in #10443
- [GLUTEN-10421][CORE] Force evaluation of newProjections to prevent empty expressionMap by @wecharyu in #10422
- [GLUTEN-10356][VL] Support basic arithmetic expressions with ANSI mode by @nimesh1601 in #10357
- [GLUTEN-10452][VL] Remove HBM support by @marin-ma in #10478
- [GLUTEN-10452][VL] Remove IAA support by @marin-ma in #10480
- [GLUTEN-10473] Change TypeNode from interface to abstract class by @beliefer in #10474
- [GLUTEN-8953][VL] Fix iceberg codec uncompressed by @jinchengchenghh in #10477
- [GLUTEN-10482] Improve collectAttributeNamesDFS to avoid repeated check calls by @beliefer in #10483
- [GLUTEN-10470] Remove redundant getPartitions by @beliefer in #10471
- [GLUTEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implementation by @zjuwangg in #10201
- [GLUTEN-10484] Improve getTypeNode for StructType by @beliefer in #10485
- [GLUTEN-10317][FLINK] Support state related operation by @shuai-xu in #10320
- [GLUTEN-8852][TEST] Support Spark 4.0 in gluten-it by @PHILO-HE in #10479
- [GLUTEN-8852] Fix code compatibility issues for two tests on Spark-4.0 by @PHILO-HE in #10487
- [GLUTEN-9335][VL] Supports collect iceberg statistics by @jinchengchenghh in #10495
- [GLUTEN-10457][VL] Iceberg supports copy on write by @jinchengchenghh in #10458
- [GLUTEN-10170][VL] Offload try arithmetic functions regardless of ANSI configuration by @nimesh1601 in #10267
- [GLUTEN-10491][VL] Simplify the extractStructNeeded for HashAggregateExecTransformer by @beliefer in #10492
- [HotFix] Fix iceberg PRs conflict by @jinchengchenghh in #10498
- [GLUTEN-10352][FLINK] Support nexmark test from q4 to q9 by @KevinyhZou in #10468
- [GLUTEN-10493][VL] Simplify addFunctionNode for HashAggregateExecTransformer by @beliefer in #10494
- [GLUTEN-9335][VL] Enable tests ignored because lack metadata and enable spark35 CI by @jinchengchenghh in #10496
- [GLUTEN-10506] Include compression time in the shuffle write metric for Uniffle by @zuston in #10507
- [VL] Minor: Fix cpp test compilation failure on macos by @marin-ma in #10504
- [GLUTEN-10392][VL] Fix filter fallback in scan-only execution by @rui-mo in #10505
- [GLUTEN-8851][VL] Set kCudfEnabledDefault to true if Gluten GPU is enabled by @jinchengchenghh in #10509
- [GLUTEN-9004][DOC] Document Partial Projection feature by @zhouyuan in #10135
- [GLUTEN-9344][VL] Document dynamic offheap sizing feature by @zhouyuan in #9391
- [GLUTEN-9402][VL] Fix tzdata compatibility by @zhouyuan in #10481
- [FLINK] Decouple Flink module POM from Gluten root POM as its parent by @PHILO-HE in #10515
- [GLUTEN-10574][Branch-1.5] Back port important fixes to branch-1.5 by @zhouyuan in #10576
- [GLUTEN-10574][CORE] Backport some key fixes to branch 1.5 by @PHILO-HE in #10722
- [GLUTEN-10574][VL][1.5] Backport #10541 to fix broadcast exchange stackoverflow due to Kryo serialization by @wForget in #10733
- [GLUTEN-10574][1.5] Backport #10793 and #10807 to update documents and automate release process by @PHILO-HE in #10827
- [GLUTEN-10574][1.5] Bump version to 1.5.0 to finalize release by @PHILO-HE in #10829
New Contributors
- @wypb made their first contribution in #8775
- @dmsuehir made their first contribution in #9368
- @zmdaodao made their first contribution in #8982
- @xushiyan made their first contribution in #9419
- @li-boxuan made their first contribution in #9506
- @yuanhang-dev made their first contribution in #9652
- @ashahba made their first contribution in #9786
- @wankunde made their first contribution in #9770
- @yongkyunlee made their first contribution in #9834
- @alexr17 made their first contribution in #9963
Full Changelog: v1.4.0...v1.5.0