v1.5.0

Latest

@PHILO-HE released this 29 Oct 07:30
· 610 commits to main since this release
a269698

What's Changed

  • [GLUTEN-8846][CH] [Part 3] Add benchmark for Iceberg Delete by @baibaichen in #9192
  • [GLUTEN-9020][CH] Support delta DV BitmapAggregator by @loneylee in #9138
  • [GLUTEN-9197][CH] Simplify sum aggregate expression by @taiyang-li in #9198
  • [VL] Enable more ut in VeloxTestSettings by @WangGuangxin in #9080
  • [GLUTEN-9199][VL] Fix error when creating shuffle file: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments by @zhztheplayer in #9200
  • [CORE] Fix duplicate setting for config LEGACY_TIME_PARSER_POLICY by @jinchengchenghh in #9201
  • [GLUTEN-9176][CH] Rewrite aggregate if to aggregate with filter clause by @taiyang-li in #9185
  • [GLUTEN-8557][CH] Flatten nested And/Or for performance optimization by @KevinyhZou in #8558
  • Revert "[GLUTEN-9164][CH]Enable row group level bloom filter push down" by @taiyang-li in #9214
  • [GLUTEN-9182][VL] Support new s3 configuration in Gluten by @dcoliversun in #9183
  • [VL] Celeborn shuffle reader OOM with many empty input streams by @marin-ma in #9221
  • [GLUTEN-8821][VL] Update aggregate/generator/window support doc and script by @marin-ma in #8971
  • [VL] Change to use Velox's wget_and_untar in setup-centos7.sh by @yaooqinn in #9207
  • [GLUTEN-9196][CH] Use wide-table aggregation to eliminate multi-table joins by @lgbo-ustc in #9155
  • [GLUTEN-9149][CORE] Remove Spark-specific code from JniLibLoader & JniWorkspace by @shuai-xu in #9150
  • [VL][CI] Change to use JDK-17 for Spark 3.3/3.4/3.5 tests by @PHILO-HE in #9209
  • [CORE][VL] Hide child nodes from implementations of OffloadSingleNode by @zhztheplayer in #9220
  • [GLUTEN-9008][VL] Support json_object_keys function by @dcoliversun in #9009
  • [GLUTEN-9239][CH] Support JDK17 for the CH backend by @zzcclp in #9242
  • [GLUTEN-9152][CORE] Avoid unnecessary serialization of hadoop conf by @zml1206 in #9153
  • [GLUTEN-9240][VL] Write NULL value into relation in gluten unit tests by @dcoliversun in #9241
  • [VL][CI] bump to use ubuntu-22.04 runner by @zhouyuan in #9262
  • [GLUTEN-9177][CH]Fix diff on parse host of url and refactor SparkParseURL by @KevinyhZou in #9179
  • [CORE] Decrease offheap memory size in resource profile for whole stage fallback case by @PHILO-HE in #8911
  • [GLUTEN-9205][CH] Support deletion vector native write by @loneylee in #9248
  • [VL] Delete global reference to a class object in JNI unload by @PHILO-HE in #9268
  • [GLUTEN-9245][VL] Fix partial project expression contains subquery by @jinchengchenghh in #9259
  • [GLUTEN-9244][CORE] Change the way of passing default timezone to native config by @zml1206 in #9249
  • [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache by @zhztheplayer in #9230
  • [VL] Support Spark legacy statistical aggregation function behavior by @NEUpanning in #9181
  • [CORE] Remove library unloading API from JniLibLoader as unused by @zhztheplayer in #9277
  • [GLUTEN-9237][CH] Fix the nullability mismatch issue for the Nothing type by @lgbo-ustc in #9238
  • [VL] Disable FlushableHashAggregate when aggregates contain sum/avg for floating type by @kecookier in #8986
  • [CORE] Refine the test with specified spark version by @yikf in #9274
  • [CH] Add a comment to explain why the endpoint uses a single thread by @dcoliversun in #9257
  • [GLUTEN-8891][VL] Refine local ssd cache feature by @zhouyuan in #9228
  • [GLUTEN-9267][CH] Fix a bug in EliminateDeduplicateAggregateWithAnyJoin by @lgbo-ustc in #9293
  • [VL] Remove param original of ColumnarPartialProjectExec by @zml1206 in #9290
  • [GLUTEN-9178][CH] Fix cse in aggregate operator not working by @loneylee in #9301
  • [CORE] Post events until both spark ui and gluten ui are enabled by @yikf in #9272
  • [CORE] Correctly handle driver configurations when spark.sql.extensions is explicitly set for GlutenSessionExtensions by @zhztheplayer in #9312
  • [GLUTEN-8851][VL] feat: Support cudf by @jinchengchenghh in #9229
  • [GLUTEN-9288][VL] Enable array_prepend function for spark 3.5+ by @dcoliversun in #9305
  • [GLUTEN-9317][CH]Fix: duplicated column names in shuffle read by @lgbo-ustc in #9318
  • [Gluten-9254][CH] Support RDDScanExec by @loneylee in #9270
  • [VL] Count total JVM memory as the on-heap portion for the off-heap sizing feature by @zhztheplayer in #9321
  • [GLUTEN-9300][DOC] Support replacement expression in gen-function-support-docs by @dcoliversun in #9331
  • [GLUTEN-9239][CH] [PART-1] Support Java-17: Remove JNI_OnUnload by @baibaichen in #9275
  • [GLUTEN-7652][VL] Support binary as string by @wForget in #9325
  • [Gluten-9334][CH] Support delta metadata column file_path and row_index for mergetree by @loneylee in #9340
  • [GLUTEN-6867][CH] Fix bug that can't read file on minio by @baibaichen in #9332
  • [VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager by @zhztheplayer in #9341
  • [GLUTEN-9313][VL] ColumnarPartialProject supports built-in but blacklisted function by @WangGuangxin in #9315
  • [GLUTEN-8772][CORE] refactor: Refactoring the usage of SubstraitContext#functionMap by @wypb in #8775
  • [VL] Move pre-configuration code of dynamic off-heap sizing to its own place by @zhztheplayer in #9336
  • [GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle by @marin-ma in #9278
  • [GLUTEN-9287][VL] Enable array_compact function for Spark 3.4+ by @dcoliversun in #9349
  • [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation by @ArnavBalyan in #9096
  • [CORE] Make max broadcast table size configurable by @yaooqinn in #9359
  • [CH] Fix build error by @exmy in #9363
  • [GLUTEN-9243][VL] Fix cuda docker image by @zhouyuan in #9333
  • [GLUTEN-8912][VL] Add Offset support for CollectLimitExec by @ArnavBalyan in #8914
  • [GLUTEN-7589][VL] Support date_trunc function by @zml1206 in #7611
  • [GLUTEN-9279] Not pulling out expression from PartialMerge aggregate function to avoid invalid reference binding in ProjectExecTransformer by @Z1Wu in #9280
  • [Gluten-8792][CH] Support delta project incrementMetric expr by @loneylee in #9353
  • [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle by @WangGuangxin in #9035
  • Fix ColumnarToRowRemovalGuard not able to be copied by @yaooqinn in #9384
  • [GLUTEN-8846][CH] [Part 4] Add full-chain UT by @jlfsdtc in #9256
  • [VL] Follow up on #9384 to avoid swallowing exceptions in UT by @zhztheplayer in #9393
  • [GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by @marin-ma in #9356
  • [VL][INFRA] Improve build bundle package workflow by @wForget in #9404
  • [VL] Refactor WholeStageTransformer to remove some duplicate code by @wypb in #9388
  • [VL] Refactor the HiveConfig to set once by @jinchengchenghh in #9414
  • [GLUTEN-8981][VL] Add vcpkg triplet for building Gluten on arm by @zmdaodao in #8982
  • [GLUTEN-9137][CH] Support CollectLimit for CH backend by @exmy in #9139
  • [CH] Support native base85codec by @loneylee in #9421
  • [VL] Fix VeloxColumnarToRowExec repeatedly recorded some metrics by @wypb in #9418
  • [VL] The PoC of Flink support in Gluten by @shuai-xu in #8839
  • [VL] Fallback pandas udf when input is not an instance of AttributeReference by @Surbhi-Vijay in #9385
  • [CH] Change enable_aggregate_if_to_filter to false by @exmy in #9420
  • [VL][CI] Upgrade ubuntu runner to fix weekly build error by @PHILO-HE in #9443
  • [VL] Add a config to enable or disable check usage leak by @j7nhai in #9327
  • [GLUTEN-9383][VL] Fix leak when growing capacity by @wForget in #9424
  • [VL] Fix dump benchmark data issue in same task by @Yohahaha in #9360
  • [GLUTEN-9392] [VL] Support casting array element to varchar by @ArnavBalyan in #9394
  • [VL] Remove CollectLimit dependency from Offload Rules by @ArnavBalyan in #9451
  • [GLUTEN-9025] Remove the ColumnarPartialProject when its followers don't support columnar by @weixiuli in #9026
  • [GLUTEN-8744][VL] Add casting support for timestamp to long by @ArnavBalyan in #8745
  • [GLUTEN-9462][CH] Build ARM V8 by default by @lwz9103 in #9463
  • [VL] Fix Hudi scan fallback by @xushiyan in #9419
  • [GLUTEN-9243][VL] Improve cuda image build by @zhouyuan in #9449
  • [GLUTEN-9457][VL] Shuffle test code cleanup by @marin-ma in #9458
  • [GLUTEN-9468][VL] Remove parquet-arrow reader dependency in benchmarks by @marin-ma in #9469
  • [GLUTEN-8851][VL] Use separate debug config for cudf by @jinchengchenghh in #9466
  • Minor: The user specified --extra-conf has high priority by @jinchengchenghh in #9488
  • [GLUTEN-9073][VL] Add support for CollectTail Operator by @ArnavBalyan in #9074
  • [GLUTEN-9486][VL] Fix example launch.json config in NewToGluten.md documentation by @dmsuehir in #9487
  • [VL] Support ObjectDebugInfo in ObjectStore during destruction of native runtime by @ArnavBalyan in #9477
  • [GLUTEN-9481][ICEBERG] Fix issue where iceberg columns were not being properly sanitized by @z123 in #9482
  • [VL] Support decimal as double in gluten-it by @jinchengchenghh in #9472
  • [VL] Gluten-it: Minor refactor on configuration priorities by @zhztheplayer in #9489
  • [GLUTEN-9468][VL] Remove parquet-arrow dependency by @marin-ma in #9483
  • [GLUTEN-9496][VL] Enable concat function with array datatype for spark by @dcoliversun in #9497
  • [Doc] Update outdated operators in the documentation by @ArnavBalyan in #9474
  • [GLUTEN-9502][VL] Remove useless datatype check for concat function by @dcoliversun in #9503
  • [GLUTEN-9453][CH][PART-1] Refactor GlutenClickHouseTPCHAbstractSuite by @baibaichen in #9454
  • [GLUTEN-9392][VL] Support casting integral types to double for array elements by @ArnavBalyan in #9396
  • [GLUTEN-8222][VL] Support Factorial function by @ArnavBalyan in #8221
  • [CORE] RasOffload: Fix false positive discard by @li-boxuan in #9506
  • [GLUTEN-9526][CH] Add cmake option to create symbol instead of rename by @baibaichen in #9527
  • [GLUTEN-9517][CH] Allow merge tree format to not configure disk cache by @baibaichen in #9520
  • [GLUTEN-9445][VL] Support nexmark q0 for Flink by @shuai-xu in #9446
  • [VL] Support user fallback option for CollectTail by @ArnavBalyan in #9531
  • [GLUTEN-9448][Flink] Support UTs for flink by @lgbo-ustc in #9534
  • [GLUTEN-9435][VL] Fix [Unsafe]ColumnarBuildSideRelation not found ResourceHandle in resource map by @wypb in #9438
  • [GLUTEN-9468][VL] Follow up: Remove parquet-arrow cpp build by @marin-ma in #9511
  • [VL] Refactor RowToVeloxColumnarExec to remove some duplicate code by @wypb in #9350
  • [CORE] Support batch scan customMetrics by @Zouxxyy in #9450
  • [GLUTEN-9492][CH] Add Support for CollectTail operator by @ArnavBalyan in #9476
  • [GLUTEN-9522][VL] Move date & math scalar functions into new test suites by @dcoliversun in #9533
  • [GLUTEN-8877] Port [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning by @wangyum in #9495
  • [GLUTEN-9557][VL] Remove outdated test exclusion logic by @dcoliversun in #9558
  • [GLUTEN-9551][VL] Fix build failure due to libelf vcpkg unavailable files by @ArnavBalyan in #9550
  • [VL] Remove unused HdfsUtils by @wForget in #9578
  • [FLINK] Bump Velox4J version by @zhztheplayer in #9545
  • [GLUTEN-9475][VL] Serialize ColumnarBatch one by one to reduce memory footprint when broadcasting by @wForget in #9521
  • [VL] Support format cpp code on macOS by @wForget in #9584
  • [GLUTEN-9392][VL] Support integral to boolean array casts by @ArnavBalyan in #9563
  • [GLUTEN-9581][CH] Fix: in subquery expression cannot exist in project by @lgbo-ustc in #9583
  • [GLUTEN-9518][FLINK] Support watermark assigner by @shuai-xu in #9541
  • [CORE] Remove unnecessary internal from user facing configs by @ArnavBalyan in #9564
  • [TYPO] Fix arg used for java version in build info by @yaooqinn in #9602
  • [GLUTEN-9594][CH] Reorder join/aggregate keys, ensure keys of all clauses are in the same order by @lgbo-ustc in #9595
  • [GLUTEN-9606][CH]Support CH MergeTree + Delta DeletionVector reading by @zzcclp in #9609
  • [GLUTEN-9567][FLINK] Add container class Velox4JBean for proper JSON-based object serialization by @zhztheplayer in #9600
  • [GLUTEN-9623][FLINK] Fix blocking queue not being closed, which may cause a memory leak by @KevinyhZou in #9624
  • [GLUTEN-9625][VL] Add explicit GCS auth type check for service account key file by @pratham76 in #9627
  • [GLUTEN-9631][CH] Fix: AggregateDescription.parameters is not set in AggregateGroupLimitRelParser by @lgbo-ustc in #9632
  • [GLUTEN-9611][CORE] Improve performance of PayloadCloser Iterator by @wForget in #9612
  • [INFRA] Add path filter to CH backend code style checks workflow by @yuanhang-dev in #9652
  • [GLUTEN-9647][CH] Fix SimplifySumRule on different types by @lwz9103 in #9648
  • [GLUTEN-9628][CH] Fix cache data on partition table with escaped value by @lwz9103 in #9629
  • Revert "[GLUTEN-9567][FLINK] Add container class Velox4JBean for proper JSON-based object serialization" by @zhztheplayer in #9656
  • [VL] Update dockerfile by @FelixYBW in #9644
  • [GLUTEN-9660][CH]Fix incorrect columns order when deleting from the mergetree table with the delta dv by @zzcclp in #9661
  • [GLUTEN-9646][CH] Fix coalesce project union when has subquery by @loneylee in #9662
  • [GLUTEN-9621][FLINK] Fix memory leak caused by rowvector that may not be closed by @KevinyhZou in #9622
  • [CH] Fix crash in static initialization of MergeTreeRelParser by @exmy in #9664
  • [GLUTEN-9626][VL] Support lib geos in vcpkg by @PHILO-HE in #9659
  • [GLUTEN-9361][VL] Enable luhn_check function for spark 3.5 by @dcoliversun in #9665
  • [GLUTEN-9653][Flink] Support boolean in vector-to-row conversion by @lgbo-ustc in #9654
  • [GLUTEN-9679][CH] Fix: invalid address access in str_to_map by @lgbo-ustc in #9680
  • [GLUTEN-8616] [VL] Add support for Existence Join for broadcast nested loop join by @ArnavBalyan in #9588
  • [GLUTEN-8736][VL] Add support for casting bool to timestamp by @ArnavBalyan in #8737
  • [GLUTEN-9666][VL] Extend transition planner to support multiple row types by @zhztheplayer in #9689
  • Change the code style format diff patch to JobSummary by @jinchengchenghh in #9668
  • [GLUTEN-9573][Flink] Implement createStreamOperator and getStreamOperatorClass for GlutenOneInputOperatorFactor by @lgbo-ustc in #9574
  • [GLUTEN-9681][CH] Fix incorrect number of rows read by kafka by @loneylee in #9687
  • [GLUTEN-9691][VL]Fix driver core dump when broadcast too large when UnsafeColumnarBuildSideRelation enabled by @zjuwangg in #9692
  • [GLUTEN-9684][VL] Add default config.guess and config.sub by @zml1206 in #9685
  • [VL] Increase the spill start partition bit to 48 to avoid overlapping with the hash bits by @zhztheplayer in #9703
  • [GLUTEN-9697][CH] Add 'reorg' command ut for the mergetree + delta dv by @zzcclp in #9699
  • [GLUTEN-9475][VL][FOLLOWUP] Avoid overflow in calculation of numRows by @wForget in #9704
  • [GLUTEN-9535][FLINK] Refactor the conversion for RexCall by @lgbo-ustc in #9536
  • Fix the TPCH/DS script cannot throw exception by @jinchengchenghh in #9710
  • [GLUTEN-9666][VL] Add BatchCarrierRow in preparation for replacement of FakeRow by @zhztheplayer in #9708
  • [GLUTEN-9717][VL] Update CXX flags to run builds with defaults by @ArnavBalyan in #9718
  • [VL] Fix missing tag after CollapseProjectExecTransformer rule by @li-boxuan in #9716
  • [VL] Fix ioWaitTime's unit to nanos in batch scan metric by @Zouxxyy in #9715
  • [CORE] Remove createTransformContext function by @zml1206 in #9714
  • [GLUTEN-8855][VL] Columnar shuffle code cleanup by @marin-ma in #9693
  • [GLUTEN-9700][Flink] Refactor conversion of row to vector by @lgbo-ustc in #9701
  • [VL] Fix VLA error when compiling with apple clang by @marin-ma in #9730
  • [VL] Manually register velox conf on executor start by @Zouxxyy in #9721
  • [GLUTEN-9728][CH] Fix: Inconsistent result from JoinAggregateToAggregateUnion by @lgbo-ustc in #9729
  • [FLINK] Fix error package name by @exmy in #9747
  • [GLUTEN-9736][VL] Fix store ID mismatch by @zhli1142015 in #9746
  • [GLUTEN-9561][CH] Upgrade delta version to 3.3.1 by @lwz9103 in #9562
  • [GLUTEN-9666][CORE][VL][CH] V1 write: Move from FakeRow to BatchCarrierRow to simplify code by @zhztheplayer in #9731
  • [GLUTEN-9682][Flink] Support double in vector-to-row conversion by @lgbo-ustc in #9683
  • [GLUTEN-6609][CH] Remove unnecessary readFile override in GlutenDiskS3 by @baibaichen in #9649
  • Add a Shim layer for PartitionedFileUtil of Spark 3.5 by @yaooqinn in #9748
  • [VL] Add sqrt support by @yaooqinn in #9725
  • [FOLLOWUP] Fix splitFilesByPathMethod reflection by @yaooqinn in #9763
  • [GLUTEN-9764][VL] Use auto re2 source for openEuler24 by @kevinw66 in #9765
  • [VL] RAS: Add internal property MemoRole to reduce duplications in plan enumeration for rule applications by @zhztheplayer in #9749
  • [GLUTEN-9719][VL] Unify output definitions for Broadcast/Hash/Columnar shuffle joins by @ArnavBalyan in #9720
  • [CH] Support deletion vector optimize for mergetree(add ut) by @loneylee in #9762
  • [VL] Add mirror for ftp.gnu.org by @zml1206 in #9760
  • [FLINK] Fix incorrect Flink-to-Velox conversion on varbinary literals by @zhztheplayer in #9669
  • [GLUTEN-9163][VL][FOLLOWUP] Fix segfault triggered by fixed-width inputs by @marin-ma in #9766
  • [GLUTEN-9774][VL] Fix AWS, GCS, etc. storage backend builds by @ashahba in #9786
  • [GLUTEN-9779][VL] Remove redundant GetTimestamp by @zml1206 in #9782
  • [FLINK] Add a basic CI job for flink module by @PHILO-HE in #9778
  • [GLUTEN-9796][VL] Disable CI on ubuntu-20.04 by @zhouyuan in #9800
  • [VL] RAS: Various fixes by @zhztheplayer in #9803
  • [CORE] Test trait WithQueryPlanListener for ensuring no fallback query plan node conveniently by @zhztheplayer in #9802
  • [GLUTEN-9790][Flink] Support resolving different converters for Rex calls that share the same operator name by @lgbo-ustc in #9792
  • [GLUTEN-9752][FLINK] Support array/map in row/vector conversion by @lgbo-ustc in #9757
  • [GLUTEN-8663][VL] Fix column directory structure for partitioned writes by @dmsuehir in #9733
  • [GLUTEN-9801] Delete the written file if the task failed by @JkSelf in #9808
  • [GLUTEN-9769][CELEBORN] Use the correct shuffle time metrics for celeborn columnar shuffle by @wankunde in #9770
  • [VL] Following #9802, fix the plan listener not working for Scala 2.12 by @zhztheplayer in #9815
  • [GLUTEN-9755][FLINK] Support null by @lgbo-ustc in #9806
  • [GLUTEN-9823] Exclude jsr305 from Gluten shaded jar to reduce redundancy by @wangyum in #9822
  • [VL] Minor fix: simplify a few function tests' verbose names by @PHILO-HE in #9848
  • [FLINK] Run unit tests in the Flink CI process by @yuanhang-dev in #9811
  • Revert "[GLUTEN-9796][VL] Disable CI on ubuntu-20.04" by @zhouyuan in #9852
  • [VL] Enable BatchEvalPythonExecSuite by @acvictor in #9780
  • [MINOR] Rename ConfigProvider to GlutenConfigProvider by @wangyum in #9821
  • [GLUTEN-9756][VL] Support jemalloc memory profile dump on exit by @wForget in #9759
  • [VL] Fix link issues found in release process by @PHILO-HE in #9851
  • [VL] Minor: Change sort shuffle partition threshold to 4000 by @marin-ma in #9866
  • [GLUTEN-9382][VL]Support bucket write with non partition table by @JkSelf in #9575
  • [GLUTEN-9873][VL] Fix install pandas by @zml1206 in #9874
  • [GLUTEN-9840][Flink] Support type CHAR(N) by @lgbo-ustc in #9842
  • [VL] add rhel setup script support by @FelixYBW in #9863
  • [VL] Disable requested type check by @rui-mo in #9868
  • [CORE] AQE: Move GlutenCostEvaluator from shim layer to gluten-core by @zhztheplayer in #9882
  • [GLUTEN-9812][Flink] Support type Date by @lgbo-ustc in #9813
  • [GLUTEN-9860][VL] Sync docs to apache nightly by @zhouyuan in #9875
  • Add RichSparkConf to simplify the interoperations with gluten config entries by @yaooqinn in #9876
  • [GLUTEN-9878] Update LICENSE and NOTICE to list all licenses used for copied code. by @weiting-chen in #9879
  • [GLUTEN-9860] followup: Fix rsync version to 5.2 by @zhouyuan in #9889
  • [GLUTEN-9880][CORE] Move GlutenConfig.scala from shim layer to gluten-core by @zhztheplayer in #9883
  • [GLUTEN-9860] followup: Fix rsync remote path by @zhouyuan in #9899
  • [Core] Minor: Port SPARK-34541 to ColumnarShuffleManager by @marin-ma in #9884
  • Revert "Add RichSparkConf to simplify the interoperations with gluten config entries (#9876)" by @baibaichen in #9905
  • [GLUTEN-8181][VL] Add CI job running on Arm by @kevinw66 in #9793
  • [GLUTEN-9836][VL] Remove ShuffleMemoryPool and support creating arrow pool instances by @marin-ma in #9869
  • [VL] Add outputOrdering for VeloxBroadcastNestedLoopJoinExecTransformer by @zml1206 in #9872
  • [GLUTEN-6269][VL][UNIFFLE] Remove incBytesWritten in VeloxUniffleColumnarShuffleWriter by @wForget in #9897
  • [VL] Fix DPP not working for Arrow Scan by @surnaik in #9829
  • [VL] Minor: Fix TestSuite run twice in the document generation script by @marin-ma in #9919
  • [GLUTEN-9960][CH] Remove hardcoded path to cmake-format by @ashahba in #9907
  • [CORE][VL] Fix fallback for spark literal unsafe map data as input by @Zouxxyy in #9908
  • [GLUTEN-8851][VL] Link Velox cudf vector library by @jinchengchenghh in #9925
  • [VL] Add trunc support by @zml1206 in #9761
  • [GLUTEN-8851][VL] Fix the machine may not install dkms by @jinchengchenghh in #9917
  • [GLUTEN-9860][VL] Upload nightly releases to apache nightly site by @zhouyuan in #9888
  • Add a RichSparkConf to simplify interoperations with gluten config entries by @yaooqinn in #9914
  • [GLUTEN-9929] [VL] Add GPU compilation ci and update docker branch by @jinchengchenghh in #9933
  • [GLUTEN-8851][VL] Fix cudf debug config not taking effect by @jinchengchenghh in #9941
  • [GLUTEN-9929][VL] Disable arrow build in cudf ci by @jinchengchenghh in #9951
  • [GLUTEN-9952][VL] Python Security updates June 2025 by @ashahba in #9954
  • [VL] Add GlutenExtractPythonUDFsSuite by @acvictor in #9877
  • [GLUTEN-9881] Minimize module dependency set of module gluten-core by @zhztheplayer in #9900
  • [MINOR] Use passed TaskContext instead of TaskContext#get by @Yohahaha in #9980
  • [VL] Refine LocalPartitionWriter implementation by @marin-ma in #9982
  • [GLUTEN-9950][FLINK] Make nexmark source num events configurable by @KevinyhZou in #9978
  • [GLUTEN-9860][VL] enable iceberg/hudi/delta in nightly release by @zhouyuan in #9990
  • [GLUTEN-9860][VL] convert markdown to html doc by @zhouyuan in #9991
  • [VL] Add SupportsColumnarShuffle trait by @marin-ma in #9984
  • [GLUTEN-9942] Refactor the JNI for the native ShuffleWriter and PartitionWriter by @marin-ma in #9944
  • [GLUTEN-9953][VL] Add weekly jobs to verify Gluten's functionality on openEuler by @kevinw66 in #9955
  • [CORE] Minor: Fix gluten-it decimal as double decimal match by @jinchengchenghh in #9926
  • [VL] Fix wget automake and gcc by @zml1206 in #9992
  • [GLUTEN-9995][CORE] Source folder control for different Spark versions in all Maven modules by @zhztheplayer in #9996
  • [GLUTEN-9909][VL] Enable higher-order-functions.sql test by @rui-mo in #10004
  • [GLUTEN-9791][CH] Fix map key cannot be nullable by @taiyang-li in #9794
  • [GLUTEN-9949][VL] Enable enhanced features compile by @jinchengchenghh in #9957
  • [GLUTEN-9328][VL] Update GlutenSQLQueryTestSuite error handling for missing resource files and doc updates by @dmsuehir in #9969
  • [VL] Sort by buffer size before hash shuffle evict partition buffers minSize by @wForget in #10009
  • [GLUTEN-9994][VL] Log velox memory usage stats when VeloxMemoryManager is shrinking by @wForget in #10000
  • [CORE] Minor refactors on ValidatablePlan by @zhztheplayer in #10001
  • [VL] update geos version to 3.10.7 by @FelixYBW in #10017
  • [VL] Fix FallbackTags for Delta ops by @zhli1142015 in #10005
  • [VL] add dependency libraries to module by @FelixYBW in #9972
  • [GLUTEN-10011][VL] Fix java test not enable by property skipTests by @jinchengchenghh in #10014
  • [VL] Use Velox config in plan converter by @Yohahaha in #10012
  • [GLUTEN-9892][FLINK] Add validation test for Nexmark q0 by @yuanhang-dev in #9893
  • [VL] Refactor IndexRange usage in serializeColumnarBatches by @zhli1142015 in #10006
  • Fix UnsupportedOperationException when converting EmptyHashedRelation by @yaooqinn in #10042
  • [VL] Apply provided Velox PR patch for developer testing by @PHILO-HE in #10043
  • [VL] Move SupportsColumnarShuffle.scala to module gluten-substrait by @zhztheplayer in #10045
  • [GLUTEN-9291][VL] Fix Spark try_cast and cast function error case handling by @zhli1142015 in #9292
  • [GLUTEN-10035][VL] Fix PullOutDuplicateProject rule when projecting a column multiple times without an alias by @NEUpanning in #10036
  • [GLUTEN-9994][VL][FOLLOWUP] Use stringstream to construct velox memory usage stats log by @wForget in #10052
  • [GLUTEN-8948][VL] Rename the iceberg test classes to follow surefire plugin's name pattern and fix tests by @jinchengchenghh in #9927
  • [GLUTEN-8855][VL] Support dictionary in hash based shuffle by @marin-ma in #9727
  • [CORE] Shim: Remove unused Spark33Scan / Spark34Scan / Spark35Scan by @zhztheplayer in #10051
  • [GLUTEN-9930][VL] Update GPU document by @jinchengchenghh in #10061
  • [VL] Fix missing logical links when converting query plans in rules by @li-boxuan in #10030
  • [GLUTEN-9849][VL] Avoid VeloxBloomFilterMightContain being applied to FileSourceScan partition filters by @wForget in #9850
  • [GLUTEN-9901][VL] RAS: DistinguishIdenticalScans rule to distinguish identical scans by @li-boxuan in #9915
  • [VL] Remove one unnecessary overriding method: isNullIntolerant by @PHILO-HE in #10086
  • [GLUTEN-8851][VL] GPU validate the plan and use runtime config to enable it by @jinchengchenghh in #9634
  • [GLUTEN-10087][VL] PartialProject avoid ColumnarBatch not being released when exception occurs by @wForget in #10088
  • [CORE] Make each RewriteSingleNode evaluates its own isRewritable by @Zouxxyy in #9935
  • [GLUTEN-9939][FLINK] Support nexmark q3 by @shuai-xu in #9940
  • [CORE] Fix a class's placement to match its declared package by @PHILO-HE in #10101
  • [GLUTEN-9798][VL] Cleanup setup scripts for ubuntu/centos by @zhouyuan in #9799
  • [GLUTEN-10094][VL] Fix unsupported OS(, ) error by @liujiayi771 in #10093
  • [CORE] Add logs in spill for more Spillers by @liujiayi771 in #10117
  • [GLUTEN-10050][FLINK] Support decimal type by @lgbo-ustc in #10049
  • [CORE] Enable wartremover check for Scala 2.13 by @zhztheplayer in #10122
  • [GLUTEN-10131][FLINK] Fix duplicate release of RowVector issue in GlutenSingleInputOperator by @yuanhang-dev in #10132
  • [DOC] Fix VeloxStageResourceAdj config table style by @zjuwangg in #10110
  • [GLUTEN-10033][FLINK] Fix memory leak caused by unclosed RowVector in GlutenSourceFunction by @KevinyhZou in #10034
  • [GLUTEN-9949][VL] Update enhanced velox repo/branch by @zhouyuan in #10142
  • [GLUTEN-9540][FLINK] Add UT for filter & project operators by @yuanhang-dev in #10073
  • [Core] Support Jvm memory shrinking for DynamicOffHeapSizingMemoryTarget by @zhli1142015 in #9585
  • [CORE] Refactor: Make GlutenFormatWriterInjects#executeWriterWrappedSparkPlan return the wrapped query plan rather than the executed RDD by @zhztheplayer in #10147
  • [VL] Enable base64 and unbase64 functions by @zhli1142015 in #9596
  • [GLUTEN-10151][CELEBORN] Bump Celeborn version to 0.6.0 by @SteNicholas in #10152
  • [GLUTEN-8821][VL] Weekly Update Velox function support docs (2025_07_14) by @GlutenPerfBot in #10176
  • [VL] Clarify the code objective of VeloxDataSourceJniWrapper#splitBlockByPartitionAndBucket by @zhztheplayer in #10179
  • [GLUTEN-10149][VL] Fix incorrect NestedLoopJoin metrics by @NEUpanning in #10169
  • [VL] Spark 3.2 / Spark 3.3, V1 write: Dynamic partition write by @zhztheplayer in #10183
  • [CORE] Remove the class-overriding of InsertIntoHadoopFsRelationCommand by @zhztheplayer in #10187
  • [GLUTEN-10103][VL] Fall back to vanilla spark when UnresolvedException occurs in the schema validation by @JkSelf in #10138
  • [GLUTEN-10181][CH] Do not reuse a HashJoin when it has a different BroadCastHashJoinContext by @lgbo-ustc in #10182
  • [CORE] Reflect resource changes in configurations when applying new resource profile by @PHILO-HE in #10172
  • [GLUTEN-8070][CORE][VL] Add mechanism for generating configuration files and check configuration content by @yikf in #10190
  • [GLUTEN-10163][VL] Optimize S3 network parameters by @weixiuli in #10167
  • [GLUTEN-10112][VL] Fix deadLock caused by objectStore when native writer throws oom exception by @zjuwangg in #10111
  • [GLUTEN-10155][INFRA] Fix build on macOS by @zml1206 in #10158
  • [GLUTEN-10046][Flink] Improve function validation by @lgbo-ustc in #10047
  • [GLUTEN-10168][VL] Sort shuffle produces wrong partition lengths in case of spill by @marin-ma in #10208
  • [GLUTEN-8332][VL] Support explode/posexplode/inline outer by @marin-ma in #10202
  • [CORE][VL] Minor cleanups for pom.xml by @zhztheplayer in #10217
  • [GLUTEN-8821][VL] Weekly Update Velox function support docs (2025_07_21) by @GlutenPerfBot in #10225
  • [GLUTEN-10091][FLINK] Fix premature termination in NexmarkTest by @yuanhang-dev in #10092
  • [VL] Fix broken JAVA_HOME for gluten-it by @zhztheplayer in #10232
  • [MINOR] Import config keys instead of hard-coding string values for tests in backends by @yongkyunlee in #9855
  • [VL] Minor fixes for the memory API + dynamic off-heap sizing code by @zhztheplayer in #10234
  • [GLUTEN-10200][VL] Fix estimating row vector size logic of rss shuffle writer by @NEUpanning in #10235
  • [CH] Change the expression name timestamp_add to timestampadd by @zml1206 in #10198
  • [GLUTEN-9801] Only delete the files written by the failed task when calling the abortTask() method by @JkSelf in #9844
  • [GLUTEN-9335][VL] Support iceberg write unpartitioned table by @jinchengchenghh in #9397
  • [DOC][Flink] Update flink build command to skip gpg and spotless check by @zjuwangg in #10205
  • [GLUTEN-9540][FLINK] Add UT for join operator by @yuanhang-dev in #10180
  • Bump com.fasterxml.jackson.core:jackson-core from 2.13.5 to 2.15.0 by @dependabot[bot] in #10237
  • [GLUTEN-10192][VL] Fix sort shuffle read segfault in some cases by @marin-ma in #10193
  • [GLUTEN-9571][VL] Respect parquet configs, parquet.page.size and parquet.compression.codec.zstd.level etc. by @WangGuangxin in #9572
  • [GLUTEN-9860][VL] Add nightly packages for ARM by @zhouyuan in #10204
  • [GLUTEN-9849][VL] Reenable native might_contain evaluation that was disabled in #9850 by @zhztheplayer in #10240
  • [VL] update readme, use general table name by @FelixYBW in #10256
  • [GLUTEN-10226][Flink] Support decimal arithmetic by @lgbo-ustc in #10218
  • [CORE] Fix missing build information by @PHILO-HE in #10243
  • [GLUTEN-10236][VL] Support both sort and rss_sort shuffle writer for Celeborn by @marin-ma in #10244
  • [GLUTEN-10212][FLINK] Fix resources not being released in GlutenRowVectorSerializer by @shuai-xu in #10265
  • [CORE] Gen config also verify config as expected by @yikf in #10259
  • [GLUTEN-10048][FLINK] Add test for Nexmark Q1, Q2, Q3 by @yuanhang-dev in #10125
  • [GLUTEN-10118][VL] Support writing task statistics to the event log by @marin-ma in #10119
  • [MINOR] Remove unused code from SubstraitBackend by @beliefer in #10273
  • [GLUTEN-10175][VL] Enable abs test by @rui-mo in #10276
  • [GLUTEN-10118][VL] follow-up: fix wrong time unit by @marin-ma in #10281
  • [VL] Remove variable EXTRA_FLAGS in velox_backend_x86.yml by @zhztheplayer in #10274
  • [GLUTEN-10254][VL] Adding Centos-9 based dev docker image by @zhouyuan in #10231
  • [GLUTEN-10279][VL] Reset reserved counter when allocationChanged fails by @wForget in #10280
  • [VL] RAS: Embed internal MemoRole property into MemoRoleAwarePropertySet by @zhztheplayer in #10288
  • [VL] gluten-it: Fixes for clickbench by @zhztheplayer in #10291
  • [VL] Add GlutenQueryExecutionSuite by @acvictor in #10268
  • [CH] Support map_concat function by @exmy in #9841
  • [VL] Support casting array element from more types to varchar by @zml1206 in #10270
  • [MINOR][CORE] Simplify fillWithTransitions for InsertTransitions by @beliefer in #10297
  • [GLUTEN-10298][VL] Avoid using StringUtils.isNotBlank by @beliefer in #10299
  • [DOC][VL] adding back Velox parquet write configuration doc by @zhouyuan in #10282
  • [GLUTEN-10002][FLINK] Support nexmark q4-q9 by @shuai-xu in #10095
  • [FLINK] Add command to skip gpg in the build by @PHILO-HE in #10289
  • [VL] Fix generate all configuration by @zml1206 in #10303
  • [GLUTEN-10306] Package libgeos when in dynamic compile mode by @xinghuayu007 in #10307
  • [MINOR] String concatenation should follow Scala style by @beliefer in #10305
  • [MINOR] Improve moveToWorkDir for JniLibLoader by @beliefer in #10301
  • [VL] Reuse getMethodIdOrError to get jmethodID by @beliefer in #10347
  • [GLUTEN-10332] Remove unnecessary constructor for PlanNode by @beliefer in #10333
  • [GLUTEN-10324] Improve the failValidationWithException for ValidatablePlan by @beliefer in #10325
  • [GLUTEN-10318] Improve the compareMajorMinorVersion for SparkVersionUtil by @beliefer in #10319
  • [GLUTEN-10330] Correct the exception msg when making the Substrait plan by @beliefer in #10331
  • [MINOR] Avoid using the fields of StructType by @beliefer in #10328
  • [GLUTEN-10254][VL] fix centos9 docker image build by @zhouyuan in #10312
  • [GLUTEN-9809][VL] Add timestampdiff support by @zml1206 in #9810
  • [MINOR] Correct the comments for primaryBatchType by @beliefer in #10329
  • [VL] Remove RAS rule: RemoveSort by @li-boxuan in #10322
  • [GLUTEN-10364][VL] Fix get unset GlutenCoreConfig by @zml1206 in #10365
  • [GLUTEN-10309][CORE] Improve the implementation of NativeWritePostRule by @beliefer in #10310
  • [GLUTEN-10334] Share the results compare the spark version by @beliefer in #10335
  • [VL] gluten-it: Print error details by @zhztheplayer in #10293
  • [CORE][VL][CH] Add some comments for VeloxRuleApi / CHRuleApi by @zhztheplayer in #10278
  • [VL] Support cast from array to string by @zml1206 in #10300
  • [VL] Close WriteTask ColumnarBatch to fix the potential leak by @boneanxs in #10304
  • [GLUTEN-10379][CH] Fix: invalid result from CoalesceAggregationUnion by @lgbo-ustc in #10380
  • [GLUTEN-9962][HUDI] Remove unnecessary meta field validation by @alexr17 in #9963
  • [GLUTEN-10354][CORE] Just check if sort columns are a subset of partition columns by @wForget in #10355
  • [GLUTEN-10383][VL] Refactor makeParquetWriteOption method from VeloxParquetDataSource into a utility file by @JkSelf in #10384
  • [GLUTEN-10351] Extract immutable collection as reusable field by @beliefer in #10353
  • [MINOR] Use static constexpr to follow C++ style by @beliefer in #10402
  • [VL] Enable to_json function by @wecharyu in #9357
  • [GLUTEN-10377][CH] Move specialized delta internal column mapping to CH backend by @zhztheplayer in #10381
  • [VL] Rewrite disabled test in GlutenFileDataSourceV2FallBackSuite by @acvictor in #10323
  • [GLUTEN-8852][CORE] PART0: Adding Spark400 support by @zhouyuan in #9768
  • [GLUTEN-10210][CORE] Remove Reducer.java in shim layers by @zhztheplayer in #10418
  • [GLUTEN-10349] Remove the unnecessary Set and use enum directly by @beliefer in #10350
  • [GLUTEN-9737][VL] Pass ArrowMemoryPool into velox parquet writer by @JkSelf in #10121
  • [GLUTEN-10419][CORE] Fix type compatibility for HttpServletRequest through alias by @PHILO-HE in #10420
  • [GLUTEN-10403] Extract the common code from ObjectStore as a new method lookup by @beliefer in #10404
  • [INFRA] Update pull request template and remove title check by @PHILO-HE in #10376
  • [MINOR] Remove expired config by @zml1206 in #10425
  • [GLUTEN-10429] Fix spark 400 compile unused KeyGroupedShuffleSpec import by @zjuwangg in #10428
  • [GLUTEN-10397][VL] Add timestampadd support by @zml1206 in #10400
  • [GLUTEN-10275] Refine exception for not fully supported functions and update generate doc script by @marin-ma in #10391
  • [CORE] Build: Enforce -P scala-2.13 / JDK 17 for Spark 4.0 by @zhztheplayer in #10426
  • [GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5 by @jackylee-ch in #8890
  • [VL][CI] Add compilation check for Spark-4.0 by @PHILO-HE in #10439
  • [VL] Enable get_array_struct_fields function by @zhli1142015 in #10313
  • [GLUTEN-10454] Fix CUDF docker build by @zhouyuan in #10455
  • [GLUTEN-10175][VL] Enable test 'function current_timestamp and now' by @rui-mo in #10456
  • [VL] Add Spark configuration honored status in Gluten by @FelixYBW in #10442
  • [GLUTEN-10388] Reduce off-heap memory request for partial stage fallback by @xinghuayu007 in #10389
  • [GLUTEN-8852][VL] Fix compilation issues in existing tests for Spark 4.0.0 by @zjuwangg in #10434
  • [GLUTEN-9337][VL] Support read Paimon non-PK table by @liujiayi771 in #10186
  • [GLUTEN-8852][CORE] Improve the use of Spark's classic SparkSession, Column and Dataset with implicit conversion by @PHILO-HE in #10462
  • [GLUTEN-9392][VL] Support casting complex types by @kevinwilfong in #10443
  • [GLUTEN-10421][CORE] Force evaluation of newProjections to prevent empty expressionMap by @wecharyu in #10422
  • [GLUTEN-10356][VL] Support basic arithmetic expressions with ANSI mode by @nimesh1601 in #10357
  • [GLUTEN-10452][VL] Remove HBM support by @marin-ma in #10478
  • [GLUTEN-10452][VL] Remove IAA support by @marin-ma in #10480
  • [GLUTEN-10473] Change TypeNode from interface to abstract class by @beliefer in #10474
  • [GLUTEN-8953][VL] Fix iceberg codec uncompressed by @jinchengchenghh in #10477
  • [GLUTEN-10482] Improve collectAttributeNamesDFS to avoid repeated check calls by @beliefer in #10483
  • [GLUTEN-10470] Remove redundant getPartitions by @beliefer in #10471
  • [GLUTEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implementation by @zjuwangg in #10201
  • [GLUTEN-10484] Improve getTypeNode for StructType by @beliefer in #10485
  • [GLUTEN-10317][FLINK] Support state related operation by @shuai-xu in #10320
  • [GLUTEN-8852][TEST] Support Spark 4.0 in gluten-it by @PHILO-HE in #10479
  • [GLUTEN-8852] Fix code compatibility issues for two tests on Spark-4.0 by @PHILO-HE in #10487
  • [GLUTEN-9335][VL] Supports collect iceberg statistics by @jinchengchenghh in #10495
  • [GLUTEN-10457][VL] Iceberg supports copy on write by @jinchengchenghh in #10458
  • [GLUTEN-10170][VL] Offload try arithmetic functions regardless of ANSI configuration by @nimesh1601 in #10267
  • [GLUTEN-10491][VL] Simplify the extractStructNeeded for HashAggregateExecTransformer by @beliefer in #10492
  • [HotFix] Fix iceberg PRs conflict by @jinchengchenghh in #10498
  • [GLUTEN-10352][FLINK] Support nexmark test from q4 to q9 by @KevinyhZou in #10468
  • [GLUTEN-10493][VL] Simplify addFunctionNode for HashAggregateExecTransformer by @beliefer in #10494
  • [GLUTEN-9335][VL] Enable tests ignored because lack metadata and enable spark35 CI by @jinchengchenghh in #10496
  • [GLUTEN-10506] Include compression time in the shuffle write metric for Uniffle by @zuston in #10507
  • [VL] Minor: Fix cpp test compilation failure on macos by @marin-ma in #10504
  • [GLUTEN-10392][VL] Fix filter fallback in scan-only execution by @rui-mo in #10505
  • [GLUTEN-8851][VL] Set kCudfEnabledDefault to true if Gluten GPU is enabled by @jinchengchenghh in #10509
  • [GLUTEN-9004][DOC] Document Partial Projection feature by @zhouyuan in #10135
  • [GLUTEN-9344][VL] Document dynamic offheap sizing feature by @zhouyuan in #9391
  • [GLUTEN-9402][VL] Fix tzdata compatibility by @zhouyuan in #10481
  • [FLINK] Decouple Flink module POM from Gluten root POM as its parent by @PHILO-HE in #10515
  • [GLUTEN-10574][Branch-1.5] Back port important fixes to branch-1.5 by @zhouyuan in #10576
  • [GLUTEN-10574][CORE] Backport some key fixes to branch 1.5 by @PHILO-HE in #10722
  • [GLUTEN-10574][VL][1.5] Backport #10541 to fix broadcast exchange stackoverflow due to Kryo serialization by @wForget in #10733
  • [GLUTEN-10574][1.5] Backport #10793 and #10807 to update documents and automate release process by @PHILO-HE in #10827
  • [GLUTEN-10574][1.5] Bump version to 1.5.0 to finalize release by @PHILO-HE in #10829

Full Changelog: v1.4.0...v1.5.0