HIVE-29439: Upgrade slf4j-api to 2.0.13 #6295
humashankar26 wants to merge 4 commits into apache:master
What changes were proposed in this pull request?
To ensure this didn't break Hive's complex logging and metrics reporting, I performed a full "bridge and provider" realignment across the following key areas:
Version Force: Updated slf4j-api to 2.0.13 in the root pom.xml, standalone-metastore/pom.xml, and storage-api/pom.xml.
Provider Migration: Swapped the old slf4j-reload4j (1.7.36) for version 2.0.13 to support the Java ServiceLoader mechanism required by SLF4J 2.x.
Bridge Realignment: Updated jcl-over-slf4j to 2.0.13 and introduced log4j-slf4j2-impl (v2.24.3) in the root and service modules. This was necessary to allow Hive's Log4j2-based metrics and timing systems to communicate with the new SLF4J 2.0 API.
Why are the changes needed?
Security: It moves Hive away from the aging 1.7.x line, resolving several security scanner flags and moving toward the more secure ServiceLoader/SPI architecture.
Reliability: It fixes a series of "silent" failures where metrics and timing events were being dropped because SLF4J 2.0 couldn't find a valid 1.7.x binder. This ensures that Atlas, Ranger, and internal Metastore timers report data accurately.
Does this PR introduce any user-facing change?
How was this patch tested?
Dependency Tree Audit: Verified via mvn dependency:tree that all 56 modules have converged on version 2.0.13 and that no "zombie" 1.7.x bindings remain.
Fixing Metric Regressions: Resolved failures in TestAtlasLoadTask, TestRangerDumpTask, and TestMetrics where metrics were returning 0 due to NOP-logger defaults.
Timing & Operations: Verified TestHiveRemote and TestOperationLogManager to ensure that query-specific logs and API timing strings are correctly captured.
Local Build: Ran a full mvn clean install -DskipTests to confirm that all modules (especially ql and service) compile correctly with the new SLF4J 2.x bridges.



What changes were proposed in this pull request?
This PR upgrades the slf4j-api dependency from 1.7.30 to 2.0.13.
Because Hive is a large multi-module project, I’ve applied the update in the following key areas to ensure the new version is forced everywhere:
The root pom.xml (global version management).
standalone-metastore/pom.xml (to ensure the metastore doesn't pull in older versions independently).
storage-api/pom.xml (to align the storage layer with the new API).
I also verified that transitive dependencies (like those coming from Hadoop or ORC) are being correctly "managed" (overridden) by this new version.
Why are the changes needed?
The primary goal is to address CVE-2022-2047 and other related security vulnerabilities found in older versions of the SLF4J library.
Beyond security, version 2.0.13 fixes a long-standing bug (SLF4J Issue 409) where logs would sometimes report incorrect line numbers or class names. Upgrading also moves Hive toward the modern Java ServiceLoader mechanism for logging, which is more stable than the old static binder approach used in the 1.7.x line.
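For reviewers unfamiliar with the ServiceLoader lookup mentioned above, here is a minimal Java sketch of the general pattern. It is not SLF4J's actual code: DemoLogProvider and ProviderLookupSketch are hypothetical names used only for illustration (SLF4J 2.x's own provider interface is org.slf4j.spi.SLF4JServiceProvider). The sketch assumes no META-INF/services registration for DemoLogProvider exists on the classpath, so the lookup comes back empty and the code falls back to a no-op placeholder, which is roughly the situation a 2.x API is in when only a 1.7.x binding is present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for a logging provider SPI (not an SLF4J type).
interface DemoLogProvider {
    String name();
}

class ProviderLookupSketch {
    // Returns the names of all providers registered via META-INF/services,
    // or a single "NOP" entry when none is registered.
    static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (DemoLogProvider provider : ServiceLoader.load(DemoLogProvider.class)) {
            names.add(provider.name());
        }
        if (names.isEmpty()) {
            names.add("NOP"); // nothing found: fall back to a do-nothing default
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumes no provider is registered for DemoLogProvider.
        System.out.println(discover());
    }
}
```

Run as is, with no registration file present, this prints [NOP]. Adding a META-INF/services/DemoLogProvider file that names an implementation class would make that implementation show up instead, with no change to the calling code.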
Does this PR introduce any user-facing change?
No. This is a backend dependency update. Users shouldn't notice any change in behavior, though developers may notice more accurate source-location reporting in the logs during debugging.
How was this patch tested?
Since this is a dependency change, I focused on build integrity and dependency convergence:
Dependency Tree Audit: Ran mvn dependency:tree -Dincludes='org.slf4j:*' across the entire project. Verified that all modules (including the tricky ones like hive-exec and metastore-server) are now resolving to version 2.0.13.
Local Build: Successfully ran a clean build (mvn clean install -DskipTests) to ensure no compilation errors were introduced by the 2.x API changes.
Security Scan: Verified the fix using the OWASP Dependency-Check tool. The report confirmed that slf4j-api is no longer flagged for known vulnerabilities.
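The dependency tree audit above was done by reading the mvn dependency:tree output. A check along the following lines could automate it. This is only a sketch and not part of the patch: Slf4jConvergenceCheck and its stragglers method are hypothetical names, and the code assumes the usual groupId:artifactId:type:version:scope line format (with an optional classifier before the version) that the plugin prints.

```java
import java.util.ArrayList;
import java.util.List;

class Slf4jConvergenceCheck {
    // Returns the org.slf4j coordinates whose version differs from `expected`.
    static List<String> stragglers(List<String> treeLines, String expected) {
        List<String> bad = new ArrayList<>();
        for (String line : treeLines) {
            int start = line.indexOf("org.slf4j:");
            if (start < 0) {
                continue; // not an SLF4J artifact
            }
            String coord = line.substring(start).trim();
            int space = coord.indexOf(' ');
            if (space >= 0) {
                coord = coord.substring(0, space); // drop any trailing annotation
            }
            String[] parts = coord.split(":");
            if (parts.length < 4) {
                continue; // not a full coordinate
            }
            // With a scope the version is second to last; without one it is last.
            String version = parts.length == 4 ? parts[3] : parts[parts.length - 2];
            if (!version.equals(expected)) {
                bad.add(coord);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not output captured from the Hive build.
        List<String> sample = List.of(
            "[INFO] +- org.slf4j:slf4j-api:jar:2.0.13:compile",
            "[INFO] |  \\- org.slf4j:slf4j-reload4j:jar:1.7.36:compile",
            "[INFO] +- org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.24.3:compile");
        System.out.println(stragglers(sample, "2.0.13"));
    }
}
```

For the sample lines this prints [org.slf4j:slf4j-reload4j:jar:1.7.36:compile]. An empty list would mean every org.slf4j artifact in the supplied output is on the expected version.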