Commit bf4ea3a

Feature/glue write support (#355)
* Changed hive-exec dep from core to shaded jar, so kryo dependency doesn't conflict with dependency for Redis.
* Added docs for supporting Glue writes
1 parent 0afcbee commit bf4ea3a

3 files changed (+37 additions, -7 deletions)

CHANGELOG.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -1,6 +1,8 @@
 
-## [4.1.2] - TBD
-### Changed
+## [4.1.2] - 2025-11-03
+### Changed
+- `hive-exec` dependency to use shaded jar to avoid Kryo conflicts.
+- Added section about Glue write support via Waggle Dance.
 - Lazy loading database mappping create to avoid doing work when tcp connections are being opened (e.g. from LoadBalancers).
 - Removed unnecessary logging in invocation logsfor setConf (it's a local call not a federated call).
 
```

README.md

Lines changed: 30 additions & 2 deletions
```diff
@@ -280,9 +280,9 @@ The GlueConfig configuration should be used if federation to Glue is needed.
 glue-account-id: 1234566789012
 glue-endpoint: glue.us-east-1.amazonaws.com
 
-As with Hive federation, the IAM permissions need to be setup to read underlying data. IAM permissions are not setup by this code, but are usually setup by the Terraform code that deploys WaggleDance, such as (apiary-federation)[https://github.com/ExpediaGroup/apiary-federation].
+As with Hive federation, the IAM permissions need to be setup to read underlying data. IAM permissions are not setup by this code, but are usually setup by the Terraform code that deploys WaggleDance, such as [apiary-federation](https://github.com/ExpediaGroup/apiary-federation).
 
-If federating across AWS accounts, the correct (cross account federation permissions)[https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html] needs to be setup as well.
+If federating across AWS accounts, the correct [cross account federation permissions](https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html) needs to be setup as well.
 The policy giving access to the role running Waggle Dance will need at least these IAM Glue actions:
 
 actions = [
@@ -298,6 +298,34 @@ The policy giving access to the role running Waggle Dance will need at least the
 "glue:GetUserDefinedFunctions"
 ]
 
+##### Federate to AWS Glue Catalog for writes
+
+Writes to Glue are supported as best effort via the same glue library as is used for reads.
+This maps the most common thrift methods to representative Glue calls. Not everything is support as Glue doesn't support the full thrift stacks as it's not HMS. In some cases it will be better to use direct glue access through EMR or connecting to HMS directly.
+
+Basic stuff is tested and works: creating tables, adding partitions, dropping tables, alter tables.
+
+Deployment changes to support this functionality:
+* Expand your Glue policy to allow for create/update operations, search AWS documentation for most up to date list.
+* Add an S3 policy that allows for object reading and creating. Similar permissions as HMS would have. This is needed because upon table creation, the table location path will be created in S3. Note that this would normally happen in HMS and will now happen in Waggle Dance.
+
+Add for instance in the waggle-dance-server.yml configuration to support s3 FileSystem and set the correct credentials provider for example:
+
+```
+configuration-properties:
+  fs.s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
+  fs.s3n.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
+  fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
+  fs.s3a.aws.credentials.provider: com.amazonaws.auth.DefaultAWSCredentialsProviderChain
+```
+
+Note:
+Iceberg tables are also supported but iceberg out of the box comes with HMS locking and does calls that are not support by Glue. To workaround disable file locking in your client (See also: [Iceberg docs](https://iceberg.apache.org/docs/latest/configuration/#hadoop-configuration)):
+
+```
+--conf spark.hadoop.iceberg.engine.hive.lock-enabled=false
+```
+
 #### Configuring a SSH tunnel
 
 Each federation in Waggle Dance can be configured to use a SSH tunnel to access a remote Hive metastore in cases where certain network restrictions prevent a direct connection from the machine running Waggle Dance to the machine running the Thrift Hive metastore service. A SSH tunnel consists of one or more hops or jump-boxes. The connection between each pair of nodes requires a user - which if not specified defaults to the current user - and a private key to establish the SSH connection.
```
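The Glue-write deployment notes added to the README rely on two configuration lookups. Hadoop chooses a `FileSystem` implementation for a table location from the `fs.<scheme>.impl` key that matches the URI scheme, which is why the example maps `s3`, `s3n` and `s3a` to `S3AFileSystem`. Spark copies every `spark.hadoop.*` setting into the Hadoop `Configuration` with that prefix removed, which is how the `--conf` flag ends up as `iceberg.engine.hive.lock-enabled`. The Python sketch below models both as plain dictionary lookups. It is a simplified illustration, not Hadoop or Spark code: `filesystem_class_for` and `hadoop_properties` are hypothetical helpers, and real Hadoop also falls back to filesystems registered through `ServiceLoader` when the key is absent.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Values copied from the waggle-dance-server.yml example in the README diff.
CONFIGURATION_PROPERTIES: Dict[str, str] = {
    "fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3n.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.aws.credentials.provider": "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
}

SPARK_HADOOP_PREFIX = "spark.hadoop."


def filesystem_class_for(location: str, conf: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the class configured under fs.<scheme>.impl.

    Simplified model: real Hadoop also falls back to ServiceLoader-registered
    filesystems when no fs.<scheme>.impl key is set.
    """
    scheme = urlparse(location).scheme
    return conf.get(f"fs.{scheme}.impl")


def hadoop_properties(spark_conf: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: keep spark.hadoop.* entries and strip the prefix."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


if __name__ == "__main__":
    # A table location under s3:// resolves to the S3A implementation.
    print(filesystem_class_for("s3://example-bucket/db/table", CONFIGURATION_PROPERTIES))
    # -> org.apache.hadoop.fs.s3a.S3AFileSystem

    # The --conf flag from the README note, as it appears in the Hadoop Configuration.
    print(hadoop_properties({
        "spark.hadoop.iceberg.engine.hive.lock-enabled": "false",
        "spark.sql.shuffle.partitions": "200",
    }))
    # -> {'iceberg.engine.hive.lock-enabled': 'false'}
```

Mapping all three schemes to the same class means locations written as `s3://`, `s3n://` or `s3a://` are all handled by S3A when Waggle Dance creates the table path.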

waggle-dance-core/pom.xml

Lines changed: 3 additions & 3 deletions
```diff
@@ -146,10 +146,10 @@
 <artifactId>hive-standalone-metastore</artifactId>
 </dependency>
 <dependency>
+<!-- using hive-exec shaded jar otherwise we get kryo conflicts with kryo dependency we need for Redis. -->
 <groupId>org.apache.hive</groupId>
 <artifactId>hive-exec</artifactId>
 <version>${hive.version}</version>
-<classifier>core</classifier>
 <exclusions>
 <exclusion>
 <groupId>log4j</groupId>
@@ -266,14 +266,14 @@
 <artifactId>kryo</artifactId>
 <version>5.5.0</version>
 </dependency>
-<!-- END Glue dependency -->
+<!-- END Glue dependency -->
 <dependency>
 <groupId>org.apache.hadoop</groupId>
 <artifactId>hadoop-aws</artifactId>
 <version>${hadoop.version}</version>
 <exclusions>
 <exclusion>
-<!-- best to exclude as it comes bundled with jackon dependencies and we get conflicts. -->
+<!-- best to exclude as it comes bundled with jackon dependencies and we get conflicts. -->
 <groupId>com.amazonaws</groupId>
 <artifactId>aws-java-sdk-bundle</artifactId>
 </exclusion>
```
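The `hive-exec` change in the pom removes `<classifier>core</classifier>`, so Maven resolves the artifact's main jar (which the added comment describes as the shaded one) instead of the `-core` jar. A classifier only changes which file Maven looks up, following the standard repository layout `<groupId as path>/<artifactId>/<version>/<artifactId>-<version>[-<classifier>].<extension>`. The Python sketch below spells that out; `repository_path` is a hypothetical helper, and `1.2.3` is a placeholder version, not the project's actual `${hive.version}`.

```python
from typing import Optional


def repository_path(group_id: str, artifact_id: str, version: str,
                    classifier: Optional[str] = None, extension: str = "jar") -> str:
    """Hypothetical helper: relative path of an artifact in a Maven repository."""
    # A classifier, when present, is appended to the file name after the version.
    suffix = f"-{classifier}" if classifier else ""
    file_name = f"{artifact_id}-{version}{suffix}.{extension}"
    # The dots in the groupId become directory separators.
    return "/".join(group_id.split(".") + [artifact_id, version, file_name])


if __name__ == "__main__":
    # Before the commit: the "core" classifier selects the -core jar.
    print(repository_path("org.apache.hive", "hive-exec", "1.2.3", classifier="core"))
    # -> org/apache/hive/hive-exec/1.2.3/hive-exec-1.2.3-core.jar

    # After the commit: no classifier, so the main jar is resolved.
    print(repository_path("org.apache.hive", "hive-exec", "1.2.3"))
    # -> org/apache/hive/hive-exec/1.2.3/hive-exec-1.2.3.jar
```

The dependency's coordinates and exclusions stay the same; only the resolved jar changes, which is why the fix is a one-line removal.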
