* Changed hive-exec dep from core to shaded jar, so kryo dependency doesn't conflict with dependency for Redis.
* Added docs for supporting Glue writes
README.md: 30 additions & 2 deletions
@@ -280,9 +280,9 @@ The GlueConfig configuration should be used if federation to Glue is needed.
 glue-account-id: 1234566789012
 glue-endpoint: glue.us-east-1.amazonaws.com
 
-As with Hive federation, the IAM permissions need to be setup to read underlying data. IAM permissions are not setup by this code, but are usually setup by the Terraform code that deploys WaggleDance, such as (apiary-federation)[https://github.com/ExpediaGroup/apiary-federation].
+As with Hive federation, the IAM permissions need to be setup to read underlying data. IAM permissions are not setup by this code, but are usually setup by the Terraform code that deploys WaggleDance, such as [apiary-federation](https://github.com/ExpediaGroup/apiary-federation).
 
-If federating across AWS accounts, the correct (cross account federation permissions)[https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html] needs to be setup as well.
+If federating across AWS accounts, the correct [cross account federation permissions](https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html) needs to be setup as well.
 The policy giving access to the role running Waggle Dance will need at least these IAM Glue actions:
 
 actions = [
@@ -298,6 +298,34 @@ The policy giving access to the role running Waggle Dance will need at least the
 "glue:GetUserDefinedFunctions"
 ]
 
+##### Federate to AWS Glue Catalog for writes
+
+Writes to Glue are supported on a best-effort basis via the same Glue library that is used for reads.
+This maps the most common Thrift methods to representative Glue calls. Not everything is supported, because Glue is not HMS and doesn't support the full Thrift stack. In some cases it will be better to use direct Glue access through EMR or to connect to HMS directly.
+
+The basics are tested and work: creating tables, adding partitions, dropping tables, and altering tables.
+
+Deployment changes to support this functionality:
+* Expand your Glue policy to allow create/update operations; search the AWS documentation for the most up-to-date list.
+* Add an S3 policy that allows reading and creating objects, similar to the permissions HMS would have. This is needed because, upon table creation, the table location path will be created in S3. Note that this would normally happen in HMS and will now happen in Waggle Dance.
+
+For example, add settings to the waggle-dance-server.yml configuration to support the S3 FileSystem and set the correct credentials provider:
+Iceberg tables are also supported, but Iceberg out of the box comes with HMS locking and makes calls that are not supported by Glue. To work around this, disable file locking in your client (see also: [Iceberg docs](https://iceberg.apache.org/docs/latest/configuration/#hadoop-configuration)):
 Each federation in Waggle Dance can be configured to use a SSH tunnel to access a remote Hive metastore in cases where certain network restrictions prevent a direct connection from the machine running Waggle Dance to the machine running the Thrift Hive metastore service. A SSH tunnel consists of one or more hops or jump-boxes. The connection between each pair of nodes requires a user - which if not specified defaults to the current user - and a private key to establish the SSH connection.
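
The "Deployment changes" bullets in the new README section ask for an expanded Glue policy and an S3 policy but do not list actions. As a rough sketch only, written in the same `actions = [...]` style the README already uses for reads, the additions might look like the lists below. The action names are real IAM actions, but which ones Waggle Dance actually needs is an assumption here and is not stated in this change; check the AWS documentation for the current list, as the README advises.

```hcl
# Hypothetical sketch: extra Glue actions for table and partition writes.
# Not taken from this PR; trim or extend to match the operations you use.
actions = [
  "glue:CreateTable",
  "glue:UpdateTable",
  "glue:DeleteTable",
  "glue:CreatePartition",
  "glue:BatchCreatePartition",
  "glue:UpdatePartition",
  "glue:DeletePartition",
  "glue:BatchDeletePartition"
]

# Hypothetical sketch: S3 actions for reading and creating objects under
# the table location path (assumed to mirror what HMS would be granted).
actions = [
  "s3:GetObject",
  "s3:PutObject",
  "s3:ListBucket"
]
```

Scoping the S3 statement to the specific buckets or prefixes that hold table data, rather than to all resources, keeps the extra write permissions narrow.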