Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an optional connection_timeout parameter to PyHive's Hive connection class to prevent connections from hanging indefinitely when issues occur. The timeout is specified in milliseconds and applies to both HTTP and TSocket transport types.
Changes:
- Added
connection_timeoutparameter to theConnection.__init__method signature and documentation - Implemented timeout setting for both HTTP (THttpClient) and TSocket transport paths
- Added integration test to verify connections work with the timeout parameter
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/pyhive/hive.py | Added connection_timeout parameter to Connection class, implemented setTimeout calls for both HTTP and socket transports, and added parameter documentation |
| python/pyhive/tests/test_hive.py | Added integration test to verify connection works with timeout parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7300 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 698 698
Lines 43646 43646
Branches 5894 5894
======================================
Misses 43646 43646 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| auth = 'NONE' | ||
| socket = thrift.transport.TSocket.TSocket(host, port) | ||
| if connection_timeout: | ||
| socket.setTimeout(connection_timeout) |
There was a problem hiding this comment.
FYI this sets both connection and socket timeout. There's a nice description of the difference here.
| socket.setTimeout(connection_timeout) | |
| socket.setConnectTimeout(connection_timeout) |
Why are the changes needed?
Adds an optional parameter to pyhive hive connection class init to allow setting the socket connection timeout. By default, thrift socket connections have no timeout which causes connections to hang indefinitely when something goes wrong. This helps us in dbt by allowing for multi-threaded management of connections while continuing to rely on pyhive thrift setup.
Reference thrift lines:
https://github.com/apache/thrift/blob/0d18fb2e97a00fc56997fa059d6819e54cdff64b/lib/py/src/transport/THttpClient.py#L130
https://github.com/apache/thrift/blob/0d18fb2e97a00fc56997fa059d6819e54cdff64b/lib/py/src/transport/TSocket.py#L111
How was this patch tested?
Manually on TSocket path and added unit test
Was this patch authored or co-authored using generative AI tooling?
No