Resolving the "TTransportException" Error in Superset Hive Database Connection
Issue Description:
After installing Superset and starting it with the command superset run -p 8088 -h 0.0.0.0, configuring the Hive database connection through the web interface fails. The server log shows only the following:
    Unexpected error TTransportException
The TTransportException suggests that a required dependency (either a Python package or a system package) is missing. However, the log gives no further detail, which makes it hard to identify the missing package or component.
Possible Causes
Based on online searches, several potential causes for the issue include:
- Missing dependencies such as:
  - sasl
  - thrift
  - thrift-sasl
  - pythrifthiveapi
  - pure-sasl
  - …and others.
- A mismatch in the Hive driver versions (e.g., pyhive, pyhive[presto], pyhive==0.6.0).
- Incorrectly configured Hive connection parameters, such as those for Kerberos authentication.
However, nothing in the logs indicates which specific dependency is missing or which parameter is misconfigured.
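As a quick first check on the Python side, the installed versions of the candidate packages can be listed from the same environment that runs Superset. The following is a minimal sketch using only the standard library (Python 3.8+); the package list is simply the names mentioned above and is an assumption about what is relevant in your setup.

```python
from importlib import metadata

# Distribution names taken from the candidate list above (assumed, adjust as needed).
CANDIDATES = ["pyhive", "thrift", "thrift-sasl", "sasl", "pure-sasl"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(CANDIDATES).items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```

Note that this only sees Python packages. Missing system libraries do not show up here.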
Solution
While we could test each dependency by installing the suggested packages and specifying the correct versions one by one, this process would be time-consuming and inefficient. Instead, we can take a more direct approach to troubleshoot the issue.
Since Superset uses pyhive to connect to the Hive database, we can test the connection manually using a Python script. Below is a sample script that can be used to verify the connection:
    import json
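For illustration, a connection check of this kind might look like the sketch below. It is not the original script: the helper name probe_hive and the SHOW TABLES query are hypothetical choices, and the host, port, user and database are taken from the connection string used in this article. The connect function is passed in as an argument so that the helper itself does not depend on pyhive being importable.

```python
def probe_hive(connect, host, port, username, database, query="SHOW TABLES"):
    """Open a connection with `connect`, run one query, and return the rows.

    `connect` is expected to behave like pyhive's hive.connect (an assumption
    of this sketch): it accepts host/port/username/database keyword arguments
    and returns a DB-API style connection.
    """
    conn = connect(host=host, port=port, username=username, database=database)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()


# Intended use (requires pyhive and a reachable HiveServer2):
#
#   from pyhive import hive
#   rows = probe_hive(hive.connect, host="10.10.17.61", port=10000,
#                     username="hive", database="mydw")
#   print(rows)
```

Running the check outside Superset means any exception raised by the driver is printed with its full traceback instead of being reduced to a one-line message.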
Executing the script:
Save the script as test.py and run it:
    python3 test.py
Error Observed:
The following error might occur:
    Traceback (most recent call last):
Root Cause:
The issue is caused by missing system dependencies related to SASL authentication. The specific error is:
    thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'
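"No worthy mechs found" means that the SASL client library could not find a usable authentication mechanism plugin on this machine. On a yum-based system, this indicates that the following system packages are required but not installed: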
- cyrus-sasl-plain
- cyrus-sasl-devel
- cyrus-sasl-gssapi
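To see which SASL mechanism plugins are currently present, the plugin directory can be listed. The sketch below assumes the plugins live in /usr/lib64/sasl2, which is the usual location on 64-bit RHEL/CentOS systems but may differ on other distributions. The helper name list_sasl_plugins is hypothetical.

```python
from pathlib import Path


def list_sasl_plugins(plugin_dir="/usr/lib64/sasl2"):
    """Return the sorted plugin names found in a Cyrus SASL plugin directory.

    The default path is an assumption (typical for 64-bit RHEL/CentOS).
    """
    directory = Path(plugin_dir)
    if not directory.is_dir():
        return []
    names = set()
    for entry in directory.glob("lib*.so*"):
        # "libplain.so.3.0.0" -> "plain"
        names.add(entry.name[len("lib"):].split(".so", 1)[0])
    return sorted(names)


if __name__ == "__main__":
    plugins = list_sasl_plugins()
    print(plugins or "no SASL plugins found")
```

If entries such as plain or gssapiv2 are absent from the result, the corresponding mechanism is most likely not installed.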
Solution
To resolve the issue, install the missing packages using yum:
    yum install cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi
After installing these dependencies, rerun the Python script:
    python3 test.py
Successful Output:
You should see the following output, indicating that the connection to Hive is successful:
    partition
Final Step:
Now, go back to the Superset interface, reconfigure the Hive database connection, and click the “TEST CONNECTION” button. If successful, you will see the message "Seems OK!", confirming that the connection to Hive is established.
Hive Connection String:
    hive://hive@10.10.17.61:10000/mydw
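The connection string follows the usual URI layout of dialect, user, host, port and database. As a small sketch of how the parts map, the standard library's urlsplit can take it apart; the helper name describe_hive_uri and the dictionary keys are illustrative only.

```python
from urllib.parse import urlsplit


def describe_hive_uri(uri):
    """Split a hive:// connection string into its named components."""
    parts = urlsplit(uri)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(describe_hive_uri("hive://hive@10.10.17.61:10000/mydw"))
```

These are the same values a standalone test script needs in order to reproduce the connection that Superset attempts.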