Issue Description:

After successfully installing Superset, starting it with superset run -p 8088 -h 0.0.0.0, and attempting to configure a Hive database connection through the web interface, the connection test fails. The server logs display the following:

Unexpected error TTransportException
WARNING: superset.views.core: Unexpected error TTransportException
INFO: werkzeug:10.10.17.34 - - [11/Jan/2022 12:07:37] "POST /superset/testconn HTTP/1.1" 400 -

Upon seeing the TTransportException, the initial assumption is that some required dependencies (either Python or system packages) might be missing. However, the logs do not provide specific error details, making it challenging to identify the missing package or component.

Possible Causes

Based on online searches, several potential causes for the issue include:

  • Missing dependencies such as:
    • sasl
    • thrift
    • thrift-sasl
    • pythrifthiveapi
    • pure-sasl
    • …and others.
  • A mismatch in the Hive driver versions (e.g., pyhive, pyhive[presto], pyhive==0.6.0).
  • Incorrectly configured Hive connection parameters, such as those for Kerberos authentication.

However, there is no clear indication of which specific dependency is missing or which parameters are misconfigured.
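
A quick first check is to confirm that the Python-side candidates can at least be imported. The sketch below is a minimal check (the module list is an assumption based on the packages named above; note that the PyPI package thrift-sasl is imported as thrift_sasl):

import importlib

# Candidate Python modules behind the packages listed above
modules = ["sasl", "thrift", "thrift_sasl", "pyhive"]

for name in modules:
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")

If all of these import cleanly, the problem is more likely at the system-library level, which the manual connection test below makes visible.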

Troubleshooting Approach

While we could test each dependency by installing the suggested packages and specifying the correct versions one by one, this process would be time-consuming and inefficient. Instead, we can take a more direct approach to troubleshoot the issue.

Since Superset uses pyhive to connect to the Hive database, we can test the connection manually using a Python script. Below is a sample script that can be used to verify the connection:

import json
from pyhive import hive
import pandas as pd

# Connect to the Hive database
conn = hive.connect(
    host="10.10.17.61",  # Server address
    port=10000,
    # auth="KERBEROS",                  # Authentication method
    # kerberos_service_name="sql_prc",
    configuration={
        'mapreduce.map.memory.mb': '4096',
        'mapreduce.reduce.memory.mb': '4096',
        'mapreduce.map.java.opts': '-Xmx3072m',
        'mapreduce.reduce.java.opts': '-Xmx3072m',
        'hive.input.format': 'org.apache.hadoop.hive.ql.io.HiveInputFormat',
        'hive.limit.optimize.enable': 'false',
        'mapreduce.job.queuename': 'root.default'  # Queue
    }
)

# Query data
sql = "SHOW PARTITIONS dw.dm_user_info"
df = pd.read_sql(sql, conn)
print(df)

Executing the script:

Save the script as test.py and run it:

python3 test.py

Error Observed:

The following error might occur:

Traceback (most recent call last):
  File "test.py", line 15, in <module>
    'mapreduce.job.queuename':'root.production.miot_group.test.data' # Queue
  File "/home/hadoop/.local/lib/python3.6/site-packages/pyhive/hive.py", line 104, in connect
    return Connection(*args, **kwargs)
  File "/home/hadoop/.local/lib/python3.6/site-packages/pyhive/hive.py", line 243, in __init__
    self._transport.open()
  File "/home/hadoop/.local/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 85, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'

Root Cause:

The issue is caused by missing system dependencies related to SASL authentication. The specific error is:

thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'

This indicates that the following system packages are required but not installed (a quick way to verify this is shown after the list):

  • cyrus-sasl-plain
  • cyrus-sasl-devel
  • cyrus-sasl-gssapi
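
The "no worthy mechs found" message comes from the Cyrus SASL client library: it cannot find any mechanism plugin (such as PLAIN or GSSAPI) in its plugin directory. A quick way to confirm this is to list that directory; the sketch below assumes the common plugin locations, which differ between distributions:

import glob
import os

# Common Cyrus SASL plugin directories (assumed paths; adjust for your distribution)
candidate_dirs = [
    "/usr/lib64/sasl2",                 # CentOS / RHEL
    "/usr/lib/x86_64-linux-gnu/sasl2",  # Debian / Ubuntu
]

for d in candidate_dirs:
    if os.path.isdir(d):
        plugins = sorted(os.path.basename(p) for p in glob.glob(os.path.join(d, "lib*.so*")))
        print(d, plugins if plugins else "no mechanism plugins found")
    else:
        print(d, "directory not present")

On the affected machine, plugins such as libplain*.so and libgssapiv2*.so will typically be missing from that directory; after installing the packages below they should appear.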

Solution

To resolve the issue, install the missing packages with yum (on CentOS/RHEL-based systems):

yum install cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi

After installing these dependencies, rerun the Python script:

python3 test.py

Successful Output:

You should see the following output, indicating that the connection to Hive is successful:

    partition
0 dt=20211203
1 dt=20211226
2 dt=20211227
3 dt=20211228
4 dt=20211229
5 dt=20211230
6 dt=20211231
7 dt=20220101

Final Step:

Now, go back to the Superset interface, reconfigure the Hive database connection, and click the "TEST CONNECTION" button. If successful, you will see the message "Seems OK!", confirming that the connection to Hive has been established.

Hive Connection String:

hive://hive@10.10.17.61:10000/mydw
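
Since Superset talks to Hive through SQLAlchemy, the same URI can also be exercised from a short script. This is a minimal sketch (it assumes pyhive and SQLAlchemy are installed in the same environment; the URI is the one shown above):

from sqlalchemy import create_engine, text

# Same SQLAlchemy URI that is entered in the Superset connection form
engine = create_engine("hive://hive@10.10.17.61:10000/mydw")

# Run a trivial query to confirm that both the dialect and the transport work
with engine.connect() as conn:
    for row in conn.execute(text("SHOW DATABASES")):
        print(row)

If this prints the database list, the same URI should also pass the TEST CONNECTION check in Superset.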