Error 'Server not found in Kerberos database' with Centrify Configuration

asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary

Symptoms

In a kerberised cluster, the error "Server not found in Kerberos database" may appear for various services, whether part of CDH or not, e.g. Hue, NameNode, or the Cloudera Manager Agent.

Example errors are shown below.

1. The NameNode (NN) cannot be started. The NN log shows:

Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
...
Caused by: KrbException: Server not found in Kerberos database (7)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:70)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
...
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)


2. The Cloudera Manager Agent fails to communicate with the web server. The agent log shows:

[12/Aug/2016 16:10:44 +0000] 25780 MonitorDaemon-Scheduler urllib2_kerberos CRITICAL GSSAPI Error: Unspecified GSS failure. Minor code may provide more information/Server not found in Kerberos database
[12/Aug/2016 16:10:45 +0000] 25780 MonitorDaemon-Scheduler urllib2_kerberos CRITICAL GSSAPI Error: Unspecified GSS failure. Minor code may provide more information/Server not found in Kerberos database
[12/Aug/2016 16:10:45 +0000] 25780 Monitor-GenericMonitor urllib2_kerberos CRITICAL GSSAPI Error: Unspecified GSS failure. Minor code may provide more information/Server not found in Kerberos database
[12/Aug/2016 16:10:45 +0000] 25780 Monitor-GenericMonitor throttling_logger ERROR (35 skipped) Error fetching metrics at 'http://host1.example.com:1006/jmx'
.
.
HTTPError: HTTP Error 401: Authentication required


3. The Hue error.log shows:

[11/Oct/2016 16:12:34 -0400] connectionpool INFO Starting new HTTPS connection (1): host1.example.com
[11/Oct/2016 16:12:34 -0400] kerberos_ ERROR generate_request_header(): authGSSClientStep() failed:
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/lib/hue/build/env/lib/python2.6/site-packages/requests_kerberos-0.6.1-py2.6.egg/requests_kerberos/kerberos_.py", line 114, in generate_request_header
_negotiate_value(response))
GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Server not found in Kerberos database', -1765328377))
[11/Oct/2016 16:12:34 -0400] kerberos_ ERROR authenticate_server(): authGSSClientStep() failed:
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/lib/hue/build/env/lib/python2.6/site-packages/requests_kerberos-0.6.1-py2.6.egg/requests_kerberos/kerberos_.py", line 229, in authenticate_server
_negotiate_value(response))
GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Server not found in Kerberos database', -1765328377))
[11/Oct/2016 16:12:34 -0400] kerberos_ ERROR handle_mutual_auth(): Mutual authentication failed

Applies To

CDH 5

Cause

 Centrify is not configured correctly for CDH.
 If Centrify is added to a cluster that already has Kerberos enabled, it can invalidate the existing HTTP principal.
 Duplicate principals that differ only in case can exist (http/host1.example.com@REALM.COM and HTTP/host1.example.com@REALM.COM). Centrify adds the lowercase http/host1.example.com@REALM.COM.

It is also possible that the HTTP/host1.example.com service principal sits under the wrong distinguished name (DN) in LDAP. Example: the host is expected under OU=BigData, but setspn shows it under OU=Unix Servers.
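
To make the duplicate-principal cause concrete, here is a minimal Python sketch that groups principals differing only in letter case. It assumes you have already exported a list of principal names by some other means; the function name find_case_collisions and the hdfs/ entry in the sample list are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def find_case_collisions(principals):
    """Return groups of principals that are identical except for letter case."""
    groups = defaultdict(set)
    for name in principals:
        # Principals that differ only in case share the same lowercased key.
        groups[name.lower()].add(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

# Sample list: the two HTTP entries come from the article,
# the hdfs entry is a hypothetical non-colliding principal.
principals = [
    "HTTP/host1.example.com@REALM.COM",
    "http/host1.example.com@REALM.COM",
    "hdfs/host1.example.com@REALM.COM",
]

collisions = find_case_collisions(principals)
for group in collisions:
    print("Possible duplicate:", ", ".join(group))
```

Any group reported here is a candidate for the clean-up described in the instructions further down; whether the lowercase entry was really added by Centrify still has to be confirmed in AD.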

Investigation tips:
1. From the error, identify which web address the call is made to, e.g.
        in the agent log above the web address is: http://host1.example.com:1006/jmx

2. From Windows AD, query the SPN for that host:
       setSPN -q http/host1.example.com:    

Checking domain DC=cloudera,DC=com,DC=us 
CN=host1.example.com:,OU=Unix Servers,OU=Servers, DC=cloudera,DC=com,DC=us 
nfs/host1.example.com
nfs/host1
http/host1.example.com       (Expect this to be HTTP uppercase)
http/host1                   (Expect this to be HTTP uppercase)
host/host1.example.com
host/host1
ftp/host1.example.com
ftp/host1
cifs/host1.example.com
cifs/host1
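
The two investigation tips can be combined in a small script. The following Python sketch assumes the setspn output has been saved as text in the layout shown above (a "Checking domain" line, one CN= line, then one SPN per line); the helper names expected_spn, parse_setspn, lowercase_http_spns and dn_has_ou are hypothetical, and the sample is a shortened copy of the output above with the annotations removed.

```python
from urllib.parse import urlparse

# Shortened copy of the setspn output above, annotations removed.
SETSPN_OUTPUT = """\
Checking domain DC=cloudera,DC=com,DC=us
CN=host1.example.com:,OU=Unix Servers,OU=Servers, DC=cloudera,DC=com,DC=us
nfs/host1.example.com
nfs/host1
http/host1.example.com
http/host1
host/host1.example.com
host/host1
"""

def expected_spn(url):
    """Build the uppercase HTTP SPN expected for the host in a failing URL."""
    return "HTTP/" + urlparse(url).hostname

def parse_setspn(text):
    """Split saved setspn output into the DN line and the list of SPNs."""
    dn, spns = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Checking domain"):
            continue
        if line.upper().startswith("CN="):
            dn = line
        elif "/" in line:
            spns.append(line.split()[0])
    return dn, spns

def lowercase_http_spns(spns):
    """Return SPNs registered with a lowercase http/ service class."""
    return [s for s in spns if s.startswith("http/")]

def dn_has_ou(dn, ou):
    """Check whether the DN contains the given organizational unit."""
    parts = [p.strip().lower() for p in dn.split(",")]
    return ("ou=" + ou.lower()) in parts

dn, spns = parse_setspn(SETSPN_OUTPUT)
print("Expected SPN:", expected_spn("http://host1.example.com:1006/jmx"))
print("Lowercase http SPNs:", lowercase_http_spns(spns))
print("Under OU=BigData:", dn_has_ou(dn, "BigData"))
```

For the sample above, the script reports the two lowercase http entries and shows that the host is not under OU=BigData, which matches both causes described earlier.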

Instructions



NOTE: Some of these steps must be performed by a Centrify or AD/LDAP administrator. Please consult your Centrify or AD/LDAP administrator for the exact commands. The following solution gives only the overall steps.

1. Use CM to stop the cluster (or the impacted service).

2. On the host that has the incorrect http/hostname entry, run:
       adleave   (removes the host from AD)
        
3. Update /etc/centrifydc/centrifydc.conf and remove http from the adclient.krb5.service.principals line.
    Reference: https://docs.centrify.com/en/css/suite2016/centrify-cloudera-guide.pdf (page 18, step 5)

4. Find the http host principal in AD and delete it, using ADSI Edit or an equivalent tool. (Use 'setspn -q http*' to identify it, as described above.)

5. Rejoin the host to Active Directory using the 'adjoin' command.

6. Generate the principals: CM -> Administration -> Security -> Kerberos Credentials -> Regenerate Missing Credentials.

7. Use CM to start the cluster (or the impacted service).
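
The edit in step 3 can be sketched in Python as a text transformation. This is only an illustration under stated assumptions: it assumes the parameter is written as "name: value" (or "name = value") with whitespace-separated service names, that commented lines start with "#", and the function name drop_http and the sample contents are hypothetical. Check the actual file and the Centrify guide before changing anything, and keep a backup.

```python
import re

PARAM = "adclient.krb5.service.principals"

def drop_http(conf_text):
    """Remove 'http' from the service principals line, leaving other lines as-is."""
    pattern = re.compile(r"^(\s*" + re.escape(PARAM) + r"\s*[:=]\s*)(.*)$")
    out = []
    for line in conf_text.splitlines():
        match = pattern.match(line)
        if match:  # commented-out lines start with '#' and do not match
            kept = [s for s in match.group(2).split() if s.lower() != "http"]
            line = match.group(1) + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical file contents, for illustration only.
sample = (
    "# adclient.krb5.service.principals: http ftp cifs nfs\n"
    "adclient.krb5.service.principals: http ftp cifs nfs\n"
    "some.other.setting: value\n"
)

updated = drop_http(sample)
print(updated, end="")
```

The sketch only rewrites text in memory; writing the result back to /etc/centrifydc/centrifydc.conf is left to the administrator.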
