Bug 1125316

Summary: After upgrade to 3.3 users unable to authenticate through kerberos
Product: Red Hat Enterprise Virtualization Manager
Reporter: Michael Everette <meverett>
Component: ovirt-engine-setup
Assignee: Oved Ourfali <oourfali>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Pavel Stehlik <pstehlik>
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: acathrow, bazulay, ecohen, iheim, meverett, oourfali, Rhev-m-bugs, sbonazzo, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-19 11:45:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Michael Everette 2014-07-31 14:44:05 UTC
Created attachment 922953
m03

Description of problem:
After upgrading from 3.2 to 3.3, users were unable to log in to the WebAdmin UI using Kerberos/LDAP authentication. An attempt to remove and re-add the domain failed with the following error:

# rhevm-manage-domains -action=delete -domain=******.com
# service ovirt-engine restart
# rhevm-manage-domains -action=add -provider=RHDS -user=rhevm -interactive -domain=******.com
Enter password:
Error: Kerberos error. Please check log for further details.
The domain ******.com has been added to the engine as an authentication source but no users from that domain have been granted permissions within the oVirt Manager.
Users from this domain can be granted permissions by editing the domain using -action=edit and specifying -addPermissions or from the Web administration interface logging in as admin@internal user.
oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully
# service ovirt-engine restart
# rhevm-manage-domains -action=list
Domain: ******.com
        User name: rhevm@******.COM
Manage Domains completed successfully

The engine log reports: 
2014-07-23 05:47:44,174 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-12) Failed ldap search server ldap://ldap01.******.******.******.com:389 using user rhevm@******.COM due to Kerberos error. Please check log for further details.. We should not try the next server

Additional info:

Pointing to a different LDAP server returns the same error:
# rhevm-manage-domains -action=edit -domain=******.com -provider=RHDS -ldapServers=******.******.******.com
Error:  exception message: Connection refused
Failure while testing domain ******.com. Details: Kerberos error. Please check log for further details.

However, the server is reachable:

# ping ******.******.******.com
PING ******.******.******.com (XX.X.XXX.X) 56(84) bytes of data.
64 bytes from ******.******.******.com (XX.X.XXX.X): icmp_seq=1 ttl=254 time=0.291 ms
64 bytes from ******.******.******.com (XX.X.XXX.X): icmp_seq=2 ttl=254 time=0.348 ms
^C
--- ******.******.******.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1826ms
rtt min/avg/max/mdev = 0.291/0.319/0.348/0.033 ms

# telnet ******.******.******.com 389
Trying XX.X.XXX.X...
Connected to ******.******.******.com.
Escape character is '^]'.
^]
telnet> Connection closed.

# rhevm-manage-domains -action=list
Domain: ******.com
        User name: rhevm@******.COM
Manage Domains completed successfully


The following appears in /var/log/ovirt-engine/engine-manage-domains.log:
2014-07-23 05:22:59,616 INFO  [org.ovirt.engine.core.domains.ManageDomains] Creating kerberos configuration for domain(s): ******.com
2014-07-23 05:22:59,628 INFO  [org.ovirt.engine.core.domains.ManageDomains] Successfully created kerberos configuration for domain(s): ******.com
2014-07-23 05:22:59,629 INFO  [org.ovirt.engine.core.domains.ManageDomains] Testing kerberos configuration for domain: ******.com
2014-07-23 05:22:59,849 ERROR [org.ovirt.engine.core.utils.kerberos.JndiAction] Error during login to kerberos. Detailed information is: [LDAP: error code 49 - SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information ()]
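
To pull the same errors out of both logs in one pass, something like the following could be used. This is only a sketch: the function name kerberos_errors is made up for illustration, and the log files to scan are passed in as arguments rather than assumed.

```shell
# Print ERROR-level lines that mention Kerberos or GSSAPI.
# Arguments: one or more log files to scan.
# -h suppresses file name prefixes so the lines look like the originals.
kerberos_errors() {
    grep -hE ' ERROR .*([Kk]erberos|GSSAPI)' "$@"
}
```

For example, assuming the default log location: kerberos_errors /var/log/ovirt-engine/engine.log /var/log/ovirt-engine/engine-manage-domains.log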

The user is able to kinit, and ldapsearch shows the user's details:

# ldapsearch -LLL -Z -x -W -b dc=******,dc=com -D uid=rhevm,ou=serviceaccounts,dc=******,dc=com -h ldap01.******.******.******.com uid=rhevm
Enter LDAP Password: 
dn: uid=rhevm,ou=serviceaccounts,dc=*******,dc=com
sn: rhevm
givenName: Service
cn: Service RHEV
objectClass: inetOrgPerson
objectClass: top
objectClass: organizationalPerson
objectClass: person
uid: rhevm
description: RHEVM

As shown above, they ran an LDAP query against the host ldap01.******.******.******.com (from the RHEV-M host) as the rhevm user, which is the user that fails according to engine.log:
ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (DefaultQuartzScheduler_Worker-1) Failed ldap search server LDAP://ldap01.******.******.******.com:389 using user rhevm@******.COM due to Kerberos error. Please check log for further details.. We should not try the next server
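
Note that the ldapsearch above binds with -x (simple authentication), so it does not exercise the SASL/GSSAPI path that fails in the logs. A closer check might look like the sketch below. The function name gssapi_bind_check and the DRY_RUN switch are hypothetical helpers added for illustration, the arguments are placeholders, and it assumes the client has SASL GSSAPI support installed (on RHEL this is typically the cyrus-sasl-gssapi package).

```shell
# Sketch: reproduce a GSSAPI (Kerberos) LDAP bind from the command line.
# Arguments: <principal> <ldap-host> <base-dn>, all placeholders here.
# With DRY_RUN=1 the commands are printed instead of executed.
gssapi_bind_check() {
    principal=$1 host=$2 base=$3
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run kinit "$principal" &&    # obtain a ticket for the service account
    run klist &&                 # confirm the ticket is in the credential cache
    run ldapsearch -LLL -Y GSSAPI -H "ldap://$host" -b "$base" "uid=${principal%%@*}"
}
```

If kinit succeeds but the ldapsearch -Y GSSAPI step fails, the problem would be reproducible outside the engine, which may help narrow it down.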


Workaround at this time:
Using a different user, the domain was added successfully and the user was able to authenticate.


Version-Release number of selected component (if applicable):
rhevm-3.3.4-0.53.el6ev.noarch

How reproducible:
Seen twice with the same customer, in different environments.


Note: In both environments the upgrade was stopped and then rerun. One was interrupted by the user (by answering "no" to a question) and the other by the error "ERROR - Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute".

I am attaching upgrade/log snippets for both (m03.prod-upgrade20140723.txt and m04.prod-upgrade20140731.txt). I have log collectors as well. Please let me know if there is anything else I can provide or ask for.

Could this be a failure to clear the cache, or corruption of the cache?

Comment 7 Michael Everette 2014-08-04 13:12:56 UTC
Oved,

I have attached logs for both environments. I will reach out to Zoli and see if I can get you access. Let me know if you need anything else.

I have also dropped the urgency to high.

Comment 8 Oved Ourfali 2014-08-04 13:19:49 UTC
(In reply to Michael Everette from comment #7)
> Oved,
> 
> I have attached logs for both environments. I will reach out to Zoli and see
> if I can get you access. Let me know if you need anything else.
> 
> I have also dropped the urgency to high.

Thank you.
As for my question: did it work once you set another user to do the LDAP queries via the manage-domains utility?

Comment 11 Oved Ourfali 2014-08-19 11:45:56 UTC
Closing. If it happens again please reopen.