Created attachment 922953 [details]
m03

Description of problem:

After upgrading from 3.2 to 3.3, users were unable to log in to the WebAdmin UI using Kerberos/LDAP authentication. An attempt to remove and re-add the domain failed with an error:

# rhevm-manage-domains -action=delete -domain=******.com
# service ovirt-engine restart
# rhevm-manage-domains -action=add -provider=RHDS -user=rhevm -interactive -domain=******.com
Enter password:
Error: Kerberos error. Please check log for further details.
The domain ******.com has been added to the engine as an authentication source but no users from that domain have been granted permissions within the oVirt Manager.
Users from this domain can be granted permissions by editing the domain using -action=edit and specifying -addPermissions or from the Web administration interface logging in as admin@internal user.
oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully

# service ovirt-engine restart
# rhevm-manage-domains -action=list
Domain: ******.com
User name: rhevm@******.COM
Manage Domains completed successfully

The engine log reports:

2014-07-23 05:47:44,174 ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (ajp-/127.0.0.1:8702-12) Failed ldap search server ldap://ldap01.******.******.******.com:389 using user rhevm@******.COM due to Kerberos error. Please check log for further details.. We should not try the next server

Additional info:

Pointing the domain at a different LDAP server returns the same error:

# rhevm-manage-domains -action=edit -domain=******.com -provider=RHDS -ldapServers=******.******.******.com
Error: exception message: Connection refused
Failure while testing domain ******.com. Details: Kerberos error. Please check log for further details.

However, that server is reachable:

# ping ******.******.******.com
PING ******.******.******.com (XX.X.XXX.X) 56(84) bytes of data.
64 bytes from ******.******.******.com (XX.X.XXX.X): icmp_seq=1 ttl=254 time=0.291 ms
64 bytes from ******.******.******.com (XX.X.XXX.X): icmp_seq=2 ttl=254 time=0.348 ms
^C
--- ******.******.******.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1826ms
rtt min/avg/max/mdev = 0.291/0.319/0.348/0.033 ms

# telnet ******.******.******.com 389
Trying XX.X.XXX.X...
Connected to ******.******.******.com.
Escape character is '^]'.
^]
telnet> Connection closed.

# rhevm-manage-domains -action=list
Domain: ******.com
User name: rhevm@******.COM
Manage Domains completed successfully

From /var/log/ovirt-engine/engine-manage-domains.log:

2014-07-23 05:22:59,616 INFO [org.ovirt.engine.core.domains.ManageDomains] Creating kerberos configuration for domain(s): ******.com
2014-07-23 05:22:59,628 INFO [org.ovirt.engine.core.domains.ManageDomains] Successfully created kerberos configuration for domain(s): ******.com
2014-07-23 05:22:59,629 INFO [org.ovirt.engine.core.domains.ManageDomains] Testing kerberos configuration for domain: ******.com
2014-07-23 05:22:59,849 ERROR [org.ovirt.engine.core.utils.kerberos.JndiAction] Error during login to kerberos. Detailed information is: [LDAP: error code 49 - SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information ()]

The user is able to kinit, and ldapsearch shows the user details:

# ldapsearch -LLL -Z -x -W -b dc=******,dc=com -D uid=rhevm,ou=serviceaccounts,dc=******,dc=com -h ldap01.******.******.******.com uid=rhevm
Enter LDAP Password:
dn: uid=rhevm,ou=serviceaccounts,dc=*******,dc=com
sn: rhevm
givenName: Service
cn: Service RHEV
objectClass: inetOrgPerson
objectClass: top
objectClass: organizationalPerson
objectClass: person
uid: rhevm
description: RHEVM

As shown above, the LDAP query was run from the RHEV-M host against ldap01.******.******.******.com as the rhevm user, which is the same host and user that fail according to engine.log:

ERROR [org.ovirt.engine.core.bll.adbroker.DirectorySearcher] (DefaultQuartzScheduler_Worker-1) Failed ldap search server LDAP://ldap01.******.******.******.com:389 using user rhevm@******.COM due to Kerberos error. Please check log for further details.. We should not try the next server

Workaround at this time:
Using a different user, the domain was added successfully and the user was able to authenticate.

Version-Release number of selected component (if applicable):
rhevm-3.3.4-0.53.el6ev.noarch

How reproducible:
Seen twice with the same customer, on different environments.

Note: Both environments had an interrupted upgrade that was rerun. One was interrupted by the user (by answering "no" to a question) and the other by the error "ERROR - Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute".

I am attaching upgrade log snippets for both (m03.prod-upgrade20140723.txt and m04.prod-upgrade20140731.txt). I also have log collector output; please let me know if there is anything else I can provide or ask for.

Could this be a failure to clear the cache, or corruption of the cache?
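
For anyone triaging similar reports, here is a minimal sketch of how the JndiAction line from engine-manage-domains.log quoted above could be split into fields and its LDAP result code extracted. The function name parse_line and the small code table are illustrative assumptions, not part of any oVirt tooling; the code names are taken from RFC 4511. Result code 49 is invalidCredentials, which here wraps a GSSAPI failure and so does not by itself say which part of the Kerberos exchange went wrong.

```python
import re

# Sample line copied from the engine-manage-domains.log excerpt in this report.
SAMPLE = (
    "2014-07-23 05:22:59,849 ERROR "
    "[org.ovirt.engine.core.utils.kerberos.JndiAction] "
    "Error during login to kerberos. Detailed information is: "
    "[LDAP: error code 49 - SASL(-1): generic failure: GSSAPI Error: "
    "Unspecified GSS failure. Minor code may provide more information ()]"
)

# Timestamp, level, [logger], then the free-form message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)
LDAP_CODE_RE = re.compile(r"LDAP: error code (?P<code>\d+)")

# Small, illustrative subset of LDAP result codes (names per RFC 4511).
LDAP_RESULT_NAMES = {
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
}


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    code = LDAP_CODE_RE.search(record["msg"])
    record["ldap_code"] = int(code.group("code")) if code else None
    record["ldap_name"] = LDAP_RESULT_NAMES.get(record["ldap_code"])
    return record


if __name__ == "__main__":
    rec = parse_line(SAMPLE)
    print(rec["ts"], rec["level"], rec["logger"], rec["ldap_code"], rec["ldap_name"])
```

Lines without a full timestamp, such as the bare ERROR line quoted from engine.log, will not match this pattern and return None.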
Oved, I have attached logs for both environments. I will reach out to Zoli and see if I can get you access. Let me know if you need anything else. I have also dropped the urgency to high.
(In reply to Michael Everette from comment #7)
> Oved,
>
> I have attached logs for both environments. I will reach out to Zoli and see
> if I can get you access. Let me know if you need anything else.
>
> I have also dropped the urgency to high.

Thank you.
As for my question, did it work well by just setting another user to do the LDAP queries via the manage-domains utility?
Closing. If it happens again please reopen.