| Summary: | some trusted users are not able to logon | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Silvio Wanka <Silvio.Wanka> | ||||
| Component: | sssd | Assignee: | SSSD Maintainers <sssd-maint> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Steeve Goveas <sgoveas> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.2 | CC: | grajaiya, jhrozek, lslebodn, mkosek, mzidek, pbrezina, preichl, Silvio.Wanka | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-03-29 19:50:49 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
Well, the answer is right there in the logs: (Thu Mar 17 10:16:51 2016) [sssd[be[ipa.company.com]]] [write_pipe_handler] (0x0400): All data has been sent! (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [krb5_child_timeout] (0x0040): Timeout for child [23166] reached. In case KDC is distant or network is slow you may consider increasing value of krb5_auth_timeout. (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [krb5_auth_done] (0x0020): child timed out! (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: src/providers/krb5/krb5_auth.c: krb5_auth_done: 836 (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'dedcs1-idm02.ipa.company.com' as 'not working' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [fo_set_port_status] (0x0400): Marking port 0 of duplicate server 'dedcs1-idm02.ipa.company.com' as 'not working' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [get_server_status] (0x1000): Status of server 'dedcs1-idm01.ipa.company.com' is 'working' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [get_port_status] (0x1000): Port status of port 389 for server 'dedcs1-idm01.ipa.company.com' is 'not working' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [get_server_status] (0x1000): Status of server 'dedcs1-idm02.ipa.company.com' is 'name resolved' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [get_port_status] (0x1000): Port status of port 389 for server 'dedcs1-idm02.ipa.company.com' is 'not working' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [get_server_status] (0x1000): Status of server 'dedcs1-idm02.ipa.company.com' is 'name resolved' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [get_port_status] (0x1000): Port status of port 0 for server 'dedcs1-idm02.ipa.company.com' is 'not working' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [fo_resolve_service_send] (0x0020): No available servers for service 'IPA' (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5 (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad-company.com offline (Thu Mar 17 10:16:57 2016) [sssd[be[ipa.company.com]]] [be_mark_subdom_offline] (0x1000): Marking subdomain ad-company.com as inactive Please try increasing the timeout. > Please try increasing the timeout.
As I wrote in some posts the current IPA (sssd?) version uses the AD site informations. So there should be no timeouts because the AD DCs are in same datacenter only different subnetworks.
But I will try it and must ask which timeout in which config file?
TIA,
Silvio
(In reply to Silvio Wanka from comment #3) > As I wrote in some posts the current IPA (sssd?) version uses the AD site As I read in some ... / Sorry for this mistake What can also happen during the authentication is that the PAC responder might e contacted and start resolving SIDs that are not cached. You can try if ignore_group_members helps, see: https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-ipa-ad-trust-deployments/ (In reply to Jakub Hrozek from comment #5) > You can try if ignore_group_members helps, see: This is already configured on both IdM Servers because otherwise it takes to much time at all, from sssd.conf on both IdM Server: subdomain_inherit = ldap_user_principal, ignore_group_members ldap_user_principal = nosuchattr ignore_group_members = True in domain section. And id "user@ad-domain" works without problems also for the user which can't logon. OK, so according to the krb5_child.log, the communication to AD is really slow: (Thu Mar 17 10:16:51 2016) [[sssd[krb5_child[23166]]]] [sss_child_krb5_trace_cb] (0x4000): [23166] 1458206211.828116: Getting initial credentials for ste-die (Thu Mar 17 10:16:51 2016) [[sssd[krb5_child[23166]]]] [sss_child_krb5_trace_cb] (0x4000): [23166] 1458206211.828200: FAST armor ccache: MEMORY:/var/lib/sss/db/fast_ccache_IPA.COMPANY.COM (Thu Mar 17 10:16:51 2016) [[sssd[krb5_child[23166]]]] [sss_child_krb5_trace_cb] (0x4000): [23166] 1458206211.828243: Retrieving host/dedcs1-orat01.ipa.company.com.COM -> krb5_ccache_conf_data/fast_avail/krbtgt\/AD-COMPANY.COM\@AD-COMPANY.COM@X-CACHECONF: from MEMORY:/var/lib/sss/db/fast_ccache_IPA.COMPANY.COM with result: -1765328243/Matching credential not found (Thu Mar 17 10:16:51 2016) [[sssd[krb5_child[23166]]]] [sss_child_krb5_trace_cb] (0x4000): [23166] 1458206211.828316: Sending request (174 bytes) to AD-COMPANY.COM (Thu Mar 17 10:16:55 2016) [[sssd[krb5_child[23166]]]] [sss_child_krb5_trace_cb] (0x4000): [23166] 1458206215.240126: Resolving hostname tosdead02.ad-company.com. (Thu Mar 17 10:16:57 2016) [[sssd[krb5_child[23167]]]] [main] (0x0400): krb5_child started. 10:16:57 is when sssd_be killed the child process and another one with a different PID started. As you can see from the log, just resolving tosdead02.ad-company.com took 2 seconds.. (In reply to Jakub Hrozek from comment #7) > OK, so according to the krb5_child.log, the communication to AD is really > slow:... Hi, this is maybe a problem but then all users should not be able to logon and not only 2 of 4. And the othe question is, these logs are from IPA client, why this client resolves the name from the AD DC, as I understud any KDC communication to AD DCs will be done by the IPA master servers not by the clients. I have now simple tried "kinit ste-die" and it works (also very speedy): # klist Ticket cache: KEYRING:persistent:142603269:krb_ccache_Q1SntRv Default principal: ste-die Valid starting Expires Service principal 03/17/2016 16:31:00 03/18/2016 02:31:00 krbtgt/AD-COMPANY.COM renew until 03/18/2016 02:31:00 So what is the problem here? (In reply to Silvio Wanka from comment #8) > (In reply to Jakub Hrozek from comment #7) > > OK, so according to the krb5_child.log, the communication to AD is really > > slow:... > > Hi, > > this is maybe a problem but then all users should not be able to logon and > not only 2 of 4. > Can you check if the authentications that time out are always against the same DC? > And the othe question is, these logs are from IPA client, why this client > resolves the name from the AD DC, as I understud any KDC communication to AD > DCs will be done by the IPA master servers not by the clients. > No, authentication is always done directly to the AD DCs, only identity lookups are proxied through IPA masters. > I have now simple tried "kinit ste-die" and it works (also > very speedy): > Did > # klist > Ticket cache: KEYRING:persistent:142603269:krb_ccache_Q1SntRv > Default principal: ste-die > > Valid starting Expires Service principal > 03/17/2016 16:31:00 03/18/2016 02:31:00 > krbtgt/AD-COMPANY.COM > renew until 03/18/2016 02:31:00 > > So what is the problem here? I really don't know more than the logs tell me, sorry.. I guess it could help to prepend KRB5_TRACE=/dev/stderr before the kinit to see the same amount of debug data as we see in krb5_child. That way, maybe you could single out one DC that is acting up. Also maybe grabbing tcpdump from a slow login might show where the communication lags.. Another idea, stracing the krb5_child might show which syscall takes so long, see: https://fedorahosted.org/sssd/wiki/DevelTips#UsingstracetotracktheSSSDprocesses I have solved the problem in meantime by "sss_cache -E". IMO shows this that there is a bug in the sssd cache management. But because I can't deliver any helpfully logs anymore to find this bug you can decide to close this issue as unreproducable or what else. THX Yes, I'm afraid with logs I need to close the bug. Thank you for reporting it nonetheless and I'm glad SSSD works for you now. |
Created attachment 1137385 [details] debug_level = 0xfff0 in domain section IPA domain with trust to AD. 4 AD users are allowed to logon to a server (hbac rule from anny source to this server any services). 2 of them are able to logon, the two others not. They are member of the same external group which is member of a posix user group which is used in the hbac rule. Logon tried via SSH and on the console, always the same.