Bug 1681279
Summary: | AD user not found after establishing trust and restarting sssd | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Sergey Orlov <sorlov> | ||||||||||||||||
Component: | sssd | Assignee: | SSSD Maintainers <sssd-maint> | ||||||||||||||||
Status: | CLOSED ERRATA | QA Contact: | ipa-qe <ipa-qe> | ||||||||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||||||||
Priority: | unspecified | ||||||||||||||||||
Version: | 8.0 | CC: | abokovoy, amore, atikhono, dbula, grajaiya, jhrozek, ksiddiqu, lslebodn, mkosek, mzidek, nsoman, pbrezina, sbose, sgoveas, toneata, tscherf, wchadwic | ||||||||||||||||
Target Milestone: | rc | Keywords: | Regression, ZStream | ||||||||||||||||
Target Release: | 8.0 | ||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||
Whiteboard: | |||||||||||||||||||
Fixed In Version: | sssd-2.2.0-1.el8 | Doc Type: | If docs needed, set a value | ||||||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||||||
Clone Of: | |||||||||||||||||||
: | 1696596 (view as bug list) | Environment: | |||||||||||||||||
Last Closed: | 2019-11-05 22:34:01 UTC | Type: | Bug | ||||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||
Embargoed: | |||||||||||||||||||
Bug Depends On: | |||||||||||||||||||
Bug Blocks: | 1696596 | ||||||||||||||||||
Attachments: |
|
Description
Sergey Orlov
2019-02-25 16:33:39 UTC
Created attachment 1538510 [details]
sssd logs for successful scenario
Added attachments with log for both scenarios
Corresponding log excerpts: ***** Responder: > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [nss_getby_name] (0x0400): Input name: Administrator ... > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_search_dp] (0x0400): CR #5: Looking up [administrator] in data provider > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [sss_dp_get_account_send] (0x0400): Creating request for [ad.test][0x1][BE_REQ_USER][name=administrator:-] > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_common_process_dp_reply] (0x0040): CR #5: Data Provider Error: 3, 22, Invalid argument > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_common_process_dp_reply] (0x0400): CR #5: Due to an error we will return cached data ... > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #5: Object [administrator] was not found in cache ***** Provider: > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sbus_method_handler] (0x2000): Received D-Bus method sssd.dataprovider.getAccountInfo on /sssd ... > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_print_server] (0x2000): Searching 192.168.50.161:389 > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(cn=ipaConfig)(objectClass=ipaGuiConfig))][cn=etc,dc=testrealm,dc=test]. > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaDomainResolutionOrder] > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 12 ... > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [dp_get_account_info_send] (0x0200): Got request for [0x1][BE_REQ_USER][name=administrator] > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [dp_attach_req] (0x0400): DP Request [Account #3]: New request. Flags [0x0001]. ... > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_print_server] (0x2000): Searching 192.168.50.161:389 > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipaUserOverride)(uid=administrator))][cn=Default Trust View,cn=views,cn=accounts,dc=testrealm,dc=test]. > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 13 ... > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT] > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_op_destructor] (0x2000): Operation 13 finished > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_get_ad_override_done] (0x4000): No override found with filter [(&(objectClass=ipaUserOverride)(uid=administrator))]. > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_id_op_destroy] (0x4000): releasing operation connection > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Active > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Active > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad.test offline ^^^ Why? > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [be_mark_subdom_offline] (0x1000): Marking subdomain ad.test as inactive > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: [22]: Invalid argument. > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [22]: Invalid argument. (In reply to Alexey Tikhonov from comment #2) > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account -- this is `ipa_srv_ad_acct_lookup_step()` sending `ipa_get_ad_acct_send()` > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Active > (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad.test offline > ^^^ Why? -- this is``ipa_srv_ad_acct_lookup_done()` with error code and calling `be_mark_dom_offline()` But I do not see any clue in logs why this lookup fails. This seems to be a reason: (Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_send] (0x0400): Retrieving keytab for HOST161$@AD.TEST from host161.testrealm.test into /var/lib/sss/keytabs/ad.test.keytab3WVRKr using ccache /var/lib/sss/db/ccache_TESTREALM.TEST ... (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_timeout] (0x0020): Timeout reached for retrieving keytab from IPA server (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_recv] (0x2000): ipa-getkeytab status 110 (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_server_trust_1way_kt_done] (0x0080): ipa_getkeytab_recv failed: 1432158267 (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [check_file] (0x0400): lstat for [/var/lib/sss/keytabs/ad.test.keytab3WVRKr] failed: [2][No such file or directory]. (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_check_keytab] (0x0400): Keytab /var/lib/sss/keytabs/ad.test.keytab3WVRKr is not present (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_server_trust_1way_kt_done] (0x0080): Trying to recover and use the previous keytab, if available (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [check_file] (0x0400): lstat for [/var/lib/sss/keytabs/ad.test.keytab] failed: [2][No such file or directory]. (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_check_keytab] (0x0400): Keytab /var/lib/sss/keytabs/ad.test.keytab is not present (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_server_trust_1way_kt_done] (0x0040): Cannot use the old keytab: 2 Created attachment 1538884 [details]
strace logs for sssd restart
I attached strace logs for strace
How logs were collected:
sed -i -re "s|(ExecStart=)(.+)|\1strace -ff -o /tmp/sssd -s 255 -y \2|" /usr/lib/systemd/system/sssd.service
systemctl daemon-reload
echo vagrant | /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
systemctl restart sssd
=> Job for sssd.service failed because a timeout was exceeded.
Another observation: if I disable selinux with setenforce 0 than issue us gone, id Administrator works after sssd restart I am attaching relevant excerpt from journal and output of sealert with selinix in enforcing mode Created attachment 1538896 [details]
excerpt from journal with selinux alert
Created attachment 1538897 [details]
Output of sealert -l
Created attachment 1538908 [details]
dirsrv logs
Created attachment 1538926 [details]
audit.log
(In reply to Sergey Orlov from comment #7) > excerpt from journal with selinux alert Details of this alert are: > type=AVC msg=audit(1551194163.529:2026): avc: denied { dac_override } for pid=25664 comm="ipa-getkeytab"... > type=SYSCALL msg=audit(1551194163.529:2026): ... success=no exit=-13 ... comm="ipa-getkeytab" exe="/usr/sbin/ipa-getkeytab" ... SYSCALL=openat ... $ date --date='@1551194163.529' Tue Feb 26 16:16:03 CET 2019 Corresponding DS log: [26/Feb/2019:10:16:03.496601797 -0500] conn=40 fd=123 slot=123 connection from 192.168.50.71 to 192.168.50.71 [26/Feb/2019:10:16:03.496801083 -0500] conn=40 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin" [26/Feb/2019:10:16:03.496826125 -0500] conn=40 op=0 RESULT err=0 tag=120 nentries=0 etime=0.0000183680 [26/Feb/2019:10:16:03.501380035 -0500] conn=40 TLS1.3 128-bit AES-GCM [26/Feb/2019:10:16:03.503727247 -0500] conn=40 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI [26/Feb/2019:10:16:03.518528264 -0500] conn=40 op=1 RESULT err=14 tag=97 nentries=0 etime=0.0021523558, SASL bind in progress [26/Feb/2019:10:16:03.518961068 -0500] conn=40 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI [26/Feb/2019:10:16:03.519875302 -0500] conn=40 op=2 RESULT err=14 tag=97 nentries=0 etime=0.0000969496, SASL bind in progress [26/Feb/2019:10:16:03.520128459 -0500] conn=40 op=3 BIND dn="" method=sasl version=3 mech=GSSAPI [26/Feb/2019:10:16:03.520623373 -0500] conn=40 op=3 RESULT err=0 tag=97 nentries=0 etime=0.0000546083 dn="krbprincipalname=cifs/host71.testrealm.test,cn=services,cn=accounts,dc=testrealm,dc=test" [26/Feb/2019:10:16:03.520898015 -0500] conn=40 op=4 EXT oid="2.16.840.1.113730.3.8.10.5" name="IPA Password Manager" [26/Feb/2019:10:16:03.521557160 -0500] conn=40 op=4 RESULT err=0 tag=120 nentries=0 etime=0.0000735079 [26/Feb/2019:10:16:03.532329920 -0500] conn=40 op=5 UNBIND [26/Feb/2019:10:16:03.532366403 -0500] conn=40 op=5 fd=123 closed - U1 I guess it is somehow relevant to this issue due to comment: (In reply to Sergey Orlov from comment #6) > Another observation: if I disable selinux with setenforce 0 than issue us gone But I do not understand how. Because all of this happens *before* sssd in restarted and ipa-getkeytab fails with timeout: > Feb 26 10:16:06 host71.testrealm.test systemd[1]: Starting System Security Services Daemon... ... > Feb 26 10:16:06 host71.testrealm.test ipa-getkeytab[25681]: GSSAPI client step 1 -- this corresponds to strace log: > sendto(5<socket:[106305]>, "<39>Feb 26 10:16:08 ipa-getkeytab: GSSAPI client step 1", 55, MSG_NOSIGNAL, NULL, 0) = 55 -- and DS log: > [26/Feb/2019:10:16:06.933077478 -0500] conn=42 fd=109 slot=109 connection from 192.168.50.71 to 192.168.50.71 > [26/Feb/2019:10:16:06.933299133 -0500] conn=42 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin" > [26/Feb/2019:10:16:06.933331509 -0500] conn=42 op=0 RESULT err=0 tag=120 nentries=0 etime=0.0000210699 > [26/Feb/2019:10:16:06.938048380 -0500] conn=42 TLS1.3 128-bit AES-GCM ... note 5 secs timeout here > [26/Feb/2019:10:16:11.942641852 -0500] conn=42 op=-1 fd=109 closed - B1 What ipa-getkeytab process is busy before being terminated is: > connect(10<socket:[106316]>, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = 0 > fstat(10<socket:[106316]>, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > sendto(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > sendto(10<socket:[106316]>, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}]) > read(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}]) > read(10<socket:[106316]>, "\1\0\0\0", 4) = 4 > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > sendto(10<socket:[106316]>, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > sendto(10<socket:[106316]>, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000 <unfinished ...>) = ? > +++ killed by SIGKILL +++ Why ipa-getkeytab calls NSS API and why is it blocked? (In reply to Alexey Tikhonov from comment #11) > Why ipa-getkeytab calls NSS API and why is it blocked? I do not have sssd logs corresponding to strace log, but relevant part of responder log from previous run seems to be: > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected! > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1]. > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1]. > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input ID: 0 > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #1: Setting "User by ID" plugin > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #1: New request 'User by ID' > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd: > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [get_client_cred] (0x4000): Client creds: euid[995] egid[992] pid[6682]. > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [setup_client_idle_timer] (0x4000): Idle timer re-set for client [0x5626b1959b00][30] > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected! > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1]. > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1]. > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [nss_getby_name] (0x0400): Input name: root > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #2: Setting "User by name" plugin > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #2: New request 'User by name' > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_process_input] (0x0400): CR #2: Parsing input name [root] > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_parse_inp_send] (0x0200): Requesting info for [(null)] from [<ALL>] > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd: ***** take a note of 5 secs timeout -- `ipa-getkeytab` fails in the same time frame > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching. May it be some kind of loop (subdomain -> getketab -> nss ->subdomain), as Jakub supposed? (In reply to Alexey Tikhonov from comment #11) > (In reply to Sergey Orlov from comment #7) > > excerpt from journal with selinux alert > > Details of this alert are: > > > type=AVC msg=audit(1551194163.529:2026): avc: denied { dac_override } for pid=25664 comm="ipa-getkeytab"... > > type=SYSCALL msg=audit(1551194163.529:2026): ... success=no exit=-13 ... comm="ipa-getkeytab" exe="/usr/sbin/ipa-getkeytab" ... SYSCALL=openat ... This should have been fixed with SELinux policy changes in selinux-policy-3.14.1-56 (bug 1668168) already. > Because all of this happens *before* sssd in restarted and ipa-getkeytab > fails with timeout: > > > Feb 26 10:16:06 host71.testrealm.test systemd[1]: Starting System Security Services Daemon... > ... > > Feb 26 10:16:06 host71.testrealm.test ipa-getkeytab[25681]: GSSAPI client step 1 > > -- this corresponds to strace log: > > sendto(5<socket:[106305]>, "<39>Feb 26 10:16:08 ipa-getkeytab: GSSAPI client step 1", 55, MSG_NOSIGNAL, NULL, 0) = 55 > > -- and DS log: > > [26/Feb/2019:10:16:06.933077478 -0500] conn=42 fd=109 slot=109 connection from 192.168.50.71 to 192.168.50.71 > > [26/Feb/2019:10:16:06.933299133 -0500] conn=42 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin" > > [26/Feb/2019:10:16:06.933331509 -0500] conn=42 op=0 RESULT err=0 tag=120 nentries=0 etime=0.0000210699 > > [26/Feb/2019:10:16:06.938048380 -0500] conn=42 TLS1.3 128-bit AES-GCM > ... note 5 secs timeout here > > [26/Feb/2019:10:16:11.942641852 -0500] conn=42 op=-1 fd=109 closed - B1 > > > > What ipa-getkeytab process is busy before being terminated is: > > connect(10<socket:[106316]>, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = 0 > > fstat(10<socket:[106316]>, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 > > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > > sendto(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 > > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > > sendto(10<socket:[106316]>, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 > > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}]) > > read(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 > > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}]) > > read(10<socket:[106316]>, "\1\0\0\0", 4) = 4 > > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > > sendto(10<socket:[106316]>, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 > > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}]) > > sendto(10<socket:[106316]>, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 > > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000 <unfinished ...>) = ? > > +++ killed by SIGKILL +++ > > > Why ipa-getkeytab calls NSS API and why is it blocked? libkrb5 uses NSS API calls to resolve default ccache or default keytab location if its name template contains references to user names or uids. (In reply to Alexey Tikhonov from comment #11) > I guess it is somehow relevant to this issue due to comment: > > Another observation: if I disable selinux with setenforce 0 than issue us gone > But I do not understand how. As Alexander explained: ``` this is our trust fetch domains calling oddjobd helper and then the helper is retrieving TDO creds to talk to AD DC this is should be fixed by selinux update so logs Sergey provided aren't up to date we need to update the vm and then re-run the test because then you'll have a keytab saved by oddjobd helper already and sssd will not fetch the keytab again ``` So when selinux is disabled, keytab is pre-fetched by oddjobd helper and sssd doesn't fetch it. This is how disabling selinux resolves issue. But question "why sssd fails to fetch keytab" is open. As Alexander pointed out: ``` you may want to add one more CPU to VM and see if that helps because this might be similar to one contention lock issue we saw in past between LDAP and Kerberos ``` (In reply to Alexey Tikhonov from comment #12) > > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #1: Setting "User by ID" plugin > > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #1: New request 'User by ID' > > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd: ... > > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #2: Setting "User by name" plugin > > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #2: New request 'User by name' > > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd: > ***** take a note of 5 secs timeout -- `ipa-getkeytab` fails in the same time frame > > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching. > > May it be some kind of loop (subdomain -> getketab -> nss ->subdomain), as Jakub supposed? As Pavel explained: "Chaining request" means there is already an outstanding request and when it is finished the new request will reuse its result without issuing another dbus call. The cache req plugins in this log are different, but both of them first run this getDomains() method to refresh list of domains and these methods are the same. And this log actually means there is *already* outstanding getDomains() happened before this part of log. So yes, it looks like loop as Jakub told. I think there are two issues. Looking at comment #12: > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input ID: 0 > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #1: Setting "User by ID" plugin > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #1: New request 'User by ID' > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd: > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [get_client_cred] (0x4000): Client creds: euid[995] egid[992] pid[6682]. > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [setup_client_idle_timer] (0x4000): Idle timer re-set for client [0x5626b1959b00][30] Here we see getpwuid(0) triggering getDomains. This is wrong, the ID 0 and name root should just return ENOENT before issuing getDomains. I think this one should be easy to fix. But separately, Kaleem also pinged me about this issue and I think in his case I see even a different problem: Here we see that ipa-getkeytab is starting. This is during the subdomains request: (Wed Feb 27 06:14:44 2019) [sssd[be[trustcli16.test]]] [ipa_getkeytab_send] (0x0400): Retrieving keytab for TRUSTCLI16$@DTREE.QE from host-8-245-208.trustcli 16.test into /var/lib/sss/keytabs/dtree.qe.keytab2FjVce using ccache /var/lib/sss/db/ccache_TRUSTCLI16.TEST At the same time, there is a request for user ipaapi in the NSS log (getpwnam(ipaapi): (Wed Feb 27 06:14:44 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #0: Setting "User by name" plugin (Wed Feb 27 06:14:44 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #0: New request 'User by name' (Wed Feb 27 06:14:44 2019) [sssd[nss]] [cache_req_process_input] (0x0400): CR #0: Parsing input name [ipaapi] (Wed Feb 27 06:14:44 2019) [sssd[nss]] [sss_parse_inp_send] (0x0200): Requesting info for [(null)] from [<ALL>] (Wed Feb 27 06:14:44 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain trustcli16.test is Active (Wed Feb 27 06:14:44 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd: And this request chains to the getDomains which triggered the NSS call in the first place. I'm not sure how to solve the second issue. Since the ipaapi user in this case exists, in theory it might help to make sure lookups agains the files domain do not trigger getDomains. This would help for entries that do exist. But I'm worried that if there was a request for a non-existent user for one reason or another, then this would go to the other domains and trigger the getDomains as well. I actually wonder if it makes sense to wait for chained getDomains at all, at least for non-qualified or numerical inputs. Because then any lookup performed by any part of the chain above sssd's subdomains provider (ipa-getkeytab, ipa, kdc itself) would just chain itself to getDomains and never finish. I would welcome other opinions. From my point of view, it doesn't matter what was the trigger of `ipa-getkeytab`. If `ipa-getkeytab` leads to getDomains() call (that can't be completed before `ipa-getkeytab` is finished) then it is already self-locked. Doesn't matter if it was first or chained getDomains() request. And if this assumption is correct than addition of cpu cores wont help. I executed scenario with two CPUs -- it does not change result, id command still fails. After updating selinux-policy to 3.14.1-61 the issue is gone, id command succeeds after restarting the sssd. (In reply to Sergey Orlov from comment #18) > I executed scenario with two CPUs -- it does not change result, id command > still fails. > > After updating selinux-policy to 3.14.1-61 the issue is gone, id command > succeeds after restarting the sssd. Because the oddjob helper fetches the keytab, then sssd's fetch still fails, but if sssd's ipa-getkeytab fails, then sssd checks if the keytab already exists and has the required principals and tries to use them. So this helps mask the sssd issue. Upstream ticket: https://pagure.io/SSSD/sssd/issue/3971 Upstream ticket: https://pagure.io/SSSD/sssd/issue/3972 In conversation with Alexander we found out that when ipa trust-add is called with --oneway=True it does not prefetch AD keytab. Related code is https://pagure.io/freeipa/blob/master/f/ipaserver/plugins/trust.py#_761 Alexander: "it is actually intended behavior right now and sssd should not depend on the keytab being produced by the helper" It is the other way around -- we don't call to oddjobd helper in a case of two-way trust (as we have cross-realm TGT there). We'll get this fixed eventually and all situations will be handled with a prefetch but right now this means for two-way trust you can reproduce SSSD issue. (In reply to Alexander Bokovoy from comment #23) > It is the other way around -- we don't call to oddjobd helper in a case of > two-way trust (as we have cross-realm TGT there). We'll get this fixed > eventually and all situations will be handled with a prefetch but right now > this means for two-way trust you can reproduce SSSD issue. OK, this sort of raises the severity of the issue. btw do we not test the two-way trust? I really wonder why are we finding this bug now.. Somehow I mixed one-way and two-way trusts... So, for future verification: this command ipa trust-add ipaad2012r2.test --admin Administrator --password --two-way=True does not prefetch keytab and can be used to check if subsequent calls to id/getent cause sssd to fetch keytab Also for verification we can manually delete keytab file from /var/lib/sssd/keytabs after establishing one-way trust. * master: d409c10 * sssd-1-16: b927dc7 Verified Using Version: ipa-server-4.8.0-8.module+el8.1.0+3977+ec23ef34.x86_64 sssd-ipa-2.2.0-11.el8.x86_64 This needs sanity check only : Adding test-run of trust-test suite. ---------------------------------------------------------- :: [ 14:37:17 ] :: [ BEGIN ] :: Running 'echo Secret123 | ipa trust-add ipaad2k16cin.test --admin Administrator --range-type=ipa-ad-trust --password --two-way=True' *** Current Time: Fri Aug 16 14:37:20 2019 Localwatchdog at: Sat Aug 17 13:51:19 2019 ---------------------------------------------------------- Added Active Directory trust for realm "ipaad2k16cin.test" ---------------------------------------------------------- Realm name: ipaad2k16cin.test Domain NetBIOS name: IPAAD2K16CIN Domain Security Identifier: S-1-5-21-2842256260-195550463-1751006347 Trust direction: Two-way trust Trust type: Active Directory domain Trust status: Established and verified :: [ 14:37:22 ] :: [ PASS ] :: Command 'echo Secret123 | ipa trust-add ipaad2k16cin.test --admin Administrator --range-type=ipa-ad-trust --password --two-way=True' (Expected 0, got 0) :: [ 14:37:22 ] :: [ BEGIN ] :: Running 'systemctl stop sssd; rm -rf /var/lib/sss/{db,mc}/*; systemctl start sssd; sleep 60' *** Current Time: Fri Aug 16 14:37:23 2019 Localwatchdog at: Sat Aug 17 13:51:23 2019 *** Current Time: Fri Aug 16 14:38:20 2019 Localwatchdog at: Sat Aug 17 13:51:19 2019 :: [ 14:38:22 ] :: [ PASS ] :: Command 'systemctl stop sssd; rm -rf /var/lib/sss/{db,mc}/*; systemctl start sssd; sleep 60' (Expected 0, got 0) :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: RHEL7 ipa adtrust install fails if netbios hostname longer than 15 characters, bz1030517 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [ 14:38:22 ] :: [ BEGIN ] :: Running 'sleep 60' *** Current Time: Fri Aug 16 14:38:23 2019 Localwatchdog at: Sat Aug 17 13:51:23 2019 *** Current Time: Fri Aug 16 14:39:20 2019 Localwatchdog at: Sat Aug 17 13:51:19 2019 :: [ 14:39:22 ] :: [ PASS ] :: Command 'sleep 60' (Expected 0, got 0) :: [ 14:39:22 ] :: [ LOG ] :: Hostname longer than 15 chars, checking for bug :: [ 14:39:23 ] :: [ BEGIN ] :: Running 'id Administrator' 2019-08-16T18:39:23+0000 [ci-vm-10-0-153-183.h] uid=879000500(administrator) gid=879000500(administrator) groups=879000500(administrator),879000518(schema admins),879000519(enterprise admins),879000512(domain admins),879000513(domain users),879000520(group policy creator owners) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:3651 |