RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1681279 - AD user not found after establishing trust and restarting sssd
Summary: AD user not found after establishing trust and restarting sssd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: 8.0
Assignee: SSSD Maintainers
QA Contact: ipa-qe
URL:
Whiteboard:
Depends On:
Blocks: 1696596
TreeView+ depends on / blocked
 
Reported: 2019-02-25 16:33 UTC by Sergey Orlov
Modified: 2020-05-02 19:08 UTC (History)
17 users (show)

Fixed In Version: sssd-2.2.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1696596 (view as bug list)
Environment:
Last Closed: 2019-11-05 22:34:01 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
sssd logs for failing scenario (258.72 KB, application/zip)
2019-02-25 16:33 UTC, Sergey Orlov
no flags Details
sssd logs for successful scenario (165.56 KB, application/zip)
2019-02-25 16:37 UTC, Sergey Orlov
no flags Details
strace logs for sssd restart (317.31 KB, application/zip)
2019-02-26 15:45 UTC, Sergey Orlov
no flags Details
excerpt from journal with selinux alert (9.53 KB, text/plain)
2019-02-26 15:57 UTC, Sergey Orlov
no flags Details
Output of sealert -l (2.80 KB, text/plain)
2019-02-26 15:58 UTC, Sergey Orlov
no flags Details
dirsrv logs (225.36 KB, application/zip)
2019-02-26 16:45 UTC, Sergey Orlov
no flags Details
audit.log (726.12 KB, text/plain)
2019-02-26 17:40 UTC, Sergey Orlov
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 4944 0 None open Retrieving UID 0 or name root should not trigger getDomains 2020-09-29 19:20:04 UTC
Github SSSD sssd issues 4945 0 None closed Circular dependency between subdomains update and NSS responder invoking getDomains 2020-09-29 19:20:04 UTC
Github SSSD sssd issues 4964 0 None closed ipa-getkeytab can call NSS operation which might deadlock the subdomains request 2020-09-29 19:20:04 UTC
Red Hat Product Errata RHSA-2019:3651 0 None None None 2019-11-05 22:34:15 UTC

Description Sergey Orlov 2019-02-25 16:33:39 UTC
Created attachment 1538497 [details]
sssd logs for failing scenario

Description of problem:
After establishing trust with Windows AD and restarting sssd the query for AD user fails.

Version-Release number of selected component (if applicable):
sssd-common-2.0.0-43.el8.x86_64

How reproducible:
stable


Steps to Reproduce:
1. /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
2. systemctl restart sssd
3. repeat "id Administrator" for 200 seconds

Actual results:
id: ‘Administrator’: no such user,

Expected results:
User Administrator properties

Additional info:

If we reduce scenario to two steps:
1. /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
2. repeat "id Administrator" for 200 seconds
then we get expected result.

This BZ is a foollow up of https://bugzilla.redhat.com/show_bug.cgi?id=1659498

Comment 1 Sergey Orlov 2019-02-25 16:37:46 UTC
Created attachment 1538510 [details]
sssd logs for successful scenario

Added attachments with log for both scenarios

Comment 2 Alexey Tikhonov 2019-02-26 12:10:51 UTC
Corresponding log excerpts:

 ***** Responder:
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [nss_getby_name] (0x0400): Input name: Administrator
...
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_search_dp] (0x0400): CR #5: Looking up [administrator] in data provider
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [sss_dp_get_account_send] (0x0400): Creating request for [ad.test][0x1][BE_REQ_USER][name=administrator:-]
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_common_process_dp_reply] (0x0040): CR #5: Data Provider Error: 3, 22, Invalid argument
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_common_process_dp_reply] (0x0400): CR #5: Due to an error we will return cached data
...
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [cache_req_search_cache] (0x0400): CR #5: Object [administrator] was not found in cache


 ***** Provider:
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sbus_method_handler] (0x2000): Received D-Bus method sssd.dataprovider.getAccountInfo on /sssd
...
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_print_server] (0x2000): Searching 192.168.50.161:389
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(cn=ipaConfig)(objectClass=ipaGuiConfig))][cn=etc,dc=testrealm,dc=test].
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipaDomainResolutionOrder]
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 12
...
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [dp_get_account_info_send] (0x0200): Got request for [0x1][BE_REQ_USER][name=administrator]
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [dp_attach_req] (0x0400): DP Request [Account #3]: New request. Flags [0x0001].
...
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_print_server] (0x2000): Searching 192.168.50.161:389
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipaUserOverride)(uid=administrator))][cn=Default Trust View,cn=views,cn=accounts,dc=testrealm,dc=test].
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 13
...
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_op_destructor] (0x2000): Operation 13 finished
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_get_ad_override_done] (0x4000): No override found with filter [(&(objectClass=ipaUserOverride)(uid=administrator))].
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sdap_id_op_destroy] (0x4000): releasing operation connection
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Active
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Active
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad.test offline
^^^ Why?
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [be_mark_subdom_offline] (0x1000): Marking subdomain ad.test as inactive
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: [22]: Invalid argument.
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [22]: Invalid argument.

Comment 3 Alexey Tikhonov 2019-02-26 12:47:53 UTC
(In reply to Alexey Tikhonov from comment #2)
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account
  --  this is `ipa_srv_ad_acct_lookup_step()` sending `ipa_get_ad_acct_send()`

> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Active
> (Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad.test offline
> ^^^ Why?
  -- this is``ipa_srv_ad_acct_lookup_done()` with error code and calling `be_mark_dom_offline()`
But I do not see any clue in logs why this lookup fails.

Comment 4 Alexey Tikhonov 2019-02-26 13:30:56 UTC
This seems to be a reason:

(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_send] (0x0400): Retrieving keytab for HOST161$@AD.TEST from host161.testrealm.test into /var/lib/sss/keytabs/ad.test.keytab3WVRKr using ccache /var/lib/sss/db/ccache_TESTREALM.TEST
...
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_timeout] (0x0020): Timeout reached for retrieving keytab from IPA server
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_recv] (0x2000): ipa-getkeytab status 110
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_server_trust_1way_kt_done] (0x0080): ipa_getkeytab_recv failed: 1432158267
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [check_file] (0x0400): lstat for [/var/lib/sss/keytabs/ad.test.keytab3WVRKr] failed: [2][No such file or directory].
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_check_keytab] (0x0400): Keytab /var/lib/sss/keytabs/ad.test.keytab3WVRKr is not present
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_server_trust_1way_kt_done] (0x0080): Trying to recover and use the previous keytab, if available
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [check_file] (0x0400): lstat for [/var/lib/sss/keytabs/ad.test.keytab] failed: [2][No such file or directory].
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_check_keytab] (0x0400): Keytab /var/lib/sss/keytabs/ad.test.keytab is not present
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_server_trust_1way_kt_done] (0x0040): Cannot use the old keytab: 2

Comment 5 Sergey Orlov 2019-02-26 15:45:47 UTC
Created attachment 1538884 [details]
strace logs for sssd restart

I attached strace logs for strace

How logs were collected:
sed -i -re "s|(ExecStart=)(.+)|\1strace -ff -o /tmp/sssd -s 255 -y \2|" /usr/lib/systemd/system/sssd.service
systemctl daemon-reload
echo vagrant | /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
systemctl restart sssd
=>  Job for sssd.service failed because a timeout was exceeded.

Comment 6 Sergey Orlov 2019-02-26 15:52:38 UTC
Another observation: if I disable selinux with setenforce 0 than issue us gone, id Administrator works after sssd restart
I am attaching relevant excerpt from journal and output of sealert with selinix in enforcing mode

Comment 7 Sergey Orlov 2019-02-26 15:57:35 UTC
Created attachment 1538896 [details]
excerpt from journal with selinux alert

Comment 8 Sergey Orlov 2019-02-26 15:58:47 UTC
Created attachment 1538897 [details]
Output of sealert -l

Comment 9 Sergey Orlov 2019-02-26 16:45:50 UTC
Created attachment 1538908 [details]
dirsrv logs

Comment 10 Sergey Orlov 2019-02-26 17:40:44 UTC
Created attachment 1538926 [details]
audit.log

Comment 11 Alexey Tikhonov 2019-02-26 18:17:37 UTC
(In reply to Sergey Orlov from comment #7)
> excerpt from journal with selinux alert

Details of this alert are:

> type=AVC msg=audit(1551194163.529:2026): avc:  denied  { dac_override } for  pid=25664 comm="ipa-getkeytab"...
> type=SYSCALL msg=audit(1551194163.529:2026): ... success=no exit=-13 ... comm="ipa-getkeytab" exe="/usr/sbin/ipa-getkeytab" ... SYSCALL=openat ...

$ date --date='@1551194163.529'
Tue Feb 26 16:16:03 CET 2019

Corresponding DS log:
[26/Feb/2019:10:16:03.496601797 -0500] conn=40 fd=123 slot=123 connection from 192.168.50.71 to 192.168.50.71
[26/Feb/2019:10:16:03.496801083 -0500] conn=40 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin"
[26/Feb/2019:10:16:03.496826125 -0500] conn=40 op=0 RESULT err=0 tag=120 nentries=0 etime=0.0000183680
[26/Feb/2019:10:16:03.501380035 -0500] conn=40 TLS1.3 128-bit AES-GCM
[26/Feb/2019:10:16:03.503727247 -0500] conn=40 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
[26/Feb/2019:10:16:03.518528264 -0500] conn=40 op=1 RESULT err=14 tag=97 nentries=0 etime=0.0021523558, SASL bind in progress
[26/Feb/2019:10:16:03.518961068 -0500] conn=40 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
[26/Feb/2019:10:16:03.519875302 -0500] conn=40 op=2 RESULT err=14 tag=97 nentries=0 etime=0.0000969496, SASL bind in progress
[26/Feb/2019:10:16:03.520128459 -0500] conn=40 op=3 BIND dn="" method=sasl version=3 mech=GSSAPI
[26/Feb/2019:10:16:03.520623373 -0500] conn=40 op=3 RESULT err=0 tag=97 nentries=0 etime=0.0000546083 dn="krbprincipalname=cifs/host71.testrealm.test,cn=services,cn=accounts,dc=testrealm,dc=test"
[26/Feb/2019:10:16:03.520898015 -0500] conn=40 op=4 EXT oid="2.16.840.1.113730.3.8.10.5" name="IPA Password Manager"
[26/Feb/2019:10:16:03.521557160 -0500] conn=40 op=4 RESULT err=0 tag=120 nentries=0 etime=0.0000735079
[26/Feb/2019:10:16:03.532329920 -0500] conn=40 op=5 UNBIND
[26/Feb/2019:10:16:03.532366403 -0500] conn=40 op=5 fd=123 closed - U1



I guess it is somehow relevant to this issue due to comment:

(In reply to Sergey Orlov from comment #6)
> Another observation: if I disable selinux with setenforce 0 than issue us gone

But I do not understand how.


Because all of this happens *before* sssd in restarted and ipa-getkeytab fails with timeout:

> Feb 26 10:16:06 host71.testrealm.test systemd[1]: Starting System Security Services Daemon...
...
> Feb 26 10:16:06 host71.testrealm.test ipa-getkeytab[25681]: GSSAPI client step 1

  --  this corresponds to strace log:
> sendto(5<socket:[106305]>, "<39>Feb 26 10:16:08 ipa-getkeytab: GSSAPI client step 1", 55, MSG_NOSIGNAL, NULL, 0) = 55

  --  and DS log:
> [26/Feb/2019:10:16:06.933077478 -0500] conn=42 fd=109 slot=109 connection from 192.168.50.71 to 192.168.50.71
> [26/Feb/2019:10:16:06.933299133 -0500] conn=42 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin"
> [26/Feb/2019:10:16:06.933331509 -0500] conn=42 op=0 RESULT err=0 tag=120 nentries=0 etime=0.0000210699
> [26/Feb/2019:10:16:06.938048380 -0500] conn=42 TLS1.3 128-bit AES-GCM
 ... note 5 secs timeout here
> [26/Feb/2019:10:16:11.942641852 -0500] conn=42 op=-1 fd=109 closed - B1



What ipa-getkeytab process is busy before being terminated is:
> connect(10<socket:[106316]>, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = 0
> fstat(10<socket:[106316]>, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> sendto(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
> poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> sendto(10<socket:[106316]>, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
> poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}])
> read(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
> poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}])
> read(10<socket:[106316]>, "\1\0\0\0", 4) = 4
> poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> sendto(10<socket:[106316]>, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
> poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> sendto(10<socket:[106316]>, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
> poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000 <unfinished ...>) = ?
> +++ killed by SIGKILL +++


Why ipa-getkeytab calls NSS API and why is it blocked?

Comment 12 Alexey Tikhonov 2019-02-26 19:19:58 UTC
(In reply to Alexey Tikhonov from comment #11)

> Why ipa-getkeytab calls NSS API and why is it blocked?

I do not have sssd logs corresponding to strace log, but relevant part of responder log from previous run seems to be:

> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input ID: 0
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #1: Setting "User by ID" plugin
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #1: New request 'User by ID'
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd:
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [get_client_cred] (0x4000): Client creds: euid[995] egid[992] pid[6682].
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [setup_client_idle_timer] (0x4000): Idle timer re-set for client [0x5626b1959b00][30]
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [accept_fd_handler] (0x0400): Client connected!
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [nss_getby_name] (0x0400): Input name: root
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #2: Setting "User by name" plugin
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #2: New request 'User by name'
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_process_input] (0x0400): CR #2: Parsing input name [root]
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_parse_inp_send] (0x0200): Requesting info for [(null)] from [<ALL>]
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd:
 ***** take a note of 5 secs timeout  --  `ipa-getkeytab` fails in the same time frame
> (Mon Feb 25 09:22:43 2019) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching.


May it be some kind of loop (subdomain -> getketab -> nss ->subdomain), as Jakub supposed?

Comment 13 Alexander Bokovoy 2019-02-26 19:31:19 UTC
(In reply to Alexey Tikhonov from comment #11)
> (In reply to Sergey Orlov from comment #7)
> > excerpt from journal with selinux alert
> 
> Details of this alert are:
> 
> > type=AVC msg=audit(1551194163.529:2026): avc:  denied  { dac_override } for  pid=25664 comm="ipa-getkeytab"...
> > type=SYSCALL msg=audit(1551194163.529:2026): ... success=no exit=-13 ... comm="ipa-getkeytab" exe="/usr/sbin/ipa-getkeytab" ... SYSCALL=openat ...

This should have been fixed with SELinux policy changes in selinux-policy-3.14.1-56 (bug 1668168) already.


> Because all of this happens *before* sssd in restarted and ipa-getkeytab
> fails with timeout:
> 
> > Feb 26 10:16:06 host71.testrealm.test systemd[1]: Starting System Security Services Daemon...
> ...
> > Feb 26 10:16:06 host71.testrealm.test ipa-getkeytab[25681]: GSSAPI client step 1
> 
>   --  this corresponds to strace log:
> > sendto(5<socket:[106305]>, "<39>Feb 26 10:16:08 ipa-getkeytab: GSSAPI client step 1", 55, MSG_NOSIGNAL, NULL, 0) = 55
> 
>   --  and DS log:
> > [26/Feb/2019:10:16:06.933077478 -0500] conn=42 fd=109 slot=109 connection from 192.168.50.71 to 192.168.50.71
> > [26/Feb/2019:10:16:06.933299133 -0500] conn=42 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin"
> > [26/Feb/2019:10:16:06.933331509 -0500] conn=42 op=0 RESULT err=0 tag=120 nentries=0 etime=0.0000210699
> > [26/Feb/2019:10:16:06.938048380 -0500] conn=42 TLS1.3 128-bit AES-GCM
>  ... note 5 secs timeout here
> > [26/Feb/2019:10:16:11.942641852 -0500] conn=42 op=-1 fd=109 closed - B1
> 
> 
> 
> What ipa-getkeytab process is busy before being terminated is:
> > connect(10<socket:[106316]>, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = 0
> > fstat(10<socket:[106316]>, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> > sendto(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
> > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> > sendto(10<socket:[106316]>, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
> > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}])
> > read(10<socket:[106316]>, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
> > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000) = 1 ([{fd=10, revents=POLLIN}])
> > read(10<socket:[106316]>, "\1\0\0\0", 4) = 4
> > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> > sendto(10<socket:[106316]>, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
> > poll([{fd=10<socket:[106316]>, events=POLLOUT}], 1, 300000) = 1 ([{fd=10, revents=POLLOUT}])
> > sendto(10<socket:[106316]>, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
> > poll([{fd=10<socket:[106316]>, events=POLLIN}], 1, 300000 <unfinished ...>) = ?
> > +++ killed by SIGKILL +++
> 
> 
> Why ipa-getkeytab calls NSS API and why is it blocked?
libkrb5 uses NSS API calls to resolve default ccache or default keytab location if its name template contains references to user names or uids.

Comment 14 Alexey Tikhonov 2019-02-26 20:05:20 UTC
(In reply to Alexey Tikhonov from comment #11)

> I guess it is somehow relevant to this issue due to comment:
> > Another observation: if I disable selinux with setenforce 0 than issue us gone
> But I do not understand how.


As Alexander explained:
```
this is our trust fetch domains calling oddjobd helper and then the helper is retrieving TDO creds to talk to AD DC
this is should be fixed by selinux update
so logs Sergey provided aren't up to date
we need to update the vm and then re-run the test
because then you'll have a keytab saved by oddjobd helper already
and sssd will not fetch the keytab again
```

So when selinux is disabled, keytab is pre-fetched by oddjobd helper and sssd doesn't fetch it. This is how disabling selinux resolves issue.


But question "why sssd fails to fetch keytab" is open.

As Alexander pointed out:
```
you may want to add one more CPU to VM and see if that helps
because this might be similar to one contention lock issue we saw in past between LDAP and Kerberos
```

Comment 15 Alexey Tikhonov 2019-02-27 11:28:59 UTC
(In reply to Alexey Tikhonov from comment #12)
> > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #1: Setting "User by ID" plugin
> > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #1: New request 'User by ID'
> > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd:
...
> > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #2: Setting "User by name" plugin
> > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #2: New request 'User by name'
> > (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd:
>  ***** take a note of 5 secs timeout  --  `ipa-getkeytab` fails in the same time frame
> > (Mon Feb 25 09:22:43 2019) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching.
> 
> May it be some kind of loop (subdomain -> getketab -> nss ->subdomain), as Jakub supposed?


As Pavel explained:
"Chaining request" means there is already an outstanding request and when it is finished the new request will reuse its result without issuing another dbus call.
The cache req plugins in this log are different, but both of them first run this getDomains() method to refresh list of domains and these methods  are the  same.
And this log actually means there is *already* outstanding getDomains() happened before this part of log.

So yes, it looks like loop as Jakub told.

Comment 16 Jakub Hrozek 2019-02-27 11:52:02 UTC
I think there are two issues. Looking at comment #12:

> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input ID: 0
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #1: Setting "User by ID" plugin
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #1: New request 'User by ID'
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd:
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [get_client_cred] (0x4000): Client creds: euid[995] egid[992] pid[6682].
> (Mon Feb 25 09:22:38 2019) [sssd[nss]] [setup_client_idle_timer] (0x4000): Idle timer re-set for client [0x5626b1959b00][30]

Here we see getpwuid(0) triggering getDomains. This is wrong, the ID 0 and name root should just return ENOENT before issuing getDomains. I think this one should be easy to fix.

But separately, Kaleem also pinged me about this issue and I think in his case I see even a different problem:

Here we see that ipa-getkeytab is starting. This is during the subdomains request:
(Wed Feb 27 06:14:44 2019) [sssd[be[trustcli16.test]]] [ipa_getkeytab_send] (0x0400): Retrieving keytab for TRUSTCLI16$@DTREE.QE from host-8-245-208.trustcli
16.test into /var/lib/sss/keytabs/dtree.qe.keytab2FjVce using ccache /var/lib/sss/db/ccache_TRUSTCLI16.TEST

At the same time, there is a request for user ipaapi in the NSS log (getpwnam(ipaapi):
(Wed Feb 27 06:14:44 2019) [sssd[nss]] [cache_req_set_plugin] (0x2000): CR #0: Setting "User by name" plugin
(Wed Feb 27 06:14:44 2019) [sssd[nss]] [cache_req_send] (0x0400): CR #0: New request 'User by name'
(Wed Feb 27 06:14:44 2019) [sssd[nss]] [cache_req_process_input] (0x0400): CR #0: Parsing input name [ipaapi]
(Wed Feb 27 06:14:44 2019) [sssd[nss]] [sss_parse_inp_send] (0x0200): Requesting info for [(null)] from [<ALL>]
(Wed Feb 27 06:14:44 2019) [sssd[nss]] [sss_domain_get_state] (0x1000): Domain trustcli16.test is Active
(Wed Feb 27 06:14:44 2019) [sssd[nss]] [sbus_requests_add] (0x4000): Chaining request: -:0:sssd.dataprovider.getDomains:/sssd:

And this request chains to the getDomains which triggered the NSS call in the first place.

I'm not sure how to solve the second issue. Since the ipaapi user in this case exists, in theory it might help to make sure lookups agains the files domain do not trigger getDomains. This would help for entries that do exist. But I'm worried that if there was a request for a non-existent user for one reason or another, then this would go to the other domains and trigger the getDomains as well.

I actually wonder if it makes sense to wait for chained getDomains at all, at least for non-qualified or numerical inputs. Because then any lookup performed by any part of the chain above sssd's subdomains provider (ipa-getkeytab, ipa, kdc itself) would just chain itself to getDomains and never finish.

I would welcome other opinions.

Comment 17 Alexey Tikhonov 2019-02-27 12:20:27 UTC
From my point of view, it doesn't matter what was the trigger of `ipa-getkeytab`.
If `ipa-getkeytab` leads to getDomains() call (that can't be completed before `ipa-getkeytab` is finished) then it is already self-locked. Doesn't matter if it was first or chained getDomains() request.
And if this assumption is correct than addition of cpu cores wont help.

Comment 18 Sergey Orlov 2019-02-27 14:21:13 UTC
I executed scenario with two CPUs -- it does not change result, id command still fails.
 
After updating selinux-policy to 3.14.1-61 the issue is gone, id command succeeds after restarting the sssd.

Comment 19 Jakub Hrozek 2019-02-27 14:47:21 UTC
(In reply to Sergey Orlov from comment #18)
> I executed scenario with two CPUs -- it does not change result, id command
> still fails.
>  
> After updating selinux-policy to 3.14.1-61 the issue is gone, id command
> succeeds after restarting the sssd.

Because the oddjob helper fetches the keytab, then sssd's fetch still fails, but if sssd's ipa-getkeytab fails, then sssd checks if the keytab already exists and has the required principals and tries to use them.

So this helps mask the sssd issue.

Comment 20 Jakub Hrozek 2019-02-27 16:14:39 UTC
Upstream ticket:
https://pagure.io/SSSD/sssd/issue/3971

Comment 21 Jakub Hrozek 2019-02-27 16:15:08 UTC
Upstream ticket:
https://pagure.io/SSSD/sssd/issue/3972

Comment 22 Sergey Orlov 2019-03-05 16:23:07 UTC
In conversation with Alexander we found out that when ipa trust-add is called with --oneway=True it does not prefetch AD keytab. 
Related code is https://pagure.io/freeipa/blob/master/f/ipaserver/plugins/trust.py#_761

Alexander: "it is actually intended behavior right now and sssd should not depend on the keytab being produced by the helper"

Comment 23 Alexander Bokovoy 2019-03-05 16:48:04 UTC
It is the other way around -- we don't call to oddjobd helper in a case of two-way trust (as we have cross-realm TGT there). We'll get this fixed eventually and all situations will be handled with a prefetch but right now this means for two-way trust you can reproduce SSSD issue.

Comment 24 Jakub Hrozek 2019-03-05 20:28:59 UTC
(In reply to Alexander Bokovoy from comment #23)
> It is the other way around -- we don't call to oddjobd helper in a case of
> two-way trust (as we have cross-realm TGT there). We'll get this fixed
> eventually and all situations will be handled with a prefetch but right now
> this means for two-way trust you can reproduce SSSD issue.

OK, this sort of raises the severity of the issue.

btw do we not test the two-way trust? I really wonder why are we finding this bug now..

Comment 25 Sergey Orlov 2019-03-06 09:21:53 UTC
Somehow I mixed one-way and two-way trusts...
So, for future verification: this command

ipa trust-add ipaad2012r2.test --admin Administrator --password --two-way=True

does not prefetch keytab and can be used to check if subsequent calls to id/getent cause sssd to fetch keytab
Also for verification we can manually delete keytab file from /var/lib/sssd/keytabs after establishing one-way trust.

Comment 30 Jakub Hrozek 2019-04-02 21:09:24 UTC
 * master: d409c10
 * sssd-1-16: b927dc7

Comment 34 anuja 2019-08-19 07:51:13 UTC
Verified Using Version:
ipa-server-4.8.0-8.module+el8.1.0+3977+ec23ef34.x86_64
sssd-ipa-2.2.0-11.el8.x86_64

This needs sanity check only :
Adding test-run of trust-test suite.
----------------------------------------------------------

 :: [ 14:37:17 ] :: [  BEGIN   ] :: Running 'echo Secret123 | ipa trust-add ipaad2k16cin.test --admin Administrator                 --range-type=ipa-ad-trust --password --two-way=True'
*** Current Time: Fri Aug 16 14:37:20 2019 Localwatchdog at: Sat Aug 17 13:51:19 2019
 ----------------------------------------------------------
 Added Active Directory trust for realm "ipaad2k16cin.test"
 ----------------------------------------------------------
  Realm name: ipaad2k16cin.test
  Domain NetBIOS name: IPAAD2K16CIN
  Domain Security Identifier: S-1-5-21-2842256260-195550463-1751006347
  Trust direction: Two-way trust
  Trust type: Active Directory domain
  Trust status: Established and verified
:: [ 14:37:22 ] :: [   PASS   ] :: Command 'echo Secret123 | ipa trust-add ipaad2k16cin.test --admin Administrator                 --range-type=ipa-ad-trust --password --two-way=True' (Expected 0, got 0)
:: [ 14:37:22 ] :: [  BEGIN   ] :: Running 'systemctl stop sssd; rm -rf /var/lib/sss/{db,mc}/*; systemctl start sssd; sleep 60'
 *** Current Time: Fri Aug 16 14:37:23 2019 Localwatchdog at: Sat Aug 17 13:51:23 2019
 *** Current Time: Fri Aug 16 14:38:20 2019 Localwatchdog at: Sat Aug 17 13:51:19 2019
:: [ 14:38:22 ] :: [   PASS   ] :: Command 'systemctl stop sssd; rm -rf /var/lib/sss/{db,mc}/*; systemctl start sssd; sleep 60' (Expected 0, got 0)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   RHEL7 ipa adtrust install fails if netbios hostname longer than 15 characters, bz1030517
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 :: [ 14:38:22 ] :: [  BEGIN   ] :: Running 'sleep 60'
*** Current Time: Fri Aug 16 14:38:23 2019 Localwatchdog at: Sat Aug 17 13:51:23 2019
 *** Current Time: Fri Aug 16 14:39:20 2019 Localwatchdog at: Sat Aug 17 13:51:19 2019
:: [ 14:39:22 ] :: [   PASS   ] :: Command 'sleep 60' (Expected 0, got 0)
 :: [ 14:39:22 ] :: [   LOG    ] :: Hostname longer than 15 chars, checking for bug
:: [ 14:39:23 ] :: [  BEGIN   ] :: Running 'id Administrator'
2019-08-16T18:39:23+0000 [ci-vm-10-0-153-183.h] uid=879000500(administrator) gid=879000500(administrator) groups=879000500(administrator),879000518(schema admins),879000519(enterprise admins),879000512(domain admins),879000513(domain users),879000520(group policy creator owners)

Comment 37 errata-xmlrpc 2019-11-05 22:34:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3651


Note You need to log in before you can comment on or make changes to this bug.