RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1659498 - Re-setting the trusted AD domain fails due to wrong subdomain service name being used
Summary: Re-setting the trusted AD domain fails due to wrong subdomain service name be...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: 8.0
Assignee: SSSD Maintainers
QA Contact: Kaleem
URL:
Whiteboard:
Depends On:
Blocks: 1652753
TreeView+ depends on / blocked
 
Reported: 2018-12-14 14:31 UTC by Sergey Orlov
Modified: 2020-05-02 19:05 UTC (History)
10 users (show)

Fixed In Version: sssd-2.0.0-41.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-14 02:05:40 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
reproducer script, sssd logs, sssd configs (215.87 KB, application/zip)
2018-12-14 14:31 UTC, Sergey Orlov
no flags Details
ssd log for case1 (165.56 KB, application/zip)
2019-02-25 12:52 UTC, Sergey Orlov
no flags Details
ssd logs for case1 (216.29 KB, application/zip)
2019-02-25 12:53 UTC, Sergey Orlov
no flags Details
ssd logs for case3 (1.20 MB, application/zip)
2019-02-25 12:53 UTC, Sergey Orlov
no flags Details
case 4 sssd logs (258.72 KB, application/zip)
2019-02-25 14:44 UTC, Sergey Orlov
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 4896 0 None closed Re-setting the trusted AD domain fails due to wrong subdomain service name being used 2021-02-11 01:43:08 UTC

Description Sergey Orlov 2018-12-14 14:31:33 UTC
Created attachment 1514383 [details]
reproducer script, sssd logs, sssd configs

Description of problem:
After establishing trust between IPA server and AD we need to few for several minutes before we can get result of "id user@ADOMAIN" command.

Version-Release number of selected component (if applicable):
sssd-common-2.0.0-23.el8.x86_64


How reproducible:
Stable

Steps to Reproduce:
1. have IPA server and AD server configured and prepared for establishing trust
2. ipa trust-add windows.domain --admin Administrator --type=ad --external=true
3. run "id administrator"
4. repeatedly execute same command for 1-2 minutes

Actual results:
for about one or two minutes we get
"id: ‘administrator’: no such user"
and then finally we get user id and group membership

Expected results:
user id and group membership displayed on first invocation of "id"


Additional info:
I attach:
 * reproducer script
 * output of script run
 * log files of sssd collected during this run
 * sssd config files


Comment by Jakub Hrozek:

For reasons I forgot we renamed all subdomain services to start with the
"sd_" prefix:
    (Thu Dec 13 05:29:20 2018) [sssd[be[testrelm.test]]] [fo_new_service] (0x0400): Creating new service 'sd_ipaad2016.test'

But what happens in your environment is that the first lookups fails:
    (Thu Dec 13 05:29:30 2018) [sssd[be[testrelm.test]]] [sdap_kinit_done] (0x0100): Could not get TGT: 14 [Bad address]
    (Thu Dec 13 05:29:30 2018) [sssd[be[testrelm.test]]] [sdap_cli_kinit_done] (0x0400): Cannot get a TGT: ret [1432158226](Authentication Failed)

And then when we want to re-set the domain status, we use the name
without the "sd_" prefix apparently:
    (Thu Dec 13 05:29:31 2018) [sssd[be[testrelm.test]]] [ipa_srv_ad_acct_retried] (0x0400): Subdomain re-set, will retry lookup
    (Thu Dec 13 05:29:31 2018) [sssd[be[testrelm.test]]] [be_fo_reset_svc] (0x1000): Resetting all servers in service ipaad2016.test
    (Thu Dec 13 05:29:31 2018) [sssd[be[testrelm.test]]] [be_fo_reset_svc] (0x0080): Cannot retrieve service [ipaad2016.test]

So a domain is not re-set until an internal timeout expires.

Comment 1 Jakub Hrozek 2018-12-17 10:20:58 UTC
While I'm not going to dispute the Regression keyword in general, I think it is fair to add that the issue was added in 1.14, so RHEL-7.4. The fact that nobody hit the issue in 7.5 or 7.6 makes the issue a little less concerning to me.

Comment 2 Jakub Hrozek 2018-12-20 14:22:04 UTC
Upstream ticket:
https://pagure.io/SSSD/sssd/issue/3911

Comment 3 Jakub Hrozek 2018-12-20 21:13:22 UTC
Sergey, can you qa_ack the bug? There is a PR on review and I passed a test build to Kaleem and Anuja.

Comment 7 Jakub Hrozek 2019-01-28 20:45:06 UTC
* master:
 * aaaa9a3e836e7b7af1ff0fc5058ddddfeef120a8
 * b3285f9f8a5eac3e4e70ed3bd6b74c15ad806e9e

Comment 9 Sergey Orlov 2019-02-25 12:29:45 UTC
Still not working as expected:

Case 1:
 * /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
 * id Administrator 
 * id Vagrant 
 => both users found immediately

Case 2:
 * /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
 * reset_sss_cache
 * repeat "id Administrator" for 200 seconds
  => id: ‘Administrator’: no such user, 

Case 3:
 * /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
 * reset sssd cache and restart sssd
 * repeat "reset_sss_cache; id Administrator" for 200 seconds
  => id: ‘Administrator’: no such user, 


For reference: 
reset_sss_cache() {
    systemctl stop sssd
    rm -f /var/lib/sss/{db,mc}/*
    find /var/lib/sss/db -name '*.ldb' | xargs rm -fv
    systemctl start sssd
    echo Secret123 | kinit admin > /dev/null
}


All tests executed on clean VMs

Comment 10 Jakub Hrozek 2019-02-25 12:36:25 UTC
That's interesting because previously you said the issue was resolved by a test build with this fix.

Anyway, I can't say more without seeing logs.

Comment 11 Sergey Orlov 2019-02-25 12:52:41 UTC
Created attachment 1538423 [details]
ssd log for case1

Comment 12 Sergey Orlov 2019-02-25 12:53:07 UTC
Created attachment 1538424 [details]
ssd logs for case1

Comment 13 Sergey Orlov 2019-02-25 12:53:33 UTC
Created attachment 1538425 [details]
ssd logs for case3

Comment 14 Sergey Orlov 2019-02-25 12:57:01 UTC
I think I checked only case1  in that test.


I attached logs for three cases. (sorry, I mistyped in comment for case2, but filename is correct)
sssd version: sssd-common-2.0.0-43.el8.x86_64

Comment 15 Jakub Hrozek 2019-02-25 13:41:36 UTC
This seems to be a different issue. In the second and third case, the is not fetched during the first-time setup:
(Mon Feb 25 04:38:00 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_timeout] (0x0020): Timeout reached for retrieving keytab from IPA server
(Mon Feb 25 04:38:00 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_recv] (0x2000): ipa-getkeytab status 110

Then the lookups always fail immediately:
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain testrealm.test is Active
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [sss_domain_get_state] (0x1000): Domain ad.test is Inactive
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad.test offline
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ldb] (0x4000): Added timed event "ltdb_callback": 0x55f91fc23570

(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ldb] (0x4000): Added timed event "ltdb_timeout": 0x55f91fc0adc0

(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ldb] (0x4000): Running timer event 0x55f91fc23570 "ltdb_callback"

(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ldb] (0x4000): Destroying timer event 0x55f91fc0adc0 "ltdb_timeout"

(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ldb] (0x4000): Destroying timer event 0x55f91fc23570 "ltdb_callback"

(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [be_mark_subdom_offline] (0x4000): Subdomain already inactive
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: [22]: Invalid argument.
(Mon Feb 25 04:38:02 2019) [sssd[be[testrealm.test]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [22]: Invalid argument

Looking at when does ipa_srv_ad_acct_lookup_step() return EINVAL, it's when the subdomain does not have the ad_id_ctx created at all. There are two issues:
1) the keytab is not re-fetched on this kind of an error
2) the ad_id_ctx is not created even if the keytab was then fetched sucessfully

Note that this issue only happens in the first keytab retrieval fails and thus the ad_id_ctx is not created at all. This is different from the bug originally reported which was about not retrying a domain that was first marked as offline after being created.

Comment 16 Jakub Hrozek 2019-02-25 13:42:17 UTC
btw making sure that the ipa-getkeytab invocation does not time out would fix all these issues, at least on the IPA masters. Do you know why would the process time out?

Comment 17 Sergey Orlov 2019-02-25 14:43:58 UTC
I added one more test case, where I simply restart sssd instead of resetting cache:

Case 4:
 * /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad --external=true
 * systemctl restart sssd
 * repeat "id Administrator" for 200 seconds
  => id: ‘Administrator’: no such user,

Comment 18 Sergey Orlov 2019-02-25 14:44:28 UTC
Created attachment 1538455 [details]
case 4 sssd logs

Comment 19 Jakub Hrozek 2019-02-25 15:18:07 UTC
(In reply to Sergey Orlov from comment #17)
> I added one more test case, where I simply restart sssd instead of resetting
> cache:
> 
> Case 4:
>  * /usr/bin/ipa trust-add ad.test --admin Administrator --type=ad
> --external=true
>  * systemctl restart sssd
>  * repeat "id Administrator" for 200 seconds
>   => id: ‘Administrator’: no such user,

Still the same issue, it takes more than 5 seconds, for ipa-getkeytab to finish:
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_send] (0x0400): Retrieving keytab for HOST161$@AD.TEST from host161.testrealm.test into /var/lib/sss/keytabs/ad.test.keytab3WVRKr using ccache /var/lib/sss/db/ccache_T>
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [child_handler_setup] (0x2000): Setting up signal handler up for pid [6693]
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [child_handler_setup] (0x2000): Signal handler set up for pid [6693]
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [sdap_process_result] (0x2000): Trace: sh[0x564b6ea96bf0], connected[1], ops[(nil)], ldap[0x564b6ea86060]
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [sdap_process_result] (0x2000): Trace: end of ldap_result list
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [be_ptask_online_cb] (0x0400): Back end is online
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [be_ptask_enable] (0x0080): Task [Subdomains Refresh]: already enabled
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [be_ptask_online_cb] (0x0400): Back end is online
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [be_ptask_enable] (0x0080): Task [SUDO Smart Refresh]: already enabled
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [be_ptask_online_cb] (0x0400): Back end is online
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [be_ptask_enable] (0x0080): Task [SUDO Full Refresh]: already enabled
(Mon Feb 25 09:22:38 2019) [sssd[be[testrealm.test]]] [delayed_online_authentication_callback] (0x0200): Backend is online, starting delayed online authentication.
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_timeout] (0x0020): Timeout reached for retrieving keytab from IPA server
(Mon Feb 25 09:22:43 2019) [sssd[be[testrealm.test]]] [ipa_getkeytab_recv] (0x2000): ipa-getkeytab status 110

Comment 20 Sergey Orlov 2019-02-25 16:39:15 UTC
Fix in this BZ is verified by Case 1 in https://bugzilla.redhat.com/show_bug.cgi?id=1659498#c9

New discovered issuued will be handled in https://bugzilla.redhat.com/show_bug.cgi?id=1681279


Note You need to log in before you can comment on or make changes to this bug.