Bug 1474711 - Querying the AD domain for external domain's ID can mark the AD domain offline
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.3
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: SSSD Maintainers
QA Contact: ipa-qe
Keywords: ZStream
Depends On:
Blocks: 1478252
Reported: 2017-07-25 04:54 EDT by Jakub Hrozek
Modified: 2018-04-10 13:15 EDT (History)
Fixed In Version: sssd-1.16.0-1.el7
Clones: 1478252
Last Closed: 2018-04-10 13:13:24 EDT
External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:0929 None None None 2018-04-10 13:15 EDT
Description Jakub Hrozek 2017-07-25 04:54:15 EDT
This bug is created as a clone of upstream ticket:
https://pagure.io/SSSD/sssd/issue/3452

I found the following in the logs


    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [sss_domain_get_state] (0x1000): Domain ipaf26.devel is Active
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [sss_domain_get_state] (0x1000): Domain ad.devel is Active
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ad_account_can_shortcut] (0x0080): Mapping ID [733600006] to SID failed: [IDMAP unknown error code]
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [sss_domain_get_state] (0x1000): Domain ad.devel is Active
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [groups_get_send] (0x0080): Mapping ID [733600006] to SID failed: [IDMAP unknown error code]
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ipa_srv_ad_acct_lookup_done] (0x0080): Sudomain lookup failed, will try to reset sudomain..
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ipa_server_trusted_dom_setup_send] (0x1000): Trust direction of subdom ad.devel from forest ad.devel is: two-way trust
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ipa_srv_ad_acct_retried] (0x0400): Sudomain re-set, will retry lookup
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [be_fo_reset_svc] (0x1000): Resetting all servers in service ad.devel
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [be_fo_reset_svc] (0x0080): Cannot retrieve service [ad.devel]
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ipa_srv_ad_acct_lookup_step] (0x0400): Looking up AD account
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [sss_domain_get_state] (0x1000): Domain ipaf26.devel is Active
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [sss_domain_get_state] (0x1000): Domain ad.devel is Active
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [ad_account_can_shortcut] (0x0080): Mapping ID [733600006] to SID failed: [IDMAP unknown error code]
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [sss_domain_get_state] (0x1000): Domain ad.devel is Active
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [groups_get_send] (0x0080): Mapping ID [733600006] to SID failed: [IDMAP unknown error code]
    (Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [be_mark_dom_offline] (0x1000): Marking subdomain ad.devel offline

The idmap error in ad_account_can_shortcut() does not cause the domain to be skipped. Instead, the idmap call is run again during the lookup, and that second failure causes the domain to go offline.

ad_account_can_shortcut() should instead return 'true' in this case so that the domain is skipped.
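
A minimal sketch of the behaviour proposed above, written in Python rather than SSSD's C and not taken from the actual patch. The names IdmapResult and can_shortcut and the sid_in_domain flag are hypothetical and only model the decision: when the ID cannot be mapped to a SID, the domain is skipped instead of being queried again.

```python
from enum import Enum, auto


class IdmapResult(Enum):
    """Hypothetical outcome of mapping a POSIX ID to a SID."""
    OK = auto()
    ERROR = auto()  # any mapping failure, e.g. "IDMAP unknown error code"


def can_shortcut(result: IdmapResult, sid_in_domain: bool = False) -> bool:
    """Return True when the lookup in this domain can be skipped."""
    if result is IdmapResult.OK:
        # Mapping worked: skip only if the SID belongs to another domain.
        return not sid_in_domain
    # Proposed behaviour: a failed mapping skips the domain, so the
    # mapping is not retried during the lookup and the domain is not
    # marked offline as a side effect.
    return True


skip_on_error = can_shortcut(IdmapResult.ERROR)
skip_on_match = can_shortcut(IdmapResult.OK, sid_in_domain=True)
```

In this model, an ID such as 733600006 that falls outside the AD domain's ranges yields ERROR, so the AD domain is skipped and the request moves on to the next domain.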
Comment 2 Jakub Hrozek 2017-07-25 04:55:24 EDT
* master: a406b52a0d20e0ec502f52d63dee293636d1443a
Comment 3 Jakub Hrozek 2017-07-27 05:36:46 EDT
I'd like to propose this bug for 7.4.1, because it would be really inconvenient for admins, and impossible for support engineers to debug, to find out why their AD domains are going offline.

The steps to reproduce are:
1) configure IPA-AD trust with POSIX attributes
2) reverse the domain lookup order so that the AD domains are checked first
3) run 'id' for the IPA user

(It might be even simpler to check whether just running getent passwd -s sss $some_random_id also takes the provider offline, but I didn't check that. If it does, the bug is even more severe.)
Comment 8 Nikhil Dehadrai 2017-12-06 05:22:53 EST
IPA VERSION: ipa-server-4.5.4-6.el7.x86_64
SSSD VERSION: sssd-1.16.0-9.el7.x86_64
IPA CLIENT VERSION : ipa-client-4.5.4-6.el7.x86_64


Verified the bug on the basis of the following observations:


ON IPA-MASTER:
-----------------
[root@auto-hv-01-guest10 ~]# rpm -q ipa-server sssd ipa-client
ipa-server-4.5.4-6.el7.x86_64
sssd-1.16.0-9.el7.x86_64
ipa-client-4.5.4-6.el7.x86_64

[root@auto-hv-01-guest10 ~]# echo <password> | ipa trust-add pne.qe --admin Administrator --range-type=ipa-ad-trust-posix --password --two-way=True
-----------------------------------------------
Added Active Directory trust for realm "pne.qe"
-----------------------------------------------
  Realm name: pne.qe
  Domain NetBIOS name: PNE
  Domain Security Identifier: S-1-5-21-2202318585-426110948-4011710778
  Trust direction: Two-way trust
  Trust type: Active Directory domain
  Trust status: Established and verified


[root@auto-hv-01-guest10 ~]# ipa idrange-find
----------------
2 ranges matched
----------------
  Range name: ND0612171.TEST_id_range
  First Posix ID of the range: 1837800000
  Number of IDs in the range: 200000
  First RID of the corresponding RID range: 1000
  First RID of the secondary RID range: 100000000
  Range type: local domain range

  Range name: PNE.QE_id_range
  First Posix ID of the range: 1261600000
  Number of IDs in the range: 200000
  Domain SID of the trusted domain: S-1-5-21-2202318585-426110948-4011710778
  Range type: Active Directory trust range with POSIX attributes
----------------------------
Number of entries returned 2
----------------------------


[root@auto-hv-01-guest10 ~]# id aduser1@pne.qe
uid=10000(aduser1@pne.qe) gid=454547(adgroup1@pne.qe) groups=454547(adgroup1@pne.qe),454548(adgroup2@pne.qe)


[root@auto-hv-01-guest10 ~]# ipa config-mod --domain-resolution-order='pne.qe:nd0612171.test'
  Maximum username length: 32
  Home directory base: /home
  Default shell: /bin/sh
  Default users group: ipausers
  Default e-mail domain: nd0612171.test
  Search time limit: 2
  Search size limit: 100
  User search fields: uid,givenname,sn,telephonenumber,ou,title
  Group search fields: cn,description
  Enable migration mode: FALSE
  Certificate Subject base: O=ND0612171.TEST
  Password Expiration Notification (days): 4
  Password plugin features: AllowNThash, KDC:Disable Last Success
  SELinux user map order: guest_u:s0$xguest_u:s0$user_u:s0$staff_u:s0-s0:c0.c1023$unconfined_u:s0-s0:c0.c1023
  Default SELinux user: unconfined_u:s0-s0:c0.c1023
  Default PAC types: MS-PAC, nfs:NONE
  IPA masters: auto-hv-01-guest10.nd0612171.test
  IPA CA servers: auto-hv-01-guest10.nd0612171.test
  IPA NTP servers: auto-hv-01-guest10.nd0612171.test
  IPA CA renewal master: auto-hv-01-guest10.nd0612171.test
  IPA master capable of PKINIT: auto-hv-01-guest10.nd0612171.test
  Domain resolution order: pne.qe:nd0612171.test

[root@auto-hv-01-guest10 ~]# id admin
uid=1837800000(admin) gid=1837800000(admins) groups=1837800000(admins)


ON IPA-CLIENT:
----------------
[root@auto-hv-01-guest06 ~]# getent passwd -s sss 120000
aduser1@pne.qe:*:120000:454547:aduser1:/home/aduser1:/bin/sh

[root@auto-hv-01-guest06 ~]# getent passwd -s sss aduser1@pne.qe
aduser1@pne.qe:*:120000:454547:aduser1:/home/aduser1:/bin/sh


[root@auto-hv-01-guest06 ~]# cat /var/log/sssd/sssd.log | grep -i "OFFLINE"

[root@auto-hv-01-guest06 ~]# cat /var/log/sssd/sssd_nd0612171.test.log | grep -i "OFFLINE"
(Wed Dec  6 04:12:03 2017) [sssd[be[nd0612171.test]]] [dp_get_options] (0x0400): Option ldap_offline_timeout has value 60
(Wed Dec  6 04:12:03 2017) [sssd[be[nd0612171.test]]] [dp_get_options] (0x0400): Option krb5_store_password_if_offline is TRUE

[root@auto-hv-01-guest06 ~]# cat /var/log/sssd/sssd.log| grep "ad_account_can_shortcut"

[root@auto-hv-01-guest06 ~]# cat /var/log/sssd/sssd_nd0612171.test.log| grep "ad_account_can_shortcut"

Thus, on the basis of the above observations, marking the status of the bug as "VERIFIED".
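
The case-insensitive grep for "OFFLINE" above also matches option names such as ldap_offline_timeout. A narrower check can be sketched as follows, assuming the debug line layout seen in this report: (timestamp) [process] [function] (level): message. The helper name suspicious_lines is hypothetical. It reports only lines logged by be_mark_dom_offline or ad_account_can_shortcut.

```python
import re

# Layout of an SSSD debug line as it appears in this report:
# (timestamp) [process] [function] (level): message
LINE_RE = re.compile(
    r"^\((?P<ts>[^)]*)\) \[(?P<proc>.+?)\] \[(?P<func>\w+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): (?P<msg>.*)$"
)

# Functions whose messages were involved in this bug.
WATCHED = {"be_mark_dom_offline", "ad_account_can_shortcut"}


def suspicious_lines(lines):
    """Yield (function, message) for lines logged by a watched function."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a debug line in the expected layout
        if match.group("func") in WATCHED:
            yield match.group("func"), match.group("msg")


sample = [
    "(Wed Dec  6 04:12:03 2017) [sssd[be[nd0612171.test]]] [dp_get_options] "
    "(0x0400): Option ldap_offline_timeout has value 60",
    "(Thu Jul 20 11:58:32 2017) [sssd[be[ipaf26.devel]]] [be_mark_dom_offline] "
    "(0x1000): Marking subdomain ad.devel offline",
]
hits = list(suspicious_lines(sample))
```

With the two sample lines, which are copied from this report, only the be_mark_dom_offline entry is reported. The dp_get_options line is ignored even though it contains the word "offline".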
Comment 9 Jatin Nansi 2018-01-31 01:20:16 EST
Hello Jakub, sssd engineering team,

I have a customer encountering this bug, can I get a test package to verify fix?

Jatin
Comment 10 Jakub Hrozek 2018-01-31 09:03:17 EST
(In reply to Jatin Nansi from comment #9)
> Hello Jakub, sssd engineering team,
> 
> I have a customer encountering this bug, can I get a test package to verify
> fix?
> 
> Jatin

Yes, the fix is backportable. Can you give me the exact version to create the backport atop?
Comment 15 errata-xmlrpc 2018-04-10 13:13:24 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0929
