Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2039892

Summary: 2.6.2 regression: Daemon crashes when resolving AD user names
Product: Red Hat Enterprise Linux 8
Reporter: Martin Pitt <mpitt>
Component: sssd
Assignee: Iker Pedrosa <ipedrosa>
Status: CLOSED ERRATA
QA Contact: Dan Lavu <dlavu>
Severity: high
Priority: high
Version: 8.6
CC: aborah, atikhono, grajaiya, jhrozek, jvavra, lslebodn, mzidek, pbrezina, sbose, sgoveas, tscherf
Target Milestone: rc
Keywords: Regression, Triaged
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard: sync-to-jira
Fixed In Version: sssd-2.6.2-3.el8
Last Closed: 2022-05-10 15:26:44 UTC
Type: Bug
Bug Blocks: 1859315
Deadline: 2022-01-31
Attachments: /var/log/sssd/ (flags: none)

Description Martin Pitt 2022-01-12 16:49:11 UTC
Created attachment 1850416: /var/log/sssd/

Description of problem: With the recent update to sssd 2.6.2 in RHEL 8.6, Active Directory user names can no longer be resolved. Instead, sssd goes offline and crashes.

We noticed this in our RHEL 8.6 Cockpit image refresh (https://github.com/cockpit-project/bots/pull/2794), and I can reproduce it by running

    dnf update -y sssd-nfs-idmap libsss_autofs libsss_sudo python3-sss-murmur

on our current image from December 25, which ships sssd 2.6.1 and on which everything worked.

Version-Release number of selected component (if applicable):

sssd-2.6.2-2.el8.x86_64

How reproducible: Always


Steps to Reproduce:
1. Update sssd to 2.6.2, or boot a current RHEL 8.6 nightly.
2. Join an Active Directory domain:
   realm join -vU Administrator cockpit.lan
3. Try to resolve the admin user:
   id Administrator
   su - Administrator
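For scripted reproduction, the `id Administrator` check in step 3 can be approximated from Python with the standard `pwd` module, which goes through the same NSS stack (and therefore through sssd when `sss` is listed in /etc/nsswitch.conf). A minimal sketch; the helper name `user_resolves` is our own, not part of any sssd tooling:

```python
import pwd


def user_resolves(name: str) -> bool:
    """Return True if NSS can resolve `name`, roughly what `id NAME` checks.

    With sssd configured in nsswitch.conf, the lookup is answered by sssd
    for domain users such as the AD Administrator.
    """
    try:
        pwd.getpwnam(name)
    except KeyError:
        # getpwnam raises KeyError when no passwd entry is found; this is
        # the situation in which `su` reports "user ... does not exist".
        return False
    return True
```

On an affected 2.6.2-2 system one would expect `user_resolves("Administrator")` to return False while a local account such as `root` still resolves.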

Actual results: Fails with "user Administrator does not exist"

sssd is offline:

    # sssctl domain-status cockpit.lan
    Online status: Offline

    Active servers:
    AD Global Catalog: not connected
    AD Domain Controller: f0.cockpit.lan

    Discovered AD Global Catalog servers:
    None so far.
    Discovered AD Domain Controller servers:
    - f0.cockpit.lan
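When checking many test machines, the `sssctl domain-status` output can be turned into a dictionary. This is a sketch that assumes exactly the layout seen in this report ("Key: value" lines for active servers, `- host` items under each "Discovered ... servers:" header); the layout comes from this output, not from a documented format, and `parse_domain_status` is a hypothetical helper:

```python
def parse_domain_status(text: str) -> dict:
    """Parse `sssctl domain-status DOMAIN` output (layout as in this report)."""
    status = {"online": None, "active": {}, "discovered": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Online status:"):
            status["online"] = line.split(":", 1)[1].strip() == "Online"
        elif line == "Active servers:":
            section = "active"
        elif line.startswith("Discovered ") and line.endswith(" servers:"):
            # e.g. "Discovered AD Domain Controller servers:" -> "AD Domain Controller"
            section = line[len("Discovered "):-len(" servers:")]
            status["discovered"][section] = []
        elif section == "active" and ":" in line:
            role, server = line.split(":", 1)
            status["active"][role.strip()] = server.strip()
        elif line.startswith("- ") and section in status["discovered"]:
            status["discovered"][section].append(line[2:].strip())
        # Anything else (e.g. "None so far.") carries no server name; skip it.
    return status
```

For the output above, `status["online"]` is False and the Global Catalog is listed as "not connected".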

/var/log/sssd/sssd_cockpit.lan.log contains the following:


```
(2022-01-12 10:22:33): [be[cockpit.lan]] [be_ptask_done] (0x0040): Task [Subdomains Refresh]: failed with [1432158212]: SSSD is offline
   *  ... skipping repetitive backtrace ...
(2022-01-12 10:22:33): [be[cockpit.lan]] [ad_subdomains_refresh_connect_done] (0x0020): [RID#1] Unable to connect to LDAP [11]: Resource temporarily unavailable
   *  ... skipping repetitive backtrace ...
(2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [ad_subdomains_refresh_connect_done] (0x0080): [RID#1] No AD server is available, cannot get the subdomain list while offline
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_done] (0x0400): [RID#1] DP Request [Subdomains #1]: Request handler finished [0]: Success
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [_dp_req_recv] (0x0400): [RID#1] DP Request [Subdomains #1]: Receiving request data.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): [RID#1] DP Request [Subdomains #1]: Request removed.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): [RID#1] Number of active DP request: 0
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_reply_std] (0x1000): [RID#1] DP Request [Subdomains #1]: Returning [Provider is Offline]: 1,1432158212,Offline
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0400): sssd.dataprovider.getDomains: Success
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0400): sssd.dataprovider.getDomains: Success
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sdap_id_release_conn_data] (0x4000): Releasing unused connection with fd [-1]
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_method_handler] (0x2000): Received D-Bus method sssd.dataprovider.getAccountInfo on /sssd
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_senders_lookup] (0x2000): Looking for identity of sender [sssd.nss]
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_get_account_info_send] (0x0200): Got request for [0x1][BE_REQ_USER][name=administrator]
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account #2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001].
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_attach_req] (0x0400): [RID#2] [CID #1] Backend is offline! Using cached data if available
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_attach_req] (0x0400): [RID#2] Number of active DP request: 1
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sss_domain_get_state] (0x1000): [RID#2] Domain cockpit.lan is Active
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [_dp_req_recv] (0x0400): DP Request [Account #2]: Receiving request data.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): DP Request [Account #2]: Request removed.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): Number of active DP request: 0
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
```
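To pull the failing functions and their numeric error codes out of a long debug log, each line can be split into timestamp, component, function, debug level and message. A minimal sketch; the line layout is inferred from the excerpt above rather than from SSSD documentation, and the names `parse_line` and `error_codes` are hypothetical:

```python
import re

# Layout inferred from the excerpt, not from SSSD documentation:
#   (TIMESTAMP): [COMPONENT] [FUNCTION] (LEVEL): MESSAGE
# Backtrace lines carry an extra leading "   *  " prefix.
LINE_RE = re.compile(
    r"^\s*(?:\*\s+)?\((?P<ts>[^)]*)\): \[(?P<component>\S+)\] "
    r"\[(?P<func>\w+)\] \((?P<level>0x[0-9a-fA-F]+)\): (?P<msg>.*)$"
)
# Numeric error codes appear as "Error [N]" or "failed with [N]".
ERROR_RE = re.compile(r"Error \[(\d+)\]|failed with \[(\d+)\]")


def parse_line(line):
    """Return the fields of one debug-log line as a dict, or None."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def error_codes(lines):
    """Collect (function, code) pairs for every numeric error code found."""
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = ERROR_RE.search(rec["msg"])
        if m:
            found.append((rec["func"], int(m.group(1) or m.group(2))))
    return found
```

Printed in hex, the recurring code 1432158212 is 0x555D0004; we believe this sits in the range SSSD reserves for its own error codes (starting at 0x555D0000), and the log itself translates it as "SSSD is offline".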

The journal confirms that sssd restarts on each attempt to resolve a user.

Expected results: Resolving users works.


Additional info: I'll attach the complete /var/log/sssd/ and the journal.

Comment 2 Alexey Tikhonov 2022-01-12 17:41:48 UTC
Looks like https://github.com/SSSD/sssd/issues/5947 ?

Could you please attach entire /var/log/sssd/sssd_cockpit.lan.log and sssd.log?

Btw, why do you think it "crashes"?

    Jan 12 11:38:18  sssd_nss[861]: Shutting down (status = 0)
    Jan 12 11:38:18  sssd_be[859]: Shutting down (status = 0)
    Jan 12 11:38:18  systemd[1]: sssd.service: Succeeded.
    Jan 12 11:38:18  systemd[1]: Stopped System Security Services Daemon.
    Jan 12 11:38:18  systemd[1]: Starting System Security Services Daemon...
    Jan 12 11:38:18  sssd[1930]: Starting up

This looks like a graceful service restart.
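The reasoning behind "graceful restart" can be written down as a rough heuristic: every sssd process logs "Shutting down" with status 0 and systemd reports the unit as "Succeeded". A sketch under that assumption; `classify_stop` is a hypothetical helper and the matched strings are taken only from the journal lines quoted here:

```python
import re

# e.g. "sssd_nss[861]: Shutting down (status = 0)"
SHUTDOWN_RE = re.compile(r"(\S+)\[\d+\]: Shutting down \(status = (-?\d+)\)")


def classify_stop(journal_lines):
    """Return 'graceful', 'abnormal', or 'unknown' for a journal excerpt.

    Heuristic only: 'graceful' means all logged shutdown statuses are 0
    and systemd reported "sssd.service: Succeeded.".
    """
    statuses = []
    succeeded = False
    for line in journal_lines:
        m = SHUTDOWN_RE.search(line)
        if m:
            statuses.append(int(m.group(2)))
        if "sssd.service: Succeeded." in line:
            succeeded = True
    if not statuses:
        return "unknown"
    if succeeded and all(s == 0 for s in statuses):
        return "graceful"
    return "abnormal"
```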

Comment 3 Martin Pitt 2022-01-12 17:57:25 UTC
> Looks like https://github.com/SSSD/sssd/issues/5947 ?

Could be. Two other test cases did not fail functionally, but triggered a new SELinux denial:

  avc:  denied  { name_connect } for comm="cockpit-session" dest=389 scontext=system_u:system_r:cockpit_session_t:s0 tcontext=system_u:object_r:ldap_port_t:s0 tclass=tcp_socket

Is this related, then? We could relax our SELinux policy to allow it, but it wasn't obvious to me why cockpit-session would need to talk to the LDAP port. It only talks to sssd-ifp over D-Bus, and SELinux contexts shouldn't transition that way.
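Read field by field, the denial says a process in the `cockpit_session_t` domain asked for `name_connect` to TCP port 389, whose port type is `ldap_port_t`. A small sketch for extracting those fields from such messages, assuming the single-line format quoted above; `parse_avc` and `selinux_type` are our own helpers:

```python
import re

# key=value pairs; values are either double-quoted (comm="...") or bare tokens.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line):
    """Split an AVC denial into its key=value fields, plus the denied
    permissions listed between the braces."""
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(line)}
    perms = re.search(r"\{([^}]*)\}", line)
    fields["permissions"] = perms.group(1).split() if perms else []
    return fields


def selinux_type(context):
    """Return the type part of a user:role:type:level SELinux context."""
    return context.split(":")[2]
```

For this denial, `dest` is `389`, the standard LDAP port, not the Kerberos KDC port 88.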

> Could you please attach entire /var/log/sssd/sssd_cockpit.lan.log and sssd.log

I did; see the "/var/log/sssd" attachment.

> why do you think it "crashes"?

Because it stops itself on every request. It's certainly not a "crash" in the segfault sense; it just seemed unusual, and the log keeps complaining that SSSD is offline.

Thanks!

Comment 4 Sumit Bose 2022-01-12 18:15:12 UTC
Hi,

I agree, it is most probably https://github.com/SSSD/sssd/issues/5947. Martin, can you set 'debug_level = 9' in the [domain/...] section and then check in ldap_child.log whether the KDC IP address ends in ':389'? Alternatively, you can check whether the addresses in '/var/lib/sss/pubconf/kdcinfo.*' end in ':389'.
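The second check can be scripted across machines. A minimal sketch, assuming each kdcinfo file holds one address entry per line; `ldap_port_entries` and `scan_kdcinfo` are hypothetical helpers and the default directory is the one named in the comment:

```python
import glob
import os


def ldap_port_entries(text):
    """Return the kdcinfo entries (one per line) that end in ':389'."""
    return [e.strip() for e in text.splitlines() if e.strip().endswith(":389")]


def scan_kdcinfo(pubconf_dir="/var/lib/sss/pubconf"):
    """Map each kdcinfo.* file under pubconf_dir to its ':389' entries.

    Files without such entries are left out, so an empty dict means no
    KDC entry points at the LDAP port.
    """
    hits = {}
    for path in sorted(glob.glob(os.path.join(pubconf_dir, "kdcinfo.*"))):
        with open(path, encoding="utf-8") as fh:
            bad = ldap_port_entries(fh.read())
        if bad:
            hits[path] = bad
    return hits
```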

Btw, Iker is already working on a fix.

bye,
Sumit

Comment 5 Alexey Tikhonov 2022-01-12 18:22:41 UTC
(In reply to Martin Pitt from comment #3)
> 
>   avc:  denied  { name_connect } for comm="cockpit-session" dest=389
> scontext=system_u:system_r:cockpit_session_t:s0
> tcontext=system_u:object_r:ldap_port_t:s0 tclass=tcp_socket
> 
> This sounds related then? We can open our SELinux policy to allow that, but
> it didn't seem obvious to me why cockpit-session would need to talk to the
> LDAP port. It only talks to sssd-ifp over D-Bus, and contexts shouldn't
> transition that way.

Probably 'cockpit-session' uses libkrb5, which talks to the port (wrongly) specified by SSSD in '/var/lib/sss/pubconf/kdcinfo.*'?
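The hypothesis is that libkrb5, through SSSD's locator plugin, reads each kdcinfo entry as a KDC address with an optional port. A sketch of that interpretation, assuming entries are `host`, `host:port`, or bracketed IPv6 `[addr]:port`, and that a missing port means the standard Kerberos port 88; these parsing rules are our assumption, not taken from the plugin source:

```python
KRB5_KDC_PORT = 88  # standard Kerberos KDC port
LDAP_PORT = 389     # standard LDAP port


def split_kdc_entry(entry, default_port=KRB5_KDC_PORT):
    """Split a kdcinfo-style entry into (host, port) under the assumed rules."""
    entry = entry.strip()
    if entry.startswith("["):
        # Bracketed IPv6, e.g. "[2001:db8::1]:88"
        host, _, rest = entry[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif entry.count(":") == 1:
        host, _, port = entry.partition(":")
    else:
        # Bare hostname/IPv4, or unbracketed IPv6 with no port
        host, port = entry, ""
    return host, int(port) if port else default_port


def points_at_ldap(entry):
    """True if the entry would send Kerberos traffic to the LDAP port."""
    return split_kdc_entry(entry)[1] == LDAP_PORT
```

Under these assumptions an entry like `10.111.112.100:389` sends Kerberos requests to the LDAP port, which matches the `dest=389` in the SELinux denial.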

Comment 7 Martin Pitt 2022-01-13 06:45:22 UTC
> Probably 'cockpit-session' uses libkrb5 that talks to port (wrongly) specified by SSSD

Right, it does use libkrb5. Thanks, Alexey. So this is just a different kind of fallout from the same bug, and we won't change the SELinux policy for now.

> set 'debug_level = 9' in the [domain/...] section and then check ldap_child.log if there is ':389' at the end of the KDC IP address

Yes, I see lots of lines like these:

    (2022-01-13  1:43:44): [be[cockpit.lan]] [sdap_ldap_connect_callback_add] (0x4000): New connection to [ldap://f0.cockpit.lan:389/??base] with fd [21]
    (2022-01-13  1:43:44): [be[cockpit.lan]] [sdap_get_rootdse_send] (0x4000): Getting rootdse
    (2022-01-13  1:43:44): [be[cockpit.lan]] [sdap_print_server] (0x2000): Searching 10.111.112.100:389

> Alternatively you can check '/var/lib/sss/pubconf/kdcinfo.*' is the addresses end in ':389'.

    # cat /var/lib/sss/pubconf/kdcinfo.COCKPIT.LAN
    10.111.112.100:389

Thanks!

Comment 9 Iker Pedrosa 2022-01-13 10:56:52 UTC
Upstream PR:
https://github.com/SSSD/sssd/pull/5949

Comment 10 Alexey Tikhonov 2022-01-17 17:04:59 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5949

* `master`
    * ca8cef0fc2f6066811105f4c201070cda38c4064 - krb5: AD and IPA don't change Kerberos port
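The commit title suggests the fix stops the AD and IPA providers from changing the Kerberos port that ends up in kdcinfo. The sketch below only contrasts the two outcomes discussed in this thread: an entry that inherits the LDAP port versus one that leaves the port to the Kerberos default. Both helper names are hypothetical, the "fixed" rule is our reading of the commit title, and this is not SSSD's C implementation:

```python
def buggy_kdcinfo_entry(kdc_addr, ldap_port):
    """Regressed behaviour as observed: the KDC entry inherits the port
    of the LDAP server URI (389 in this report)."""
    return f"{kdc_addr}:{ldap_port}"


def fixed_kdcinfo_entry(kdc_addr, krb5_port=None):
    """Assumed fixed behaviour: only an explicitly configured Kerberos port
    is written; otherwise the address is left bare so that libkrb5 uses
    the Kerberos default (88)."""
    return f"{kdc_addr}:{krb5_port}" if krb5_port is not None else kdc_addr
```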

Comment 17 errata-xmlrpc 2022-05-10 15:26:44 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2070
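To check whether a machine already carries the fix (Fixed In Version: sssd-2.6.2-3.el8), the installed build can be compared with that NVR. This sketch uses a deliberately simplified ordering (runs of digits compared as integers, epochs ignored), not rpm's real `rpmvercmp` algorithm, so treat edge cases with suspicion. The input is assumed to come from `rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' sssd`, i.e. without the architecture suffix; `has_fix` is a hypothetical helper:

```python
import re


def parse_nvr(nvr):
    """Split 'name-version-release' (e.g. 'sssd-2.6.2-3.el8') into its parts."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def _key(s):
    # Simplified ordering: compare numeric runs as integers, ignore letters.
    return [int(x) for x in re.findall(r"\d+", s)]


def has_fix(installed, fixed="sssd-2.6.2-3.el8"):
    """True if `installed` sorts at or after `fixed` under the simplified rule."""
    _, iv, ir = parse_nvr(installed)
    _, fv, fr = parse_nvr(fixed)
    return (_key(iv), _key(ir)) >= (_key(fv), _key(fr))
```

For example, the build from the original report, `sssd-2.6.2-2.el8`, does not include the fix under this rule.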