Bug 2039892 - 2.6.2 regression: Daemon crashes when resolving AD user names
Summary: 2.6.2 regression: Daemon crashes when resolving AD user names
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2022-01-31
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Iker Pedrosa
QA Contact: Dan Lavu
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks: 1859315
 
Reported: 2022-01-12 16:49 UTC by Martin Pitt
Modified: 2022-05-10 16:47 UTC
CC: 11 users

Fixed In Version: sssd-2.6.2-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-10 15:26:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
/var/log/sssd/ (104.54 KB, application/gzip), 2022-01-12 16:49 UTC, Martin Pitt, no flags


Links
GitHub SSSD issue 5947 (open): sssd-ad broken in 2.6.2, 389 used as kerberos port (last updated 2022-01-13 10:19:56 UTC)
Red Hat Issue Tracker RHELPLAN-107608 (last updated 2022-01-12 16:57:56 UTC)
Red Hat Issue Tracker SSSD-4252 (last updated 2022-01-12 18:54:24 UTC)
Red Hat Product Errata RHBA-2022:2070 (last updated 2022-05-10 15:27:03 UTC)

Description Martin Pitt 2022-01-12 16:49:11 UTC
Created attachment 1850416 [details]
/var/log/sssd/

Description of problem: With the recent update to sssd 2.6.2 in RHEL 8.6, Active Directory user names can no longer be resolved. Instead, sssd goes offline and crashes.

We noticed this in our RHEL 8.6 cockpit image refresh (https://github.com/cockpit-project/bots/pull/2794), and I can reproduce it with

    dnf update -y sssd-nfs-idmap libsss_autofs libsss_sudo python3-sss-murmur

on our current image (from December 25) with sssd 2.6.1, where things worked.

Version-Release number of selected component (if applicable):

sssd-2.6.2-2.el8.x86_64

How reproducible: Always


Steps to Reproduce:
1. Update sssd to 2.6.2 or start current RHEL 8.6 nightly
2. Join an Active Directory domain:
   realm join -vU Administrator cockpit.lan
3. Try to resolve admin user:
   id Administrator
   su - Administrator

Actual results: Fails with "user Administrator does not exist"

sssd is offline:

# sssctl domain-status cockpit.lan
Online status: Offline

Active servers:
AD Global Catalog: not connected
AD Domain Controller: f0.cockpit.lan

Discovered AD Global Catalog servers:
None so far.
Discovered AD Domain Controller servers:
- f0.cockpit.lan

/var/log/sssd/sssd_cockpit.lan.log has an interesting log:


```
(2022-01-12 10:22:33): [be[cockpit.lan]] [be_ptask_done] (0x0040): Task [Subdomains Refresh]: failed with [1432158212]: SSSD is offline
   *  ... skipping repetitive backtrace ...
(2022-01-12 10:22:33): [be[cockpit.lan]] [ad_subdomains_refresh_connect_done] (0x0020): [RID#1] Unable to connect to LDAP [11]: Resource temporarily unavailable
   *  ... skipping repetitive backtrace ...
(2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [ad_subdomains_refresh_connect_done] (0x0080): [RID#1] No AD server is available, cannot get the subdomain list while offline
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_done] (0x0400): [RID#1] DP Request [Subdomains #1]: Request handler finished [0]: Success
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [_dp_req_recv] (0x0400): [RID#1] DP Request [Subdomains #1]: Receiving request data.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): [RID#1] DP Request [Subdomains #1]: Request removed.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): [RID#1] Number of active DP request: 0
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_reply_std] (0x1000): [RID#1] DP Request [Subdomains #1]: Returning [Provider is Offline]: 1,1432158212,Offline
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0400): sssd.dataprovider.getDomains: Success
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0400): sssd.dataprovider.getDomains: Success
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sdap_id_release_conn_data] (0x4000): Releasing unused connection with fd [-1]
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_dispatch] (0x4000): Dispatching.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_method_handler] (0x2000): Received D-Bus method sssd.dataprovider.getAccountInfo on /sssd
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_senders_lookup] (0x2000): Looking for identity of sender [sssd.nss]
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_get_account_info_send] (0x0200): Got request for [0x1][BE_REQ_USER][name=administrator]
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account #2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001].
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_attach_req] (0x0400): [RID#2] [CID #1] Backend is offline! Using cached data if available
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_attach_req] (0x0400): [RID#2] Number of active DP request: 1
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sss_domain_get_state] (0x1000): [RID#2] Domain cockpit.lan is Active
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [_dp_req_recv] (0x0400): DP Request [Account #2]: Receiving request data.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): DP Request [Account #2]: Request removed.
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [dp_req_destructor] (0x0400): Number of active DP request: 0
   *  (2022-01-12 10:22:33): [be[cockpit.lan]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
```

Journal confirms that sssd restarts on each attempt to resolve a user.
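
For reference, a check along these lines shows the restart cycle (the exact invocation is just an illustration):

```
# Illustrative check: follow the sssd unit in the journal while triggering a lookup
journalctl -u sssd -f &
id Administrator
# the journal then shows "Shutting down" followed by "Starting up" around each lookup
```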

Expected results: Resolving users works.


Additional info: I'll attach the complete /var/log/sssd/ and the journal.

Comment 2 Alexey Tikhonov 2022-01-12 17:41:48 UTC
Looks like https://github.com/SSSD/sssd/issues/5947 ?

Could you please attach entire /var/log/sssd/sssd_cockpit.lan.log and sssd.log?

Btw, why do you think it "crashes"?

Jan 12 11:38:18  sssd_nss[861]: Shutting down (status = 0)
Jan 12 11:38:18  sssd_be[859]: Shutting down (status = 0)
Jan 12 11:38:18  systemd[1]: sssd.service: Succeeded.
Jan 12 11:38:18  systemd[1]: Stopped System Security Services Daemon.
Jan 12 11:38:18  systemd[1]: Starting System Security Services Daemon...
Jan 12 11:38:18  sssd[1930]: Starting up
  -- looks like a graceful service restart.

Comment 3 Martin Pitt 2022-01-12 17:57:25 UTC
> Looks like https://github.com/SSSD/sssd/issues/5947 ?

Could be -- we had two other test cases which did not fail functionally, but triggered a new SELinux violation:

  avc:  denied  { name_connect } for comm="cockpit-session" dest=389 scontext=system_u:system_r:cockpit_session_t:s0 tcontext=system_u:object_r:ldap_port_t:s0 tclass=tcp_socket

This sounds related then? We can open our SELinux policy to allow that, but it didn't seem obvious to me why cockpit-session would need to talk to the LDAP port. It only talks to sssd-ifp over D-Bus, and contexts shouldn't transition that way.
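
For reference, opening the policy would mean roughly the usual audit2allow route (a sketch with a made-up module name, not something we have applied):

```
# Sketch only (module name is made up): generate and load a local policy module from the AVC
ausearch -m avc -ts recent | audit2allow -M cockpit_ldap_local
semodule -i cockpit_ldap_local.pp
# the generated rule would essentially be:
#   allow cockpit_session_t ldap_port_t:tcp_socket name_connect;
```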

> Could you please attach entire /var/log/sssd/sssd_cockpit.lan.log and sssd.log

I did, see the "/var/log/sssd" attachment.

> why do you think it "crashes"?

Because it keeps stopping itself with each request. It's certainly not a "crash" in the "segfault" sense, just that it seems unusual, and the log keeps complaining about "sssd is offline".

Thanks!

Comment 4 Sumit Bose 2022-01-12 18:15:12 UTC
Hi,

I agree, it is most probably https://github.com/SSSD/sssd/issues/5947. Martin, can you set 'debug_level = 9' in the [domain/...] section and then check ldap_child.log to see whether there is ':389' at the end of the KDC IP address? Alternatively, you can check whether the addresses in '/var/lib/sss/pubconf/kdcinfo.*' end in ':389'.
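
For example, something like this in /etc/sssd/sssd.conf (using the domain name from this reproducer), followed by a restart of sssd:

```
# /etc/sssd/sssd.conf (excerpt): raise the debug level for the domain section,
# then restart sssd (systemctl restart sssd) and retry the lookup
[domain/cockpit.lan]
debug_level = 9
```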

Btw, Iker is already working on a fix.

bye,
Sumit

Comment 5 Alexey Tikhonov 2022-01-12 18:22:41 UTC
(In reply to Martin Pitt from comment #3)
> 
>   avc:  denied  { name_connect } for comm="cockpit-session" dest=389
> scontext=system_u:system_r:cockpit_session_t:s0
> tcontext=system_u:object_r:ldap_port_t:s0 tclass=tcp_socket
> 
> This sounds related then? We can open our SELinux policy to allow that, but
> it didn't seem obvious to me why cockpit-session would need to talk to the
> LDAP port. It only talks to sssd-ifp over D-Bus, and contexts shouldn't
> transition that way.

Probably 'cockpit-session' uses libkrb5, which talks to the port (wrongly) specified by SSSD in '/var/lib/sss/pubconf/kdcinfo.*'?
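
If so, a quick way to see which address and port libkrb5 actually contacts would be something like this (illustrative only):

```
# Illustrative: trace libkrb5's KDC selection for the test realm
KRB5_TRACE=/dev/stderr kinit Administrator@COCKPIT.LAN
# the trace output includes the address:port of each KDC request
```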

Comment 7 Martin Pitt 2022-01-13 06:45:22 UTC
> Probably 'cockpit-session' uses libkrb5 that talks to port (wrongly) specified by SSSD

Right, it does use libkrb5. Thanks Alexey -- so this is just a different kind of fallout, and we don't change the SELinux policy for now.

> set 'debug_level = 9' in the [domain/...] section and then check ldap_child.log to see whether there is ':389' at the end of the KDC IP address

Yes, I see lots of

(2022-01-13  1:43:44): [be[cockpit.lan]] [sdap_ldap_connect_callback_add] (0x4000): New connection to [ldap://f0.cockpit.lan:389/??base] with fd [21]
(2022-01-13  1:43:44): [be[cockpit.lan]] [sdap_get_rootdse_send] (0x4000): Getting rootdse
(2022-01-13  1:43:44): [be[cockpit.lan]] [sdap_print_server] (0x2000): Searching 10.111.112.100:389

> Alternatively, you can check whether the addresses in '/var/lib/sss/pubconf/kdcinfo.*' end in ':389'.

# cat /var/lib/sss/pubconf/kdcinfo.COCKPIT.LAN 
10.111.112.100:389

Thanks!

Comment 9 Iker Pedrosa 2022-01-13 10:56:52 UTC
Upstream PR:
https://github.com/SSSD/sssd/pull/5949

Comment 10 Alexey Tikhonov 2022-01-17 17:04:59 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5949

* `master`
    * ca8cef0fc2f6066811105f4c201070cda38c4064 - krb5: AD and IPA don't change Kerberos port
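
Assuming the fix does what the commit title says (the AD and IPA providers no longer write the LDAP port into the kdcinfo files), a quick sanity check on an updated machine would be something like:

```
# Sketch of a post-fix check (sssd-2.6.2-3.el8 or later assumed)
rpm -q sssd
cat /var/lib/sss/pubconf/kdcinfo.COCKPIT.LAN   # should no longer end in ':389'
id Administrator                               # should resolve again
```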

Comment 17 errata-xmlrpc 2022-05-10 15:26:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2070

