Bug 2072050 - sssd_nss exiting (due to missing 'sssd' local user) making SSSD service to restart in a loop
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Alexey Tikhonov
QA Contact: shridhar
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks: 2074648
Reported: 2022-04-05 13:48 UTC by Micah Abbott
Modified: 2022-11-08 12:41 UTC (History)
CC List: 19 users

Fixed In Version: sssd-2.7.0-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2074648
Environment:
Last Closed: 2022-11-08 10:51:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:
sgadekar: needinfo-


Attachments
sssd.log (1.07 MB, text/plain), 2022-04-05 13:48 UTC, Micah Abbott
sssd_nss.log (1.17 MB, text/plain), 2022-04-05 13:50 UTC, Micah Abbott
sssd_implicit_files.log (1.13 MB, text/plain), 2022-04-05 14:15 UTC, Micah Abbott


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github SSSD sssd issues | 6107 | 0 | None | open | SSSD fails to start if 'sssd user' isn't resolvable by 'libnss_files.so' | 2022-04-11 20:39:56 UTC |
| Red Hat Issue Tracker | RHELPLAN-117906 | 0 | None | None | None | 2022-04-05 13:57:58 UTC |
| Red Hat Issue Tracker | SSSD-4548 | 0 | None | None | None | 2022-04-06 08:23:19 UTC |
| Red Hat Product Errata | RHBA-2022:7739 | 0 | None | None | None | 2022-11-08 10:51:43 UTC |

Description Micah Abbott 2022-04-05 13:48:49 UTC
Created attachment 1870862 [details]
sssd.log

As part of OpenShift 4.11, RHCOS recently switched to using the RHEL 8.6 Beta content and we started to see evidence of `sssd` crashing and restarting in a loop.

```
Apr 04 22:10:22 test1-4nhln-master-0 systemd[1]: sssd.service: Service RestartSec=100ms expired, scheduling restart.
Apr 04 22:10:22 test1-4nhln-master-0 systemd[1]: sssd.service: Scheduled restart job, restart counter is at 26.
Apr 04 22:10:22 test1-4nhln-master-0 systemd[1]: Stopped System Security Services Daemon.
Apr 04 22:10:22 test1-4nhln-master-0 systemd[1]: sssd.service: Consumed 532ms CPU time
Apr 04 22:10:22 test1-4nhln-master-0 systemd[1]: Starting System Security Services Daemon...
Apr 04 22:10:22 test1-4nhln-master-0 sssd[2615]: Starting up
Apr 04 22:10:22 test1-4nhln-master-0 sssd_be[2616]: Starting up
Apr 04 22:10:22 test1-4nhln-master-0 sssd_nss[2617]: Starting up
Apr 04 22:10:22 test1-4nhln-master-0 sssd_nss[2624]: Starting up
Apr 04 22:10:25 test1-4nhln-master-0 sssd_nss[2638]: Starting up
Apr 04 22:10:29 test1-4nhln-master-0 sssd_nss[2681]: Starting up
Apr 04 22:10:29 test1-4nhln-master-0 sssd[2615]: Exiting the SSSD. Could not restart critical service [nss].
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Main process exited, code=exited, status=1/FAILURE
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Failed with result 'exit-code'.
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: Failed to start System Security Services Daemon.
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Consumed 510ms CPU time
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Service RestartSec=100ms expired, scheduling restart.
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Scheduled restart job, restart counter is at 27.
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: Stopped System Security Services Daemon.
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Consumed 510ms CPU time
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: Starting System Security Services Daemon...
Apr 04 22:10:29 test1-4nhln-master-0 sssd[2683]: Starting up
Apr 04 22:10:29 test1-4nhln-master-0 sssd_be[2684]: Starting up
Apr 04 22:10:29 test1-4nhln-master-0 sssd_nss[2685]: Starting up
Apr 04 22:10:29 test1-4nhln-master-0 sssd_nss[2686]: Starting up
```
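
To gauge how quickly the loop is spinning, the restart counter can be pulled out of journal text in the format quoted above. The following is a minimal sketch only: the helper name `restart_counters` and the regular expression are illustrative assumptions based on the log lines in this report, not part of systemd or SSSD.

```python
import re
from typing import List

# Matches the systemd message seen in the excerpt above (assumed stable wording).
RESTART_RE = re.compile(
    r"sssd\.service: Scheduled restart job, restart counter is at (\d+)\."
)


def restart_counters(journal_text: str) -> List[int]:
    """Return every restart counter value found, in order of appearance."""
    return [int(m.group(1)) for m in RESTART_RE.finditer(journal_text)]


# Three lines copied from the log excerpt in this report.
SAMPLE = """\
Apr 04 22:10:22 test1-4nhln-master-0 systemd[1]: sssd.service: Scheduled restart job, restart counter is at 26.
Apr 04 22:10:29 test1-4nhln-master-0 sssd[2615]: Exiting the SSSD. Could not restart critical service [nss].
Apr 04 22:10:29 test1-4nhln-master-0 systemd[1]: sssd.service: Scheduled restart job, restart counter is at 27.
"""

if __name__ == "__main__":
    print(restart_counters(SAMPLE))
```

A steadily increasing counter over a short time window, as in the excerpt, indicates that the service never stays up between restarts.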

The version of `sssd` used in RHCOS is `sssd-0-2.6.2-3.el8-x86_64`


We think this may be related to:

https://bugzilla.redhat.com/show_bug.cgi?id=1796466#c10
https://github.com/SSSD/sssd/issues/5753


The upstream PR:

https://github.com/SSSD/sssd/pull/6075

...may resolve this issue for us.


This is currently blocking the ability for OpenShift clusters to be installed/started successfully.

Comment 1 Micah Abbott 2022-04-05 13:50:10 UTC
Created attachment 1870873 [details]
sssd_nss.log

Comment 2 Alexey Tikhonov 2022-04-05 14:13:41 UTC
(In reply to Micah Abbott from comment #0)
> 
> https://github.com/SSSD/sssd/pull/6075
> 
> ...may resolve this issue for us.

This may "hide" the issue, but it would not resolve it.

Do you have a custom sssd.conf? Could you please share the contents of /etc/sssd/* and also /var/log/sssd.log?

Comment 6 Micah Abbott 2022-04-05 14:53:05 UTC
This problem is being reproduced in an ephemeral CI cluster, so access to the nodes is difficult as they are torn down after failure.

However, looking at a single RHCOS node (outside of the cluster), the contents of `/etc/sssd/`:


```
$ sudo ls -lR /etc/sssd/
/etc/sssd/:
total 0
drwx--x--x. 2 sssd sssd 6 Apr  5 14:31 conf.d
drwx--x--x. 2 root root 6 Apr  5 14:31 pki

/etc/sssd/conf.d:
total 0

/etc/sssd/pki:
total 0
```

Comment 7 Alexey Tikhonov 2022-04-05 16:09:37 UTC
> [sss_user_by_name_or_uid] (0x0040): [sssd] is neither a valid UID nor a user name which could be resolved by getpwnam()

This happens in `nss_process_init()`->`sssd_supplementary_group()`->`sss_user_by_name_or_uid(SSSD_USER)`:
https://github.com/SSSD/sssd/blob/d1bce130f590e7e81a8472b8c9804ebe63898852/src/responder/nss/nsssrv.c#L404

On RHEL 8, SSSD is configured with `--with-sssd-user=sssd`.

The `%pre` sections for the ipa, krb5-common, common, and proxy subpackages in the RHEL 8 spec file create this local user.
But it seems this user is missing on your host. Could you please confirm this?

What package is used by RHCOS? Is it the same as for RHEL? How is it installed?
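
The question above comes down to whether the name `sssd` resolves on the host. A minimal Python sketch of that check follows. `user_resolvable` is a hypothetical helper name, and `pwd.getpwnam` consults whatever NSS modules `/etc/nsswitch.conf` lists, so it may succeed where a lookup restricted to a single module does not. It is a diagnostic aid, not SSSD's own code path.

```python
import pwd


def user_resolvable(name: str) -> bool:
    """Return True if `name` resolves through the host's configured NSS stack."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        # pwd.getpwnam raises KeyError when no entry is found.
        return False


if __name__ == "__main__":
    for name in ("root", "sssd"):
        print(name, user_resolvable(name))
```

Running this on the affected node and comparing the result with the contents of `/etc/passwd` would show whether the user exists at all or only exists outside the local files database.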

Comment 8 Micah Abbott 2022-04-05 16:41:20 UTC
That would explain it.

```
$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
core:x:1000:1000:CoreOS Admin:/var/home/core:/bin/bash
containers:x:1001:995:User for housing the sub ID range for containers:/var/home/containers:/sbin/nologin
```

We are including sssd-2.6.2-3.el8.x86_64 as part of RHCOS.

But the RPM scriptlets are run on the server side by `rpm-ostree compose` as part of the build process, so that might be where the disconnect is.

We previously haven't encountered this issue with older versions of `sssd`.  We moved from RHEL 8.4 EUS to RHEL 8.6 Beta when this problem showed up.

Specifically, `sssd 2.5.2-2.el8_5.4 -> 2.6.2-3.el8`

Let me reach out to more folks in the CoreOS team about this.

Comment 9 Jonathan Lebon 2022-04-05 17:03:37 UTC
RHCOS uses nss-altfiles to separate out system users into /usr/lib from local users in /etc. Looking at an RHCOS 8.6 pipeline build, we do have the sssd user and group:

[root@cosa-devsh ~]# grep sssd /usr/lib/passwd
sssd:x:995:993:User for sssd:/:/sbin/nologin
[root@cosa-devsh ~]# grep sssd /usr/lib/group
sssd:x:993:
[root@cosa-devsh ~]# getent passwd sssd
sssd:x:995:993:User for sssd:/:/sbin/nologin
[root@cosa-devsh ~]# getent group sssd
sssd:x:993:
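
The difference between a lookup that reads only `/etc/passwd` and one that also reads `/usr/lib/passwd` can be sketched as follows. This is an illustration under stated assumptions: `parse_passwd` and `lookup` are hypothetical helpers, the sample data is limited to the lines quoted in this report (the real `/usr/lib/passwd` presumably holds more entries), and the code does not reproduce how glibc, nss-altfiles, or SSSD implement their lookups.

```python
from typing import Dict, Iterable, Optional


def parse_passwd(text: str) -> Dict[str, str]:
    """Map user name -> full passwd line, skipping blank and comment lines."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split(":", 1)[0]
        entries.setdefault(name, line)  # first entry for a name wins
    return entries


def lookup(name: str, sources: Iterable[str]) -> Optional[str]:
    """Search passwd-format sources in order; return the first matching line."""
    for text in sources:
        hit = parse_passwd(text).get(name)
        if hit is not None:
            return hit
    return None


# /etc/passwd as quoted in comment 8.
ETC_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
core:x:1000:1000:CoreOS Admin:/var/home/core:/bin/bash
containers:x:1001:995:User for housing the sub ID range for containers:/var/home/containers:/sbin/nologin
"""

# The single /usr/lib/passwd line shown by grep in this comment.
USR_LIB_PASSWD = "sssd:x:995:993:User for sssd:/:/sbin/nologin\n"

if __name__ == "__main__":
    print(lookup("sssd", [ETC_PASSWD]))
    print(lookup("sssd", [ETC_PASSWD, USR_LIB_PASSWD]))
```

With only the `/etc/passwd` data the `sssd` user is not found, while adding the `/usr/lib/passwd` data finds it, which matches the contrast between comment 8 and the `getent` output above.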

Comment 11 Alexey Tikhonov 2022-04-05 17:04:33 UTC
Btw, note that if the only SSSD domain backend running is 'implicit_files' (`sssd_be --domain implicit_files`), then one option is to disable SSSD by default (https://access.redhat.com/solutions/6815101), leaving the option to configure it explicitly if network identities are needed on the node.

Comment 12 Alexey Tikhonov 2022-04-05 17:30:23 UTC
(In reply to Jonathan Lebon from comment #9)
> RHCOS uses nss-altfiles to separate out system users into /usr/lib from
> local users in /etc. Looking at an RHCOS 8.6 pipeline build, we do have the
> sssd user and group:
> 
> [root@cosa-devsh ~]# grep sssd /usr/lib/passwd
> sssd:x:995:993:User for sssd:/:/sbin/nologin


Ah, this explains:

(In reply to Micah Abbott from comment #8)
> 
> We previously haven't encountered this issue with older versions of `sssd`. 
> We moved from RHEL 8.4 EUS to RHEL 8.6 Beta when this problem showed up.
> 
> Specifically, `sssd 2.5.2-2.el8_5.4 -> 2.6.2-3.el8`


The reason is https://github.com/SSSD/sssd/pull/5867  --  it was released upstream in sssd-2.6.2

Comment 21 Timothée Ravier 2022-04-07 16:03:42 UTC
AFAIU, upstream did not spot this one due to this change: https://fedoraproject.org/wiki/Changes/FlexibleLocalUserCache

Comment 25 Alexey Tikhonov 2022-04-12 14:44:13 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/6108

Comment 28 Alexey Tikhonov 2022-04-14 09:39:40 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/6108

* `master`
    * 3c6218aa91026e066e793ee26333ea64fd6bc50e - Revert "man: sssd.conf and sssd-ifp clarify user option"
    * 37f90057792a0b4543f34684ed9a240fe8e869c1 - Revert "usertools: force local user for sssd process user"

Comment 33 HuijingHei 2022-05-30 06:38:51 UTC
Maybe it is missing the Verified:Tested flag.
@shridhar, could you help with some testing on the new build? I can also help test from my side.

Comment 42 errata-xmlrpc 2022-11-08 10:51:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7739

