Bug 2308428 - During upgrade of rpm-ostree based Fedora, sssd-2.10+ doesn't chown KCM/secrets.ldb to proper user
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sssd
Version: 41
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Alexey Tikhonov
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
Duplicates: 2311897
Depends On:
Blocks: F41FinalFreezeException
Reported: 2024-08-29 03:25 UTC by Parag Nemade
Modified: 2024-10-17 23:11 UTC

Fixed In Version: sssd-2.10.0-1.fc41
Clone Of:
Environment:
Last Closed: 2024-10-17 23:11:48 UTC
Type: ---
Embargoed:



Description Parag Nemade 2024-08-29 03:25:02 UTC
I upgraded my Fedora 40 Silverblue system to Fedora 41 Silverblue and use toolbox on the upgraded system to run the kinit command. However, kinit fails with the error "kinit: Connection refused while getting default ccache".

I then checked for failed services on this system and found the following:
parag@f41sb:~$ sudo systemctl status sssd-kcm.service
× sssd-kcm.service - SSSD Kerberos Cache Manager
     Loaded: loaded (/usr/lib/systemd/system/sssd-kcm.service; indirect; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d 
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Thu 2024-08-29 08:36:37 IST; 2min 14s ago
   Duration: 17ms
 Invocation: a0e98bf8241640c3b720f1beeeaa5293
TriggeredBy: × sssd-kcm.socket
       Docs: man:sssd-kcm(5)
    Process: 2766 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd/sssd.conf (code=exited, status=1/FAILURE)
    Process: 2771 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/conf.d (code=exited, status=0/SUCCESS)
    Process: 2773 ExecStart=/usr/libexec/sssd/sssd_kcm ${DEBUG_LOGGER} (code=exited, status=3)
   Main PID: 2773 (code=exited, status=3)

Aug 29 08:36:37 f41sb sssd_kcm[2773]: Failed to connect to '/var/lib/sss/secrets/secrets.ldb' with backend 'tdb': Unable to open tdb '/var/lib/sss/secrets/secrets.ldb': Permission denied
Aug 29 08:36:37 f41sb sssd_kcm[2773]: (2024-08-29  8:36:37): [kcm] [sss_sec_init] (0x0020): Failed to initialize secdb [5]: Input/output error
Aug 29 08:36:37 f41sb sssd_kcm[2773]: (2024-08-29  8:36:37): [kcm] [ccdb_secdb_init] (0x0020): Cannot initialize the security database
Aug 29 08:36:37 f41sb sssd_kcm[2773]: (2024-08-29  8:36:37): [kcm] [kcm_ccdb_init] (0x0020): Cannot initialize ccache database
Aug 29 08:36:37 f41sb sssd_kcm[2773]: (2024-08-29  8:36:37): [kcm] [kcm_process_init] (0x0010): fatal error initializing responder data
Aug 29 08:36:37 f41sb systemd[1]: sssd-kcm.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Aug 29 08:36:37 f41sb systemd[1]: sssd-kcm.service: Failed with result 'exit-code'.
Aug 29 08:36:37 f41sb systemd[1]: sssd-kcm.service: Start request repeated too quickly.
Aug 29 08:36:37 f41sb systemd[1]: sssd-kcm.service: Failed with result 'exit-code'.
Aug 29 08:36:37 f41sb systemd[1]: Failed to start sssd-kcm.service - SSSD Kerberos Cache Manager.

and
parag@f41sb:~$ sudo systemctl status sssd-kcm.socket
× sssd-kcm.socket - SSSD Kerberos Cache Manager responder socket
     Loaded: loaded (/usr/lib/systemd/system/sssd-kcm.socket; enabled; preset: enabled)
     Active: failed (Result: service-start-limit-hit) since Thu 2024-08-29 08:36:37 IST; 2min 39s ago
   Duration: 23.535s
 Invocation: de4f2f42700f47a3958789b82f7d5070
   Triggers: ● sssd-kcm.service
       Docs: man:sssd-kcm(8)
     Listen: /run/.heim_org.h5l.kcm-socket (Stream)

Aug 29 08:36:14 f41sb systemd[1]: Listening on sssd-kcm.socket - SSSD Kerberos Cache Manager responder socket.
Aug 29 08:36:37 f41sb systemd[1]: sssd-kcm.socket: Failed with result 'service-start-limit-hit'.


Can someone help me understand how to fix this service failure so that I can start using the kinit command?

Reproducible: Always

Steps to Reproduce:
1. Take a bare-metal (not VM) Fedora 41 Silverblue system
2. Use Fedora toolbox
3. In the toolbox, run "kinit" or "klist"; it fails
Actual Results:  
kinit: Connection refused while getting default ccache

Expected Results:  
Should run kinit and get a ticket

Comment 1 Alexey Tikhonov 2024-08-29 09:06:10 UTC
Hi,

could you please show
(1) output of `systemctl cat sssd-kcm.service`
and
(2) output of `ls -lahZ /var/lib/sss/secrets/secrets.ldb`
?

Comment 2 Parag Nemade 2024-08-29 09:28:15 UTC
(1) output of `systemctl cat sssd-kcm.service`

parag@f41sb:~$ systemctl cat sssd-kcm.service
# /usr/lib/systemd/system/sssd-kcm.service
[Unit]
Description=SSSD Kerberos Cache Manager
Documentation=man:sssd-kcm(5)
Requires=sssd-kcm.socket
After=sssd-kcm.socket

[Install]
Also=sssd-kcm.socket

[Service]
Environment=DEBUG_LOGGER=--logger=files 
ExecStartPre=+-/bin/chown -f sssd:sssd /etc/sssd/sssd.conf
ExecStartPre=+-/bin/chown -f -R sssd:sssd /etc/sssd/conf.d
ExecStart=/usr/libexec/sssd/sssd_kcm ${DEBUG_LOGGER}
CapabilityBoundingSet= CAP_DAC_OVERRIDE CAP_CHOWN CAP_SETGID CAP_SETUID
SecureBits=noroot noroot-locked
User=sssd
Group=sssd
# If service configured to be run under "root", uncomment "SupplementaryGroups"
#SupplementaryGroups=sssd

# /usr/lib/systemd/system/service.d/10-timeout-abort.conf 
# This file is part of the systemd package.
# See https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer.
#
# To facilitate debugging when a service fails to stop cleanly,
# TimeoutStopFailureMode=abort is set to "crash" services that fail to stop in
# the time allotted. This will cause the service to be terminated with SIGABRT
# and a coredump to be generated.
#
# To undo this configuration change, create a mask file:
#   sudo mkdir -p /etc/systemd/system/service.d
#   sudo ln -sv /dev/null /etc/systemd/system/service.d/10-timeout-abort.conf

[Service]
TimeoutStopFailureMode=abort


(2) output of `ls -lahZ /var/lib/sss/secrets/secrets.ldb`

parag@f41sb:~$ sudo ls -lahZ /var/lib/sss/secrets/secrets.ldb
-rw-------. 1 root root system_u:object_r:sssd_var_lib_t:s0 1.6M Aug 29 13:45 /var/lib/sss/secrets/secrets.ldb

Comment 3 Alexey Tikhonov 2024-08-29 09:36:07 UTC
(In reply to Parag Nemade from comment #2)
> 
> parag@f41sb:~$ sudo ls -lahZ /var/lib/sss/secrets/secrets.ldb
> -rw-------. 1 root root system_u:object_r:sssd_var_lib_t:s0 1.6M Aug 29
> 13:45 /var/lib/sss/secrets/secrets.ldb

This is the issue - it should be sssd:sssd owned.

Ownership has to be changed during upgrade:
https://src.fedoraproject.org/rpms/sssd/blob/f41/f/sssd.spec#_1076

Do you have upgrade logs by chance?
I wonder if there is something specific to Silverblue that made this fail...

Obviously, as a "fix" you can chown this file to sssd:sssd manually.
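
The manual workaround could be sketched roughly as below. This is only an illustration: `fix_owner` is a hypothetical helper (not part of SSSD), it defaults to a dry run, and the `systemctl reset-failed` step is an assumption about how to clear the `service-start-limit-hit` state without a reboot.

```shell
#!/bin/sh
# Sketch of the manual workaround, not an official procedure.
# fix_owner is a hypothetical helper; with DRY_RUN=1 (the default) it only
# prints the chown it would run.

fix_owner() {
    # $1 = file, $2 = expected owner in user:group form
    current=$(stat -c '%U:%G' "$1") || return 1
    if [ "$current" = "$2" ]; then
        echo "ok: $1 is already owned by $2"
    elif [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: chown $2 $1"
    else
        chown "$2" "$1"
    fi
}

# On an affected system, as root:
#   DRY_RUN=0 fix_owner /var/lib/sss/secrets/secrets.ldb sssd:sssd
#   systemctl reset-failed sssd-kcm.socket sssd-kcm.service
#   systemctl restart sssd-kcm.socket
```

The dry-run default is deliberate, so the current owner can be inspected before anything is changed.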

Comment 4 Parag Nemade 2024-08-29 09:58:59 UTC
Thank you Alexey for your quick help.

I am not sure where to look for upgrade logs. Actually, it's just a fresh ostree deployment with every update.

I have now changed the ownership to sssd:sssd, but I also needed to reboot the system; only then did kinit start working.

If I find anything more to share, I will add it here.

You can close this issue :)

Comment 5 Alexey Tikhonov 2024-08-29 11:29:14 UTC
(In reply to Parag Nemade from comment #4)
> 
> Actually its just the fresh ostree deployment with every update.

I guess this means rpm `%post` isn't executed on a local machine.


> You can close this issue :)

It's rpm-ostree specific but still a "valid bug"...

Comment 6 Parag Nemade 2024-08-29 11:56:34 UTC
After this issue was fixed, I was happy and tried to upgrade my initial (first) F41 deployment, which updated some packages as well.
Then I rebooted and the system became inaccessible (grub failure). See https://github.com/fedora-silverblue/issue-tracker/issues/587
So I have moved my system back from the F41 to the F40 deployment. I will probably upgrade again after the F41 beta release.

If other people also face this file ownership issue, it can be taken to the rpm-ostree people for more help.
To reproduce:
1) Take an updated F40 Silverblue system
2) Upgrade it to F41 and reboot into the new F41 deployment
3) Use toolbox and run the kinit command; it will fail.

Thank you.

Comment 7 Parag Nemade 2024-08-29 16:24:47 UTC
Interesting: I have now moved back to F40, where kinit was working fine with the secrets.ldb file owned as follows:
-rw-------. 1 root root system_u:object_r:sssd_var_lib_t:s0 1.6M Aug 29 19:41 /var/lib/sss/secrets/secrets.ldb


But since I was advised on Fedora 41 to change its ownership to sssd:sssd, I did that, and it became incompatible with Fedora 40.
I then changed the ownership of the secrets.ldb file back to root:root.

So should this file's ownership remain root:root in Fedora 40 and be changed to sssd:sssd in Fedora 41?

Comment 8 Alexey Tikhonov 2024-08-29 16:30:06 UTC
(In reply to Parag Nemade from comment #7)
> 
> So is this file ownership should remain root:root in Fedora 40 and should
> get changed to sssd:sssd in Fedora 41?

Ownership should match a user used to run 'sssd_kcm' service.
Default value of this user changed from 'root' to 'sssd' starting F41.
For "regular" (package based) Fedora versions ownership is changed by the rpm %post script during upgrade, but this obviously doesn't work on rpm-ostree based Fedora.

Probably we can also add this 'chown' to https://github.com/SSSD/sssd/blob/master/src/sysv/systemd/sssd-kcm.service.in
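
Until such a change lands in the shipped unit, a local drop-in could apply the same idea. The sketch below is hypothetical: the helper name `write_kcm_dropin` and the file name `10-chown-secrets.conf` are made up for illustration, and the `ExecStartPre=` line simply mirrors the `+-/bin/chown -f` pattern already present in sssd-kcm.service. It is not the upstream fix.

```shell
#!/bin/sh
# Hypothetical local drop-in, modelled on the ExecStartPre= lines already in
# sssd-kcm.service. Not the upstream fix.

write_kcm_dropin() {
    # $1 = drop-in directory; on a real system this would be
    # /etc/systemd/system/sssd-kcm.service.d (needs root)
    mkdir -p "$1" || return 1
    cat > "$1/10-chown-secrets.conf" <<'EOF'
[Service]
# "+" runs the command with full privileges, "-" ignores a failure
ExecStartPre=+-/bin/chown -f sssd:sssd /var/lib/sss/secrets/secrets.ldb
EOF
}

# Usage on an affected system, as root:
#   write_kcm_dropin /etc/systemd/system/sssd-kcm.service.d
#   systemctl daemon-reload
```

Because the chown runs at every service start, it does not depend on any package scriptlet having been executed on the local machine.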

Comment 9 Alexis Jeandet 2024-09-03 07:48:33 UTC
Hello,

I got something similar on my machine, I had to chown:


- /etc/sssd, the files inside were correctly set but the folder wasn't accessible to sssd user
- /var/lib/sss and everything inside
- /var/log/sssd and everything inside

Note: I use realmd for Active Directory login.

Best regards.
Alexis.
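
To see which entries under the directories listed above still have the old owner before changing anything, a check along these lines might help. `list_not_owned_by` is a hypothetical helper, and treating `sssd` as the expected owner of every path is an assumption taken from this comment, not a statement about what the package intends.

```shell
#!/bin/sh
# Hypothetical helper: print every file or directory under the given paths
# that is NOT owned by the given user (name or numeric uid).

list_not_owned_by() {
    owner=$1
    shift
    find "$@" ! -user "$owner" -print
}

# Usage on an affected system, as root:
#   list_not_owned_by sssd /etc/sssd /var/lib/sss /var/log/sssd
```

An empty result would mean nothing under those paths needs a chown.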

Comment 10 Alexey Tikhonov 2024-09-03 07:59:14 UTC
Hi,

(In reply to Alexis Jeandet from comment #9)
> 
> I got something similar on my machine, I had to chown:
> 
> 
> - /etc/sssd, the files inside were correctly set but the folder wasn't
> accessible to sssd user
> - /var/lib/sss and everything inside
> - /var/log/sssd and everything inside

could you please provide additional details: what is the OS version(s), SSSD version(s), what steps were done?

If this is Silverblue 41, then what is the content of `/usr/lib/rpm-ostree/tmpfiles.d/sssd-common.conf`?

Comment 11 Alexis Jeandet 2024-09-03 08:05:04 UTC
Sure, it was also after an update from Fedora Silverblue 40 to 41.

Here is the content of /usr/lib/rpm-ostree/tmpfiles.d/sssd-common.conf:
============================================================
d /run/sssd 0775 sssd sssd - -
d /var/cache/krb5rcache 0755 root root - -
d /var/lib/sss 0775 sssd sssd - -
d /var/lib/sss/db 0770 sssd sssd - -
d /var/lib/sss/deskprofile 0771 sssd sssd - -
d /var/lib/sss/gpo_cache 0770 sssd sssd - -
d /var/lib/sss/mc 0775 sssd sssd - -
d /var/lib/sss/pipes 0775 sssd sssd - -
d /var/lib/sss/pipes/private 0770 sssd sssd - -
d /var/lib/sss/pubconf 0775 sssd sssd - -
d /var/lib/sss/secrets 0770 sssd sssd - -
d /var/log/sssd 0770 sssd sssd - -
============================================================

Also the output of `rpm-ostree status`:
============================================================
State: idle
Deployments:
● fedora:fedora/41/x86_64/silverblue
                  Version: 41.20240831.n.0 (2024-08-31T08:05:07Z)
               BaseCommit: 9971c3e30a94eea549c8fe1ba54365cf722d5ca4b3f367b0f40cb2e837fbd92a
             GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1
          LayeredPackages: adcli btrfs-assistant distrobox gem glusterfs-client lm_sensors oddjob-mkhomedir powerline samba-common-tools sssd-ad zsh

  fedora:fedora/40/x86_64/silverblue
                  Version: 40.20240902.0 (2024-09-02T00:42:54Z)
               BaseCommit: 14a2faa512f11119ddbf672b4014c55a2658498be14d44812a47fcb663c4b7e8
             GPGSignature: Valid signature by 115DF9AEF857853EE8445D0A0727707EA15B79CC
          LayeredPackages: adcli btrfs-assistant distrobox gem glusterfs-client lm_sensors oddjob-mkhomedir powerline samba-common-tools sssd-ad zsh
============================================================
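
The tmpfiles.d entries above can be compared with the state on disk using a sketch like the following. `check_tmpfiles_dirs` is a hypothetical helper and only looks at `d` lines. As far as tmpfiles.d(5) describes, a `d` entry adjusts the directory itself and not the files inside it, which would be consistent with secrets.ldb staying root:root even when /var/lib/sss/secrets is sssd:sssd.

```shell
#!/bin/sh
# Hypothetical helper: read tmpfiles.d lines on stdin
# (type path mode user group age argument) and report directories whose
# owner differs from what the "d" entry requests.

check_tmpfiles_dirs() {
    while read -r type path mode user group _rest; do
        [ "$type" = d ] || continue
        if [ ! -d "$path" ]; then
            echo "missing: $path"
            continue
        fi
        actual=$(stat -c '%U:%G' "$path")
        [ "$actual" = "$user:$group" ] ||
            echo "mismatch: $path is $actual, expected $user:$group"
    done
}

# Usage, as root so that all directories are readable:
#   check_tmpfiles_dirs < /usr/lib/rpm-ostree/tmpfiles.d/sssd-common.conf
```

No output means every listed directory exists with the requested owner; files inside those directories still need a separate check.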

Comment 12 Alexey Tikhonov 2024-09-03 08:14:38 UTC
Well, I'm not sure how ostree handles changes in /etc:
https://src.fedoraproject.org/rpms/sssd/blob/f41/f/sssd.spec#_810
Maybe it doesn't at all...

But I wonder why you had to chown `/var/lib/sss` and `/var/log/sssd`...

This looks correct:

(In reply to Alexis Jeandet from comment #11)
> Here is the content of /usr/lib/rpm-ostree/tmpfiles.d/sssd-common.conf:
> ============================================================
> d /run/sssd 0775 sssd sssd - -
> d /var/cache/krb5rcache 0755 root root - -
> d /var/lib/sss 0775 sssd sssd - -
> d /var/lib/sss/db 0770 sssd sssd - -
> d /var/lib/sss/deskprofile 0771 sssd sssd - -
> d /var/lib/sss/gpo_cache 0770 sssd sssd - -
> d /var/lib/sss/mc 0775 sssd sssd - -
> d /var/lib/sss/pipes 0775 sssd sssd - -
> d /var/lib/sss/pipes/private 0770 sssd sssd - -
> d /var/lib/sss/pubconf 0775 sssd sssd - -
> d /var/lib/sss/secrets 0770 sssd sssd - -
> d /var/log/sssd 0770 sssd sssd - -

  --  was everything root:root despite the above?

Comment 13 Alexis Jeandet 2024-09-03 08:29:19 UTC
Not sure. Could it be because I tried to start sssd manually as root while investigating the issue?
Maybe then the only issue in my case was the /etc/sssd directory?

Comment 14 Alexey Tikhonov 2024-09-12 11:30:54 UTC
*** Bug 2311897 has been marked as a duplicate of this bug. ***

Comment 15 Alexey Tikhonov 2024-09-12 11:32:22 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/7585

Comment 16 Alexey Tikhonov 2024-09-13 12:03:59 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/7585

* `master`
    * 2dae1f64d1478ffb96064004e1b67b65379b994f - SYSTEMD: chown all artifacts at startup
    * c0c46bf6a2012b2fac5632a91c578311c1a85457 - SPEC: don't fail uninstallation if 'alternatives' fails

Comment 17 Yann Soubeyrand 2024-09-13 13:57:31 UTC
Hello,

It’s not clear to me which files/directories must belong to sssd:sssd. I chowned /var/lib/sss/db/config.ldb, /var/lib/sss/secrets/secrets.ldb and /var/log/sssd/sssd_kcm.log and was able to start sssd-kcm.service, which made my Kerberos GNOME online account work. However, I still can’t see my tickets using klist in my toolbox (klist: Connexion refusée while resolving ccache, i.e. "Connection refused"), whereas I’m pretty sure it worked on Fedora 40 Silverblue.

Regards

Yann

Comment 18 Alexey Tikhonov 2024-09-13 14:59:29 UTC
Hi,

(In reply to Yann Soubeyrand from comment #17)
> However, I still can’t see my
> tickets using klist in my toolbox (klist: Connexion refusée while resolving
> ccache), whereas I’m pretty sure it worked on Fedora 40 Silverblue.

could you please show output of `strace klist -A` starting
```
socket(AF_UNIX, SOCK_STREAM, 0)         = 4
connect(4, {sa_family=AF_UNIX, sun_path="/var/run/.heim_org.h5l.kcm-socket"}, 110) = 0
```
?

Comment 19 Yann Soubeyrand 2024-09-13 15:04:45 UTC
Well, I’m confused: stopping my toolbox (not just getting out of it) and starting it again made klist work. The issue is solved for me, sorry for the noise…

Regards

Yann

Comment 20 Alexey Tikhonov 2024-09-14 17:04:23 UTC
Additional PR: https://github.com/SSSD/sssd/pull/7594

Comment 21 Alexey Tikhonov 2024-09-17 15:07:42 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/7594

* `master`
    * f83ea91aa773050e992fd58753f670668f3549a8 - SYSTEMD: shell expansion of * doesn't work in ExecStartPre
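
The effect named in that commit title can be reproduced outside systemd. The demo below uses a throwaway directory and `ls` instead of `chown`; it only illustrates that a `*` reaches the program unexpanded unless a shell is involved, and it is not taken from the PR. A unit would presumably need a wrapper such as `/bin/sh -c '...'` around the glob, which is an assumption about the general approach rather than a description of what the commit does.

```shell
#!/bin/sh
# Demo: a glob is only expanded when a shell processes it.
demo=$(mktemp -d)
touch "$demo/a.ldb" "$demo/b.ldb"

# Quoted, so ls receives the literal argument ".../*" and finds nothing.
# This is roughly what a command in ExecStartPre= sees, since systemd does
# not pass the line through a shell.
literal_count=$(ls -- "$demo/*" 2>/dev/null | wc -l)

# With an explicit shell, the glob is expanded before ls runs.
expanded_count=$(/bin/sh -c 'ls -- "$1"/*' sh "$demo" | wc -l)

echo "literal: $literal_count, expanded: $expanded_count"
```

Combined with the `-f` and `-` prefixes used in the unit, an unexpanded glob fails silently, which makes this kind of mistake easy to miss.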

Comment 22 Timothée Ravier 2024-09-18 14:44:40 UTC
(In reply to Alexey Tikhonov from comment #3)
> > parag@f41sb:~$ sudo ls -lahZ /var/lib/sss/secrets/secrets.ldb
> > -rw-------. 1 root root system_u:object_r:sssd_var_lib_t:s0 1.6M Aug 29
> > 13:45 /var/lib/sss/secrets/secrets.ldb
> 
> This is the issue - it should be sssd:sssd owned.
> 
> Ownership has to be changed during upgrade:
> https://src.fedoraproject.org/rpms/sssd/blob/f41/f/sssd.spec#_1076
> 
> Do you have upgrade logs by chance?
> I wonder if there is something specific to Silverblue that made this to
> fail...

%post scriptlets cannot change the content of files in /var on rpm-ostree systems, as the scriptlets are not run live on the system but during the compose, just like container builds.

Running migration scripts in scriptlets is brittle and prone to hard-to-diagnose failures.

Doing it in systemd service units like you did in https://github.com/SSSD/sssd/pull/7594 is better.

Comment 23 Christian Haag 2024-09-19 18:10:52 UTC
My apologies if this should be reported as a new bug, but I had a similar issue with SSSD when upgrading from FC40 to FC41. In my case, the directory tree /var/lib/sss/gpo_cache/[my_domain.com] did not update ownership from root:root to sssd:sssd, which prevented login with AD accounts. Changing ownership fixed my login issues.

Comment 24 Alexey Tikhonov 2024-09-20 09:45:26 UTC
(In reply to Christian Haag from comment #23)

> issue with SSSD when upgrading from FC40 to FC41. In my case, the directory
> tree /var/lib/sss/gpo_cache/[my_domain.com] did not update ownership from
> root:root to sssd:sssd, which prevented login with AD accounts.

Thank you.

https://github.com/SSSD/sssd/pull/7610

Comment 25 Fedora Blocker Bugs Application 2024-09-23 18:59:10 UTC
Proposed as a Freeze Exception for 41-final by Fedora user dustymabe using the blocker tracking app because:

 Because of this bug `logrotate.service` shows up as failed on OSTree based systems. If a fix becomes available, it would be nice to include it as an FE so people's systems don't have a failing service.

Comment 26 Alexey Tikhonov 2024-09-23 19:05:59 UTC
(In reply to Fedora Blocker Bugs Application from comment #25)
> 
>  Because of this bug `logrotate.service` shows up as failed on OSTree based
> systems.

Wasn't this ticket mixed with bz 2299733?

Comment 27 Dusty Mabe 2024-09-23 19:09:23 UTC
Alexey - possibly!

Maybe travier was mistaken when he linked this bug from https://github.com/coreos/fedora-coreos-tracker/issues/1798#issuecomment-2358965753 ?

Comment 28 Alexey Tikhonov 2024-09-24 06:25:23 UTC
(In reply to Dusty Mabe from comment #27)
> Alexey - possibly!
> 
> Maybe travier was mistaken when he linked this bug from
> https://github.com/coreos/fedora-coreos-tracker/issues/1798#issuecomment-
> 2358965753 ?

Ah, no, this seems to be the correct reference.

Comment 29 Alexey Tikhonov 2024-09-25 13:14:15 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/7610

* `master`
    * f6ad1828cf0d59e48734a52b29549e53e33b65f3 - SYSTEMD: chown gpo-cache as well

Comment 30 František Zatloukal 2024-09-30 17:37:27 UTC
Discussed during the 2024-09-30 blocker review meeting: [1]

The decision to classify this bug as an AcceptedFreezeException (Final) was made:

"This can cause problems for atomic installs on rebase from 40 to 41, so if the fix doesn't make it before freeze, we should take it to avoid problems for users who rebase during the freeze."

[1] https://meetbot.fedoraproject.org/blocker-review_matrix_fedoraproject-org/2024-09-30/f41-blocker-review.2024-09-30-16.00.log.html

Comment 31 Timothée Ravier 2024-10-11 10:20:32 UTC
Gentle ping here. It would be great to get this in Fedora 41 before the freeze next week. Thanks

Comment 32 Fedora Update System 2024-10-15 12:36:21 UTC
FEDORA-2024-73827b9035 (sssd-2.10.0-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-73827b9035

Comment 33 Fedora Update System 2024-10-16 02:02:32 UTC
FEDORA-2024-73827b9035 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-73827b9035`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-73827b9035

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 34 Pavel Březina 2024-10-17 13:10:28 UTC
František, bodhi update seems to be delayed by the freeze even though it references this bug. Is there any process that I have to follow to push it through?

Comment 35 František Zatloukal 2024-10-17 21:34:52 UTC
(In reply to Pavel Březina from comment #34)
> František, bodhi update seems to be delayed by the freeze even though it
> references this bug. Is there any process that I have to follow to push it
> through?

It'll be pulled in in a bit. Adam Williamson usually (and I occasionally) handles pulling fixes for Blockers and Freeze Exceptions, and the requests are made in batches at the discretion of the individual Quality Engineer handling the process (as each pull requires manual action by releng). So early in the freeze (as it is now), we usually wait until we can batch more changes together. In any case, this should be in within a day or two, if I had to guess.

Comment 36 Fedora Update System 2024-10-17 23:11:48 UTC
FEDORA-2024-73827b9035 (sssd-2.10.0-1.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make note of it in this bug report.

