Bug 2348669 - ldconfig.service blocked booting for 30-60 seconds with Rawhide KDE live images
Summary: ldconfig.service blocked booting for 30-60 seconds with Rawhide KDE live images
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: LiveCD - KDE
Version: rawhide
Hardware: Unspecified
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Neal Gompa
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2025-02-27 06:52 UTC by Matt Fagnani
Modified: 2025-03-22 19:16 UTC (History)
27 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments (Terms of Use)
journal of boot of Rawhide KDE live image in a VM showing ldconfig.service delay (255.70 KB, text/plain)
2025-02-27 06:54 UTC, Matt Fagnani
no flags Details
update-cache.log from ldconfig (86.48 KB, text/plain)
2025-03-05 19:53 UTC, Matt Fagnani
no flags Details

Description Matt Fagnani 2025-02-27 06:52:45 UTC
When I've booted Rawhide KDE live images in QEMU/KVM VMs in GNOME Boxes and on bare metal, ldconfig.service blocked booting for 30-60 seconds. This problem has been happening for the last 2 months or so, it was originally around 30-40 s, but recently started taking up to 60 s. The latest Rawhide KDE live image I've tested this with was Fedora-KDE-Desktop-Live-Rawhide-20250226.n.0.x86_64.iso using systemd-257.3-7.fc43.x86_64. ldconfig.service for example took 54 seconds to start when booting a VM with that image. The peak memory usage of ldconfig.service appeared to be 2.9 GB.

systemctl status ldconfig
Warning: The unit file, source configuration file or drop-ins of ldconfig.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Thu 2025-02-27 01:27:48 UTC; 5h 1min ago
 Invocation: e9dcca66d7ae4b9d9f625f59f84b0a0f
       Docs: man:ldconfig(8)
    Process: 1252 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
   Main PID: 1252 (code=exited, status=0/SUCCESS)
   Mem peak: 2.9G
        CPU: 47.485s

Feb 27 01:26:54 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Feb 27 01:27:48 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.

When I ran time /sbin/ldconfig -X manually in Konsole in the VM it took around 25 s
/sbin/ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied

real    0m25.491s
user    0m0.113s
sys     0m22.766s

This delay seemed to happen on every boot of the live images. I'll attach the journal from the VM boot I mentioned.



Reproducible: Always

Steps to Reproduce:
1. Download Fedora-KDE-Desktop-Live-Rawhide-20250226.n.0.x86_64.iso from https://koji.fedoraproject.org/koji/buildinfo?buildID=2666595 in a Fedora 41 KDE installation
2. Boot Fedora-KDE-Desktop-Live-Rawhide-20250226.n.0.x86_64.iso in a QEMU/KVM VM in GNOME Boxes with 4 GiB RAM, UEFI enabled, 3D acceleration enabled.

3.
Actual Results:  
ldconfig.service blocked booting for 30-60 seconds with Rawhide KDE live images

Expected Results:  
The boot should not have been blocked by ldconfig.service for that long.

Comment 1 Matt Fagnani 2025-02-27 06:54:33 UTC
Created attachment 2078001 [details]
journal of boot of Rawhide KDE live image in a VM showing ldconfig.service delay

Comment 2 Carlos O'Donell 2025-03-05 15:36:10 UTC
May you please run `time ldconfig -v -X >& update-cache.log` and provide the update-cache.log so we can see what's going on? Please also list the time took as reported by time.

Thank you. It's odd to see that this system takes 25s to update the cache, but it depends on how many libraries the system has installed.

Comment 3 Zbigniew Jędrzejewski-Szmek 2025-03-05 15:39:05 UTC
I don't think this is a bug in glibc.
Sounds like a bug in the construction of the Live image.

I'd expect that the unit is skipped in the Live image because neither
of the two conditions in the unit are met:
ConditionNeedsUpdate=|/etc
ConditionFileNotEmpty=|!/etc/ld.so.cache

On my laptop, ldconfig -X takes 5s. I can imagine that on a slow
system it might take longer. Though this is in a VM, so the system
shouldn't be slow. But 0 will always be better.

Comment 4 Carlos O'Donell 2025-03-05 15:45:20 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #3)
> I'd expect that the unit is skipped in the Live image because neither
> of the two conditions in the unit are met:
> ConditionNeedsUpdate=|/etc
> ConditionFileNotEmpty=|!/etc/ld.so.cache

That's a good point, the live image is static, so at startup it shouldn't be needed to update the cache?
 
> On my laptop, ldconfig -X takes 5s. I can imagine that on a slow
> system it might take longer. Though this is in a VM, so the system
> shouldn't be slow. But 0 will always be better.

It takes 0.236s on my laptop.

Comment 5 Zbigniew Jędrzejewski-Szmek 2025-03-05 15:55:29 UTC
Please make sure that the cache is not used:
$ sudo bash -c 'echo 3 >/proc/sys/vm/drop_caches'
$ time ldconfig -v -X &>/dev/null
$ time ldconfig -v -X &>/dev/null

The first run takes 6 s, the second .1 s here.

Comment 6 Florian Weimer 2025-03-05 17:55:21 UTC
I can reproduce it using Fedora-KDE-Desktop-Live-Rawhide-20250305.n.0.x86_64.iso, including this part from the systemctl status output:

   Mem peak: 2.9G

This is very suspicious. I originally suspected an ldconfig bug, but “perf record” shows that a lot of time is spent in LZMA decompression. So I assume that systemd accounts for page cache activity triggered by ldconfig, not just memory usage.

This must have to do how EROFS handles file compression, so I expect it's triggered by this change: <https://fedoraproject.org/wiki/Changes/EROFSforLiveMedia>

We do not actually read that much data in ldconfig, I think. We use mmap, so it's a bit non-trivial to measure. The change page mentions a chunk size of 1 MiB, and there are around 1,800 shared objects in the KDE live image (according to “ldconfig -p” output). So it's probably read amplification due to the 1 MiB chunk size.

Comment 7 Florian Weimer 2025-03-05 18:01:39 UTC
I tried to bring this to the Fedora devel thread for the change, but it looks like the message is stuck in moderation.

Comment 8 Matt Fagnani 2025-03-05 19:53:10 UTC
Created attachment 2079012 [details]
update-cache.log from ldconfig

I reproduced the problem using the same live image in a VM. ldconfig.service took 59 s during boot and used a maximum of 3 GB memory.

systemctl status ldconfig
Warning: The unit file, source configuration file or drop-ins of ldconfig.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Wed 2025-03-05 14:39:31 UTC; 5h 5min ago
 Invocation: 28ef6d99f7ef4908b66bc5aa45263906
       Docs: man:ldconfig(8)
    Process: 1239 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
   Main PID: 1239 (code=exited, status=0/SUCCESS)
   Mem peak: 3G
        CPU: 50.563s

Mar 05 14:38:32 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 05 14:39:31 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.

Based on Zbigniew and Carlos' suggestions I ran in Konsole in the VM
sudo bash -c 'echo 3 >/proc/sys/vm/drop_caches'
time ldconfig -v -X >& update-cache.log

real    0m50.037s
user    0m0.160s
sys     0m44.382s

The update-cache.log from ldconfig is attached. Thanks.

Comment 9 Neal Gompa 2025-03-05 19:54:54 UTC
Why is this running in the live environment in the first place?

Comment 10 Zbigniew Jędrzejewski-Szmek 2025-03-05 21:03:06 UTC
I downloaded Fedora-KDE-Desktop-Live-Rawhide-20250305.n.0.x86_64.iso and I get similar results (25 s, 2.9 GB).
The peak memory is reported by kernel for the cgroup, memory.peak attribute.

So I think we have two bugs: unexpectedly large memory usage, and
unexpected triggering of units with ConditionNeedsUpdate.

/run/rootfsbase/etc/ld.so.cache is present in the live image.
/etc/ld.so.cache is also present, because ldconfig.service ran.
The two files are identical. This means that the service triggered
on ConditionNeedsUpdate, not on ConditionFileNotEmpty=|!/etc/ld.so.cache
(that file was never empty).

/run/rootfsbase/etc/.updated is not present. I think this is the reason
for the second bug. 

/run/rootfsbase/var/.updated is also not present. /run/rootfsbase/var/
is populated, so it doesn't make sense for that marker file to not be
present.

Comment 11 Carlos O'Donell 2025-03-07 14:25:43 UTC
The summary so far is as follows:

 * Block size tuning is likely wrong for this kind of deployment.
 * Unexpected triggering of a unit with ConditionNeedsUpdate.

I don't think that unexpected large memory usage is the root bug here, but rather the block size tuning.

I'm moving this over to LiveCD KDE to review why the ".updated" files are missing.

The block size tuning still needs to be reviewed, and Florian commented here:
https://bugzilla.redhat.com/show_bug.cgi?id=2346259#c4
though it's not clear that the timeline aligns with the changes.

Comment 12 Neal Gompa 2025-03-07 16:04:38 UTC
(In reply to Carlos O'Donell from comment #11)
> The summary so far is as follows:
> 
>  * Block size tuning is likely wrong for this kind of deployment.
>  * Unexpected triggering of a unit with ConditionNeedsUpdate.
> 
> I don't think that unexpected large memory usage is the root bug here, but
> rather the block size tuning.
> 
> I'm moving this over to LiveCD KDE to review why the ".updated" files are
> missing.
> 
> The block size tuning still needs to be reviewed, and Florian commented here:
> https://bugzilla.redhat.com/show_bug.cgi?id=2346259#c4
> though it's not clear that the timeline aligns with the changes.

The block size tuning has been 1M for kiwi-based images since it was introduced for Fedora live images. Even squashfs ones (like Fedora KDE Mobile and MiracleWM in 41, and KDE Desktop ones in 42 and Rawhide prior to Feb 20) used 1M. That's the reason the EROFS one is 1M too.

I can, of course, shrink it, but the consequence is that it will not necessarily compress as well (since blocksize is the main tunable for better compression). I'll see if raising the compression level can compensate for this.

But what is supposed to create the ".updated" files? There's nothing in image build tools to do that.

Comment 13 Matt Fagnani 2025-03-08 00:32:21 UTC
/etc/.updated and /var/.updated were created by systemd-update-done.service on my F42 KDE installation. systemd-update-done.service seems like it's run after offline updates according to man systemd-update-done.service.

cat /etc/.updated
# This file was created by systemd-update-done. Its only 
# purpose is to hold a timestamp of the time this directory
# was updated. See man:systemd-update-done.service(8).
TIMESTAMP_NSEC=1740787734454480688

Comment 14 Neal Gompa 2025-03-08 06:52:24 UTC
Then these will *never* get created as part of image builds, because the entire system is constructed offline.

Comment 15 Neal Gompa 2025-03-18 13:49:19 UTC
Here's a pull request to reduce the chunksize and increase the compression: https://pagure.io/fedora-kiwi-descriptions/pull-request/159

Comment 16 Neal Gompa 2025-03-19 12:43:49 UTC
I've merged and cherry-picked the PR to F42. People will need to test to see if tomorrow's composes don't have the issue anymore.

Comment 17 Zbigniew Jędrzejewski-Szmek 2025-03-19 16:58:26 UTC
Thanks Neal, that should alleviate the problem.

I was also exploring the approach with /etc/.updated.
https://github.com/systemd/systemd/pull/36803 tweaks the comments and
implements systemd-update-done --root. But it's not directly useful
for us now.

https://github.com/systemd/mkosi/pull/3602 implements a PoC
in mkosi. The resulting image skips ldconfig.service and other
"update" services as expected. I think it'd be good to do something
equivalent in the Kiwi configs. (I'm not a Kiwi expert, so it'd
probably be more efficient if Neal does this :--[]] )

Comment 18 Florian Weimer 2025-03-19 17:31:12 UTC
One aspect I forgot to mention: ldconfig has an auxiliary cache specifically designed to avoid having to re-read all installed shared objects when ldconfig is run. But the change detection logic includes inode numbers and timestamps. The inode numbers change during erofs creation, so that cache is ineffective. I'm not sure if anything can be done about that.

Comment 19 Neal Gompa 2025-03-19 21:47:25 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #17)
> Thanks Neal, that should alleviate the problem.
> 
> I was also exploring the approach with /etc/.updated.
> https://github.com/systemd/systemd/pull/36803 tweaks the comments and
> implements systemd-update-done --root. But it's not directly useful
> for us now.
> 
> https://github.com/systemd/mkosi/pull/3602 implements a PoC
> in mkosi. The resulting image skips ldconfig.service and other
> "update" services as expected. I think it'd be good to do something
> equivalent in the Kiwi configs. (I'm not a Kiwi expert, so it'd
> probably be more efficient if Neal does this :--[]] )

This can be done by adding it at the end of config.sh in the kiwi descriptions.

Comment 20 Gao Xiang 2025-03-20 13:48:23 UTC


(In reply to Neal Gompa from comment #16)
> I've merged and cherry-picked the PR to F42. People will need to test to see
> if tomorrow's composes don't have the issue anymore.

Hi, I tried `Fedora-Workstation-Live-Rawhide-20250320.n.0.x86_64.iso` and it seems it doesn't help much.
Also I observed squashfs has lower Mem peak (780.3M) and CPU (17.922s) compared to EROFS Mem peak (2.7G) and CPU (38.690s).

Currently I'm not sure how the difference in ldconfig comes from and what does Mem peak really mean.
I need more time to know where does the difference come from.
If it's too late for F42, maybe just use squashfs and reconsider EROFS in F43 then so I have more time to play into that?

Comment 21 Matt Fagnani 2025-03-20 18:58:01 UTC
I booted Fedora-KDE-Desktop-Live-Rawhide-20250320.n.0.x86_64.iso in a VM as described, and ldconfig took 72 s to run and had memory peak of 2.9 GB. Based on https://github.com/systemd/mkosi/pull/3602/commits/664066e2a5249eac03d374417b073d6238cd4ee4 I ran the following in the VM
sudo touch -r "$BUILDROOT/usr" "$BUILDROOT/etc/.updated" "$BUILDROOT/var/.updated"
sudo systemctl daemon-reload
sudo systemctl restart ldconfig

ldconfig.service was skipped because no trigger condition checks were met. That change should be enough to prevent ldconfig.service from starting on boot if it's added to the kiwi descriptions as mentioned by Neal in comment 19.

Comment 22 Neal Gompa 2025-03-20 22:30:33 UTC
I'm doing it now: https://pagure.io/fedora-kiwi-descriptions/pull-request/163

Comment 23 Matt Fagnani 2025-03-20 23:32:30 UTC
Thanks. On subsequent boots of Fedora-KDE-Desktop-Live-Rawhide-20250320.n.0.x86_64.iso in a VM, /etc/.updated and /var/.updated were there already.

liveuser@localhost-live:~$ ll /etc/.updated
-rw-r--r--. 1 root root 208 Mar 20 07:01 /etc/.updated
liveuser@localhost-live:~$ cat /etc/.updated
# This file was created by systemd-update-done. Its only 
# purpose is to hold a timestamp of the time this directory
# was updated. See man:systemd-update-done.service(8).
TIMESTAMP_NSEC=1742454062853981977

ldconfig.service took 73 s to run and 2.9 GB peak memory. There was also the warning that files ldconfig.service were changed and to run systemctl daemon-reload.
 
liveuser@localhost-live:~$ systemctl status ldconfig
Warning: The unit file, source configuration file or drop-ins of ldconfig.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Thu 2025-03-20 19:07:15 UTC; 4h 2min ago
 Invocation: c4db4fe5566c46d499f821327197229e
       Docs: man:ldconfig(8)
    Process: 1237 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
   Main PID: 1237 (code=exited, status=0/SUCCESS)
   Mem peak: 2.9G
        CPU: 57.823s

Mar 20 19:06:02 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 20 19:07:15 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.

I ran the daemon reload and restarted ldconfig.service which was skipped because it's conditions weren't met.

liveuser@localhost-live:~$ sudo systemctl daemon-reload
liveuser@localhost-live:~$ sudo systemctl restart ldconfig
liveuser@localhost-live:~$ systemctl status ldconfig
○ ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: inactive (dead) since Thu 2025-03-20 23:11:19 UTC; 4s ago
   Duration: 4h 4min 4.172s
 Invocation: c4db4fe5566c46d499f821327197229e
  Condition: start condition unmet at Thu 2025-03-20 23:11:19 UTC; 4s ago
             ├─ ConditionNeedsUpdate=|/etc was not met
             └─ ConditionFileNotEmpty=|!/etc/ld.so.cache was not met
       Docs: man:ldconfig(8)
   Main PID: 1237 (code=exited, status=0/SUCCESS)

Mar 20 19:06:02 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 20 19:07:15 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.
Mar 20 23:11:19 localhost-live systemd[1]: ldconfig.service: Deactivated successfully.
Mar 20 23:11:19 localhost-live systemd[1]: Stopped ldconfig.service - Rebuild Dynamic Linker Cache.
Mar 20 23:11:19 localhost-live systemd[1]: Stopping ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 20 23:11:19 localhost-live systemd[1]: ldconfig.service - Rebuild Dynamic Linker Cache was skipped because no trigger condition checks were met.

/etc/.updated and /var/.updated state that they were created by systemd-update-done, and systemd-update-done.service ran during the boot right after ldconfig.service stopped. So /etc/.updated wasn't created until after ldconfig.service checked if it existed. 
 
systemctl status systemd-update-done.service
● systemd-update-done.service - Update is Completed
     Loaded: loaded (/usr/lib/systemd/system/systemd-update-done.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Thu 2025-03-20 19:07:16 UTC; 4h 17min ago
 Invocation: 8f70130d4544460d85792a32907493b7
       Docs: man:systemd-update-done.service(8)
   Main PID: 1300 (code=exited, status=0/SUCCESS)
   Mem peak: 27.1M
        CPU: 606ms

Mar 20 19:07:15 fedora systemd[1]: Starting systemd-update-done.service - Update is Completed...
Mar 20 19:07:16 fedora systemd[1]: Finished systemd-update-done.service - Update is Completed.

Comment 24 Matt Fagnani 2025-03-21 20:55:39 UTC
I booted Fedora-KDE-Desktop-Live-Rawhide-20250321.n.0.x86_64.iso, and ldconfig.service didn't run because no condition checks were met. /etc/.updated and /var/.updated were empty files as I'd expect from the touch command added at https://pagure.io/fedora-kiwi-descriptions/c/d7316ab54385a248ed453ab2b6067d81becc7ce4 Thanks.

Comment 25 Zbigniew Jędrzejewski-Szmek 2025-03-22 18:40:32 UTC
I can confirm that Fedora-KDE-Desktop-Live-Rawhide-20250320.n.0.x86_64.iso still had the problem,
but Fedora-KDE-Desktop-Live-Rawhide-20250321.n.0.x86_64.iso is fixed.

There's still the warning that units have changed. I think this is because
systemd does a preset-all during boot and some units are enabled. This is
really a bug in packaging, those units should be enabled by %post scriptlets
already.

The units are:
- rpmdb-rebuild.service → https://src.fedoraproject.org/rpms/rpm/pull-request/55
- systemd-network-generator.service → I see this one is missing from %post in systemd. I'll take a look.
- vgauthd.service → https://src.fedoraproject.org/rpms/fedora-release/pull-request/370,
                    https://src.fedoraproject.org/rpms/fedora-release/pull-request/371

Comment 26 Zbigniew Jędrzejewski-Szmek 2025-03-22 19:16:33 UTC
https://src.fedoraproject.org/rpms/systemd/pull-request/200 for systemd


Note You need to log in before you can comment on or make changes to this bug.