When I've booted Rawhide KDE live images in QEMU/KVM VMs in GNOME Boxes and on bare metal, ldconfig.service has blocked booting for 30-60 seconds. This has been happening for the last 2 months or so; it originally took around 30-40 s, but recently started taking up to 60 s. The latest Rawhide KDE live image I've tested this with is Fedora-KDE-Desktop-Live-Rawhide-20250226.n.0.x86_64.iso, which uses systemd-257.3-7.fc43.x86_64. For example, ldconfig.service took 54 seconds to start when booting a VM with that image, and its peak memory usage appeared to be 2.9 GB.

systemctl status ldconfig
Warning: The unit file, source configuration file or drop-ins of ldconfig.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Thu 2025-02-27 01:27:48 UTC; 5h 1min ago
 Invocation: e9dcca66d7ae4b9d9f625f59f84b0a0f
       Docs: man:ldconfig(8)
    Process: 1252 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
   Main PID: 1252 (code=exited, status=0/SUCCESS)
   Mem peak: 2.9G
        CPU: 47.485s

Feb 27 01:26:54 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Feb 27 01:27:48 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.

When I ran "time /sbin/ldconfig -X" manually in Konsole in the VM, it took around 25 s:

/sbin/ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied

real    0m25.491s
user    0m0.113s
sys     0m22.766s

This delay seems to happen on every boot of the live images. I'll attach the journal from the VM boot I mentioned.

Reproducible: Always

Steps to Reproduce:
1. Download Fedora-KDE-Desktop-Live-Rawhide-20250226.n.0.x86_64.iso from https://koji.fedoraproject.org/koji/buildinfo?buildID=2666595 in a Fedora 41 KDE installation.
2. Boot Fedora-KDE-Desktop-Live-Rawhide-20250226.n.0.x86_64.iso in a QEMU/KVM VM in GNOME Boxes with 4 GiB RAM, UEFI enabled, and 3D acceleration enabled.

Actual Results:
ldconfig.service blocked booting for 30-60 seconds with Rawhide KDE live images.

Expected Results:
The boot should not be blocked by ldconfig.service for that long.
Created attachment 2078001 [details] journal of boot of Rawhide KDE live image in a VM showing ldconfig.service delay
Could you please run `time ldconfig -v -X >& update-cache.log` and provide update-cache.log so we can see what's going on? Please also note the time taken as reported by time. Thank you. It's odd that this system takes 25 s to update the cache, though it does depend on how many libraries the system has installed.
I don't think this is a bug in glibc. It sounds like a bug in the construction of the Live image. I'd expect that the unit is skipped in the Live image because neither of the two conditions in the unit are met:

ConditionNeedsUpdate=|/etc
ConditionFileNotEmpty=|!/etc/ld.so.cache

On my laptop, ldconfig -X takes 5 s. I can imagine that on a slow system it might take longer. Though this is in a VM, so the system shouldn't be slow. But 0 will always be better.
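For what it's worth, a quick way to double-check which of the two conditions fires on a booted live system should be something along these lines (assuming a systemd-analyze new enough to have the "condition" verb; a suggestion only, I have not run this on the image):

systemd-analyze condition 'ConditionNeedsUpdate=|/etc' 'ConditionFileNotEmpty=|!/etc/ld.so.cache'
systemctl show ldconfig.service -p ConditionResult -p ConditionTimestamp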
(In reply to Zbigniew Jędrzejewski-Szmek from comment #3)
> I'd expect that the unit is skipped in the Live image because neither
> of the two conditions in the unit are met:
> ConditionNeedsUpdate=|/etc
> ConditionFileNotEmpty=|!/etc/ld.so.cache

That's a good point. The live image is static, so the cache shouldn't need to be updated at startup?

> On my laptop, ldconfig -X takes 5s. I can imagine that on a slow
> system it might take longer. Though this is in a VM, so the system
> shouldn't be slow. But 0 will always be better.

It takes 0.236 s on my laptop.
Please make sure that the cache is not used:

$ sudo bash -c 'echo 3 >/proc/sys/vm/drop_caches'
$ time ldconfig -v -X &>/dev/null
$ time ldconfig -v -X &>/dev/null

The first run takes 6 s, the second 0.1 s here.
I can reproduce it using Fedora-KDE-Desktop-Live-Rawhide-20250305.n.0.x86_64.iso, including this part from the systemctl status output:

   Mem peak: 2.9G

This is very suspicious. I originally suspected an ldconfig bug, but "perf record" shows that a lot of time is spent in LZMA decompression. So I assume that systemd accounts for page cache activity triggered by ldconfig, not just memory usage.

This must have to do with how EROFS handles file compression, so I expect it's triggered by this change: <https://fedoraproject.org/wiki/Changes/EROFSforLiveMedia>

We do not actually read that much data in ldconfig, I think. We use mmap, so it's a bit non-trivial to measure. The change page mentions a chunk size of 1 MiB, and there are around 1,800 shared objects in the KDE live image (according to "ldconfig -p" output). So it's probably read amplification due to the 1 MiB chunk size.
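A rough back-of-envelope estimate (assuming, as a guess, that ldconfig touches one to two 1 MiB chunks per object when it maps each file and reads the ELF headers and dynamic section):

  1,800 objects × 1-2 MiB per object ≈ 1.8-3.6 GiB of decompressed data

which is at least in the same ballpark as the ~2.9 GB peak reported for the unit's cgroup.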
I tried to bring this to the Fedora devel thread for the change, but it looks like the message is stuck in moderation.
Created attachment 2079012 [details] update-cache.log from ldconfig

I reproduced the problem using the same live image in a VM. ldconfig.service took 59 s during boot and used a maximum of 3 GB of memory.

systemctl status ldconfig
Warning: The unit file, source configuration file or drop-ins of ldconfig.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Wed 2025-03-05 14:39:31 UTC; 5h 5min ago
 Invocation: 28ef6d99f7ef4908b66bc5aa45263906
       Docs: man:ldconfig(8)
    Process: 1239 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
   Main PID: 1239 (code=exited, status=0/SUCCESS)
   Mem peak: 3G
        CPU: 50.563s

Mar 05 14:38:32 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 05 14:39:31 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.

Based on Zbigniew's and Carlos's suggestions, I ran the following in Konsole in the VM:

sudo bash -c 'echo 3 >/proc/sys/vm/drop_caches'
time ldconfig -v -X >& update-cache.log

real    0m50.037s
user    0m0.160s
sys     0m44.382s

The update-cache.log from ldconfig is attached. Thanks.
Why is this running in the live environment in the first place?
I downloaded Fedora-KDE-Desktop-Live-Rawhide-20250305.n.0.x86_64.iso and get similar results (25 s, 2.9 GB). The peak memory is reported by the kernel for the cgroup via the memory.peak attribute.

So I think we have two bugs: unexpectedly large memory usage, and unexpected triggering of units with ConditionNeedsUpdate.

/run/rootfsbase/etc/ld.so.cache is present in the live image. /etc/ld.so.cache is also present, because ldconfig.service ran. The two files are identical. This means that the service triggered on ConditionNeedsUpdate, not on ConditionFileNotEmpty=|!/etc/ld.so.cache (that file was never empty).

/run/rootfsbase/etc/.updated is not present. I think this is the reason for the second bug. /run/rootfsbase/var/.updated is also not present. /run/rootfsbase/var/ is populated, so it doesn't make sense for that marker file to be missing.
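For anyone else checking a live boot, the observations above boil down to roughly the following (a sketch of the checks, not output I'm quoting):

cmp /run/rootfsbase/etc/ld.so.cache /etc/ld.so.cache
ls -la /run/rootfsbase/etc/.updated /run/rootfsbase/var/.updated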
The summary so far is as follows:

* Block size tuning is likely wrong for this kind of deployment.
* Unexpected triggering of a unit with ConditionNeedsUpdate.

I don't think that unexpectedly large memory usage is the root bug here, but rather the block size tuning.

I'm moving this over to LiveCD KDE to review why the ".updated" files are missing.

The block size tuning still needs to be reviewed, and Florian commented here:
https://bugzilla.redhat.com/show_bug.cgi?id=2346259#c4
though it's not clear that the timeline aligns with the changes.
(In reply to Carlos O'Donell from comment #11)
> The summary so far is as follows:
>
> * Block size tuning is likely wrong for this kind of deployment.
> * Unexpected triggering of a unit with ConditionNeedsUpdate.
>
> I don't think that unexpectedly large memory usage is the root bug here,
> but rather the block size tuning.
>
> I'm moving this over to LiveCD KDE to review why the ".updated" files are
> missing.
>
> The block size tuning still needs to be reviewed, and Florian commented here:
> https://bugzilla.redhat.com/show_bug.cgi?id=2346259#c4
> though it's not clear that the timeline aligns with the changes.

The block size tuning has been 1M for kiwi-based images since kiwi was introduced for Fedora live images. Even the squashfs ones (like Fedora KDE Mobile and MiracleWM in 41, and the KDE Desktop ones in 42 and Rawhide prior to Feb 20) used 1M. That's the reason the EROFS one is 1M too.

I can, of course, shrink it, but the consequence is that it will not necessarily compress as well, since block size is the main tunable for better compression. I'll see if raising the compression level can compensate for this (rough sketch of the relevant knobs below).

But what is supposed to create the ".updated" files? There's nothing in the image build tools to do that.
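Regarding the block size vs. compression level trade-off: at the mkfs.erofs level the two knobs are roughly the compressor/level and the maximum physical cluster size, along these lines (a sketch only; the exact option spelling depends on the erofs-utils version, and the real change lives in the kiwi description XML):

# hypothetical invocation: lzma compression with a smaller 64 KiB physical cluster size
mkfs.erofs -zlzma -C65536 image.erofs rootdir/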
/etc/.updated and /var/.updated were created by systemd-update-done.service on my F42 KDE installation. According to man systemd-update-done.service, it runs after offline updates.

cat /etc/.updated
# This file was created by systemd-update-done. Its only
# purpose is to hold a timestamp of the time this directory
# was updated. See man:systemd-update-done.service(8).
TIMESTAMP_NSEC=1740787734454480688
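As far as I understand (an assumption on my part from the man pages), ConditionNeedsUpdate=/etc compares the modification time of /usr against the timestamp recorded for /etc/.updated, so a quick way to compare the two on a running system is:

stat -c '%y %n' /usr /etc/.updated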
Then these will *never* get created as part of image builds, because the entire system is constructed offline.
Here's a pull request to reduce the chunksize and increase the compression: https://pagure.io/fedora-kiwi-descriptions/pull-request/159
I've merged and cherry-picked the PR to F42. People will need to test to see if tomorrow's composes don't have the issue anymore.
Thanks Neal, that should alleviate the problem.

I was also exploring the approach with /etc/.updated. https://github.com/systemd/systemd/pull/36803 tweaks the comments and implements systemd-update-done --root. But it's not directly useful for us now.

https://github.com/systemd/mkosi/pull/3602 implements a PoC in mkosi. The resulting image skips ldconfig.service and other "update" services as expected. I think it'd be good to do something equivalent in the Kiwi configs. (I'm not a Kiwi expert, so it'd probably be more efficient if Neal does this.)
One aspect I forgot to mention: ldconfig has an auxiliary cache specifically designed to avoid having to re-read all installed shared objects when ldconfig is run. But the change detection logic includes inode numbers and timestamps. The inode numbers change during erofs creation, so that cache is ineffective. I'm not sure if anything can be done about that.
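For reference, the auxiliary cache lives at /var/cache/ldconfig/aux-cache, and its entries are keyed on things like device/inode numbers and timestamps of each object, which is why the renumbered inodes defeat it. A quick check that the file is at least present in the live image (just a sketch):

ls -l /var/cache/ldconfig/aux-cache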
(In reply to Zbigniew Jędrzejewski-Szmek from comment #17)
> Thanks Neal, that should alleviate the problem.
>
> I was also exploring the approach with /etc/.updated.
> https://github.com/systemd/systemd/pull/36803 tweaks the comments and
> implements systemd-update-done --root. But it's not directly useful
> for us now.
>
> https://github.com/systemd/mkosi/pull/3602 implements a PoC
> in mkosi. The resulting image skips ldconfig.service and other
> "update" services as expected. I think it'd be good to do something
> equivalent in the Kiwi configs. (I'm not a Kiwi expert, so it'd
> probably be more efficient if Neal does this.)

This can be done by adding it at the end of config.sh in the kiwi descriptions.
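Concretely, something like this at the end of config.sh should do it (a sketch mirroring the mkosi PoC; as far as I know config.sh runs chrooted in the image root, hence the plain absolute paths):

# Mark /etc and /var as up to date relative to /usr so that
# ConditionNeedsUpdate= units such as ldconfig.service are skipped on boot.
touch -r /usr /etc/.updated /var/.updated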
(In reply to Neal Gompa from comment #16)
> I've merged and cherry-picked the PR to F42. People will need to test to see
> if tomorrow's composes don't have the issue anymore.

Hi, I tried `Fedora-Workstation-Live-Rawhide-20250320.n.0.x86_64.iso` and it seems it doesn't help much. I also observed that squashfs has a lower Mem peak (780.3M) and CPU (17.922s) compared to EROFS (Mem peak 2.7G, CPU 38.690s).

Currently I'm not sure where the difference in ldconfig comes from or what Mem peak really means. I need more time to find out where the difference comes from. If it's too late for F42, maybe just use squashfs for now and reconsider EROFS in F43, so I have more time to look into that?
I booted Fedora-KDE-Desktop-Live-Rawhide-20250320.n.0.x86_64.iso in a VM as described, and ldconfig.service took 72 s to run and had a peak memory of 2.9 GB.

Based on https://github.com/systemd/mkosi/pull/3602/commits/664066e2a5249eac03d374417b073d6238cd4ee4 I ran the following in the VM:

sudo touch -r "$BUILDROOT/usr" "$BUILDROOT/etc/.updated" "$BUILDROOT/var/.updated"
sudo systemctl daemon-reload
sudo systemctl restart ldconfig

ldconfig.service was skipped because no trigger condition checks were met. That change should be enough to prevent ldconfig.service from starting on boot if it's added to the kiwi descriptions, as mentioned by Neal in comment 19.
I'm doing it now: https://pagure.io/fedora-kiwi-descriptions/pull-request/163
Thanks. On subsequent boots of Fedora-KDE-Desktop-Live-Rawhide-20250320.n.0.x86_64.iso in a VM, /etc/.updated and /var/.updated were already there.

liveuser@localhost-live:~$ ll /etc/.updated
-rw-r--r--. 1 root root 208 Mar 20 07:01 /etc/.updated
liveuser@localhost-live:~$ cat /etc/.updated
# This file was created by systemd-update-done. Its only
# purpose is to hold a timestamp of the time this directory
# was updated. See man:systemd-update-done.service(8).
TIMESTAMP_NSEC=1742454062853981977

ldconfig.service took 73 s to run and had a peak memory of 2.9 GB. There was also the warning that files of ldconfig.service had changed and to run systemctl daemon-reload.

liveuser@localhost-live:~$ systemctl status ldconfig
Warning: The unit file, source configuration file or drop-ins of ldconfig.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Thu 2025-03-20 19:07:15 UTC; 4h 2min ago
 Invocation: c4db4fe5566c46d499f821327197229e
       Docs: man:ldconfig(8)
    Process: 1237 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
   Main PID: 1237 (code=exited, status=0/SUCCESS)
   Mem peak: 2.9G
        CPU: 57.823s

Mar 20 19:06:02 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 20 19:07:15 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.

I ran the daemon reload and restarted ldconfig.service, which was skipped because its conditions weren't met.

liveuser@localhost-live:~$ sudo systemctl daemon-reload
liveuser@localhost-live:~$ sudo systemctl restart ldconfig
liveuser@localhost-live:~$ systemctl status ldconfig
○ ldconfig.service - Rebuild Dynamic Linker Cache
     Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: inactive (dead) since Thu 2025-03-20 23:11:19 UTC; 4s ago
   Duration: 4h 4min 4.172s
 Invocation: c4db4fe5566c46d499f821327197229e
  Condition: start condition unmet at Thu 2025-03-20 23:11:19 UTC; 4s ago
             ├─ ConditionNeedsUpdate=|/etc was not met
             └─ ConditionFileNotEmpty=|!/etc/ld.so.cache was not met
       Docs: man:ldconfig(8)
   Main PID: 1237 (code=exited, status=0/SUCCESS)

Mar 20 19:06:02 fedora systemd[1]: Starting ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 20 19:07:15 fedora systemd[1]: Finished ldconfig.service - Rebuild Dynamic Linker Cache.
Mar 20 23:11:19 localhost-live systemd[1]: ldconfig.service: Deactivated successfully.
Mar 20 23:11:19 localhost-live systemd[1]: Stopped ldconfig.service - Rebuild Dynamic Linker Cache.
Mar 20 23:11:19 localhost-live systemd[1]: Stopping ldconfig.service - Rebuild Dynamic Linker Cache...
Mar 20 23:11:19 localhost-live systemd[1]: ldconfig.service - Rebuild Dynamic Linker Cache was skipped because no trigger condition checks were met.

/etc/.updated and /var/.updated state that they were created by systemd-update-done, and systemd-update-done.service ran during the boot right after ldconfig.service stopped. So /etc/.updated wasn't created until after ldconfig.service had checked whether it existed.
systemctl status systemd-update-done.service
● systemd-update-done.service - Update is Completed
     Loaded: loaded (/usr/lib/systemd/system/systemd-update-done.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (exited) since Thu 2025-03-20 19:07:16 UTC; 4h 17min ago
 Invocation: 8f70130d4544460d85792a32907493b7
       Docs: man:systemd-update-done.service(8)
   Main PID: 1300 (code=exited, status=0/SUCCESS)
   Mem peak: 27.1M
        CPU: 606ms

Mar 20 19:07:15 fedora systemd[1]: Starting systemd-update-done.service - Update is Completed...
Mar 20 19:07:16 fedora systemd[1]: Finished systemd-update-done.service - Update is Completed.
I booted Fedora-KDE-Desktop-Live-Rawhide-20250321.n.0.x86_64.iso, and ldconfig.service didn't run because no condition checks were met. /etc/.updated and /var/.updated were empty files as I'd expect from the touch command added at https://pagure.io/fedora-kiwi-descriptions/c/d7316ab54385a248ed453ab2b6067d81becc7ce4 Thanks.
I can confirm that Fedora-KDE-Desktop-Live-Rawhide-20250320.n.0.x86_64.iso still had the problem, but Fedora-KDE-Desktop-Live-Rawhide-20250321.n.0.x86_64.iso is fixed.

There's still the warning that units have changed. I think this is because systemd does a preset-all during boot and some units get enabled. This is really a packaging bug; those units should already be enabled by %post scriptlets. The units are:

- rpmdb-rebuild.service → https://src.fedoraproject.org/rpms/rpm/pull-request/55
- systemd-network-generator.service → I see this one is missing from %post in systemd. I'll take a look.
- vgauthd.service → https://src.fedoraproject.org/rpms/fedora-release/pull-request/370, https://src.fedoraproject.org/rpms/fedora-release/pull-request/371
→ https://src.fedoraproject.org/rpms/systemd/pull-request/200 for systemd
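For reference, the usual packaging fix is the standard preset scriptlet in each package's spec, roughly like this (a sketch; the linked pull requests contain the actual changes):

%post
%systemd_post rpmdb-rebuild.service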