Bug 1195998

Summary: ldconfig.service runs on live boot, slowing down the boot substantially
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: spin-kickstarts
Assignee: Jeroen van Meeuwen <vanmeeuwen+fedora>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 22
CC: admiller, bruno, bugzilla, fedora, johannbg, jsynacek, kevin, kparal, lnykryn, massi.ergosum, mcatanzaro+wrong-account-do-not-cc, msekleta, pschindl, s, systemd-maint, vanmeeuwen+fedora, vpavlin, zbyszek
Hardware: All
OS: Unspecified
Whiteboard: AcceptedFreezeException
Doc Type: Bug Fix
Last Closed: 2015-03-07 15:46:24 UTC
Type: Bug
Bug Blocks: 1043123

Description Adam Williamson 2015-02-25 01:55:07 UTC
While debugging other issues, we noticed that ldconfig.service - which runs '/sbin/ldconfig -X' - appears to run on every Fedora live image boot, and on current F22 live images, takes 20-30 secs in a typical VM. This is substantially slowing down live image boot.

The service is conditional:

ConditionNeedsUpdate=/etc

From the documentation for ConditionNeedsUpdate and the intended purpose of the service, I'm not *entirely* sure whether it's appropriate or necessary for it to run on live boot, and if not, whether the responsibility for stopping it should lie with systemd or spin-kickstarts. So I'm filing on systemd to start with, but we can re-assign if necessary.

Do we actually want to run this on live boot? If not, is the problem that ConditionNeedsUpdate services shouldn't run on first boot of a system? Or if that's correct, do we need to add a ConditionKernelCommandLine to stop it running from live boots?

Thanks!

Comment 1 Adam Williamson 2015-02-25 01:55:42 UTC
Proposing as an Alpha FE, on the grounds that if it's appropriate to not run this service it should be a relatively safe change and make Alpha lives boot a lot faster.

Comment 2 Chris Murphy 2015-03-01 03:53:10 UTC
~50s boot delay on baremetal and VM is pretty icky.

Tested with Fedora-Live-Workstation-x86_64-22_Alpha-TC7.iso, which uses systemd-219-4. There's nothing in the journal or systemctl status that explains why this takes so long.


[   21.044998] localhost audispd[1181]: audispd initialized with q_depth=150 and 1 active plugins
[   71.871867] localhost systemd[1]: Started Rebuild Dynamic Linker Cache.


# systemctl status ldconfig.service
● ldconfig.service - Rebuild Dynamic Linker Cache
   Loaded: loaded (/usr/lib/systemd/system/ldconfig.service; static; vendor preset: disabled)
   Active: active (exited) since Sat 2015-02-28 22:14:40 EST; 34min ago
     Docs: man:ldconfig(8)
  Process: 792 ExecStart=/sbin/ldconfig -X (code=exited, status=0/SUCCESS)
 Main PID: 792 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ldconfig.service

Feb 28 22:13:53 localhost systemd[1]: Starting Rebuild Dynamic Linker Cache...
Feb 28 22:14:40 localhost systemd[1]: Started Rebuild Dynamic Linker Cache.

Comment 3 Petr Schindler 2015-03-02 21:38:59 UTC
Discussed at today's blocker review meeting [1].

This bug was accepted as a Freeze Exception and has been granted FE status. Please apply a fix before the next compose for testing.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2015-03-02/

Comment 4 Zbigniew Jędrzejewski-Szmek 2015-03-04 02:18:21 UTC
It takes 280 ms in my VM. I have no idea why it is so slow for some people.

Anyway, this service can be safely disabled on a live image. The most obvious way would be to uninstall ldconfig.service, or to mask it. But actually a nicer option would be to touch /etc/.updated and /var/.updated after creating the image (some time after all packages have been installed and /usr will not be touched anymore). This would have the advantage that it would keep things closer to a normal installation and would also prevent any other service which is conditionalized on ConditionNeedsUpdate from needlessly running. systemd itself installs 5 of those, and not running them could shave some significant milliseconds from Live boot.
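To see which units the stamp-file approach would affect, one could list the unit files that carry a ConditionNeedsUpdate= line. This is only a minimal sketch: the function name is made up for illustration, and the default directory /usr/lib/systemd/system is an assumption about where the vendor units live on the image.

```shell
#!/bin/sh
# Print the names of service unit files in a directory that contain a
# ConditionNeedsUpdate= line. The function name is hypothetical; the
# default directory is an assumption about the image layout.
list_needs_update_units() {
    unit_dir=${1:-/usr/lib/systemd/system}
    # -l prints only the names of matching files, -s hides errors for
    # missing or unreadable files (for example an unmatched glob).
    grep -ls '^ConditionNeedsUpdate=' "$unit_dir"/*.service \
        | while IFS= read -r path; do
            basename "$path"
        done
}
```

Called with no argument it scans the default directory; passing another directory (for example a mounted image root plus /usr/lib/systemd/system) checks that tree instead.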

Comment 5 Adam Williamson 2015-03-04 07:07:25 UTC
Sure, that can work; we can do it in spin-kickstarts or livecd-tools, I guess. We do also have ConditionKernelCommandLine=!rd.live.image, I think we've used that for other stuff.
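For comparison, the ConditionKernelCommandLine route could be sketched as a drop-in for ldconfig.service. This is an illustration only, not what was eventually committed: the function name, the ROOT argument and the drop-in file name are all hypothetical, and it assumes that conditions from a drop-in are combined with the unit's existing ConditionNeedsUpdate=/etc.

```shell
#!/bin/sh
# Write a drop-in under ROOT that adds a kernel-command-line condition to
# ldconfig.service. ROOT (an image root), the function name and the
# drop-in file name are hypothetical choices for this sketch.
add_live_condition() {
    root=${1:?usage: add_live_condition ROOT}
    dropin_dir="$root/etc/systemd/system/ldconfig.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/10-skip-on-live.conf" <<'EOF'
[Unit]
# Skip the unit when rd.live.image is on the kernel command line.
ConditionKernelCommandLine=!rd.live.image
EOF
}
```

Unlike the stamp-file approach, this only covers the one unit it is written for.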

Comment 6 Kevin Fenzi 2015-03-04 20:30:31 UTC
So, I guess something like: 

diff --git a/fedora-live-base.ks b/fedora-live-base.ks
index 8f2ddc2..264f118 100644
--- a/fedora-live-base.ks
+++ b/fedora-live-base.ks
@@ -194,6 +194,10 @@ systemctl --no-reload disable atd.service 2> /dev/null || :
 systemctl stop crond.service 2> /dev/null || :
 systemctl stop atd.service 2> /dev/null || :
 
+# don't run ldconfig, it makes boot on live very slow
+systemctl --no-reload disable ldconfig.service 2> /dev/null || :
+systemctl stop ldconfig.service 2> /dev/null || :
+
 # Mark things as configured
 touch /.liveimg-configured
 
might work? I suppose I could push to rawhide and we can see how well it works there first?

Comment 7 Zbigniew Jędrzejewski-Szmek 2015-03-04 21:52:00 UTC
(In reply to Kevin Fenzi from comment #6)
> So, I guess something like: 
> 
> diff --git a/fedora-live-base.ks b/fedora-live-base.ks
> index 8f2ddc2..264f118 100644
> --- a/fedora-live-base.ks
> +++ b/fedora-live-base.ks
> @@ -194,6 +194,10 @@ systemctl --no-reload disable atd.service 2> /dev/null
> || :
>  systemctl stop crond.service 2> /dev/null || :
>  systemctl stop atd.service 2> /dev/null || :
>  
> +# don't run ldconfig, it makes boot on live very slow
> +systemctl --no-reload disable ldconfig.service 2> /dev/null || :
> +systemctl stop ldconfig.service 2> /dev/null || :
The service is a Type=oneshot service, so it's unlikely to be running, and stopping it is probably useless. The first part would work, but only for this service. Why not do the thing I suggested in comment #4:

diff --git fedora-live-base.ks fedora-live-base.ks
index 8f2ddc29c3..785c1676c6 100644
--- fedora-live-base.ks
+++ fedora-live-base.ks
@@ -305,6 +305,8 @@ if [ -x /usr/bin/fc-cache ] ; then
    fc-cache -f
 fi
 
+echo 'File created by kickstart. See systemd-update-done.service(8).' \
+    | tee /etc/.updated >/var/.updated
 %end
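A rough way to check whether the stamp files will have the intended effect is to compare modification times by hand. The sketch below approximates the ConditionNeedsUpdate= test as described in the systemd documentation (the condition holds when /usr is newer than the .updated stamp, and is assumed here to hold when the stamp is missing). The function name and the ROOT argument are hypothetical, and systemd's real check has further cases this does not cover.

```shell
#!/bin/sh
# Approximate the ConditionNeedsUpdate= check for DIR (etc or var) under
# ROOT. Returns 0 ("needs update") when ROOT/DIR/.updated is missing or
# older than ROOT/usr, non-zero otherwise. Names are illustrative only.
needs_update() {
    root=$1
    dir=$2
    stamp="$root/$dir/.updated"
    if [ ! -e "$stamp" ]; then
        return 0                      # no stamp file: assume an update is needed
    fi
    [ "$root/usr" -nt "$stamp" ]      # true when /usr is newer than the stamp
}
```

For example, `needs_update /path/to/image-root etc || echo "ldconfig.service would be skipped"` reports the expected outcome for an unpacked image. Anything that touches /usr after the stamp is written flips the result, which is why the stamp should be created after all packages are installed.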

Comment 8 Kevin Fenzi 2015-03-06 20:19:02 UTC
OK, let's give it a shot.

Pushed to rawhide. Will see how it does tomorrow.

Comment 9 Kevin Fenzi 2015-03-07 13:59:57 UTC
Seems to work. ;) 

I'll cherry-pick it over to the f22 branch at some point.

Comment 10 Kevin Fenzi 2015-03-07 15:46:24 UTC
I went ahead and pushed this to f22 also. 

Please re-open if you see it again.

Comment 11 Wolfgang Ulbrich 2015-03-08 15:18:16 UTC
I can confirm that the fix prevents ldconfig from running; tested with yesterday's Workstation rawhide nightly build.
Boot time is about 1 min 30 s!
For some reason, NetworkManager-wait-online.service starts during boot of live images and increases the boot time too, by about 30 secs.
Note that the unit isn't enabled.
In a local MATE livecd build I masked the service and the boot time is normal again, about 30-35 secs in a VM.
So it looks like another service starts NetworkManager-wait-online.service, or there is some other reason for it.
I checked the dependencies of all enabled units, but none of them seems to start NetworkManager-wait-online.service.
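One rough way to hunt for whatever pulls a unit in is to search the unit files for Wants= or Requires= lines that name it. This is a sketch under stated assumptions: the function name is made up, it is a plain text search that ignores .wants/ symlinks and drop-ins, and on a running system `systemctl list-dependencies --reverse` is likely the more thorough tool.

```shell
#!/bin/sh
# Print service and target unit files in UNIT_DIR whose Wants= or
# Requires= line mentions TARGET. Hypothetical helper; dots in TARGET are
# treated as regex wildcards, which is acceptable for a rough search.
find_pullers() {
    unit_dir=$1
    target=$2
    grep -lsE "^(Wants|Requires)=.*$target" \
        "$unit_dir"/*.service "$unit_dir"/*.target \
        | while IFS= read -r path; do
            basename "$path"
        done
}
```

For instance, `find_pullers /usr/lib/systemd/system network-online.target` lists units that explicitly want or require that target, which may in turn be what brings in the wait-online service.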

Comment 12 Zbigniew Jędrzejewski-Szmek 2015-03-08 15:20:17 UTC
What about ntpdate or something like that?

Comment 13 Wolfgang Ulbrich 2015-03-08 16:26:57 UTC
Masking network-online.target helps a lot, but NetworkManager-wait-online.service starts nevertheless.

[root@localhost liveuser]# systemd-analyze 
Startup finished in 1.274s (kernel) + 3.903s (initrd) + 30.755s (userspace) = 35.934s
[root@localhost liveuser]# systemd-analyze blame
         10.930s firewalld.service
          9.441s livesys.service
          8.095s accounts-daemon.service
          6.350s NetworkManager-wait-online.service
          3.677s rsyslog.service
          3.556s gssproxy.service
          2.785s proc-fs-nfsd.mount
          2.490s lvm2-monitor.service
          1.850s systemd-journald.service
          1.849s fedora-readonly.service
          1.843s dmraid-activation.service
          1.820s polkit.service
          1.744s systemd-logind.service
          1.704s systemd-udev-settle.service
          1.528s NetworkManager.service
          1.475s chronyd.service
          1.460s systemd-tmpfiles-setup-dev.service
          1.368s fedora-import-state.service
          1.338s spice-vdagentd.service
          1.317s systemd-udev-trigger.service
          1.266s avahi-daemon.service
          1.218s tmp.mount
          1.121s kmod-static-nodes.service
           866ms systemd-sysctl.service
           860ms mcelog.service
           825ms plymouth-read-write.service
           807ms udisks2.service
<snip>

[root@localhost liveuser]# systemctl status ntpd
ntpdate.service  ntpd.service     
[root@localhost liveuser]# systemctl status ntpdate.service 
● ntpdate.service
   Loaded: masked (/dev/null)
   Active: inactive (dead)

Comment 14 Wolfgang Ulbrich 2015-03-08 16:35:11 UTC
Oops, sorry.
I meant masking ntpdate.service.

Comment 15 Zbigniew Jędrzejewski-Szmek 2015-03-08 17:12:04 UTC
OK, nfs is the culprit. The good news is that it is already fixed (at least upstream); see the bug linked under See Also.

Comment 16 Wolfgang Ulbrich 2015-03-08 18:26:01 UTC
Confirmed: I masked rpc-statd-notify.service and nfs-utils.service; boot time is OK now and NetworkManager-wait-online.service doesn't start.

[liveuser@localhost ~]$ systemd-analyze 
Startup finished in 1.310s (kernel) + 4.122s (initrd) + 25.996s (userspace) = 31.429s
[liveuser@localhost ~]$ systemd-analyze blame
         14.127s firewalld.service
         13.048s livesys.service
         12.560s accounts-daemon.service
         10.961s dev-sr0.device
          9.899s dev-mapper-live\x2drw.device
          6.338s spice-vdagentd.service
          5.985s gssproxy.service
          5.979s mcelog.service
          5.970s nfs-config.service
          5.962s rsyslog.service
          5.930s systemd-logind.service
          5.892s rtkit-daemon.service
          5.881s avahi-daemon.service
          5.833s chronyd.service
          4.457s systemd-udev-settle.service
          3.072s proc-fs-nfsd.mount
          2.587s lvm2-monitor.service
          2.408s systemd-journald.service
          2.349s polkit.service
          2.222s udisks2.service
          2.149s systemd-tmpfiles-setup-dev.service
          2.105s fedora-readonly.service
          1.770s systemd-udev-trigger.service
          1.358s systemd-sysctl.service
           833ms dev-mqueue.mount
           753ms NetworkManager.service
<snip>

Comment 17 Wolfgang Ulbrich 2015-03-22 19:15:58 UTC
Confirmed that nfs-utils-1.3.2-2.0.fc22 fixes the issue with the mate-compiz live spin (local build).
NetworkManager-wait-online.service doesn't start anymore.
See https://bugzilla.redhat.com/show_bug.cgi?id=1183293#c14

Comment 18 Massimiliano 2015-07-12 16:23:09 UTC
Is there something planned for F21? I've the same problem with my live remix:

# systemd-analyze blame
         32.897s ldconfig.service
          6.918s systemd-udev-settle.service
          5.101s plymouth-quit-wait.service
          3.042s firewalld.service
          2.658s systemd-udev-hwdb-update.service
          2.270s accounts-daemon.service

Maybe I should apply the patch proposed in comment #7...

Comment 19 Kevin Fenzi 2015-07-13 13:58:20 UTC
We don't typically change stable releases after they happen (since we don't really have a good process to respin all the media). 

So, yeah, I'd say apply a version of the patch to your remix.

Comment 20 Massimiliano 2015-07-15 07:39:28 UTC
(In reply to Kevin Fenzi from comment #19)
> We don't typically change stable releases after they happen (since we don't
> really have a good process to respin all the media). 
> 

Because we don't pay attention to Torvalds' suggestions...
;-)

> So, yeah, I'd say apply a version of the patch to your remix.

OK, I see the change in the recent fedora-live-base.ks.
Thank you.

Comment 21 Adam Williamson 2015-07-15 08:59:39 UTC
We pay attention to everyone who suggests it, but talk is cheap and work is expensive. Only a couple of people have been interested in actually doing the work to do respins *properly*, never enough at one time to make it a really official project. J.B. Williams (southern_gentleman on IRC) currently does hemi-demi-semi-unofficial respins, see his blog: https://jbwillia.wordpress.com/

Comment 22 Massimiliano 2015-07-15 21:33:52 UTC
(In reply to Adam Williamson from comment #21)
> J.B. Williams (southern_gentleman on IRC) currently
> does hemi-demi-semi-unofficial respins, see his blog:
> https://jbwillia.wordpress.com/

Interesting. But mine is a fully-totally-wildly-unauthorized source-only-remix:
https://www.assembla.com/code/fedora-remix/subversion/nodes/

cheers