Bug 1725364
Summary: | local-fs target completes before local fs are mounted, resulting in failure of libvirtd & other services | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | BugMasta <vorpal> |
Component: | systemd | Assignee: | systemd-maint |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 29 | CC: | lnykryn, msekleta, ssahani, s, systemd-maint, zbyszek |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-06-30 08:40:42 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
BugMasta
2019-06-30 07:18:21 UTC
In addition, we have this:

[root@Il-Duce 06-30 15:29:59 ~]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2019-06-30 14:00:27 ACST; 1h 29min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
  Process: 1611 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 1611 (code=exited, status=0/SUCCESS)
      CPU: 211ms

i.e.:

   Active: inactive (dead) since Sun 2019-06-30 14:00:27 ACST; 1h 29min ago

Why is it dead? As mentioned earlier, libvirtd.service has this:

After=local-fs.target

so it should have been able to access its save directory and start without any issue. But the service file also has this:

Restart=on-failure

So, why was it not restarted? If one more attempt had been made to restart the service it would have succeeded, because by then the 6tb-linux fs would have been mounted. Can someone please explain to me what "Restart=on-failure" means, if it does not mean RESTART ON FAILURE?

It's 2019, and systemd still cannot reliably mount filesystems and start a service. Is that too much to ask?

Zbigniew replied:

The docs for "nofail" [1] are

> With nofail, this mount will be only wanted, not required, by
> local-fs.target or remote-fs.target. Moreover the mount unit is not ordered
> before these target units. This means that the boot will continue without
> waiting for the mount unit and regardless whether the mount point can be
> mounted successfully.

Please remove nofail from fstab, and you should be good.

[1] https://www.freedesktop.org/software/systemd/man/systemd.mount.html#nofail

For the second question:

> Process: 1611 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=0/SUCCESS)

As you can see, the process returned SUCCESS, i.e. not failure, i.e. there is no reason to restart.

I do not want to remove nofail from fstab. If I remove nofail, then the system will drop to an emergency shell if this drive cannot be mounted. That is not "good". The option we are talking about is called "nofail", it is not called "nowait". Please re-open this bug. The current state of affairs is unacceptable.

If people do not wish to wait for a drive to mount, they can set a short timeout. What option do I have to stop my system dropping to an emergency shell if a filesystem can't be mounted? Only one, and it is called "nofail". If I use nofail, it does *NOT* mean I do not want to wait for it to mount if it is there, and it's only going to take a few seconds for a disk to spin up and mount.

Going back to a manpage from a sysv system, you know, remember those old systems we used to have, where you could actually be sure one thing was done before something else was attempted, we have this:

nofail : do not report errors for this device if it does not exist.

In my case, the device definitely DOES exist. It is attached. A sysv system would wait for it to be mounted, even with nofail. systemd is not waiting for it. That is a bug.

Having to remove the nofail option, just because the semantics of nofail have been changed, is unacceptable. It makes my system fragile, and liable to drop to an emergency shell if that disk is removed. If that disk is removed, then the nofail option will allow boot to proceed, and then sure, libvirtd will fail also, but the rest of the system will come up. That is the purpose of nofail. "nofail" is *NOT* there to prevent delays waiting for removable devices that are not there.
For that, use noauto, or add a new option, "nowait", like the old "nobootwait" option that some systems had. nofail has a specific purpose, and that is to designate a filesystem that the system should be able to boot without. That is all. It does not mean we do not want to wait for that filesystem when it is there.

I want this bug reopened and addressed. It is unacceptable for systemd to change the behaviour of nofail, making critical systems more fragile just to pander to the convenience of incompetent users who cannot use any of the many user-space tools out there to manage mounting and unmounting removable drives. This is a critical issue, which affects the fundamental robustness of systems. It is not a joke.

Since this bug has been inappropriately closed, I have submitted bug 1725389: https://bugzilla.redhat.com/show_bug.cgi?id=1725389 This is a serious issue and I expect that bug to be addressed, not closed without any thought.

If anyone else comes across this issue, do not remove nofail. Do this:

x-systemd.before=, x-systemd.after=
> Configures a Before= dependency or an After= dependency between the created mount unit and another systemd unit, such as a mount unit. The argument should be a unit name or an absolute path to a mount point. This option may be specified more than once. This option is particularly useful for mount point declarations with the nofail option that are mounted asynchronously but need to be mounted before or after some unit start up, for example, before local-fs.target unit. See Before= and After= in systemd.unit(5) for details.

So, I can use nofail and add the option: x-systemd.before=local-fs.target

Once again, systemd has found a way to do the simplest thing in the most arse-backwards way imaginable.
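For illustration, here is a rough sketch of what such an fstab entry can look like. Only the 6tb-linux label comes from the text above; the mount point, filesystem type, and timeout value are assumptions made for the example, not taken from this report:

    # nofail: an absent or unmountable device does not fail local-fs.target
    #   (no emergency shell); boot carries on without it.
    # x-systemd.before=local-fs.target: restores the ordering, so when the device
    #   is present, local-fs.target waits for this mount to complete.
    # x-systemd.device-timeout=30s: caps how long boot waits for the device to appear.
    LABEL=6tb-linux  /mnt/6tb-linux  ext4  defaults,nofail,x-systemd.before=local-fs.target,x-systemd.device-timeout=30s  0  2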
Re the libvirtd restart issue, Zbigniew also said:
For the second question:
> Process: 1611 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=0/SUCCESS)
As you can see, the process returned SUCCESS, i.e. not failure, i.e. there is no reason to restart.
True, but the service didn't actually start successfully, and journalctl logged this:
Jun 30 14:00:27 Il-Duce audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 30 14:00:27 Il-Duce libvirtd[1611]: 2019-06-30 04:30:27.598+0000: 1726: info : libvirt version: 4.7.0, package: 3.fc29 (Fedora Project, 2019-05-14-18:50:20, )
Jun 30 14:00:27 Il-Duce libvirtd[1611]: 2019-06-30 04:30:27.598+0000: 1726: info : hostname: Il-Duce
Jun 30 14:00:27 Il-Duce libvirtd[1611]: 2019-06-30 04:30:27.598+0000: 1726: error : qemuStateInitialize:758 : unable to set ownership of '/var/lib/libvirt/qemu/save' to 107:107: No such file or directory
Jun 30 14:00:27 Il-Duce libvirtd[1611]: 2019-06-30 04:30:27.598+0000: 1726: error : virStateInitialize:667 : Initialization of QEMU state driver failed: unable to set ownership of '/var/lib/libvirt/qemu/save' to 107:107: No such file or directory
Jun 30 14:00:27 Il-Duce libvirtd[1611]: 2019-06-30 04:30:27.598+0000: 1726: error : daemonRunStateInit:806 : Driver state initialization failed
Jun 30 14:00:27 Il-Duce audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 30 14:00:27 Il-Duce systemd[1]: libvirtd.service: Consumed 211ms CPU time
Then obviously we have a bug in libvirtd.
Maybe I'll log a bug vs libvirtd next week. I'll call it "starting libvirtd service reports success even when it has totally failed".
I can't log it now, because the stupidity behind this systemd bug tonight has already pushed my sanity closer to the edge than I can tolerate on what should be a relaxing Sunday night.
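For the libvirtd side specifically, one possible alternative (a sketch, not something proposed in this report) is to make the service itself depend on the mount that backs its save directory, via a drop-in. The drop-in file name below is arbitrary, and the sketch assumes /var/lib/libvirt/qemu/save really lives on the filesystem in question, as the journal errors above suggest:

    # /etc/systemd/system/libvirtd.service.d/wait-for-save-dir.conf
    [Unit]
    # RequiresMountsFor= adds Requires= and After= dependencies on the mount
    # unit(s) needed to access this path, so libvirtd starts only once it is mounted.
    RequiresMountsFor=/var/lib/libvirt/qemu/save

Then run "systemctl daemon-reload" for the drop-in to take effect. Combined with nofail in fstab, the rest of the system still boots if the disk is absent; only libvirtd waits on (or fails with) the missing mount.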