Bug 1575376

Summary: dracut fails to disassemble device-mapper devices
Product: [Fedora] Fedora
Reporter: Alan Jenkins <alan.christopher.jenkins>
Component: plymouth
Assignee: dracut-maint-list
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 28
CC: agk, alan.christopher.jenkins, dracut-maint-list, elad, harald, herrold, jonathan, mihai, prajnoha, prd-fedora, rstrode, zbyszek, zkabelac
Last Closed: 2019-05-28 20:16:31 UTC
Type: Bug
Attachments (description, flags):
* photo of error messages (none)
* `halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0` (none)
* `halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0` (none)
* `halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0` (none)
* screenshot shutdown errors from pk-offline-update (non-dracut case) (none)

Description Alan Jenkins 2018-05-06 12:22:49 UTC
Description of problem:
dracut shutdown process does not manage to disassemble LVM.

Version-Release number of selected component (if applicable):
dracut-047-8.git20180305.fc28.x86_64

How reproducible: seemingly always (given that I have rootfs on LVM)


Steps to Reproduce:
1. systemctl halt

(For those unaware: this halts the system instead of powering it off, which leaves some shutdown messages on screen to be read.)


Actual results: see screenshot

dracut: Disassembling device-mapper devices
dracut: Disassembling device-mapper devices
...
etc
...
device-mapper: remove ioctl on alan_dell_2016-fedora  failed: Device or resource busy
Command failed
dracut: dmsetup ls --tree
dracut: `- (8:7)`
Halting.
...


Expected results:
Shutdown should not report errors when it tries to shut down storage cleanly.


Additional info:

Comment 1 Alan Jenkins 2018-05-06 12:25:33 UTC
Created attachment 1432375 [details]
photo of error messages

Comment 2 Harald Hoyer 2018-05-07 08:50:38 UTC
Please check whether adding one or both of the following options to the kernel command line works for you:
* selinux=0
* rd.plymouth=0 plymouth.enable=0
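
To confirm which of these options actually reached the kernel, the running command line can be read back from /proc/cmdline. A minimal Python sketch, assuming a Linux system; `parse_cmdline` and `plymouth_disabled` are hypothetical helper names, and the parsing does not handle quoted values that contain spaces:

```python
from pathlib import Path


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def plymouth_disabled(params: dict) -> bool:
    """True if both plymouth options suggested above are set to 0."""
    return params.get("rd.plymouth") == "0" and params.get("plymouth.enable") == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    path = Path("/proc/cmdline")
    if path.exists():
        params = parse_cmdline(path.read_text())
        print("selinux=0 present:", params.get("selinux") == "0")
        print("plymouth disabled:", plymouth_disabled(params))
```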

Comment 3 Alan Jenkins 2018-05-07 12:10:20 UTC
Created attachment 1432582 [details]
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0`

Comment 4 Alan Jenkins 2018-05-07 12:11:32 UTC
Created attachment 1432583 [details]
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0`

Comment 5 Alan Jenkins 2018-05-07 12:12:59 UTC
Thanks for looking at this.

I don't think `enforcing=0` helped. Do you specifically want `selinux=0` results? https://danwalsh.livejournal.com/10972.html

Funnily I was already booting without `rhgb`. But I tried adding `rhgb rd.plymouth=0 plymouth.enable=0`. That time, I had no device-mapper error.

I retried both the original case and `rhgb rd.plymouth=0 plymouth.enable=0` a second time. Both results seemed reproducible. Sounds like this bug involves the plymouth process.


A different (earlier) error became visible though, I assume because the other errors had previously scrolled it off the screen.

mount: /oldsys/sys: filesystem was mounted, but failed to update userspace mount table

and the same for the proc, run, and dev subdirectories

Comment 6 Harald Hoyer 2018-05-07 13:26:20 UTC
hmm, neither attachment worked :/

Comment 7 Alan Jenkins 2018-05-07 14:24:00 UTC
Created attachment 1432611 [details]
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0`

Too many text boxes (and inadequate error-checking :).

Let's try that again.

Comment 8 Alan Jenkins 2018-05-09 10:10:01 UTC
Lennart points out that there is a bug in plymouth.  (At least it will not be fixed in systemd).  So even if dracut gains a work-around, there will still be a bug in plymouth, and this could affect shutdown *without* dracut.

---

In the dracut shutdown case, plymouthd must be preventing the rootfs from being unmounted inside dracut.

https://github.com/systemd/systemd/issues/8912#issuecomment-387574452

The problem is that plymouth pretends to be a "Root Storage Daemon", to avoid being killed and restarted when switching from initrd to rootfs at boot time.  However, plymouth violates the contract for a root storage daemon.

Plymouth actually stops after boot, and is restarted on shutdown.  Hence the shutdown instance is running from the root filesystem.  This is a circular dependency, because the process still marks itself as a root storage daemon. That is, systemd will not kill plymouth before it goes to unmount root (by entering dracut), but plymouth is keeping the rootfs open.

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/
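
Per the RootStorageDaemons page linked above, a process marks itself by making the first character of argv[0] an '@', and systemd skips such processes when it kills everything at shutdown. A rough Python sketch for listing marked processes together with the executable they run from, assuming Linux /proc and enough privileges to read /proc/<pid>/exe; the function names are made up for illustration:

```python
import os


def is_marked_root_storage_daemon(cmdline: bytes) -> bool:
    """Check raw /proc/<pid>/cmdline contents for a leading '@' in argv[0]."""
    return cmdline.startswith(b"@")


def read_exe(proc_root: str, pid: str) -> str:
    """Resolve the executable path of a process, or '?' if it cannot be read."""
    try:
        return os.readlink(os.path.join(proc_root, pid, "exe"))
    except OSError:
        return "?"


def marked_processes(proc_root: str = "/proc"):
    """Yield (pid, exe) for every process whose argv[0] starts with '@'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if is_marked_root_storage_daemon(cmdline):
            yield int(entry), read_exe(proc_root, entry)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, exe in marked_processes():
            print(pid, exe)
```

If the analysis above is right, running this late in a normal shutdown would show plymouthd with an executable path on the root filesystem, which is the circular dependency being described.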


--- (attention conservation notice:  Very lengthy qualification for why plymouth should be considered buggy in the *non* dracut case as well.  The specifics of this case are different.) ---

I reported a separate bug in the non-dracut case.  The comment by Lennart *grouped* this as the same bug, but I think it's actually a rather more special case.

Basically every PackageKit offline update logged on my Fedora system causes an unclean unmount of the rootfs (only).  I.e. it triggers a fsck on the next boot, which must replay the filesystem journal.  On ext2 (no journal) it would have forced a full fsck!  The fact that PackageKit triggers a non-dracut shutdown is a quirk in PackageKit, which should also be fixed, but I'd say we have a root cause bug in plymouth.

In this non-dracut case, systemd cannot literally unmount its own rootfs.  Instead, systemd remounts its rootfs read-only.  This would be fine if plymouth did not have any files open for writing, right?  Wrong - it also breaks if plymouth has any *deleted* files open.  A clean unmount or read-only remount of ext4 requires disposing of deleted files, but you can't do that while the files are still open.
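
On Linux, a file that has been unlinked but is still open shows up in /proc/<pid>/fd with " (deleted)" appended to the symlink target. A minimal Python sketch for spotting such files, assuming Linux /proc; `deleted_targets` and `open_deleted_files` are illustrative names, and a file whose real name happens to end in " (deleted)" would be a false positive:

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_targets(targets):
    """Return the original paths of fd link targets that the kernel marks as deleted."""
    return [t[: -len(DELETED_SUFFIX)] for t in targets if t.endswith(DELETED_SUFFIX)]


def open_deleted_files(pid: int):
    """List deleted-but-still-open files of one process (needs permission to read its fds)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            pass  # fd was closed while we were scanning
    return deleted_targets(targets)
```

Calling `open_deleted_files` on the plymouthd pid after a package update would be one way to check whether a replaced binary or library is still being held open.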

systemd reportedly fell foul of this.  They had to include a workaround, to use sync() to prevent data loss, even if they cannot remount as read-only.  So they definitely had a bug in this area.

https://github.com/systemd/systemd/pull/6598#issuecomment-339595846

The problem Lennart highlighted in this link would be that package updates end up deleting and re-creating either the plymouth binary, a library that it uses, or maybe some other file.  But that's not consistent with my observations of offline update...

In the offline update case, plymouth is running during this update (and if it wasn't, there wouldn't be a problem).  But, this plymouth service must have been started during the initrd.  And I don't see anywhere that it gets re-executed from the rootfs.  Instead, there must be a different problem.

plymouth-read-write.service (plymouth update-root-fs --read-write) causes plymouthd to open a logfile i.e. a writable file.

Most shutdowns don't have a problem.  A successful boot involves stopping plymouthd, so most shutdowns have to start a fresh plymouthd instance, which never opens a log file.  But in this offline update case, the initial plymouthd is never stopped, so I think the logfile remains open.

https://cgit.freedesktop.org/plymouth/tree/src/main.c?id=0.9.3#n848
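
Whether a process really holds a file open for writing can be read from /proc/<pid>/fdinfo/<fd>, whose `flags:` line gives the open flags in octal. A small Python sketch, assuming Linux /proc; the helper names are hypothetical, and the filter (absolute paths outside /dev) is only a rough stand-in for "files on a real filesystem":

```python
import os

# Mask for the access-mode bits (O_RDONLY, O_WRONLY, O_RDWR) of the open flags.
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def fd_is_writable(fdinfo_text: str) -> bool:
    """Parse /proc/<pid>/fdinfo/<fd> text; True if opened O_WRONLY or O_RDWR."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            flags = int(line.split(":", 1)[1].strip(), 8)  # the kernel prints octal
            return (flags & ACCMODE) in (os.O_WRONLY, os.O_RDWR)
    return False


def writable_paths(pid: int):
    """Yield path-backed files outside /dev that the process holds open for writing."""
    base = f"/proc/{pid}"
    for fd in os.listdir(f"{base}/fd"):
        try:
            target = os.readlink(f"{base}/fd/{fd}")
            with open(f"{base}/fdinfo/{fd}") as f:
                info = f.read()
        except OSError:
            continue  # fd went away mid-scan
        if target.startswith("/") and not target.startswith("/dev/") and fd_is_writable(info):
            yield target
```

If the theory above holds, `writable_paths` on the plymouthd pid during an offline update would list the logfile, while on an ordinary shutdown instance it would list nothing on the rootfs.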

Comment 9 Harald Hoyer 2018-05-11 08:12:55 UTC
Thank you for the detailed analysis!

Comment 10 Harald Hoyer 2018-05-15 11:53:59 UTC
Could you please check whether this fixes the issue?

https://github.com/dracutdevs/dracut/commit/df6bb5e959178cba06118493a7c8d019e84d54e7

files:
/usr/lib/dracut/modules.d/99base/dracut-lib.sh
/usr/lib/dracut/modules.d/99shutdown/shutdown.sh 
/usr/lib/dracut/modules.d/99shutdown/module-setup.sh

Comment 12 Alan Jenkins 2018-05-15 14:22:28 UTC
I think so.  I can halt without errors with this, and I can still reproduce errors without it.

Thanks!  I think I'll leave this installed so I can keep testing it.

Interesting that these things haven't been picked up from users of intel fakeraid.

Having seen this, I now have incentive to get PackageKit offline update to actually use dracut shutdown, instead of the equivalent of `systemctl reboot -f` :-).

p.s. systemd will do the same forced shutdown if you press ctrl+alt+del 7 times in quick succession.  So there would still be at least one reason to want plymouth/systemd to work out their differences properly.

Comment 13 Alan Jenkins 2018-05-16 12:37:02 UTC
Created attachment 1437353 [details]
screenshot shutdown errors from pk-offline-update (non-dracut case)

Confirmed.  Fixing PK, plus the updated dracut, PLUS an install which uses dracut to shut down (due to having the rootfs on LVM), appears to work around my problem that pk-offline-update was failing to unmount the rootfs cleanly.

https://github.com/hughsie/PackageKit/pull/251

Also confirmed, that I can reproduce this unclean unmount in a VM w/o LVM rootfs. In this case dracut is not run and so cannot work around the problem.  The rootfs is not unmounted cleanly, and "systemd-fsck: /dev/vda3: recovering journal" is logged in the following boot.
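
This symptom can be checked for mechanically by scanning the log of the following boot for that message. A small Python sketch; it accepts log lines from any source (for example the output of `journalctl -b`), the regular expression is based only on the single message quoted above, and `unclean_devices` is a hypothetical name:

```python
import re

# Matches lines such as: "systemd-fsck: /dev/vda3: recovering journal"
# and also tolerates a "[pid]" suffix after the program name.
RECOVERING = re.compile(r"systemd-fsck(?:\[\d+\])?: (?P<dev>\S+): recovering journal")


def unclean_devices(log_lines):
    """Return devices whose journal had to be replayed, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = RECOVERING.search(line)
        if match and match.group("dev") not in seen:
            seen.append(match.group("dev"))
    return seen
```

An empty result after an offline update would suggest the rootfs was unmounted (or remounted read-only) cleanly.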

Harald, do you have a suggestion, should I open a clean bug in plymouth to show this? Or are you happy to leave this one open, as it is already assigned to the plymouth package?  (I guess we should probably change the title in that case).

Comment 14 Ben Cotton 2019-05-02 21:45:27 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Ben Cotton 2019-05-28 20:16:31 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.