Bug 1575376 - dracut fails to disassemble device-mapper devices
Summary: dracut fails to disassemble device-mapper devices
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: plymouth   
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2018-05-06 12:22 UTC by Alan Jenkins
Modified: 2018-05-16 12:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
photo of error messages (1.74 MB, image/jpeg)
2018-05-06 12:25 UTC, Alan Jenkins
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0` (752 bytes, text/plain)
2018-05-07 12:10 UTC, Alan Jenkins
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0` (26 bytes, text/plain)
2018-05-07 12:11 UTC, Alan Jenkins
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0` (291.61 KB, image/jpeg)
2018-05-07 14:24 UTC, Alan Jenkins
screenshot shutdown errors from pk-offline-update (non-dracut case) (32.16 KB, image/png)
2018-05-16 12:37 UTC, Alan Jenkins

Description Alan Jenkins 2018-05-06 12:22:49 UTC
Description of problem:
dracut shutdown process does not manage to disassemble LVM.

Version-Release number of selected component (if applicable):
dracut-047-8.git20180305.fc28.x86_64

How reproducible: seemingly always (given that I have rootfs on LVM)


Steps to Reproduce:
1. systemctl halt

(For those unaware, this halts the system instead of powering it off, so that the shutdown messages can be read.)


Actual results: see screenshot

dracut: Disassembling device-mapper devices
dracut: Disassembling device-mapper devices
...
etc
...
device-mapper: remove ioctl on alan_dell_2016-fedora  failed: Device or resource busy
Command failed
dracut: dmsetup ls --tree
dracut: `- (8:7)`
Halting.
...


Expected results:
Shutdown should not report errors when it tries to shut down storage cleanly.


Additional info:

Comment 1 Alan Jenkins 2018-05-06 12:25 UTC
Created attachment 1432375 [details]
photo of error messages

Comment 2 Harald Hoyer 2018-05-07 08:50:38 UTC
Please check whether adding either or both of these options to the kernel command line works for you:
* selinux=0
* rd.plymouth=0 plymouth.enable=0
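
After rebooting with the extra options, it can be worth confirming that they actually reached the kernel. A minimal sketch, assuming the standard Linux `/proc/cmdline` file; the function names are made up for illustration:

```python
def missing_cmdline_options(cmdline, wanted):
    """Return the options from `wanted` that do not appear on the kernel command line."""
    present = set(cmdline.split())
    return [opt for opt in wanted if opt not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


# Example use on the machine under test:
#   missing_cmdline_options(read_cmdline(), ["rd.plymouth=0", "plymouth.enable=0"])
# An empty list means every requested option is present.
```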

Comment 3 Alan Jenkins 2018-05-07 12:10 UTC
Created attachment 1432582 [details]
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0`

Comment 4 Alan Jenkins 2018-05-07 12:11 UTC
Created attachment 1432583 [details]
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0`

Comment 5 Alan Jenkins 2018-05-07 12:12:59 UTC
Thanks for looking at this.

I don't think `enforcing=0` helped. Do you specifically want `selinux=0` results? https://danwalsh.livejournal.com/10972.html

Funnily enough, I was already booting without `rhgb`. But I tried adding `rhgb rd.plymouth=0 plymouth.enable=0`. That time, I had no device-mapper error.

I retried both the original case and `rhgb rd.plymouth=0 plymouth.enable=0` a second time. Both results seemed reproducible. It sounds like this bug involves the plymouth process.


A different (earlier) error became visible though. I assume the other errors had previously scrolled it off the screen.

mount: /oldsys/sys: filesystem was mounted, but failed to update userspace mount table

and the same for the proc, run, and dev subdirectories

Comment 6 Harald Hoyer 2018-05-07 13:26:20 UTC
hmm, neither attachment worked :/

Comment 7 Alan Jenkins 2018-05-07 14:24 UTC
Created attachment 1432611 [details]
`halt` photograph with `rhgb rd.plymouth=0 plymouth.enable=0`

Too many text boxes (and inadequate error-checking :).

Let's try that again.

Comment 8 Alan Jenkins 2018-05-09 10:10:01 UTC
Lennart points out that there is a bug in plymouth.  (At least it will not be fixed in systemd).  So even if dracut gains a work-around, there will still be a bug in plymouth, and this could affect shutdown *without* dracut.

---

In the dracut shutdown case, plymouthd must be preventing the rootfs from being unmounted inside dracut.

https://github.com/systemd/systemd/issues/8912#issuecomment-387574452

The problem is that plymouth pretends to be a "Root Storage Daemon", to avoid being killed and restarted when switching from initrd to rootfs at boot time.  However, plymouth violates the contract for a root storage daemon.

Plymouth actually stops after boot, and is restarted on shutdown.  Hence the shutdown instance is running from the root filesystem.  This is a circular dependency, because the process still marks itself as a root storage daemon. That is, systemd will not kill plymouth before it goes to unmount root (by entering dracut), but plymouth is keeping the rootfs open.

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/
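
As I read the page linked above, a process asks systemd to spare it from the final kill by making the first character of `argv[0]` an `@`. A minimal sketch for spotting such processes and seeing which executable they run from, assuming a Linux `/proc` and enough privileges to read other processes' entries; the function names are made up for illustration:

```python
import os


def is_marked_root_storage_daemon(cmdline: bytes) -> bool:
    """True if argv[0] starts with '@'.

    `cmdline` is the raw content of /proc/<pid>/cmdline, i.e. the argv
    entries separated by NUL bytes, so argv[0] comes first.
    """
    return cmdline.startswith(b"@")


def marked_processes(proc="/proc"):
    """Yield (pid, executable path) for every process carrying the '@' marker."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                cmdline = f.read()
            if is_marked_root_storage_daemon(cmdline):
                yield int(entry), os.readlink(os.path.join(proc, entry, "exe"))
        except OSError:
            continue  # the process exited, or we lack permission
```

If a marked process reports an executable that lives on the root filesystem rather than in the initrd, it is in the circular situation described above.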


--- (attention conservation notice:  Very lengthy qualification for why plymouth should be considered buggy in the *non* dracut case as well.  The specifics of this case are different.) ---

I reported a separate bug in the non-dracut case.  The comment by Lennart *grouped* this as the same bug, but I think it's actually a rather more special case.

Basically every PackageKit offline update logged on my Fedora system causes an unclean unmount of the rootfs (only).  I.e. it triggers a fsck on the next boot, which must replay the filesystem journal.  On ext2 (no journal) it would have forced a full fsck!  The fact that PackageKit triggers a non-dracut shutdown is a quirk in PackageKit, which should also be fixed, but I'd say we have a root cause bug in plymouth.

In this non-dracut case, systemd cannot literally unmount its own rootfs.  Instead, systemd remounts its rootfs as read-only.  This would be fine if plymouth did not have any files open for writing, right?  Wrong - it also breaks if plymouth has any *deleted* files open.  A clean unmount/read-only mount of ext4 requires disposing of deleted files, but you can't do that while the files are still open.
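
The deleted-but-open state is easy to reproduce. A minimal sketch, assuming Linux (the second helper relies on `/proc/<pid>/fd` symlinks ending in ` (deleted)`); both function names are made up for illustration:

```python
import os
import tempfile


def open_then_delete(directory=None):
    """Create a file, unlink it while keeping the descriptor open.

    Returns (fd, link_count). The link count drops to 0, but the inode
    stays allocated on the filesystem until the descriptor is closed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.write(fd, b"still here")
    os.unlink(path)
    return fd, os.fstat(fd).st_nlink


def deleted_open_files(pid):
    """List the deleted files that process `pid` still holds open."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed in the meantime
        if target.endswith(" (deleted)"):
            found.append(target)
    return found
```

Calling `deleted_open_files` with the PID of plymouthd (as root) just before shutdown should show whether it is holding any such files.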

systemd reportedly fell foul of this.  They had to include a workaround, to use sync() to prevent data loss, even if they cannot remount as read-only.  So they definitely had a bug in this area.

https://github.com/systemd/systemd/pull/6598#issuecomment-339595846

The problem Lennart highlighted in this link would be that package updates end up deleting and re-creating either the plymouth binary, a library that it uses, or maybe some other file.  But that's not consistent with my observations of offline update...

In the offline update case, plymouth is running during this update (and if it wasn't, there wouldn't be a problem).  But, this plymouth service must have been started during the initrd.  And I don't see anywhere that it gets re-executed from the rootfs.  Instead, there must be a different problem.

plymouth-read-write.service (plymouth update-root-fs --read-write) causes plymouthd to open a logfile i.e. a writable file.

Most shutdowns don't have a problem, because a successful boot involves stopping plymouthd: shutdown then has to start a fresh plymouthd instance, which never opens a log file.  But in this offline update case, the initial plymouthd is never stopped, so I think the logfile remains open.

https://cgit.freedesktop.org/plymouth/tree/src/main.c?id=0.9.3#n848
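
One way to test the logfile theory is to list what a process has open for writing. A minimal sketch, assuming the Linux `/proc/<pid>/fdinfo/<fd>` format, where the `flags:` line holds the open flags in octal; the function names are made up for illustration:

```python
import os


def opened_for_writing(fdinfo_text: str) -> bool:
    """Decide from the text of /proc/<pid>/fdinfo/<fd> whether the
    descriptor was opened write-only or read-write."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            flags = int(line.split()[1], 8)  # the value is printed in octal
            return (flags & os.O_ACCMODE) in (os.O_WRONLY, os.O_RDWR)
    return False


def writable_files(pid):
    """Return the targets of all descriptors that `pid` holds open for writing.

    The result also includes terminals, sockets and pipes; only regular
    files on the rootfs matter for the read-only remount.
    """
    base = f"/proc/{pid}"
    result = []
    for fd in os.listdir(f"{base}/fd"):
        try:
            with open(f"{base}/fdinfo/{fd}") as f:
                if opened_for_writing(f.read()):
                    result.append(os.readlink(f"{base}/fd/{fd}"))
        except OSError:
            continue  # descriptor was closed in the meantime
    return result
```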

Comment 9 Harald Hoyer 2018-05-11 08:12:55 UTC
Thank you for the detailed analysis!

Comment 10 Harald Hoyer 2018-05-15 11:53:59 UTC
Could you please test whether this fixes the issue?

https://github.com/dracutdevs/dracut/commit/df6bb5e959178cba06118493a7c8d019e84d54e7

files:
/usr/lib/dracut/modules.d/99base/dracut-lib.sh
/usr/lib/dracut/modules.d/99shutdown/shutdown.sh 
/usr/lib/dracut/modules.d/99shutdown/module-setup.sh

Comment 12 Alan Jenkins 2018-05-15 14:22:28 UTC
I think so.  I can halt without errors with this, and I can still reproduce errors without it.

Thanks!  I think I'll leave this installed so I can keep testing it.

Interesting that these things haven't been picked up from users of intel fakeraid.

Having seen this, I now have incentive to get PackageKit offline update to actually use dracut shutdown, instead of the equivalent of `systemctl reboot -f` :-).

p.s. systemd will do the same forced shutdown if you press ctrl+alt+del 7 times in quick succession.  So there would still be at least one reason to want plymouth/systemd to work out their differences properly.

Comment 13 Alan Jenkins 2018-05-16 12:37 UTC
Created attachment 1437353 [details]
screenshot shutdown errors from pk-offline-update (non-dracut case)

Confirmed.  Fixing PK, plus the updated dracut, plus an install which uses dracut to shut down (due to having rootfs on LVM), appears to work around my problem that pk-offline-update was failing to unmount the rootfs cleanly.

https://github.com/hughsie/PackageKit/pull/251

I also confirmed that I can reproduce this unclean unmount in a VM without an LVM rootfs. In this case dracut is not run and so cannot work around the problem.  The rootfs is not unmounted cleanly, and "systemd-fsck: /dev/vda3: recovering journal" is logged in the following boot.

Harald, do you have a suggestion? Should I open a fresh bug against plymouth to show this? Or are you happy to leave this one open, as it is already assigned to the plymouth package?  (I guess we should probably change the title in that case.)

