Bug 830447 - dracut exits early from initqueue (switching to pre-mount), even if udev events are still in the queue
dracut exits early from initqueue (switching to pre-mount), even if udev even...
Product: Fedora
Classification: Fedora
Component: dracut (Show other bugs)
Unspecified Unspecified
unspecified Severity high
: ---
: ---
Assigned To: Harald Hoyer
Fedora Extras Quality Assurance
Depends On:
Blocks: 822071
  Show dependency treegraph
Reported: 2012-06-09 14:39 EDT by Matthias Hensler
Modified: 2012-12-20 10:53 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-12-20 10:52:57 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
simple module for triggering the problem (711 bytes, application/x-download)
2012-06-09 14:39 EDT, Matthias Hensler
no flags Details
Simply, but dirty patch, to fix the problem by calling settle in a pre-mount hook (566 bytes, application/x-bzip2)
2012-06-30 13:39 EDT, Matthias Hensler
no flags Details

  None (edit)
Description Matthias Hensler 2012-06-09 14:39:44 EDT
Created attachment 590638 [details]
simple module for triggering the problem

Description of problem:
To me it looks like the logic of dracut in the initqueue is flawed, in a way that initqueue might be exited too early, switching to pre-mount, although udev is not yet settled.

I do not have a deep inside into dracut, so I might be wrong, but I would be happy if one developer could check this case. Maybe the problem is not within the initqueue-code, but within the LVM module.

Version-Release number of selected component (if applicable):
dracut 0.18 from Fedora 17 (although dracut 0.19 should have the same problem)

How reproducible:
Every time

Steps to Reproduce:
1. Use a system with a swap-device on a logical LVM volume
2. Install the simple module attached to this bug
3. Create new dracut initramfs and reboot
Actual results:
udev is running some script which gets started from an event in the initqueue, while dracut does not wait until udev is fully settled. In fact that script is still running while dracut runs pre-mount, mount, pre-pivot and pre-pivot-cleanup in parallel. It takes till the switch into the root system before dracut finally waits for udev to finish settling.

Expected results:
I would think, that dracut should not leave the initqueue, if udev is not yet settled and still has events to process. In fact dracut should run any pre-mount hooks if udev is still running.

Additional info:
From what I see and understand, the following does happen:

1. In the pre-udev hook dracut runs my preparescript. That script just installs a basic udev rule, which will kick whenever a new swapdevice is found. In the rule we tell udev to run a small shellscript, which will slow down for 10 seconds, to visualize the problem.

2. dracut (99base in that case) continues and runs the pre-trigger hook and finally triggers udev.

3. for the moment nothing more happens, as our swapdevice is on LVM, which is not yet initialized.

4. in the next step dracut enters the initqueue phase. From what I understand in the code, this is more or less an endless loop, calling scripts in the initqueue-hooks and looking if everything is settled. I suspect that dracut should never leave this loop till all events are processed (or only if some timeout or error occurs).

4a. in initqueue/settled/ there is a script named "lvm_scan", which now will be called. That script finds my volume group with the swap logical volume in it, and activates the logical volume.

4b. in dmesg I can now see something like "dracut: inactive '/dev/vg_fc17/lv_swap' [5.88 GiB] inherit"

4c. that message is directly followed by "DracutBug: I am running for swapdevice /dev/dm-0 now..." (that is the script which is kicked by the udev rule in step 1).

5. however, now the real trouble begins (here I am unsure if the initqueue-logic of 99base in dracut is to blame, or the lvm_scan hook): in 99base the following code is processed: "check_finished && break 2". The "check_finished" function returns 1 in this case, causing the "break 2" to be executed and leaving the initqueue as result.

If I comment this line and call "udevadm settle --timeout=0" here, I will notice that udev has still events to process (this is to be expected, because of the LVM initialization there are new rules to execute).

6. so know dracut is switching to the pre-mount hook already, while my little script is still running in the background.

7. dracut is now running the next hooks until it reaches the "udevadm control --exit". There it finally waits till the script finshes and then goes on with switching to the new root filesystem.

For illustration I added some info-lines into 99base, so I knew which sequence dracut is currently running. The following is the relevant section from dmesg:
[    0.635658] Freeing unused kernel memory: 1516k freed
[    0.685404] dracut: dracut-018-40.git20120522.fc17
[    0.731810] dracut: DracutBug: prepare script
[    0.732281] dracut: DracutBug: Sucessfully installed rule for swapdevice
[    0.737698] udevd[110]: starting version 182
[    0.937229] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    0.981027] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[    1.011490] dracut: Starting plymouth daemon
[    1.370893] Refined TSC clocksource calibration: 2691.202 MHz.
[    1.552476] virtio-pci 0000:00:05.0: irq 40 for MSI/MSI-X
[    1.552500] virtio-pci 0000:00:05.0: irq 41 for MSI/MSI-X
[    1.563955]  vda: vda1 vda2 vda3
[    1.747797] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
[    1.748771] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5
[    1.749282] usb 1-1: Product: QEMU USB Tablet
[    1.749728] usb 1-1: Manufacturer: QEMU 1.0
[    1.750201] usb 1-1: SerialNumber: 42
[    1.782806] input: QEMU 1.0 QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/input/input3
[    1.783402] generic-usb 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU 1.0 QEMU USB Tablet] on usb-0000:00:01.2-1/input0
[    1.821557] dracut: CUBBI: Job=/lib/dracut/hooks/initqueue/settled/blocksymlink.sh
[    1.822000] dracut: CUBBI: Job=/lib/dracut/hooks/initqueue/settled/lvm_scan.sh
[    1.846719] dracut: Scanning devices vda3  for LVM logical volumes vg_fc17-beta-64bit/lv_root vg_fc17-beta-64bit/lv_swap
[    1.991455] dracut: inactive '/dev/vg_fc17-beta-64bit/lv_swap' [5.88 GiB] inherit
[    1.992150] dracut: inactive '/dev/vg_fc17-beta-64bit/lv_root' [13.62 GiB] inherit
[    2.213802] DracutBug: I am running for swapdevice /dev/dm-0 now...
[    2.736496] dracut: CUBBI: settled, now hooking pre-mount
[    2.738582] dracut: CUBBI: done pre-mount
[    2.888805] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[    2.915690] dracut: Checking ext4: /dev/mapper/vg_fc17--beta--64bit-lv_root
[    2.916917] dracut: issuing e2fsck -a  /dev/mapper/vg_fc17--beta--64bit-lv_root
[    3.220812] DracutBug: waiting 1 of 10
[    3.234647] dracut: /dev/mapper/vg_fc17--beta--64bit-lv_root: clean, 51013/892928 files, 473849/3571712 blocks
[    3.235631] dracut: Remounting /dev/mapper/vg_fc17--beta--64bit-lv_root with -o ro
[    3.249963] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[    3.253090] dracut: CUBBI: doing root mount
[    3.255514] dracut: Mounted root filesystem /dev/mapper/vg_fc17--beta--64bit-lv_root
[    4.232852] DracutBug: waiting 2 of 10
[    5.238663] DracutBug: waiting 3 of 10
[    6.244727] DracutBug: waiting 4 of 10
[    7.251462] DracutBug: waiting 5 of 10
[    8.258435] DracutBug: waiting 6 of 10
[    9.262199] DracutBug: waiting 7 of 10
[   10.264690] DracutBug: waiting 8 of 10
[   11.271409] DracutBug: waiting 9 of 10
[   12.273934] DracutBug: waiting 10 of 10
[   13.278639] DracutBug: exiting now, returning control to 99base, which now most likely has already run pre-mount and mount
[   13.340844] dracut: Switching root
[   13.422831] type=1404 audit(1339265156.508:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
[   13.685977] SELinux: 2048 avtab hash slots, 99143 rules.

As you can see, the udev-rule for the new swap device is now being processed in parallel of the rootfs-fsck and rootfs-mount.

In my tests I could successfully delay the pre-mount hook, if I just call "udevadm settle" there, but that is doubtfully the correct fix.

More likely there should be some more checks with $UDEV_QUEUE_EMPTY in the initqueue-loop, or better a combined check which only leaves the queue if udev is settled _and_ all initqueue hooks are executed and finished.

I think this bug/problem is one likely cause to the filesystem corruption seen from hibernation (bug 822071), because the rootfs gets checked and mounted while the udev rule for processing the resume is still in queue.
Comment 1 Nils Philippsen 2012-06-11 04:49:46 EDT
(In reply to comment #0) 
> I think this bug/problem is one likely cause to the filesystem corruption
> seen from hibernation (bug 822071), because the rootfs gets checked and
> mounted while the udev rule for processing the resume is still in queue.

Bumping severity to high (crash, potential loss of data). Blocking bug #822071.
Comment 2 Harald Hoyer 2012-06-11 12:06:23 EDT
Nice find! Will move the resume process in the initqueue, which should eliminate the race.
Comment 3 Ravikiran Rajagopal 2012-06-30 11:04:57 EDT
I have been bitten by this as well. Is there a patch for dracut to try locally? Searching koji logs came up empty.
Comment 4 Matthias Hensler 2012-06-30 13:39:58 EDT
Created attachment 595461 [details]
Simply, but dirty patch, to fix the problem by calling settle in a pre-mount hook
Comment 5 Matthias Hensler 2012-06-30 13:40:59 EDT
Well, at the moment I "fix" this issue by calling udevadm --settle in a pre-mount hook. I do not know how the dracut maintainers will address this issue, but I certainly hope for some failsafe check in pre-mount.

I added a simple fix in my dracut-tuxonice package (http://mhensler.de/swsusp/download/dracut-tuxonice-002-16.fc17.cubbi1.noarch.rpm), however since you possible use the in-kernel suspend I attached the broken-out patch to this bugentry (just copy the install- and the bz-file to the dracut-modules directory). After installing the new module you have to recreate the initrd for your running kernel.
Comment 6 Fedora Update System 2012-09-27 06:22:20 EDT
dracut-018-60.git20120927.fc16 has been submitted as an update for Fedora 16.
Comment 7 Fedora Update System 2012-09-27 06:23:24 EDT
dracut-018-105.git20120927.fc17 has been submitted as an update for Fedora 17.
Comment 8 Fedora Update System 2012-09-27 06:30:51 EDT
dracut-018-105.git20120927.fc17 has been submitted as an update for Fedora 17.
Comment 9 Fedora Update System 2012-09-28 04:20:27 EDT
Package dracut-018-105.git20120927.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dracut-018-105.git20120927.fc17'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
Comment 10 Fedora Update System 2012-12-20 10:53:05 EST
dracut-018-105.git20120927.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.