Created attachment 590638 [details]
simple module for triggering the problem
Description of problem:
To me it looks like the logic of dracut in the initqueue is flawed, in a way that initqueue might be exited too early, switching to pre-mount, although udev is not yet settled.
I do not have a deep inside into dracut, so I might be wrong, but I would be happy if one developer could check this case. Maybe the problem is not within the initqueue-code, but within the LVM module.
Version-Release number of selected component (if applicable):
dracut 0.18 from Fedora 17 (although dracut 0.19 should have the same problem)
Steps to Reproduce:
1. Use a system with a swap-device on a logical LVM volume
2. Install the simple module attached to this bug
3. Create new dracut initramfs and reboot
udev is running some script which gets started from an event in the initqueue, while dracut does not wait until udev is fully settled. In fact that script is still running while dracut runs pre-mount, mount, pre-pivot and pre-pivot-cleanup in parallel. It takes till the switch into the root system before dracut finally waits for udev to finish settling.
I would think, that dracut should not leave the initqueue, if udev is not yet settled and still has events to process. In fact dracut should run any pre-mount hooks if udev is still running.
From what I see and understand, the following does happen:
1. In the pre-udev hook dracut runs my preparescript. That script just installs a basic udev rule, which will kick whenever a new swapdevice is found. In the rule we tell udev to run a small shellscript, which will slow down for 10 seconds, to visualize the problem.
2. dracut (99base in that case) continues and runs the pre-trigger hook and finally triggers udev.
3. for the moment nothing more happens, as our swapdevice is on LVM, which is not yet initialized.
4. in the next step dracut enters the initqueue phase. From what I understand in the code, this is more or less an endless loop, calling scripts in the initqueue-hooks and looking if everything is settled. I suspect that dracut should never leave this loop till all events are processed (or only if some timeout or error occurs).
4a. in initqueue/settled/ there is a script named "lvm_scan", which now will be called. That script finds my volume group with the swap logical volume in it, and activates the logical volume.
4b. in dmesg I can now see something like "dracut: inactive '/dev/vg_fc17/lv_swap' [5.88 GiB] inherit"
4c. that message is directly followed by "DracutBug: I am running for swapdevice /dev/dm-0 now..." (that is the script which is kicked by the udev rule in step 1).
5. however, now the real trouble begins (here I am unsure if the initqueue-logic of 99base in dracut is to blame, or the lvm_scan hook): in 99base the following code is processed: "check_finished && break 2". The "check_finished" function returns 1 in this case, causing the "break 2" to be executed and leaving the initqueue as result.
If I comment this line and call "udevadm settle --timeout=0" here, I will notice that udev has still events to process (this is to be expected, because of the LVM initialization there are new rules to execute).
6. so know dracut is switching to the pre-mount hook already, while my little script is still running in the background.
7. dracut is now running the next hooks until it reaches the "udevadm control --exit". There it finally waits till the script finshes and then goes on with switching to the new root filesystem.
For illustration I added some info-lines into 99base, so I knew which sequence dracut is currently running. The following is the relevant section from dmesg:
[ 0.635658] Freeing unused kernel memory: 1516k freed
[ 0.685404] dracut: dracut-018-40.git20120522.fc17
[ 0.731810] dracut: DracutBug: prepare script
[ 0.732281] dracut: DracutBug: Sucessfully installed rule for swapdevice
[ 0.737698] udevd: starting version 182
[ 0.937229] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[ 0.981027] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[ 1.011490] dracut: Starting plymouth daemon
[ 1.370893] Refined TSC clocksource calibration: 2691.202 MHz.
[ 1.552476] virtio-pci 0000:00:05.0: irq 40 for MSI/MSI-X
[ 1.552500] virtio-pci 0000:00:05.0: irq 41 for MSI/MSI-X
[ 1.563955] vda: vda1 vda2 vda3
[ 1.747797] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
[ 1.748771] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5
[ 1.749282] usb 1-1: Product: QEMU USB Tablet
[ 1.749728] usb 1-1: Manufacturer: QEMU 1.0
[ 1.750201] usb 1-1: SerialNumber: 42
[ 1.782806] input: QEMU 1.0 QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/input/input3
[ 1.783402] generic-usb 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU 1.0 QEMU USB Tablet] on usb-0000:00:01.2-1/input0
[ 1.821557] dracut: CUBBI: Job=/lib/dracut/hooks/initqueue/settled/blocksymlink.sh
[ 1.822000] dracut: CUBBI: Job=/lib/dracut/hooks/initqueue/settled/lvm_scan.sh
[ 1.846719] dracut: Scanning devices vda3 for LVM logical volumes vg_fc17-beta-64bit/lv_root vg_fc17-beta-64bit/lv_swap
[ 1.991455] dracut: inactive '/dev/vg_fc17-beta-64bit/lv_swap' [5.88 GiB] inherit
[ 1.992150] dracut: inactive '/dev/vg_fc17-beta-64bit/lv_root' [13.62 GiB] inherit
[ 2.213802] DracutBug: I am running for swapdevice /dev/dm-0 now...
[ 2.736496] dracut: CUBBI: settled, now hooking pre-mount
[ 2.738582] dracut: CUBBI: done pre-mount
[ 2.888805] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[ 2.915690] dracut: Checking ext4: /dev/mapper/vg_fc17--beta--64bit-lv_root
[ 2.916917] dracut: issuing e2fsck -a /dev/mapper/vg_fc17--beta--64bit-lv_root
[ 3.220812] DracutBug: waiting 1 of 10
[ 3.234647] dracut: /dev/mapper/vg_fc17--beta--64bit-lv_root: clean, 51013/892928 files, 473849/3571712 blocks
[ 3.235631] dracut: Remounting /dev/mapper/vg_fc17--beta--64bit-lv_root with -o ro
[ 3.249963] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[ 3.253090] dracut: CUBBI: doing root mount
[ 3.255514] dracut: Mounted root filesystem /dev/mapper/vg_fc17--beta--64bit-lv_root
[ 4.232852] DracutBug: waiting 2 of 10
[ 5.238663] DracutBug: waiting 3 of 10
[ 6.244727] DracutBug: waiting 4 of 10
[ 7.251462] DracutBug: waiting 5 of 10
[ 8.258435] DracutBug: waiting 6 of 10
[ 9.262199] DracutBug: waiting 7 of 10
[ 10.264690] DracutBug: waiting 8 of 10
[ 11.271409] DracutBug: waiting 9 of 10
[ 12.273934] DracutBug: waiting 10 of 10
[ 13.278639] DracutBug: exiting now, returning control to 99base, which now most likely has already run pre-mount and mount
[ 13.340844] dracut: Switching root
[ 13.422831] type=1404 audit(1339265156.508:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
[ 13.685977] SELinux: 2048 avtab hash slots, 99143 rules.
As you can see, the udev-rule for the new swap device is now being processed in parallel of the rootfs-fsck and rootfs-mount.
In my tests I could successfully delay the pre-mount hook, if I just call "udevadm settle" there, but that is doubtfully the correct fix.
More likely there should be some more checks with $UDEV_QUEUE_EMPTY in the initqueue-loop, or better a combined check which only leaves the queue if udev is settled _and_ all initqueue hooks are executed and finished.
I think this bug/problem is one likely cause to the filesystem corruption seen from hibernation (bug 822071), because the rootfs gets checked and mounted while the udev rule for processing the resume is still in queue.
(In reply to comment #0)
> I think this bug/problem is one likely cause to the filesystem corruption
> seen from hibernation (bug 822071), because the rootfs gets checked and
> mounted while the udev rule for processing the resume is still in queue.
Bumping severity to high (crash, potential loss of data). Blocking bug #822071.
Nice find! Will move the resume process in the initqueue, which should eliminate the race.
I have been bitten by this as well. Is there a patch for dracut to try locally? Searching koji logs came up empty.
Created attachment 595461 [details]
Simply, but dirty patch, to fix the problem by calling settle in a pre-mount hook
Well, at the moment I "fix" this issue by calling udevadm --settle in a pre-mount hook. I do not know how the dracut maintainers will address this issue, but I certainly hope for some failsafe check in pre-mount.
I added a simple fix in my dracut-tuxonice package (http://mhensler.de/swsusp/download/dracut-tuxonice-002-16.fc17.cubbi1.noarch.rpm), however since you possible use the in-kernel suspend I attached the broken-out patch to this bugentry (just copy the install- and the bz-file to the dracut-modules directory). After installing the new module you have to recreate the initrd for your running kernel.
dracut-018-60.git20120927.fc16 has been submitted as an update for Fedora 16.
dracut-018-105.git20120927.fc17 has been submitted as an update for Fedora 17.
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dracut-018-105.git20120927.fc17'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
dracut-018-105.git20120927.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.