Bug 521959 - dracut does not work for / over lvm over md
Product: Fedora
Classification: Fedora
Component: dracut
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: Harald Hoyer
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-09-08 16:56 EDT by Nicolas Mailhot
Modified: 2009-10-22 09:44 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-09-17 13:40:28 EDT

Attachments
dmesg (52.32 KB, text/plain)
2009-09-08 16:59 EDT, Nicolas Mailhot
boot.log (4.12 KB, text/plain)
2009-09-08 17:00 EDT, Nicolas Mailhot
lspci (192.90 KB, text/plain)
2009-09-08 17:01 EDT, Nicolas Mailhot
dmesg with dracut-001-10.git4d924752.fc12.noarch.rpm (52.51 KB, text/plain)
2009-09-15 13:06 EDT, Nicolas Mailhot
dmesg with dracut-001-10.git4d924752.fc12.noarch (55.72 KB, text/plain)
2009-09-16 18:29 EDT, Nicolas Mailhot
dmesg with dracut-002-1.fc12 (53.90 KB, text/plain)
2009-09-17 13:39 EDT, Nicolas Mailhot

Description Nicolas Mailhot 2009-09-08 16:56:56 EDT
Description of problem:

On a system where /boot is on a raid-1 md0 and everything else is in lvm over raid-1 md1, md volumes have not been cleanly released at the end of the initramfs stage since the move to dracut, and are therefore not re-assembled correctly later (half of md1 is online, the other half is stuck in md127).

cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 sda3[0]
      288856640 blocks [2/1] [U_]
md0 : active raid1 sda1[0] sdb1[1]
      2096384 blocks [2/2] [UU]
md126 : active raid1 sdb3[1]
      288856640 blocks [2/1] [_U]
unused devices: <none>
[root@arekh nim]# mdadm --stop /dev/md126
mdadm: failed to stop array /dev/md126: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
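
A quick way to spot this kind of split array is to scan mdstat output for a status field with a missing member. The following is only a minimal sketch: the function name `degraded_arrays` is hypothetical, and it assumes the `[UU]`/`[U_]` status format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or dracut).
# Print the names of md arrays whose status field shows a missing member,
# i.e. an underscore inside the [UU]-style field of mdstat output.
# Reads mdstat-formatted text from stdin so it can be tried on saved output.
degraded_arrays() {
    awk '
        # Remember the array named on an "mdN : ..." line.
        /^md[0-9]+ :/ { name = $1 }
        # A status field such as [U_] or [_U] marks that array as degraded.
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    '
}

# Usage on a live system (assumes /proc/mdstat is readable):
#   degraded_arrays < /proc/mdstat
```

Fed the output above, this should list md1 and md126, the two halves of what ought to be a single mirror.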

Version-Release number of selected component (if applicable):

kernel-2.6.31-0.212.rc9.git1.fc12.x86_64 with the initramfs rebuilt using

dracut -f /boot/initrd-generic-2.6.31-0.212.rc9.git1.fc12.x86_64.img 2.6.31-0.212.rc9.git1.fc12.x86_64
Comment 1 Nicolas Mailhot 2009-09-08 16:59:25 EDT
Created attachment 360132 [details]
Comment 2 Nicolas Mailhot 2009-09-08 17:00:35 EDT
Created attachment 360133 [details]
Comment 3 Nicolas Mailhot 2009-09-08 17:01:10 EDT
Created attachment 360134 [details]
Comment 4 Nicolas Mailhot 2009-09-08 17:01:45 EDT
(22:53:04) warren: nim-nim: add rdbreak=pre-pivot
(22:53:06) warren: nim-nim: boot it
(22:53:21) warren: nim-nim: look in the filesystem for the script that runs mdmon
(22:53:29) warren: nim-nim: run mdmon in the same manner, see if it segfaults
(22:54:16) warren: nim-nim: if so, ulimit -c unlimited; make it crash; #plug usb stick in; mkdir /tmp/whatever; mount /dev/sdwhatever1 /tmp/whatever; copy it
(22:55:10) warren: nim-nim: with that core file, hopefully we can get a backtrace
(22:56:18) warren: nim-nim: hopefully ulimit is a dash built-in
Comment 5 Harald Hoyer 2009-09-15 10:25:53 EDT
For the advanced user, here is a scratch version to test:

# rpm -e '*dracut*' --nodeps
# rpm -ivh 'http://koji.fedoraproject.org/koji/getfile?taskID=1680533&name=dracut-001-10.git4d924752.fc12.noarch.rpm'
Comment 6 Nicolas Mailhot 2009-09-15 13:06:19 EDT
Created attachment 361116 [details]
dmesg with dracut-001-10.git4d924752.fc12.noarch.rpm

Well, this one does not work any better. It fails in exactly the same way at the same place.
Comment 7 Harald Hoyer 2009-09-16 02:48:17 EDT
Judging by the dmesg, it seems you did not recreate the initramfs image you are testing with...

dmesg says:
dracut: dracut-001-9.git6f0e469d.fc12
Comment 8 Harald Hoyer 2009-09-16 02:49:13 EDT
Please recreate the initramfs with dracut like you would do with mkinitrd.

# dracut /boot/initramfs-<kernel version>.img <kernel version>
Comment 9 Nicolas Mailhot 2009-09-16 03:50:13 EDT
I did (with a -f and even an rm of the old file), but it seems I attached the wrong log (though the new one is hardly different).

That, or your rpm has the wrong code inside. I'll check this evening when I have access to this system.
Comment 10 Nicolas Mailhot 2009-09-16 03:51:34 EDT
Anyway, the new log has the same errors; I didn't notice anything new when looking at it. But I didn't try diff-ing.
Comment 11 Harald Hoyer 2009-09-16 04:13:40 EDT
ok, please provide the output of:

# lsinitrd /boot/<image which fails> | grep dracut
# lsinitrd /boot/<image which fails> | grep mdadm.conf
# grep mdadm /etc/dracut.conf
Comment 12 Harald Hoyer 2009-09-16 04:14:33 EDT
And I would really like to see the new dmesg output.
Comment 13 Nicolas Mailhot 2009-09-16 17:21:52 EDT
Well, I made the mistake of updating rawhide before re-doing a new test, and now the system does not reboot at all anymore (it drops into the "please unbork me if you're an admin" prompt at fs check time).

So before taking any more risks with my data, I rebooted from a USB key into a rescue session to re-sync the raid (which has not been synced since the start of August, when dracut started not assembling it correctly at boot time).

This way even if I crap one disk the other will still have a fresh data copy as it was always supposed to. I've been living too dangerously this past month (granted I was away a few weeks with the system off)

It should be almost done now, been running for about 90 min
Comment 14 Nicolas Mailhot 2009-09-16 17:27:58 EDT
Well it's done, and it craps itself just the same way it did 2 hours ago before I decided to create a rescue flash

The main change since the posted dmesg is udev logging '/sbin/mdadm --detail --export /dev/md127' unexpected exit with status 0x000b

Could be the new mdadm version in koji today. Or the new selinux that no longer blocks some mdadm files. It fails the same way with old kernels, so it's not something new in the initramfs itself.
Comment 15 Nicolas Mailhot 2009-09-16 17:31:49 EDT
If I remove the /boot /dev/md0 mount in fstab, the system boots.

But I can't assemble md0 post-boot: mdadm is wedged in a strange state and does not accept any md manipulation (md127 is live with / over lvm on it, as dracut assembled it, though).

So I can't change the initramfs from rawhide anymore :(

Next try: reboot in rescue mode and chroot / from there. The F11 rescue disk does not crap over md like dracut
Comment 16 Nicolas Mailhot 2009-09-16 17:53:26 EDT
Well, actually I had the bright idea to downgrade mdadm first (since it does not need /boot access to install). That restored the system to yesterday's level of breakage (boots, but only half the array under / is assembled).

Seems mdadm-3.0-3.fc12.x86_64 is bad mojo.
Comment 17 Nicolas Mailhot 2009-09-16 18:06:02 EDT
(In reply to comment #8)
> please recreate the initramfs with dracut like you would do with mkinitrd.
> # dracut /boot/initramfs-<kernel version>.img <kernel version>  

Oh, now I see why it didn't work: I reused an old dracut command, and it created an initrd-generic- file instead of an initramfs- file.
Comment 18 Nicolas Mailhot 2009-09-16 18:22:22 EDT
And this time it seems to work! Yahoo!

Except, I managed to re-break the md in the meantime, so I can't check if the full stop => boot cycle works 100%. I'll need to wait until the disks are re-synced to test a full cycle in "clean" conditions.
Comment 19 Nicolas Mailhot 2009-09-16 18:26:24 EDT
Anyway, just to be complete:

grep mdadm /etc/dracut.conf  
# install local /etc/mdadm.conf

New working initramfs:

lsinitrd /boot/initramfs-2.6.31-17.fc12.x86_64.img |grep dracut
-rw-r--r--   1 root     root           31 Sep 17 00:07 dracut-001-10.git4d924752.fc12
-rw-r--r--   1 root     root         2675 Sep 15 15:54 lib/dracut-lib.sh
. /lib/dracut-lib.sh
    echo "file a bug against dracut."

lsinitrd /boot/initramfs-2.6.31-17.fc12.x86_64.img |grep mdadm.conf
-rw-r--r--   1 root     root          164 Jul  4 16:38 etc/mdadm.conf

Old failing initramfs:

lsinitrd /boot/initramfs-2.6.31-14.fc12.x86_64.img |grep dracut
-rw-r--r--   1 root     root           30 Sep 15 18:33 dracut-001-9.git6f0e469d.fc12
-rw-r--r--   1 root     root         2540 Sep  9 19:50 lib/dracut-lib.sh
. /lib/dracut-lib.sh

lsinitrd /boot/initramfs-2.6.31-14.fc12.x86_64.img |grep mdadm.conf
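
The comparison above (dracut version marker plus presence of etc/mdadm.conf) can be condensed into one line per image. This is only a sketch: `summarise_listing` is a hypothetical helper, and it assumes the image contains a `dracut-<version>` marker file as in the listings above, which may not hold for other dracut releases.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with dracut).
# Summarise an lsinitrd listing read from stdin: which dracut version
# marker file it contains and whether etc/mdadm.conf is present.
summarise_listing() {
    awk '
        # The marker file is named like dracut-001-10.git4d924752.fc12.
        $NF ~ /^dracut-[0-9]/ { version = $NF }
        # The local mdadm configuration copied into the image.
        $NF == "etc/mdadm.conf" { conf = "yes" }
        END {
            printf "dracut=%s mdadm.conf=%s\n",
                   (version == "" ? "unknown" : version),
                   (conf == "" ? "no" : conf)
        }
    '
}

# Usage (image path as in the comments above):
#   lsinitrd /boot/initramfs-<kernel version>.img | summarise_listing
```

An image reporting the old dracut version or `mdadm.conf=no` was most likely not regenerated with the updated package.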
Comment 20 Nicolas Mailhot 2009-09-16 18:29:29 EDT
Created attachment 361388 [details]
dmesg with dracut-001-10.git4d924752.fc12.noarch

new working dmesg
Comment 21 Nicolas Mailhot 2009-09-17 02:29:31 EDT
(In reply to comment #18)
> And this time it seems to work! Yahoo!
> Except, I managed to re-break the md in the meanwhile, so I can't check if the
> full stop => boot cycle works 100%. I'll need to wait before the disks are
> re-synced to test a full cycle in "clean" conditions  

And cycling the system works too. The raid is still in a sane, unbroken state.
Comment 22 Harald Hoyer 2009-09-17 07:05:06 EDT
If it works, please close the bug.
Comment 23 Harald Hoyer 2009-09-17 07:09:27 EDT
Please test dracut-001-12.git0f7e10ce.fc12.
Either wait for it to appear in rawhide or do:
# yum install koji
# cd $(mktemp -d)
# koji download-build 132403
# rpm -Fvh *.rpm

and recreate the image with

# dracut /boot/<image> <kernel version>

Note: in recent installs the <image> is named initramfs-<kernel version>.img
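
Since the image name matters (an image written under the wrong name is not the one that gets booted), the path can be derived from the kernel version instead of typed by hand. A minimal sketch, assuming the initramfs-<kernel version>.img naming from the note above; `initramfs_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build the image path used by recent installs
# for a given kernel version (naming taken from the note above).
initramfs_path() {
    printf '/boot/initramfs-%s.img\n' "$1"
}

# Print (not run) the dracut command for the running kernel.
# Drop the leading "echo" to actually rebuild the image, as root;
# -f overwrites an existing image.
#   kver=$(uname -r)
#   echo dracut -f "$(initramfs_path "$kver")" "$kver"
```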
Comment 24 Nicolas Mailhot 2009-09-17 13:39:59 EDT
Created attachment 361532 [details]
dmesg with dracut-002-1.fc12

I assume you're more interested in dracut-002-1.fc12, and yes, it works.
