Bug 521959

Summary: dracut does not work for / over lvm over md
Product: [Fedora] Fedora
Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: dracut
Assignee: Harald Hoyer <harald>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: dledford, harald
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-17 17:40:28 UTC
Attachments:
Description                                                 Flags
dmesg                                                       none
boot.log                                                    none
lspci                                                       none
dmesg with dracut-001-10.git4d924752.fc12.noarch.rpm        none
dmesg with dracut-001-10.git4d924752.fc12.noarch            none
dmesg with dracut-002-1.fc12                                none

Description Nicolas Mailhot 2009-09-08 20:56:56 UTC
Description of problem:

On a system where /boot is on a RAID-1 array (md0) and everything else is on LVM over a second RAID-1 array (md1), md volumes have not been cleanly released at the end of the initramfs stage since the move to dracut, and are therefore not reassembled correctly later: half of md1 is online, while the other half is stuck in md127.

cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 sda3[0]
      288856640 blocks [2/1] [U_]
      
md0 : active raid1 sda1[0] sdb1[1]
      2096384 blocks [2/2] [UU]
      
md126 : active raid1 sdb3[1]
      288856640 blocks [2/1] [_U]
      
unused devices: <none>
[root@arekh nim]# mdadm --stop /dev/md126
mdadm: failed to stop array /dev/md126: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
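A minimal sketch (not part of the original report) for spotting the split shown above. The hypothetical shell function degraded_md_arrays reads /proc/mdstat-style text on stdin and prints every array whose member-status field ([UU], [U_], [_U]) contains an underscore, i.e. a missing mirror half. It assumes the raid1 layout pasted above, where each array's status line directly follows its "mdN : ..." header line.

```shell
#!/bin/sh
# Hypothetical helper: report md arrays running with a missing member.
# Input: /proc/mdstat-formatted text on stdin.
degraded_md_arrays() {
    awk '
        # Header lines look like "md1 : active raid1 sda3[0]"; remember the name.
        /^md[0-9]+ :/ { name = $1; next }
        # The status line that follows ends in a field such as [UU], [U_] or [_U];
        # an underscore marks a slot with no active member.
        name != "" && $NF ~ /^\[[U_]+]$/ {
            if ($NF ~ /_/) print name, $NF
            name = ""
        }
    '
}
```

Usage: degraded_md_arrays < /proc/mdstat. Against the output above it would flag both md1 [U_] and md126 [_U], the two halves of the same mirror.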

Version-Release number of selected component (if applicable):

kernel-2.6.31-0.212.rc9.git1.fc12.x86_64, with the initramfs rebuilt using
dracut-001-6.gitf5c4374d.fc12.noarch

dracut -f /boot/initrd-generic-2.6.31-0.212.rc9.git1.fc12.x86_64.img 2.6.31-0.212.rc9.git1.fc12.x86_64

Comment 1 Nicolas Mailhot 2009-09-08 20:59:25 UTC
Created attachment 360132 [details]
dmesg

Comment 2 Nicolas Mailhot 2009-09-08 21:00:35 UTC
Created attachment 360133 [details]
boot.log

Comment 3 Nicolas Mailhot 2009-09-08 21:01:10 UTC
Created attachment 360134 [details]
lspci

Comment 4 Nicolas Mailhot 2009-09-08 21:01:45 UTC
(22:53:04) warren: nim-nim: add rdbreak=pre-pivot
(22:53:06) warren: nim-nim: boot it
(22:53:21) warren: nim-nim: look in the filesystem for the script that runs mdmon
(22:53:29) warren: nim-nim: run mdmon in the same manner, see if it segfaults
(22:54:16) warren: nim-nim: if so, ulimit -c unlimited; make it crash; #plug usb stick in; mkdir /tmp/whatever; mount /dev/sdwhatever1 /tmp/whatever; copy it
(22:55:10) warren: nim-nim: with that core file, hopefully we can get a backtrace
(22:56:18) warren: nim-nim: hopefully ulimit is a dash built-in

Comment 5 Harald Hoyer 2009-09-15 14:25:53 UTC
For the advanced user, here is a scratch version to test:

# rpm -e '*dracut*' --nodeps
# rpm -ivh 'http://koji.fedoraproject.org/koji/getfile?taskID=1680533&name=dracut-001-10.git4d924752.fc12.noarch.rpm'

Comment 6 Nicolas Mailhot 2009-09-15 17:06:19 UTC
Created attachment 361116 [details]
dmesg with dracut-001-10.git4d924752.fc12.noarch.rpm

Well, this one does not work any better. It fails in exactly the same way, at the same place.

Comment 7 Harald Hoyer 2009-09-16 06:48:17 UTC
Looking at the dmesg, it seems you did not recreate the initramfs image you are testing with...

dmesg says:
dracut: dracut-001-9.git6f0e469d.fc12

Comment 8 Harald Hoyer 2009-09-16 06:49:13 UTC
please recreate the initramfs with dracut like you would do with mkinitrd.

# dracut /boot/initramfs-<kernel version>.img <kernel version>
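A minimal sketch of this rebuild step, assuming the initramfs-<kernel version>.img naming used in recent Fedora 12 installs. Deriving the file name from the kernel version avoids the mistake described later in this bug, where reusing an old command produced an initrd-generic- image instead. The function names initramfs_image and rebuild_initramfs are hypothetical; only the dracut -f <image> <kernel version> invocation comes from this report.

```shell
#!/bin/sh
# Hypothetical helpers: rebuild the initramfs under the name the boot loader expects.

# Print the image path for kernel version $1.
initramfs_image() {
    printf '/boot/initramfs-%s.img\n' "$1"
}

# Rebuild (overwriting with -f) the image for kernel $1,
# defaulting to the running kernel. Must be run as root.
rebuild_initramfs() {
    kver=${1:-$(uname -r)}
    dracut -f "$(initramfs_image "$kver")" "$kver"
}
```

Usage: rebuild_initramfs 2.6.31-17.fc12.x86_64, or rebuild_initramfs with no argument for the running kernel.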

Comment 9 Nicolas Mailhot 2009-09-16 07:50:13 UTC
I did (even with -f, and after removing the old file), but it seems I attached the wrong log (though the new one is hardly different).

That, or your rpm has the wrong code inside. I'll check this evening when I have access to this system.

Comment 10 Nicolas Mailhot 2009-09-16 07:51:34 UTC
Anyway, the new log has the same errors; I didn't notice anything new when looking at it, though I didn't try diffing the two.

Comment 11 Harald Hoyer 2009-09-16 08:13:40 UTC
OK, please provide the output of:

# lsinitrd /boot/<image which fails> | grep dracut
# lsinitrd /boot/<image which fails> | grep mdadm.conf
# grep mdadm /etc/dracut.conf
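A sketch for reading the first two checks at a glance. The hypothetical function summarize_initramfs_listing takes lsinitrd output on stdin and reports the top-level dracut-<version> marker file and whether etc/mdadm.conf was packed into the image. It assumes the listing format shown in comment 19, where the last whitespace-separated field of each file line is the path.

```shell
#!/bin/sh
# Hypothetical helper: summarize an lsinitrd listing read on stdin.
summarize_initramfs_listing() {
    awk '
        # The version marker is a top-level file named dracut-<version>.
        $NF ~ /^dracut-[0-9]/ { print "version marker: " $NF }
        # Note whether the local mdadm.conf was included.
        $NF == "etc/mdadm.conf" { conf = 1 }
        END { print "mdadm.conf: " (conf ? "present" : "missing") }
    '
}
```

Usage: lsinitrd /boot/initramfs-$(uname -r).img | summarize_initramfs_listing. Fed the listings from comment 19, it would report dracut-001-10 with mdadm.conf present for the working image and dracut-001-9 with mdadm.conf missing for the failing one.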

Comment 12 Harald Hoyer 2009-09-16 08:14:33 UTC
And I would really like to see the new dmesg output.

Comment 13 Nicolas Mailhot 2009-09-16 21:21:52 UTC
Well, I made the mistake of updating rawhide before redoing the test, and now the system does not boot at all anymore (it drops into the "please unbork me if you're an admin" prompt at fs check time).

So, before taking any more risks with my data, I rebooted from a USB key into a rescue session to resync the RAID (which has not been synced since the start of August, when dracut stopped assembling it correctly at boot time).

This way, even if I lose one disk, the other will still have a fresh copy of the data, as it was always supposed to. I've been living too dangerously this past month (granted, I was away for a few weeks with the system off).

It should be almost done now; it has been running for about 90 minutes.

Comment 14 Nicolas Mailhot 2009-09-16 21:27:58 UTC
Well, it's done, and it craps itself in just the same way it did 2 hours ago, before I decided to create a rescue flash drive.

The main change since the posted dmesg is that udev now logs: '/sbin/mdadm --detail --export /dev/md127' unexpected exit with status 0x000b

It could be the new mdadm version in koji today, or the new SELinux, which no longer blocks some mdadm files. It fails the same way with old kernels, so it's not something new in the initramfs itself.

Comment 15 Nicolas Mailhot 2009-09-16 21:31:49 UTC
If I remove the /dev/md0 /boot mount from fstab, the system boots.

But I can't assemble md0 after boot: mdadm is wedged in a strange state and does not accept any md manipulation (md127 is live, though, with / on LVM over it, as dracut assembled it).

So I can't change the initramfs from rawhide anymore :(

Next try: reboot into rescue mode and chroot into / from there. The F11 rescue disk does not crap over md like dracut does.

Comment 16 Nicolas Mailhot 2009-09-16 21:53:26 UTC
Well, actually I had the bright idea of downgrading mdadm first (since that does not need /boot access to install). That restored the system to yesterday's level of breakage (it boots, but only half of the array under / is assembled).

It seems mdadm-3.0-3.fc12.x86_64 is bad mojo.

Comment 17 Nicolas Mailhot 2009-09-16 22:06:02 UTC
(In reply to comment #8)
> please recreate the initramfs with dracut like you would do with mkinitrd.
> 
> # dracut /boot/initramfs-<kernel version>.img <kernel version>  

Oh, now I see why it didn't work: I reused an old dracut command, and it created an initrd-generic- file instead of an initramfs- file.

Comment 18 Nicolas Mailhot 2009-09-16 22:22:22 UTC
And this time it seems to work! Yahoo!

Except that I managed to re-break the md in the meantime, so I can't check whether the full stop => boot cycle works 100%. I'll need to wait until the disks are resynced to test a full cycle in "clean" conditions.

Comment 19 Nicolas Mailhot 2009-09-16 22:26:24 UTC
Anyway, just to be complete:

grep mdadm /etc/dracut.conf  
# install local /etc/mdadm.conf
mdadmconf="yes"


New working initramfs:

lsinitrd /boot/initramfs-2.6.31-17.fc12.x86_64.img |grep dracut
-rw-r--r--   1 root     root           31 Sep 17 00:07 dracut-001-10.git4d924752.fc12
-rw-r--r--   1 root     root         2675 Sep 15 15:54 lib/dracut-lib.sh
init
. /lib/dracut-lib.sh
    echo "file a bug against dracut."

lsinitrd /boot/initramfs-2.6.31-17.fc12.x86_64.img |grep mdadm.conf
-rw-r--r--   1 root     root          164 Jul  4 16:38 etc/mdadm.conf
init

Old failing initramfs:

lsinitrd /boot/initramfs-2.6.31-14.fc12.x86_64.img |grep dracut
-rw-r--r--   1 root     root           30 Sep 15 18:33 dracut-001-9.git6f0e469d.fc12
-rw-r--r--   1 root     root         2540 Sep  9 19:50 lib/dracut-lib.sh
. /lib/dracut-lib.sh
init

lsinitrd /boot/initramfs-2.6.31-14.fc12.x86_64.img |grep mdadm.conf
init

Comment 20 Nicolas Mailhot 2009-09-16 22:29:29 UTC
Created attachment 361388 [details]
dmesg with dracut-001-10.git4d924752.fc12.noarch

new working dmesg

Comment 21 Nicolas Mailhot 2009-09-17 06:29:31 UTC
(In reply to comment #18)
> And this time it seems to work! Yahoo!
> 
> Except that I managed to re-break the md in the meantime, so I can't check
> whether the full stop => boot cycle works 100%. I'll need to wait until the
> disks are resynced to test a full cycle in "clean" conditions.

And cycling the system works too. The RAID is still in a sane, unbroken state.

Comment 22 Harald Hoyer 2009-09-17 11:05:06 UTC
If it works, please close the bug.

Comment 23 Harald Hoyer 2009-09-17 11:09:27 UTC
Please test dracut-001-12.git0f7e10ce.fc12.
Either wait for it to appear in rawhide or do:
# yum install koji
# cd $(mktemp -d)
# koji download-build 132403
# rpm -Fvh *.rpm

and recreate the image with

# dracut /boot/<image> <kernel version>

Note: in recent installs the <image> is named initramfs-<kernel version>.img

Comment 24 Nicolas Mailhot 2009-09-17 17:39:59 UTC
Created attachment 361532 [details]
dmesg with dracut-002-1.fc12

I assume you're more interested in dracut-002-1.fc12, and yes, it works.