Bug 710646 - raid 10 PV not being assembled within dracut
Summary: raid 10 PV not being assembled within dracut
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 710713
Depends On:
Blocks:
Reported: 2011-06-03 21:48 UTC by Clyde E. Kunkel
Modified: 2011-10-05 23:58 UTC (History)

Fixed In Version: kernel-2.6.40.6-0.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-14 20:19:08 UTC
Type: ---
Embargoed:


Attachments
dmesg of kernel failure (72.55 KB, text/plain)
2011-06-03 21:48 UTC, Clyde E. Kunkel
dmesg output (64.16 KB, text/plain)
2011-06-05 02:03 UTC, Bruno Wolff III
Serial Console Output (10.90 KB, text/plain)
2011-06-05 08:50 UTC, Frank Murphy
patch for mdadm (872 bytes, patch)
2011-06-05 19:04 UTC, Milan Broz

Description Clyde E. Kunkel 2011-06-03 21:48:31 UTC
Created attachment 502917 [details]
dmesg of kernel failure

Description of problem:
dracut can't find root lv because the underlying raid 10 pv is started

Version-Release number of selected component (if applicable):
kernel-3.0-0.rc1.git0.2.fc16.x86_64

How reproducible:
every time

Steps to Reproduce:
1. boot system
2.
3.
  
Actual results:
root lv not found and dropped into shell

Expected results:
normal boot into G3 desktop

Additional info:
kernel-2.6.39-1.fc16.x86_64 works as expected, and I don't see any mdadm or dracut updates that could have caused this.

I am attaching dmesg output from the shell during the failure. I note the following messages:
[    8.325272] dracut: Scanning devices sda2 sdb6 sdc2 sdd2 sde1 sde2  for LVM logical volumes VolGroup00/rawhide   <----NOTE that /dev/md127 was not included!!
[    9.027064] dracut: Could not determine kernel version used.
...

[    9.590848] dracut: Volume group "VolGroup00" not found <***This is where root is located.
...
[    9.591076] dracut: Skipping volume group VolGroup00
[    9.604755] dracut: Autoassembling MD Raid
[    9.973544] md: md127 stopped.
[    9.976869] dracut: mdadm: Cannot start array: No such device
[   10.559726] dracut: Autoassembling MD Raid
[   11.000792] md: md127 stopped.

The above is repeated many times.

mdadm.conf in /etc is correct, as is the kernel command line in grub:
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md127 level=raid10 num-devices=4 UUID=b9438b55:1d815c8b:bfe78010:bc810f04


title Fedora (3.0-0.rc1.git0.2.fc16.x86_64)
	root (hd0,0)
	kernel /vmlinuz-3.0-0.rc1.git0.2.fc16.x86_64 ro root=/dev/mapper/VolGroup00-rawhide rd_LVM_LV=VolGroup00/rawhide rd_MD_UUID=b9438b55:1d815c8b:bfe78010:bc810f04 rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rdshell noresume rd_NO_PLYMOUTH
	initrd /initramfs-3.0-0.rc1.git0.2.fc16.x86_64.img
title Fedora (2.6.39-1.fc16.x86_64)
	root (hd0,0)
	kernel /vmlinuz-2.6.39-1.fc16.x86_64 ro root=/dev/mapper/VolGroup00-rawhide rd_LVM_LV=VolGroup00/rawhide rd_MD_UUID=b9438b55:1d815c8b:bfe78010:bc810f04 rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rdshell noresume rd_NO_PLYMOUTH
	initrd /initramfs-2.6.39-1.fc16.x86_64.img

Comment 1 Clyde E. Kunkel 2011-06-03 21:49:13 UTC
Should have made severity high.

Comment 2 Bruno Wolff III 2011-06-04 01:01:25 UTC
I am seeing a similar problem. In my case I am using luks on top of software raid 1 and none of my arrays are being assembled.

Comment 3 Kyle McMartin 2011-06-04 02:23:11 UTC
 volumes VolGroup00/rawhide   <----NOTE that /dev/md127 was not included!!
[    9.027064] dracut: Could not determine kernel version used.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

lvm bug.

        if (sscanf(_uts.release, "%d.%d.%d",
                        &_kernel_major,
                        &_kernel_minor,
                        &_kernel_release) != 3) {
                log_error("Could not determine kernel version used.");
                return 0;
        }
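
To illustrate why this check trips on the rc1 release string: sscanf stops at the '-' in "3.0-0.rc1.git0.2.fc16.x86_64" and converts only two fields, so the "!= 3" test fails. The following is a minimal sketch of a more tolerant parse, not the libdevmapper code and not the workaround that was actually built; the helper name parse_kernel_release is hypothetical, and treating a missing third component as 0 is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from libdevmapper): parse a uname-style release
 * string. A missing third numeric component is treated as 0 (assumption).
 * Returns 1 on success, 0 if major.minor cannot be parsed. */
int parse_kernel_release(const char *release,
                         int *major, int *minor, int *patch)
{
    int fields;

    assert(release && major && minor && patch);

    *major = *minor = *patch = 0;

    /* sscanf returns the number of fields it converted. For
     * "3.0-0.rc1.git0.2.fc16.x86_64" that is 2, because '-' does not
     * match the second '.' in the format string. */
    fields = sscanf(release, "%d.%d.%d", major, minor, patch);
    if (fields < 2)
        return 0;      /* need at least major.minor */
    if (fields == 2)
        *patch = 0;    /* two-component release such as 3.0-0.rc1 */
    return 1;
}
```

With this sketch, the rc1 string from this report parses as 3.0.0, and a release such as "2.6.39-1.fc16.x86_64" still parses as 2.6.39.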

Comment 4 Clyde E. Kunkel 2011-06-04 03:06:46 UTC
(In reply to comment #0)
> Created attachment 502917 [details]
> dmesg of kernal failure
> 
> Description of problem:
> dracut can't find root lv because the underlying raid 10 pv is started
                                                                 ^^^^^^^
                                                                 not started.

Comment 5 Milan Broz 2011-06-04 09:10:44 UTC
This is in libdevmapper, so all users of libdevmapper are probably affected (lvm, dmraid, cryptsetup, mpath, kpartx, ....)

(It seems there were some temporary variants during the 3.0 transition, because another distro reports 3.0.0 for me, which works. Anyway, the kernel check must be fixed.)

Comment 6 Kyle McMartin 2011-06-04 16:45:12 UTC
*** Bug 710713 has been marked as a duplicate of this bug. ***

Comment 7 Milan Broz 2011-06-04 19:43:54 UTC
I added a workaround until the detection of the 3.0 kernel is settled in lvm2 upstream.
At least for me the system now boots properly (there are other issues with other packages, but boot is not failing now).

The build is here (until it reaches the rawhide repo):
http://koji.fedoraproject.org/koji/buildinfo?buildID=246322

Please let me know if there is still some problem.

Comment 8 Andre Robatino 2011-06-04 20:53:09 UTC
I'm using a Rawhide VM with the default LVM partitioning using the entire disk (in particular, nothing advanced such as RAID) and see this, so it should be affecting almost everybody.

Comment 9 Bruno Wolff III 2011-06-04 20:55:15 UTC
I tried lvm 2.02.84-2.fc16 stuff and device mapper 1.02.63-2 stuff and my software raid 1 array is not being properly assembled.

Comment 10 Bruno Wolff III 2011-06-04 20:56:52 UTC
And note that I uninstalled the latest kernel and then reinstalled it, so that dracut would rebuild the initramfs.

Comment 11 Bruno Wolff III 2011-06-05 02:03:41 UTC
Created attachment 503032 [details]
dmesg output

If I wait long enough I get dropped into a shell. I was able to manually start the /boot array and save the dmesg output there.

Comment 12 Milan Broz 2011-06-05 08:09:08 UTC
Hm. I am afraid that the raid assembly failure is another bug (dracut or mdadm?).
(And yes, I forgot to say that it needs a rebuild of the initramfs.)

Can anyone verify that it works now without MD RAID? (A default install should be such a configuration.)

Comment 13 Frank Murphy 2011-06-05 08:45:40 UTC
Is this the correct rebuild syntax?

I have tried:

cd /boot
dracut -v -f -o mdraid initramfs-3.0-0.rc1.git0.2.fc16.i686.img

I then cannot find the luks stuff again.

I will attach the serial console output.

Comment 14 Frank Murphy 2011-06-05 08:50:42 UTC
Created attachment 503054 [details]
Serial Console Output

I don't have any soft/physical raid on this vm.

Comment 15 Milan Broz 2011-06-05 09:02:29 UTC
I had to remove the whole kernel and reinstall it to make it work, but now I am able to boot with both lvm and some crypto volumes.

When you drop to the shell, are the kernel modules properly loaded in dracut? Does dracut see the underlying device? (Try blkid from the dracut shell; it should show a "crypto_LUKS" device.)

Which kernel version are you using? (It should be kernel-3.0-0.rc1.git0.2.fc16 at least.)

Comment 16 Frank Murphy 2011-06-05 09:28:22 UTC
yum erase kernel-3.0-0.rc1.git0.2.fc16
yum install kernel-3.0-0.rc1.git0.2.fc16

reboot has got past luks,
only waiting for lots of "audit" stuff to finish.

Comment 17 Frank Murphy 2011-06-05 12:05:33 UTC
(In reply to comment #16)
> yum erase kernel-3.0-0.rc1.git0.2.fc16
> yum install kernel-3.0-0.rc1.git0.2.fc16
> 
> reboot has got past luks,
> only waiting for lots of "audit" stuff to finish.


audit still going on, would that be normal? 
Anyone else getting lots of audit?

Comment 18 Milan Broz 2011-06-05 12:35:20 UTC
There are apparently more problems. MD raid failing to assemble is definitely a separate problem (mdadm -As doesn't work for some reason from the init ramdisk; I'll check that later, maybe it is a regression in the kernel md code).

For audit: isn't that just selinux? Try booting with selinux=0 (and do a full relabel later).

But lvm/cryptsetup should work with the update above.

Comment 19 Frank Murphy 2011-06-05 12:55:03 UTC
selinux=0 

boot completes,

but I also have telinit 3 on the kernel line.

I login as user, 
password comes up in the clear, unhidden.

Unsure what causes that.

Comment 20 Milan Broz 2011-06-05 13:44:45 UTC
(In reply to comment #19)
> I login as user, 
> password comes up in the clear, unhidden.
See bug #650890 or bug #655538 (or similar; it is probably a plymouth issue).

Anyway, thanks for confirming that at least the devmapper problem is fixed.

Comment 21 Milan Broz 2011-06-05 19:04:50 UTC
Created attachment 503114 [details]
patch for mdadm

The same problem is in mdadm: kernel version 3.0-rc1 is mishandled, and mdadm is not able to perform the requested operation (mdadm -As --auto=yes --run).

With the attached patch I can boot from lvm over md raid1 again.
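
The attached patch is not reproduced here. As a rough illustration of the failure mode only: if a tool packs the kernel release into a single integer for comparisons and insists on a second '.', a release such as "3.0-0.rc1.git0.2.fc16.x86_64" is rejected outright. The sketch below assumes that kind of packing (major*1000000 + minor*1000 + patch); the function name packed_linux_version and the exact encoding are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: pack "a.b[.c]" as a*1000000 + b*1000 + c.
 * A missing third component counts as 0 (assumption).
 * Returns -1 if major.minor cannot be parsed. */
long packed_linux_version(const char *release)
{
    char *cp;
    const char *start;
    long a, b, c = 0;

    assert(release != NULL);

    a = strtol(release, &cp, 10);
    if (cp == release || *cp != '.')
        return -1;                 /* no major number, or no '.' after it */

    start = cp + 1;
    b = strtol(start, &cp, 10);
    if (cp == start)
        return -1;                 /* no minor number */

    /* Only read a third component if one is actually present. A stricter
     * parser that returns an error here would reject "3.0-0.rc1...". */
    if (*cp == '.')
        c = strtol(cp + 1, NULL, 10);

    return a * 1000000L + b * 1000L + c;
}
```

Under these assumptions the rc1 release from this report packs to the same value as "3.0.0", so version comparisons against 2.6.x thresholds keep working.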

Comment 22 Milan Broz 2011-06-05 19:05:44 UTC
I have no commit rights for the mdadm package, so I am reassigning. See the attached patch.

Comment 23 Bruno Wolff III 2011-06-06 02:22:17 UTC
I tested the mdadm fix and I do get past the raid assembly now. I am now hitting the selinux policy load loop, but that problem is likely not related to this.

Comment 24 Frank Murphy 2011-06-06 08:42:38 UTC
(In reply to comment #23)
> I tested the mdadm fix and I do get past the raid assembly now. I am now
> hitting the selinux policy load loop, but that problem is likely not related to
> this.


Bugged the selinux loop:
https://bugzilla.redhat.com/show_bug.cgi?id=711015

Comment 25 Clyde E. Kunkel 2011-06-07 15:10:36 UTC
(In reply to comment #23)
> I tested the mdadm fix and I do get past the raid assembly now. 

Do you have an RPM available to share?

Comment 26 Bruno Wolff III 2011-06-07 16:23:24 UTC
I just did a local build for i686.
Note that even with the systemd fix for the selinux loop (there is a scratch build for that), my system still wasn't booting with the kernel appearing to crap out. So we will likely need to wait for an rc2 build to be able to use a 3.0 kernel. Hopefully by then there will also be a new mdadm.

Comment 27 Andre Robatino 2011-06-08 18:58:19 UTC
With the 20110608 updates including the 3.0-0.rc2.git0.1.fc16 kernel, the only workaround I need to boot is "selinux=0".

Comment 28 Clyde E. Kunkel 2011-06-08 19:47:37 UTC
(In reply to comment #27)
> With the 20110608 updates including the 3.0-0.rc2.git0.1.fc16 kernel, the only
> workaround I need to boot is "selinux=0".

Is your root on a raid device?

Comment 29 Andre Robatino 2011-06-08 19:54:29 UTC
(In reply to comment #28)
> (In reply to comment #27)
> > With the 20110608 updates including the 3.0-0.rc2.git0.1.fc16 kernel, the only
> > workaround I need to boot is "selinux=0".
> 
> Is your root on a raid device?

No - see comment 8.

Comment 30 Bruno Wolff III 2011-06-10 02:14:32 UTC
If your request for commit access to mdadm doesn't get approved soon, consider talking to a proven packager to either get added or to commit your fixes.

Comment 31 Clyde E. Kunkel 2011-06-14 17:14:34 UTC
Any idea when mdadm will be fixed?  TIA

Comment 32 Milan Broz 2011-06-14 18:16:02 UTC
Doug, could you please grant me access rights to mdadm or fix this BZ?

mdadm is quite seriously broken in rawhide without the kernel 3.0 patch, and it needs a rebuild.

Comment 33 Doug Ledford 2011-06-14 19:28:05 UTC
Patch applied.

Comment 34 Clyde E. Kunkel 2011-06-14 22:17:11 UTC
I see the new mdadm in koji, however:


$ wget http://kojipkgs.fedoraproject.org/packages/mdadm/3.2.1/5.fc16/x86_64/mdadm-3.2.1-5.fc16.x86_64.rpm
--2011-06-14 18:14:26--  http://kojipkgs.fedoraproject.org/packages/mdadm/3.2.1/5.fc16/x86_64/mdadm-3.2.1-5.fc16.x86_64.rpm
Resolving kojipkgs.fedoraproject.org... 209.132.181.10
Connecting to kojipkgs.fedoraproject.org|209.132.181.10|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2011-06-14 18:14:27 ERROR 403: Forbidden.

Comment 35 Andre Robatino 2011-06-14 22:23:42 UTC
Should be temporary, nothing specific to that package:

https://fedorahosted.org/fedora-infrastructure/ticket/2823

Comment 36 Bruno Wolff III 2011-06-14 23:46:28 UTC
I was able to boot the -rc3 kernel using mdadm-3.2.1-5.fc16.i686.

Comment 37 Fedora Update System 2011-10-04 14:14:50 UTC
kernel-2.6.40.6-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.6-0.fc15

Comment 38 Fedora Update System 2011-10-05 23:58:53 UTC
kernel-2.6.40.6-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

