Bug 605940 - mdraid does not initialize at boot time after kernel upgrade
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 12
Hardware: x86_64 Linux
Priority: low
Severity: high
Assigned To: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-06-19 10:00 EDT by Eduardo
Modified: 2010-12-03 09:15 EST (History)
CC: 19 users

Doc Type: Bug Fix
Last Closed: 2010-12-03 08:47:30 EST
Attachments:
grub (1.32 KB, application/octet-stream), 2010-06-25 23:40 EDT, Eduardo
dmesg (63.68 KB, application/octet-stream), 2010-06-27 08:50 EDT, Eduardo

Description Eduardo 2010-06-19 10:00:46 EDT
Description of problem:
After upgrading from kernel 2.6.32.12-115.fc12.x86_64 to 2.6.32.14-127.fc12.x86_64, mdraid does not initialize an Intel BIOS raid at boot time. The LVM physical volume gets initialized pointing to one of the hard disks of the raid (randomly?), so if this goes undetected the disks go out of sync.

Version-Release number of selected component (if applicable):
kernel-2.6.32.14-127.fc12.x86_64
mdadm-3.0.3-2.fc12.x86_64

How reproducible:
Every time


Steps to Reproduce:
1. Install an Intel BIOS raid and set up LVM volumes
2. Reboot into the new kernel 2.6.32.14-127.fc12.x86_64
3. Verify which device the physical volume points to (pvdisplay)
  
Actual results:
The raid will not be initialized and the LVM physical volume will point to only one of the disks


Expected results:
The raid should be initialized and the LVM physical volume should point to the raided device
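The expected state can be sketched as a quick check. This is a minimal sketch with simulated /proc/mdstat contents, since the real check needs the actual raid hardware; the device names (md126, sdb, sdc) are taken from later comments in this report.

```shell
# Simulated check: is the imsm raid assembled at boot?
# On a live system you would read /proc/mdstat directly; here the
# contents are inlined so the sketch is self-contained.
mdstat='md126 : active raid1 sdc[1] sdb[0]
md127 : inactive sdc[1](S) sdb[0](S)'

if printf '%s\n' "$mdstat" | grep -q '^md126 : active'; then
    echo "raid assembled: LVM PV should point at /dev/md126"
else
    echo "raid NOT assembled: LVM PV may point at a raw member disk"
fi
# → raid assembled: LVM PV should point at /dev/md126
```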


Additional info:
I was able to initialize mdraid manually after boot, but LVM kept pointing to one of the disks when reinitialized, even though the raid device was available (it showed up as duplicated).

I am only able to work around this issue by initializing the raid with dmraid instead. LVM then picks up the raided device correctly.

Please let me know if you need additional information and how to collect it. Thank you,
Comment 1 Stanislaw Gruszka 2010-06-23 11:47:40 EDT
Please show /boot/grub/grub.conf .
Comment 2 Eduardo 2010-06-25 23:40:57 EDT
Created attachment 427034 [details]
grub

This is the grub file as it was when I reported this. Since then I have added some options in order to get my raid running. Actually, after reporting this issue I realized that the raid had been broken and was no longer available from my Windows partition. It seems the raid was identified as a software raid instead and the superblock was overwritten. I had to recreate and resynchronize it, which took a long time.

Here are the options that I added to this grub file. I am not sure if I really need all of them. I followed a grub file I got from an F13 installation on another machine (this machine is still a fresh F12 installation). I just removed rd_NO_DM so that the raid is started with dmraid from the root image:
rd_LVM_LV=vg_asgard1/LogVol_root rd_LVM_LV=vg_asgard1/LogVol_swap rd_NO_LUKS rd_NO_MD noiswmd
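For context, a full grub.conf entry carrying those options might look like the following. This is only a sketch: the vmlinuz/initrd file names, the root= path, and the (hd0,0) stanza are illustrative guesses, not taken from the attached grub file.

```
title Fedora (2.6.32.14-127.fc12.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32.14-127.fc12.x86_64 ro root=/dev/mapper/vg_asgard1-LogVol_root rd_LVM_LV=vg_asgard1/LogVol_root rd_LVM_LV=vg_asgard1/LogVol_swap rd_NO_LUKS rd_NO_MD noiswmd
    initrd /initramfs-2.6.32.14-127.fc12.x86_64.img
```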
Comment 3 Stanislaw Gruszka 2010-06-26 14:01:45 EDT
Ahh, I think I misunderstood your problem at first glance.

(In reply to comment #2)
> It seemed that the raid was identified as
> a software raid instead and the superblock was overwritten. I had to recreate
> and resynchronize which took a long time

You have a hardware raid. Hardware raid has nothing in common with md (software raid) except the name. It is performed by firmware and is hidden from the kernel as a simple SCSI /dev/sdX device. To make sure I am right, boot the old working kernel on the raid device and run "cat /proc/mdstat"; it should show nothing.

For some unknown reason the hardware raid stopped working and the disk controller started working in JBOD (just a bunch of disks) mode. As a result the kernel starts to see multiple disks instead of one. That could be a disk controller driver problem after the kernel update; which driver do you use? Please attach the dmesg from your system as well.

> I just
> removed rd_NO_DM to start the raid with draid from the root image:

I am not sure this is a good idea. I think it is good to keep the rd_NO_DM option to ensure md (software raid) will not run.
Comment 4 Eduardo 2010-06-27 08:49:36 EDT
(In reply to comment #3)

> You have hardware raid. Hw raid have noting in common with md (software raid)
> except name. It is performed by firmware and is hidden in kernel as simple SCSI

No, I do not have a hardware raid. It is an Intel BIOS raid. /proc/mdstat used to show /dev/md126 and /dev/md127, as it was supposed to. I was getting hit by bug 576749, which paralyzed my system for hours, so I have been switching to dmraid manually and through init scripts. After last week's upgrades the raid kept breaking under the new kernel before it reached my scripts, until I realized that with these options I could make the raid initialize with dmraid instead. I have been rebuilding the raid from Intel's console in my Windows partition after those breakages.

> For some unknown reason hw raid stop working and disk controller start work in 
> JBOD (just a bunch of disks) mode. In result kernel start to see multiple disk
> instead of one. That could be disk controller driver problem after kernel
> update, what driver do you use? Please attach dmesg from your system as well.

I will post my dmesg next. Thank you for taking a look at this.
Comment 5 Eduardo 2010-06-27 08:50:43 EDT
Created attachment 427199 [details]
dmesg
Comment 6 Eduardo 2010-09-08 19:48:33 EDT
I just upgraded this system to Fedora 13. At first the raid kept being identified as a software raid, though it worked fine under my Windows partition with Intel's drivers. Apparently the disks did not have the right metadata, but somehow the Intel driver was able to tell that the disks were in an imsm raid. I reinitialized them using mdadm under Fedora:
mdadm --create /dev/md/imsm -e imsm -n 2 /dev/sdb /dev/sdc
mdadm -C /dev/md/myraid --level raid1 -n 2 /dev/sdb /dev/sdc
and they started to work correctly (/dev/md126 and /dev/md127 showed up as expected), and I am still able to see the raid under my Windows partition.

It took me a few tries to find the correct dracut flags. I ended up using only rd_NO_LUKS rd_NO_DM. If I disable MD in dracut, the raid still comes up correctly in the end, but LVM "sees" the volume group on the raid as well as on the individual disks, and it chooses to use one of the disks instead of the raid (at least that is what pvdisplay says). When dracut initializes the raid, access to the individual disks somehow gets blocked (I can see errors when LVM tries to access them). I am not sure if that is, or should be, a separate bug.
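The duplicate-PV behavior described above is the sort of thing lvm.conf's device handling is meant to address. A hypothetical fragment follows: the section and option names come from the stock lvm.conf shipped with lvm2 of that era, while the commented-out filter line is purely an illustration for this machine's sdb/sdc members, not a tested configuration.

```
devices {
    # When enabled, LVM skips PVs found on disks that belong to an md array,
    # so only the assembled /dev/mdXXX device is used.
    md_component_detection = 1

    # Example manual workaround: accept md devices, reject the raw members.
    # filter = [ "a|^/dev/md.*|", "r|^/dev/sd[bc].*|", "a|.*|" ]
}
```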
Everything is working fine on my system now with that configuration with these versions:
kernel-2.6.34.6-47.fc13.x86_64
dracut-005-3.fc13.noarch
mdadm-3.1.3-0.git20100804.2.fc13.x86_64
Let me know if you need more information about the problem when the raid is not initialized by dracut.
Comment 7 Eduardo 2010-09-09 19:51:39 EDT
I just found a later bug filed for RHEL 6 that describes the LVM problem I was reporting here (bug 620745). I upgraded to lvm2-2.02.73-2.fc13.x86_64 from updates-testing and it is working correctly now with all the standard flags for dracut (rd_NO_LUKS rd_NO_MD rd_NO_DM). I guess this bug can be closed when that version makes it to the standard updates. Thank you.
Comment 8 Stanislaw Gruszka 2010-09-10 00:16:34 EDT
Changing the component to lvm2. Not sure if the same fix will go to F-12; if not, the bug can be closed with a NEXTRELEASE resolution.
Comment 9 Peter Rajnoha 2010-09-10 04:34:09 EDT
This should probably get into F-12 soon. We have all Fedoras in sync now, so it is just a matter of submitting an official Fedora update and getting through the karma process, but the build itself is already done:

  http://koji.fedoraproject.org/koji/buildinfo?buildID=190866

Setting to POST for now; if it does not get into F-12 for some reason, I will set NEXTRELEASE as Stanislaw pointed out...
Comment 10 Bug Zapper 2010-11-03 09:03:22 EDT
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 11 Bug Zapper 2010-12-03 08:47:30 EST
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
Comment 12 Peter Rajnoha 2010-12-03 09:15:52 EST
This is fixed, just the package didn't make it for F12 (lvm2 >= 2.02.73 is needed, as tested and pointed out in comment #7).
