Red Hat Bugzilla – Bug 605940
mdraid does not initialize at boot time after kernel upgrade
Last modified: 2010-12-03 09:15:52 EST
Description of problem:
After upgrading from kernel 220.127.116.11-115.fc12.x86_64 to 18.104.22.168-127.fc12.x86_64, mdraid does not initialize an Intel BIOS raid at boot time. The LVM physical group gets initialized pointing to one of the hard disks of the raid (randomly?), so if this goes undetected the disks go out of sync.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install an Intel BIOS raid and set up LVM volumes
2. Reboot to the new kernel 22.214.171.124-127.fc12.x86_64
3. Verify where the physical volume points to (pvdisplay)
Actual results:
The raid will not be initialized and the LVM physical volume will point to only one of the disks.
Expected results:
The raid should be initialized and the LVM physical volume should point to the raided device.
Additional info:
I was able to initialize mdraid manually after boot, but LVM kept pointing to one of the disks when reinitialized, even though the raid device was available (it showed it as duplicated).
I am only able to work around this issue by initializing the raid with dmraid instead. In that case LVM picks up the raided device correctly.
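The check in step 3 can also be scripted. The following is a minimal sketch, assuming the `pvs` tool from lvm2 is installed and the script runs as root. The helper names (`parse_pvs`, `pvs_off_raid`, `current_pvs`) are hypothetical, and so is the assumption that a raided device shows up under /dev/md (mdraid) or /dev/mapper/ (dmraid).

```python
import subprocess


def parse_pvs(text):
    """Parse `pvs --noheadings -o pv_name,vg_name` output into (device, vg) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        device = fields[0]
        # A physical volume may not belong to any volume group yet.
        vg = fields[1] if len(fields) > 1 else ""
        pairs.append((device, vg))
    return pairs


def pvs_off_raid(pairs, raid_prefixes=("/dev/md", "/dev/mapper/")):
    """Return the PVs whose device path is not under an assumed raid prefix."""
    return [(dev, vg) for dev, vg in pairs if not dev.startswith(raid_prefixes)]


def current_pvs():
    """Ask LVM for its physical volumes (needs root and lvm2 installed)."""
    out = subprocess.run(
        ["pvs", "--noheadings", "-o", "pv_name,vg_name"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_pvs(out)
```

On a system showing this problem, `pvs_off_raid(current_pvs())` would be expected to list the volume group on a bare /dev/sdX device instead of returning an empty list.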
Please let me know if you need additional information and how to collect it. Thank you,
Please show /boot/grub/grub.conf.
Created attachment 427034
This is the grub file as it was when I reported this. Since then I have added some options in order to get my raid running. After reporting this issue I realized that the raid had actually been broken and was no longer available from my Windows partition. It seemed that the raid was identified as a software raid instead and the superblock was overwritten. I had to recreate and resynchronize, which took a long time.
Here are the options that I added to this grub file. I am not sure if I really need all of them. I followed a grub file that I got from an F13 installation on another machine (this machine is still a fresh F12 installation). I just removed rd_NO_DM to start the raid with dmraid from the root image:
rd_LVM_LV=vg_asgard1/LogVol_root rd_LVM_LV=vg_asgard1/LogVol_swap rd_NO_LUKS rd_NO_MD noiswmd
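To confirm which of these options the running kernel was actually booted with, the command line can be read back from /proc/cmdline. A minimal sketch follows. The function names are hypothetical, and the code only splits the line on whitespace without interpreting what any option does.

```python
def dracut_options(cmdline):
    """Split a kernel command line into dracut flags and rd_LVM_LV values."""
    flags, lvs = [], []
    for token in cmdline.split():
        if token.startswith("rd_LVM_LV="):
            # Keep only the volume group/logical volume part after '='.
            lvs.append(token.split("=", 1)[1])
        elif token.startswith("rd_NO_") or token == "noiswmd":
            flags.append(token)
    return flags, lvs


def running_options(path="/proc/cmdline"):
    """Read and split the command line of the currently running kernel."""
    with open(path) as f:
        return dracut_options(f.read())
```

Applied to the options quoted above, `dracut_options` returns the three flags (rd_NO_LUKS, rd_NO_MD, noiswmd) and the two logical volumes as separate lists.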
Ahh, I think I misunderstood your problem at first glance.
(In reply to comment #2)
> It seemed that the raid was identified as
> a software raid instead and the superblock was overwritten. I had to recreate
> and resynchronize which took a long time
You have hardware raid. Hw raid has nothing in common with md (software raid) except the name. It is performed by firmware and is hidden in the kernel as a simple SCSI /dev/sdX device. To make sure I am right, boot the old working kernel on the raid device and run "cat /proc/mdstat"; it should show nothing.
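The /proc/mdstat check can be scripted as well. This is a minimal sketch, assuming the usual /proc/mdstat layout in which each array line starts with the device name followed by " : " and a state word. The function names are hypothetical. An empty result means no md array is assembled.

```python
import re

# An array line looks like "md126 : active raid1 ..." (assumed layout).
_MD_LINE = re.compile(r"^(md\S*)\s*:\s*(\S+)")


def md_arrays(mdstat_text):
    """Map each md device named in /proc/mdstat text to its state word."""
    arrays = {}
    for line in mdstat_text.splitlines():
        match = _MD_LINE.match(line)
        if match:
            arrays[match.group(1)] = match.group(2)
    return arrays


def read_md_arrays(path="/proc/mdstat"):
    """Read /proc/mdstat on the running system and list its md arrays."""
    with open(path) as f:
        return md_arrays(f.read())
```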
For some unknown reason the hw raid stopped working and the disk controller started working in JBOD (just a bunch of disks) mode. As a result the kernel started to see multiple disks instead of one. That could be a disk controller driver problem after the kernel update; what driver do you use? Please attach dmesg from your system as well.
> I just
> removed rd_NO_DM to start the raid with dmraid from the root image:
Not sure if this is a good idea. I think it's good to have the rd_NO_DM option to make sure md (software raid) will not run.
(In reply to comment #3)
> You have hardware raid. Hw raid has nothing in common with md (software raid)
> except the name. It is performed by firmware and is hidden in the kernel as a simple SCSI
No, I do not have a hardware raid. It is an Intel BIOS raid. /proc/mdstat used to show /dev/md126 and /dev/md127 as it was supposed to. I was getting hit by bug 576749, which paralyzed my system for hours, so I have been switching to dmraid manually and through init scripts. After last week's upgrades the raids kept breaking under the new kernel before it reached my scripts, until I realized that with these options I could make the raid initialize with dmraid instead. I have been rebuilding the raid from Intel's console in my Windows partition after those breakages.
> For some unknown reason the hw raid stopped working and the disk controller started
> working in JBOD (just a bunch of disks) mode. As a result the kernel started to see
> multiple disks instead of one. That could be a disk controller driver problem after
> the kernel update; what driver do you use? Please attach dmesg from your system as well.
I will post my dmesg next. Thank you for taking a look at this.
Created attachment 427199
I just upgraded this system to Fedora 13. At the beginning the raid kept being identified as a software raid, though it worked fine under my Windows partition with Intel's drivers. Apparently the disks did not have the right metadata, but somehow the Intel driver was able to know that the disks were in an imsm raid. I reinitialized them using mdadm under Fedora:
mdadm --create /dev/md/imsm -e imsm -n 2 /dev/sdb /dev/sdc
mdadm -C /dev/md/myraid --level raid1 -n 2 /dev/sdb /dev/sdc
and they started to work correctly (/dev/md126 and /dev/md127 showed up as expected) and I am still able to see the raid under my Windows partition.
It took me some tries to find the correct dracut flags. I ended up using only rd_NO_LUKS rd_NO_DM. If I disable MD in dracut, the raid still comes up correctly at the end, but LVM "sees" the logical group on the raid as well as on the individual disks and it chooses to use one of the disks instead of the raid (at least that is what pvdisplay says). When dracut initializes the raid, access to the individual disks somehow gets blocked (I can see errors when LVM tries to access them). I am not sure if that is or should be a separate bug.
Everything is working fine on my system now with that configuration with these versions:
Let me know if you need more information about the problem when the raid is not initialized by dracut.
I just found a later bug filed for RHEL 6 that describes the lvm problem that I was reporting here (Bug 620745). I upgraded to lvm2-2.02.73-2.fc13.x86_64 from updates-testing and it is working correctly now with all the standard flags for dracut (rd_NO_LUKS rd_NO_MD rd_NO_DM). I guess this bug can be closed when that version makes it to the standard updates. Thank you,
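Whether an installed lvm2 is at least the version reported to work here can be checked with a small comparison. This is only a sketch: it compares the numeric upstream version and ignores the release suffix, so it is not a replacement for RPM's own version comparison. The function names are hypothetical, and the 2.02.73 threshold is the version mentioned in this report.

```python
import re


def version_tuple(version):
    """Turn a string like '2.02.73-2.fc13' into (2, 2, 73)."""
    # Drop the release part after the first '-', then keep the numbers.
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


def has_fix(installed, minimum="2.02.73"):
    """Return True if the installed lvm2 version is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)
```

The installed version string could come from, for example, `rpm -q --qf '%{VERSION}-%{RELEASE}' lvm2`.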
Changing component to lvm2. Not sure if the same fix will go to F-12; if not, the bug can be closed with the NEXTRELEASE resolution.
This should probably get into F-12 soon. We have all Fedoras in sync now, so it's just a matter of submitting an official Fedora update and getting through the karma process, but the build itself is done already:
Setting to POST for now. If it doesn't get into F-12 for some reason, I'll set NEXTRELEASE as Stanislaw pointed out...
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '12'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 12's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 12 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.
This is fixed; the package just didn't make it into F12 (lvm2 >= 2.02.73 is needed, as tested and pointed out in comment #7).