Red Hat Bugzilla – Bug 494254
mdraid dmraid bug (initscripts?)
Last modified: 2009-04-08 05:28:29 EDT
A new dmraid (fakeraid) jbod array isn't initialised on boot, potentially due to previous mdraid info on the discs.
How to replicate on an nvidia 680i system:
* 2*discs in raid 0 (/dev/mapper/nvidia_abcdefgh)
* 2*discs as separate discs (sdc and sdd)
* fdisk sdc and sdd with the following:
md0: (raid1) LABEL=/boot
(ensure all partitions are appropriately formatted)
* add sdc and sdd as raid discs through the bios
* through the mediashield f10 setup screen create JBOD with sdc and sdd
* boot system from latest boot.iso (anaconda >22.214.171.124)
* after anaconda has scanned the discs, accept the "do you want to initialise /dev/mapper/nvidia_hgfedcba" prompt (the new jbod array)
* on to the original raid0 set put /boot / and swap (i put / and swap in to
LVM) and create part for windows install (i made a 100GB swap partition because
there was no NTFS format option and i couldn't leave it "unformatted")
* on to the new jbod create a massive ext3 partition and some swap space (i
used ext3 because I need windows to read it with the ext2ifs driver and am
unsure about ext4 functionality)
* install system with default packages and boot your system
hopefully you get the same issue as me, where the jbod set was not initialised
because sdd was originally part of an mdraid set. despite anaconda
formatting the entire disc during installation and modifying the MBR of the JBOD, the MBR of sdd remains intact because of how the JBOD set was constructed in the first place. The problem arises because of how early mdraid arrays are rebuilt during the boot process. Because of the locks mdraid has in place, dmraid cannot initialise all of its raid sets.
I see the problem as this: when I create the new JBOD using the nvidia mediashield bios I am asked to clear the disks. When I click yes the MBR of the first disk is wiped but, I'm guessing, not the second. What anaconda needs to do when it initialises the new raid set before the partitioning step is to wipe the MBR from all associated disks, not just the first.
i hope you have the same problems when you try this :)
(comment from hdegoede https://bugzilla.redhat.com/show_bug.cgi?id=489148#c40)
Thanks for the detailed bug report. Here is what I believe has happened:
sdc + sdd both had a partition which was part of an mdraid set.
Normally, if you had chosen to re-cycle the 2 mdraid partitions in
anaconda, we would have wiped the mdraid meta data, but instead you made
the 2 disks part of a dmraid set using the BIOS-setup, and the BIOS then
wipes the partition table clean. Anaconda sees a wiped partition table and
asks whether or not to initialize it, then we create an empty partition table.
Anaconda does not do any wiping of the mdraid meta data, as we do not see
it: we check for things like lvm / mdraid meta data on the basis of
the partition table, which says there are no partitions, so we do not
check for any meta data.
Then we create and format new partitions as asked, however creating
a filesystem only touches certain parts of the disk (where the
inode tables , journal, etc will live), so we happen to not write
the part where the mdraid meta data lives.
System reboots, scans for mdraid metadata and somehow finds the mdraid
metadata (perhaps some of your new partitions live on the same location
as the old ones ?).
Fixing this is going to be pretty hard, if not damn near impossible. Nothing
short of doing a full wipe of the disks (slow) is going to get rid of all
metadata. Long story short, by letting the BIOS wipe the partition table before
first manually removing lvm and / or mdraid metadata you sort of shot yourself
in the foot. Or you could call this a BIOS bug, as it did not properly
wipe the disks before making a raid array out of them.
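To illustrate why a freshly made filesystem can leave old mdraid metadata behind, here is a minimal Python sketch that looks for an md superblock at the place the 0.90 metadata format keeps it: the last 64 KiB-aligned block of the member device or partition. This assumes the old array used 0.90 metadata and a little-endian host (the report does not say which metadata version was in use), and the function names are made up for this example. It is a sketch, not what anaconda or mdadm actually do.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC        # magic number at the start of an md superblock
MD_RESERVED_BYTES = 64 * 1024   # 0.90 format reserves a 64 KiB block near the end


def md090_superblock_offset(device_size_bytes):
    """Byte offset where a 0.90 superblock would sit on a device of this size."""
    return (device_size_bytes & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES


def has_md090_superblock(path):
    """Return True if `path` (image file or block device) carries the md magic
    at the 0.90 superblock location. Assumes a little-endian host."""
    with open(path, "rb") as dev:
        size = dev.seek(0, 2)            # seek to the end to learn the size
        if size < MD_RESERVED_BYTES:
            return False                 # too small to hold a superblock
        dev.seek(md090_superblock_offset(size))
        raw = dev.read(4)
    return len(raw) == 4 and struct.unpack("<I", raw)[0] == MD_SB_MAGIC
```

Because that location is near the end of the old partition, formatting a new, differently sized partition on top will not necessarily write there.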
(In reply to comment #1)
> the BIOS then
> wipes the partition table clean. Anaconda sees a wiped partition table and
> asks whether or not to initialize it, then we create an empty partition table
exactly, the bios wipes the partition table of the jbod (which lives on the first disk) and doesn't touch the MBRs of the second or remaining disks.
> Then we create and format new partitions as asked, however creating
> a filesystem only touches certain parts of the disk (where the
> inode tables , journal, etc will live), so we happen to not write
> the part where the mdraid meta data lives.
the formatting went fine and fsck runs without error which leads me to think the issue is with the MBR. Getting anaconda to wipe the MBR of all associated disks in an array should, to my knowledge and reasoning, be a relatively easy thing to achieve as the system should be able to report what block devices are needed for a JBOD set.
I find your theory of the BIOS not wiping the MBR of the disks in the jbod (other than the first disk) very interesting.
Can you verify this by doing:
fdisk -l /dev/sdd
Assuming sdd is the second disk of the jbod. And if this indeed
shows a partition table, can you edit using fdisk and remove all partitions?
And then reboot and see if the problem is fixed?
These results are from memory because i have since cleared the part tables and reformatted outside of anaconda's attempt to great success. I should have included the following in the original report but here we are :).
the exact sequence of the following may not be linear but it all happened.
I did an fdisk -l and noted that sdc had the proper partitions but sdd still had the original part info (sdd1 = type fd etc). i don't remember the exact error i faced on boot but there was a complaint about mdadm saying "no valid metadata could be found", which i'm guessing the system picked up from sdd1 being of type "raid auto".
Using boot.iso i went to the rescue environment and ran "dmraid -ay": only the raid0 set was activated. I deleted all parts on sdd, ran partprobe and tried "dmraid -ay" again: success.
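The fdisk -l check described above (looking for a leftover "raid auto" entry, type fd, in the MBR of a jbod member disk) can be sketched in Python as follows. This is only an illustration under simple assumptions: a classic MBR with four primary entries, no extended partitions, and function names invented for the example.

```python
LINUX_RAID_AUTODETECT = 0xFD   # MBR partition type "Linux raid autodetect"


def mbr_partition_types(sector0):
    """Return the type bytes of the four primary MBR entries in a 512-byte
    sector, or an empty list if the 0x55AA boot signature is missing."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        return []
    # Four 16-byte entries start at byte 446; the type is byte 4 of each entry.
    return [sector0[446 + 16 * i + 4] for i in range(4)]


def stale_raid_partitions(path):
    """Return 1-based numbers of primary partitions on `path` marked type fd."""
    with open(path, "rb") as dev:
        types = mbr_partition_types(dev.read(512))
    return [i + 1 for i, t in enumerate(types) if t == LINUX_RAID_AUTODETECT]
```

Run as root against a member disk, for example stale_raid_partitions("/dev/sdd"), a non-empty result would correspond to the stale sdd1 entry seen here.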
I should be able to replicate this in coming days to make an even more detailed report but as my pc is finally up and running after 3 days of beta madness I need to play some games :).
btw, hans, i modified the last two paragraphs of the original post in this bug to add some detail (from my original post in the dmraid master bug) so you may need to reread it if you haven't already.
The problem with wiping the first sector of the disks other than the 1st
which are part of the jbod is that that sector might actually contain data.
Can you please do "dmsetup table" (with the jbod set active) and paste the output
here, then I can confirm (or deny) that the first sector could actually contain data.
raid0 = 2x 74GB Raptors
afcjhadap1 = /boot
afcjhadap2 = C:\
afcjhadap3 = LVM = root & swap
JBOD = 2x 750GB Seagates
bfeafffep1 = /storage
bfeafffep2 = swap
[root@x64 ~]# dmraid -s
*** Active Set
name : nvidia_bfeafffe
size : 2930298112
stride : 128
type : linear
status : ok
devs : 2
spares : 0
*** Active Set
name : nvidia_afcjhada
size : 290451968
stride : 256
type : stripe
status : ok
devs : 2
spares : 0
[root@x64 ~]# dmsetup table
nvidia_bfeafffep1: 0 2926095102 linear 253:6 63
nvidia_afcjhada: 0 290451968 striped 2 256 8:0 0 8:16 0
nvidia_afcjhadap3: 0 54527904 linear 253:0 235911231
nvidia_afcjhadap2: 0 235386880 linear 253:0 524351
nvidia_afcjhadap1: 0 524288 linear 253:0 63
LVM-Swap: 0 14680064 linear 253:3 39846272
LVM-Root: 0 39845888 linear 253:3 384
nvidia_bfeafffe: 0 1465149166 linear 8:32 0
nvidia_bfeafffe: 1465149166 1465149166 linear 8:48 0
nvidia_bfeafffep2: 0 4192965 linear 253:6 2926095165
*** This bug has been marked as a duplicate of bug 494821 ***
Ok, the device map shows that the first sector of the second disk is part
of the jbod, so we cannot just clear it as it may contain data. We could
clear it when clearing the jbod partition table, but that will only fix
some issues. The basic problem here is that the initscripts scan mdraid
before dmraid. I've filed bug 494821 for this.
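For reference, the reading of the device map above can be reproduced with a small Python sketch that parses the linear targets from the "dmsetup table" output pasted earlier. It assumes that 8:48 is sdd and that 253:6 is the nvidia_bfeafffe mapping itself (the offsets fit, but the output does not state this directly). The helper names are made up for the example.

```python
def parse_linear_segments(table_text, name):
    """Collect (logical_start, length, device, device_offset) for the linear
    segments of mapping `name` in `dmsetup table` output."""
    segments = []
    for line in table_text.splitlines():
        mapping, _, rest = line.partition(":")
        fields = rest.split()
        if mapping.strip() == name and len(fields) == 5 and fields[2] == "linear":
            start, length, _, device, offset = fields
            segments.append((int(start), int(length), device, int(offset)))
    return segments


def logical_sector(segments, device, device_sector):
    """Map a sector on a member device to its sector in the set, or None."""
    for start, length, dev, offset in segments:
        if dev == device and offset <= device_sector < offset + length:
            return start + (device_sector - offset)
    return None


TABLE = """\
nvidia_bfeafffep1: 0 2926095102 linear 253:6 63
nvidia_bfeafffe: 0 1465149166 linear 8:32 0
nvidia_bfeafffe: 1465149166 1465149166 linear 8:48 0
"""

jbod = parse_linear_segments(TABLE, "nvidia_bfeafffe")
p1 = parse_linear_segments(TABLE, "nvidia_bfeafffep1")[0]

sector = logical_sector(jbod, "8:48", 0)          # sector 0 of the second disk
inside_p1 = p1[3] <= sector < p1[3] + p1[1]       # does it fall inside p1?
print(sector, inside_p1)                          # prints: 1465149166 True
```

So sector 0 of the second disk is jbod sector 1465149166, which lies inside nvidia_bfeafffep1 (/storage) and may hold filesystem data.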