Bug 543749 - After upgrade from Fedora 11, RAID-1 mdraid assembles incorrectly
Summary: After upgrade from Fedora 11, RAID-1 mdraid assembles incorrectly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: util-linux-ng
Version: 12
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Karel Zak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 578377
Depends On:
Blocks: 586368
 
Reported: 2009-12-03 00:38 UTC by H. Peter Anvin
Modified: 2013-01-13 13:19 UTC (History)
CC List: 12 users

Fixed In Version: util-linux-ng-2.17.2-10.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 586368
Environment:
Last Closed: 2010-05-04 12:20:42 UTC
Type: ---
Embargoed:


Attachments
dracut debug output (226.66 KB, application/x-gzip), attached 2011-01-12 11:18 UTC by Alexey Kurnosov

Description H. Peter Anvin 2009-12-03 00:38:34 UTC
I upgraded a system with the following mdraid RAID-1 configuration from Fedora 11 to Fedora 12:

    md[012] -> sd[ab][123]

mdadm.conf had a listing of the devices and "DEVICE partitions".

After reboot, dracut/mdadm tried to assemble the RAIDs as follows:

    md2 -> sda sdb

   [succeeded with a zero offset, but with the device size of the
    original md2]

   md0 -> md2p1 + 1 missing
   md1 -> md2p2 + 1 missing

Needless to say, the resulting system was unusable.

I booted from a rescue disk and changed mdadm.conf to have:

DEVICE /dev/sd[ab][123]

and explicit "devices=/dev/sda1,/dev/sdb1" etc. lines on the partitions, followed by rebuilding the initramfs.
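
(For illustration only: the array numbering and the devices= mapping below are assumed from the md[012] -> sd[ab][123] layout above, not copied from the actual file. The edited mdadm.conf looked roughly like:

    DEVICE /dev/sd[ab][123]
    ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
    ARRAY /dev/md1 devices=/dev/sda2,/dev/sdb2
    ARRAY /dev/md2 devices=/dev/sda3,/dev/sdb3

after which the initramfs has to be rebuilt, e.g. with dracut -f.)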

Now the system reports "No devices found in mdadm.conf" and fails to boot.

Booting from the old Fedora 11 kernel and initramfs works.

Comment 1 Nate Clark 2009-12-10 18:01:12 UTC
I think I hit this bug as well and tracked the problem down to etc/udev/rules.d/65-md-incremental-imsm.rules. In that rules file, if a block device that is not a partition is detected to have an md RAID signature, all of its partitions are deleted. On my system I have three partitions on two drives, and the individual partitions are RAIDed together. I believe the whole drive is detected as a RAID device because the last partition, which extends to the end of the device, is part of a RAID.

I was able to get my machine to boot fine once I commented out these lines in the udev rules for the initramfs:

#ENV{DEVTYPE}!="partition", \
#    RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

Of course this won't work for people who do want the whole drive to be assembled as a raid device.
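
As a quick check (a sketch, not from the report; the device names are placeholders), blkid's low-level probing mode shows whether the whole disk itself is being reported as a linux_raid_member, which is what triggers the partx -d rule above:

    blkid -p /dev/sdX     # whole disk; TYPE="linux_raid_member" here trips the rule
    blkid -p /dev/sdX3    # last partition; normally this is where the signature belongs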

Comment 2 Harald Hoyer 2010-01-05 14:14:55 UTC
ah, so blkid should probably not report the whole disk as a raid device...

kzak, what do you think?

Comment 3 Nate Clark 2010-01-05 16:56:29 UTC
I don't think that will completely work. blkid, or something else, will need to distinguish between a whole device that is a RAID member and a device whose last partition is a RAID member. The problem is that if the last partition extends all the way to the end of the device, the RAID metadata sits at the same offset as it would if the entire device were a RAID member.

You might be able to look at /etc/mdadm.conf and the DEVICE line, since that is supposed to distinguish between whole devices and partitions. However, I am not sure that this would work 100% of the time.

Comment 4 Karel Zak 2010-01-05 17:03:37 UTC
(In reply to comment #2)
> ah, so blkid should probably not report the whole disk as a raid device...
 
  Why? blkid does not care about the type of device, and a whole disk can be used for a filesystem or RAID. We don't have a rule that all disks must be partitioned.

 The current concept is that partitions are parsed by the kernel. It's udev's responsibility to call blkid on the correct block device.

Comment 5 H. Peter Anvin, Intel 2010-01-09 00:50:41 UTC
blkid should be able to reject a RAID superblock that claims a size too large to fit inside the block device being probed; that would take care of the case where a partition is tested but the RAID superblock actually applies to the whole device.

At that point, sequencing becomes significant: if there appears to be a partition table on the device, the *partitions* need to be examined for RAID superblocks before the whole device is.

Comment 6 Stijn Hoop 2010-01-09 19:13:14 UTC
I have the same problem. The failing array in my case is one RAID-5 array of 4 drives, 3 of which (detected as sdc, sdd and sdf) have a single 750 GB partition spanning the entire disk. The fourth (detected as sde) has already been replaced and is larger, 1.5 TB.

For two of these four disks (sdc and sde), dracut generates both /dev/sdc and the partition entry /dev/sdc1. For the other two, only the whole-disk entry is available.

This (rightly) causes mdadm to fail to start the array, as only two of the four drives are available. I do have to specify DEVICE /dev/sd[c-f]1 in /etc/mdadm.conf, but that has been the case since Fedora 11.

The workaround from Nate Clark in comment #1 fixes things so that the correct device entries appear in /dev and mdadm is happy again.

If you need more information please let me know and I'll try to provide it. I do not have a serial console handy though.

Comment 7 Karel Zak 2010-01-11 15:00:14 UTC
log from IRC chat on #udev:

<kzak> kay: I thought that we call blkid for partitions or non-partitioned disks only
<kay> kzak: why what?
<kzak> kay: because RAID signature on the last partition could be interpreted as a signature for whole disk

<kay> kzak: until now we didn't bother to check if there are partitions and skip the disk

<kzak> don't forget that the RAID signature is at the end of the disk
<kay> kzak: oh, how can i forget that silly concept :)
<kay> kzak: it creates so many problems with all sorts of devices which report the wrong size

<kay> i know the problem. its not covered by the current stuff we do

<kzak> kay: so do you think it's a udev problem, and it should be fixed (somehow) by more robust udev rules, right? :-)
<kay> kzak: it's not simple. sometimes the partition table is invalidly parsed by the kernel, like mdraid, and we want to detect the raid on the disk

<kay> kzak: if there were a simple solution, we would have fixed it already
<kay> kzak: for now, we just can not really support a raid sig at the last sector of the last partition
<kay> kzak: we need to find a way to distinguish the two

<kay> kzak: you want the raid on the disk discovered for all dmraid things regardless of the (wrongly) kernel-parsed partitions

Comment 8 H. Peter Anvin, Intel 2010-01-11 17:49:13 UTC
Note that the real problem isn't the location of the superblock (a superblock at the end is more or less standard for RAID, not just mdraid), but the fact that 0.90 superblocks don't contain their own offset, so they don't actually know where the RAID starts.  They do contain the device size, which gives you a *minimum* size for the RAID; if partitions are examined before the whole device, that will usually rule out a partition as the owner of a whole-disk superblock, since the recorded size won't fit.
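
A rough manual version of that "does the recorded size fit?" test (a sketch only; the device name is a placeholder, and the "Used Dev Size" field name assumes mdadm --examine output for 0.90 metadata):

    DEV=/dev/sdb3                                            # candidate device
    SB_KB=$(mdadm --examine "$DEV" | awk '/Used Dev Size/ {print $5}')
    DEV_KB=$(( $(blockdev --getsize64 "$DEV") / 1024 ))
    [ "$SB_KB" -le "$DEV_KB" ] && echo "superblock could belong to $DEV" \
                               || echo "superblock claims more space than $DEV has"

If the superblock really describes the whole disk, the recorded size is larger than any smaller partition being probed, so the partition gets rejected.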

Comment 9 Pim Zandbergen 2010-02-25 17:54:25 UTC
I've had exactly this problem and had previously fixed it using comment #1.

But the problem came back after a dracut and kernel update. In the past, I also had problems caused by the fact that both the whole disk and the partition are recognized as RAID members.

So to end these problems for good, I repartitioned all drives one by one. My RAID member partitions were aligned to start at sector 2048 and used all available space. I shifted the partitions 1024 sectors toward the start of the drive, leaving 1024 sectors unused at the end, and then zeroed those last 1024 sectors of the drive to wipe the old superblock.
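
(A sketch of the wipe step only; sdX stands for each member disk, and the arithmetic assumes blockdev reports the size in 512-byte sectors:

    DEV=/dev/sdX
    END=$(blockdev --getsz "$DEV")                           # size in 512-byte sectors
    dd if=/dev/zero of="$DEV" bs=512 seek=$((END - 1024)) count=1024

This zeroes exactly the last 1024 sectors, which is where the old whole-disk 0.90 superblock lived.)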

This works, and now I can update dracut and kernels without needing to resort to hacks.

Comment 10 Karel Zak 2010-04-22 19:58:33 UTC
I have fixed the problem in upstream git tree, commit:
http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=commit;h=c81e70087cfebc299bdfbbd0675958483fc8a768

I'm going to backport the change to F-13/RHEL6 next week.

Comment 11 Karel Zak 2010-04-29 08:01:26 UTC
*** Bug 578377 has been marked as a duplicate of this bug. ***

Comment 12 Rich Rauenzahn 2010-05-03 23:15:52 UTC
Are you also going to be able to backport it to fc12?  This is now causing my raid array to come up in a degraded state, and when I add the drives, it goes into recovery mode for hours.

Comment 13 Fedora Update System 2010-05-04 09:17:32 UTC
util-linux-ng-2.17.2-4.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/util-linux-ng-2.17.2-4.fc13

Comment 14 Karel Zak 2010-05-04 12:20:42 UTC
(In reply to comment #12)
> Are you going to also be able to backport it to fc12?  

Unfortunately, it's very difficult. The libblkid in F-12 does not
contain the partition-table parser.

I think it will be better (and safer) to use util-linux-ng from
F-13 rather than create a package with some ugly hack specific
to F-12 only.

The original F-13 packages are at:
http://koji.fedoraproject.org/koji/buildinfo?buildID=171141
(or later on F-13 mirrors).


I have created a repo with the same packages for F-12 (rebuilt by koji for F-12); you need to add something like:

  [blkidfix]
  name=F-12 libblkid fix
  failovermethod=priority
  baseurl=http://fedorapeople.org/~kzak/543749/$basearch/
  enabled=1
  gpgcheck=0

to your /etc/yum.repos.d/ directory.
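
(For example: save the snippet as /etc/yum.repos.d/blkidfix.repo (the file name is an assumption, any *.repo name works) and pull in the rebuilt packages with:

    yum clean metadata
    yum update util-linux-ng libblkid

The package names assume the F-13 util-linux-ng split, where libblkid is a subpackage.)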

Or you can wait (a few weeks) for the final F-13 release and then update to the
official F-13 packages.

Comment 15 Rich Rauenzahn 2010-05-04 17:49:36 UTC
Thank you for making that available.  I will try it out.

Comment 16 Fedora Update System 2010-05-10 23:52:23 UTC
util-linux-ng-2.17.2-4.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 17 Rich Rauenzahn 2010-05-25 14:34:40 UTC
I still seem to be having problems with this package on FC12 (I'm using the fc13 version above).

I still have to manually partprobe the partitions after boot, and now I am unable to reassemble my raid after doing so:

mdadm --re-add /dev/md1 /dev/sde1
mdadm: Cannot open /dev/sde1: Device or resource busy
mdadm --re-add /dev/md1 /dev/sdg1
mdadm: Cannot open /dev/sdg1: Device or resource busy
mdadm --re-add /dev/md1 /dev/sdc1
mdadm: --re-add for /dev/sdc1 to /dev/md1 is not possible
mdadm --re-add /dev/md1 /dev/sdd1
mdadm: --re-add for /dev/sdd1 to /dev/md1 is not possible
mdadm: failed to run array /dev/md1: Input/output error

I'll have to research later why I get the mdadm 'is not possible' errors, but essentially my drives (each partitioned with a RAID partition, with the partitions combined into a RAID6 array and LVM on top) are not visible at boot; partprobe brings them back.
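
(Not from the report: a possible recovery sequence, sketched under the assumption that the half-started array can simply be stopped and reassembled from mdadm.conf once the partition devices exist; the device names are the ones shown above:

    partprobe /dev/sdc /dev/sdd /dev/sde /dev/sdg   # re-read the partition tables
    mdadm --stop /dev/md1                           # release the half-started array
    mdadm --assemble /dev/md1                       # reassemble using mdadm.conf
)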



# fdisk /dev/sde

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sde: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       60801   488384001   fd  Linux raid autodetect

Comment 18 Alexey Kurnosov 2011-01-12 11:18:05 UTC
Created attachment 473000 [details]
dracut debug output

The bug is still present.

# uname -a
Linux unsen.q53.spb.ru 2.6.34.7-63.fc13.x86_64 #1 SMP Fri Dec 3 12:38:46 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
# rpm -q fedora-release
fedora-release-13-1.noarch
# blkid 
/dev/mapper/System-Swap: UUID="2e3a815a-380f-4546-adf3-fab9904791f4" TYPE="crypto_LUKS" 
/dev/md0: UUID="27af0985-50eb-4b1a-bc45-0e4277f955fc" TYPE="ext2" 
/dev/mapper/System-LogVol00: UUID="5dbaa664-d5a8-434a-bff7-acda6181f1bb" TYPE="ext3" 
/dev/mapper/System-Root: UUID="b8a0db8a-769b-44f6-b012-604998a377e5" TYPE="ext3" 
/dev/System/Root: UUID="b8a0db8a-769b-44f6-b012-604998a377e5" TYPE="ext3" 
/dev/sda1: UUID="74a20bbd-f444-4513-e375-297d79f00f81" TYPE="linux_raid_member" 
/dev/sda2: UUID="086a15fc-beff-01a6-dea9-f59a8ffe0f3b" TYPE="linux_raid_member" 
/dev/sda3: UUID="f7ad3e2f-5504-1db2-7f61-6b67c93bfa90" TYPE="linux_raid_member" 
/dev/sdb1: UUID="74a20bbd-f444-4513-e375-297d79f00f81" TYPE="linux_raid_member" 
/dev/sdb2: UUID="086a15fc-beff-01a6-dea9-f59a8ffe0f3b" TYPE="linux_raid_member" 
/dev/sdb3: UUID="f7ad3e2f-5504-1db2-7f61-6b67c93bfa90" TYPE="linux_raid_member" 
/dev/sdd1: UUID="086a15fc-beff-01a6-dea9-f59a8ffe0f3b" TYPE="linux_raid_member" 
/dev/sdd2: UUID="f7ad3e2f-5504-1db2-7f61-6b67c93bfa90" TYPE="linux_raid_member" 
/dev/sde1: UUID="086a15fc-beff-01a6-dea9-f59a8ffe0f3b" TYPE="linux_raid_member" 
/dev/sde2: UUID="f7ad3e2f-5504-1db2-7f61-6b67c93bfa90" TYPE="linux_raid_member" 
/dev/sdc1: UUID="086a15fc-beff-01a6-dea9-f59a8ffe0f3b" TYPE="linux_raid_member" 
/dev/sdc2: UUID="f7ad3e2f-5504-1db2-7f61-6b67c93bfa90" TYPE="linux_raid_member" 
/dev/md1: UUID="nBSyeY-9QH0-ddBs-iPJ0-DdfM-fXsm-UQANy6" TYPE="LVM2_member" 
/dev/md2: UUID="M4SLwF-rs0M-iR17-nwiv-tHck-R0AG-fEgZvk" TYPE="LVM2_member" 
/dev/mapper/Storage-Opt: UUID="0121552c-9192-43b3-808e-8444546dcdc7" TYPE="crypto_LUKS" 
/dev/mapper/Storage-Test: UUID="8ad3728e-14ca-41c2-bd1e-0930e2e8a73f" UUID_SUB="03920518-8c03-4804-8f4f-4037b1c3d487" TYPE="btrfs" 
/dev/mapper/luks-2e3a815a-380f-4546-adf3-fab9904791f4: UUID="11a51fb8-5699-474f-ba85-15a75bfcc56c" TYPE="swap" 
/dev/mapper/Opt: UUID="5d41db3c-3f9a-4b88-9242-4dc1c3af71c6" TYPE="ext4" 
# cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md2 : active raid5 sda3[0] sde2[4] sdd2[3] sdc2[2] sdb3[1]
      3865270784 blocks level 5, 32k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 3/231 pages [12KB], 2048KB chunk

md1 : active raid5 sda2[0] sde1[4](S) sdd1[3] sdc1[2] sdb2[1]
      30723840 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/157 pages [0KB], 32KB chunk

md0 : active raid1 sda1[0] sdb1[1]
      200704 blocks [2/2] [UU]
      bitmap: 0/25 pages [0KB], 4KB chunk

unused devices: <none>
# rpm -q util-linux-ng
util-linux-ng-2.17.2-9.fc13.x86_64

Comment 19 Karel Zak 2011-01-12 13:05:50 UTC
(In reply to comment #18)
> Created attachment 473000 [details]
> dracut debug output
> 
> The bug still present.

 Could you be more verbose, please? 

If there is a difference between the real on-disk layout and the blkid output, then I also need information about the disk layout (fdisk -l or similar).

Comment 20 Karel Zak 2011-01-12 13:39:52 UTC
Ah.. I found a small bug in the RAID detection. The bug was fixed in upstream code (commit a09f0933bb80c52ec1bc30a1678cef7e999aeff9), but not backported to Fedora.

Comment 21 Fedora Update System 2011-01-12 13:51:39 UTC
util-linux-ng-2.18-4.7.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/util-linux-ng-2.18-4.7.fc14

Comment 22 Fedora Update System 2011-01-12 14:21:52 UTC
util-linux-ng-2.17.2-10.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/util-linux-ng-2.17.2-10.fc13

Comment 23 Rich Rauenzahn 2011-01-13 05:42:11 UTC
While looking at my personal setup at some point to figure out why this was happening, I may have found the following. I'm not completely sure, but others having this issue should check for it:

I think I may have had mdadm headers at the beginning of both /dev/sdX and /dev/sdX1.

My setup is supposed to be a vanilla disk /dev/sdX with a partition /dev/sdX1 that is RAIDed.  I think I fixed this by wiping out the mdadm header at the beginning of /dev/sdX.

Sorry for not updating the bug when I did this... instead we have to rely on my weak memory :)
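
A sketch of how to check for the situation described above (sdX is a placeholder):

    mdadm --examine /dev/sdX     # md superblock on the whole disk?
    mdadm --examine /dev/sdX1    # md superblock on the partition (the intended member)

If both report a superblock and only the partition is supposed to be a RAID member, the stale whole-disk header can then be wiped (see the wipefs note in comment #27), taking care not to touch the partition's own metadata.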

Comment 24 Fedora Update System 2011-01-20 19:56:50 UTC
util-linux-ng-2.18-4.7.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 25 Alexey Kurnosov 2011-02-05 08:40:50 UTC
I am glad my note helped to fix a missed bug, but in my case there was another problem.
The whole disks had old, unused Intel Matrix (IMSM) superblocks. It looks like blkid reported them as in use and skipped the RAID on the partitions. Once those superblocks were erased, booting resumed.
I believe this is a bug as well (the other side of the same bug).

Comment 26 Alexey Kurnosov 2011-02-11 15:59:27 UTC
Now that a new version of util-linux has been released, should we expect a correction of this odd behavior there?

Comment 27 Karel Zak 2011-02-12 09:15:44 UTC
(In reply to comment #25)
> The whole disks had old not used Intel matrix superblocks. Looks like blkid
> report it as used and skip partitions RAID. So when the superblocks was erased
> a booting resumed.

You have to keep your devices clean and without unused (obsolete) superblocks. There is a new command, wipefs(8), that is able to remove unwanted signatures from your devices.

It's impossible to detect that a superblock is not in use. For filesystem superblocks we probe for all superblocks on the device, and if more than one is detected, an "ambivalent probing result" is reported. This is not done for RAIDs (the first detected RAID is reported), because it's a very unusual situation and we want to keep superblock detection as fast as possible.
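
An illustrative wipefs run (the device name is a placeholder, and the offset to erase is whatever the listing reports for the stale signature):

    wipefs /dev/sdX               # list all signatures found, with their offsets
    wipefs -o <offset> /dev/sdX   # erase just the signature at that offset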

Comment 28 Fedora Update System 2011-04-14 20:55:30 UTC
util-linux-ng-2.17.2-10.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

