Description of problem:

After upgrading to FC12, one of my volume groups doesn't activate after boot. Working backwards, the md device doesn't activate... looking further back, it appears that some of the sd drives don't get scanned properly -- no partitions are picked up.

If I add this to rc.local, then the system comes up as I want it:

---------------8<-------------
#!/bin/sh
dmesg > /tmp/boot.log
cat /proc/mdstat >> /tmp/boot.log
vgdisplay -v VolGroupRAID >> /tmp/boot.log
ll /dev/sd* >> /tmp/boot.log
cat /tmp/boot.log | mailx -s vgraid\ scan rich
partprobe > /dev/null 2>&1
for i in `blkid | grep UUID=\"23a8685c-42e8-7d52-3a53-d40b51cce092\" | perl -ne'm{^(/dev/sd[a-z]1)} && print "$1\n"'`
do
    mdadm --re-add /dev/md1 $i
done
mdadm -R /dev/md1
vgchange -a y VolGroupRAID
service autofs restart
service backuppc start
---------------8<-------------

It also seems odd -- or maybe related -- that FC doesn't consistently give the drives the same letters at bootup anymore. This doesn't cause any problems directly, as I'm using labels and volume groups -- but I wonder if it is related.

Version-Release number of selected component (if applicable):
Source RPM: kernel-2.6.32.10-90.fc12.src.rpm

How reproducible:
every time

Steps to Reproduce:
1.
2.
3.

Actual results:
md1 doesn't come online

Expected results:
md1 should be activated during boot.

Additional info:
attaching output from rc.local script...
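For reference, the device-selection step in the workaround above can be exercised on its own. A minimal sketch with hypothetical sample `blkid` lines (the real script pipes live `blkid` output, and uses a perl one-liner; the `sed` expression here is just an equivalent form of that match):

```shell
#!/bin/sh
# Hypothetical sample of `blkid` output -- the real rc.local loop reads it live.
sample='/dev/sdc1: UUID="23a8685c-42e8-7d52-3a53-d40b51cce092" TYPE="linux_raid_member"
/dev/sdg1: UUID="zH6jW8-19fR-rPNz-069z-IhmZ-fKoh-vyqFnB" TYPE="LVM2_member"'

# Keep only lines for the target array UUID, then extract the /dev/sdX1 name.
printf '%s\n' "$sample" \
    | grep 'UUID="23a8685c-42e8-7d52-3a53-d40b51cce092"' \
    | sed -n 's|^\(/dev/sd[a-z]1\):.*|\1|p'
# prints: /dev/sdc1
```

Each name this emits is what the loop feeds to `mdadm --re-add /dev/md1`.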
Created attachment 403658 [details]
boot log and misc output before and after rc.local

Output of dmesg, ls /dev/sd*, mdstat and vgdisplay during boot, before the rc.local workaround and after the workaround.
Minor nit/fix to my rc.local workaround -- I replaced ll with ls -l
Can you try to get debug messages from udev by adding 'rdudevdebug' to the kernel boot options?

Are the sgX devices present when the partitions are missing? E.g. when you see:

sd 4:0:0:0: Attached scsi generic sg4 type 0
sd 4:0:0:0: [sde] Write Protect is off

is /dev/sg4 present even when /dev/sde1 is missing?

Can you create the partitions manually when they're missing and then access them?

# /sbin/MAKEDEV -x /dev/sde1
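As an aside on what `MAKEDEV -x` creates here: for the first 16 sd disks the node can also be made directly with `mknod`, since the minor number follows a fixed scheme (major 8, 16 minors per disk: the whole disk plus up to 15 partitions). A sketch of that arithmetic -- the device name is only an example:

```shell
#!/bin/sh
# sd disks under major 8 get 16 minors each, so sde (index 4) is
# minor 64 and sde1 is minor 65 -- matching the ls -l output below.
disk=sde; part=1
idx=$(( $(printf '%d' "'${disk#sd}") - 97 ))   # 'a' -> 0 ... 'e' -> 4
minor=$(( idx * 16 + part ))
echo "mknod /dev/${disk}${part} b 8 ${minor}"
# prints: mknod /dev/sde1 b 8 65
```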
I can get the debug messages -- but how do I capture them? I gave up after about 15 minutes of text scrolling across the boot because I'm fairly sure they weren't being logged...

I can create (scan) them manually with partprobe, and then they are there. See my rc.local script above -- I'm doing this manually now at the end of the boot process.

Yes, to sg4:

[rrauenza@tendo ~]$ dmesg | grep sg4
sd 4:0:0:0: Attached scsi generic sg4 type 0
[rrauenza@tendo ~]$
[rrauenza@tendo ~]$ ll /dev/sd*
brw-rw---- 1 root disk 8,  0 2010-04-21 20:08 /dev/sda
brw-rw---- 1 root disk 8, 16 2010-04-21 20:08 /dev/sdb
brw-rw---- 1 root disk 8, 17 2010-04-21 20:08 /dev/sdb1
brw-rw---- 1 root disk 8, 32 2010-04-21 20:08 /dev/sdc
brw-rw---- 1 root disk 8, 33 2010-04-21 20:08 /dev/sdc1
brw-rw---- 1 root disk 8, 34 2010-04-21 20:08 /dev/sdc2
brw-rw---- 1 root disk 8, 35 2010-04-21 20:08 /dev/sdc3
brw-rw---- 1 root disk 8, 48 2010-04-21 20:08 /dev/sdd
brw-rw---- 1 root disk 8, 49 2010-04-21 20:08 /dev/sdd1
brw-rw---- 1 root disk 8, 50 2010-04-21 20:08 /dev/sdd2
brw-rw---- 1 root disk 8, 51 2010-04-21 20:08 /dev/sdd3
brw-rw---- 1 root disk 8, 64 2010-04-21 20:08 /dev/sde
brw-rw---- 1 root disk 8, 80 2010-04-21 20:08 /dev/sdf
brw-rw---- 1 root disk 8, 81 2010-04-21 20:08 /dev/sdf1
brw-rw---- 1 root disk 8, 96 2010-04-21 20:08 /dev/sdg
brw-rw---- 1 root disk 8, 97 2010-04-21 20:08 /dev/sdg1
[rrauenza@tendo ~]$

[root@tendo ~]# /sbin/MAKEDEV -x /dev/sde1
[root@tendo ~]# ll /dev/sd*
brw-rw---- 1 root disk 8,  0 2010-04-21 20:08 /dev/sda
brw-rw---- 1 root disk 8, 16 2010-04-21 20:08 /dev/sdb
brw-rw---- 1 root disk 8, 17 2010-04-21 20:08 /dev/sdb1
brw-rw---- 1 root disk 8, 32 2010-04-21 20:08 /dev/sdc
brw-rw---- 1 root disk 8, 33 2010-04-21 20:08 /dev/sdc1
brw-rw---- 1 root disk 8, 34 2010-04-21 20:08 /dev/sdc2
brw-rw---- 1 root disk 8, 35 2010-04-21 20:08 /dev/sdc3
brw-rw---- 1 root disk 8, 48 2010-04-21 20:08 /dev/sdd
brw-rw---- 1 root disk 8, 49 2010-04-21 20:08 /dev/sdd1
brw-rw---- 1 root disk 8, 50 2010-04-21 20:08 /dev/sdd2
brw-rw---- 1 root disk 8, 51 2010-04-21 20:08 /dev/sdd3
brw-rw---- 1 root disk 8, 64 2010-04-21 20:08 /dev/sde
brw-r----- 1 root disk 8, 65 2010-04-21 20:12 /dev/sde1   <========
brw-rw---- 1 root disk 8, 80 2010-04-21 20:08 /dev/sdf
brw-rw---- 1 root disk 8, 81 2010-04-21 20:08 /dev/sdf1
brw-rw---- 1 root disk 8, 96 2010-04-21 20:08 /dev/sdg
brw-rw---- 1 root disk 8, 97 2010-04-21 20:08 /dev/sdg1
[root@tendo ~]#

sda is still missing its partitions. Doing a partprobe...

[root@tendo ~]# partprobe

[I think the warnings are fine to ignore... -- Rich]

Warning: The kernel was unable to re-read the partition table on /dev/sdb (Device or resource busy). This means Linux won't know anything about the modifications you made until you reboot. You should reboot your computer before doing anything with /dev/sdb.

Warning: The kernel was unable to re-read the partition table on /dev/sdc (Device or resource busy). This means Linux won't know anything about the modifications you made until you reboot. You should reboot your computer before doing anything with /dev/sdc.

Warning: The kernel was unable to re-read the partition table on /dev/sdd (Device or resource busy). This means Linux won't know anything about the modifications you made until you reboot. You should reboot your computer before doing anything with /dev/sdd.

Warning: The kernel was unable to re-read the partition table on /dev/sdf (Device or resource busy). This means Linux won't know anything about the modifications you made until you reboot. You should reboot your computer before doing anything with /dev/sdf.

Warning: The kernel was unable to re-read the partition table on /dev/sdg (Device or resource busy). This means Linux won't know anything about the modifications you made until you reboot. You should reboot your computer before doing anything with /dev/sdg.
[root@tendo ~]# ll /dev/sd*
brw-rw---- 1 root disk 8,  0 2010-04-21 20:08 /dev/sda
brw-rw---- 1 root disk 8,  1 2010-04-21 20:13 /dev/sda1   <==============
brw-rw---- 1 root disk 8, 16 2010-04-21 20:08 /dev/sdb
brw-rw---- 1 root disk 8, 17 2010-04-21 20:08 /dev/sdb1
brw-rw---- 1 root disk 8, 32 2010-04-21 20:08 /dev/sdc
brw-rw---- 1 root disk 8, 33 2010-04-21 20:08 /dev/sdc1
brw-rw---- 1 root disk 8, 34 2010-04-21 20:08 /dev/sdc2
brw-rw---- 1 root disk 8, 35 2010-04-21 20:08 /dev/sdc3
brw-rw---- 1 root disk 8, 48 2010-04-21 20:08 /dev/sdd
brw-rw---- 1 root disk 8, 49 2010-04-21 20:08 /dev/sdd1
brw-rw---- 1 root disk 8, 50 2010-04-21 20:08 /dev/sdd2
brw-rw---- 1 root disk 8, 51 2010-04-21 20:08 /dev/sdd3
brw-rw---- 1 root disk 8, 64 2010-04-21 20:08 /dev/sde
brw-rw---- 1 root disk 8, 65 2010-04-21 20:12 /dev/sde1
brw-rw---- 1 root disk 8, 80 2010-04-21 20:08 /dev/sdf
brw-rw---- 1 root disk 8, 81 2010-04-21 20:08 /dev/sdf1
brw-rw---- 1 root disk 8, 96 2010-04-21 20:08 /dev/sdg
brw-rw---- 1 root disk 8, 97 2010-04-21 20:08 /dev/sdg1
[root@tendo ~]#

Unrelated, or not: my disks are also assigned different drive letters than they were in FC11, and it appears to change slightly across boots as well.

Can I rerun udev in debug after a boot to see why it isn't picking them up?
Haha :) funny output in the log :)

sde: sdc1 sdc2 sdc3
sd 2:0:1:0: [sdd] 234441648 512-byte logical blocks: (120 GB/111 GiB)
sd 2:0:1:0: [sdd] Write Protect is off
sd 2:0:1:0: [sdd] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdd: sde1
sd 4:0:0:0: [sde] Attached SCSI disk
sdb1
sd 1:0:0:0: [sdb] Attached SCSI disk
sdd1 sdd2 sdd3
.....

Ok, now to the real problem: /dev/sda might be recognized as a raid member as a whole disk (without partitions). In that case all its partitions are removed.

Please provide the output of:

# blkid -o udev -p /dev/sda
# blkid -o udev -p /dev/sde

ID_FS_TYPE should not be "linux_raid_member" or "isw_raid_member". But in your case, I suspect it is.
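The requested check can also be scripted over several disks at once. A minimal sketch, with a stubbed `probe` function standing in for `blkid -o udev -p` so it runs anywhere (on the real system, replace the stub body with the actual blkid call; the sample values mirror the ID_FS_* lines blkid prints):

```shell
#!/bin/sh
# Stub standing in for `blkid -o udev -p "$1"` -- swap in the real call
# on the affected machine. The sde output here is a hypothetical sample.
probe() {
    case "$1" in
        /dev/sde) printf 'ID_FS_TYPE=linux_raid_member\nID_FS_USAGE=raid\n' ;;
        *)        : ;;   # no signature found on the whole disk
    esac
}

for d in /dev/sda /dev/sde; do
    if probe "$d" | grep -q '^ID_FS_TYPE=linux_raid_member$'; then
        echo "$d: whole-disk raid signature (this would hide the partitions)"
    else
        echo "$d: clean"
    fi
done
```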
I've done them all --

%%%%%%%%%%%%%%%%%%%%%%%% /dev/sda
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sdb
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sdc
ID_FS_VERSION=0.90.0
ID_FS_UUID=16117fc1-adc8-2110-37a6-da05623d0241
ID_FS_UUID_ENC=16117fc1-adc8-2110-37a6-da05623d0241
ID_FS_TYPE=linux_raid_member
ID_FS_USAGE=raid
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sdd
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sde
ID_FS_VERSION=0.90.0
ID_FS_UUID=16117fc1-adc8-2110-37a6-da05623d0241
ID_FS_UUID_ENC=16117fc1-adc8-2110-37a6-da05623d0241
ID_FS_TYPE=linux_raid_member
ID_FS_USAGE=raid
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sdf
ID_FS_UUID=zH6jW8-19fR-rPNz-069z-IhmZ-fKoh-vyqFnB
ID_FS_UUID_ENC=zH6jW8-19fR-rPNz-069z-IhmZ-fKoh-vyqFnB
ID_FS_VERSION=LVM2\x20001
ID_FS_TYPE=LVM2_member
ID_FS_USAGE=raid
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sdg
%%%%%%%%%%%%%%%%%%%%%%%% /dev/sdh

The drives/partitions that are actually raided are:

Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid6 sdc1[0] sdh1[4] sdf1[3] sde1[2] sdd1[1]
      1465150464 blocks super 1.1 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
md0 : active raid1 sda3[0] sdb3[1]
      116430976 blocks [2/2] [UU]
unused devices: <none>

Here are the partition tables:

Disk /dev/sda: 120.0 GB, 120033041920 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000cf695

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          33      265041   83  Linux
/dev/sda2              34          98      522112+  82  Linux swap / Solaris
/dev/sda3              99       14593   116431087+  fd  Linux raid autodetect

Disk /dev/sdb: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0007e0d2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          33      265041   83  Linux
/dev/sdb2              34          98      522112+  82  Linux swap / Solaris
/dev/sdb3              99       14593   116431087+  fd  Linux raid autodetect

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xa66accf0

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sde: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdf: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x9e49d4c1

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1      121601   976760001   8e  Linux LVM

Disk /dev/sdh: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xa194b1dc

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1               1       60801   488384001   fd  Linux raid autodetect

So, yes, I think you are right -- the whole drive is listed as a raid member instead of the partition.

As an aside -- isn't that the right way to do RAID on Linux, to make a partition and assign it a partition type? It seemed the safest way to keep something else from mucking with it. Or is whole-disk the recommended way to do it now? I guess I could have done whole-disk RAID with a whole-disk LVM on top.
Personally, I would have done the raid with the whole disk...

OK, reassigning to util-linux-ng... I think Karel already has a fix for blkid.
*** This bug has been marked as a duplicate of bug 543749 ***