Bug 545689

Summary: dracut starts volume group when not all PV's are present
Product: [Fedora] Fedora
Component: dracut
Version: 12
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: low
Assignee: Harald Hoyer <harald>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Reporter: Ed Lally <fedora>
CC: harald, hdegoede, jburke, vanmeeuwen+fedora
Doc Type: Bug Fix
Last Closed: 2009-12-17 08:29:05 UTC

Attachments:
    anaconda.log from "install" startup
    program.log from "install" startup
    storage.log from "install" startup
    syslog from "install" startup
    anaconda.log from "rescue" startup
    program.log from "rescue" startup
    storage.log from "rescue" startup
    syslog from "rescue" startup
    mdadm.conf
    output from "mdadm --detail"
    output from "mdadm --examine"

Description Ed Lally 2009-12-09 05:25:13 UTC
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 GTB6 (.NET CLR 3.5.30729)

I upgraded from F10 to F12 using preupgrade. I ran into an issue with /boot being too small and used the workaround documented in the Common Bugs page.  The upgrade itself completed with no errors, but I'm unable to boot afterward.

Symptoms:
   1. Grub starts, the initramfs loads, and the system begins to boot.
   2. After a few seconds I get buffer I/O error messages on low-numbered blocks (usually 0-9) of certain dm devices.
   3. An error appears from device-mapper saying it couldn't read an LVM snapshot's metadata.
   4. I get the message to press "I" for interactive startup.
   5. udev loads and the system tries to mount all filesystems.
   6. Errors appear stating that various LVM volumes couldn't be mounted and that LVM refuses to activate partially available volumes.
   7. Startup fails due to the mount failures, the system reboots, and the steps repeat.


Reproducible: Didn't try




Troubleshooting done:

    * I have tried to run preupgrade again (the entry is still in my grub.conf file). The upgrade environment boots, but it fails to find the LVM devices and starts me on the panel to name my machine, just like for a fresh install.
    * I also tried booting from the full install DVD, but I get the same effect.
    * Suspecting that the XFS drivers weren't being included, I ran dracut to create a new initramfs, making sure the XFS module was included (a rough sketch of the command is below this list).
    * I have loaded the preupgrade environment and stopped at the initial GUI splash screen to get to a shell prompt. From there I can successfully assemble the RAID arrays, activate the volume group, and mount all volumes -- all my data is still intact (yay!).
    * I've run lvdisplay to check the LVM volumes, and most (all?) appear to have different UUIDs than what was in /etc/fstab before the upgrade -- not sure if preupgrade or a new LVM release somehow changed the UUIDs. /etc/fstab still had the old UUIDs.
    * I have modified my root partition's /etc/fstab to call the LVM volumes by name instead of UUID, but the problem persists (I made sure to update the initramfs as well).
    * Given the device-mapper and I/O errors above, I suspect that either RAID or LVM isn't starting up properly, especially since this has been a problem in prior releases.
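
For reference, the initramfs rebuild was roughly along these lines -- the exact dracut invocation isn't recorded here, so treat the options as an approximation:

# force a rebuild of the initramfs for the running kernel, pulling in the xfs module
dracut --force --add-drivers xfs /boot/initramfs-$(uname -r).img $(uname -r)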


System details:

    * 4 500GB drives set up in two RAID 1 pairs using software RAID
    * Gigabyte P35 motherboard with Intel ICH9R SATA controller in AHCI mode (motherboard RAID functionality disabled)
    * FAT32 /boot partition RAIDed on the first drive pair as /dev/md0
    * Two LVM partitions -- one RAIDed on the second drive pair and one on the remainder of the first drive pair as /dev/md1 and /dev/md2
    * Each LVM partition is a separate physical volume in a single volume group
    * Root and other filesystems are in LVM; most (including /) are formatted in XFS.

Smolt record: http://smolt.fedoraproject.org/show?uuid=pub_0ce0aded-8647-4ccb-89bb-f8d52109ea77

Comment 1 Ed Lally 2009-12-09 05:30:15 UTC
I've made some progress in diagnosing the issue. The failure is happening because the third RAID array (md2) isn't being assembled at startup. That array contains the second physical volume in the LVM volume group, so if it doesn't start, several mount points can't be found and others are not mounted because LVM detects they're only partially available.

The RAID array is listed in my /etc/mdadm.conf file as /dev/md2 and identified there by its UUID, but the Fedora 12 installer won't detect it by default even though it detects the md0 and md1 RAID devices. Booting the DVD in rescue mode does allow the filesystems to be detected and mounted, but the RAID device comes up as /dev/md127 instead of /dev/md2.  /dev/md127 is using the correct UUID, so it appears part or all of /etc/mdadm.conf is being ignored.
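
For reference, a rough sketch of the relevant mdadm.conf entry (the UUID shown is the one mdadm reports for the partition-based md2 elsewhere in this bug) and how such lines are generated; when an array has no matching entry in the mdadm.conf that the assembling environment sees, mdadm typically auto-names it counting down from md127, which would explain the /dev/md127 in rescue mode:

ARRAY /dev/md2 UUID=3f687406:af897037:e8d8f881:cd575d16

# ARRAY lines like the above can be regenerated from the on-disk superblocks with:
mdadm --examine --scan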

Comment 2 Hans de Goede 2009-12-09 08:36:24 UTC
Ed,

It seems there are 2 issues here:

1) Your system no longer boots properly
2) Anaconda can no longer find your Fedora installation to upgrade it.

As these 2 are likely related, let's start with debugging 2. Can you please
start F-12 anaconda again on your system and then, once it asks for your hostname,
switch to the shell on tty2 and from there collect (using scp, for example) all
the log files under /tmp and attach them here?

Did I understand you correctly that when using rescue mode anaconda does find
and mount your Fedora installation for you?

In that case, can you also boot into rescue mode and once more collect the
log files under /tmp?

Also, a copy of your /etc/mdadm.conf would be good to have, and can you
(after manually assembling it if needed) do:
mdadm --detail /dev/md2

And also, on the md2 members, run:
mdadm --examine /dev/sdx#

And paste / attach the output of these commands here.
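
If md2 is not assembled at that point, manual assembly would look something like this (the member partitions sdc1/sdd1 are a guess based on your description and may differ on your system):

mdadm --assemble /dev/md2 /dev/sdc1 /dev/sdd1
mdadm --detail /dev/md2
mdadm --examine /dev/sdc1
mdadm --examine /dev/sdd1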


Thanks,

Hans

Comment 3 Ed Lally 2009-12-09 22:29:29 UTC
Created attachment 377320 [details]
anaconda.log from "install" startup

Comment 4 Ed Lally 2009-12-09 22:29:56 UTC
Created attachment 377321 [details]
program.log from "install" startup

Comment 5 Ed Lally 2009-12-09 22:30:20 UTC
Created attachment 377322 [details]
storage.log from "install" startup

Comment 6 Ed Lally 2009-12-09 22:30:42 UTC
Created attachment 377323 [details]
syslog from "install" startup

Comment 7 Ed Lally 2009-12-09 22:31:09 UTC
Created attachment 377324 [details]
anaconda.log from "rescue" startup

Comment 8 Ed Lally 2009-12-09 22:31:34 UTC
Created attachment 377325 [details]
program.log from "rescue" startup

Comment 9 Ed Lally 2009-12-09 22:31:57 UTC
Created attachment 377326 [details]
storage.log from "rescue" startup

Comment 10 Ed Lally 2009-12-09 22:32:15 UTC
Created attachment 377327 [details]
syslog from "rescue" startup

Comment 11 Ed Lally 2009-12-09 22:32:45 UTC
Created attachment 377328 [details]
mdadm.conf

Comment 12 Ed Lally 2009-12-09 22:33:15 UTC
Created attachment 377329 [details]
output from "mdadm --detail"

Comment 13 Ed Lally 2009-12-09 22:33:40 UTC
Created attachment 377330 [details]
output from "mdadm --examine"

Comment 14 Ed Lally 2009-12-09 22:34:57 UTC
Hi Hans,

I've collected the logs and attached them.  Please let me know if you need anything else.  Thanks for your help!

Regards,

Ed

Comment 15 Hans de Goede 2009-12-10 16:28:16 UTC
Ed,

Many thanks for all the logs, but I'm afraid I'm still clueless.

Can you please also run:
mdadm --examine --brief /dev/sdc1 /dev/sdd1

And also run:
mdadm --examine /dev/md0-members
mdadm --examine --brief /dev/md0-members
mdadm --detail /dev/md0

And maybe also the same for md1?

Thanks,

Hans

Comment 16 Ed Lally 2009-12-11 01:09:01 UTC
Hans,

Here's the info you requested.  Thanks again for your help.

# mdadm --examine --brief /dev/sdc1 /dev/sdd1
ARRAY /dev/md2 UUID=3f687406:af897037:e8d8f881:cd575d16

#
# /dev/md0
#
# mdadm --examine /dev/sda2 /dev/sdb2
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 978ff628:b4110833:64237072:aa66548e
  Creation Time : Sat Sep 15 21:10:30 2007
     Raid Level : raid1
  Used Dev Size : 192704 (188.22 MiB 197.33 MB)
     Array Size : 192704 (188.22 MiB 197.33 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 125

    Update Time : Thu Dec 10 19:41:02 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 95669d71 - correct
         Events : 16


      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 978ff628:b4110833:64237072:aa66548e
  Creation Time : Sat Sep 15 21:10:30 2007
     Raid Level : raid1
  Used Dev Size : 192704 (188.22 MiB 197.33 MB)
     Array Size : 192704 (188.22 MiB 197.33 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 125

    Update Time : Thu Dec 10 19:41:02 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 95669d83 - correct
         Events : 16


      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2

# mdadm --examine --brief /dev/sda2 /dev/sdb2
ARRAY /dev/md125 UUID=978ff628:b4110833:64237072:aa66548e

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Sep 15 21:10:30 2007
     Raid Level : raid1
     Array Size : 192704 (188.22 MiB 197.33 MB)
  Used Dev Size : 192704 (188.22 MiB 197.33 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Dec 10 19:41:02 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 978ff628:b4110833:64237072:aa66548e
         Events : 0.16

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

# mdadm --examine --brief /dev/sda3 /dev/sdb3
ARRAY /dev/md1 UUID=16676b83:d2a5b88b:82545cae:2f617233

#
# /dev/md1
#
# mdadm --examine /dev/sda3 /dev/sdb3
/dev/sda3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 16676b83:d2a5b88b:82545cae:2f617233
  Creation Time : Sun Sep 16 03:34:30 2007
     Raid Level : raid1
  Used Dev Size : 467218304 (445.57 GiB 478.43 GB)
     Array Size : 467218304 (445.57 GiB 478.43 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Thu Dec 10 19:18:25 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : f1d5630e - correct
         Events : 408


      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 16676b83:d2a5b88b:82545cae:2f617233
  Creation Time : Sun Sep 16 03:34:30 2007
     Raid Level : raid1
  Used Dev Size : 467218304 (445.57 GiB 478.43 GB)
     Array Size : 467218304 (445.57 GiB 478.43 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Thu Dec 10 19:18:25 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : f1d56320 - correct
         Events : 408


      Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3

# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sun Sep 16 03:34:30 2007
     Raid Level : raid1
     Array Size : 467218304 (445.57 GiB 478.43 GB)
  Used Dev Size : 467218304 (445.57 GiB 478.43 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Dec 10 19:18:25 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 16676b83:d2a5b88b:82545cae:2f617233
         Events : 0.408

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3

Comment 17 Ed Lally 2009-12-11 01:20:09 UTC
Hans,

I ran a diff on the log files and in install_storage.log, there's an info message:

[2009-12-09 16:57:42,898]     INFO: product Fedora version 12 found on vgraid-raidroot_xfs is not upgradable

So apparently anaconda is detecting the RAID arrays and LVM disks, but it's deciding that it can't upgrade them. I assume this is why it closes the RAID/LVM volumes and just displays the raw partitions.

Is there a way to tell why it made that decision?

Thanks,

Ed

Comment 18 Ed Lally 2009-12-11 04:18:01 UTC
Hans,

I was thinking about the observed behavior some more.  Could the system be trying to start LVM as soon as /dev/md1 is brought up, rather than waiting for /dev/md2?  That could explain why we don't see the second array being started when booting from the hard drive.  It still wouldn't explain why upgrade mode is treating it as a new install, though.

Thanks,

Ed

Comment 19 Hans de Goede 2009-12-11 08:05:15 UTC
Hi,

(In reply to comment #17)
> Hans,
> 
> I ran a diff on the log files and in install_storage.log, there's an info
> message:
> 
> [2009-12-09 16:57:42,898]     INFO: product Fedora version 12 found on
> vgraid-raidroot_xfs is not upgradable
> 
> So apparently anaconda is detecting the RAID arrays and LVM disks, but it's
> deciding that it can't upgrade them. I assume this is why it closes the
> RAID/LVM volumes and just displays the raw partitions.
> 
> Is there a way to tell why it made that decision?
> 

Because you are using the Fedora 12 installer and 12 > 12 does not hold true.

When you say:
> So apparently anaconda is detecting the RAID arrays and LVM disks, but it's
> deciding that it can't upgrade them. I assume this is why it closes the
> RAID/LVM volumes and just displays the raw partitions.

You mean that it only shows the raw disks at the screen where you can select whether you want to use entire disks / remove existing linux / use free space / custom?

That is normal; it only shows disks there. When you choose custom partitioning, it should show all your existing lvm and raid stuff; when you choose remove all or remove linux, pre-existing linux software raid and lvm will get removed.


As for the system-not-booting-properly issue, that is a real problem, which could be caused by mdadm, initrd, or initscripts issues.

Can you try booting with init=/bin/bash? This should give you a bash shell directly after the initrd is loaded. Then:

0.1) mount /proc, /sys, etc.
0.2) touch /dev/.in_sysinit
0.3) /sbin/start_udev

1) Do cat /proc/mdstat (to see what the initrd has done)

2) Run:
  /sbin/mdadm -As --auto=yes --run

3) Another cat /proc/mdstat

And paste the 2 /proc/mdstat outputs here?
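
Spelled out, the whole sequence would look roughly like this (the exact set of mounts needed may vary):

mount -t proc proc /proc
mount -t sysfs sysfs /sys
touch /dev/.in_sysinit
/sbin/start_udev
cat /proc/mdstat                    # what the initrd assembled
/sbin/mdadm -As --auto=yes --run    # assemble everything listed in mdadm.conf
cat /proc/mdstat                    # see what changed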

Also, when the fsck fails you should be able to get a shell; it would be interesting to get a /proc/mdstat from there too.

Comment 20 Ed Lally 2009-12-11 13:35:04 UTC
Hans,

I'm having a problem with the init parameter.  I'm trying to boot off the hard drive and I add init=/bin/bash to the end.  I'm getting the bash prompt, but it's after the system has already mounted the devices and pivoted from initrd to the new root.  Am I doing something wrong?

Thanks,

Ed

Comment 21 Hans de Goede 2009-12-11 14:19:38 UTC
(In reply to comment #20)
> Hans,
> 
> I'm having a problem with the init parameter.  I'm trying to boot off the hard
> drive and I add init=/bin/bash to the end.  I'm getting the bash prompt, but
> it's after the system has already mounted the devices and pivoted from initrd
> to the new root.  Am I doing something wrong?
> 

No, getting the shell after the pivot root is the entire idea here. As your fsck problem happens after the pivot root, I'm trying to gather information on why rc.sysinit (which runs after the pivot root) fails to online md2.

Regards,

Hans

Comment 22 Ed Lally 2009-12-11 15:00:17 UTC
(In reply to comment #21)
> No, getting the shell after the pivot root is the entire idea here, as your
> fsck problem happens after the pivot root, I'm trying to gather information why
> rc.sysinit (which runs after the pivot root) fails to online md2.
> Regards,
> Hans  

OK, I get it now.  The filesystem root is also in the LVM array, so steps 0.1-0.3 are already done for me by the time the pivot occurs.  Do you want me to just start at step 1?

From prior troubleshooting, I found that when rc.sysinit attempts to fsck, I get a series of warnings for the volumes that couldn't be started (because only part of the LVM volume group is available). The warnings tell me to use the "--partial" parameter to mount the volumes, and the system then automatically reboots, so I never get the shell mentioned at the end of comment #19.

Comment 23 Hans de Goede 2009-12-11 15:05:56 UTC
Ok,

So let me get this clear: the issue with booting is that the initrd has started some of the raid sets, which together are enough to start the Logical Volume holding /, but the third raid set is not started, and that means that another Logical Volume in the same Volume Group as / cannot start, because it is missing a PV that it needs.

IOW, the VG holding / and the filesystem that fails to fsck is being started by the initrd in an incomplete mode where it does not have all its PVs available?

Do I understand that correctly?

Regards,

Hans

Comment 24 Ed Lally 2009-12-11 15:24:07 UTC
(In reply to comment #23)
> Ok,
> So let me get this clear, the issue with booting is that the initrd has
> started some of the raidset's which together are enough to be able to start the
> Logical Volume holding /, but the third raid set is not started, and that means
> that another Logical Volume in the same Volume Group as / cannot start, because
> it is missing a PV which is needed for that LV.
> IOW the VG holding / and the filesystem failing to fsck, is being started by
> the initrd in an incomplete mode where it does not have all its PV's available
> ?
> Do I understand that correctly ?
> Regards,
> Hans  

You have it right.  /dev/md0 and /dev/md1 are being started by initrd.  /dev/md0 has the /boot partition and /dev/md1 has the first physical volume in the volume group.  Four complete logical volumes and several partial ones are on /dev/md1.  /dev/md2 (which is not being started) holds the second and final physical volume in the volume group.

Comment 25 Hans de Goede 2009-12-11 15:37:05 UTC
Ok,

So the problem is that the Volume Group gets started without all its PVs by our new initrd, called dracut. Moving this over to dracut.

Harald,

This bug is a bit of a long story, but the summary can be read in comment #22 and further.

Regards,

Hans

Comment 26 Ed Lally 2009-12-11 15:58:43 UTC
(In reply to comment #25)
> Ok,
> So the problem is that the VolumeGroup gets started without all it PV's by our
> new
> initrd called dracut, moving this over to dracut.
> Harald,
> This bug is a bit of a long story, but the summary can be read in comment #22
> and further.
> Regards,
> Hans  

Thanks Hans.  I appreciate all your help on this.

Comment 27 Ed Lally 2009-12-12 03:09:30 UTC
Harald,

Now that we've isolated this to dracut, I've been doing some research using the rdudevinfo parameter to get more details.  From dmesg it looks like the kernel hardware probe is detecting all disks.  When udev starts, it creates device nodes for the first two drives (Maxtor 6H500F0) and their partitions -- those drives are used to build /dev/md0 and /dev/md1.

The second set of drives (Seagate ST3500630AS) get device nodes for the drives themselves, but nodes are never created for the single partition on each.  I think this is why mdadm doesn't assemble the array and why LVM can't activate the volumes.
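
For anyone reproducing this, a rough sketch of the checks from the dracut shell (device names are taken from the probe above and may differ):

cat /proc/partitions         # does the kernel see sdc1 and sdd1 at all?
ls -l /dev/sdc* /dev/sdd*    # have the partition device nodes been created?
blkid -p /dev/sdc1           # what signature blkid/udev sees on the partition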

Any suggestions on how to resolve this?

Thanks,

Ed

Comment 28 Hans de Goede 2009-12-12 23:02:44 UTC
Ed,

Hmm, what does:
blkid -o udev -p /dev/sdX

give as output for the 2 disks of the md2 set?

Comment 29 Ed Lally 2009-12-13 02:33:15 UTC
# blkid -o udev -p /dev/sdc
ID_FS_VERSION=0.90.0
ID_FS_UUID=79f22f87-5247-feb4-d1e5-2b7e70cacba0
ID_FS_UUID_ENC=79f22f87-5247-feb4-d1e5-2b7e70cacba0
ID_FS_TYPE=linux_raid_member
ID_FS_USAGE=raid

# blkid -o udev -p /dev/sdd
ID_FS_VERSION=0.90.0
ID_FS_UUID=79f22f87-5247-feb4-d1e5-2b7e70cacba0
ID_FS_UUID_ENC=79f22f87-5247-feb4-d1e5-2b7e70cacba0
ID_FS_TYPE=linux_raid_member
ID_FS_USAGE=raid

Comment 30 Hans de Goede 2009-12-14 08:29:33 UTC
(In reply to comment #29)
> # blkid -o udev -p /dev/sdc
> ID_FS_VERSION=0.90.0
> ID_FS_UUID=79f22f87-5247-feb4-d1e5-2b7e70cacba0
> ID_FS_UUID_ENC=79f22f87-5247-feb4-d1e5-2b7e70cacba0
> ID_FS_TYPE=linux_raid_member
> ID_FS_USAGE=raid
> 
> # blkid -o udev -p /dev/sdd
> ID_FS_VERSION=0.90.0
> ID_FS_UUID=79f22f87-5247-feb4-d1e5-2b7e70cacba0
> ID_FS_UUID_ENC=79f22f87-5247-feb4-d1e5-2b7e70cacba0
> ID_FS_TYPE=linux_raid_member
> ID_FS_USAGE=raid  

Ah, and there we have our problem: there are also mdraid signatures for an older raid set on the entire disk, and dracut is probably finding these and trying to bring up the old raid set.

What you could try is
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd

Note the argument is the whole disk, not the partitions you are currently using.

I would back up first, though!
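
A rough sketch of the check-before-and-after, so you can confirm that only the stale whole-disk superblock goes away (device names as above):

mdadm --examine /dev/sdc     # should show the stale whole-disk superblock
mdadm --examine /dev/sdc1    # the real md2 member -- leave this one alone
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
blkid -o udev -p /dev/sdc    # afterwards this should no longer report linux_raid_member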

Comment 31 Ed Lally 2009-12-14 13:31:02 UTC
(In reply to comment #30)
> Ah and there we have our problem, there also are mdraid signatures for an
> older raid set on the entire disk, and dracut is probably finding these and
> trying to bring up the old raid set.

Hans,

Thanks for finding that, although I don't recall having put an array on the entire disk rather than a partition. In any case, why would dracut detect that when prior versions (and even the installer) didn't?

Is there a chance this is related to bug 543749?

Thanks,

Ed

Comment 32 Hans de Goede 2009-12-14 14:28:25 UTC
(In reply to comment #31)
> (In reply to comment #30)
> > Ah and there we have our problem, there also are mdraid signatures for an
> > older raid set on the entire disk, and dracut is probably finding these and
> > trying to bring up the old raid set.
> 
> Hans,
> 
> Thanks for finding that, although I don't recall having put an array on the
> entire disk rather than a partition.

Well, the metadata there has a different UUID, so I'm pretty sure it is separate metadata and not a misdetection of your current partition-based set.

> In any case, why would dracut detect that
> when prior versions (and even the installer) didn't?
> 

dracut is a pretty new, cross-distro replacement for the mkinitrd-generated initrd.
Anaconda is not finding this raid set because anaconda does not support using
a whole disk as a raid member; dracut, however, does.

> Is there a chance this is related to bug 543749?

No that seems to be a different bug.

Comment 33 Ed Lally 2009-12-16 04:52:48 UTC
(In reply to comment #32)

Hans,

Thanks again.  I'm backing up the full drive and will give this a try tomorrow evening.

Regards,

Ed

Comment 34 Ed Lally 2009-12-17 02:58:16 UTC
(In reply to comment #32)

Hans,

I zeroed the superblock and it worked, sort of.  I'm now getting an error where dmraid finds a Promise fakeraid superblock on the drives, causing dracut to still not see the underlying mdraid array.  That appears to be left over from an old Windows install on the box.  I've tried removing the superblock with dmraid -rE, but dmraid reports that it doesn't find a disk to remove.

I have been able to temporarily work around this by adding rd_NO_DM to the kernel startup parameters.

After I boot, I then get errors that some libraries (libfreebl3.so and libnssutil3.so) can't be found, causing a whole bunch of services to fail to start.  I realize that's not a problem in your area, but perhaps you have a recommendation.

Thanks again for ALL of your help!

Regards,

Ed

Comment 35 Hans de Goede 2009-12-17 08:29:05 UTC
(In reply to comment #34)
> (In reply to comment #32)
> 
> Hans,
> 
> I zeroed the superblock and it worked, sort of.  I'm now getting an error where
> dmraid finds a Promise fakeraid superblock on the drives, causing dracut to
> still not see the underlying mdraid array.  That appears to be left over from
> an old Windows install on the box.  I've tried removing the superblock with
> dmraid -rE, but dmraid reports that it doesn't find a disk to remove.
> 

Heh, stale metadata hell, how nice. Did you pass a disk as a parameter to
dmraid -rE? For example:
dmraid -rE /dev/sdc
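
If that still reports nothing to remove, it may help to first list what dmraid actually detects, roughly:

dmraid -r               # list raid member disks and the metadata format found on them
dmraid -rE /dev/sdc     # then erase the metadata on the named disk
dmraid -rE /dev/sdd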

> I have been able to temporarily work around this by adding rd_NO_DM to the
> kernel startup parameters.
> 

Yep, that should do the trick.

> After I boot, I then get errors that some libraries (libfreebl3.so and
> libnssutil3.so) can't be found, causing a whole bunch of services to fail to
> start.  I realize that's not a problem in your area, but perhaps you have a
> recommendation.
> 

Try running "yum upgrade", and after that "package-cleanup --problems". If the
latter still spots some issues, you'll have to fix them yourself.

Regards,

Hans


p.s.

As there really is nothing we can do about stale metadata issues, I'm going to close this one as not a bug.

Comment 36 Ed Lally 2009-12-17 16:02:24 UTC
(In reply to comment #35)
> Heh, stale metadata hell, how nice. Did you pass a disk as parameter to
> dmraid -rE, so for example:
> dmraid -rE /dev/sdc

I did, and had no luck.  I'm thinking I'll have to break the RAID mirror, zero out the drive, rebuild it, and repeat on the other side.  I'm starting to REALLY hate fakeraid!

> Try running "yum upgrade", and after that "package-cleanup --problems", if the
> last one still spots some issues, you'll have to fix them yourself.

Just an FYI: I got past that one, but it was a mess.  It turns out the upgrade log showed an error upgrading the libtdb package (see bug 520541), and a whole mess of packages failed to install.  ldconfig was throwing the error "/usr/lib/libtdb.so.1 is not a symbolic link".  I booted into rescue mode, copied libnssutil3.so and libfreebl3.so to my partition, and was able to get yum running enough to upgrade all the remaining RPMs.
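
(For anyone hitting the same ldconfig complaint: the usual fix is to recreate the symlink by hand or reinstall the package; the versioned filename below is a placeholder, so check what is actually installed first.)

ls /usr/lib/libtdb.so.1*                        # find the real versioned file, e.g. libtdb.so.1.x.y
ln -sf libtdb.so.1.x.y /usr/lib/libtdb.so.1     # replace 1.x.y with the actual version found
ldconfig
yum reinstall libtdb                            # alternative, if your yum supports reinstall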

Thank you again for all of your help!  I'm sorry to have used up your time on this.

Regards,

Ed