Bug 496258 - F10: Anaconda only works for simple setups.
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Radek Vykydal
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-04-17 10:55 EDT by Gerry Reno
Modified: 2010-01-20 13:40 EST
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2010-01-20 13:40:17 EST


Attachments
tarball of /tmp from failed preupgrade on machine with multiple raid arrays and VGs (35.03 KB, application/x-compressed-tar), 2009-04-23 03:38 EDT, Gerry Reno
device mapping information for machine (4.11 KB, text/plain), 2009-04-23 04:08 EDT, Gerry Reno
anaconda.log where anaconda Rescue Mode failed to assemble raid arrays (9.61 KB, text/plain), 2009-04-24 14:59 EDT, Gerry Reno
syslog where anaconda Rescue Mode failed to assemble raid arrays (60.10 KB, text/plain), 2009-04-24 15:00 EDT, Gerry Reno

Description Gerry Reno 2009-04-17 10:55:42 EDT
Description of problem:
In our servers with multiple raid arrays and multiple volume groups, anaconda does not see the current setup correctly.  When we select Custom Setup at the partitioning stage and then view what anaconda thinks the current setup is, it always gets it wrong.  The raid arrays are there, but only one volume group is shown despite there being four volume groups.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Run anaconda on a system with a non-trivial setup (multiple raid arrays and multiple volume groups)
2.
3.
  
Actual results:
Current setup is not identified correctly.

Expected results:
Current setup is identified correctly.

Additional info:
Comment 1 Chris Lumens 2009-04-17 11:31:01 EDT
Given the number of changes made to the storage code since F10, bugs filed against the partitioning code for that release aren't all that helpful.  If you could please verify whether or not you see the same bugs in the current code, that would be more useful in making sure they don't end up in the final release.
Comment 2 Gerry Reno 2009-04-17 12:23:44 EDT
I went looking for an iso for rawhide or rc.  I could not locate one.  Do you have a link for this?
Comment 3 Chris Lumens 2009-04-17 12:56:04 EDT
There aren't ISOs for rawhide, since that would mean creating and distributing several extra gigs of data every single day.  The RC has not yet been released.
Comment 4 Gerry Reno 2009-04-17 13:10:23 EDT
I tried to test this from the rawhide boot.iso.  But it refuses to find the
kickstart file.  I have another bug opened on that issue as well.
Comment 5 Gerry Reno 2009-04-18 18:43:41 EDT
Here is what anaconda needs to support and which I have never seen it able to correctly support:

Multiple raid arrays of any type, including arrays with spare disks, and a proper GRUB install of the bootloader (which means NOT trying to install it on one of the spare disks, which are totally blank, like anaconda was trying to do with the F9 installer).  I have a previous bug open on that issue.

Multiple volume groups over all types of raid arrays and partitions.  Currently it's hit or miss as to whether you get a proper display of the system volume groups in the partitioning screen.

Respect for the 'ignoredisk' command and proper tracking of any raid array or volume group of which the ignored disks are a member.  And subsequently NOT trying to activate arrays or volume groups with non-redundant members that have been ignored.  If there are still enough members to validly start the array then start it; otherwise do not attempt to start it.  And tell the user why an array could not be started (because anaconda was asked to ignore non-redundant members).

Not getting confused when there are existing old metadata blocks from old arrays and old volume groups on a disk.
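
For illustration, the kind of layout described above would look roughly like this in kickstart form (a sketch only; the disk names, sizes, and volume group names here are hypothetical, not taken from this machine):

# RAID1 with a spare for /boot, plus two more RAID1 arrays carrying one PV each
part raid.11 --size=200 --ondisk=sdd
part raid.12 --size=200 --ondisk=sde
part raid.13 --size=200 --ondisk=sdb
raid /boot --level=1 --device=md0 --spares=1 raid.11 raid.12 raid.13
part raid.21 --size=30000 --ondisk=sdd
part raid.22 --size=30000 --ondisk=sde
raid pv.01 --level=1 --device=md1 raid.21 raid.22
volgroup vg_sys pv.01
logvol / --vgname=vg_sys --size=20000 --name=root
part raid.31 --size=200000 --ondisk=sdd
part raid.32 --size=200000 --ondisk=sde
raid pv.02 --level=1 --device=md2 raid.31 raid.32
volgroup vg_data pv.02
logvol /srv --vgname=vg_data --size=150000 --name=srv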
Comment 6 Gerry Reno 2009-04-23 01:22:08 EDT
I just had a preupgrade failure where /boot is mounted on an mdraid array (/dev/md0).  The problem was that it could not find the kickstart file.  When I checked the UUID it was using to find the kickstart file, it was assigned to every element of the /dev/md0 array.  The problem is that the first element in the blkid list happens to be the spare disk for this array, which has raid metadata on it but is otherwise totally blank.  Anaconda must have tried to find the kickstart file on this blank disk and failed.  Anaconda should only be looking at working array elements, not spare elements.
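
A rough way to see this from a shell: blkid reports the same raid-member UUID for every element of the array, including the blank spare, while the spare can be told apart by the role recorded in its own superblock (a sketch; device names assumed from the attachments below):

blkid /dev/sdb1 /dev/sdd1 /dev/sde1    # same UUID reported for the active members and the spare alike
mdadm -E /dev/sdb1 | grep '^this'      # on the spare, this line reports the device's role as "spare"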
Comment 7 Gerry Reno 2009-04-23 03:38:34 EDT
Created attachment 340880 [details]
tarball of /tmp from failed preupgrade on machine with multiple raid arrays and VGs

In the logs I can see that invalid arguments are apparently being passed to mdadm, which causes the raid arrays not to assemble.  That in turn means the volume groups cannot be activated and therefore there are no filesystems, and of course then the error message that anaconda was not able to find the root of the existing installation.
Comment 8 Gerry Reno 2009-04-23 04:08:26 EDT
Created attachment 340891 [details]
device mapping information for machine
Comment 9 Gerry Reno 2009-04-23 11:14:49 EDT
If you look in program.log you will see that mdadm is complaining about creating a device called "/dev/md/0".  md devices are named like /dev/md0 and not /dev/md/0.  So there must be some typo in the code.
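
One caveat worth checking before assuming a typo: newer mdadm releases also create directory-style names under /dev/md/, normally as symlinks to the classic /dev/mdN nodes, so /dev/md/0 may be a legitimate name on the install image. A quick way to inspect this (sketch):

ls -l /dev/md[0-9]* /dev/md/ 2>/dev/null   # /dev/md/0, if present, is usually a symlink to a /dev/mdN node
cat /proc/mdstat                           # the kernel still reports the arrays as mdN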
Comment 10 Gerry Reno 2009-04-24 14:59:53 EDT
Created attachment 341241 [details]
anaconda.log where anaconda Rescue Mode failed to assemble raid arrays

On our systems with raid arrays, anaconda fails to assemble the arrays.
Comment 11 Gerry Reno 2009-04-24 15:00:57 EDT
Created attachment 341242 [details]
syslog where anaconda Rescue Mode failed to assemble raid arrays
Comment 12 Bill Nottingham 2009-05-28 23:02:51 EDT
That anaconda.log is from F10 anaconda; can you attach one from F11?
Comment 13 Bill Nottingham 2009-05-28 23:03:20 EDT
Erm, never mind. Fixing bug version instead.
Comment 14 Radek Vykydal 2009-06-09 11:32:47 EDT
Here is what I found interesting in the logs:

From F11 preupgrade syslog (comment #7):

<6>md: bind<sda1>   
<6>md: bind<sdb1>   
<6>md: bind<sdb2>
<6>md: bind<sdb3>
<6>md: bind<sdb5>
<6>md: bind<sdc1>
<4>md: sdd1 has same UUID but different superblock to sdb1
<4>md: sdd1 has different UUID to sdb1
<6>md: export_rdev(sdd1)
<4>md: sdd2 has same UUID but different superblock to sdb2
<4>md: sdd2 has different UUID to sdb2
<6>md: export_rdev(sdd2) 
<4>md: sdd3 has same UUID but different superblock to sdb3
<4>md: sdd3 has different UUID to sdb3
<6>md: export_rdev(sdd3)
<4>md: sde1 has same UUID but different superblock to sdb1
<4>md: sde1 has different UUID to sdb1
<6>md: export_rdev(sde1)
<4>md: sde2 has same UUID but different superblock to sdb2
<4>md: sde2 has different UUID to sdb2
<6>md: export_rdev(sde2)
<4>md: sde3 has same UUID but different superblock to sdb3
<4>md: sde3 has different UUID to sdb3
<6>md: export_rdev(sde3)
<6>md: bind<sdg1>
<6>md: bind<sdh1>
<6>md: bind<sdi1>
<4>md: kicking non-fresh sdb5 from array!
<6>md: unbind<sdb5>
<6>md: export_rdev(sdb5)

from F11 preupgrade program.log (comment #7):

Running... ['mdadm', '--incremental', '--quiet', '/dev/sdd1']
mdadm: failed to add /dev/sdd1 to /dev/md/1: Invalid argument.
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sdd2']
mdadm: failed to add /dev/sdd2 to /dev/md/2: Invalid argument.
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sdd3']
mdadm: failed to add /dev/sdd3 to /dev/md/0: Invalid argument.
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sde1']
mdadm: failed to add /dev/sde1 to /dev/md/1: Invalid argument.
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sde2']
mdadm: failed to add /dev/sde2 to /dev/md/2: Invalid argument.
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sde3']
mdadm: failed to add /dev/sde3 to /dev/md/0: Invalid argument.

It is exactly sdd1, sdd2, sdd3, sde1, sde2, and sde3 that fail when assembled
incrementally (md0, md1, and md2, to which they should belong, had already been
created by incremental assembly of sdb1, sdb2, and sdb3).
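
For context, that failing path corresponds roughly to running the following by hand (a sketch using the device names from the log):

mdadm --incremental --quiet /dev/sdb1   # first member seen for this UUID: the array gets created from it
mdadm --incremental --quiet /dev/sdd1   # later member whose superblock disagrees: fails with "Invalid argument"
cat /proc/mdstat                        # the array is left assembled only from the members that were accepted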

From storage.log, the UUIDs that udev reports are the same for {sdb1,sdd1,sde1},
for {sdb2,sdd2,sde2}, and also for {sdb3,sdd3,sde3}, so it seems the superblocks
of the md0, md1, and md2 members are inconsistent (probably due to the sdb members'
superblocks being corrupted?).
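
One way to check that directly would be to compare the superblock fields across the
members of one array (a sketch; device names taken from the logs):

for dev in /dev/sdb1 /dev/sdd1 /dev/sde1; do
    echo "== $dev"
    mdadm -E "$dev" | grep -E 'UUID|Preferred Minor|Update Time|Events'
done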

Also, the preferred minors obtained for the mdX arrays by mdadm -E (anaconda does this
when the first member of a given uuid is to be added to an array), if they are obtained
from the sdb members, differ from what you have in the raid dump info of your system
in comment #8.  From program.log (comment #7):

Running... ['mdadm', '--examine', '--brief', '/dev/sda1']
ARRAY /dev/md3 level=raid5 num-devices=5 UUID=aa3c026d:100e38a3:94e047ac:3cdc7536
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sda1']
Running... ['mdadm', '--examine', '--brief', '/dev/sdb1']
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=386279ca:d5568110:b59db45b:4877c7dd
   spares=1
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sdb1']
Running... ['mdadm', '--examine', '--brief', '/dev/sdb2']
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=e3c3a593:75f3d840:8b37f13b:9fe78bbf
   spares=1
Running... ['udevadm', 'settle', '--timeout=10']
Running... ['mdadm', '--incremental', '--quiet', '/dev/sdb2']
Running... ['mdadm', '--examine', '--brief', '/dev/sdb3']
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=1ca9683a:3a2d4b76:4f587a38:014a65cb
   spares=1


The F10 rescue log from comment #10 shows the same inconsistency of superblock info
between the sdbX members and the other members:

18:46:16 INFO    : mdadm -E /dev/sda1
18:46:17 INFO    : mdadm -E /dev/sdb1
18:46:17 INFO    : mdadm -E /dev/sdb2
18:46:17 INFO    : mdadm -E /dev/sdb3
18:46:17 ERROR   : raid set inconsistency for md3: found members of multiple raid sets that claim to be md3.  Using only the first array found.

... conflict with /dev/sda1 being member of md3 (sdb3 should be member of md0)

18:46:17 INFO    : mdadm -E /dev/sdb5
18:46:17 ERROR   : raid set inconsistency for md3: all drives in this raid set do not agree on raid parameters.  Skipping raid device

... conflict with /dev/sda1 being member of md3
    and similar:

18:46:17 INFO    : mdadm -E /dev/sdc1
18:46:17 INFO    : mdadm -E /dev/sdd1
18:46:17 ERROR   : raid set inconsistency for md0: all drives in this raid set do not agree on raid parameters.  Skipping raid device
18:46:17 INFO    : mdadm -E /dev/sdd2
18:46:17 ERROR   : raid set inconsistency for md1: all drives in this raid set do not agree on raid parameters.  Skipping raid device
18:46:17 INFO    : mdadm -E /dev/sdd3
18:46:17 ERROR   : raid set inconsistency for md2: all drives in this raid set do not agree on raid parameters.  Skipping raid device
18:46:17 INFO    : mdadm -E /dev/sde1
18:46:17 ERROR   : raid set inconsistency for md0: all drives in this raid set do not agree on raid parameters.  Skipping raid device
18:46:17 INFO    : mdadm -E /dev/sde2
18:46:17 ERROR   : raid set inconsistency for md1: all drives in this raid set do not agree on raid parameters.  Skipping raid device
18:46:17 INFO    : mdadm -E /dev/sde3
18:46:17 ERROR   : raid set inconsistency for md2: all drives in this raid set do not agree on raid parameters.  Skipping raid device
18:46:17 INFO    : mdadm -E /dev/sdf1
18:46:17 INFO    : mdadm -E /dev/sdg1
18:46:17 INFO    : mdadm -E /dev/sdh1


Does your system boot correctly with the removed spares? Are there no similar messages in syslog? Well, probably not, as the F10 rescue syslog doesn't have them either...
Can you attach the output of mdadm -E for the /dev/sdbX members, please?
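
Something along these lines would collect it (a sketch; adjust the partition list as needed):

for dev in /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb5; do
    echo "== $dev"
    mdadm -E "$dev"
done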
Comment 15 Gerry Reno 2009-06-09 13:20:02 EDT
I have since gone and manually straightened out all the superblocks on these devices which I think were corrupted by previous failed install attempts using prior versions of anaconda.

And here is the mdadm output you requested:


mdadm: No md superblock detected on /dev/sdb4.
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 386279ca:d5568110:b59db45b:4877c7dd
  Creation Time : Sun May  4 15:26:21 2008
     Raid Level : raid1
  Used Dev Size : 192640 (188.16 MiB 197.26 MB)
     Array Size : 192640 (188.16 MiB 197.26 MB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0

    Update Time : Thu Apr 16 01:18:18 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 47018549 - correct
         Events : 8


      Number   Major   Minor   RaidDevice State
this     2       8       17        2      spare   /dev/sdb1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       65        1      active sync   /dev/sde1
   2     2       8       17        2      spare   /dev/sdb1
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : e3c3a593:75f3d840:8b37f13b:9fe78bbf
  Creation Time : Sun May  4 15:26:21 2008
     Raid Level : raid1
  Used Dev Size : 32001408 (30.52 GiB 32.77 GB)
     Array Size : 32001408 (30.52 GiB 32.77 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 1

    Update Time : Thu Apr 16 08:38:09 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : c1efcd5c - correct
         Events : 38


      Number   Major   Minor   RaidDevice State
this     2       8       18        2      spare   /dev/sdb2

   0     0       8       50        0      active sync   /dev/sdd2
   1     1       8       66        1      active sync   /dev/sde2
   2     2       8       18        2      spare   /dev/sdb2
/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1ca9683a:3a2d4b76:4f587a38:014a65cb
  Creation Time : Sun May  4 15:26:21 2008
     Raid Level : raid1
  Used Dev Size : 211945472 (202.13 GiB 217.03 GB)
     Array Size : 211945472 (202.13 GiB 217.03 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 2

    Update Time : Thu Apr 16 01:18:18 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ef4bd4f2 - correct
         Events : 3528


      Number   Major   Minor   RaidDevice State
this     2       8       19        2      spare   /dev/sdb3

   0     0       8       51        0      active sync   /dev/sdd3
   1     1       8       67        1      active sync   /dev/sde3
   2     2       8       19        2      spare   /dev/sdb3
/dev/sdb5:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : aa3c026d:100e38a3:94e047ac:3cdc7536
  Creation Time : Sun May  4 15:26:21 2008
     Raid Level : raid5
  Used Dev Size : 244131648 (232.82 GiB 249.99 GB)
     Array Size : 976526592 (931.29 GiB 999.96 GB)
   Raid Devices : 5
  Total Devices : 6
Preferred Minor : 3

    Update Time : Thu Apr 16 01:18:18 2009
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1
       Checksum : d5c53e6c - correct
         Events : 46

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       21        5      spare   /dev/sdb5

   0     0       8       81        0      active sync   /dev/sdf1
   1     1       8       97        1      active sync   /dev/sdg1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       33        4      active sync   /dev/sdc1
   5     5       8       21        5      spare   /dev/sdb5
Comment 16 Radek Vykydal 2009-06-10 08:53:48 EDT
Thanks for the info. So after straightening out the preferred minor numbers, were you able to do the preupgrade?
It seems that this bug is about handling of removed spare members with corrupted superblocks, i.e. superblocks with mismatching preferred minor numbers. I am not sure we should support installs or upgrades on such setups; perhaps we should at least warn the user about the failure in the UI and offer to stop the installation.

In your case, the kickstart ignoredisk command (excluding sdb) could be a workaround alternative to fixing the superblocks, but this would work for install and upgrade, probably not for preupgrade.
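
For reference, the workaround would be a single kickstart directive along these lines (a sketch; --drives is the usual spelling of the option):

ignoredisk --drives=sdb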
Comment 17 Gerry Reno 2009-06-10 10:12:05 EDT
In my view, anaconda should not be relying on preferred minor numbers at all.  The preferred minor only indicates which array a device was part of at the time the array was created, nothing more.  That device could since have been moved and made part of any number of other arrays.  In fact I use a pooling approach to spare disks, and I can assign any member of the pool to any array, so the preferred minors are almost guaranteed not to match.

In the case of upgrade in particular, anaconda has a very difficult time reassembling the setup, due to its reliance on these preferred minors.  Instead, anaconda should probably reassemble the setup the same way the system does when it boots, that is, with 'mdadm -A -s', which scans the devices for the correct array members and then assembles the arrays from them.  Whenever anaconda refused to properly assemble the setup, I could always boot straight into rescue, issue 'mdadm -A -s', and the setup was immediately assembled correctly.
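
In other words, the suggested assembly path is the scan-based one (a sketch):

mdadm -A -s       # same as 'mdadm --assemble --scan': read the superblocks and assemble matching members
cat /proc/mdstat  # confirm the mdN arrays came up with the expected members
vgchange -ay      # then activate the volume groups on top of them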
Comment 18 Joel Andres Granados 2009-09-09 08:02:07 EDT
Is this still relevant in F12 Alpha?
There has been quite a lot of work on md devices.  A retest would be much appreciated.
Comment 19 Bug Zapper 2009-11-16 04:56:01 EST
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
