Bug 475024

Summary: duplicate label message in install with hardware defined software RAID
Product: Fedora
Component: anaconda
Version: 10
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: low
Reporter: Ray Todd Stevens <raytodd>
Assignee: Hans de Goede <hdegoede>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bloch, bobgus, hdegoede, raina
Doc Type: Bug Fix
Last Closed: 2009-02-13 08:10:08 UTC

Attachments:
    anaconda log from run
    yum log
    the syslog from the failed run

Description Ray Todd Stevens 2008-12-06 19:53:26 UTC
We are running a server with hardware RAID. I am assuming this is where the problem is. When we try to run an install, we get a message right after selecting upgrade instead of install that there are duplicate labels.

It then explains that this means that multiple pieces of hardware have the same label and that it cannot continue.

I think this may be because it is somehow seeing each disk of the RAID 1 hardware array, Symbios Logic I believe, as a separate drive. I can't get the system to upgrade because of this.

Comment 1 Ray Todd Stevens 2008-12-08 19:50:24 UTC
By the way, this is the same server for which I also reported bug 446845.

Comment 2 Chris Lumens 2008-12-08 21:49:30 UTC
Yes, it sounds to me like we are seeing your two hard drives as separate drives with separate filesystems, instead of all together as one RAID device.  Can you attach /tmp/anaconda.log and /tmp/syslog to this bug report?

Comment 3 Ray Todd Stevens 2008-12-08 22:32:42 UTC
I would assume that you are quite correct, although I will say that this machine was installed under Fedora 8 and upgraded to 9 with no problems. So something somewhere has changed.

Exactly where would I find these files? The system never mounted any volumes that I can find, so exactly what do I need to do here?

Comment 4 Ray Todd Stevens 2008-12-09 20:53:41 UTC
OK, figured it out.

Here is my setup right now:

/dev/mapper/VolGroup00-LogVol00
                     470762416 357141132  89322176  80% /
/dev/mapper/ddf1_4035305a8680c3272020202020202020eb4e47603a354a45p1
                        194442    175257      9146  96% /boot
tmpfs                  1036808        48   1036760   1% /dev/shm

Under fc9 I can see a logical partition table under both sda and sdb, but it works.

The RAID controller is a Symbios.

Comment 5 Ray Todd Stevens 2008-12-09 20:54:43 UTC
Created attachment 326400 [details]
anaconda log from run

There should be a simpler way of doing this.

Comment 6 Ray Todd Stevens 2008-12-09 20:55:30 UTC
Created attachment 326401 [details]
yum log

You didn't say you needed this, but I figured it couldn't hurt.

Comment 7 Ray Todd Stevens 2008-12-09 20:55:58 UTC
Created attachment 326402 [details]
the syslog from the failed run

Comment 8 Chris Lumens 2008-12-09 21:02:09 UTC
Yes, there should be a simpler way of doing this.  That's one of the things I'm working on for F11.

Comment 9 Ray Todd Stevens 2008-12-09 21:13:56 UTC
A couple of thoughts.   

I had to manually enable the network through ifconfig and then do an scp. Just having a simple way to get the network up would be a big help.

Also, how about the simple ability to use a flash drive? Maybe if the flash drive is connected and has a text file called dumpit in the root with a list of files, then anaconda at its termination, regardless of what that is, simply dumps those files to the flash drive.
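
Going back to the network bit: what I had to do by hand on tty2 looked roughly like this (the interface name, addresses, and target host are just examples, not exactly what I typed):

    # on tty2 of the installer (Ctrl+Alt+F2)
    ifconfig eth0 192.168.1.50 netmask 255.255.255.0 up
    route add default gw 192.168.1.1
    scp /tmp/anaconda.log /tmp/syslog root@192.168.1.10:/tmp/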

Comment 10 Chris Lumens 2008-12-09 21:21:54 UTC
Yep, those are all things I'm hoping to take care of with a little script you can run on tty2.  The idea is to run a script that'll collect all the information for you, bring up the network if needed, and save to an existing bug, make a new bug, or save to a local disk.  Of course, this plan could completely change during the course of working on it.  But that's what I'm hoping to do.

Comment 11 Ray Todd Stevens 2008-12-10 17:42:53 UTC
OK, I think I can see the problem in the log. Both sda and sdb are coming up. Now this appears to also happen in my fc9 regular boot, but the boot then is somehow smart enough to only try to mount the raid volume.

I wonder what is different?

Comment 12 Ray Todd Stevens 2008-12-12 22:15:36 UTC
Interestinger and interestinger. I played with this a little more and found three identical partition tables:

sda
sdb
dm-0

The RAID system seems to use a software-based system called dmraid.

Now, as I said, this worked with fc9 but not with fc10.
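
For what it's worth, this is roughly how I looked at it from the installer shell (output trimmed; device names may differ on other machines):

    fdisk -l /dev/sda /dev/sdb /dev/dm-0   # all three print the same partition table here
    dmraid -r                              # lists sda and sdb as members of the ddf1 set
    dmraid -s                              # reports the raid set itself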

Comment 13 Ray Todd Stevens 2008-12-20 19:27:18 UTC
OK, I have three machines, all with the same problem. All are using the dmraid stuff. This is almost definitely where the problem is.

Comment 14 Ray Todd Stevens 2008-12-20 19:34:30 UTC
In reading through the bugzilla bug reports, it appears that I am far from the only one experiencing this error. It appears that something has changed in the new fc10 OS: if a drive is part of a software RAID system that is not directly the Fedora software RAID system, it is now scanned and mounted along with the regular RAID device. This did not use to occur.

Comment 15 Ray Todd Stevens 2008-12-20 19:42:12 UTC
Could this be something that got broken again by the fix for one of my previous bugs?

I am thinking back to fc8 and fc9, where I had a problem that I filed a bug report on: I had some drives that had been part of a dmraid configuration, and they were then being used in a machine that didn't support dmraid. However, they still apparently had some markings on them somewhere, probably in the boot sector, and there was no way to install on them without putting "nodmraid" in the boot lines of the install. I have since found that I no longer need this parameter to do the install. Could this change be what is now causing this problem?
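
For the record, the old workaround was just an extra parameter typed at the installer boot prompt, something like:

    linux nodmraid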

Comment 16 Ray Todd Stevens 2008-12-20 20:38:19 UTC
OK, interesting: from what I can tell, one of the systems that is experiencing this problem doesn't have volume labels. It appears that it might be a conflict between lvm and dmraid (or other hardware-defined software RAID systems).

It seems to come back to the fact that the dmraid drive is found, but so are all of its components. All of them seem to be loaded, and then the problem starts. Somehow, being part of a hardware-defined software RAID set should exclude a drive from then being loaded as a regular drive. This is the way it appears to work in previous versions, but now these drives are loaded and treated as their own separate drives, which seems to be the problem.
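
To illustrate, blkid in this state shows something shaped like this (the label and filesystem values are illustrative; the mapper name is the one from my df output above):

    /dev/sda1: LABEL="/boot" TYPE="ext3"
    /dev/sdb1: LABEL="/boot" TYPE="ext3"
    /dev/mapper/ddf1_4035305a8680c3272020202020202020eb4e47603a354a45p1: LABEL="/boot" TYPE="ext3"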

It appears that just ignoring this message will not work, as in all probability an upgrade would then wipe the system out by destroying the RAID connection and installing only onto the first drive in the array.

PS: this is if you are using RAID 1, which is not the default. You default everything to RAID 0, which seems to me to be a less than wise choice. Then if one drive fails, both are useless; about 4 times the failure risk.

Comment 17 Ray Todd Stevens 2008-12-20 21:06:16 UTC
Well, I have found out some more.

First, this may or may not be a dmraid driver problem. The dm-0 drive is loaded first, before the sd drives. When I do an fdisk on it I get a corrupted partition table. From the log messages it looks like it should have loaded, but as I said the partition table is basically garbage. Then the sda and sdb drives are loaded. I wonder, if dm-0 had valid data (none of the other dm devices, dm-1 through dm-9, exist), whether the sd drives would in fact still load?

Also, as a side note, I did try to pull the logs and about anything else I could think of off the system and send them again, but I tried this with a flash drive. This flash drive loads fine on an fc9 machine in normal run mode, but instead of loading as /media/drive it loaded as /dev/sdc and nothing else, and it also had a corrupted (but differently corrupted) partition table.

More and more interesting.

Comment 18 Ray Todd Stevens 2008-12-20 21:13:20 UTC
One more note from my past experiences.  In looking through my system logs I found what may be the entire key here.

I had a system that I needed to do several things to. I was going to upgrade it to fc10 and then add some drives and redo the volume structure a little. However, I ran into this same problem. I decided to do the whole thing as one try. Not good. Even as a scratch install with totally new drives, the dmraid thing would not let me install. I just decided to forget the whole dmraid thing and go with Linux software RAID. This worked fine and so it is up, and I didn't file a report on it. But there seems to be something very broken with the dmraid stuff. From a quick look through the bugs here, it also appears that other hardware-defined software RAID systems are having problems with fc10.

This may be where to look for the problem.

I also have a set of drives with dmraid and fc9 ready to upgrade to fc10 that are set aside and not in use so I can test something if need be.

Comment 19 Adam Huffman 2009-01-12 11:25:16 UTC
I've just seen this on a Dell Precision 390.  I had been unable to upgrade to F9 owing to a problem with HAL and the hard drives on there.

There's a RAID1 array created using Intel ISW - Anaconda seems to be seeing both the array and one of the members, when claiming that there are multiple '/boot' labels.

Comment 20 Ray Todd Stevens 2009-01-12 15:39:34 UTC
I just had an offline discussion with Adam about his situation, and I think it is relevant, so I am going to post the important part of that discussion back here with additional comments.

On Mon, Jan 12, 2009 at 08:43:47AM -0500, Ray Todd Stevens wrote:
> I am curious. You say upgrading TO FC9. I am having this problem
> upgrading FC9 to FC10. Would you be upgrading to FC9 or FC10?
> 
> Also, I found one interesting thing: if I booted to rescue mode
> and looked at the partition tables of the drives, both drives still
> existed and had the same valid partition table, but the raid drive
> existed, claimed to have a partition table, and had garbage for that
> partition table. What is your experience with this?

Hello

The box was originally running Fedora 7 and had been stuck there because
of a separate Anaconda problem, namely that it crashed when parsing the
hard disk names.  That has finally been fixed in the F10 installer.

Anyway, I yum upgraded to F9 and then tried preupgrade to get to F10,
which is when I saw the message about duplicate /boot labels.

On this box, /dev/sda, /dev/sdb (the two component drives) and
/dev/mapper/isw_bedfhddfij_ARRAY all have valid partition tables.

Interesting. Yeah, I had this same fc7 problem on several boxes, including one that is experiencing this problem. I have tried the preupgrade approach and the full upgrade from all three types of disks, all with the same problem.

You will also note that I had the same problem with a scratch install.  This appears to be a problem related to the RAID system and is pretty well embedded.   

Incidentally it will properly install as a RAID 0 system, and if you install FC 9 on RAID 0 and then upgrade this works fine.   Now I do notice that for the full software raid system everything defaults to RAID 0.   I am not sure why one would actually use RAID 0 without any additional protection, but that is the default here, and I wonder if RAID 0 is the only thing normally tested.

I also note a number of other reports of this same set of symptoms in bugzilla, but attributed to other parts of the system and/or other sets of hardware. It might be good for someone to condense them into a single report, find out which part of the system is causing the problem, and assign it there.

Comment 21 Ray Todd Stevens 2009-01-15 00:40:10 UTC
Another possible clue: I have noticed that FC10 has some kind of a quirk where the subsystem that identifies disk drives doesn't seem to either complete or communicate with anaconda properly. There are a number of bug reports out there on this one too.

Comment 22 Bob Gustafson 2009-01-15 04:32:57 UTC
Check the info on Bug #474399

There seems to be some duplication.

Comment 23 Hans de Goede 2009-01-16 22:08:30 UTC
There are issues in F-10 with anaconda not seeing dmraid setups as a raid set but rather as 2 separate disks; that *might* be what is happening here. I've managed to reproduce and, I believe, fix this using a system with isw raid.
I've provided updates.img files for this here:
http://people.atrpms.net/~hdegoede/updates474399-i386.img
http://people.atrpms.net/~hdegoede/updates474399-x86_64.img

To use this with an i386 install using isw "hardware" raid type the following
at the installer bootscreen (press <tab> to get to the cmdline editor):
updates=http://people.atrpms.net/~hdegoede/updates474399-i386.img

For an x86_64 install use:
updates=http://people.atrpms.net/~hdegoede/updates474399-x86_64.img

Please let me know if this resolves the issue for you.
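
To double check that the raid set is then being assembled instead of 2 plain disks, something like the following on tty2 should do (device names will differ per machine):

    dmraid -s         # should list one isw (or ddf1) raid set
    ls /dev/mapper/   # the set and its partitions should show up here as single devices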

Comment 24 Raina Otoni 2009-01-19 15:35:40 UTC
Hello

Though my hardware is an Adaptec AAR-1220SA (dmraid ddf),
I have installed F10 without problems using your updates.img file.

Thanks

(In reply to comment #23)
> There are issues in F-10 with anaconda not seeing dmraid setups as a raid set
> but rather as 2 separate disks, that *might* be what is happening here. I've
> managed to reproduce and I believe fix this using a system with isw raid.
> I've provided updates.img files for this here;
> http://people.atrpms.net/~hdegoede/updates474399-i386.img
> http://people.atrpms.net/~hdegoede/updates474399-x86_64.img
> 
> To use this with an i386 install using isw "hardware" raid type the following
> at the installer bootscreen (press <tab> to get to the cmdline editor):
> updates=http://people.atrpms.net/~hdegoede/updates474399-i386.img
> 
> For an x86_64 install use:
> updates=http://people.atrpms.net/~hdegoede/updates474399-x86_64.img
> 
> Please let me know if this resolves the issue for you.

Comment 25 Bob Gustafson 2009-01-19 16:06:11 UTC
(In reply to comment #24)
> Hello
> 
> Though my hardware is an Adaptec AAR-1220SA (dmraid ddf),
> I have installed F10 without problems using your updates.img file.
> 
> Thanks

The Adaptec AAR-1220SA looks like a nice board.

I am working with the RAID on my Asus P5K-E motherboard. According to the User Manual, it is "Intel Matrix Storage Technology through the onboard Intel ICH9R RAID controller".

Comment 26 Ray Todd Stevens 2009-01-25 21:41:48 UTC
I concur. I am the original reporter, and it seems to fix the dmraid problem too.

It might be good to try to find out how many of the other raid bugs this fixes.

Comment 27 Bob Gustafson 2009-01-25 22:06:19 UTC
With some 'hardware' RAID setups (see Bug #474399 and my system, Comment #25), the problem still remains.

The latest commentary on Bug #474399 seems to indicate that if the dmraid code is compiled into the kernel, some hardware raid (mine) does not work. If the dmraid code is compiled as loadable modules, then hardware raid does work.

At the moment, the kernel used with FC10 apparently has the dmraid code compiled directly into the kernel.
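
One rough way to check how a given kernel was built, at least for the device-mapper pieces that dmraid sits on top of (the config file path is how it appears on my box):

    grep -E 'CONFIG_BLK_DEV_DM|CONFIG_DM_MIRROR' /boot/config-$(uname -r)
    # "=y" means built into the kernel, "=m" means built as a loadable module
    lsmod | grep -E 'dm_mod|dm_mirror'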

----

I have upgraded from fc8 to fc9, but not yet to fc10.

Comment 28 Ray Todd Stevens 2009-01-26 16:04:18 UTC
Interesting. I have dmraid installed and booting under fc10 with the patch above. I am running RAID 1 on two 500 GB SATA drives, if that helps.

Comment 29 Bob Gustafson 2009-01-26 19:03:29 UTC
(In reply to comment #28)
> Interesting. I have dmraid installed and booting under fc10 with the patch
> above. I am running RAID 1 on two 500 GB SATA drives, if that helps.

1) Are you running hardware or software RAID? (software works..)

2) If hardware, what kind of hardware? (some hardware does not work..)

Comment 30 Ray Todd Stevens 2009-01-26 19:19:22 UTC
I am running dmraid, which is a form of RAID that is defined in settings in the BIOS but is then fully executed in drivers in the OS. As far as I can tell, there is absolutely no hardware support on the board for the RAID other than the ability to make these BIOS settings.

Full software RAID has been running fine (the md0 stuff).

From what I can tell, much of the full hardware RAID stuff is also working.

It is this hybrid that seems to be the problem, and it actually seems to be a very bad design idea. I have been moving away from it.

Comment 31 Hans de Goede 2009-02-13 08:10:08 UTC
Closing this per Comment #26.

*** This bug has been marked as a duplicate of bug 474399 ***