Bug 643967

Summary: LiveCD fails to use BIOS RAID, DVD works
Product: [Fedora] Fedora
Reporter: Sandro Mathys <sandro>
Component: anaconda
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: anaconda-maint-list, awilliam, dcantrell, hdegoede, jonathan, vanmeeuwen+fedora
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-12-21 18:02:14 UTC
Attachments (description; flags):
- anaconda-logs.tgz (/tmp/*log*); flags: none
- anaconda-logs.tgz (/tmp/*log*) -- using anaconda-14.20-1, lvm2; flags: none
- /var/log/messages showing md0 and errors; flags: none

Description Sandro Mathys 2010-10-18 16:57:35 UTC
This discussion started in #583906 but as the original problem was solved it seemed helpful to have a new bug report for the remaining issues. I'll reproduce the important details here.

#####

- Reproducible in ~3/4 of all tries with the LiveCD
- NOT reproducible at all with the DVD

- There are 4 disks in this system, sd{a,b,c,d} - and only those four
- All are 100% consumed in the BIOS RAID (level 10)
- The BIOS RAID storage is shown in the installer in the specialized storage tab
- The BIOS RAID storage itself is in working order (there's Windows on it and the DVD installer can write onto it)

Sequence:
1) I choose Specialized Storage Devices
2) I choose/activate/enable my BIOS RAID set from the Firmware RAID tab (i.e. the BIOS RAID is shown there!) and click next
2) I get an error message about my single disks contained in the BIOS RAID set:
"Disks sda, sdb, sdc, sdd contain BIOS RAID metadata, but are not part of any
recognized BIOS RAID sets. Ignoring disks sda, sdb, sdc, sdd."
3a) if I continue I'll get the traceback from #583906 once I try to review the partitioning layout
3b) with the proposed patch from #583906 I get "no disks found" immediately after the cited error message

So there are two problems here that are probably closely connected:
- error message about the single disks is shown
- "no disks found" even though I was able to choose the bios raid

#####

After the traceback from step 3a) was fixed and we hit 3b), bcl asked me to attach the lsmod output of both a livecd and a dvd. According to clumens, the only difference that might be of importance was the module "linear", which was loaded on the dvd but not on the livecd. He asked me to load that module before starting the installer, but that did not solve the above problems.

#####

If there's anything more I can try kindly let me know. I'll also hang out in #fedora-qa whenever I can.

Comment 1 Sandro Mathys 2010-10-18 16:59:04 UTC
Blocking F14Blocker after failing this test case:
https://fedoraproject.org/wiki/QA:Testcase_Install_to_BIOS_RAID

Comment 2 James Laska 2010-10-18 18:55:51 UTC
Created attachment 454194 [details]
anaconda-logs.tgz (/tmp/*log*)

I've tested and reproduced this failure using the F-14-TC1-Desktop-x86_64 live image (with anaconda-14.20-1 and python-pyblock-0.49-2 installed).  This system uses an nVidia RAID controller.

Comment 3 James Laska 2010-10-18 20:39:12 UTC
Created attachment 454212 [details]
anaconda-logs.tgz (/tmp/*log*) -- using anaconda-14.20-1, lvm2

Just retested using a custom-built x86_64 live image containing the following updates:
 * kernel-2.6.35.6-44.fc14.x86_64
 * anaconda-14.20-1.fc14.x86_64
 * lvm2-2.02.73-3.fc14.x86_64
 * python-pyblock-0.51-1.fc14.x86_64

The problem remains :(

Comment 4 Jesse Keating 2010-10-18 23:54:04 UTC
Just voicing opinion here, I think it might be acceptable to common bug the fact that you can't do (some?) bios raid installs from live images, and instead need to use the DVD.

I'm going to test my dmraid set here, which has been pretty solid, just for more data points, but as it stands, I'd classify this as Nice to Have as opposed to release blocker.

Comment 5 Jesse Keating 2010-10-19 04:36:02 UTC
So I've been testing with TC1, and it has some interesting results.

After a fresh creation of the disk array, I can install to it with the Desktop Live image, reboot and things are happy.  If I try subsequent installs with the live image, things go south.  There is no traceback, and it detects the array OK, but when I try to install to the array, even using the whole disk, the installer reports that there is not enough free space on the device to perform the installation.

So, I still think we can just common bugs this.

Comment 6 Hans de Goede 2010-10-19 07:11:47 UTC
Hi all,

As the former anaconda BIOS RAID person and someone who clearly still cares about Fedora and anaconda, I've taken the liberty to look into the log attached here and attached to bug 583906.

So there are 2 completely different failures being seen here.

1) looking at the logs from jlaska's first installation attempt on the nvidia-raid set. There is the following:

18:52:08,150 DEBUG storage: device 'nvidia_cgfidbddp2' not in exclusiveDisks
18:52:08,150 DEBUG storage: ignoring nvidia_cgfidbddp2 (/devices/virtual/block/dm-4)

jlaska, it looks like you went the advanced / server storage route and found a bug there with dmraid-based BIOS RAID: even when the RAID set is in exclusiveDisks, we wrongly ignore the partitions on the RAID set. That would be bug 1.
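
The check implied by the two log lines above can be sketched roughly as follows. This is a hypothetical illustration and not anaconda's actual code: the function names, the explicit `parent` argument and the set name `nvidia_cgfidbdd` (the partition name from the log minus its `p2` suffix) are all assumptions.

```python
# Hypothetical sketch, NOT anaconda's real implementation.

def is_ignored_name_only(name, exclusive_disks):
    """Ignore any device whose own name is not listed in exclusive_disks."""
    return bool(exclusive_disks) and name not in exclusive_disks


def is_ignored_parent_aware(name, parent, exclusive_disks):
    """Keep a device if it, or the device it sits on, is listed."""
    if not exclusive_disks:
        return False
    return name not in exclusive_disks and parent not in exclusive_disks


# Assumed values: only the partition name appears in the attached log.
exclusive = ["nvidia_cgfidbdd"]
partition = "nvidia_cgfidbddp2"
raid_set = "nvidia_cgfidbdd"

name_only_result = is_ignored_name_only(partition, exclusive)
parent_aware_result = is_ignored_parent_aware(partition, raid_set, exclusive)
```

With a name-only comparison the partition is dropped even though its RAID set was selected; a comparison that also considers the parent device keeps it.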

2) looking at the first logs of the installation attempted in bug 583906 it says:
devices to scan for multipath: ['sda', 'sda1', 'sda2', 'sda3', 'sda4', 'sdb', 'sdb1', 'sdb2', 'sdb3', 'sdb4', 'sdc', 'sdc1', 'sdc2', 'sdc5', 'sdc6', 'sdd', 'sdd1', 'sdd2', 'sdd5', 'sdd6', 'sr0']

Note there are no md* devices there, so the livecd failed to activate the raid set on boot. This means that there either is an issue / incompatibility between the set and mdraid, but then I would expect DVD installs to also fail, or there is an issue with the initscripts / udev rules which makes them fail to bring up the set.
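
A minimal sketch of the check described above, assuming md devices can be recognised by name alone (`md<N>`, optionally followed by a `p<N>` partition suffix). The device list is the one quoted from the log; the helper name `md_devices` is made up for illustration.

```python
import re

# Device list quoted from the log of the installation attempt in bug 583906.
SCANNED = ['sda', 'sda1', 'sda2', 'sda3', 'sda4',
           'sdb', 'sdb1', 'sdb2', 'sdb3', 'sdb4',
           'sdc', 'sdc1', 'sdc2', 'sdc5', 'sdc6',
           'sdd', 'sdd1', 'sdd2', 'sdd5', 'sdd6', 'sr0']

# Assumption: md arrays show up as md<N>, their partitions as md<N>p<M>.
MD_NAME = re.compile(r"md\d+(p\d+)?")


def md_devices(names):
    """Return the names that look like md arrays or md partitions."""
    return [n for n in names if MD_NAME.fullmatch(n)]


found = md_devices(SCANNED)
if not found:
    print("no md devices in the scan list: RAID set was not assembled")
```

An empty result for the quoted list matches the observation that the RAID set was never brought up before anaconda started.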

Note that once anaconda is started, it is way too late to bring up the raid set, as quite likely some partitions on the raw disks are already in use directly making the disks in use and thus making it impossible to activate the raid set at this point. IOW I believe this is not an anaconda bug.

So I think this bug should be split, given that the logs for the nvidia issue (1) are attached here, I think it would be best to open a second bug for the failing of activating mdraid using BIOS RAID arrays from the livecd.

Regards,

Hans

Comment 7 Adam Williamson 2010-10-19 15:30:22 UTC
note that we still have rd_NO_MD and rd_NO_DM in the boot parameters for the live CD. That's one related difference between the live image and the DVD that I can think of. We do this specifically so the arrays won't be constructed at boot time as we felt that wasn't appropriate for the live environment (generally the live environment shouldn't touch permanent storage on the system until you expressly tell it to). We don't have 'noismwraid' (or whatever exactly it is) any more, though.
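
To check which of these two parameters a running system was booted with, something like the following could be used. It is only a sketch: the sample command line is invented for illustration and is not taken from an actual live image, and `read_kernel_cmdline` assumes a Linux system where /proc/cmdline is readable.

```python
# Parameter names as mentioned in this comment.
FLAGS = ("rd_NO_MD", "rd_NO_DM")


def raid_assembly_flags(cmdline):
    """Map each flag to True if it appears as a whole word on the command line."""
    params = cmdline.split()
    return {flag: flag in params for flag in FLAGS}


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the kernel command line of the running system (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Invented sample, for illustration only.
sample = "ro quiet rd_NO_MD rd_NO_DM"
sample_flags = raid_assembly_flags(sample)
```

On a live system, `raid_assembly_flags(read_kernel_cmdline())` would report whether either parameter is present.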



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 8 Adam Williamson 2010-10-19 18:15:43 UTC
dlehman has a proposed patch for this:

http://fpaste.org/S0GQ/

Sandro will test soon.




Comment 9 Adam Williamson 2010-10-19 18:50:50 UTC
the proposed patch is for James' bug, the one Hans calls 'bug 1', not Sandro's bug, which Hans calls 'bug 2'. James, can you please test the patch when you get time? Thanks.




Comment 10 Sandro Mathys 2010-10-19 19:09:56 UTC
Removing rd_NO_MD and rd_NO_DM options on the boot line didn't change the behavior.

I think I just hit something, though. My BIOS RAID is /dev/md127, but for an unknown reason the livecd seems to create /dev/md0 sometimes.

I currently *think*:
- if md0 is present, then I hit the issue and md127 isn't created
- if md0 is not present, I do NOT hit the issue and md127 is created

Just that md0 shows up, vanishes, shows up again...with no reboot, i.e. while running the livecd. I didn't find out yet what I do to create/remove it, i.e. what triggers md0 showing up/vanishing.

Does anyone have a guess what is responsible for md0 (instead of md127)?

Comment 11 Sandro Mathys 2010-10-19 19:25:02 UTC
Created attachment 454420 [details]
/var/log/messages showing md0 and errors

Comment 12 Adam Williamson 2010-10-19 20:28:14 UTC
I split jlaska's bug off as https://bugzilla.redhat.com/show_bug.cgi?id=644616 . This report should be for Sandro's bug. Jesse can report another, if he likes, as his issue is clearly not jlaska's or Sandro's :)




Comment 13 Sandro Mathys 2010-10-19 20:50:14 UTC
hansg, adamw and I were able to find the root cause and a valid workaround for this bug (i.e. my bug, not James', which seems to be a different one after all).

Because there's an available easy workaround, we decided to lift blocker status.

#####

Let me first mention the WORKAROUND here:
DON'T do anything that might mount, write to or activate your disks/raid arrays before starting or while running the installer. This includes not starting the installer again after it has already run once.

If you violated this rule, reboot before you start the installer!

#####

Now let me try to explain what actually happens here :)

Because I have a raid10, there are both /dev/md0 and /dev/md127. Both are necessary and valid.

1) Now if I boot the livecd, NONE of them are activated.

2) If I start the installer, both are still missing. If I get to the point where the devices are examined, BOTH are activated. This means I do NOT hit the issue here.

3) If I now cancel the installation, BOTH are still activated. (note: this is very different from 1)

4) If I start the installer, ONE is DEactivated, the other is still activated. If I get to the point where the devices are examined, the installer tries to activate BOTH. But because one is already activated (i.e. the disks it consists of are already busy/in use) this will fail. This means I DO hit the issue here.

5) If I now cancel the installation, still only ONE is activated. (note: different from both 1 and 3)

6) If I start the installer, ONE is DEactivated and, with the other still DEactivated, now NONE of them are activated. <start at 1 again>

#####

So the above mentioned workaround makes sure the installation always starts with situation 1 as situation 3 is fatal.

The fix still needed in this bug must deal with situation 3, i.e. instead of deactivating only one of the two raid arrays, ALL should be deactivated when the installer is started.
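
The sequence described in this comment can be reproduced with a toy model. This is a sketch under simplifying assumptions, not anaconda code: the state is reduced to the number of active arrays (0, 1 or 2), installer start-up is assumed to deactivate at most one active array, and device examination is assumed to succeed only when nothing is active yet. All function names are made up.

```python
def start_installer_observed(active):
    """Observed behaviour: start-up deactivates at most one active array."""
    return max(active - 1, 0)


def start_installer_proposed(active):
    """Proposed behaviour: start-up deactivates ALL arrays."""
    return 0


def examine_devices(active):
    """Try to activate both arrays; fails if member disks are already busy.

    Returns (number of active arrays afterwards, success flag).
    """
    if active == 0:
        return 2, True
    return active, False


def simulate(start_installer, runs):
    """Run the installer several times in one live session, cancelling each time."""
    active = 0  # freshly booted livecd: nothing is activated
    outcomes = []
    for _ in range(runs):
        active = start_installer(active)
        active, ok = examine_devices(active)
        outcomes.append(ok)  # cancelling leaves the arrays as they are
    return outcomes


observed = simulate(start_installer_observed, 4)
proposed = simulate(start_installer_proposed, 4)
print("observed:", observed)
print("proposed:", proposed)
```

Under these assumptions the observed behaviour fails on every second run, while deactivating everything at start-up succeeds on every run.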

Comment 14 Adam Williamson 2010-10-20 00:48:07 UTC
hans, you said you had a potential fix for this, right? sandro, did you test it yet? I lose track. If we have a fix for this I'd quite like to take it for RC1 if we can make it in time.




Comment 15 Sandro Mathys 2010-10-20 06:20:05 UTC
Hans made me test some patch which didn't seem to help - not sure what patch that was exactly but I think he indeed said it should fix this issue in some way. Hans, please clarify - and re-post patch if it applies here, thanks.

Comment 16 Hans de Goede 2010-10-20 07:10:42 UTC
(In reply to comment #14)
> hans, you said you had a potential fix for this, right? sandro, did you test it
> yet? I lose track. If we have a fix for this I'd quite like to take it for RC1
> if we can make it in time.
> 

Erm, no. dlehman has a patch which should fix the issue with running the installer a second time on the same livecd boot. The much bigger issue of the livecd not activating the RAID sets has not been fixed yet (and I don't know if it even has been looked into yet). This (failing to activate the BIOS RAID sets on livecd boot) should be considered a blocker, as it can cause serious data corruption.

Regards,

Hans

Comment 17 Sandro Mathys 2010-10-20 07:19:50 UTC
Well, if you fix the bigger issue, the workaround won't work anymore and this minor bug here becomes a blocker again as well :)

Hans, is dlehman's fix in anaconda 14.22 already?

Comment 18 Hans de Goede 2010-10-20 08:03:48 UTC
(In reply to comment #17)
> Well, if you fix the bigger issue, the workaround won't work anymore and this
> minor bug here becomes a blocker again as well :)
> 

You may have a valid point there.

> Hans, is dlehman's fix in anaconda 14.22 already?

No.

Comment 19 Sandro Mathys 2010-10-21 17:32:34 UTC
I can no longer reproduce any of the bugs mentioned in this report with the F14 Final TC6 i686 Gnome LiveCD. As a reminder: so far I had used an F14 Final TC1.1 i686 KDE LiveCD.

So either this issue vanished between TC1.1 and TC6 or it's KDE-only. Unfortunately, there's no TC6 KDE spin available to verify the latter.

Comment 20 Sandro Mathys 2010-10-21 22:18:59 UTC
Can reproduce with rc1 kde live - not sure what's so different between the kde and gnome spins in regard of bios raid :/

i.e. live boot -> no /dev/md* at all

Comment 21 Sandro Mathys 2010-10-21 22:36:48 UTC
Wait, ignore that last comment - let's just pretend it's not there at all. Still need to reproduce this, but obviously I need some sleep before I do so.

Comment 22 Adam Williamson 2010-10-22 03:33:56 UTC
you got tc1 and rc1 mixed up, didn't you? =)




Comment 23 Sandro Mathys 2010-10-22 06:10:03 UTC
I wish I had! It's much worse than that - I used the wrong system to reproduce, i.e. there was no such thing as a bios raid in there :/

Comment 24 Adam Williamson 2010-10-22 06:21:14 UTC
okay, step AWAY from the crack pipe, sandro =)




Comment 25 Chris Lumens 2010-10-25 18:43:51 UTC
So, this is still a bug?  Or it's fixed now?  What's the status?

Comment 26 Adam Williamson 2010-10-25 20:33:30 UTC
on the basis of comment #13 there's an issue with running the installer twice on Intel BIOS RAID (possibly only RAID 0). It'd be good if Sandro could say for sure whether he can still reproduce with RC1, his later posts seem confused.




Comment 27 Sandro Mathys 2010-10-27 19:06:09 UTC
Okay, I can reproduce my initial issue on both the KDE and Gnome RC1 x86_64 live spins right now. The workaround still works.

Regarding the raid assembly I see a really weird behaviour and the two spins indeed do behave differently. Right now it seems to be assembled after boot, but not in the same way on both spins. Also, they seem to be activated/deactivated differently after running anaconda several times.

And yet I see the original issue with every second run of anaconda.

In case what I wrote here doesn't seem to make sense: the behaviour I see really doesn't make much sense to me either. Maybe it would make sense after another couple of tries with both spins, but I currently lack the time to do that. I hope hansg can eventually reproduce a difference between the two spins and maybe some other weird behaviour, though.

Comment 28 Chris Lumens 2010-11-05 19:20:26 UTC
anaconda doesn't really clean up after itself following a run, since it's not expected to be run again.  This is yet another thing that sucks about the livecd.  If you cannot produce the problem on the initial first run of anaconda, I am inclined to not really do anything about it.