Bug 596224

Summary: Anaconda swaps names of md devices
Product: Fedora
Component: anaconda
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: low
Reporter: Alexander Todorov <atodorov>
Assignee: Hans de Goede <hdegoede>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: alex.williamson, hdegoede, jonathan, mbanas, myllynen, vanmeeuwen+fedora
Doc Type: Bug Fix
Clone Of: 591970
: 596227 (view as bug list)
Last Closed: 2010-05-26 14:26:27 UTC

Description Alexander Todorov 2010-05-26 12:19:57 UTC
+++ This bug was initially created as a clone of Bug #591970 +++

Description of problem:
Anaconda appears to hang while detecting storage devices if the disk contains RAID partitions that are members of a RAID array that no longer exists (as far as I can tell). If you wait longer than 5 minutes, the UI eventually shows the available devices/partitions.

Version-Release number of selected component (if applicable):
anaconda-13.21.39-1.el6 / -0512.0 compose

How reproducible:
Always

Steps to Reproduce:
1. Install a KVM domU as described in https://bugzilla.redhat.com/show_bug.cgi?id=587442#c12
2. Anaconda will hit the above bug.
3. Start a second install for KVM domU with only one of the disks (i.e. vda)
  
Actual results:
Anaconda appears to hang while discovering storage devices; it takes far too long (more than 5 minutes) to finish.

Expected results:
Anaconda sees /dev/vda right away and lets the user partition the disk

Additional info:
After waiting long enough I was able to proceed to the partitioning UI. There I selected Custom partitioning and saw the following devices available:
md0 - type unknown
md1 - type unknown
vda1 - ext4 (this was /boot)
vda2 - md1 - raid member
vda3 - md0 - raid member
vda4 - extended
vda5 - swap


Why is anaconda taking so long to find the vda disk (no other disks are present), and why is it showing the RAID devices when only one RAID member partition is available per array?

Also, vda2 and vda3 seem to be assigned to the wrong arrays, or the md devices have swapped names.

--- Additional comment from atodorov on 2010-05-13 18:12:48 EEST ---

In storage.log you can see the roughly 5-minute gaps between log messages.

14:42:23,239 DEBUG   : md1 state is inactive
14:42:23,256 DEBUG   :               MDRaidArrayDevice.teardown: md1 ; status: False ;
14:42:23,264 DEBUG   : md1 state is inactive
14:42:23,446 DEBUG   :                 PartitionDevice.teardown: vda2 ; status: True ;
14:42:23,487 DEBUG   :                  MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:42:23,557 DEBUG   :                  MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:42:23,560 DEBUG   :                  PartitionDevice.teardown: vda2 ; status: True ;
14:42:23,569 DEBUG   :                   MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:42:23,577 DEBUG   :                   MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:47:25,129 DEBUG   :                    DiskDevice.teardown: vda ; status: True ;
14:47:25,159 DEBUG   :                     DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
14:47:25,190 DEBUG   :                     DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
14:52:28,039 DEBUG   :                  Ext4FS.supported: supported: True ;
14:52:28,188 DEBUG   :              PartitionDevice.setup: vda1 ; status: True ; orig: False ;
14:52:28,202 DEBUG   :               PartitionDevice.setupParents: kids: 0 ; name: vda1 ; orig: False ;
14:52:28,214 DEBUG   :                DiskDevice.setup: vda ; status: True ; orig: False ;
14:52:28,235 DEBUG   :               DiskLabel.setup: device: /dev/vda ; status: False ; type: disklabel ;
14:52:28,249 DEBUG   :                DiskLabel.setup: device: /dev/vda ; status: False ; type: disklabel ;
14:52:28,276 INFO    : set SELinux context for mountpoint /mnt/sysimage to None
14:52:28,439 INFO    : set SELinux context for newly mounted filesystem root at /mnt/sysimage to None
14:52:28,496 DEBUG   :              PartitionDevice.teardown: vda1 ; status: True ;
14:52:28,629 DEBUG   :               PartitionDevice.teardown: vda1 ; status: True ;
14:57:31,356 DEBUG   :                 DiskDevice.teardown: vda ; status: True ;
14:57:31,382 DEBUG   :                  DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
14:57:31,397 DEBUG   :                  DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;

--- Additional comment from pm-rhel on 2010-05-13 18:19:49 EEST ---

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

--- Additional comment from hdegoede on 2010-05-14 10:58:53 EEST ---

Please reproduce and attach complete logs.

--- Additional comment from atodorov on 2010-05-14 11:18:05 EEST ---

Created an attachment (id=413966)
logs from the system

Oops, thanks for reminding me to attach those.

--- Additional comment from hdegoede on 2010-05-14 11:48:51 EEST ---

Looking at the logs, it is stuck for 5 minutes in a udev_settle call, which is likely caused by some mdadm udev rules taking ages in this scenario; moving this over to mdadm.

--- Additional comment from dledford on 2010-05-26 03:01:06 EEST ---

OK, there were multiple questions brought up in the original bug, so first let me address the "why is this taking so long" question (which is the only real bug present; more on that later):

This happens when anaconda attempts to tear down the raid devices.  As far as I can tell, it isn't a udev rule at all; it looks like a kernel issue instead.  However, debugging the issue is slow because the installer environment contains so few of the tools that would be useful for figuring this out (strace, for example).

Anyway, here's what I know so far. I'm detailing it here in case any of this rings a bell for Hans; if it does, he might know something I don't and be able to shortcut the time to solve this issue:

- Anaconda brings up the md devices to inspect them, then tries to stop them before proceeding.
- When anaconda calls mdadm to stop the device, the device is stopped immediately.
- Anaconda then calls udevadm settle, which waits 5 minutes because things never actually settle (a reproduction sketch follows this list).
- Immediately after the md device is stopped, a continuous loop of kernel add/remove actions and subsequent udev add/remove events starts happening.
- If you run ls /sys/block over and over again, you will see devices popping into and out of existence repeatedly. So unless udev has the ability to force the kernel to actually create kernel structures and devices, I don't think this can be attributed to udev rules. Instead, I think udev is simply trying to process the endless stream of kernel events, so we see udev looping forever and blame udev when in fact the kernel is driving the issue.
- The installed system kernel does not exhibit this behaviour.
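
Here is a minimal reproduction sketch of that sequence (not anaconda's actual teardown code; it assumes a throwaway test array at /dev/md0, root privileges, and a 300-second settle timeout of my choosing). It stops the array, then watches /sys/block while waiting for udev to settle:

#!/usr/bin/python
# Reproduction sketch only -- not anaconda code.  Stop an md array, then
# wait for udev to settle while watching /sys/block for devices that keep
# appearing and disappearing.
import os
import subprocess
import time

ARRAY = "/dev/md0"  # hypothetical throwaway test array

# Step 1: stop the array, as anaconda does during teardown.
subprocess.call(["mdadm", "--stop", ARRAY])

# Step 2: wait for the udev event queue to drain, polling /sys/block.
# If the kernel keeps generating add/remove events, the device list keeps
# changing and "udevadm settle" only returns when its timeout expires.
before = set(os.listdir("/sys/block"))
settle = subprocess.Popen(["udevadm", "settle", "--timeout=300"])
while settle.poll() is None:
    now = set(os.listdir("/sys/block"))
    if now != before:
        print("block device list changed: +%s -%s" % (now - before, before - now))
        before = now
    time.sleep(1)
print("udevadm settle exited with status %s" % settle.returncode)

On the affected installer kernel I would expect the block device list to keep changing until the timeout expires; on the installed system kernel it should settle almost immediately.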

That's my analysis so far, I'll continue working on it.

As for "why do the raid devices show up when not all member disks are present?"  Each device in an array contains a complete superblock that details the array.  Anaconda simply notices that the device has a superblock and pulls the raid array info from that superblock.  There is no attempt to verify that there are sufficient members present for the array to be successfully started before the array info is added into the list of devices to install on.

As for "md0 and md1 seem to be the wrong order" that's because they *are* in the wrong order.  The arrays are added to the device list when we first find a device with a superblock.  Which device we find first (/dev/vda2 or /dev/vda3) is random because we find them in the order that udev processes them and udev can process the partitions on a drive in any order.  According to the storage.log, /dev/vda3 was found first, and according to the blkid information it has a raid superblock that clearly indicates that it *wants* the name /dev/md1, but because it was found first, anaconda blindly renamed it to /dev/md0.  And when /dev/vda2 was processed, the logs show it has a superblock that clearly indicated it wanted the name /dev/md0, but because it was already taken, it was blindly renamed to /dev/md1.

I consider this a *SERIOUS* shortcoming in how anaconda processes md raid devices, but I was told that changing anaconda so that it doesn't renumber arrays in the order it finds them was too drastic a change for f13/rhel6 and would have to wait for f14 (or later) and rhel7.  All I know is that since we switched to letting udev find devices for us, we can no longer count on devices being found "in order", and the decision to let anaconda renumber raid devices like this is a recipe for disastrous data loss.  Somewhere, someone is going to blindly think they are preserving their /home filesystem when they select /dev/md0 for upgrade and leave /dev/md1 unformatted, and then we are going to wipe out everything they care about.  Personally, I think this is important enough to either A) open a new bug about this behaviour and make it an RC blocker, or B) disable upgrades on md raid arrays entirely and instead require that, if any md raid arrays exist and are to be preserved, the install can only happen on newly created partitions.  In that case we can tell users to boot the DVD in rescue mode, hand select the raid arrays they want to upgrade on, and hand zero their superblocks; that way those members won't appear in any md device listings, their space will be free, it will be impossible for anaconda to confuse the user by putting arrays in the wrong order, and the user can do the install simply by reusing the now empty partitions for a brand new md raid array.

--- Additional comment from dledford on 2010-05-26 03:05:05 EEST ---

*** Bug 589981 has been marked as a duplicate of this bug. ***

Comment 1 Alexander Todorov 2010-05-26 12:21:45 UTC
Hi Doug and others,
let's use this bug to track the issue where anaconda is swapping the names of md devices. 

As Doug says, this can confuse the user and cause data loss, which is a bad thing.

Comment 2 Hans de Goede 2010-05-26 14:26:27 UTC
Fixed in master.