Bug 591970 - udev doesn't detect storage devices/partitions properly
Summary: udev doesn't detect storage devices/partitions properly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: anaconda
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Assignee: Anaconda Maintenance Team
QA Contact: Release Test Team
URL:
Whiteboard:
Duplicates: 589981
Depends On:
Blocks: 582286 599197
 
Reported: 2010-05-13 15:12 UTC by Alexander Todorov
Modified: 2010-07-02 20:49 UTC
CC List: 13 users

Fixed In Version: anaconda-13.21.50-8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 596224 599197
Environment:
Last Closed: 2010-07-02 20:49:24 UTC
Target Upstream Version:


Attachments
logs from the system (22.55 KB, application/x-gzip)
2010-05-14 08:18 UTC, Alexander Todorov
syslog from a failed run (2.38 MB, application/x-bzip2)
2010-05-29 19:06 UTC, Doug Ledford
syslog from a successful run (80.75 KB, application/x-bzip2)
2010-05-29 19:10 UTC, Doug Ledford
Patch to check for location of mdadm and blkid binaries (needed for anaconda install environment) (1.47 KB, patch)
2010-06-02 10:25 UTC, Phil Knirsch
Patch to remove duplicate md handling code from 70-anaconda.rules udev rule file (2.06 KB, patch)
2010-06-02 10:26 UTC, Phil Knirsch
still waiting (12.76 KB, image/png)
2010-06-10 14:29 UTC, Jan Ščotka
Screenshot of partitions in anaconda (64.67 KB, image/png)
2010-06-10 16:49 UTC, Phil Knirsch
Anaconda's storage.log for this run (64.98 KB, text/plain)
2010-06-10 16:49 UTC, Phil Knirsch

Description Alexander Todorov 2010-05-13 15:12:03 UTC
Description of problem:
Anaconda appears to hang while detecting storage devices if the disk contains RAID partitions that are members of a non-existent RAID array (I believe). If you wait longer than 5 minutes, the UI shows the available devices/partitions.

Version-Release number of selected component (if applicable):
anaconda-13.21.39-1.el6 / -0512.0 compose

How reproducible:
Always

Steps to Reproduce:
1. Install a KVM domU as described in https://bugzilla.redhat.com/show_bug.cgi?id=587442#c12
2. Anaconda will hit the above bug.
3. Start a second install for KVM domU with only one of the disks (i.e. vda)
  
Actual results:
Anaconda seems to hang while discovering storage devices. It actually takes too long (more than 5 minutes). 

Expected results:
Anaconda sees /dev/vda right away and lets the user partition the disk

Additional info:
After waiting long enough I was able to proceed to the partitioning UI. There I selected Custom partitioning and saw the following devices available:
md0 - type unknown
md1 - type unknown
vda1 - ext4 (this was /boot)
vda2 - md1 - raid member
vda3 - md0 - raid member
vda4 - extended
vda5 - swap


Why is anaconda taking so long to find the vda disk (no other disks present), and why is it showing the RAID devices when only one raid member partition is available for each array?

Also, vda2 and vda3 seem to belong to the wrong arrays, or the md devices have swapped order.

Comment 1 Alexander Todorov 2010-05-13 15:12:48 UTC
In storage.log you can see the 5-minute delays between log messages.

14:42:23,239 DEBUG   : md1 state is inactive
14:42:23,256 DEBUG   :               MDRaidArrayDevice.teardown: md1 ; status: False ;
14:42:23,264 DEBUG   : md1 state is inactive
14:42:23,446 DEBUG   :                 PartitionDevice.teardown: vda2 ; status: True ;
14:42:23,487 DEBUG   :                  MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:42:23,557 DEBUG   :                  MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:42:23,560 DEBUG   :                  PartitionDevice.teardown: vda2 ; status: True ;
14:42:23,569 DEBUG   :                   MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:42:23,577 DEBUG   :                   MDRaidMember.teardown: device: /dev/vda2 ; status: False ; type: mdmember ;
14:47:25,129 DEBUG   :                    DiskDevice.teardown: vda ; status: True ;
14:47:25,159 DEBUG   :                     DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
14:47:25,190 DEBUG   :                     DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
14:52:28,039 DEBUG   :                  Ext4FS.supported: supported: True ;
14:52:28,188 DEBUG   :              PartitionDevice.setup: vda1 ; status: True ; orig: False ;
14:52:28,202 DEBUG   :               PartitionDevice.setupParents: kids: 0 ; name: vda1 ; orig: False ;
14:52:28,214 DEBUG   :                DiskDevice.setup: vda ; status: True ; orig: False ;
14:52:28,235 DEBUG   :               DiskLabel.setup: device: /dev/vda ; status: False ; type: disklabel ;
14:52:28,249 DEBUG   :                DiskLabel.setup: device: /dev/vda ; status: False ; type: disklabel ;
14:52:28,276 INFO    : set SELinux context for mountpoint /mnt/sysimage to None
14:52:28,439 INFO    : set SELinux context for newly mounted filesystem root at /mnt/sysimage to None
14:52:28,496 DEBUG   :              PartitionDevice.teardown: vda1 ; status: True ;
14:52:28,629 DEBUG   :               PartitionDevice.teardown: vda1 ; status: True ;
14:57:31,356 DEBUG   :                 DiskDevice.teardown: vda ; status: True ;
14:57:31,382 DEBUG   :                  DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
14:57:31,397 DEBUG   :                  DiskLabel.teardown: device: /dev/vda ; status: False ; type: disklabel ;
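Gaps like the 14:42 -> 14:47 -> 14:52 jumps above can be picked out mechanically. A small sketch (the embedded sample lines are abbreviated from the excerpt above; point awk at the full storage.log for a real run):

```shell
# Flag gaps longer than 60 seconds between consecutive storage.log timestamps.
# -F'[:,]' splits HH:MM:SS,mmm so fields 1-3 are hours, minutes, seconds.
gaps=$(awk -F'[:,]' '/DEBUG|INFO/ {
    t = $1*3600 + $2*60 + $3                  # wall-clock time in seconds
    if (prev && t - prev > 60)
        printf "%ds gap before: %s\n", t - prev, $0
    prev = t
}' <<'EOF'
14:42:23,577 DEBUG   : MDRaidMember.teardown: device: /dev/vda2
14:47:25,129 DEBUG   : DiskDevice.teardown: vda
14:52:28,039 DEBUG   : Ext4FS.supported: supported: True
EOF
)
echo "$gaps"
```

On the sample above this reports the two five-minute stalls; the awk program does not care how many lines sit between them.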

Comment 2 RHEL Program Management 2010-05-13 15:19:49 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Hans de Goede 2010-05-14 07:58:53 UTC
Please reproduce and attach complete logs.

Comment 4 Alexander Todorov 2010-05-14 08:18:05 UTC
Created attachment 413966 [details]
logs from the system

Oops, thanks for reminding me to attach those.

Comment 5 Hans de Goede 2010-05-14 08:48:51 UTC
Looking at the logs, it is stuck for 5 minutes in a udev_settle call, which is likely caused by some mdadm udev rules taking ages in this scenario; moving this over to mdadm.

Comment 6 Doug Ledford 2010-05-26 00:01:06 UTC
OK, there were multiple questions brought up in the original bug, so first let me address the "why is this taking so long" question (which is the only real bug present; more on that later):

This happens when anaconda attempts to tear down the raid devices.  As far as I can tell, it isn't a udev rule at all; it seems to be a kernel issue instead.  However, debugging the issue is slow because the installer environment contains so few things that would be useful in figuring it out (strace, for example).

Anyway, here's what I know so far.  I'm detailing it here in case any of this rings a bell for Hans, because if it does, he might know something I don't and be able to shortcut the time to solve this issue:

- Anaconda brings up the md devices to inspect them, then tries to stop them before proceeding.
- When anaconda calls mdadm to stop a device, the device is stopped immediately.
- Anaconda then calls udevadm settle, which waits 5 minutes because things never actually settle.
- Immediately after the md device is stopped, a continuous loop of kernel add/remove actions and subsequent udev add/remove events starts.
- If you run ls /sys/block over and over again, you will see devices popping into and out of existence repeatedly.  Unless udev has the ability to force the kernel to create kernel structures and devices, I don't think this can be attributed to udev rules.  Instead, I think udev is simply trying to process the endless stream of kernel events, so we see udev looping forever and blame udev when in fact the kernel is driving the issue.
- The installed system's kernel does not exhibit this behaviour.

That's my analysis so far, I'll continue working on it.
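The five-minute stall described above is an unbounded settle wait. A hedged sketch of how such a wait can be capped (this assumes udevadm's --timeout option, which takes seconds, and is not anaconda's actual call site; the guard lets it degrade on systems without udev):

```shell
# Bound the settle wait so a looping event queue fails fast instead of
# blocking for minutes, as the installer's unbounded wait did here.
if command -v udevadm >/dev/null 2>&1; then
    udevadm settle --timeout=10 || echo "udev queue did not settle within 10s"
    settle_status="checked"
else
    settle_status="udevadm unavailable on this system"
fi
echo "settle: $settle_status"
```

A bounded wait would not fix the event loop itself, but it would turn a silent hang into a diagnosable timeout.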

As for "why do the raid devices show up when not all member disks are present?": each device in an array contains a complete superblock that describes the array.  Anaconda simply notices that a device has a superblock and pulls the raid array info from that superblock.  There is no attempt to verify that sufficient members are present for the array to start successfully before the array info is added to the list of devices to install on.
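The per-member superblock can be inspected directly with mdadm's examine mode. A sketch, not anaconda's actual code path; /dev/vda2 is the member partition from this report, and the guard keeps the snippet harmless on machines without it:

```shell
# Dump the superblock a single raid member carries (UUID, level, device
# count, preferred name) -- the same information anaconda reads, which is
# why an array "exists" to it even with members missing.
dev=/dev/vda2    # assumption: a raid member partition as in this report
if command -v mdadm >/dev/null 2>&1 && [ -b "$dev" ]; then
    mdadm --examine "$dev"
    examine_status="dumped $dev"
else
    examine_status="mdadm or $dev unavailable on this system"
fi
echo "examine: $examine_status"
```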

As for "md0 and md1 seem to be in the wrong order": that's because they *are* in the wrong order.  The arrays are added to the device list when we first find a device with a superblock.  Which device we find first (/dev/vda2 or /dev/vda3) is random, because we find them in the order udev processes them, and udev can process the partitions on a drive in any order.  According to storage.log, /dev/vda3 was found first, and according to the blkid information it has a raid superblock that clearly indicates it *wants* the name /dev/md1; but because it was found first, anaconda blindly renamed it to /dev/md0.  And when /dev/vda2 was processed, the logs show it has a superblock that clearly indicates it wanted the name /dev/md0, but because that name was already taken, it was blindly renamed to /dev/md1.

I consider this a *SERIOUS* shortcoming in how anaconda processes md raid devices, but I was told that changing anaconda so that it doesn't renumber arrays in the order it finds them was too drastic a change for f13/rhel6 and would have to wait for f14 (or later) and rhel7.  All I know is that since we switched to letting udev find devices for us, we can no longer count on devices being found "in order", and the decision to let anaconda renumber raid devices like this is a recipe for disastrous data loss.  Somewhere, someone is going to blindly think they are preserving their /home filesystem when they select /dev/md0 for upgrade and leave /dev/md1 unformatted, and then we are going to wipe out everything they care about.  Personally, I think this is important enough to either A) open a new bug about this behaviour and make it an RC blocker, or B) disable upgrades on md raid arrays entirely and instead require that, if any md raid arrays exist and are to be preserved, the install can only happen on newly created partitions.  We could then tell users to boot the DVD in rescue mode, hand-select the raid arrays they want to upgrade, and zero their superblocks by hand; those partitions would no longer appear in any md device listings and their space would be free.  That way it would be impossible for anaconda to confuse the user by putting the arrays in the wrong order, and the user could do the install simply by reusing the now-empty partitions for a brand new md raid array.

Comment 7 Doug Ledford 2010-05-26 00:05:05 UTC
*** Bug 589981 has been marked as a duplicate of this bug. ***

Comment 8 Alexander Todorov 2010-05-26 12:26:56 UTC
Hi,
filed bug #596224 (Fedora) and bug #596227 (RHEL) to track the fact that anaconda swaps the names of md devices.

Comment 10 Doug Ledford 2010-05-27 20:21:47 UTC
The 60 md raid patches in the -30 kernel did not resolve this issue.  However, I have made progress on identifying it.  Basically, it boils down to the fact that, on device shutdown, we get a remove event in the kernel followed by a remove event in udev.  I believe udev is then opening the device file before the remove event is fully processed, causing the kernel to re-add the device (very stupid of the kernel to allow this to happen).  My supporting evidence: if I kill udev, the infinite loop of kernel events stops.  Furthermore, even though the md device is gone from /sys/block, if I create a device node for it and attempt to open it, the md device magically reappears in /sys/block, although it does not reappear in /proc/mdstat.  I'll be investigating why this happens at the kernel level, but since this doesn't happen on an installed system, I think it's safe to say there is a difference in the udev rules that causes udev to attempt to open a device file on removal, and that, combined with the kernel re-adding the device on open(), is the source of the issue.

Comment 11 Doug Ledford 2010-05-27 21:24:02 UTC
OK, for testing purposes this is what I have set up:

2 drive imsm raid array on /dev/sdc and /dev/sdd
1 raid1 array inside the imsm container
3 partitions on raid1 array that are also visible as /dev/sd[cd]{1,2,3} when container is not running, and I have three md native metadata arrays on those three partitions

I have found that this problem has both a 100% reproducible trigger, and a 100% reproducible workaround.  And this points the bug at something other than mdadm and probably other than the kernel too.

Reproducer:

Boot system into rescue mode from DVD
Start ethernet interface
Skip detection of system
Start shell
mdadm -I /dev/sdc (first imsm drive)
mdadm -I /dev/sdd (second drive, array started, subarray started as well)
mdadm -S /dev/md126 (subarray) <- this triggers infinite loop

alternatively, after starting the shell I can do this:

mdadm -I /dev/sdc1 (native md raid1 array first drive)
mdadm -I /dev/sdd1 (second native drive, array started)
mdadm -S /dev/md/test <- this triggers infinite loop

So while I originally thought this was related to imsm arrays, it turns out that it can be triggered handily on native metadata arrays as well.  In both scenarios above we end up with an infinite loop of udev commands.  Now, as it turns out, if you follow the above procedures, but insert the following steps between starting the shell and the mdadm commands, the problem disappears 100% of the time:

kill 48 (the udevd --daemon process)
udevd --daemon

So, if we restart udevd, the problem goes away.  Harald?

Comment 12 Doug Ledford 2010-05-27 21:26:18 UTC
Harald: ping re previous comment that you didn't get mail about

Comment 13 Harald Hoyer 2010-05-28 07:07:22 UTC
(In reply to comment #12)
> Harald: ping re previous comment that you didn't get mail about    

udevadm control --log-priority=debug

then trigger the remove and have a look, which rule acts on ACTION="remove" and opens the device.
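One hedged way to follow this suggestion end to end (the exact invocation is an assumption; udevadm monitor prints kernel uevents and udev events as they are processed, so a rule opening the device on remove should show up between the two streams):

```shell
# Raise udev's log level, then watch kernel and udev events side by side
# for a few seconds while triggering the remove in another shell.
# Guarded and error-tolerant so it degrades on systems without a running udevd.
if command -v udevadm >/dev/null 2>&1; then
    udevadm control --log-priority=debug 2>/dev/null || true
    timeout 5 udevadm monitor --kernel --udev 2>/dev/null || true
    monitor_status="monitored"
else
    monitor_status="udevadm unavailable on this system"
fi
echo "monitor: $monitor_status"
```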

Comment 15 Harald Hoyer 2010-05-28 16:05:43 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Harald: ping re previous comment that you didn't get mail about    
> 
> udevadm control --log-priority=debug
> 
> then trigger the remove and have a look, which rule acts on ACTION="remove" and
> opens the device.    

log is in /var/log/messages

Comment 16 Doug Ledford 2010-05-29 19:06:52 UTC
Created attachment 417911 [details]
syslog from a failed run

Harald, please look these logs over.  I've attempted to see if I could find anything suspicious and I was unable to do so.  At line 32,505 and time index 12:22:32 is the last of the actions resulting from the assembly of the raid array (mdadm -I /dev/sdc; mdadm -I /dev/sdd, where those two commands resulted in /dev/md127 being started which was an imsm container, and also /dev/md126 being started which was the raid1 array that container owned, and also in /dev/md126p{1,2,3} being created).  At line 32,506 and time index 12:25:25 starts the infinite loop that was caused by the command mdadm -S /dev/md126.  I eventually killed udev to stop the loop.

Hans, if you read these logs closely, you'll see that 64-md-raid.rules and 70-anaconda.rules actually duplicate lots of stuff, so many things that really only need to be done once are being done twice.  The one exception to that is that 64-md-raid.rules has the location of blkid wrong and so fails to run it, while 70-anaconda.rules has it right.  If you could drop the other md raid stuff and just keep the blkid part of the rules it would avoid this duplication.

Note: this file compressed extremely well, it's 2MB compressed, over 120MB uncompressed.
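The duplication and the blkid path mismatch called out above can be audited quickly. A sketch; the rule-file paths are the install environment's from this bug and will not exist on most systems, hence the guard:

```shell
# Show which of the two rule files invoke blkid, and with which path --
# the wrong location in 64-md-raid.rules versus the right one in
# 70-anaconda.rules shows up directly in the grep output.
audit=$(
    for f in /lib/udev/rules.d/64-md-raid.rules /lib/udev/rules.d/70-anaconda.rules; do
        if [ -r "$f" ]; then
            grep -Hn 'blkid' "$f" || echo "$f: no blkid call"
        else
            echo "$f: not present on this system"
        fi
    done
)
echo "$audit"
```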

Comment 17 Doug Ledford 2010-05-29 19:10:19 UTC
Created attachment 417913 [details]
syslog from a successful run

This is the same series of commands (mdadm -I /dev/sdc; mdadm -I /dev/sdd; mdadm -S /dev/md126), but this is done after I have killed and restarted udevd and the infinite loop does not happen in this case.  Line 29102, time index 13:36:30 represents the start of the mdadm -S /dev/md126 actions.

Comment 18 Phil Knirsch 2010-06-01 15:57:14 UTC
As Harald is on PTO this week, I've taken a look myself through the logs.

As Doug already mentioned, having the md-related rules basically duplicated in 64-md and 70-anaconda kind of scares me, though I can't see how that could trigger this loop.  Just to be safe, though, we should probably strip the 70-anaconda rules down to only the stuff that isn't in the 64-md rules already.

Getting the proper blkid location into 64-md would need a small patch but should be doable.  On the other hand, it gets called at a different place in 70-anaconda, a lot earlier than in 64-md.  But again, I fail to see why that should trigger the loop.

Regarding the remove action, this is the first one I see for it:

<27>May 29 12:25:25 udevd-work[693]: PROGRAM '/usr/sbin/multipath -c /dev/md126' /lib/udev/rules.d/40-multipath.rules:10

<27>May 29 12:25:25 udevd-work[693]: '/usr/sbin/multipath -c /dev/md126' started

<27>May 29 12:25:25 udevd-work[693]: arg[0] '/usr/sbin/multipath'

<27>May 29 12:25:25 udevd-work[693]: arg[1] '-c'

<27>May 29 12:25:25 udevd-work[693]: arg[2] '/dev/md126'

<27>May 29 12:25:25 udevd-work[693]: '/usr/sbin/multipath' (stdout) 'May 29 12:25:25 | /etc/multipath.conf does not exist, blacklisting all devices.'

<27>May 29 12:25:25 udevd-work[693]: '/usr/sbin/multipath' (stdout) 'May 29 12:25:25 | A sample multipath.conf file is located at'

<27>May 29 12:25:25 udevd-work[693]: '/usr/sbin/multipath' (stdout) 'May 29 12:25:25 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf'

<27>May 29 12:25:25 udevd-work[693]: '/usr/sbin/multipath' (stdout) 'May 29 12:25:25 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf'

<27>May 29 12:25:25 udevd-work[693]: '/usr/sbin/multipath' (stdout) '/dev/md126 is not a valid multipath device path'

<27>May 29 12:25:25 udevd-work[693]: result='May 29 12:25:25 | /etc/multipath.conf does not exist, blacklisting all devices.
May 29 12:25:25 | A sample multipath.conf file is located at
May 29 12:25:25 | /u<27>May 29 12:25:25 udevd[48]: 'UDEV_LOG=7' added

<27>May 29 12:25:25 udevd[48]: 'ANACONDA=1' added

<27>May 29 12:25:25 udevd[48]: udev_device: 0x1438160 created

<27>May 29 12:25:25 udevd[48]: 'ACTION=remove' added

<27>May 29 12:25:25 udevd[48]: 'DEVPATH=/devices/virtual/block/md126/md126p3' added

<27>May 29 12:25:25 udevd[48]: 'SUBSYSTEM=block' added

<27>May 29 12:25:25 udevd[48]: 'DEVNAME=md126p3' added

<27>May 29 12:25:25 udevd[48]: 'DEVTYPE=partition' added

<27>May 29 12:25:25 udevd[48]: 'SEQNUM=1571' added

<27>May 29 12:25:25 udevd[48]: 'MAJOR=259' added

sr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
May 29 12:25:25 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
/dev/md126 is not a valid multipath device path
'

Doing an:

egrep "(70-anaconda|64-md)" 

on both syslog files shows an interesting difference between the infinite loop and the working syslog:

not working:


<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/70-anaconda.rules:4, token: 1436, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/70-anaconda.rules:5, token: 1439, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:4, token: 1255, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:4, token: 1436, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[738]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:4, token: 1255, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:9, token: 1258, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:12, token: 1261, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:16, token: 1264, count: 4, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:17, token: 1268, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:4, token: 1436, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:5, token: 1439, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:8, token: 1442, count: 4, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:10, token: 1446, count: 2, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:11, token: 1448, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:13, token: 1451, count: 4, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:14, token: 1455, count: 4, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:15, token: 1459, count: 4, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:16, token: 1463, count: 4, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:18, token: 1467, count: 2, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: IMPORT '/usr/sbin/blkid -o udev -p /dev/.tmp-block-9:126' /lib/udev/rules.d/70-anaconda.rules:18
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:20, token: 1469, count: 1, label: 'anaconda_mdraid', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:21, token: 1470, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:25, token: 1473, count: 4, label: 'lock'(plain)
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:26, token: 1477, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:44, token: 1528, count: 1, label: 'anaconda_mdraid_member', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:46, token: 1529, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:47, token: 1532, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:50, token: 1535, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:51, token: 1538, count: 3, label: '', flags: 0x00
<27>May 29 12:25:55 udevd-work[693]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00


working:

<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:4, token: 1255, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:3, token: 1252, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:4, token: 1255, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:9, token: 1258, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:12, token: 1261, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:16, token: 1264, count: 4, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:17, token: 1268, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/64-md-raid.rules:36, token: 1321, count: 1, label: 'md_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00


Basically, in the infinite-loop variant the whole 70-anaconda ruleset gets executed, whereas in the working variant only very few 70-anaconda rules are triggered.  Here is a grep over the 70-anaconda rules for the working version only:

<27>May 29 13:36:24 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: el: '', flags: 0x00
<27>May 29 13:36:24 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:24 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:24 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:24 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:24 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:25 udevd-work[689]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[744]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[744]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[745]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[745]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:30 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:32 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:32 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:32 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:32 udevd-work[720]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:32 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:32 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00
<27>May 29 13:36:32 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:2, token: 1433, count: 3, label: '', flags: 0x00
<27>May 29 13:36:32 udevd-work[690]: * RULE /lib/udev/rules.d/70-anaconda.rules:53, token: 1541, count: 1, label: 'anaconda_end', flags: 0x00


At no point is any rule ever really executed in there: only rule 2, which is the check for the anaconda environment, and then obviously rule 53, the LABEL="anaconda_end" that rule 2 jumps to when it's not anaconda.
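For readers unfamiliar with the GOTO/LABEL mechanics at play here, a hypothetical sketch of the pattern follows (the exact match condition in rule 2 is an assumption, not a quote of the shipped file):

```
# rule 2: bail out straight to the end label unless we are running
# inside the anaconda install environment (exact condition hypothetical)
ENV{ANACONDA}!="?*", GOTO="anaconda_end"

# ... md/dm handling rules (rules 3-52) would sit here and never fire ...

# rule 53: the jump target; processing of this rules file ends here
LABEL="anaconda_end"
```

This is why the log only ever shows rules 2 and 53 matching: every event falls through the guard directly to the end label.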

What this all tells me is that, for some odd reason, we seem to get a race between the 64-md rules and the 70-anaconda rules.


So the solutions i see are:

1) Trim the 70-anaconda rules to only include what isn't in 64-md (basically only the dm stuff)
2) Delete the 64-md rules from the anaconda image, as they are replicated in 70-anaconda anyway

I'd personally vote for 1), as we won't have 70-anaconda later in the runtime environment anyway, so the less we duplicate there the better.

Sounds like a plan?

Thanks & regards, Phil

Comment 20 Doug Ledford 2010-06-01 21:51:24 UTC
I vote for #1 too, and in support of that I'm going to build a new mdadm that has a functioning rules file for both regular installs and regular running mode. (The brokenness now is that we specify /sbin/blkid, which is correct on a running system and incorrect on an install image; don't let me get started on how stupid it is that blkid is in /sbin on a running system and /usr/sbin on an install image.) I'll switch it to just use blkid and assume that blkid is in the path of the udev daemon (which it should be). That way anaconda should be able to rip all of its mdadm-specific rules out.

Comment 21 Doug Ledford 2010-06-01 21:53:39 UTC
I take that back. In F14/Fedora-devel, mdadm supplies 64-md-raid.rules; in F13/RHEL6 it is provided by udev. So Harald will have to build a new udev package with a new 64-md-raid.rules file that fixes the problem I describe, in combination with a new anaconda that strips the md stuff out of 70-anaconda.rules.

Comment 22 Phil Knirsch 2010-06-02 10:24:24 UTC
Alright, built udev-147-2.16.el6 with a fixed 64-md-raid.rules file that checks for mdadm and blkid binary locations.
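A minimal sketch of what such a binary-location check can look like in udev rule syntax (an illustration under assumptions, not the attached patch; the match conditions and the incremental-assembly call are hypothetical):

```
# Run mdadm incremental assembly from whichever path exists:
# /sbin on a running system, /usr/sbin in the anaconda install image.
SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="linux_raid_member", TEST=="/sbin/mdadm", RUN+="/sbin/mdadm -I $env{DEVNAME}"
SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="linux_raid_member", TEST!="/sbin/mdadm", RUN+="/usr/sbin/mdadm -I $env{DEVNAME}"
```

The TEST key matches on file existence, so a single rules file can serve both environments without hardcoding one path.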

I'll attach the patch for 64-md-raid.rules and a stab at a patch for the stripped-down 70-anaconda.rules file from anaconda. Hans, if you could review that and include it in the next anaconda build, that would be great.

Thanks & regards, Phil

Comment 23 Phil Knirsch 2010-06-02 10:25:45 UTC
Created attachment 418979 [details]
Patch to check for location of mdadm and blkid binaries (needed for the anaconda install environment)

Comment 24 Phil Knirsch 2010-06-02 10:26:34 UTC
Created attachment 418981 [details]
Patch to remove duplicate md handling code from 70-anaconda.rules udev rule file

Comment 27 Jan Ščotka 2010-06-10 14:29:54 UTC
Created attachment 422930 [details]
still waiting

Hi,
I've used http://download.englab.brq.redhat.com/pub/rhel/nightly/RHEL6.0-20100610.n.0/6.0/Server/x86_64/os/Packages/
where are:
udev-147-2.17.el6.x86_64.rpm
anaconda-13.21.50-4.el6.x86_64.rpm
and the problem still persists

I've used a simpler way:
creating md:
# dd if=/dev/zero of=disk0.img bs=1M count=1 seek=10000
# dd if=/dev/zero of=disk1.img bs=1M count=1 seek=10000
# dd if=/dev/zero of=disk2.img bs=1M count=1 seek=10000
# losetup /dev/loop0 disk0.img
# losetup /dev/loop1 disk1.img
# losetup /dev/loop2 disk2.img
# mdadm --create /dev/md0 --chunk=64 --level=raid5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2 -R
# losetup -d /dev/loop0
# losetup -d /dev/loop1
# losetup -d /dev/loop2

then, via virt-manager, I created a new machine with disk0.img as the storage device; it hangs, and dmesg in a second terminal shows the same bug as reported.
The attached image shows it.

Comment 28 Jan Ščotka 2010-06-10 14:45:38 UTC
due to comment #27

Comment 29 Phil Knirsch 2010-06-10 16:48:10 UTC
Alright, so Harald and I went on a debugging spree, and this is what we found:

I had a very similar setup to Jan's from comment #27, and using that tree we would actually hit, once more, the infinite loop we had hoped was fixed via the cleanup of 70-anaconda.rules.

As it turned out, the initial cleanup removed the blkid call from 70-anaconda.rules, but that resulted in all existing partitions showing up as unknown when selecting "Create Custom Layout" on the partitioning pre-selection screen.

What we did then was try various scenarios: with blkid in the proper spot, with and without 70-anaconda.rules, and with 70-anaconda.rules present but its blkid call removed again.

The results were as follows:

- Default case (70-anaconda.rules and /usr/sbin/blkid only): infinite loop
- 70-anaconda.rules and /usr/sbin/blkid removed: no loop, but all partitions unknown
- 70-anaconda.rules unmodified and /sbin/blkid: infinite loop
- 70-anaconda.rules removed and /sbin/blkid: no loop, proper partitions, but md0 wasn't listed
- 70-anaconda.rules with its blkid call removed and /sbin/blkid: no loop, proper partitions, and both md0 and md/0 now listed

I've attached the storage.log and a screenshot of the partition layout to the bz as it's a bit odd.

What's also strange is that with 70-anaconda.rules completely removed, no md device was detected despite 64-md-raid.rules existing.

Maybe Hans can make some sense out of this.

Overall, we basically need 2 changes now:

1) blkid needs to be symlinked to /sbin/blkid
2) the blkid call needs to be dropped from 70-anaconda.rules again, basically going back to my original patch.
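Change 1) can be sketched as follows. This is a hypothetical illustration, not the actual image-build change: it demonstrates the symlink against a scratch root created with mktemp so it runs unprivileged, with a touched file standing in for the real blkid binary.

```shell
# Scratch root standing in for the install image filesystem.
root=$(mktemp -d)
mkdir -p "$root/usr/sbin" "$root/sbin"
touch "$root/usr/sbin/blkid"                 # stand-in for the real binary

# The fix: make blkid reachable at /sbin/blkid, where the rules expect it.
ln -sf ../usr/sbin/blkid "$root/sbin/blkid"

# Verify the symlink resolves to the actual binary location.
target=$(readlink "$root/sbin/blkid")
test -e "$root/sbin/blkid" && echo "blkid reachable at /sbin -> $target"

rm -rf "$root"
```

A relative symlink target is used so the link stays valid whether the tree is inspected from the build host or from inside the booted image.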

We still need to figure out why we get those 2 md devices listed, though, as that looks kind of odd.

Thanks & regards, Phil

Comment 30 Phil Knirsch 2010-06-10 16:49:12 UTC
Created attachment 422974 [details]
Screenshot of partitions in anaconda

Comment 31 Phil Knirsch 2010-06-10 16:49:37 UTC
Created attachment 422976 [details]
Anaconda's storage.log for this run

Comment 32 Hans de Goede 2010-06-10 18:56:57 UTC
The md device being listed twice is an anaconda bug, here is a patch for it
with explanation:
https://www.redhat.com/archives/anaconda-devel-list/2010-June/msg00144.html

And here is an updates.img for those who want to give it a try:
http://people.fedoraproject.org/~jwrdegoede/updates-591970.img

Comment 33 Phil Knirsch 2010-06-11 09:18:38 UTC
Hi Hans.

I've just tested the updates.img and the partition display looks correct now. Thanks!

Looking at the current origin/rhel6-branch, I already see the following for /sbin/blkid:

Include /sbin/blkid in the initrd.img (ee7e222bbb87c9a906ebe2637b7d84de61c50eba)

and 70-anaconda.rules no longer contains the blkid call, which means that with the next build we should have everything working again.

Thanks & regards, Phil

Comment 35 Harald Hoyer 2010-06-11 13:02:08 UTC
reassigning to anaconda ..

Comment 37 Martin Banas 2010-06-22 13:43:36 UTC
Fixed on PPC64, RHEL6.0-20100621.0, anaconda-13.21.50-9.el6

Comment 38 Pavel Holica 2010-06-25 10:22:39 UTC
Fixed on x86_64, RHEL6.0-20100622.1 Server, anaconda-13.21.50-9.el6.x86_64

Comment 39 Alexander Todorov 2010-06-25 12:11:54 UTC
Tested on x86_64, RHEL6.0-20100622.1 Server with the instructions from comment #0. Anaconda didn't wait 5 minutes before showing available storage devices. Based on this and comment #37 and comment #38 I'm moving this bug to VERIFIED.

Comment 40 releng-rhel@redhat.com 2010-07-02 20:49:24 UTC
Red Hat Enterprise Linux Beta 2 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

