Bug 491155

Summary: Failed to start array (RAID0) with root fs
Product: Fedora
Component: udev
Version: 11
Hardware: x86_64
OS: Linux
Status: CLOSED RAWHIDE
Severity: urgent
Priority: high
Reporter: Doug Ledford <dledford>
Assignee: Harald Hoyer <harald>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bruno, dledford, harald, jarmstrong, jarod, kay.sievers, nicolas.mailhot, notting
Doc Type: Bug Fix
Clone Of: 490972
Bug Depends On: 490972
Last Closed: 2009-06-30 10:18:58 UTC

Description Doug Ledford 2009-03-19 15:58:39 UTC
The 64-md-raid.rules file is set to trigger on either add or change events.  This makes it impossible to do incremental assembly of non-root raid devices during rc.sysinit bringup: we need to call udevadm settle, but when we do, there are so many change events on the already assembled root raid array that they completely stuff up the udev queue and cause it to time out.  By changing the rule to trigger on add only, no timeouts occur.
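
For illustration, the kind of edit described above would look roughly like the following.  This is only a sketch of a typical incremental-assembly rule (keyed on the ID_FS_TYPE that blkid sets for raid members), not the exact contents of 64-md-raid.rules:

# illustrative current behaviour: the rule fires on both add and change events
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

# proposed: fire on add only, so settling during rc.sysinit doesn't drown in
# change events from the already-running root array
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"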

In addition, watching change events is problematic at other times too.  For instance, when I plug in a hot-plug raid device, the arrays are automatically assembled.  If I then stop those raid devices and use fdisk to repartition the member drives, writing the partition table causes a change event *before* the kernel re-reads the partition table.  That causes udev to re-grab the devices and attempt to restart the *old* arrays based on the not-yet-updated partition table; the stale arrays get started, and the re-read of the partition table then fails.
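
A rough sketch of that sequence, with hypothetical device names:

# hot-plug the disks; udev auto-assembles the old array from their superblocks
mdadm --stop /dev/md0     # stop the auto-assembled array
fdisk /dev/sdc            # repartition a member disk
# writing the new table emits a 'change' event before the kernel re-reads the
# partition table, so udev re-grabs the old member into the stale array and the
# subsequent partition table re-read fails because the device is busy again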

So, watching change events in the md-raid rules file is bad.  This needs to be corrected so that the other changes necessary to get incremental assembly working can be effective.


+++ This bug was initially created as a clone of Bug #490972 +++

+++ This bug was initially created as a clone of Bug #488038 +++

Description of problem:
The system fails to boot because it cannot assemble the RAID0 array holding the root (/) filesystem. It looks like the kernel, with the older mdadm in the initrd, successfully assembles all arrays (so root is accessible at least in read-only mode), but the mdadm invocation spawned from the initscripts then breaks everything.


Version-Release number of selected component (if applicable):
mdadm-3.0-0.devel2.2.fc11
older version mdadm-3.0-0.devel2.1.fc11 is fine


How reproducible:
always after boot

Steps to Reproduce:
1. Boot the system (root=/dev/md1)
  
Actual results:
....kernel messages:
mdadm: /dev/md1 has been started with 2 drives.
....initscripts messages:
Setting hostname localhost.localdomain:   [ OK ]
mdadm: /dev/md0 is already in use
mdadm: /dev/md3 is already in use
mdadm: failed to RUN_ARRAY /dev/md/3_0: Cannot allocate memory
mdadm: Not enough devices to start the array.
mdadm: /dev/md/0_0 has been started with 1 drive (out of 2)

... boot then fails when fsck is unable to access the filesystem.


Additional info:

/etc/mdadm.conf:
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=0.90 UUID=0c2a49fd:ba124f6d:e0634eb5:9e0e1855
ARRAY /dev/md1 level=raid0 num-devices=2 metadata=0.90 UUID=2d64fe1d:e87e3bfe:18f25720:de7af605
ARRAY /dev/md3 level=raid0 num-devices=2 metadata=0.90 UUID=f8b1e8e6:7a83767a:00b7a568:dbd45bb9


dmesg:
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: md1 stopped.
md: bind<sdb2>
md: bind<sda2>
md1: setting max_sectors to 128, segment boundary to 32767
raid0: looking at sda2
raid0:   comparing sda2(7678976) with sda2(7678976)
raid0:   END
raid0:   ==> UNIQUE
raid0: 1 zones
raid0: looking at sdb2
raid0:   comparing sdb2(7678976) with sda2(7678976)
raid0:   EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 15357952 blocks.
raid0 : conf->hash_spacing is 15357952 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 8 bytes for hash.
 md1: unknown partition table

--- Additional comment from dledford on 2009-03-18 13:09:30 EDT ---

This bug has been identified and needs a change to initscripts in order to be solved properly.

Specifically, in rc.sysinit, there is this line:

# Start any MD RAID arrays that haven't been started yet
[ -f /etc/mdadm.conf -a -x /sbin/mdadm ] && /sbin/mdadm -As --auto=yes --run

This needs to be changed as follows:

# Wait for local RAID arrays to finish incremental assembly before continuing
udevsettle

It turns out that the original line races with udev's attempts to perform incremental assembly on the array.  In the end, udev ends up grabbing some devices and sticking them in a partially assembled array, and the call to mdadm grabs some other devices and sticks them in a *different* array, and neither array gets started properly.  With this change, the udev incremental assembly rules work as expected.
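
Putting that together, a sketch of the revised fragment, spelled with the newer udevadm form of the settle command (the exact timeout value here is illustrative):

# Wait for local RAID arrays to finish incremental assembly before continuing
# (replaces the mdadm -As --auto=yes --run call above)
/sbin/udevadm settle --timeout=30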

Changing to initscripts package.

--- Additional comment from dledford on 2009-03-18 13:10:43 EDT ---

*** Bug 487965 has been marked as a duplicate of this bug. ***

--- Additional comment from jwilson on 2009-03-18 13:37:14 EDT ---

Hrm. So things are mildly better w/the change prescribed in comment #1 on one of my affected systems. Instead of getting at least two different arrays created for what is supposed to be my /boot volume, I get only /dev/md0, but it contains only a single member.

--- Additional comment from jwilson on 2009-03-18 13:49:31 EDT ---

Also, this change results in the following spew:

the program '/bin/bash' called 'udevsettle', it should use 'udevadm settle <options>', this will stop working in a future release
udevadm[2036]: the program '/bin/bash' called 'udevsettle', it should use 'udevadm settle <options>', this will stop working in a future release

Even after changing over to 'udevadm settle --timeout=30' and adding a 'sleep 5' after that, I'm still only getting a single drive added to /dev/md0 every time.

--- Additional comment from dledford on 2009-03-18 13:54:04 EDT ---

What version of mdadm are you using?  I tested this with mdadm-3.0-0.devel3.1.fc11, which is not yet in rawhide, only locally built, and with that version it worked fine.  As for udevsettle versus udevadm settle, that's because I'm testing this on an F9 machine with older udev, so it would need to be changed for the later versions of udev in rawhide.  However, no timeout nor any sleeps are necessary for me with the current mdadm (which also includes an updated mdadm rules file that could certainly play a role in what you are seeing).  My impression is that to fully solve the problem, you really need both updates, but a bug can only be filed against one component at a time.  I'll clone this for the mdadm half of the issue.

--- Additional comment from dledford on 2009-03-18 14:31:18 EDT ---

mdadm-3.0-0.devel3.1.fc11 has been built to address this issue.  Note that it still needs the initscripts update in order to work as expected.

--- Additional comment from jwilson on 2009-03-18 14:44:57 EDT ---

So with the initscript update hand-made and mdadm updated to 3.0-0.devel3.1.fc11, I'm still only getting one of four disks added to md0 (my /boot array).


# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : inactive sdc1[2](S)
      200704 blocks
       
md1 : active raid6 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      307981824 blocks level 6, 256k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>



# cat /etc/mdadm.conf

# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root

ARRAY /dev/md1 level=raid6 num-devices=4 metadata=0.90 UUID=368714fb:5469cef4:f60fd542:027945d8
ARRAY /dev/md0 level=raid1 num-devices=4 metadata=0.90 UUID=04aacae3:99941fc2:d486ae8d:01f4d665


# mdadm -Eb /dev/sdc1:
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=04aacae3:99941fc2:d486ae8d:01f4d665


# mdadm -Eb /dev/sda1
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=04aacae3:99941fc2:d486ae8d:01f4d665

--- Additional comment from jwilson on 2009-03-18 15:01:50 EDT ---

# ll /*/udev/rules.d/
/etc/udev/rules.d/:
total 80
-rw-r--r--. 1 root root   397 2009-03-06 08:09 40-multipath.rules
-rw-r--r--. 1 root root 19994 2009-02-25 11:32 60-libmtp.rules
-rw-r--r--. 1 root root  1060 2009-03-06 03:33 60-pcmcia.rules
-rw-r--r--. 1 root root  6824 2009-03-09 00:13 60-wacom.rules
-rw-r--r--. 1 root root   595 2009-03-06 17:13 70-persistent-cd.rules
-rw-r--r--. 1 root root   845 2009-03-06 17:11 70-persistent-net.rules
-rw-r--r--. 1 root root  1914 2009-03-03 17:55 85-pcscd_ccid.rules
-rw-r--r--. 1 root root   244 2009-02-25 01:54 85-pcscd_egate.rules
-rw-r--r--. 1 root root   320 2008-09-18 03:54 90-alsa.rules
-rw-r--r--. 1 root root    83 2009-03-05 20:26 90-hal.rules
-rw-r--r--. 1 root root    53 2009-02-24 11:41 91-drm-modeset.rules
-rw-r--r--. 1 root root  4216 2009-03-10 14:56 95-devkit-disks.rules
-rw-r--r--. 1 root root  2283 2009-03-09 15:37 97-bluetooth-serial.rules
-rw-r--r--. 1 root root    85 2009-03-02 16:42 98-devkit.rules

/lib/udev/rules.d/:
total 112
-rw-r--r--. 1 root root  421 2009-03-09 22:05 10-console.rules
-rw-r--r--. 1 root root  348 2009-03-03 08:17 40-alsa.rules
-rw-r--r--. 1 root root 1431 2009-03-03 08:17 40-redhat.rules
-rw-r--r--. 1 root root  172 2009-03-03 08:17 50-firmware.rules
-rw-r--r--. 1 root root 4562 2009-03-03 08:17 50-udev-default.rules
-rw-r--r--. 1 root root  141 2009-03-03 08:17 60-cdrom_id.rules
-rw-r--r--. 1 root root  283 2009-03-09 22:05 60-net.rules
-rw-r--r--. 1 root root 1538 2009-03-03 08:17 60-persistent-input.rules
-rw-r--r--. 1 root root  718 2009-03-03 08:17 60-persistent-serial.rules
-rw-r--r--. 1 root root 4441 2009-03-03 08:17 60-persistent-storage.rules
-rw-r--r--. 1 root root 1514 2009-03-03 08:17 60-persistent-storage-tape.rules
-rw-r--r--. 1 root root  711 2009-03-03 08:17 60-persistent-v4l.rules
-rw-r--r--. 1 root root 3914 2009-03-02 10:33 61-option-modem-modeswitch.rules
-rw-r--r--. 1 root root  525 2009-03-03 08:17 61-persistent-storage-edd.rules
-rw-r--r--. 1 root root  107 2009-03-03 08:17 64-device-mapper.rules
-rw-r--r--. 1 root root 1701 2009-03-18 14:29 64-md-raid.rules
-rw-r--r--. 1 root root 1218 2009-03-02 10:33 70-acl.rules
-rw-r--r--. 1 root root  390 2009-03-03 08:17 75-cd-aliases-generator.rules
-rw-r--r--. 1 root root 2403 2009-03-03 08:17 75-persistent-net-generator.rules
-rw-r--r--. 1 root root  336 2009-03-09 23:40 77-nm-probe-modem-capabilities.rules
-rw-r--r--. 1 root root 2283 2009-03-02 10:33 78-sound-card.rules
-rw-r--r--. 1 root root  137 2009-03-03 08:17 79-fstab_import.rules
-rw-r--r--. 1 root root  779 2009-03-03 08:17 80-drivers.rules
-rw-r--r--. 1 root root  221 2009-02-24 04:44 85-regulatory.rules
-rw-r--r--. 1 root root  175 2009-03-09 22:05 88-clock.rules
-rw-r--r--. 1 root root  234 2009-03-03 08:17 95-udev-late.rules

--- Additional comment from dledford on 2009-03-18 15:47:39 EDT ---

A new version of mdadm that solves the file conflict is in rawhide.

--- Additional comment from jwilson on 2009-03-19 09:33:11 EDT ---

Got the even newer mdadm and the original 64-md-raid.rules file back in place, and restarted... Now the machine is hanging at 'Starting udev: _' for a good couple of minutes before finally continuing boot. I was hoping all that delay might have meant the array got built correctly, but alas, /dev/md0 is still getting created with only a single drive in it.

--- Additional comment from jwilson on 2009-03-19 09:40:44 EDT ---

...the heck? Second try, similar issue, but I noticed a ton of spew printed to the console after the lengthy hang starting udev:

  /sys/devices/virtual/block/md0 (2417)
  /sys/devices/virtual/block/md1 (2415)
  /sys/devices/virtual/block/md0 (2413)
  /sys/devices/virtual/block/md1 (2411)
  /sys/devices/virtual/block/md0 (2409)
  /sys/devices/virtual/block/md0 (2407)
  /sys/devices/virtual/block/md1 (2405)
  /sys/devices/virtual/block/md0 (2403)
  /sys/devices/virtual/block/md0 (2401)

Comment 1 Harald Hoyer 2009-03-19 16:31:36 UTC
adding Kay Sievers to CC

Comment 2 Bug Zapper 2009-06-09 12:24:55 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Harald Hoyer 2009-06-30 10:18:58 UTC
rawhide udev-143 now has /lib/udev/rules.d/65-md-incremental.rules
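
For anyone verifying the fix, a quick sketch (the rpm query is generic, not specific to this bug; the path is the one named above):

# confirm the new rules file is present and owned by the udev package
rpm -qf /lib/udev/rules.d/65-md-incremental.rules
cat /lib/udev/rules.d/65-md-incremental.rules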