Bug 671913 - yum update yesterday creates dmraid related problem that prevents boot
Summary: yum update yesterday creates dmraid related problem that prevents boot
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: dmraid
Version: 5.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-01-22 21:27 UTC by Kevin Hendricks
Modified: 2019-02-25 17:21 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-24 15:19:45 UTC
Target Upstream Version:



Description Kevin Hendricks 2011-01-22 21:27:38 UTC
Description of problem:

After running a yum update yesterday, my el5 machine no longer boots.  The error message is:

ERROR: sil: wrong # of devices in RAID set "sil_agajebcbafbd" [1/2] on /dev/sdb

A web search turned up a similar problem in the Ubuntu bug database; they tracked it back to a recent change to dmraid.

https://bugs.launchpad.net/ubuntu/+source/dmraid/+bug/292302

I can now only boot successfully by adding nodmraid to the kernel boot parameters.  

This problem may be in the rc.d init scripts or the rc.sysinit script that invoke dmraid.  I tried downgrading dmraid and dmraid-events to no avail, and booting with older kernels did not help either.  Something changed in how dmraid is invoked during the boot sequence that prevents my machine from booting successfully.  The /dev/sdb device in question passes all fdisk partition and fsck checks with flying colors when nodmraid is passed to the kernel.

Without nodmraid, no partitions on /dev/sdb can be mounted at all (they are reported as busy), so fsck will not run either.  I cannot mount them manually (I always get the busy message back).

Nasty bugger of a change. 
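For reference, on RHEL 5 the nodmraid workaround is made persistent by appending it to the kernel line(s) in the GRUB legacy config, normally /boot/grub/grub.conf. The sketch below demonstrates the edit on a minimal sample config (the kernel version and root device shown are made-up sample values, not taken from this bug):

```shell
# Build a minimal sample grub.conf (GRUB legacy syntax; on a real
# RHEL 5 box you would edit /boot/grub/grub.conf instead).
cat > grub.conf <<'EOF'
title Red Hat Enterprise Linux Server
	root (hd0,0)
	kernel /vmlinuz-2.6.18-238.el5 ro root=/dev/VolGroup00/LogVol00
	initrd /initrd-2.6.18-238.el5.img
EOF

# Append nodmraid to every kernel line, keeping a .bak backup.
sed -i.bak '/^[[:space:]]*kernel /s/$/ nodmraid/' grub.conf

# Show the result: only the kernel line gains the parameter.
grep 'kernel' grub.conf
```

For a one-off test boot, the same parameter can instead be added interactively from the GRUB menu (press `a` or `e` on the entry) without touching the config file.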

Version-Release number of selected component (if applicable):
el5 yum updated yesterday

How reproducible:
The problem appears immediately after the update; adding nodmraid to the kernel boot parameters works around it.

Steps to Reproduce:
1. Boot the machine.
  
Actual results:
See above

Expected results:
successful boot



Additional info:

Comment 1 Kevin Hendricks 2011-01-22 22:03:50 UTC
Hi,

If it helps, here is what running dmraid -rD -d -vvv shows *after* booting with kernel parameter nodmraid:


WARN: locking /var/lock/dmraid/.lock
NOTICE: skipping removable device /dev/hdc
NOTICE: /dev/sda: asr     discovering
NOTICE: /dev/sda: ddf1    discovering
NOTICE: /dev/sda: hpt37x  discovering
NOTICE: /dev/sda: hpt45x  discovering
NOTICE: /dev/sda: isw     discovering
NOTICE: /dev/sda: jmicron discovering
NOTICE: /dev/sda: lsi     discovering
NOTICE: /dev/sda: nvidia  discovering
NOTICE: /dev/sda: pdc     discovering
NOTICE: /dev/sda: sil     discovering
NOTICE: /dev/sda: via     discovering
NOTICE: /dev/sdb: asr     discovering
NOTICE: /dev/sdb: ddf1    discovering
NOTICE: /dev/sdb: hpt37x  discovering
NOTICE: /dev/sdb: hpt45x  discovering
NOTICE: /dev/sdb: isw     discovering
NOTICE: /dev/sdb: jmicron discovering
NOTICE: /dev/sdb: lsi     discovering
NOTICE: /dev/sdb: nvidia  discovering
NOTICE: /dev/sdb: pdc     discovering
NOTICE: /dev/sdb: sil     discovering
NOTICE: sil: areas 1,2,3,4[4] are valid
NOTICE: writing metadata file "sdb_0.dat"
NOTICE: writing offset to file "sdb_0.offset"
NOTICE: writing metadata file "sdb_1.dat"
NOTICE: writing offset to file "sdb_1.offset"
NOTICE: writing metadata file "sdb_2.dat"
NOTICE: writing offset to file "sdb_2.offset"
NOTICE: writing metadata file "sdb_3.dat"
NOTICE: writing offset to file "sdb_3.offset"
NOTICE: writing size to file "sdb.size"
NOTICE: /dev/sdb: sil metadata discovered
NOTICE: /dev/sdb: via     discovering
INFO: RAID device discovered:

/dev/sdb: sil, "sil_agajebcbafbd", mirror, ok, 625140400 sectors, data@ 0
WARN: unlocking /var/lock/dmraid/.lock

If you need or want any of the metadata files generated above, just let me know and I would be happy to tgz them up and post them.

I would also be happy to run any diagnostic commands that you think might help find out what changed and why I can no longer boot successfully without the nodmraid kernel option.

Thanks,

Kevin

Comment 2 Heinz Mauelshagen 2011-01-24 10:59:19 UTC
Kevin,

this looks like remnant SoftRAID metadata that had not been discovered before, now being found in early boot due to initrd changes applied by the update.

Or did you have an operational SoftRAID mirror on /dev/sdb and some other device before?

If the former applies, remove the Silicon Image metadata from sdb with "dmraid -rE /dev/sdb"; you will then be able to drop the nodmraid kernel command line argument again.

I don't expect the latter applies, but please come back if it does.
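Spelled out as a command sequence, the suggested cleanup looks like the sketch below. The flags are the ones used elsewhere in this bug; /dev/sdb is this reporter's device, so substitute your own. These commands need root and the actual disk, and -rE is destructive, so run them only once you are sure the metadata is stale:

```shell
# 1. Dump the on-disk metadata to files first as a backup
#    (dmraid -rD writes sdX_*.dat / *.offset files, as seen in comment 1).
dmraid -rD

# 2. List which block devices still carry a RAID signature, to confirm
#    the stale Silicon Image set is really on /dev/sdb.
dmraid -r

# 3. Erase the stale RAID metadata from the affected device.
#    dmraid will ask for confirmation before writing.
dmraid -rE /dev/sdb
```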

Comment 3 Kevin Hendricks 2011-01-24 14:57:23 UTC
Hi Heinz,

Yes, I believe it was the former, although it was so long ago that I don't really remember whether I first set up the disks under nvraid or not.  I must have.  Needless to say, I disabled all RAID in the BIOS long ago (I needed the extra disk space), so I should have known something was wrong.  The first boot on this machine was with pre Fedora 5, over six years ago.

I removed the metadata as you suggested and it booted just fine.  So put this one down to user error.  Scared me, as this machine has been a rock.

Thanks,

Kevin

Comment 4 Heinz Mauelshagen 2011-01-24 15:19:45 UTC
Kevin,

happy to hear this fixed the situation for you.

dmraid is now supposed to discover any RAID devices it supports in early boot, so this is not a bug. Where devices you don't want are being recognized as RAID, you can work around it either way (nodmraid, or erase the metadata).

