485337 – Fedora 10 fails to boot (nash received SIGSEGV) after disconnecting one disk of a two disk fakeraid RAID 1 [mirrored] array


Keywords:
Status:	CLOSED DUPLICATE of bug 485882
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	dmraid
Sub Component:
Version:	10
Hardware:	i386
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	LVM and device-mapper development team
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-02-12 22:37 UTC by gregjo
Modified:	2013-01-22 20:52 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-02-17 08:24:24 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Log of mkinitrd execution. (8.27 KB, application/x-unknown) 2009-02-16 22:26 UTC, gregjo

Description gregjo 2009-02-12 22:37:49 UTC

Description of problem:
The system boots fine as a mirrored [RAID 1] array. If one of the two drives in the RAID is disconnected, the system fails to boot.
The log indicates:
"device-mapper: reload ioctl failed: No such device or address"
"device-mapper: table ioctl failed: No such device or address"
"nash received SIGSEGV"

Version-Release number of selected component (if applicable):
% uname -r
2.6.27.12-170.2.5.fc10.i686.PAE
% rpm -q nash
nash-6.0.71-3.fc10.i386
% rpm -q dmraid
dmraid-1.0.0.rc15-2.fc10.i386
% rpm -q device-mapper
device-mapper-1.02.27-7.fc10.i386

How reproducible:
100%

Steps to Reproduce:
1. My motherboard is an EVGA X58 SLI. The motherboard supports fakeraid via JMicron JMB363. I have two 1 TB drives configured as RAID 1. The array is partitioned so that I can dual boot: Windows XP is on one partition and Fedora 10 is on another. These two 1 TB drives are the only drives in the box.
2. Once the OS was installed correctly, I powered down the system and disconnected one of the drives to simulate a hardware failure of one drive in the RAID. Upon powering up, the Fedora 10 installation fails to boot.

Actual results:
device-mapper: reload ioctl failed: No such device or address
device-mapper: table ioctl failed: No such device or address
nash received SIGSEGV! Backtrace (15):
/bin/nash[0x8054e0d]
[0xe4240c]
/usr/lib/libnash.so.6.0.71[0xfbb3cc]
/usr/lib/libnash.so.6.0.71(nashDmDevGetName+0x5a)[0xfbc31f]
/usr/lib/libnash.so.6.0.71[0xfb861c]
/usr/lib/libnash.so.6.0.71[0xfb874d]
/usr/lib/libnash.so.6.0.71(nashBdevIterNext+0x106)[0xfb8bd9]
/usr/lib/libnash.so.6.0.71[0xfb8e78]
/usr/lib/libnash.so.6.0.71(nashFindFsByUUID+0x2e)[0xfb8efd]
/usr/lib/libnash.so.6.0.71(nashAGetPathBySpec+0x8e)[0xfb9074]
/bin/nash[0x804f6bf]
/bin/nash[0x8054c78]
/bin/nash[0x80553d7]
/lib/libc.so.6(__libc_start_main+0xe5)[0x5fa6e5]
/bin/nash[0x804b2b1]
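
To make the call chain in a trace like the one above easier to read, here is a minimal sketch (not part of the original report) that prints only the named frames, outermost caller first. The function name `named_frames` and the saved trace file are assumptions for illustration; frames that show only a raw address are skipped.

```shell
#!/bin/sh
# Print the named frames of a nash backtrace, outermost caller first.
# Input: a file of lines such as
#   /usr/lib/libnash.so.6.0.71(nashFindFsByUUID+0x2e)[0xfb8efd]
# Frames without a symbol name (bare addresses) are skipped.
named_frames() {
    # Accept "(" or "[" before the symbol, since hand-copied traces vary.
    sed -n 's/^[^([]*[([]\([A-Za-z_][A-Za-z0-9_]*\)+0x[0-9a-fA-F]*).*/\1/p' "$1" |
        # Reverse the order so the outermost caller comes first.
        awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }'
}
```

With the trace saved to a file (say `trace.txt`, a hypothetical name), `named_frames trace.txt` lists `__libc_start_main`, `nashAGetPathBySpec`, `nashFindFsByUUID`, `nashBdevIterNext`, and `nashDmDevGetName`, one per line, which shows the crash happening while nash looks up a filesystem by UUID and walks the device-mapper devices.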

Expected results:
The system should continue to operate correctly on the remaining drive in the RAID.

Additional info:
If I boot from the Windows XP partition, the system runs correctly on the remaining drive in the array.

Comment 1 Hans de Goede 2009-02-12 22:55:35 UTC

This is most likely caused by the initrd doing its own dm table creation. In rawhide we no longer do that.

If you're interested in testing whether the new rawhide mkinitrd indeed fixes this, try installing mkinitrd-6.0.76 or newer from rawhide and then regenerating your initrd. After this you should still be able to boot.

At least, assuming that dmraid can activate a mirror set even if only one drive is present. Heinz?

Comment 2 Hans de Goede 2009-02-12 23:19:07 UTC

Sorry, mkinitrd-6.0.76 does not yet have the changes I was referring to. If you want to test the new mkinitrd way of handling dmraid, use this mkinitrd script:
https://bugzilla.redhat.com/attachment.cgi?id=331781

Before using it make sure you have nash-6.0.71-4 installed (from updates-testing) and that you've upgraded your dmraid to this version:
http://koji.fedoraproject.org/koji/buildinfo?buildID=82481

Comment 3 gregjo 2009-02-13 16:12:15 UTC

I attempted what you suggested, however it still failed. I'm a total mkinitrd noob, so let me retrace my steps to see if I did anything wrong.

I updated nash from updates-testing:
% yum update nash --enablerepo=updates-testing
% rpm -q nash
nash-6.0.71-4.fc10.i386

I installed the dmraid rpm from koji.
% rpm -q dmraid
dmraid-1.0.0.rc15-4.fc11.i386

Then I did the following with mkinitrd. This is where it got a little fuzzy for me. I created a "custom" version of the img file:

% ./mkinitrd /boot/initrd-2.6.27.12-170.2.5.fc10.i686.PAEcustom.img 2.6.27.12-170.2.5.fc10.i686.PAE

Then I modified my /boot/grub/grub.conf to point to the "custom" initrd img file

Once again I pulled the plug on one of the disks, and attempted to boot. I received the following error, which appears to be the same error...except the listed offsets are slightly different:

device-mapper: reload ioctl failed: No such device or address
device-mapper: table ioctl failed: No such device or address
nash received SIGSEGV! Backtrace (15):
/bin/nash[0x8054e8f]
[0xa4140c]
/usr/lib/libnash.so.6.0.71[0x1433b8]
/usr/lib/libnash.so.6.0.71(nashDmDevGetName+0x5a)[0x14430b]
/usr/lib/libnash.so.6.0.71[0x140608]
/usr/lib/libnash.so.6.0.71[0x14072b]
/usr/lib/libnash.so.6.0.71(nashBdevIterNext+0x106)[0x140ba9]
/usr/lib/libnash.so.6.0.71[0x140e44]
/usr/lib/libnash.so.6.0.71(nashFindFsByUUID+0x2e)[0x140ec9]
/usr/lib/libnash.so.6.0.71(nashAGetPathBySpec+0x8e)[0x1401040]
/bin/nash[0x804f741]
/bin/nash[0x8054cfa]
/bin/nash[0x8055459]
/lib/libc.so.6(__libc_start_main+0xe5)[0x1676e5]
/bin/nash[0x804b2b1]

Comment 4 Hans de Goede 2009-02-15 10:04:50 UTC

(In reply to comment #3)
> I attempted what you suggested, however it still failed. I'm a total mkinitrd
> noob, so let me retrace my steps to see if I did anything wrong.
> 
> I updated nash from updates-testing:
> % yum update nash --enablerepo=updates-testing
> % rpm -q nash
> nash-6.0.71-4.fc10.i386
> 

Good.

> I installed the dmraid rpm from koji.
> % rpm -q dmraid
> dmraid-1.0.0.rc15-4.fc11.i386
> 

Also good, but since my last comment I've learned that you need an even newer version. Please install the one from here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=82600

> Then I did the following with mkinitrd. This is where it got a little fuzzy for
> me. I created a "custom" version of the img file:
> 
> % ./mkinitrd /boot/initrd-2.6.27.12-170.2.5.fc10.i686.PAEcustom.img
> 2.6.27.12-170.2.5.fc10.i686.PAE
> 

The command is correct, but you need to run it with a special new version of mkinitrd (which will be in rawhide soon). Download this version from here:
https://bugzilla.redhat.com/attachment.cgi?id=331850

Then make a new custom initrd with this version of the mkinitrd script. Note that this is a newer version than the one I linked to in comment #2. This version should actually work with kernel 2.6.27; the version from comment #2 only worked with 2.6.29 or newer.

Can you please give things a try with the new dmraid and this even newer mkinitrd script?

When you create the custom initrd, please pass -v to mkinitrd and redirect the output to a log file, like this:

./mkinitrd -v /boot/initrd-2.6.27.12-170.2.5.fc10.i686.PAEcustom.img \
  2.6.27.12-170.2.5.fc10.i686.PAE > log

And attach the log file, so I can check that it is behaving as expected.
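
As a rough sketch of the step above (an illustration, not a command from this thread), the rebuild can be wrapped in a small function that also captures error messages, which a plain `> log` redirect would leave on the terminal. The names `build_custom_initrd`, `MKINITRD`, `IMG_DIR`, and `LOG` are assumptions made up for this example; only the `-v` option and the image and kernel-version arguments come from the command shown above.

```shell
#!/bin/sh
# Rebuild a custom initrd with verbose output captured to a log file.
# Assumptions: MKINITRD, IMG_DIR and LOG are illustrative variables, not
# part of mkinitrd itself; they default to the values used in this thread.
build_custom_initrd() {
    kver="${1:-$(uname -r)}"                       # kernel version to build for
    img="${IMG_DIR:-/boot}/initrd-${kver}custom.img"
    # 2>&1 also captures error messages, which a plain "> log" would miss.
    "${MKINITRD:-./mkinitrd}" -v "$img" "$kver" > "${LOG:-log}" 2>&1
}
```

Called as `build_custom_initrd 2.6.27.12-170.2.5.fc10.i686.PAE` from the directory holding the downloaded mkinitrd script, this should produce the same image name as the command above and leave the verbose output in `log`.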

Comment 5 gregjo 2009-02-16 22:26:35 UTC

Created attachment 332141
Log of mkinitrd execution.

This is the log file for the mkinitrd command.

Comment 6 gregjo 2009-02-16 22:27:08 UTC

I installed the newer dmraid rpm from koji.
% rpm -q dmraid
dmraid-1.0.0.rc15-5.fc11.i386

I downloaded the newer mkinitrd script. I have attached the log file. 

After degrading the array to a single disk, the following error is displayed upon booting:

/dev/sda: "jmicron" and "isw" formats discovered (using isw)!
ERROR: isw device for volume "Mirror0" broken on /dev/sda in RAID set "isw_bdedhfgbae_Mirror0"
ERROR: isw: wrong # of devices in RAID set "isw_bdedhfgbae_Mirror0" [1/2] on /dev/sda
ERROR: no mapping possible for RAID set isw_bdedhfgbae_Mirror0
Unable to access resume device (UUID=e6cbd316-3bb3-4698-aa7d-87ddb484b7b8)
mount: error mounting /dev/root on /sysroot as ext3: No such file or directory
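
The `[1/2]` in the second error appears to be the number of devices found versus the number the RAID set expects. A minimal sketch, assuming the boot messages have been saved to a text file, that pulls the set name and both counts out of such lines (the function name `degraded_sets` is made up for this example):

```shell
#!/bin/sh
# Summarize "wrong # of devices" errors from a saved dmraid/boot log.
# Prints one line per match: <set name> <devices found> <devices expected>
degraded_sets() {
    sed -n 's|.*wrong # of devices in RAID set "\([^"]*\)" \[\([0-9]*\)/\([0-9]*\)].*|\1 \2 \3|p' "$1"
}
```

For the messages above this would print `isw_bdedhfgbae_Mirror0 1 2`, that is, one of two expected devices present.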

As we've discussed on another issue, it is interesting that both jmicron and isw formats are seen. I initially partitioned the drives with gparted while booted from a Linux rescue CD; however, the Fedora installer didn't like that: it saw the header as corrupt. So I let the Fedora installer remove all partitions and then repartition the drive.

Comment 7 Hans de Goede 2009-02-17 08:24:24 UTC

Ok, well at least we got rid of the segfault :) I consider the remaining issue a dmraid issue, for which I've filed bug 485882.

*** This bug has been marked as a duplicate of bug 485882 ***
