446284 – RHEL can't boot after installation on an Intel SATA array

Summary: RHEL can't boot after installation on an Intel SATA array

Keywords:
Status:	CLOSED DUPLICATE of bug 471689
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	mkinitrd
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Peter Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-05-13 21:52 UTC by Fernando Lozano
Modified:	2008-12-02 20:46 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-12-02 20:45:56 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Sysreport of bug server after update to 5.2 (and my fixes to initrd) (631.53 KB, application/octet-stream) 2008-05-23 20:25 UTC, Fernando Lozano
Dmesg, RAID fails to boot after installing Red Hat 5.2, on motherboard with ICH9R (20.38 KB, text/plain) 2008-07-16 19:11 UTC, Hanh Le
Remove the dmname, as the raid set is set incorrectly. (427 bytes, patch) 2008-11-25 02:10 UTC, Wade Mealing

Description Fernando Lozano 2008-05-13 21:52:43 UTC

Problem Description:

I guess this bug may be related to bug #399161, submitted for Fedora 8.

RHEL 5.1 and 5.2 beta, both i386 and x86_64, fail to boot after installation:
the kernel panics, reporting that it can't find the root partition.

If I install RHEL 5.0 and later update the kernel via RHN, the same problem occurs.



Version-Release number of selected component (if applicable):


How reproducible:

I tried installing 5.1 and 5.2 beta many times, and tried many kernel updates,
all with the same result.


Steps to Reproduce:
1. Install RHEL 5.0 on a server based on an Intel S5000VSA SATAR motherboard,
using SATA disks as a RAID 0 array for the root partition, with the default
partition layout.

2. After installation, update the kernel using RHN.

  
Actual results:

Kernel panic, can't find / partition


Expected results:

System boots normally.


Additional info:

The Intel RAID controller is listed by lspci as:
"Intel 631xESB/632xESB/ SATA Storage Controller"

Comparing the initrd generated by anaconda during the 5.0 install with the
initrd generated by the kernel upgrade, I noticed that several commands related
to device-mapper are absent from the latter. The commands are:

rmparts sdc
rmparts sdb
rmparts sda
dm create ddf1_4c5349202020202080862682000000003547905e00000a28 0 2923825152 striped 3 128 8:0 0 8:16 0 8:32 0
dm partadd ddf1_4c5349202020202080862682000000003547905e00000a28

After inserting those commands into the init script of the updated initrd image,
the system boots properly again, so I guess the bug is in mkinitrd, which should
have inserted those commands during the kernel update.

Support ticket #1825896 and related tickets provide the init scripts and
sysreports for the server where I found this problem.
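
The comparison described above can be sketched in a few lines of Python. This is only an illustration: the helper name `missing_dm_commands` and the two abbreviated script strings are assumptions made for the example, not the contents of the actual initrd images, and the list of command prefixes covers only the nash commands quoted in this report.

```python
# Prefixes of the nash commands quoted above; a real init script may use others.
DM_PREFIXES = ("rmparts ", "dm create ", "dm partadd ")


def missing_dm_commands(old_init: str, new_init: str) -> list[str]:
    """Return device-mapper setup lines found in old_init but not in new_init."""
    new_lines = {line.strip() for line in new_init.splitlines()}
    return [
        line.strip()
        for line in old_init.splitlines()
        if line.strip().startswith(DM_PREFIXES) and line.strip() not in new_lines
    ]


# Abbreviated stand-ins for the two init scripts (not the full files).
NAME = "ddf1_4c5349202020202080862682000000003547905e00000a28"
OLD_INIT = "\n".join([
    "mkblkdevs",
    "rmparts sdc",
    "rmparts sdb",
    "rmparts sda",
    f"dm create {NAME} 0 2923825152 striped 3 128 8:0 0 8:16 0 8:32 0",
    f"dm partadd {NAME}",
    "echo Scanning logical volumes",
])
NEW_INIT = "mkblkdevs\necho Scanning logical volumes\n"

for command in missing_dm_commands(OLD_INIT, NEW_INIT):
    print(command)
```

Run against the real scripts (extracted from the two initrd images), the same function would list whichever device-mapper lines the newer image lacks.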

Comment 1 Fernando Lozano 2008-05-14 15:41:16 UTC

Just to let you know, my motherboard and RAID controller are certified for RHEL:
https://hardware.redhat.com/show.cgi?id=238494

Comment 2 Fernando Lozano 2008-05-16 19:49:04 UTC

Oops, my mistake: I was referring to bug #349161
(https://bugzilla.redhat.com/show_bug.cgi?id=349161) and not bug #399161. Both
refer to an incorrect initrd that prevents access to the root partition on
hardware RAID.

Comment 3 Fernando Lozano 2008-05-21 22:51:39 UTC

After updating my server to RHEL 5.2 through RHN (just updating packages by
running yum), I hit the same problem, with one difference: the update generated
an initrd containing an init script with the following commands, which were not
included by previous updates:

mkblkdevs
echo Scanning and configuring dmraid supported devices
dmraid -ay -i -p "ddf1_4c5349202020202080862682000000003547905e00000a28"
kpartx -a -p p "/dev/mapper/ddf1_4c5349202020202080862682000000003547905e00000a28"
echo Scanning logical volumes

(The "mkblkdevs" and "echo Scanning..." commands are there just to mark the
position where the new commands were inserted.)

My server could not find the root partition, so I deleted those new commands and
replaced them with the commands that were in the original initrd generated by
the RHEL 5.0 installer but not in any of the subsequent updates:

mkblkdevs
rmparts sdc
rmparts sdb
rmparts sda
dm create ddf1_4c5349202020202080862682000000003547905e00000a28 0 2923825152 striped 3 128 8:0 0 8:16 0 8:32 0
dm partadd ddf1_4c5349202020202080862682000000003547905e00000a28
echo Scanning logical volumes

And now my server boots again.
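
For readers unfamiliar with the "dm create" line, the following Python sketch parses it as a device-mapper striped table. It assumes the usual layout (start sector, length in 512-byte sectors, target type, stripe count, chunk size in sectors, then device/offset pairs) and the conventional numbering of SCSI disks (major 8, 16 minors per disk). The names `StripedTable`, `parse_dm_create` and `scsi_disk_name` are hypothetical helpers written for this example.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # device-mapper tables count 512-byte sectors


@dataclass
class StripedTable:
    name: str
    start: int
    length: int          # total length in sectors
    chunk_sectors: int   # stripe chunk size in sectors
    devices: list        # (device, offset) pairs, one per stripe


def parse_dm_create(line: str) -> StripedTable:
    """Parse 'dm create NAME START LENGTH striped N CHUNK DEV OFF ...'."""
    words = line.split()
    if len(words) < 8 or words[:2] != ["dm", "create"] or words[5] != "striped":
        raise ValueError("not a striped 'dm create' line")
    stripes = int(words[6])
    rest = words[8:]
    if len(rest) != 2 * stripes:
        raise ValueError("stripe count does not match the device list")
    devices = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return StripedTable(words[2], int(words[3]), int(words[4]), int(words[7]), devices)


def scsi_disk_name(devno: str) -> str:
    """Map a whole-disk 'major:minor' such as '8:16' to a name such as 'sdb'."""
    major, minor = (int(part) for part in devno.split(":"))
    if major != 8 or minor % 16 != 0:
        raise ValueError("not a whole SCSI disk in the first sd major")
    return "sd" + chr(ord("a") + minor // 16)


LINE = ("dm create ddf1_4c5349202020202080862682000000003547905e00000a28 "
        "0 2923825152 striped 3 128 8:0 0 8:16 0 8:32 0")
table = parse_dm_create(LINE)
print([scsi_disk_name(dev) for dev, _ in table.devices])
print(table.length * SECTOR_BYTES, "bytes in total")
print(table.chunk_sectors * SECTOR_BYTES, "bytes per chunk")
```

Read this way, the line describes one array striped across sda, sdb and sdc, which matches the three "rmparts" commands that precede it.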

Comment 4 Peter Jones 2008-05-22 22:04:33 UTC

There are a couple of weird things here.  First, the generated device name
indicates DDF1 disk metadata.  That's weird, because it's not what any Intel
RAID firmware I've ever seen uses.  I'd expect something like "isw_Volume_0". 
So the first thing we need to figure out is if the data is being probed wrong. 
Can you run "dmraid -ay -t" and post the output?
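
As a side note on the device name itself, the hex string after the "ddf1_" prefix can be decoded to raw bytes. The Python sketch below does only that; how dmraid actually builds the name, and what each field means, is not stated in this report, so any reading of the bytes (for instance as a vendor string followed by PCI-style IDs) is an assumption. `decode_ddf1_name` is a hypothetical helper.

```python
def decode_ddf1_name(set_name: str) -> bytes:
    """Return the bytes encoded in the hex part of a 'ddf1_<hex>' set name."""
    prefix, _, hex_part = set_name.partition("_")
    if prefix != "ddf1":
        raise ValueError("not a ddf1_ set name")
    return bytes.fromhex(hex_part)


NAME = "ddf1_4c5349202020202080862682000000003547905e00000a28"
raw = decode_ddf1_name(NAME)

print(raw[:8])            # leading bytes, printable ASCII in this case
print(raw[8:10].hex())    # next two bytes as hex
print(raw[10:12].hex())   # following two bytes as hex
```

For this name the leading bytes read as the ASCII text "LSI" padded with spaces, and the next two bytes are 80 86, the same digits as Intel's PCI vendor ID. That may be relevant to the question of why DDF1 rather than isw metadata is being detected, but it is not a confirmed explanation.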

Comment 5 Fernando Lozano 2008-05-23 20:19:02 UTC

Here's the output you asked for:

# dmraid -ay -t 
ddf1_4c534920202020208086268200000000356841f000000a28: 0 2923825152 striped 3 128 /dev/sda 0 /dev/sdb 0 /dev/sdc 0

I am also giving you a sysreport of the server after updating all packages to
RHEL5.2.
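
Comparing this output with the init script lines quoted in comment 3 shows that the two RAID-set names are not the same string. The Python sketch below makes the difference visible; `parse_dmraid_test_line` and `first_difference` are hypothetical helpers, and the "name: table" layout is assumed from the single output line above. Whether this mismatch is what prevents activation is not established by the sketch.

```python
def parse_dmraid_test_line(line: str):
    """Split a 'name: table...' line, as printed by 'dmraid -ay -t' above."""
    name, sep, table = line.partition(":")
    if not sep:
        raise ValueError("expected 'name: table'")
    return name.strip(), table.split()


def first_difference(a: str, b: str):
    """Index of the first differing character, or None if the strings are equal."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    return None if len(a) == len(b) else min(len(a), len(b))


# Name used in the init script (comment 3) and the line printed by dmraid.
INITRD_NAME = "ddf1_4c5349202020202080862682000000003547905e00000a28"
PROBED_LINE = ("ddf1_4c534920202020208086268200000000356841f000000a28: "
               "0 2923825152 striped 3 128 /dev/sda 0 /dev/sdb 0 /dev/sdc 0")

probed_name, probed_table = parse_dmraid_test_line(PROBED_LINE)
print("names equal:", INITRD_NAME == probed_name)
print("initrd field:", INITRD_NAME[37:45])
print("probed field:", probed_name[37:45])
print("table:", " ".join(probed_table))
```

The rest of the table (length, stripe count, chunk size and member disks) agrees with the "dm create" line; only one eight-digit field of the name differs.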

Comment 6 Fernando Lozano 2008-05-23 20:25:36 UTC

Created attachment 306550
Sysreport of bug server after update to 5.2 (and my fixes to initrd)

Comment 7 Fernando Lozano 2008-06-12 20:20:26 UTC

Just to let you know, updating to kernel 2.6.18-92.1.1 (from RHN) requires me to
make the same fixes to the initrd.

Comment 8 Hanh Le 2008-07-16 19:11:40 UTC

Created attachment 311984
Dmesg, RAID fails to boot after installing Red Hat 5.2, on motherboard with ICH9R

Comment 9 Bryn M. Reeves 2008-07-24 17:14:57 UTC

Having the "dm create"/"dm partadd" commands replaced with dmraid/kpartx is
expected: RHEL5 has moved from using nash support for activating dmraid devices
to including dmraid itself in the initrd images. So this looks like a problem
with dmraid being unable to activate the device itself. Adding the maintainer
on CC.

Comment 10 Jukka Lehtonen 2008-08-11 08:29:30 UTC

I just spent a week fighting this (or a similar problem) while installing CentOS 5.2 onto an ICH10R RAID1.  I do get a RAID set of type isw_*, and my boot did "succeed", because the commands after 'dmraid -ay *' did spot the partitions on /dev/sda (and duplicates on /dev/sdb). The convenience of mirrors compared to stripes.

The solution that does work for me is to use:
dmraid -ay -i -p
without the RAID-set string.

As with others, that manual edit of initrd*.img is required after each kernel update.  Comment #15 on bug #349161 appears to explain where dmraid fails with the explicit RAID-set string.
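
Because the edit has to be repeated after every kernel update, it could be scripted. The Python sketch below rewrites only the text of an init script that has already been extracted from the initrd image; unpacking and repacking the image is not shown. `drop_raid_set_name` is a hypothetical helper, and the pattern assumes the dmraid line has exactly the form quoted in comment 3. Note that without the set name dmraid is asked to activate every RAID set it finds.

```python
import re

# A dmraid activation line that ends in a quoted RAID-set name.
DMRAID_WITH_NAME = re.compile(r'^(\s*dmraid\s+-ay\s+-i\s+-p)\s+"[^"]*"\s*$')


def drop_raid_set_name(init_script: str) -> str:
    """Remove the quoted RAID-set name from 'dmraid -ay -i -p "NAME"' lines."""
    fixed = []
    for line in init_script.splitlines(keepends=True):
        body = line.rstrip("\n")
        match = DMRAID_WITH_NAME.match(body)
        if match:
            # Keep the command and its options, and the original line ending.
            line = match.group(1) + line[len(body):]
        fixed.append(line)
    return "".join(fixed)


# Abbreviated stand-in for an init script; the set name is a placeholder.
SCRIPT = (
    "mkblkdevs\n"
    "echo Scanning and configuring dmraid supported devices\n"
    'dmraid -ay -i -p "isw_example_Volume0"\n'
    "echo Scanning logical volumes\n"
)
print(drop_raid_set_name(SCRIPT), end="")
```

All other lines, including any kpartx line, pass through unchanged.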

Comment 12 Wade Mealing 2008-11-25 02:06:27 UTC

This seems to affect all platforms, not just 64-bit. Changing to all.

Comment 13 Wade Mealing 2008-11-25 02:10:02 UTC

Created attachment 324554
Remove the dmname, as the raid set is set incorrectly.

I've only done very basic testing of this, and I'm not sure whether activating all RAID sets on the system is a problem, but this seems to work for me.

Comment 14 Bryn M. Reeves 2008-11-25 09:53:47 UTC

The patch in comment #13 is a workaround - our strategy for device activation in the initramfs is to only activate those devices needed for booting (i.e. the ones that contain the root file system). This is true for dmraid, mpath, LVM, and MD devices.

If we're setting the wrong dmname for this array then that should be fixed, rather than trying to activate every dmraid device on the system.

Comment 15 Heinz Mauelshagen 2008-11-25 10:55:20 UTC

I agree with Bryn's comment #14.
Activating *all* RAID sets in, e.g., a DDF1 environment may mean working through a long list during an initrd run; that should be postponed to rc processing.

Comment 16 Peter Jones 2008-12-02 20:45:56 UTC


*** This bug has been marked as a duplicate of bug 471689 ***
