Bug 23640 - 2.4.0 kernel boot fails with RAID mirroring

Summary: 2.4.0 kernel boot fails with RAID mirroring

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	anaconda
Sub Component:
Version:	7.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Matt Wilson
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-01-09 01:20 UTC by David Ross
Modified:	2007-04-18 16:30 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2001-05-04 18:44:29 UTC
Embargoed:

Attachments
Kickstart file for IBM eServer w RAID1 (1.57 KB, text/plain) 2001-03-01 16:27 UTC, David Ross
Wolverine RAID boot output (10.35 KB, text/plain) 2001-03-06 21:37 UTC, David Ross

Description David Ross 2001-01-09 01:20:21 UTC

We have built our IBM Netfinity 4000R servers with RH7.0 and the stock kernel, using
mirroring on our two internal SCSI disks.
We had also built a 2.2.18 kernel with no problems.
Next we built a 2.4.0 kernel (with a config similar to the 2.2.18 one) and booted it. After the two drives are identified, the kernel panics with
"unable to mount root fs".

Does anybody have any idea how that could happen?
We obviously did select RAID in the kernel config.
Help!!!

Comment 1 David Ross 2001-01-09 18:36:41 UTC

A bit more info: we had actually tried 2.4.0-test9 earlier without any
problems!

More precisely, the messages we get are:

ext2: unable to read superblock
EXT2-fs: unable to mount root fs.

Why would this be??? I can't imagine it's a kernel bug, but I don't know what we
did wrong.

Comment 2 Pekka Savola 2001-01-10 18:30:47 UTC

You used stock 2.4.0 tarball?  Did you 'make config' based on previous .config (use make oldconfig first) 
or fresh?  

Did you remember to use 'mkinitrd' ?

Comment 3 David Ross 2001-01-11 20:13:22 UTC

>You used stock 2.4.0 tarball?  Did you 'make config' based on previous .config
>(use make oldconfig first) 
> or fresh?  
>
> Did you remember to use 'mkinitrd' ?

Yes to all!  I've tried with 2.4.0, 2.4.0-ac4, 2.4.0-pre1, 2.4.0-prerelease,
test12, test11, test10.  No luck.  I'm just going to try test9 again.

Comment 4 Nick Urbanik 2001-01-27 06:52:57 UTC

The solution is described in linux/Documentation/md.txt.

You don't need an initrd for this, just a kernel command line in lilo.conf.

My system boots off /dev/md0 using RAID 1:

image=/boot/vmlinuz-2.4.1-pre10-2
        label=linux
        append = "md=0,/dev/hdc1,/dev/hdb3"
        read-only
        root=/dev/md0
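Documentation/md.txt describes two forms of the md= argument: md=<md device no.>,dev0,...,devn for arrays with persistent superblocks (the form used above), and md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,...,devn for old arrays without them, with the chunk size set as 4k << factor. The sketch below is a hypothetical helper (parse_md_param and MdBootSpec are made-up names) for sanity-checking an append= line against those two forms. It is not the kernel's own parser, and telling the forms apart by whether the second field starts with /dev/ is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MdBootSpec:
    minor: int                           # 0 means /dev/md0, 1 means /dev/md1, ...
    devices: List[str]                   # component partitions, e.g. /dev/hdc1
    persistent: bool                     # True for the persistent-superblock form
    raid_level: Optional[int] = None     # old-style only (-1 linear, 0 striped per md.txt)
    chunk_size_kb: Optional[int] = None  # old-style only: 4k << chunk size factor


def parse_md_param(value: str) -> MdBootSpec:
    """Parse the value of an md= kernel argument, e.g. "0,/dev/hdc1,/dev/hdb3"."""
    fields = [f.strip() for f in value.split(",") if f.strip()]
    if len(fields) < 2:
        raise ValueError(f"md={value!r}: need an array number and at least one device")
    minor = int(fields[0])
    if fields[1].startswith("/dev/"):
        # Persistent-superblock form: md=<md device no.>,dev0,...,devn
        return MdBootSpec(minor, fields[1:], persistent=True)
    # Old-style form: md=<no.>,<raid level>,<chunk size factor>,<fault level>,dev0,...
    if len(fields) < 5:
        raise ValueError(f"md={value!r}: old-style form needs level, factor, fault level and devices")
    level, factor = int(fields[1]), int(fields[2])
    # fields[3] is the fault level, which md.txt says is ignored.
    return MdBootSpec(minor, fields[4:], persistent=False,
                      raid_level=level, chunk_size_kb=4 << factor)


spec = parse_md_param("0,/dev/hdc1,/dev/hdb3")
print(f"/dev/md{spec.minor}", spec.devices, spec.persistent)
```

For the lilo.conf line above, this reports array /dev/md0 built from /dev/hdc1 and /dev/hdb3 in the persistent-superblock form.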

Comment 5 Michael K. Johnson 2001-01-27 16:19:04 UTC

nicku, when RAID is built as a module (as it is in our kernels) you
*do* need an initrd for RAID root.
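The point above can be checked mechanically: if any driver on the path to the root filesystem is built as a module (=m in .config), it has to be loaded from an initrd before root can be mounted. A rough sketch follows; the CONFIG_ symbol names listed are assumptions based on 2.4-era kernels and should be checked against the tree actually being built, and read_config and root_path_problems are hypothetical helpers.

```python
import re
from typing import Dict

# Drivers needed to reach a root filesystem on a SCSI RAID 1 array. These
# symbol names are assumptions based on 2.4-era Config.in files.
ROOT_PATH_SYMBOLS = [
    "CONFIG_SCSI",
    "CONFIG_BLK_DEV_SD",
    "CONFIG_SCSI_AIC7XXX",
    "CONFIG_BLK_DEV_MD",
    "CONFIG_MD_RAID1",
    "CONFIG_EXT2_FS",
]

_SET = re.compile(r"^(CONFIG_\w+)=([ym])$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def read_config(text: str) -> Dict[str, str]:
    """Map each CONFIG_ symbol in a .config file to 'y', 'm' or 'n'."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            result[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            result[m.group(1)] = "n"
    return result


def root_path_problems(config: Dict[str, str]) -> Dict[str, str]:
    """Return root-path symbols that are not built in.

    'm' means the driver has to come from an initrd; 'n' means it is not
    built at all. An empty result means root on RAID 1 should not need an initrd.
    """
    return {sym: config.get(sym, "n") for sym in ROOT_PATH_SYMBOLS
            if config.get(sym, "n") != "y"}
```

Feeding it the text of a .config (for example `root_path_problems(read_config(open(".config").read()))`) lists exactly the drivers that would have to be in the initrd.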

Comment 6 David Ross 2001-02-23 16:48:20 UTC

Alrighty then! We gave up on RAID for the short term. Now Red Hat 7.1 beta is
out and we thought: cool! Maybe this will solve the problem, because it
installs a 2.4 kernel from scratch.

We kickstarted the 7.1B install with RAID 1.  Install works great, machine boots
(although not the SMP kernel, but that's a different problem).

Using RH7.1B as the build machine, we built a stock 2.4.1 kernel (NOT RH's
source). Much the same as last time: we installed it and booted, the messages look good,
the SCSI and RAID modules load fine, then pow:

EXT2-fs: unable to read superblock
isofs_read_super: bread failed, dev=09:02, iso_blknum=16, block=32
Kernel panic: VFS: Unable to mount root fs on 09:02

We have tried recompiling the kernel every which way from Sunday to no avail.
If I didn't know better I'd say that RH has somehow changed the RAID or
filesystem code from the stock kernel.  Is that possible?

What could we possibly be doing wrong???!  HELP!
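In 2.4 panic messages the device appears as a hexadecimal major:minor pair. Major 9 is the md (software RAID) driver, so 09:02 is /dev/md2, and the ext2 and isofs messages before the panic are consistent with the kernel trying each filesystem type it knows about in turn and failing to read anything from that device. A minimal decoding sketch; the major table covers only a few entries from the Linux device list, and decode_kdev and describe are hypothetical helpers.

```python
from typing import Tuple

# A few block-device majors from the Linux device list.
BLOCK_MAJORS = {
    3: "hd (first IDE interface)",
    8: "sd (SCSI disk)",
    9: "md (software RAID)",
    22: "hd (second IDE interface)",
}


def decode_kdev(text: str) -> Tuple[int, int, str]:
    """Split a 2.4-style 'MM:mm' device string; both halves are hexadecimal."""
    major_s, minor_s = text.strip().split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    return major, minor, BLOCK_MAJORS.get(major, "unknown")


def describe(text: str) -> str:
    """Turn a device string from a panic message into something readable."""
    major, minor, kind = decode_kdev(text)
    if major == 9:
        # For md, the minor number is the array number: 09:02 -> /dev/md2.
        return f"/dev/md{minor}"
    return f"major {major} [{kind}], minor {minor}"


print(describe("09:02"))
```

Because the values are hex, a SCSI partition printed as 08:11 is minor 17, not 11.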

Comment 7 Michael K. Johnson 2001-02-23 19:35:37 UTC

If you are using an Adaptec driver, please try upgrading to Wolverine,
our latest beta release. If not, please describe your hardware.

Comment 8 David Ross 2001-03-01 16:27:29 UTC

Created attachment 11440
Kickstart file for IBM eServer w RAID1

Comment 9 David Ross 2001-03-01 16:44:59 UTC

Installed with Wolverine.  This time it didn't boot at all.  Exactly the same
error.  Somewhat reassuring actually.

Hardware is an IBM eServer 330, dual PIII/800, Adaptec AIC-7892 SCSI, dual
hot-swap Seagate 9GB drives (sorry, I don't know the model offhand, but I can look if
that's helpful), dual eepro100 Ethernet, and 2GB of RAM.

I've attached my kickstart file.

Comment 10 Michael K. Johnson 2001-03-02 02:43:46 UTC

This really looks like a configuration error.

Can you please set up a serial console and give us the output from
your boot attempt with Wolverine installed?

 1. Connect a null-modem cable to another machine.
 2. Run minicom on the other machine, set to 115200n8, no flow control.
 3. Turn on capturing in minicom.
 4. Boot the affected machine with these arguments:
      boot: linux console=ttyS0,115200 console=tty0
 5. Attach the capture file to this bug.
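Where minicom isn't handy, the same capture can be approximated on a Linux machine with Python's standard termios and select modules: put the port into raw 115200 8N1 mode with no XON/XOFF or RTS/CTS flow control, then append everything received to a file. This is a substitute for the minicom procedure requested, not part of it; the function names, the device path in the usage line (/dev/ttyS0) and the 30-second idle timeout are assumptions.

```python
import os
import select
import termios


def configure_115200_8n1(attrs):
    """Return a copy of a tcgetattr() list set to raw 115200 baud, 8N1, no flow control."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    # No XON/XOFF software flow control and no CR/NL translation on input.
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY | termios.ICRNL)
    # Pass output bytes through untouched.
    oflag &= ~termios.OPOST
    # 8 data bits, no parity, 1 stop bit; ignore modem lines; enable the receiver.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CLOCAL | termios.CREAD
    # No RTS/CTS hardware flow control (CRTSCTS is platform-specific, hence getattr).
    cflag &= ~getattr(termios, "CRTSCTS", 0)
    # Raw mode: no line editing, echo, or signal characters.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)
    cc = list(cc)
    cc[termios.VMIN] = 1   # read() returns as soon as one byte is available
    cc[termios.VTIME] = 0  # no inter-byte timer
    return [iflag, oflag, cflag, lflag, termios.B115200, termios.B115200, cc]


def capture(device: str, logfile: str, idle_timeout: float = 30.0) -> None:
    """Append everything received on `device` to `logfile` until it goes quiet."""
    fd = os.open(device, os.O_RDONLY | os.O_NOCTTY)
    try:
        termios.tcsetattr(fd, termios.TCSANOW,
                          configure_115200_8n1(termios.tcgetattr(fd)))
        with open(logfile, "ab") as out:
            while True:
                ready, _, _ = select.select([fd], [], [], idle_timeout)
                if not ready:
                    break  # nothing received for idle_timeout seconds
                chunk = os.read(fd, 4096)
                if not chunk:
                    break
                out.write(chunk)
                out.flush()
    finally:
        os.close(fd)
```

Start `capture("/dev/ttyS0", "boot.log")` on the second machine before powering on the affected one; the idle timeout ends the capture once the panic stops the output.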

Comment 11 David Ross 2001-03-06 21:37:14 UTC

Created attachment 11944
Wolverine RAID boot output

Comment 12 David Ross 2001-03-09 15:37:01 UTC

Oops, I guess I have to actually update this bug before you'll see that I've
added an attachment.
Here's my log! I hope it's all-revealing!

Comment 13 Michael K. Johnson 2001-03-12 20:10:42 UTC

This is starting to look like it might be an anaconda bug.

Comment 14 David Ross 2001-03-13 19:58:28 UTC

I don't have a lot of info here, but my colleague installed this server with Red Hat 7.0 (2.2.18 kernel) with RAID, then
built a 2.4.2-ac16 kernel with RAID and SCSI (aic7xxx) built in, i.e. not as modules.
The machine booted! I generally build these things as modules and use an initrd. I don't know what the out-of-the-box Wolverine kernel uses,
but we were wondering whether it might be a module loading order problem. Just grasping at straws.
I also noticed that 2.4.2-pre3 and newer appear to have a new aic7xxx driver. Maybe that has something to do with it...
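The load-order guess can be made concrete: with SCSI and RAID built as modules, the initrd has to load the SCSI core, disk and host-adapter drivers and the RAID 1 personality before the array is started and root is mounted. The sketch below checks a proposed order against a "must come after" map. The map, including the start_array pseudo-step standing in for whatever actually starts the array, is hypothetical; real module dependencies come from depmod's modules.dep.

```python
from typing import Dict, List, Optional, Set

# Hypothetical "X must come after these" map for a root-on-RAID-1 initrd.
# start_array is a made-up pseudo-step for whatever starts /dev/mdN.
MUST_FOLLOW: Dict[str, Set[str]] = {
    "scsi_mod": set(),
    "sd_mod": {"scsi_mod"},
    "aic7xxx": {"scsi_mod"},
    "md": set(),
    "raid1": {"md"},
    # The array can only be assembled once its member disks are visible.
    "start_array": {"raid1", "sd_mod", "aic7xxx"},
}


def order_problems(steps: List[str],
                   must_follow: Optional[Dict[str, Set[str]]] = None) -> List[str]:
    """List each step that runs before something it depends on."""
    if must_follow is None:
        must_follow = MUST_FOLLOW
    done: Set[str] = set()
    problems: List[str] = []
    for step in steps:
        missing = sorted(must_follow.get(step, set()) - done)
        if missing:
            problems.append(f"{step} comes before {', '.join(missing)}")
        done.add(step)
    return problems
```

Loading aic7xxx ahead of scsi_mod, for example, would be reported as "aic7xxx comes before scsi_mod"; a built-in driver simply drops out of the list, which is one way to read why the built-in kernels booted.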

Comment 15 David Ross 2001-03-26 18:43:16 UTC

Woohoo!  Built a server with RH7.0 with all updates applied.  Built a 2.4.3-pre8
kernel with RAID and SCSI built in (not modules).  It works!!!
Now I'm not sure whether the problem is fixed by the RH7.0 updates, or by some
change in the 2.4 kernel.  Sigh.  I'll try to find time to test an older 2.4
kernel.

Comment 16 David Ross 2001-04-01 03:48:39 UTC

Confirmed that 2.2.18 with SCSI and RAID 1 built in on top of RH7.0 with all 
updates does not work. 
Built 2.4.3 (final) with aic7xxx and RAID 1 built in and it works fine! I'm still
getting the "md3 overlaps with md2" messages, but it appears to have been some sort of
kernel problem that is now fixed???

Comment 17 Brent Fox 2001-04-25 22:08:19 UTC

Do you still see the problem with Red Hat Linux 7.1?

Comment 18 Brent Fox 2001-05-04 18:44:22 UTC

Closing due to inactivity.  Please reopen if you have any more info.
