Bug 240209 - Updating to new kernel (2.6.21-1.3142.fc7) caused kernel panic because of RAID 1
Product: Fedora
Classification: Fedora
Component: mkinitrd
Hardware: All
OS: Linux
Priority: medium
Severity: urgent
Assigned To: Peter Jones
QA Contact: David Lawrence
Reported: 2007-05-15 15:18 EDT by Ronald Haring
Modified: 2013-01-09 23:18 EST (History)
CC List: 10 users

Doc Type: Bug Fix
Last Closed: 2008-05-06 21:44:49 EDT

Attachments
lvmdump output (14.96 KB, application/x-compressed-tar)
2007-05-16 12:51 EDT, Ronald Haring

Description Ronald Haring 2007-05-15 15:18:16 EDT
I upgraded today to the new Fedora kernel (2.6.21-1.3142.fc7).

After upgrading, the system ended with a kernel panic because it could not find
the / filesystem anymore. On my current system this is a RAID 1 LVM disk.

fdisk -l output for the currently working kernel (2.6.20-1.3104.fc7):
Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       38913   312568641   8e  Linux LVM

Disk /dev/sdb: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       38913   312568641   8e  Linux LVM

Disk /dev/sdc: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1       19457   156288321   42  SFS

Disk /dev/sdd: 20.0 GB, 20020396032 bytes
255 heads, 63 sectors/track, 2434 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        1020     8193118+   b  W95 FAT32
/dev/sdd2            1021        1033      104422+  83  Linux
/dev/sdd3            1034        2434    11253532+  8e  Linux LVM

Disk /dev/dm-1: 320.0 GB, 320070288384 bytes
255 heads, 63 sectors/track, 38912 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-1 doesn't contain a valid partition table
Comment 1 Will Woods 2007-05-15 17:47:58 EDT
What disk controller are the drives attached to? Are you using dmraid / BIOS-raid?
Comment 2 Ronald Haring 2007-05-16 03:11:58 EDT
I think I use BIOS RAID, but I don't know. The reason I liked Fedora so much is
that I didn't have to define anything, since everything was picked up correctly
from the start. Do you need additional logs? Which commands should I run for you?

Comment 3 Milan Broz 2007-05-16 07:18:00 EDT
You can use the "lvmdump" utility to grab important info about your LVM
installation and attach the created archive to this bug.
Comment 4 Jesse Keating 2007-05-16 11:38:40 EDT
This isn't BIOS RAID; if it were, you wouldn't see /dev/sd devices, you'd see
some other device. This looks more like a generic LVM issue.
Comment 5 Jarod Wilson 2007-05-16 11:46:29 EDT
This actually *could* be dmraid/BIOS RAID, as you can run fdisk -l /dev/sdX on
a disk in a dmraid set and still get a valid partition table returned.

Ronald, please attach output of 'cat /etc/fstab', and that should tell us for
certain if you're using dmraid/bios-raid or not, and we'll go from there...
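For reference, one quick way to ask dmraid directly whether it sees any BIOS RAID sets is a sketch like the following (flags per dmraid(8); the guard is only there so the snippet is harmless on a box without dmraid installed):

```shell
# Sketch only: check for BIOS/fakeraid ("dmraid") sets.
# Per dmraid(8): -r lists block devices that are members of RAID sets,
#                -s shows the discovered sets themselves.
if command -v dmraid >/dev/null 2>&1; then
    dmraid -r
    dmraid -s
else
    echo "dmraid not installed"
fi
```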
Comment 6 Ronald Haring 2007-05-16 12:51:03 EDT
Created attachment 154847 [details]
lvmdump output
Comment 7 Ronald Haring 2007-05-16 12:52:41 EDT
There you go:

[root@localhost Desktop]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup01/LogVol01 swap                    swap    defaults        0 0
/dev/sdc1 /muziek ntfs-3g defaults,force 0 0

And the dump
Comment 8 Milan Broz 2007-05-16 14:26:39 EDT
So the dm mapping table looks like this:

nvidia_ejhgbcci: 0 625142446 mirror core 2 131072 nosync 2 8:0 0 8:16 0
nvidia_ejhgbccip1: 0 625137282 linear 253:0 63
VolGroup01-LogVol01: 0 4063232 linear 8:51 18284928
VolGroup01-LogVol00: 0 625082368 linear 253:1 384
VolGroup01-LogVol00: 625082368 18284544 linear 8:51 384

- it is using a dmraid mirror (nvidia_...)

- but interestingly, VolGroup01/LogVol00 (the root volume) is split between the
dmraid device (mirror) and 8:51 (/dev/sdd3 ?)

This is a slightly nonstandard configuration...
(iow: you have mirrored only part of the root volume!
You can check it with the "lvs -o +devices" command.)

Maybe the initrd has some problems activating this correctly?
Comment 9 Jarod Wilson 2007-05-16 14:31:35 EDT
D'oh, I wasn't thinking. fstab is populated with labels, so it doesn't show
dmraid devices, but as Milan said, there are some shown in the lvmdump output.

I'm curious if this could be yet another case of the disks simply not quite
being ready before we try mounting them. I doubt this is actually an lvm problem
(though I suppose it still could be).

Any chance you have the ability to hook up a serial console to capture the
messages on screen during the failed boot? In a pinch, a digital camera works
too -- if you do that, add 'vga=791' to your kernel boot options to get more on
screen at once though.
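For reference, a serial console is typically enabled by appending console= arguments to the kernel line in grub.conf, so boot messages also go out the serial port. The device, baud rate, and kernel line below are illustrative, not taken from this system:

```
kernel /vmlinuz-2.6.21-1.3142.fc7 ro root=LABEL=/ console=tty0 console=ttyS0,115200n8
```

With two console= options, the last one listed becomes the primary console while messages still appear on the others.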
Comment 10 Ronald Haring 2007-05-16 14:39:44 EDT
Ah, I see.

Well, I had a non-RAID volume containing Windows, and I could not boot from the
RAID 1 disks for some reason. As I still have some Windows programs lying around
that I want to use, I decided that I should boot from the Windows disk and have
most of the other dirs mirrored. Or better said, I didn't decide that; Fedora
took care of that for me. So should I try to move the Windows partition to the
RAID mirror disks as well and then boot from the RAID drives?

Hmmm, it is working now on the previous Fedora kernel though.

Here is the report for lvs:

[root@localhost ~]# lvs -o +devices
  LV       VG         Attr   LSize   Origin Snap%  Move Log Copy%  Devices       
  LogVol00 VolGroup01 -wi-ao 306.78G                               /dev/dm-1(0)  
  LogVol00 VolGroup01 -wi-ao 306.78G                               /dev/sdd3(0)  
  LogVol01 VolGroup01 -wi-ao   1.94G                               /dev/sdd3(279)

I will try to setup a camera to capture some more lines for you.

Thx for all of your time so far though.
Comment 11 Ronald Haring 2007-05-16 15:06:01 EDT
And here are the lines. My camera could not show them large enough, so I had to
fall back to ye ol' art of writing. Lo and behold what happened:

Powernow-k8: BIOS error - no PSB or ACPI _PSS objects
Red hat nash version 6.0.9 starting
device-mapper: table: 253: 0: mirror: Device lookup failure
device-mapper: reload ioctl failed: no such device or address
Reading all physical volumes. This may take a while...
No volume groups found
Volume group "VolGroup01" not found
Unable to access resume device (/dev/VolGroup01/LogVol01)
mount: could not find filesystem '/dev/root'
setuproot: moving /dev failed
setuproot: error mounting /proc: no such file or directory
setuproot: error mounting /sys: no such file or directory
switchroot: mount failed: no such file or directory
Kernel panic - not syncing: Attempted to halt init!

And that's the end of the story.
Comment 12 Jarod Wilson 2007-05-16 15:27:13 EDT
Hm... Try altering your kernel boot line, removing the 'quiet' option. I suspect
some useful info is being suppressed. On the surface it looks like device-mapper
having a problem, but it could also be that the device simply isn't ready yet,
which we've run into in a few other cases...
Comment 13 Alasdair Kergon 2007-05-16 17:47:54 EDT
"Red hat nash version 6.0.9 starting"
Comment 14 Ronald Haring 2007-05-17 04:29:39 EDT
Well, I ran it without the quiet option and removed most of my devices, but I am
still not sure if I have enough information now. One thing I did notice were the
following lines:
loading dm-mod.ko module
loading dm-zero.ko module
loading dm-snapshot.ko module
making device-mapper control node
device-mapper: table: device 8:0 too small for target.

So does the system try to put all of the raid on my first, small, hard drive?

Comment 15 Jarod Wilson 2007-05-21 15:35:43 EDT
Hrm. No, the raid is on the sata drives, but there's an lvm volume allocated
across the raid and sdd. The 'device-mapper: table: device 8:0 too small for
target.' message appears to correspond to /dev/sda, which is part of the raid.
The complaint about it being too small is making me think this is indeed another
case where the device isn't quite ready yet. Can you try introducing a bit of a
delay into the initrd to test that theory out?

You can either unpack your initrd, hand-edit init or customize /sbin/mkinitrd
and rebuild your initrd. I believe the following mkinitrd hack should suffice to
test this theory:

$ diff -u /sbin/mkinitrd mkinitrd-sleep 
--- /sbin/mkinitrd      2007-04-16 18:23:18.000000000 -0400
+++ mkinitrd-sleep      2007-05-21 15:33:45.000000000 -0400
@@ -1421,4 +1421,6 @@
     emit "rmmod scsi_wait_scan"
+emit "echo Sleeping for a bit to see if we're trying to access drives too soon..."
+emit "sleep 10"
 # HACK: module loading + device creation isn't necessarily synchronous...
 # this will make sure that we have all of our devices before trying
 # things like RAID or LVM

So patch that in, then:

# mkinitrd -f /boot/initrd-2.6.21-1.3142.fc7.img 2.6.21-1.3142.fc7
Comment 16 Ronald Haring 2007-05-22 03:45:09 EDT
Ok, I will try to do that tonight and give you feedback.
Comment 17 Ronald Haring 2007-05-22 16:38:33 EDT
Unfortunately that didn't solve it (although I now had to change 3142 to 3167,
since I updated and reinstalled a couple of times trying to fix this problem).
I did see the extra echo and the timeout of 10 seconds, but still the same error.

Maybe I should add a sleep in more places so I can try to (very quickly) write
down more possible errors?

Comment 18 Jarod Wilson 2007-05-22 17:14:30 EDT
Hrm. Damn. Shouldn't need more sleeps, just be quick with the scroll lock
button. That, or a serial console would be very very very helpful here... I'm
out of ideas right now. :\
Comment 19 Matthew Truch 2007-06-22 11:53:34 EDT
I found this bug via Google while debugging a similar problem I had. Does it
have to do with mkinitrd not including the LVM group properly? See comment #6
on http://www.linuxquestions.org/questions/showthread.php?t=497332 for what I
found worked for me in a similar case.
Comment 20 Robert K. Moniot 2007-08-11 19:02:06 EDT
We are seeing this same behavior on some but not all of our machines as they are
upgraded from the 2.6.20-1.2962.fc6 kernel to the 2.6.22 kernel. We get
essentially the same boot screen as in comment #11 above, except we don't see
any device-mapper error messages. The problem does not seem to have anything to
do with volume groups as suggested by comment #19 above: we have 4 machines, two
of which use volume groups configured identically and two of which do not use
volume groups but instead use plain ext3 file system partitions. One of the VG
machines boots OK with the 2.6.22 kernel while the other does not. One of the
non-VG machines boots OK and the other does not. (The machines are of different
makes.) The machines that fail with the 2.6.22 kernel continue to boot fine
with the 2.6.20 kernel, so we are OK as long as we don't upgrade.
Comment 21 Bug Zapper 2008-04-03 20:42:29 EDT
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

We will be following this process to ensure this doesn't happen again.
Comment 22 Bug Zapper 2008-05-06 21:44:48 EDT
This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping