Bug 140367

Summary: FC3 kernel panics with SATA and kernel on install disc
Product: [Fedora] Fedora
Reporter: Matthew E. Lauterbach <lauterm>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 3
CC: benny+bugzilla, davej, gajownik, janes.rob, mattdm, michael.wiktowy, peterm, rob, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-07-20 19:02:15 UTC
Attachments (Description: Flags):
lspci output: none
output of lspci: none

Description Matthew E. Lauterbach 2004-11-22 16:58:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1

Description of problem:
This is on an nForce3 based Chaintech VNF3-250 motherboard.

After installing FC3 final to a SATA drive, attempting to boot yields a
kernel panic.  Upgrading the kernel to 681 in rescue mode yields the same
result.  Installing FC3 to a borrowed IDE drive boots fine.  The sata_nv
driver is being loaded.



Version-Release number of selected component (if applicable):
kernel-2.6.9-1.667 and kernel-2.6.9-1.681_FC3

How reproducible:
Always

Steps to Reproduce:
1. Install FC3 to SATA
2. reboot
3. kernel panic
    

Actual Results:  kernel panic:

Expected Results:  successful boot

Additional info:

ata1 failed to respond (30 secs)
<snip>
Kernel panic - not syncing: Attempted to kill init!

and then many, many:
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86,
might be trying access hardware directly.

Comment 1 Matthew E. Lauterbach 2004-11-22 17:07:58 UTC
Created attachment 107192 [details]
lspci output

Comment 2 Rob Hughes 2004-11-22 17:25:14 UTC
Also happens on Nforce2-based systems.

Comment 3 Benny Amorsen 2004-11-23 02:41:58 UTC
This sounds very much like http://bugzilla.kernel.org/show_bug.cgi?id=3352

I use the workaround patch found at that link.

Comment 4 Rob Hughes 2004-11-23 02:59:32 UTC
Except that the workaround is for nf3 boards. I have the same issue,
but with an nf2 board, so it appears to be a problem in the core of the
SATA drivers.

Comment 5 Matthew E. Lauterbach 2004-11-23 05:08:25 UTC
Never patched a source rpm before.  Is there a link on how to do this
properly?

I installed the src.rpm and copied the patchfile into
/usr/src/redhat/SOURCES.  Then I added that patchfile to the spec
file as Patch1154 right after the other sata patches.  Then I did
rpmbuild -bb kernel-2.6.spec --target=i686.  It all built fine, but
when I looked at the spec file again to double-check my work, my changes
were no longer there.  So apparently I don't know enough about what I
am trying to do.
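
For reference, a minimal sketch of what registering an extra patch in a spec file usually involves, assuming the common RPM convention that a patch needs both a `PatchNNNN:` declaration and a matching `%patchNNNN -p1` line in `%prep`. Only the number 1154 comes from the comment above; the helper name, the preceding patch number 1153, and the patch file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: add a PatchNNNN declaration and its %patchNNNN application
# to an RPM spec file, each right after the entry for an existing patch.
# All names and numbers are supplied by the caller; none are taken from
# the real kernel-2.6.spec.
add_patch_to_spec() {
    spec=$1   # path to the spec file
    prev=$2   # number of an existing patch to insert after (hypothetical)
    new=$3    # number for the new patch, e.g. 1154
    file=$4   # patch file name as placed in SOURCES (hypothetical)
    awk -v prev="$prev" -v new="$new" -v file="$file" '
        { print }
        # Declaration section: "PatchNNNN: name"
        $1 == ("Patch" prev ":") { print "Patch" new ": " file }
        # %prep section: "%patchNNNN -p1"
        $1 == ("%patch" prev)    { print "%patch" new " -p1" }
    ' "$spec" > "$spec.new" && mv "$spec.new" "$spec"
}

# Example call (hypothetical names):
#   add_patch_to_spec /usr/src/redhat/SPECS/kernel-2.6.spec 1153 1154 sata-nv-noreset.patch
```

If only the declaration is added, the build typically succeeds without the patch ever being applied. Reinstalling the src.rpm also rewrites the spec file under SPECS, which is one possible, unconfirmed, explanation for edits disappearing.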

Why does the installer kernel work just fine and the installed kernel
from FC3 final not work?  Are they different?

btw, this is happening with both the i686 install and the x86_64 install.

Comment 6 Remco Bastiaans 2004-11-23 09:31:48 UTC
Same problem here with ASUS AV7333 motherboard and an additional (PCI)
SATA-controller (Mercury Sata 150 Raid Controller, Silicon Image
chipset)...

Using stock FC3-kernel it panics with the following msg:
----------------------------------------------------------
...
loading jdb.ko module
loading ext3.ko module
creating root device
mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
switching to new root
switchroot mount failed: 22
umount /initrd/dev failed: 2
kernel panic - not syncing: attempted to kill init
----------------------------------------------------------

Using the kernel-2.6.9-1.681_FC3 kernel as suggested by bug-id 139674
flashes a similar msg (too fast to read), and then keeps looping the
following msg:
----------------------------------------------------------
atkbd.c: spurious ack on isa0060/serio0. Some program, like XFree86
might be trying access hardware directly
----------------------------------------------------------

Booting from the dvd in linux rescue mode and a chroot /mnt/sysimage
gives me my full filesystem (both ide and sata)..


Comment 7 Matthew E. Lauterbach 2004-11-24 08:51:18 UTC
Since I couldn't get the patchfile to work properly, I directly edited
sata_nv.c and re-tarred and bz2ed the kernel source.  I was then able
to successfully rebuild the kernel rpm (i686 only so far) to include
the potential fix suggested by Benny in comment #3.  I've tested the
resulting rpm on a working FC3 install.  It boots and runs fine there.
 I will test on the problem machine when I get off work in 2.5 hours.

Comment 8 Rob Hughes 2004-11-24 11:03:11 UTC
It seems like there's a potential fix for sata_nv.c. Any ideas for
sata_sil, which is what my system uses?

Comment 9 Matthew E. Lauterbach 2004-11-24 11:46:28 UTC
This fixed the issue for me.  I edited sata_nv.c as per
http://bugzilla.kernel.org/show_bug.cgi?id=3352 (Thank you, Benny).
Apparently the Seagate drives don't like that reset command.  However,
other drives may need it.  I'm hoping someone with a little more
kernel experience than me will have an idea how to make a
general kernel patch that works with the Seagate drives without
breaking anything else.  I'm going to look at sata_sil to see if it is
doing the same sort of thing, as per Rob's comment above.  Remco, is
your drive a Seagate and are you also using sata_sil?

Comment 10 Remco Bastiaans 2004-11-24 16:31:19 UTC
My drive is a Maxtor.  I'm not really sure if I am (was) using sata_sil,
since I've just finished re-installing my system with the boot and root
partitions on an old-fashioned ide disk, and the rest on sata.  I do
have some logs saved, however.  Where can I check what it was using?

Comment 11 Rob Hughes 2004-11-24 16:54:58 UTC
The output from dmesg will show if you're still using the sata_sil 
driver, but I don't think this is the core of the problem. The 
sata_sil.c in 2.6.9 is the same as in 2.6.8.1, so something has 
changed which broke that driver, and apparently, several others as 
well.
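
A minimal sketch of that check, assuming the libata drivers mention their own name (for example sata_sil or sata_nv) somewhere in the kernel log. The helper name is hypothetical, and `grep -o` is an extension found in GNU and BSD grep rather than strict POSIX.

```shell
#!/bin/sh
# Sketch: print the distinct sata_* driver names that appear in kernel log
# text read from stdin. Works the same on live dmesg output and on a saved
# log file. Prints nothing if no sata_* name is mentioned.
list_sata_drivers() {
    grep -o 'sata_[a-z0-9_]*' | sort -u
}

# Example calls (the saved log path is hypothetical):
#   dmesg | list_sata_drivers
#   list_sata_drivers < /root/saved-dmesg.txt
```

An empty result only says the log does not name a driver; it does not prove that none was loaded.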

Comment 12 Rob Hughes 2004-11-29 12:48:25 UTC
Well, after booting the rescue disk and chrooting the old install,
doing a rpm --nodeps on all the kernel packages, booting the rescue cd
again and choosing install, write new grub configuration and letting
the installer install the kernel and update the MBR, everything works.
So this was, for me, apparently a problem caused by not letting the
installer update the MBR with the new grub boot loader code.

As a side note, I've noticed that performance is *way* down, by about
75%, with my Seagate drive. A code review shows that the author is
applying the MOD15WRITE quirk to my particular drive, though it worked fine
with the older driver, which didn't seem to include it in the blacklist.

Comment 13 Matthew E. Lauterbach 2004-11-30 11:32:40 UTC
When someone from Red Hat gets a chance to look at this, the sata_nv
driver does not seem to need "ATA_FLAG_SATA_RESET |" until libata is ready
for hotplug.  Can this be patched in the Fedora rpms until it
all gets sorted out at the kernel level?  See
http://bugzilla.kernel.org/show_bug.cgi?id=3352 for specifics.
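
A rough sketch of producing that change as a patch file rather than re-tarring the source, assuming the flag appears in the driver source exactly as the token quoted above. The helper name is hypothetical, and the real layout of the flags in sata_nv.c is not shown in this report, so the result should be reviewed by hand.

```shell
#!/bin/sh
# Sketch: remove the "ATA_FLAG_SATA_RESET |" token from a source file,
# keep the untouched version as FILE.orig, and print a unified diff on
# stdout. If the flag sits on a line of its own, that line is left
# containing only whitespace.
drop_sata_reset_flag() {
    src=$1
    cp "$src" "$src.orig" &&
    sed 's/ATA_FLAG_SATA_RESET[[:space:]]*|[[:space:]]*//' "$src.orig" > "$src" || return 1
    # diff exits 1 when the files differ, which is the expected case here.
    diff -u "$src.orig" "$src" || [ $? -eq 1 ]
}

# Example call (output file name is hypothetical):
#   drop_sata_reset_flag drivers/scsi/sata_nv.c > sata-nv-noreset.patch
```

This mirrors the workaround only. It does nothing for drives that may actually need the reset, which is the concern raised in comment 9.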

Comment 14 Robin Bowes 2004-11-30 18:50:33 UTC
This is a "me too" post.

I upgraded from FC2 to FC3 on an Epox EP-D3VA dual 1GHz PIII
motherboard with 6 x Maxtor Maxline II 250GB SATA disks connected to
2 x Promise SATA150 TX4 cards. Root is installed to /dev/md0 (RAID1
built from /dev/sda1 and /dev/sdd1), with swap on /dev/md1 (RAID1
built from /dev/sdb1 and /dev/sde1) and /dev/md2 (RAID1 built from
/dev/sdc1 and /dev/sdf1). /dev/md5 is a large RAID5 array built from
/dev/sd[abcdef]2.

The upgrade appeared to go smoothly (kernel-2.6.9-1.667smp was
installed) but when I rebooted I got a kernel panic with the following
message:

Loading ext3.ko module
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Creating root device
Mounting root filesystem
EXT3-fs: unable to read superblock
mount: error 22 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

I booted into rescue mode and chrooted into the upgraded system - all
seems fine, i.e. all the filesystems are OK.

I then upgraded all the packages using "yum upgrade" within the chroot
environment. This installed kernel-2.6.9-1.681_FC3smp.

I rebooted and got the same error as above.

Is there any sign of a fix for this on the horizon?

Thanks,

R.

Comment 15 Rob Janes 2005-04-08 01:19:04 UTC
I'm getting the no volume groups problem, but am attempting with
vmlinuz-2.6.10-1.770_FC3.  works ok with vmlinuz-2.6.9-1.667, which is before
the fixed one, .681!

Dell Optiplex GX280.

kernel doesn't panic until it finds it has no volume group

if i reboot with 2.6.9 667, things are ok.

the problem also occurs with vmlinuz-2.6.10-1.760_FC3.

so, install FC3 was fine.  problem happens after a new kernel from up2date.

Any updates on this?

rj


Comment 16 Rob Janes 2005-04-21 23:47:08 UTC
The reason it worked with 2.6.9-1.667 is because in order to get fedora core
3 to install in the first place i had to turn "compatibility" mode on in the bios.

It would appear that kernel versions after the patch supported the "normal"
mode for this thing, but strangely enough, they no longer supported the
"compatibility" mode.

The problem cleared up when I restored "normal" mode in the bios screen for the
hard drive.  I guess "compatibility" mode was getting in the way of the
revamped kernel.

I had forgotten I had to turn on compatibility mode to install fc3.  I thought
it had something to do with the LVM.  I reloaded fc3 and ditched the LVM for
ext3, and the darn thing still wouldn't go.  I've had lots of problems with LVM,
but none to speak of with ext2/ext3, so at that point I figured it had to be
something else, and then remembered about the compatibility setting for this funky
ide scsi hybrid drive.

so, for future reference, install fc3 with compatibility mode on.  flip it off
once you get the kernel up2date.

(In reply to comment #15)

Comment 17 Michael Wiktowy 2005-04-23 23:07:26 UTC
To add another data point:
I was trying to migrate my FC3 install from a PATA drive to a SATA drive.
I low-level copied everything over using:
dd if=/dev/hda of=/dev/sda bs=10M
Afterwards everything seemed to mount OK.
I pulled out the PATA disk but got these same errors and kernel panics.
No amount of grub futzing made it mount the root device.
My new SATA hd is a Seagate 250GB 7200rpm.
I do not have things set up as a RAID.

Comment 18 Michael Wiktowy 2005-04-23 23:10:18 UTC
Created attachment 113587 [details]
output of lspci

This problem occurs running latest kernel 2.6.11-1.14_FC3
see attachment for lspci

Comment 19 Matthew E. Lauterbach 2005-05-02 08:23:15 UTC
I finally had a chance to play with this again.  I did a clean install of x86
FC3.  I got the kernel panic on first boot.  I booted into rescue and ran yum. 
It updated my kernel to 2.6.11-1.14_FC3.  Then, I rebooted, and it is working fine.

Interestingly, my newer Seagate 300GB SATA drive did not exhibit the same
problem.  It is model # ST3300831AS.  The older 120GB that did exhibit the
problem was model # ST3120026AS.

Comment 20 Michael Wiktowy 2005-05-25 00:55:42 UTC
My problem in Comment #17 was solved by booting into rescue mode and
uninstalling/reinstalling the kernel mentioned in Comment #19. This was the one
that was installed originally, but when it was installed, the onboard Sil 3112
SATA controller on my A7N8X mobo was not enabled (hardware jumper was in the
off spot). Likely this caused the sata_sil kernel module to not get bundled into
the initrd, and a reinstall of the kernel forced a mkinitrd which
included the correct modules.
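
A minimal sketch of checking for that condition without reinstalling the kernel. It assumes the initrd is a gzip-compressed cpio archive, which may not hold for every mkinitrd version, and the helper name is hypothetical. The kernel version in the usage comments is the one from Comment #19.

```shell
#!/bin/sh
# Sketch: succeed if a module named $1 appears in an initrd file listing
# read from stdin (one path per line, e.g. "lib/sata_sil.ko").
initrd_lists_module() {
    grep -q "/$1\.ko\$"
}

# Usage, assuming a gzip-compressed cpio initrd:
#   zcat /boot/initrd-2.6.11-1.14_FC3.img | cpio -it | initrd_lists_module sata_sil \
#       && echo present || echo missing
#
# If the module is missing, the image can be rebuilt with it forced in,
# using mkinitrd's -f (overwrite) and --with= (add module) options:
#   mkinitrd -f --with=sata_sil /boot/initrd-2.6.11-1.14_FC3.img 2.6.11-1.14_FC3
```

Keeping the check as a filter over a listing means it does not care how the listing was produced, so it would also work on a listing taken from a loop-mounted image if the cpio assumption is wrong.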

Comment 21 Dave Jones 2005-07-15 19:53:12 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 22 Matthew E. Lauterbach 2005-07-16 03:35:00 UTC
Actually as I stated in Comment #19, kernel 2.6.11-1.14_FC3 seemed to fix it for
me.  I have moved to FC4, and the problem has not re-occurred.  Thanks.