Bug 189570

Summary: FC5 Kernel crash during install Adaptec i2o 2100S SCSI RAID (also FC4 2.6.16)
Product: [Fedora] Fedora
Reporter: Dave <meherenow>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 5
CC: bookreviewer, brian.truter, dtimms, joni, mishu, weigansj, wendell, wtogami
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-07-04 05:55:26 UTC
Attachments (flags: none on all):
- output from lsmod when booting with a rescue cd
- output from lspci when booting with a rescue cd
- output from dmesg when booting with a rescue cd
- anaconda log from rawhide install
- dmesg from rawhide install
- install.log.syslog from rawhide install
- compressed anaconda.syslog from rawhide install
- lsmod output from rawhide install
- lspci output from rawhide install
- The anaconda dump during the fc5 installation with the raid disabled
- oops message from kernel 2.6.16 smp FC5

Description Dave 2006-04-21 07:39:24 UTC
I am attempting to install FC5 using a boot iso and a networked (http) install.

The http source is a loopback mounted FC5 iso mounted in my webserver's
/var/log/www/html/fc5 directory.

I have sha1sum verified that the DVD iso is an exact duplicate with no
corruption created during download.
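
For anyone repeating this check without the sha1sum tool, a minimal Python sketch of the same verification follows. The function names are hypothetical, and the published value is assumed to be one line copied from a SHA1SUM file (hash first, optional filename after it).

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so a DVD-sized iso never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_line):
    """Compare a file's SHA-1 with a line from a SHA1SUM file.

    Only the first whitespace-separated token (the hash) is used,
    and the comparison ignores case.
    """
    published_hash = published_line.split()[0].lower()
    return sha1_of_file(path) == published_hash
```

A matching digest only shows that the download is intact; it says nothing about whether the media is later read back correctly by the installer.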

The install proceeds to the package installation stage and anaconda begins
installing rpms. At some point during the install, the machine hangs (pressing
caps lock does not change the caps lock LED) and I am unable to switch to a VT
to check for any errors.

If I reboot and use a rescue cd to look at the partially installed filesystem,
the files in /root/ provide no indications of any errors.

The hardware was previously running FC4 with no known issues.

This is reproducible in that I have been unable to install FC5 on my machine;
however, there appears to be no single package that causes the lockup.

The underlying hardware is nothing special, the only item out of the ordinary is
an Adaptec 2100S raid controller.

Steps to Reproduce:
1. Boot using a FC5 network boot install iso
2. Proceed through the install until the point of package installation
3. Wait for the machine to hang at some point
  
Actual results:

Machine Hangs during install

Expected results:

Machine should install FC5 cleanly

Additional info:

I have verified the condition of the hardware as much as possible by running
memtest86 overnight (with no errors reported) and a destructive (read/write)
badblocks check (with no errors reported).

Comment 1 Dave 2006-04-21 07:40:47 UTC
I have also tried installing with the options acpi=off and selinux=0

Comment 2 Dave 2006-04-21 21:47:25 UTC
I have just installed an Ubuntu 6.x Beta with no problems, which leads me to
believe that there are no problems with the underlying hardware. I will leave it
running overnight performing several burn-in tasks to ensure that nothing
obvious causes instability.

Before attempting this, I tried 3 more FC5 installs, each one failed within the
first 5-25% of the rpm installation phase.

Comment 3 Dave 2006-04-22 16:35:26 UTC
Machine was stable with Ubuntu, but still refuses to install FC5.... any
suggestions?

Comment 4 Dave 2006-04-23 10:58:19 UTC
I have managed a successful install using the FC5 DVD, however this is not
repeatable, as a second attempt using the same DVD failed with symptoms
identical to the http install.

i.e. hard lock, unable to switch to vt or move mouse, no keyboard lights flashing.

Comment 5 Dave 2006-04-23 15:23:25 UTC
Finally some progress (albeit talking to myself here). I managed to achieve a
minimal FC5 DVD install (deselecting almost everything), only for it to begin
going pear-shaped upon running "yum update".

The following errors were picked out of dmesg, which seem to back up the
symptoms that I have been experiencing.

I'd guess that this must be happening during the installs as well, read only
mounting would explain lack of log info.

EXT3-fs error (device dm-0): ext3_delete_entry: bad entry in directory #1376454:
inode out of bounds - offset=0, inode=914432000, rec_len=12, name_len=1
Aborting journal on device dm-0.
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-0) in ext3_unlink: IO failure
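
As an illustration of how such messages can be picked out of a larger dmesg capture, here is a small Python sketch. It is only a sketch under stated assumptions: the function name is hypothetical, and the patterns are derived from the three message shapes quoted above rather than from any complete list of ext3 messages.

```python
import re

# Matches both "EXT3-fs error (device X): func:" and "EXT3-fs error (device X) in func:".
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\)(?::| in) (?P<func>\w+):"
)

SAMPLE_LOG = """\
EXT3-fs error (device dm-0): ext3_delete_entry: bad entry in directory #1376454:
inode out of bounds - offset=0, inode=914432000, rec_len=12, name_len=1
Aborting journal on device dm-0.
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-0) in ext3_unlink: IO failure
"""


def summarise_ext3_errors(log_text):
    """Collect ext3 error sites per device and note journal abort / read-only remount."""
    functions = {}
    journal_aborted = False
    remounted_read_only = False
    for line in log_text.splitlines():
        match = EXT3_ERROR.search(line)
        if match:
            functions.setdefault(match.group("dev"), []).append(match.group("func"))
        elif line.startswith("Aborting journal on device"):
            journal_aborted = True
        elif line.startswith("Remounting filesystem read-only"):
            remounted_read_only = True
    return {
        "functions": functions,
        "journal_aborted": journal_aborted,
        "remounted_read_only": remounted_read_only,
    }


if __name__ == "__main__":
    print(summarise_ext3_errors(SAMPLE_LOG))
```

A read-only remount reported this way is consistent with the guess above that nothing further gets written to the logs on that filesystem.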


Comment 6 Dave 2006-04-25 05:32:06 UTC
Deleted and re-built underlying i2o array, no difference to outcome, install
still fails, kernel panics and machine locks up hard.

Comment 7 David Timms 2006-04-26 09:42:33 UTC
Have you tried on multiple architectures, or is the hardware really i386 ?
(could you change it to match the machine you are trying this on ?).

Does the sha1sum match that found on:
http://download.fedora.redhat.com/pub/fedora/linux/core/5/i386/iso/SHA1SUM ?

Are you running upgrade or install ?
Are you allowing the installer to format your discs in the installer ?
What choices are being made for partitioning ? eg lvm etc.

Which motherboard / cpu do you have ?
Which install dvd are you using ?
attach output of lspci, lsmod and dmesg ...
When installing, as soon as the rpm install begins, change to VT2/3/4, and
create a few blank lines in each. I think you may be able to run top or dmesg on
VT2 in the hope that any error message may be displayed in text mode.

Might be worth changing the summary to include words like: adaptec raid ext3
corruption.

Comment 8 Dave 2006-04-26 12:15:55 UTC
Hardware really is i386 (Athlon 2800)

Sha1sum of iso matches exactly

All attempts are fresh installs.

Installer is removing all existing partitions and formatting drives.

Have tried default partitioning scheme (separate /boot and one big root in LVM),
will try an "LVM free" install tonight.

Install DVD is FC5 Release. Downloaded and sha1sum verified.

Will put FC4 on tonight and get lspci, lsmod and dmesg output.

I do see the error messages briefly, but they scroll off the top of the screen
as they are followed by a kernel panic. Will try vga=791 option to give me more
screen estate!

P.S. Can't find a way to change the summary... will look harder!

Comment 9 Ronald Warsow 2006-04-26 13:33:21 UTC
I see these hangs when installing on old machines (AMD 1300, Intel PIII 600).
At ~60%, especially when openoffice-core (~200 MB) is being installed, the
machine "seems" to hang; no VT, etc.
There is no difference between text and graphical install modes.
Installations WITHOUT openoffice: no problems.

I couldn't verify this on an Intel PIII ~900MHz, but the installer needs a
very long time for the openoffice-core packages.

Maybe this is an option...

Comment 10 Dave 2006-04-27 13:47:23 UTC
No, even minimal installs with almost everything (including openoffice)
de-selected still fail in weird and interesting ways (at random points in the
install).

Comment 11 Dave 2006-04-28 08:14:37 UTC
Created attachment 128347 [details]
output from lsmod when booting with a rescue cd

I booted the machine with the fc5 rescue cd and captured the output from lsmod

Comment 12 Dave 2006-04-28 08:15:29 UTC
Created attachment 128348 [details]
output from lspci when booting with a rescue cd

I booted the machine with the fc5 rescue cd and captured the output from lspci

Comment 13 Dave 2006-04-28 08:16:05 UTC
Created attachment 128349 [details]
output from dmesg when booting with a rescue cd

I booted the machine with the fc5 rescue cd and captured the output from dmesg

Comment 14 Dave 2006-04-28 08:18:27 UTC
Standard ext3 partitioned install (i.e. no lvm) also fails in the same way.

Comment 15 Dave 2006-04-28 08:53:43 UTC
Just done another install with vga=791 and sat on vt4 until the kernel blew
up... this is what I saw.

kernel bug at include/linux/list.h:167!
invalid opcode: 000 [#1]
last sysfs file: /block/ram0/dev

and a whole lot of kernel dump, after 120 seconds it continues to include:

kernel panic - not syncing : fatal exception in interrupt

followed by more kernel screaming until a hard lock.

Suggestions?

Comment 16 David Timms 2006-04-28 14:16:50 UTC
Good work Dave. Unfortunately I'm not a kernel developer myself, but from past
experience this sort of information is what is needed by developers to make any
progress. It's great to see you are willing to spend time to get to the heart of
the problem.
Other suggestions from me:
- Do the install with serial port logging - don't know if this works during
anaconda (you would need another machine and a serial crossover cable.)
http://www.tldp.org/HOWTO/Remote-Serial-Console-HOWTO/configure-kernel-grub.html
- Try the install with as many as possible of your expansion cards / usb devices
removed.
- Is the Adaptec card a hardware raid ? (ie does it do all the crc / mirroring /
caching etc ?) Could you try remove the raiding and just use the disks direct as
scsi disks ?  
- If you can get a minimal install to complete, try a runlevel 3 boot, and then
yum update kernel, and reboot.  and then maybe just the main low level packages
like selinux-policy, -targeted (pops up everywhere), util-linux
- If the issue is really in kernel code, unfortunately updating once installed
isn't going to fix the installer kernel :(  but there might be some installer
options that allow the install to complete:
- During FC5 testing, daily creation of boot images and rescue iso was being
performed. It would be interesting to do a minimal ftp network install using the
rawhide rescue.iso, to see if current development track has the same issue. This
would take a fair bit of downloading ...
http://download.fedora.redhat.com/pub/fedora/linux/core/development/isos/
- I did have trouble with FC3 on a Dell PE2650 with aacraid: it would
occasionally abort the journal and remount read-only - from the reading I did
at the time: there was kernel-smp v aacraid v raid bios v raid write caching
issue (some suggested disabling write caching on the controller). The eventual
fix was kernel and raid bios and system bios updated numerous iterations, and
the machine is now rock solid.

Comment 17 Dave 2006-04-28 21:52:35 UTC
No problem, and thanks!

Will give the serial port logging a try (or at least read up on it to see if
it's possible given the constraints).

There are no other expansion cards... just video, raid controller and that's it
(mouse and kbd are usb).

Adaptec is hardware raid, it just presents a single volume to the operating
system, will look into straight disk access.

Minimal install may be possible if I remove as much as possible, see if I can
get an install without it blowing up, but it's just a case of luck... and
persistence!

I'll try a rawhide install, I have the bandwidth (actually I have my own rawhide
mirror) so I'll give that a shot.

I hope that I can eventually reach the point where this machine will be rock
solid again, but for now I'm having to make do with a rusty old win xp box
instead of my linux workstation.... sucky!

I'll update this when I've had a chance to run some more of the tests suggested,
thanks for the boost.

Comment 18 Dave 2006-04-29 21:54:52 UTC
I have not been able to get through even the most minimal install successfully
in the last 8 attempts. I have de-selected everything except yum and gpm...
and it still locks hard every time.

The networking in the current rawhide boot iso appears to be badly broken, will
have to wait until it is updated.

*sigh* Are there any other patched fc5 boot iso's out there for any other reason?

Comment 19 Dave 2006-05-01 09:30:53 UTC
Well, just installed from rawhide (boot.iso dated 01/05/2006) and the install
completed successfully with no hangs during install.

However, once booted, I received journal aborted messages and a whole host of
errors. I've attached some diagnostic information.

Unfortunately I don't know if I've uncovered a rawhide bug, or if this is
related to my FC5 situation.

Does anyone have a list of commands I could use to re-generate an fc5 dvd iso
with the updated kernel?

Attachments to follow.

Comment 20 Dave 2006-05-01 09:33:45 UTC
Created attachment 128434 [details]
anaconda log from rawhide install

Comment 21 Dave 2006-05-01 09:34:47 UTC
Created attachment 128435 [details]
dmesg from rawhide install

Comment 22 Dave 2006-05-01 09:36:27 UTC
Created attachment 128436 [details]
install.log.syslog from rawhide install

Comment 23 Dave 2006-05-01 09:37:30 UTC
Created attachment 128437 [details]
compressed anaconda.syslog from rawhide install

Comment 24 Dave 2006-05-01 09:55:46 UTC
Created attachment 128438 [details]
lsmod output from rawhide install

Comment 25 Dave 2006-05-01 09:57:30 UTC
Created attachment 128439 [details]
lspci output from rawhide install

Comment 26 Dave 2006-05-02 18:38:16 UTC
Back to testing FC5 now.

I disabled the RAID function of the Adaptec 2100S and the install still died.
This time it did not lock up completely (i.e. no kernel panic), but it did
suffer from the read-only filesystem bug.

Due to it not locking up, I managed to coax a log out of anaconda, and attach it
below....

Still no sign of the cavalry on the horizon...


Comment 27 Dave 2006-05-02 18:40:09 UTC
Created attachment 128505 [details]
The anaconda dump during the fc5 installation with the raid disabled

Install still fails, but kernel does not panic.

Please help!

Comment 28 David Timms 2006-05-02 22:02:32 UTC
Dave, I think that at this stage it might be worth removing the raid card,
putting in ide or sata disk, and seeing if an install without the raid card
succeeds. If it does, you might like to try swapping the add-in cards around in
their slots.

When you gave Ubuntu 6.x Beta a whirl, what kernel are they up to ?

Comment 29 Dave 2006-05-03 08:40:30 UTC
The Ubuntu beta was using the 2.6.15 kernel.

I'll try a standard ide/sata disk... I can't go for much longer without a stable
desktop.

I'll also try to move the raid card into another slot.

Comment 30 Dave 2006-05-04 05:23:49 UTC
Still using the adaptec raid card for now. I Installed FC4 cleanly with no
errors and ran bonnie++ with the following options:
bonnie++ -s 4096 -r 1024 -n 5 -x 10

... bonnie completed its run with no problems, install kernel version was
2.6.11-1.1369_FC4.

I then ran yum update kernel, which pulled down version 2.6.16-1.2107_FC4
rebooted and ran the same commands....

The kernel panicked and the machine locked up some time during the run.

Sounds like whatever is affecting FC5 also affected the last (at least 1 maybe
more) errata kernels for FC4.

I'll see if I can locate some of the other errata kernels to perform a series of
tests.

Comment 31 Dave 2006-05-04 23:29:17 UTC
I've located the following FC4 update kernels and will be testing their
stability under load on this hardware.

kernel-2.6.12-1.1369_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.12-1.1387_FC4.i686.rpm - 
kernel-2.6.12-1.1390_FC4.i686.rpm - 
kernel-2.6.12-1.1398_FC4.i686.rpm - 
kernel-2.6.12-1.1447_FC4.i686.rpm - 
kernel-2.6.12-1.1456_FC4.i686.rpm - 
kernel-2.6.13-1.1526_FC4.i686.rpm - 
kernel-2.6.13-1.1532_FC4.i686.rpm - 
kernel-2.6.14-1.1637_FC4.i686.rpm - 
kernel-2.6.14-1.1644_FC4.i686.rpm - 
kernel-2.6.14-1.1653_FC4.i686.rpm - 
kernel-2.6.14-1.1656_FC4.i686.rpm - 
kernel-2.6.15-1.1830_FC4.i686.rpm - 
kernel-2.6.15-1.1831_FC4.i686.rpm - 
kernel-2.6.15-1.1833_FC4.i686.rpm - 
kernel-2.6.16-1.2069_FC4.i686.rpm - 
kernel-2.6.16-1.2096_FC4.i686.rpm - 
kernel-2.6.16-1.2107_FC4.i686.rpm - UNSTABLE - Kernel Panic.

I'll update this table as I get further results.

Comment 32 Dave 2006-05-06 08:27:22 UTC
kernel-2.6.12-1.1369_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.12-1.1387_FC4.i686.rpm - 
kernel-2.6.12-1.1390_FC4.i686.rpm - 
kernel-2.6.12-1.1398_FC4.i686.rpm - 
kernel-2.6.12-1.1447_FC4.i686.rpm - 
kernel-2.6.12-1.1456_FC4.i686.rpm - 
kernel-2.6.13-1.1526_FC4.i686.rpm - 
kernel-2.6.13-1.1532_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1637_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1644_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1653_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1656_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1830_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1831_FC4.i686.rpm - 
kernel-2.6.15-1.1833_FC4.i686.rpm - 
kernel-2.6.16-1.2069_FC4.i686.rpm - 
kernel-2.6.16-1.2096_FC4.i686.rpm - 
kernel-2.6.16-1.2107_FC4.i686.rpm - UNSTABLE - Kernel Panic.

Comment 33 Dave 2006-05-06 14:16:02 UTC
kernel-2.6.12-1.1369_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.12-1.1387_FC4.i686.rpm - 
kernel-2.6.12-1.1390_FC4.i686.rpm - 
kernel-2.6.12-1.1398_FC4.i686.rpm - 
kernel-2.6.12-1.1447_FC4.i686.rpm - 
kernel-2.6.12-1.1456_FC4.i686.rpm - 
kernel-2.6.13-1.1526_FC4.i686.rpm - 
kernel-2.6.13-1.1532_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1637_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1644_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1653_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1656_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1830_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1831_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1833_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.16-1.2069_FC4.i686.rpm - UNSTABLE - Kernel Panic.
kernel-2.6.16-1.2096_FC4.i686.rpm - UNSTABLE - Kernel Panic.
kernel-2.6.16-1.2107_FC4.i686.rpm - UNSTABLE - Kernel Panic.

Looks like the bug was introduced in the 2.6.16 kernel... will see if it's
present in the vanilla 2.6.16...
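
The table above can also be reduced to its boundary mechanically. The following Python sketch parses rows in the format used in these comments and reports the newest kernel marked STABLE and the oldest marked UNSTABLE. The function names are hypothetical, the sample is an excerpt of the table, and the approach assumes the results are monotonic (everything before the boundary stable, everything after it unstable).

```python
import re

# One table row: "kernel-<version>-<release>_FC4.i686.rpm - <STATUS> - <note>"
ROW = re.compile(
    r"^kernel-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+(?:\.\d+)*)_FC4\.i686\.rpm"
    r"\s*-\s*(?P<status>STABLE|UNSTABLE)?"
)

SAMPLE_TABLE = """\
kernel-2.6.12-1.1369_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.12-1.1387_FC4.i686.rpm -
kernel-2.6.15-1.1833_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.16-1.2069_FC4.i686.rpm - UNSTABLE - Kernel Panic.
kernel-2.6.16-1.2107_FC4.i686.rpm - UNSTABLE - Kernel Panic.
"""


def parse_results(table_text):
    """Return (sort_key, name, status) tuples ordered by kernel version.

    status is "STABLE", "UNSTABLE", or None for rows not yet tested.
    """
    rows = []
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        name = f"{match.group('version')}-{match.group('release')}"
        # Compare versions numerically, e.g. (2, 6, 15, 1, 1833).
        key = tuple(int(part) for part in re.split(r"[.-]", name))
        rows.append((key, name, match.group("status")))
    return sorted(rows, key=lambda row: row[0])


def boundary(rows):
    """Return (newest stable kernel, oldest unstable kernel); None where absent."""
    stable = [name for _, name, status in rows if status == "STABLE"]
    unstable = [name for _, name, status in rows if status == "UNSTABLE"]
    return (stable[-1] if stable else None, unstable[0] if unstable else None)


if __name__ == "__main__":
    print(boundary(parse_results(SAMPLE_TABLE)))
```

If the results were ever interleaved (a stable kernel newer than an unstable one), the two values returned would overlap, which is a hint that the failure is not a simple regression between two releases.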

Comment 34 Dave 2006-05-07 16:58:15 UTC
Updated with latest errata kernel:

kernel-2.6.12-1.1369_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.12-1.1387_FC4.i686.rpm - 
kernel-2.6.12-1.1390_FC4.i686.rpm - 
kernel-2.6.12-1.1398_FC4.i686.rpm - 
kernel-2.6.12-1.1447_FC4.i686.rpm - 
kernel-2.6.12-1.1456_FC4.i686.rpm - 
kernel-2.6.13-1.1526_FC4.i686.rpm - 
kernel-2.6.13-1.1532_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1637_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1644_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1653_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.14-1.1656_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1830_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1831_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.15-1.1833_FC4.i686.rpm - STABLE - Bonnie test run complete.
kernel-2.6.16-1.2069_FC4.i686.rpm - UNSTABLE - Kernel Panic.
kernel-2.6.16-1.2096_FC4.i686.rpm - UNSTABLE - Kernel Panic.
kernel-2.6.16-1.2107_FC4.i686.rpm - UNSTABLE - Kernel Panic.
kernel-2.6.16-1.2108_FC4.i686.rpm - UNSTABLE - Kernel Panic.

Comment 35 Need Real Name 2006-05-08 21:15:49 UTC
I've filed a bug report which appears to be a dupe of this (thanks for pointing
that out).

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=190340

What you're describing here I've verified on *multiple* servers with both SATA
and SCSI disks and 3 flavors of Adaptec zero-channel RAID cards. The common
denominator, all used i2o drivers. Similar boxes with different Adaptec cards
using aacraid drivers appear to be fine. I knew it was somewhere after 2.6.14
where the bug was introduced but hadn't yet tracked it down to the exact
version, bravo on your fortitude there. 

Comment 36 Dave 2006-05-08 23:05:38 UTC
Many thanks, it's good to know I'm not alone!

It's not so much fortitude, as bloody mindedness ;o)

I want to try a vanilla 2.6.16 kernel on FC4 asap to see if the same bugs are
present in the upstream kernel or if it's something specific to Fedora, however
my time is in very short supply at the moment, and it'll take me some time to
organise.

Alternatively I might try another distro that's also using 2.6.16 and see if
that suffers from the same problems... 

I've modified the summary to include the "i2o" keyword which was not present in
the original.

Comment 37 Dave 2006-05-09 16:14:52 UTC
I have received information from Markus Lidel (see i2o.shadowconnect.com) that
instability with i2o under the 2.6.16 kernel is a potential known problem.

Other people are reporting similar issues.

He's currently awaiting hardware for test purposes.... I'll update when I hear
anything else from him.

Comment 38 Danny Yee 2006-05-12 01:41:06 UTC
Not sure if this is the same problem, but I can't get either of my two servers
with i2o arrays to boot using any of the FC4 2.6.16 kernels.  See bug 191357 for
details.


Comment 39 Dave 2006-05-12 08:52:17 UTC
Yup, that's exactly the same problem. I'm afraid yours is a duplicate, please
mark it as such.

Comment 40 Brian 2006-05-15 19:45:29 UTC
I can confirm this bug. System crashes/locks under even moderate load. If I boot
off of a 2.6.15 or lower kernel, the system runs fine.

My hardware:
i386 - Fedora Core 5
Adaptec 2100S Raid Controller
Pentium 4 3Ghz

For what it's worth, I suspect it is not specific to RedHat/Fedora, as the same
thing happens on a Gentoo 2.6.16 kernel.


Comment 41 Joni Bäcklund 2006-05-16 09:05:40 UTC
I can also confirm this bug. It gave me one bad night when the upgrade failed; I
managed to get it done at 04:00 AM, after 7 upgrade attempts and lots of
hardware changes to confirm the problem. From the kernel logs it looked like
some kind of irq handling bug (at least going by the kernel crash dump).

My hardware:
i386
Adaptec 2100S Raid Controller
Pentium 4 2.8Ghz

I'm now running kernel 2.6.15-1.1833_FC4smp on that system without any problems.

Is there any way to boot kernel 2.6.15-1.1833 from HD with grub and then use the
DVD as installation root? I have to upgrade the other server in the near future
as well and I do not want to do it 7 times :). That server has a different
hardware setup but it also runs an Adaptec 2100S.

I hope the fix will follow soon as there are some issues in 2.6.15 that I want
to avoid.

I could of course build my own trial kernel with 2.6.16.16 and the i2o drivers
from the 2.6.15 series.

( version 1.288 instead of 1.325 )

Joni





Comment 42 Dave 2006-05-16 15:16:45 UTC
I don't think there's a way to install in the way you mention, however it is
possible to re-spin the FC5 iso with a 2.6.15 kernel... 

http://www.users.on.net/~rgarth/weblog/fedora/patch_cd.autumn

I'm looking to give this a try when I get some free time, let me know if you get
a chance to try it before I do.

P.S. Don't suppose you have a copy of the kernel oops?

Comment 43 Joni Bäcklund 2006-05-16 16:06:36 UTC
Sorry ! I do not have the kernel oopses available at this time. The server is
located 65km away from my house and I do NOT want to crash it remotely :)

I can try to save oops message on Thursday when I will do other updates at the
server.



Comment 44 Dave 2006-05-16 20:05:02 UTC
I'm in a similar (remote from the machine) situation so I can relate!

I've raised a bug upstream on the kernel, but the one thing we're currently
lacking is a kernel oops message... I did manage to snap a few pics, but I'm
currently remote from both the site, and the camera!

Look forward to seeing an oops message from you soon ;o)

Comment 45 Joni Bäcklund 2006-05-18 10:14:30 UTC
Created attachment 129404 [details]
oops message from kernel 2.6.16 smp FC5

Comment 46 Joni Bäcklund 2006-05-18 10:23:34 UTC
So I have the oops here.. 

I managed to make an FC5 installation DVD with the latest update RPMs and kernel
2.6.15 from FC4, which has the old driver. It runs but does not find the i2o
based hard disks. I see them in the kernel log, and I have manually created the
devices /dev/i2o/hda etc., but anaconda does not seem to find them. So no go
with fc4 kernel + fc5 dvd :)

It works on other media such as normal scsi or ide. It was nice to learn how to
make such a DVD though.

Joni

Comment 47 Joni Bäcklund 2006-05-18 10:40:08 UTC
Sorry, I forgot to paste the oops here and only added it as an attachment.



Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c015e77d
*pde = 32cd9001
Oops: 0000 [#1]
SMP
last sysfs file: /class/net/sit0/address
Modules linked in: ipt_REJECT xt_tcpudp iptable_filter xt_state ipt_MASQUERADE
ip_nat_irc ip_nat_ftp iptab
CPU:    1
EIP:    0060:[<c015e77d>]    Not tainted VLI
EFLAGS: 00010046   (2.6.16-1.2111_FC5smp #1)
EIP is at free_block+0x51/0xe5
eax: 00000000   ebx: f7ff8540   ecx: f7205000   edx: f7d9d000
esi: f7205ac0   edi: f7f04e80   ebp: 00000004   esp: f7e02ef8
ds: 007b   es: 007b   ss: 0068
Process events/1 (pid: 9, threadinfo=f7e02000 task=f7f35630)
Stack: <0>c02f1fe0 00000005 c1bce5e0 c1bce5e0 00000005 c1bce5c0 00000000 c015e86d
       00000000 f7f04e80 c1a00280 f7ff8540 f7f04e80 00000000 c015fcb5 00000000
       f7ff856c f7f04f54 c1a00280 c1a00284 c1bce140 00000282 c01313b4 c015fc46
Call Trace:
 [<c02f1fe0>] _spin_unlock_irq+0x5/0x7     [<c015e86d>] drain_array_locked+0x5c/0x7b
 [<c015fcb5>] cache_reap+0x6f/0x191     [<c01313b4>] run_workqueue+0x7f/0xba
 [<c015fc46>] cache_reap+0x0/0x191     [<c0131ba1>] worker_thread+0x0/0x117
 [<c0131c87>] worker_thread+0xe6/0x117     [<c011daa9>] default_wake_function+0x0/0xc
 [<c01344b9>] kthread+0x9d/0xc9
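
To make an oops like this easier to compare against other reports, the following Python sketch extracts the faulting function and the call trace entries as structured data. The function names are hypothetical, the sample is an excerpt of the oops above, and the patterns assume the 2.6-era i386 format shown here (`[<address>] symbol+0xoffset/0xsize`).

```python
import re

# One call trace entry: "[<c015fcb5>] cache_reap+0x6f/0x191".
# \s+ also spans a line break, so an entry wrapped across two lines still matches.
FRAME = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
EIP = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

SAMPLE_OOPS = """\
EIP is at free_block+0x51/0xe5
Call Trace:
 [<c02f1fe0>] _spin_unlock_irq+0x5/0x7     [<c015e86d>] drain_array_locked+0x5c/0x7b
 [<c015fcb5>] cache_reap+0x6f/0x191     [<c01313b4>] run_workqueue+0x7f/0xba
 [<c015fc46>] cache_reap+0x0/0x191     [<c0131ba1>] worker_thread+0x0/0x117
 [<c0131c87>] worker_thread+0xe6/0x117     [<c011daa9>]
default_wake_function+0x0/0xc
 [<c01344b9>] kthread+0x9d/0xc9
"""


def faulting_function(oops_text):
    """Return the symbol named on the "EIP is at" line, or None if absent."""
    match = EIP.search(oops_text)
    return match.group(1) if match else None


def parse_call_trace(oops_text):
    """Return (address, symbol, offset, size) tuples found after "Call Trace:"."""
    _, _, trace = oops_text.partition("Call Trace:")
    return [
        (address, symbol, int(offset, 16), int(size, 16))
        for address, symbol, offset, size in FRAME.findall(trace)
    ]


if __name__ == "__main__":
    print(faulting_function(SAMPLE_OOPS))
    for frame in parse_call_trace(SAMPLE_OOPS):
        print(frame)
```

Note that the faulting function here is in the slab allocator rather than in the i2o driver itself, so the trace shows where the corruption was noticed, not necessarily where it was caused.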


Comment 48 Joni Bäcklund 2006-05-18 15:33:20 UTC
One other odd thing I found during debugging is that the old driver from
2.6.15-1.1833_FC4smp uses totally different irq settings from the new 2.6.16
one. Maybe the way the kernel routes irqs has been changed. The old kernel uses
"irq 177" instead of "irq 17" on the new kernel. The bios settings and
everything else are 100% the same. Maybe this is the way it should be, as all
other irqs are also routed differently.

See below:

old working version:

I2O subsystem v1.288
i2o: max drivers = 8
i2o: Checking for PCI I2O controllers...
ACPI: PCI Interrupt 0000:01:0b.1[A] -> GSI 22 (level, low) -> IRQ 177
iop0: controller found (0000:01:0b.1)
iop0: using 64-bit DMA
iop0: PCI I2O controller at F8000000 size=1048576
iop0: Installed at IRQ 177
i2o: iop0: Activating I2O controller...
i2o: iop0: This may take a few minutes if there are many devices
iop0: HRT has 1 entries of 16 bytes each.
Adapter 00000012: <7>TID 0000:[<7>H<7>P<7>C<7>*<7>]:<7>PCI 1: Bus 1 Device 22 Function 0<7>
i2o: iop0: Controller added
I2O Block Device OSM v1.287


new version (the one that locks up):

I2O subsystem v1.325
i2o: max drivers = 8
i2o: Checking for PCI I2O controllers...
ACPI: PCI Interrupt 0000:01:0b.1[A] -> GSI 22 (level, low) -> IRQ 17
iop0: controller found (0000:01:0b.1)
iop0: using 64-bit DMA
iop0: PCI I2O controller at F8000000 size=1048576
iop0: Installed at IRQ 17
i2o: iop0: Activating I2O controller...
i2o: iop0: This may take a few minutes if there are many devices
iop0: HRT has 1 entries of 16 bytes each.
Adapter 00000012: <7>TID 0000:[<7>H<7>P<7>C<7>*<7>]:<7>PCI 1: Bus 1 Device 22 Function 0<7>
i2o: iop0: Controller added
I2O Block Device OSM v1.325
block-osm: registered device at major 80
 i2o/hda: i2o/hda1 i2o/hda2 i2o/hda3
block-osm: device added (TID: 20c): i2o/hda
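
The differences between the two boot logs can be pulled out with a short script. This Python sketch extracts the I2O subsystem version, the Block Device OSM version and the IRQ from each log and reports the fields that changed. The function names are hypothetical, and the patterns cover only the message formats quoted above for controller iop0.

```python
import re

# Patterns for the three values that differ between the two logs above.
PATTERNS = {
    "subsystem": re.compile(r"I2O subsystem v([\d.]+)"),
    "block_osm": re.compile(r"I2O Block Device OSM v([\d.]+)"),
    "irq": re.compile(r"iop0: Installed at IRQ (\d+)"),
}

OLD_LOG = """\
I2O subsystem v1.288
iop0: Installed at IRQ 177
I2O Block Device OSM v1.287
"""

NEW_LOG = """\
I2O subsystem v1.325
iop0: Installed at IRQ 17
I2O Block Device OSM v1.325
"""


def summarise_i2o_log(log_text):
    """Return the driver versions (as strings) and the IRQ (as int) found in a boot log."""
    summary = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            value = match.group(1)
            summary[field] = int(value) if field == "irq" else value
    return summary


def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    fields = sorted(set(old) | set(new))
    return {f: (old.get(f), new.get(f)) for f in fields if old.get(f) != new.get(f)}


if __name__ == "__main__":
    print(changed_fields(summarise_i2o_log(OLD_LOG), summarise_i2o_log(NEW_LOG)))
```

Both logs report the same GSI (22) for the controller, so the differing IRQ numbers alone do not show that the interrupt is wired differently; the driver version change is the other visible difference.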




Comment 50 Dave 2006-05-18 23:27:25 UTC
Nice chunk of debugging there, good job!



Comment 51 Dave 2006-06-15 09:57:04 UTC
Re-submitted due to data loss during bugzilla.redhat.com hardware failure

I raised an upstream bug at kernel.org and have since successfully tested a
kernel patch that has been included in Linus' kernel.

This fix will be present in 2.6.17. Is there any chance of anyone backporting it
into 2.6.16 before Fedora goes 2.6.17?

http://bugzilla.kernel.org/show_bug.cgi?id=6561

Comment 52 Danny Yee 2006-06-30 01:51:07 UTC
Is this fixed in 2.6.17-1.2139_FC4 and 2.6.17-1.2139_FC5?

Comment 53 Dave 2006-06-30 06:14:47 UTC
I'll be able to tell you at the weekend... when I'll have access to the server
again.

I'll test and reply here.

Comment 54 Dave 2006-07-04 05:59:28 UTC
Closed, fixes appear to be present in 2.6.17-1.2139_FC4.

Will wait for the next Fedora Unity respin of FC5 to come with the 2.6.17 kernel
to see if I can actually finally install FC5.

Note: Under load the system appears to be more sluggish than I remember it
being, but at least it stays up and stable.

Any other observations welcome!

Comment 55 Dave Russell 2006-09-03 17:40:26 UTC
*** Bug 191357 has been marked as a duplicate of this bug. ***

Comment 56 Dave Russell 2006-09-03 17:41:10 UTC
*** Bug 190340 has been marked as a duplicate of this bug. ***