Bug 190340

Summary: I2O oops in all FC5 and FC4 >2.6.14 kernels
Product: [Fedora] Fedora
Reporter: Need Real Name <wendell>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: meherenow, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-09-03 17:40:46 UTC
Type: ---

Description Need Real Name 2006-05-01 13:23:56 UTC
Description of problem: There appears to be a bug in the I2O drivers affecting
all release kernels in FC5 and, in FC4, updated kernels at least newer than 2.6.14.


Version-Release number of selected component (if applicable): For FC5, the kernel
on the install CD is affected. For FC4, at least 2.6.16-1.2096 is affected.
2.6.14-1.1656 appears stable: I have a heavily used box running it with I2O that
has currently been up 96 days.


How reproducible: Just cause lots of disk I/O. A quick way to crash the box is
this:
for i in `seq 1 90`; do echo $i; dd if=/dev/zero of=/data/junk.$i count=250000 bs=4096 1>/dev/null 2>&1; done
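
For repeated testing, the same loop can be wrapped in a small shell function. This is only a sketch: the function name io_stress, its argument order, and the early exit when dd fails are assumptions added for illustration and are not part of the original one-liner. The defaults mirror the values in the command above (/data, 90 files, 250000 blocks of 4096 bytes).

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions): write a series
# of zero-filled files to generate sustained disk I/O.
#   $1 = target directory   (default: /data)
#   $2 = number of files    (default: 90)
#   $3 = blocks per file    (default: 250000, each 4096 bytes)
io_stress() {
    dir=${1:-/data}
    iterations=${2:-90}
    blocks=${3:-250000}
    i=1
    while [ "$i" -le "$iterations" ]; do
        # Print the iteration number so the last one reached is visible
        # on the terminal if the box locks up.
        echo "$i"
        # Unlike the original loop, stop if dd fails (for example, disk full).
        dd if=/dev/zero of="$dir/junk.$i" count="$blocks" bs=4096 \
            >/dev/null 2>&1 || return 1
        i=$((i + 1))
    done
}
```

Calling io_stress with no arguments should behave like the one-liner above; something like io_stress /tmp/iotest 3 4 gives a quick, harmless check that the function itself works before running the full load.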


Steps to Reproduce:
1. Lots of I/O, see above
  
Actual results: From an ssh connection I see this:
[root@localhost ~]# !for
for i in `seq 1 90`; do echo $i; dd if=/dev/zero of=/data/junk.$i count=250000 bs=4096 1>/dev/null 2>&1; done
1
2
3
4
5

Message from syslogd@localhost at Mon May  1 08:45:22 2006 ...
localhost kernel: ------------[ cut here ]------------

On the console I get a few screens' worth of messages followed by a "continuing
in 120" countdown. It counts down to 0, but nothing happens after that.


Expected results:


Additional info: Looking through the dmesg output I see messages about PCI and a
suggestion to use "pci=routeirq" as a boot-time kernel parameter. I'm testing
that now, and so far the looping dd has made it further than usual on the
2.6.16-1.2096SMP kernel. 2.6.11 and 2.6.14 appear to be able to run it
indefinitely, or at least for as long as disk space holds out.

I'm not sure whether this is a "bug" or whether needing "routeirq" is common.
I'm only reporting it because a few boxes I had with FC4 and I2O drivers started
crashing like crazy after a kernel upgrade (via yum). I also tried an in-place
upgrade to FC5, which was disastrous since its stock kernel has the problem.
I've not yet tried that with pci=routeirq on the CD boot line. The line in dmesg
that suggests trying pci=routeirq also asks to report the results if it's
beneficial.
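
When comparing runs with and without the parameter, it helps to confirm that the running kernel was actually booted with pci=routeirq. The sketch below checks the kernel command line in /proc/cmdline for an exact match; the function name has_kernel_param and its optional second argument (an alternate file to read, useful for testing) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): succeed if the given boot
# parameter appears as a whole word on the kernel command line.
#   $1 = parameter to look for, e.g. pci=routeirq
#   $2 = file to read (default: /proc/cmdline)
has_kernel_param() {
    # Put each space-separated word on its own line, then require an
    # exact, fixed-string, whole-line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, has_kernel_param pci=routeirq && echo "routeirq active" prints the message only when the parameter was on the boot line.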

Hardware: SuperMicro 7043M-6 and other similar boxes with Adaptec 2000S and
similar hardware RAID cards.

If any log files or additional info might be helpful, let me know...

Comment 1 Need Real Name 2006-05-01 16:52:48 UTC
Without "pci=routeirq" the "dd" test above would complete only about 5
iterations before locking up. With that boot string it completed 88 before
locking up.

Comment 2 Dave 2006-05-08 21:03:40 UTC
I think this is probably a duplicate of my bug...

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570

Please let me know if you can mirror my testing...

Comment 3 Need Real Name 2006-05-08 21:10:38 UTC
Yes, appears to be a dupe. I searched and searched and of course found nothing
even close to this before filing the report, ah well...

Comment 4 Dave Russell 2006-09-03 17:40:46 UTC

*** This bug has been marked as a duplicate of 189570 ***