Bug 190340

Summary:	I2O oops in all FC5 and FC4 >2.6.14 kernels
Product:	[Fedora] Fedora	Reporter:	Need Real Name <wendell>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED DUPLICATE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5	CC:	meherenow, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2006-09-03 17:40:46 UTC	Type:	---

Description Need Real Name 2006-05-01 13:23:56 UTC

Description of problem: There appears to be a bug in the I2O drivers affecting
all release kernels in FC5, and updated FC4 kernels later than 2.6.14 at least.


Version-Release number of selected component (if applicable): For FC5, the kernel
on the install CD is affected. For FC4, at least 2.6.16-1.2096 is affected.
2.6.14-1.1656 appears stable; I have a heavily used box running it with I2O that
has been up 96 days so far.


How reproducible: Just cause lots of disk I/O. A quick way to crash the box is
this:
for i in `seq 1 90`; do echo $i; dd if=/dev/zero of=/data/junk.$i count=250000 bs=4096 1>/dev/null 2>&1; done
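
The loop above can be sketched as a small parameterized script, which makes it easier to try smaller runs first. This is only an illustration, not something from the original report: write_junk is a hypothetical helper name, and the argument order is an assumption. With the reporter's values each file is 250000 x 4096 bytes (about 1 GB), so the full run writes roughly 92 GB.

```shell
#!/bin/sh
# Sketch of the reporter's dd loop with the target directory, file count,
# and block count made adjustable. write_junk is a hypothetical helper name.
write_junk() {
    dir=$1      # directory to fill (/data in the report)
    files=$2    # number of files to write (90 in the report)
    blocks=$3   # 4096-byte blocks per file (250000 in the report)
    i=1
    while [ "$i" -le "$files" ]; do
        echo "$i"
        # Same dd invocation as the report; stop if dd fails (e.g. disk full).
        dd if=/dev/zero of="$dir/junk.$i" count="$blocks" bs=4096 \
            >/dev/null 2>&1 || return 1
        i=$((i + 1))
    done
}

# Reporter's original parameters; left commented out because the full run
# writes about 92 GB and is expected to crash an affected kernel:
# write_junk /data 90 250000
```

The last number echoed before the machine locks up gives a rough measure of how far the run got, which is how the iteration counts in this report were obtained.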


Steps to Reproduce:
1. Lots of I/O, see above
  
Actual results: From an ssh connection I see this:
[root@localhost ~]# !for
for i in `seq 1 90`; do echo $i; dd if=/dev/zero of=/data/junk.$i count=250000 bs=4096 1>/dev/null 2>&1; done
1
2
3
4
5

Message from syslogd@localhost at Mon May  1 08:45:22 2006 ...
localhost kernel: ------------[ cut here ]------------

On the console I get a few screens' worth of messages followed by a countdown
starting at "continuing in 120", which reaches 0, but nothing happens after that.


Expected results:


Additional info: Looking through dmesg output I see messages about PCI and a
suggestion to use "pci=routeirq" as a boot-time kernel parameter. I'm testing
that now, and so far the looping dd has made it further than usual on the
2.6.16-1.2096SMP kernel. 2.6.11 and 2.6.14 appear able to complete it
indefinitely, or at least as long as disk space holds out.

I'm not sure if this is a "bug" or if needing "routeirq" is common. I only report
it because a few boxes I had with FC4 and I2O drivers started crashing like
crazy after a kernel upgrade (via yum). I also tried an in-place upgrade to FC5,
which was disastrous since its stock kernel has the problem. I've not yet
tried the upgrade with pci=routeirq on the CD boot line. The line in dmesg
that suggests trying pci=routeirq also asks to report the results if it helps.
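
When comparing runs with and without the parameter, it helps to confirm that the running kernel was actually booted with it. A minimal sketch, assuming a Linux system that exposes the boot command line in /proc/cmdline; has_boot_param is a hypothetical helper name and is not part of the original report:

```shell
#!/bin/sh
# has_boot_param is a hypothetical helper: it succeeds if the given parameter
# appears as a whole whitespace-separated token in a kernel command line file
# (/proc/cmdline by default).
has_boot_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    [ -r "$cmdline_file" ] || return 2
    # Put each token on its own line, then look for an exact full-line match.
    tr ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_boot_param pci=routeirq; then
    echo "pci=routeirq is active"
else
    echo "pci=routeirq is not set"
fi
```

Matching whole tokens avoids false positives from longer parameters that merely contain the same text.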

Hardware: SuperMicro 7043M-6 and other similar boxes with Adaptec 2000S and
similar hardware RAID cards.

If any log files or additional info might be helpful, let me know...

Comment 1 Need Real Name 2006-05-01 16:52:48 UTC

Without "pci=routeirq" the "dd" test above would complete only about 5
iterations before locking up. With that boot string it completed 88 before
locking up.

Comment 2 Dave 2006-05-08 21:03:40 UTC

I think this is probably a duplicate of my bug...

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570

Please let me know if you can mirror my testing...

Comment 3 Need Real Name 2006-05-08 21:10:38 UTC

Yes, appears to be a dupe. I searched and searched and of course found nothing
even close to this before filing the report, ah well...

Comment 4 Dave Russell 2006-09-03 17:40:46 UTC


*** This bug has been marked as a duplicate of 189570 ***