Description of problem:
There appears to be a bug in the I2O drivers affecting all release kernels in FC5 and the updated FC4 kernels, at least those later than 2.6.14.

Version-Release number of selected component (if applicable):
For FC5, the kernel on the install CD is affected. For FC4, at least 2.6.16-1.2096 is. 2.6.14-1.1656 appears stable; I have a heavily used box running it with I2O that has currently been up 96 days.

How reproducible:
Just cause lots of disk I/O. A quick way to crash the box is this:

    for i in `seq 1 90`; do echo $i; dd if=/dev/zero of=/data/junk.$i count=250000 bs=4096 1>/dev/null 2>&1; done

Steps to Reproduce:
1. Lots of I/O, see above

Actual results:
From an ssh connection I see this:

    [root@localhost ~]# !for
    for i in `seq 1 90`; do echo $i; dd if=/dev/zero of=/data/junk.$i count=250000 bs=4096 1>/dev/null 2>&1; done
    1
    2
    3
    4
    5

    Message from syslogd@localhost at Mon May 1 08:45:22 2006 ...
    localhost kernel: ------------[ cut here ]------------

On the console I get a few screens' worth of messages followed by a "continuing in 120" countdown, which counts down to 0 but never does anything after that.

Expected results:

Additional info:
Looking through the dmesg output, I see messages about PCI and a suggestion to use "pci=routeirq" as a boot-time kernel parameter. I'm testing that now, and so far the looping dd has made it further than usual on the 2.6.16-1.2096SMP kernel. 2.6.11 and 2.6.14 appear to be able to complete it indefinitely, or at least as long as disk space holds out.

I'm not sure if this is a "bug" or if the "routeirq" thing is common. I only report it because a few boxes I had with FC4 and I2O drivers started crashing like crazy after a kernel upgrade (via yum). I also tried an in-place upgrade to FC5, which was disastrous since the stock kernel that comes with it has the problem. I've not yet tried that with pci=routeirq on the CD boot line. The line in dmesg that suggests trying pci=routeirq also asks to report the results if it's beneficial.

Hardware: SuperMicro 7043M-6 and other similar boxes with Adaptec 2000S and similar hardware RAID cards. If any log files or additional info might be helpful, let me know...
Without "pci=routeirq" the "dd" test above would complete only about 5 iterations before locking up. With that boot string it completed 88 before locking up.
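For anyone trying to mirror this, the iteration counts above are easier to compare across kernels if the loop records its progress as it goes. A minimal sketch follows; the function name io_stress and the progress.log file are made up for illustration and are not part of the original test. It assumes the log can still be read after a reboot, which may not hold if the log sits on the same I2O volume that locks up, so pointing the directory argument at a different disk would mean the junk files no longer exercise the I2O device.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report):
#   io_stress DIR ITERATIONS BLOCKS
# Writes ITERATIONS files of BLOCKS x 4096 bytes into DIR, as the dd loop
# in the report does, and appends each completed iteration number to
# DIR/progress.log.
io_stress() {
    dir=$1
    iterations=$2
    blocks=$3
    log="$dir/progress.log"

    : > "$log"    # start with an empty log
    i=1
    while [ "$i" -le "$iterations" ]; do
        dd if=/dev/zero of="$dir/junk.$i" count="$blocks" bs=4096 \
            >/dev/null 2>&1 || return 1
        echo "$i" >> "$log"
        sync      # ask the kernel to flush, so the log has a chance of surviving a lockup
        i=$((i + 1))
    done
}

# Invocation matching the numbers in the report (commented out on purpose,
# since it writes roughly 90 GB):
# io_stress /data 90 250000
```

After a lockup and reboot, the last line of progress.log gives the last iteration that finished, assuming the sync completed before the hang.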
I think this is probably a duplicate of my bug... https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570 Please let me know if you can mirror my testing...
Yes, appears to be a dupe. I searched and searched and of course found nothing even close to this before filing the report, ah well...
*** This bug has been marked as a duplicate of 189570 ***