Bug 167166
Description
David Kostal
2005-08-31 09:10:17 UTC
Please attach console panic or oops output (capturing it with a serial line if necessary). Thanks in advance.
Also please post the I/O error messages you got while running the default megaraid2 driver, as shipped by RH. It would be most helpful if you could use your most recent RH kernel (2.4.21-35.EL?). Thanks.
*** Bug 167167 has been marked as a duplicate of this bug. ***
Created attachment 118334 [details]
kernel-2.4.21-35.EL+megaraid2-v2.10.10.1-1dkms crash
(partial) oops trace and messages from 2.4.21-35.EL+megaraid2-v2.10.10.1-1dkms
I have now been running kernel-2.4.21-32.0.1.EL+megaraid2-v2.10.10.1-1dkms for 23 hours without a crash. I'll boot with the RH megaraid2 driver and try to reproduce the crash.
Created attachment 118335 [details]
kernel-2.4.21-32.0.1.EL + RH megaraid v2.10.8.2-RH1 I/O error
Here is an I/O error when running kernel-2.4.21-32.0.1.EL + default RH
megaraid2 (v2.10.8.2-RH1). I have only one I/O error here, because at that time
I didn't have remote syslog set up yet. Also all errors with default megaraid2
driver - as far as I have noticed - were only on hdb/scsi1 - internal perc
4i/DC.
I wasn't able to reproduce the crash or the I/O error, even after approx. 23 hours of testing with kernel-2.4.21-35.EL + RH megaraid driver. I am continuing to test (now after a fresh reboot again).
However, what I experience every time is a slowdown of disk operations. After a reboot, both sda (on perc 4e/DC) and sdb (on perc 4i/DC) are very fast: when running bonnie++, sda can do approx. 50-110k blocks per sec and sdb approx. 40-50k bps (iostat 1). But if I run it first on one disk and then on the second one, OR on both at the same time, the performance on both goes down to max 3-4k bps, and sometimes they do nothing at all for a very long time. Although doing 'dd if=/dev/sda1 | cat - > /dev/null' (or sdb...) helps a bit for a short time (e.g. from 100 bps to 5k bps), performance drops to nearly nothing again shortly afterwards. If one disk starts to be slow, the other one becomes slow too.
When running bonnie++ on both disks simultaneously (after a reboot), the performance of both disks is not at the highest level either, but it is "acceptable" :-/ at approx. 30-40k bps for both disks (counted together). After approx. 30 minutes it goes down to max 3-4k bps anyway. For example, overnight I had 2 bonnie++ runs going, and both were still running (one was already reading...) after 16 hours. There is no background task running on the disks. Both raid controllers
It might or might not be related to the crashing issue, but this slowdown also makes the machine unusable :(
The other problem I've seen is that interactive responsiveness is really horrible with these disk benchmarks running. The machine is not swapping, but the shells can take up to 10 minutes to become responsive. However (tested only once), it became much better when I executed
    echo 1 > /proc/sys/vm/skip_mapped_pages
    echo 1 10 15 > /proc/sys/vm/pagecache
    echo "30 500 0 0 500 3000 80 50 0" > /proc/sys/vm/bdflush
as was advised in some other bugs here.
Hi again.
Unfortunately I wasn't able to reproduce the crash in the last week (before I reported this, it crashed with all kernels I tried but one: 2.4.21-32.0.1.EL + megaraid2 from Dell). The only difference is that I now run without hyperthreading, but in the past I had one crash without HT as well.
What remains are the performance problems. If I boot with mem=3G there is no performance degradation at all. If I boot with mem=6G, the speed on sda goes down to 5-15k bps (from 50-110k) after a short time. However, this is with the controller settings CachedIO+WRTHRU. The machine also spends 100% in IOwait. If I switch to CachedIO+WRBACK, it has no performance problems at all with either 3G or 6G. I'll test it with 12G later. The machine also takes approx. 30-70 in System+irq-iowait time.
Created attachment 118690 [details]
kernel 2.4.21-35.EL + RH megaraid driver
This is a console dump from a crash of 2.4.21-35.EL with the original RH megaraid
driver. The crash happened just a few hours after boot (2 bonnie++ running) and
the only difference from the previous attempts was that now both logical
drives (hda and hdb) were set up in the RAID BIOS to WRBACK + CachedIO.
Created attachment 118791 [details]
Kernel-2.4.21-32.0.1.EL with dell's megaraid2-v2.10.10.1-1dkms crash dump
Kernel crash dump for kernel-2.4.21-32.0.1.EL with Dell's
megaraid2-v2.10.10.1-1dkms
Created attachment 118842 [details]
2.4.21-32.0.1.EL error with Patrol Read disabled
I turned off Patrol Read (as recommended by Dell support) and the original RH
kernel 2.4.21-32.0.1.EL froze (I wasn't able to type anything on the console,
the serial line, or ssh) within 10 minutes and generated a lot of error messages
(see attachment). Shortly before it froze completely, I was able to observe
that:
* bonnie++ on sdb was running without any problem
* bonnie++ on sda was running but frozen
* there were 4 pending commands on scsi0 (sda) (/proc/megaraid/0/stat)
* the first few error messages appeared on the console
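For anyone trying to capture the same state before a freeze, the manual check of /proc/megaraid/0/stat above could be automated roughly as follows. This is only a minimal sketch: the function name snapshot_stat and its four arguments are made up for illustration, and nothing is assumed about the format of the stat file, which is simply copied verbatim with a timestamp. The log should live somewhere that does not depend on the hanging controller.

```shell
#!/bin/sh
# Hypothetical helper (not part of any driver tooling): append timestamped
# copies of a status file to a log so the last state before a hang is kept.
snapshot_stat() {
    stat_file=$1   # file to sample, e.g. /proc/megaraid/0/stat
    log_file=$2    # log to append to (ideally not on the affected disks)
    interval=$3    # seconds to wait between samples
    count=$4       # total number of samples to take

    i=0
    while [ "$i" -lt "$count" ]; do
        {
            # Header line with the sample time and the sampled path
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') $stat_file ==="
            # Copy the file as-is; note the failure if it cannot be read
            cat "$stat_file" 2>/dev/null || echo "(could not read $stat_file)"
        } >> "$log_file"
        i=$((i + 1))
        # Sleep only between samples, not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Called as 'snapshot_stat /proc/megaraid/0/stat LOGFILE 5 720', with LOGFILE standing for whatever log path is chosen, it would take one sample every 5 seconds for about an hour. The interval and count are arbitrary example values.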
I'm now running 2.4.21-32.0.1.EL with Dell's megaraid2...
The problem is solved by the new BIOS A02 for the PE2850. From the release notes:
* Added workaround for lockup resulting from the systems with 8GB RAM or more and RAID storage controller potentially claiming inappropriate addresses.
Thanks for the help, you can close this bug now.
David
Closing as NOTABUG based on last comment.