Bug 69552
| Summary: | lots of disk writes lock out interactive processes | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Tom Karzes <karzes> |
| Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Priority: | medium |
| Version: | 7.3 | CC: | gbailey |
| Hardware: | i686 | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2003-06-07 19:13:54 UTC |
| Attachments: | 66424 (example program dw.c), 66486 (hdparm log) | | |
Description
Tom Karzes
2002-07-23 01:59:43 UTC
Created attachment 66424 [details]
example program dw.c
I've attached the source of the program I mentioned
which illustrates this problem very strongly. It
takes two arguments, a block size and a count.
It then creates a buffer with the specified
block size, fills it with 0xff bytes, and
writes it to standard output "count" times,
using direct, unbuffered "write" system calls.
The total output size is block_size*count.
To compile, simply:
% gcc dw.c -o dw
Then try running it with a buffer size of 8192 and
a count of 40000 (for a total of 327680000 bytes).
Run it in the background, at run level 3. When I do
this, it completely locks up my machine:
% dw 8192 40000 > dw.out &
This should make it much easier to track down this
problem.
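
For context, here is a minimal sketch of what the attached dw.c presumably does, reconstructed from the description above (the actual attachment 66424 may differ in details):

/*
 * dw.c -- hypothetical reconstruction of the attached test case.
 * Usage: dw block_size count > outfile
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    long block_size, count, i;
    char *buf;

    if (argc != 3) {
        fprintf(stderr, "usage: %s block_size count\n", argv[0]);
        return 1;
    }
    block_size = atol(argv[1]);
    count = atol(argv[2]);

    buf = malloc(block_size);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, 0xff, block_size);   /* fill the buffer with 0xff bytes */

    for (i = 0; i < count; i++) {
        /* direct write(2) calls to stdout, no stdio buffering */
        if (write(1, buf, block_size) != block_size) {
            perror("write");
            return 1;
        }
    }
    free(buf);
    return 0;   /* total output: block_size * count bytes */
}

Because it calls write() directly with a fixed-size buffer, each iteration is one system call, and the kernel's write-back behavior alone decides when the data actually reaches the disk.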
question 1: if this is IDE, is DMA enabled ?
question 2: does changing the "elvtune" parameters to lower values (see man elvtune for syntax) improve things ?

In answer to your questions:
1. I have two 80GB IDE drives. I had assumed
that DMA was enabled, but now it appears that
it is not. I tried executing the "hdparm"
commands I was instructed to use, and there
was a problem enabling DMA. I will attach
the log and additional comments in the next entry.
2. I have not used "elvtune" before. I'm willing
to give it a try, but I have no idea what values
to use. If I just run it with no arguments, I
get the following:
/dev/hda elevator ID 0
read_latency: 8192
write_latency: 16384
max_bomb_segments: 6
The results for /dev/hdb are identical. What
values should I try? Is this safe, or is there
a risk of losing my disk if I use bad values here?
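
For reference, the elvtune syntax for lowering these values on one device looks like this (the 64/256 read/write latencies are the values suggested later in this report, not a general recommendation, and as far as I know the settings do not persist across reboots):

% elvtune -r 64 -w 256 /dev/hda
% elvtune /dev/hda

Running elvtune afterwards with just the device name re-prints the current settings, so the change can be verified without rebooting.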
Created attachment 66486 [details]
here's a log of my "hdparm" attempts
Here's a copy of the update I made to ticket #209989,
regarding my attempts to execute the "hdparm" commands
I was given (I attached the log in the previous update):
Ok, I tried the "hdparm" commands that were outlined. Please
note that the syntax given in the ticket entry was incorrect:
all of the arguments prior to the device name are options and
must be preceded by "-". So instead of:
hdparm -u1 d1 c1 X66 /dev/hda
one must type:
hdparm -u1 -d1 -c1 -X66 /dev/hda
Anyway, once I figured this out, I tried it out. Unfortunately,
I had some problems. It is refusing to enable DMA (I had assumed
that DMA had been enabled right from the start, so this was a bit
disturbing). Anyway when it tried to enable it, it got an
"HDIO_SET_DMA failed: Operation not permitted" error. This doesn't
sound right, does it??? This is a very new motherboard with very
new hard disks -- how can DMA not be supported? When it changed
the speed to -X66, it appears to have worked, although at the time
it was changed I got some "ide0: unexpected interrupt, status=0x58,
count=1" errors -- are they expected when the speed is changed?
Second, when I tried to increase the speed to -X68, I got a new
error: "ide0: Speed warnings UDMA 3/4/5 is not functional." Should
it be? The motherboard is an Intel Desktop D845EBG2L, and purports
to support UltraDMA 100/66/33, and the disks are ATA133 Maxtor drives.
Anyway, I'm attaching a log of the hdparm attempts (with the errors
that were logged to the console inserted at the points where they
occurred). I also tried my lock-up example after these changes, and
the problem still exists.
After this I rebooted my machine, without making any changes to
/etc/rc.d/rc.local. I figured I'd rather find out what's going
on first, and in the meantime it seemed safest to run at the
original settings.
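
As a side note (general hdparm usage, not taken from the attached log), the current DMA state and the modes the drive itself advertises can be checked read-only before changing anything:

% hdparm -d /dev/hda
% hdparm -i /dev/hda

With no value after it, -d only reports the current using_dma setting, and -i prints the drive's identification data, including the DMA/UDMA modes it claims to support.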
I've seen this on MANY of our Compaq 1850R's running RedHat 7.1, 7.2, and 7.3. All of them have hardware RAID. I noticed Bugzilla #33309, but didn't see what the resolution was, as this problem still seems to be occurring, even with the latest kernel (2.4.18-5).

It turns out the problem is that the RedHat 7.3 kernel doesn't support the Intel Desktop D845EBG2L motherboard, due to a BIOS problem which prevents Linux from enabling DMA. This results in a *MASSIVE* slowdown in disk performance, and in addition heavy disk traffic causes other processes to lock up. Until RedHat comes out with an official, complete patch for Intel's BIOS bug (I'm not holding my breath for Intel to release a BIOS that fixes this; the Intel guy I talked to told me flat out that Linux is not supported), I strongly advise people to avoid Intel motherboards. In the meantime, I've built a patched kernel that bypasses some resource checks and gets me to mode 2 (but not to mode 5, which is what it should be using). I would *really* like a proper patch for this bug which eliminates the need for this workaround.

We have a patch in testing in the rawhide kernel for this, btw.

Re elvtune: elvtune cannot cause data loss, it just changes the parameters the kernel uses to reorder requests. Low values mean "almost no sorting" while high values mean aggressive sorting. Sorting is sort of nice for throughput (disks have shorter head moves -> shorter seek times) but can give some starvation, hence it being limited. Values of 64/256 are ones I've played with, and they seem to improve response a bit, but they do cost throughput noticeably.

I've tried some of the 2.4.19 kernels from www.kernel.org. I first tried 2.4.19-rc3, which does *not* fix this problem. I then tried 2.4.19-rc3-ac3, which *does* fix the problem, so for now I'm using this kernel. Hopefully Alan's fixes for this will make it into 2.4.19-rc4 and eventually into an official RedHat kernel, but for now this is the only kernel I can use without experiencing severe performance penalties. So far I haven't had any problems with it. I also noticed that 2.4.19-rc3-ac4 is now available, although I haven't tried it yet, and probably won't unless I experience any problems.

As I said, the rawhide kernel (and the Limbo beta kernel) have the needed patch. If you prefer RPM kernels, you could choose to use one of those.

I didn't realize the rawhide kernel was currently available
from RedHat. I wouldn't mind trying it, but I find the RPM
download interface somewhat cumbersome to use (and at the
moment, the search mechanism appears to be somewhat broken).
What I really want is a list of available kernels, but the
search mechanism doesn't appear to be working properly.
Here's what I did:
o Open www.redhat.com
o Click on "download"
o Under "Find latest RPM's"
o Under "By Category"
o Select "-Kernel" under "Base" and click "Search"
(No hits)
o Select "Base" directly and click "Search"
Lots of stuff comes up, most of it
non-kernel.
o Under "By Keyword"
o enter "kernel" and click "Search'
This one seems to work. A bunch of kernel
releases pop up. However, when I try to
access the second page, I get "Proxy Error"
(from the RedHat site):
Proxy Error
We're sorry! We are currently experiencing
technical difficulties. Please try again later.
Presumably this is a temporary problem that will be
fixed soon. In any case, none of the visible pages
mentioned "rawhide", nor did a search on "rawhide"
yield anything.
Is there a simpler, more reliable way to get a list of the
available kernel RPMs?
It was suggested that I try the "rawhide" or "Limbo" kernels from the RPM database, but the RPM search/download mechanism appears to be broken, at least for kernel searches. I've therefore created a new ticket for this problem, ticket 210823, which contains the details of the RPM search/download problems.

The current erratum has elevator tuning patches to fix this.