Bug 69552
Summary: | lots of disk writes lock out interactive processes |
---|---
Product: | [Retired] Red Hat Linux
Component: | kernel
Status: | CLOSED ERRATA
Severity: | high
Priority: | medium
Version: | 7.3
Target Milestone: | ---
Target Release: | ---
Hardware: | i686
OS: | Linux
Reporter: | Tom Karzes <karzes>
Assignee: | Arjan van de Ven <arjanv>
QA Contact: | Brian Brock <bbrock>
CC: | gbailey
Doc Type: | Bug Fix
Last Closed: | 2003-06-07 19:13:54 UTC
Description
Tom Karzes
2002-07-23 01:59:43 UTC
Created attachment 66424 [details]
example program dw.c
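(The source of attachment 66424 is not reproduced in this copy of the report. Based on the description in the comment below, a minimal sketch of such a test program might look roughly like the following; the exact names and error handling are illustrative, not necessarily what the actual dw.c does.)

    /* dw.c -- sketch of the test program described below.
     * Usage: dw <block_size> <count>
     * Writes <count> blocks of <block_size> 0xff bytes to standard output
     * using raw, unbuffered write(2) calls. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        long block_size, count, i;
        char *buf;

        if (argc != 3) {
            fprintf(stderr, "usage: %s block_size count\n", argv[0]);
            return 1;
        }
        block_size = atol(argv[1]);
        count = atol(argv[2]);

        buf = malloc(block_size);
        if (buf == NULL) {
            perror("malloc");
            return 1;
        }
        memset(buf, 0xff, block_size);   /* fill the block with 0xff bytes */

        /* direct write(2) calls to fd 1 (standard output), no stdio buffering */
        for (i = 0; i < count; i++) {
            if (write(1, buf, block_size) != block_size) {
                perror("write");
                return 1;
            }
        }
        return 0;
    }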
I've attached the source of the program I mentioned, which illustrates this problem very strongly. It takes two arguments, a block size and a count. It creates a buffer of the specified block size, fills it with 0xff bytes, and writes it to standard output "count" times, using direct, unbuffered "write" system calls. The total output size is block_size*count. To compile, simply:

    % gcc dw.c -o dw

Then try running it with a buffer size of 8192 and a count of 40000 (for a total of 327680000 bytes). Run it in the background, at boot level 3. When I do this, it completely locks up my machine:

    % dw 8192 40000 > dw.out &

This should make it much easier to track down this problem.

question 1: if this is IDE, is DMA enabled?
question 2: does changing the "elvtune" parameters to lower values (see man elvtune for syntax) improve things?

In answer to your questions:

1. I have two 80GB IDE drives. I had assumed that DMA was enabled, but now it appears that it is not. I tried executing the "hdparm" commands I was instructed to use, and there was a problem enabling DMA. I will attach the log and additional comments in the next entry.

2. I have not used "elvtune" before. I'm willing to give it a try, but I have no idea what values to use. If I just run it with no arguments, I get the following:

    /dev/hda elevator ID        0
        read_latency:           8192
        write_latency:          16384
        max_bomb_segments:      6

The results for /dev/hdb are identical. What values should I try? Is this safe, or is there a risk of losing my disk if I use bad values here?

Created attachment 66486 [details]
here's a log of my "hdparm" attempts
Here's a copy of the update I made to ticket #209989, regarding my attempts to execute the "hdparm" commands I was given (I attached the log in the previous update):

Ok, I tried the "hdparm" commands that were outlined. Please note that the syntax given in the ticket entry was incorrect: all of the arguments prior to the device name are options and must be preceded by "-". So instead of:

    hdparm -u1 d1 c1 X66 /dev/hda

one must type:

    hdparm -u1 -d1 -c1 -X66 /dev/hda

Anyway, once I figured this out, I tried it. Unfortunately, I had some problems. It is refusing to enable DMA (I had assumed that DMA had been enabled right from the start, so this was a bit disturbing). When it tried to enable it, it got an "HDIO_SET_DMA failed: Operation not permitted" error. This doesn't sound right, does it??? This is a very new motherboard and very new hard disks -- how can DMA not be supported?

When it changed the speed to -X66, it appears to have worked, although at the time it was changed I got some "ide0: unexpected interrupt, status=0x58, count=1" errors -- are they expected when the speed is changed? Second, when I tried to increase the speed to -X68, I got a new error: "ide0: Speed warnings UDMA 3/4/5 is not functional." Should it be? The motherboard is an Intel Desktop D845EBG2L, which purports to support UltraDMA 100/66/33, and the disks are ATA133 Maxtor drives.

Anyway, I'm attaching a log of the hdparm attempts (with the errors that were logged to the console inserted at the points where they occurred). I also tried my lock-up example after these changes, and the problem still exists. After this I rebooted my machine, without making any changes to /etc/rc.d/rc.local. I figured I'd rather find out what's going on first, and in the meantime it seemed safest to run at the original settings.

I've seen this on MANY of our Compaq 1850R's running RedHat 7.1, 7.2, and 7.3. All of them have hardware RAID. I noticed Bugzilla #33309, but didn't see what the resolution was, as this problem still seems to be occurring, even with the latest kernel (2.4.18-5).

It turns out the problem is that the RedHat 7.3 kernel doesn't support the Intel Desktop D845EBG2L motherboard, due to a BIOS problem which prevents Linux from enabling DMA. This results in a *MASSIVE* slowdown in disk performance, and in addition heavy disk traffic causes other processes to lock up. Until RedHat comes out with an official, complete patch for Intel's BIOS bug (I'm not holding my breath for Intel to release a BIOS that fixes this; the Intel guy I talked to told me flat out that Linux is not supported), I strongly advise people to avoid Intel motherboards. In the meantime, I've built a patched kernel that bypasses some resource checks and gets me to mode 2 (but not to mode 5, which is what it should be using). I would *really* like a proper patch for this bug which eliminates the need for this workaround.

We have a patch in testing in the rawhide kernel for this, btw. And re elvtune: elvtune cannot cause data loss, it just changes the parameters the kernel uses to reorder requests. Low values mean "almost no sorting" while high values mean aggressive sorting. Sorting is sort of nice for throughput (disks have shorter head moves -> shorter seek times) but can give some starvation, hence it being limited. I've played with values of 64/256, and they seem to improve response a bit, but do cost throughput noticeably.
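(A sketch of how those 64/256 values might be applied on this system, assuming elvtune's -r and -w options set read_latency and write_latency respectively, and that 64 is meant as the read value and 256 as the write value, mirroring the order of the defaults shown earlier; the thread does not spell this out. Such settings do not persist across a reboot, so they would need to be reapplied, e.g. from /etc/rc.d/rc.local.)

    elvtune -r 64 -w 256 /dev/hda
    elvtune -r 64 -w 256 /dev/hdb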
I've tried some of the 2.4.19 kernels from www.kernel.org. I first tried 2.4.19-rc3, which does *not* fix this problem. I then tried 2.4.19-rc3-ac3, which *does* fix the problem, so for now I'm using this kernel. Hopefully Alan's fixes for this will make it into 2.4.19-rc4 and eventually into an official RedHat kernel, but for now this is the only kernel I can use without experiencing severe performance penalties. So far I haven't had any problems with it. I also noticed that 2.4.19-rc3-ac4 is now available, although I haven't tried it yet, and probably won't unless I experience any problems.

As I said, the rawhide kernel (and the Limbo beta kernel) have the needed patch. If you prefer RPM kernels you could choose to use one of those.

I didn't realize the rawhide kernel was currently available from RedHat. I wouldn't mind trying it, but I find the RPM download interface somewhat cumbersome to use (and at the moment, the search mechanism appears to be somewhat broken). What I really want is a list of available kernels, but the search mechanism doesn't appear to be working properly. Here's what I did:

o Open www.redhat.com
o Click on "download"
o Under "Find latest RPM's"
o Under "By Category"
o Select "-Kernel" under "Base" and click "Search" (No hits)
o Select "Base" directly and click "Search" (lots of stuff comes up, most of it non-kernel)
o Under "By Keyword"
o Enter "kernel" and click "Search"

This last one seems to work. A bunch of kernel releases pop up. However, when I try to access the second page, I get "Proxy Error" (from the RedHat site):

    Proxy Error
    We're sorry! We are currently experiencing technical difficulties. Please try again later.

Presumably this is a temporary problem that will be fixed soon. In any case, none of the visible pages mentioned "rawhide", nor did a search on "rawhide" yield anything. Is there a simpler, more reliable way to get a list of the available kernel RPMs?

It was suggested that I try the "rawhide" or "Limbo" kernels from the RPM database, but the RPM search/download mechanism appears to be broken, at least for kernel searches. I've therefore created a new ticket for this problem, ticket 210823, which contains the details of the RPM search/download problems.

Current erratum has elevator tuning patches to fix this.