From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
When doing lots of disk writes (e.g., writing a large file), keyboard and mouse events tend to be ignored for long periods of time (many seconds, or even minutes). There is an existing tech support ticket for this, ticket #209989. I was advised by the tech support staff to open a Bugzilla report for the problem.

After experimenting with the problem a fair amount, I found that the best way to reproduce it is to run a program I wrote called "dw.c", which is attached to one of the entries in the ticket mentioned above. All this program does is a bunch of back-to-back "write" system calls. If I run it with a buffer size of 8192 and a repeat count of 40000 (i.e., it calls write with a size of 8192, 40000 times in a row, for a total of 327680000 bytes of output), it absolutely locks up my machine for minutes on end. Even at runlevel 3, if I kick this off in the background and then type a simple command like "date", it is a minute or so before the word "date" even appears on my screen.

The tech support folks noticed that my disks were running slower than they should and suggested some hdparm commands to speed them up, but this doesn't really seem to address the problem, which I suspect is a kernel bug.

My hardware configuration is also given in the problem ticket mentioned above, but basically it's a 2.26GHz P4 processor, an Intel motherboard using the 845E chipset, 1GB of DDR266 ECC unbuffered memory on two 512MB modules, two 80GB Maxtor IDE hard drives, and an ABIT Siluro video card using the NVIDIA GeForce4 Ti4600 chipset (for which I had to download and install the NVIDIA video drivers).

I'm running Red Hat Professional, version 7.3, and I clicked "everything" when I did the install. I then downloaded, built, and installed the NVIDIA video driver, which worked perfectly. When I noticed the performance problem, I used the Red Hat update agent to upgrade to the latest kernel patch level (-5), and I also downloaded the latest glibc packages. I then rebooted, rebuilt, and reinstalled the NVIDIA video driver. Everything works fine except for this performance problem, which I'd really like to track down and fix.

The problem is completely reproducible and is extremely annoying -- I should be able to do large file copies etc. in the background without being aware of them (except, of course, that disk I/O will slow down), but instead my interactive response (mouse and keyboard events, and probably other things as well) gets almost completely locked out. Please contact me if I can provide any more information on this problem.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Run a program that does lots of back-to-back disk writes, preferably direct, unbuffered writes. This can be done at runlevel 3 or 5.
2. Try to do anything else interactively, e.g., type or move the mouse -- there are enormous delays in the response. These delays should not be occurring.

Actual Results:
Response slowed to a virtual standstill.

Expected Results:
Response should be minimally affected. In the case of typing or moving the mouse, no noticeable slowdown should be seen.

Additional info:
I'm listing the priority as "high" because I consider this to be a reliability problem: I am concerned that trying to do things like burn a CD-ROM will fail due to the lack of reasonable response time, plus it is delaying my ability to bring up the rest of the functionality in my system (e.g., a CD-ROM burner, a firewall box that I bought, etc.) -- much of this equipment is on a 30-day warranty, and I would really like to get a patch for this problem soon so that I can not only use my system normally but also move on to bringing up the rest of the functionality and making sure everything is OK. The technical support team has suggested some commands that will speed up my hard drives, but as mentioned, I don't believe this really addresses the problem I've been seeing. In fact, if anything, I'd expect running the hard drives faster to make the problem worse, not better.
Created attachment 66424 [details] example program dw.c
I've attached the source of the program I mentioned, which illustrates this problem very strongly. It takes two arguments, a block size and a count. It creates a buffer of the specified block size, fills it with 0xff bytes, and writes it to standard output "count" times, using direct, unbuffered "write" system calls. The total output size is block_size*count.

To compile, simply:

% gcc dw.c -o dw

Then try running it with a buffer size of 8192 and a count of 40000 (for a total of 327680000 bytes). Run it in the background, at runlevel 3. When I do this, it completely locks up my machine:

% dw 8192 40000 > dw.out &

This should make it much easier to track down this problem.
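For anyone who doesn't want to pull the attachment, here is a minimal sketch of what the program does, reconstructed from the description above (the attached source may differ slightly, e.g., in its argument checking and error handling):

/*
 * Minimal sketch of dw.c as described above -- the attached version
 * may differ in details.
 *
 * Usage: dw <block_size> <count>
 *
 * Allocates a buffer of <block_size> bytes, fills it with 0xff, and
 * writes it to standard output <count> times with back-to-back,
 * unbuffered write(2) system calls.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    size_t block_size;
    long count, i;
    char *buf;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <block_size> <count>\n", argv[0]);
        return 1;
    }

    block_size = (size_t)atol(argv[1]);
    count = atol(argv[2]);

    buf = malloc(block_size);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, 0xff, block_size);

    /* count back-to-back writes of block_size bytes each */
    for (i = 0; i < count; i++) {
        if (write(STDOUT_FILENO, buf, block_size) != (ssize_t)block_size) {
            perror("write");
            return 1;
        }
    }

    free(buf);
    return 0;
}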
question 1: if this is IDE, is DMA enabled?
question 2: does changing the "elvtune" parameters to lower values (see man elvtune for syntax) improve things?
In answer to your questions:

1. I have two 80GB IDE drives. I had assumed that DMA was enabled, but it now appears that it is not. I tried executing the "hdparm" commands I was instructed to use, and there was a problem enabling DMA. I will attach the log and additional comments in the next entry.

2. I have not used "elvtune" before. I'm willing to give it a try, but I have no idea what values to use. If I just run it with no arguments, I get the following:

/dev/hda elevator ID        0
    read_latency:           8192
    write_latency:          16384
    max_bomb_segments:      6

The results for /dev/hdb are identical. What values should I try? Is this safe, or is there a risk of losing my disk if I use bad values here?
Created attachment 66486 [details] here's a log of my "hdparm" attempts
Here's a copy of the update I made to ticket #209989, regarding my attempts to execute the "hdparm" commands I was given (I attached the log in the previous update):

OK, I tried the "hdparm" commands that were outlined. Please note that the syntax given in the ticket entry was incorrect: all of the arguments prior to the device name are options and must be preceded by "-". So instead of:

hdparm -u1 d1 c1 X66 /dev/hda

one must type:

hdparm -u1 -d1 -c1 -X66 /dev/hda

Anyway, once I figured this out, I tried it. Unfortunately, I had some problems. It is refusing to enable DMA (I had assumed that DMA had been enabled right from the start, so this was a bit disturbing). When it tried to enable DMA, it got an "HDIO_SET_DMA failed: Operation not permitted" error. This doesn't sound right, does it? This is a very new motherboard and these are very new hard disks -- how can DMA not be supported?

When it changed the speed to -X66, that appears to have worked, although at the time it was changed I got some "ide0: unexpected interrupt, status=0x58, count=1" errors -- are those expected when the speed is changed? Second, when I tried to increase the speed to -X68, I got a new error: "ide0: Speed warnings UDMA 3/4/5 is not functional." Should it be? The motherboard is an Intel Desktop D845EBG2L, which purports to support UltraDMA 100/66/33, and the disks are ATA133 Maxtor drives.

Anyway, I'm attaching a log of the hdparm attempts (with the errors that were logged to the console inserted at the points where they occurred). I also tried my lock-up example after these changes, and the problem still exists. After this I rebooted my machine without making any changes to /etc/rc.d/rc.local. I figured I'd rather find out what's going on first, and in the meantime it seemed safest to run at the original settings.
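(For reference, going by the hdparm man page, the current DMA setting can be queried without changing anything, and the drive's reported capabilities listed, with something like:

hdparm -d /dev/hda
hdparm -i /dev/hda

I'm noting these here in case they help anyone else checking whether DMA is actually enabled on their drives.)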
I've seen this on MANY of our Compaq 1850Rs running RedHat 7.1, 7.2, and 7.3. All of them have hardware RAID. I noticed Bugzilla #33309, but didn't see what the resolution was, as this problem still seems to be occurring, even with the latest kernel (2.4.18-5).
It turns out the problem is that the RedHat 7.3 kernel doesn't support the Intel Desktop D845EBG2L motherboard, due to a BIOS problem which prevents Linux from enabling DMA. This results in a *MASSIVE* slowdown in disk performance, and in addition heavy disk traffic causes other processes to lock up.

Until RedHat comes out with an official, complete patch for Intel's BIOS bug (I'm not holding my breath for Intel to release a BIOS that fixes this; the Intel guy I talked to told me flat out that Linux is not supported), I strongly advise people to avoid Intel motherboards. In the meantime, I've built a patched kernel that bypasses some resource checks and gets me to mode 2 (but not to mode 5, which is what it should be using). I would *really* like a proper patch for this bug, which would eliminate the need for this workaround entirely.
We have a patch in testing in the rawhide kernel for this btw
and re elvtune: elvtune cannot cause data loss, it just changes the parameters the kernel uses to reorder requests. Low values mean "almost no sorting" while high values mean aggressive sorting. Sorting is nice for throughput (disks have shorter head moves -> shorter seek times) but can cause some starvation, hence it being limited. I've played with values of 64/256, which seem to improve response a bit but do cost throughput noticeably.
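For example (taking 64 as the read latency and 256 as the write latency; check man elvtune for the exact option syntax):

elvtune -r 64 -w 256 /dev/hda
elvtune -r 64 -w 256 /dev/hdb

Note these settings don't survive a reboot, so if they help you'd want to reapply them from an init script such as /etc/rc.d/rc.local.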
I've tried some of the 2.4.19 kernels from www.kernel.org. I first tried 2.4.19-rc3, which does *not* fix this problem. I then tried 2.4.19-rc3-ac3, which *does* fix the problem, so for now I'm using that kernel. Hopefully Alan's fixes for this will make it into 2.4.19-rc4 and eventually into an official RedHat kernel, but for now this is the only kernel I can use without experiencing severe performance penalties. So far I haven't had any problems with it. I also noticed that 2.4.19-rc3-ac4 is now available, although I haven't tried it yet and probably won't unless I run into problems.
as I said, the rawhide kernel (and the Limbo beta kernel) have the needed patch. If you prefer RPM kernels, you could choose to use one of those
I didn't realize the rawhide kernel was currently available from RedHat. I wouldn't mind trying it, but I find the RPM download interface somewhat cumbersome to use (and at the moment, the search mechanism appears to be somewhat broken). What I really want is a list of the available kernels, but the search mechanism doesn't appear to be working properly. Here's what I did:

o Open www.redhat.com
o Click on "download"
o Under "Find latest RPM's":
  o Under "By Category":
    o Select "-Kernel" under "Base" and click "Search" (no hits)
    o Select "Base" directly and click "Search" -- lots of stuff comes up, most of it non-kernel
  o Under "By Keyword":
    o Enter "kernel" and click "Search" -- this one seems to work; a bunch of kernel releases pop up. However, when I try to access the second page, I get "Proxy Error" (from the RedHat site):

      Proxy Error
      We're sorry! We are currently experiencing technical difficulties. Please try again later.

Presumably this is a temporary problem that will be fixed soon. In any case, none of the visible pages mentioned "rawhide", nor did a search on "rawhide" yield anything. Is there a simpler, more reliable way to get a list of the available kernel RPMs?
It was suggested that I try the "rawhide" or "Limbo" kernels from the RPM database, but the RPM search/download mechanism appears to be broken, at least for kernel searches. I've therefore created a new ticket for this problem, ticket 210823, which contains the details of the RPM search/download problems.
current erratum has elevator tuning patches to fix this