Bug 140123
Summary: | CPU scheduling issue | |
---|---|---|---
Product: | Red Hat Enterprise Linux 3 | Reporter: | Wendy Cheng <nobody+wcheng>
Component: | kernel | Assignee: | Ingo Molnar <mingo>
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock>
Severity: | high | Priority: | medium
Version: | 3.0 | CC: | dowdle, k.georgiou, peterm, petrides, riel, sct, tao, tburke
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2005-09-15 20:41:38 UTC
Description
Wendy Cheng
2004-11-19 21:18:24 UTC
Created attachment 107097 [details]
test program -1
The executable location (currently set to /usr/src/wendy/) needs to be
modified accordingly (see hang.c for details).
The test program (comment #1) is based on http://www.hpl.hp.com/research/linux/kernel/o1-starve.php by David Mosberger of HP Palo Alto.

This is the second place I've tacked on my problem, asking whether it appears to be related to the bug described. Please advise. speedycgi is a parent process that talks to its children over a long period of time, so does it fit this?

- - - - -

I've been having a big performance problem with RHEL AS 3 and the 2.4.21-20 kernel. The situation is frustrating. I'm running OpenWebMail using SpeedyCGI. SpeedyCGI speeds things up and is supposed to be a good thing. After the kernel upgrade, I see the following in /var/log/messages:

Dec 12 04:08:38 mail kernel: application bug: speedy_backend(20437) has SIGCHLD set to SIG_IGN but calls wait().
Dec 12 04:08:38 mail kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.

The iowait reported by top is always very high. The machine gets bogged down and has to be restarted every few days, when the load climbs into the 20s or higher and just won't come down. Even when it's doing almost nothing it has a load average of 0.76. I am NOT running on underpowered hardware. I've been told that I need to do some vm tweaking, but all of my attempts have helped only a bit here and there and have not solved the problem.

Are my performance issues related to the use of perl / openwebmail / speedycgi interacting with this issue? Please help.

For comment #8, the warning messages you had in the /var/log/messages file are *not* related to this issue. What did the "kernel upgrade" mean? From AS2.1 to RHEL 3? RHEL 3 does have some vm issues that would cause similar symptoms. I would suggest either trying out the newest U4 beta kernel (.27EL will be out the door very soon) or reporting this issue to Red Hat support.

The machine in question is running RHEL AS 3 U4. It was installed as RHEL AS 3 U3, if I remember correctly. By kernel upgrade, I meant from whatever kernel came before.
Perhaps the problem has existed for some time. I only noticed it sometime after using the latest kernel.

Regarding RH Support: I'm at a college. I bought 3 copies of RHEL ES 2.1, and before I got them installed (about a month later), Red Hat came out with academic pricing. As I understood it at the time, there wasn't an upgrade path from ES to AS, so when it came time to install, I just bought AS 3 at academic pricing and let the RHEL ES rot... although they did get registered, just not really used. Anyway, my point here is that I filed a support request with Red Hat support but never got a response back. I'm assuming it was because I reported it against AS Academic Edition, which doesn't come with email / phone support. I thought about filing the bug against the unused ES 2.1's, but the problem didn't apply to those.

Since I'm not sure where to get beta kernels, I'll wait until the .27EL is released.

Hunting down performance reports that looked related to mine, I found them going back to March and April (on Dell's Linux support forums, for example), so I was starting to think that Red Hat was ignoring these problems or just not being vocal about trouble fixing the issue.

I certainly appreciate all that Red Hat does for the community with all of the development. Thank you for replying to me even if this is in an off-topic way. :)

Ok, update. I had a brainfart yesterday with version numbers. So, I did a clean install of RHEL AS 3 U2 (Academic Edition) this summer. Ran that for a while and didn't notice the load/performance problem. It may have existed but not as bad. Then updated to U3. Then the problem got really bad. Is that vague enough for you? :)

I did pull my head out and went to the RHEL AS 3 beta channel and downloaded the kernel-smp-2.4.21-27.EL beta kernel. I've been running it for about 12 hours now and have not seen much of a change.
The system seems more responsive at the command prompt (when I'm ssh'ed in), but the load is still way too high for what the machine is doing. The RAM caching and swap seem to have come to reasonable levels. Perhaps the load accounting is somewhat off? Anyway, I haven't been running it long enough to make a definitive call... but thus far, it does NOT appear that this kernel fixes my problem.

So, I have contacted Red Hat Support, but I went through that in a previous submission. Doing my best to research the problem on my own, I come across references to people having similar problems with the RHEL kernel dating back to March and April of this year. In most cases it appears they were told to contact Red Hat Support and tweak the vm system. That appears to have worked for many people. Of course they would have to do various system dumps, and it appears their vm settings were customized to their load. So far, I haven't found the right balance.

This leads me to some questions about Red Hat. I do not mean to question the integrity of the employees... but my perception from this whole issue is that:

1) Perhaps in adding support for enterprise class hardware, the complexity of making the scheduling system and the vm system work on all loads has increased to a degree that Red Hat is having a problem making it work for most people most of the time without tweaking

or

2) Red Hat has, one way or another, shipped a somewhat broken kernel in an effort to make their support system needed by a significant number of customers

I don't have enough data to pick between the two, and chances are it is some combination of both. I will continue to work on this problem until it is resolved, but given the Academic Edition status, I'm not a legitimate support customer and have to suffer through it.

So, does the new beta kernel fix the problem? Not for me, or at least that's how it appears now. Can Red Hat support help me? No, I don't qualify. Will the next kernel update fix the problem?
Not sure, hope so. Have I given you enough information about my hardware, software and load? No. But if you ask for specific things, I'll provide them. I guess I could see what everyone else was asked and just provide the information, but I'm not sure anyone wants to hear it... because they have seen this problem over and over already, and tuning my specific system would actually be end user support. I also run, as mentioned previously, some third-party software (MailScanner, clamav, OpenWebMail w/speedycgi), and I'm sure those don't play well in the support mix... but I can tell you there are a lot of people out there using those because they work well.

Thank you for trying.

Let's avoid filling this bugzilla with unrelated issues. I'll follow up on the above two comments via email.

Created attachment 108619 [details]
scheduler patch
This patch was used to build the test kernel run by the above 4 customers.
Included for reference purposes.
Closing as recommended by last comment.