Description of problem:
__get_request_wait() in drivers/block/ll_rw_blk.c uses add_wait_queue(). This code comes from the original kernel source (2.4.21), but other kernel versions use add_wait_queue_exclusive(). I think this can cause a problem: if many processes are waiting in the wait queue, all of them are woken up at once, and a process that has already been sleeping for a long time may fail to get a request and go back to sleep.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL
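The wake-all behaviour described above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: `wake_stats` and `model_wakeup` are hypothetical names made up for the illustration, and the model assumes a single wake-up call with a fixed number of free requests.

```c
#include <stddef.h>

/* Toy userspace model of one wake-up on a wait queue. Not kernel code. */
struct wake_stats {
    int woken;      /* sleepers woken by the single wake-up call */
    int satisfied;  /* woken sleepers that obtained a free request */
    int resleep;    /* woken sleepers that found nothing and slept again */
};

/*
 * exclusive == 0 models waiters queued with add_wait_queue():
 *   every sleeper is woken.
 * exclusive != 0 models waiters queued with add_wait_queue_exclusive():
 *   only one sleeper is woken.
 */
struct wake_stats model_wakeup(int waiters, int free_requests, int exclusive)
{
    struct wake_stats s = {0, 0, 0};

    if (waiters <= 0)
        return s;
    if (free_requests < 0)
        free_requests = 0;

    s.woken = exclusive ? 1 : waiters;
    s.satisfied = s.woken < free_requests ? s.woken : free_requests;
    s.resleep = s.woken - s.satisfied;
    return s;
}
```

With 50 waiters and a single free request, the non-exclusive case wakes all 50 and sends 49 back to sleep, while the exclusive case wakes only one.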
I've been having a big performance problem with RHEL AS 3 on the 2.4.21-20 kernel. The situation is frustrating. I'm running OpenWebMail using SpeedyCGI. SpeedyCGI speeds things up and is supposed to be a good thing. After the kernel upgrade, I see the following in /var/log/messages:

Dec 12 04:08:38 mail kernel: application bug: speedy_backend(20437) has SIGCHLD set to SIG_IGN but calls wait().
Dec 12 04:08:38 mail kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.

The iowait reported by top is always very high. The machine gets bogged down and has to be restarted every few days, when the load reaches the 20s or higher and just won't come down. Even when it's doing almost nothing, it has a load average of 0.76. I am NOT running on underpowered hardware. I've been told that I need to do some VM tweaking; my attempts have helped a bit here and there but have not solved the problem. Are my performance issues related to my use of perl / openwebmail / speedycgi interacting with this issue? Please help.
wsanders
This should probably go to Larry and Tom, not to me. Reassigning to Larry.
I don't think add_wait_queue_exclusive is the right thing to do in __get_request_wait() for RHEL3. It would wake up only one process when blkdev_release_request is called, and since RHEL3 does batch processing of requests, we would leave several processes sleeping even though an entire batch of requests is free! I think this could leave one or more processes hung permanently in __get_request_wait(). Has anyone tried simply replacing add_wait_queue with add_wait_queue_exclusive in __get_request_wait() and letting the system run under load for a long time?
Larry Woodman
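The concern about batching can be sketched with a second toy userspace model. It is not the RHEL3 code: `batch_result` and `model_batch_release` are hypothetical names, and the model assumes that a whole batch of requests becomes free and that exactly one wake-up call is issued for it, with no later wake-ups.

```c
#include <stddef.h>

/* Toy userspace model of one batched release. Not kernel code. */
struct batch_result {
    int served;     /* sleepers that obtained a request */
    int free_left;  /* free requests nobody picked up */
    int stranded;   /* sleepers still asleep although requests are free */
};

/*
 * `batch` requests become free and a single wake-up is issued.
 * exclusive != 0: only one sleeper is woken (add_wait_queue_exclusive()).
 * exclusive == 0: all sleepers are woken (add_wait_queue()).
 */
struct batch_result model_batch_release(int waiters, int batch, int exclusive)
{
    struct batch_result r = {0, 0, 0};
    int woken = 0;
    int still_asleep;

    if (batch < 0)
        batch = 0;
    if (waiters > 0)
        woken = exclusive ? 1 : waiters;

    r.served = woken < batch ? woken : batch;
    r.free_left = batch - r.served;

    /* Sleepers that were never woken cannot see the remaining free requests. */
    still_asleep = (waiters > 0 ? waiters : 0) - woken;
    r.stranded = still_asleep < r.free_left ? still_asleep : r.free_left;
    return r;
}
```

With 8 waiters and a batch of 4, the exclusive case serves one sleeper and leaves 3 free requests that 3 sleepers could have used; the non-exclusive case serves 4 and strands none. The model ignores any later wake-up, so it shows only the worst case being described, not how often it would occur on a real system.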
In response to off-topic comment #2: Scott, your posting here has nothing to do with this bug report. The warning message you reported will be addressed by changing it to a debug message, in response to bug 140552.
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.