Bug 146017
Summary: | high load average unresponsive server | |
---|---|---|---
Product: | Red Hat Enterprise Linux 3 | Reporter: | Jose Traver <traverj>
Component: | kernel | Assignee: | Larry Woodman <lwoodman>
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 3.0 | CC: | peterm, petrides, riel
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2005-04-07 14:06:31 UTC
Description
Jose Traver
2005-01-24 19:03:14 UTC
Created attachment 110141
SysRq log for M, W, T
Jose, unfortunately the above attachment does not show a system with a high load average. In this case both CPUs were running the idle loop and all other processes were blocked. In addition there was no memory deficit. Can you get the system in this state and get a "vmstat 1" and "top" output so I can see if they agree?
Thanks, Larry Woodman
Created attachment 111564
Capture file with "vmstat 1" and top
Hello Larry,
I've caught the server in this state again and ran both "vmstat 1" and "top".
The capture is included as an attachment.
Looking through the capture, there are a lot of crond processes, grouped in
parent-child pairs, which could be causing the reported problem. According to
"strace", the child process does nothing while the parent process is waiting on
a read, so both are idle. I tried to kill these processes, but only the
parent processes died.
Running "lsof" on one of the remaining child processes showed that it was
using the "audit" feature, so I stopped the audit service and all the
child processes from crond died. I then restarted crond, which brought the
load average back to normal.
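For anyone retracing these steps, the parent-child grouping of crond processes can also be read from /proc instead of being picked out of "top" output by eye. The following is only a minimal sketch, assuming Python 3 and a Linux /proc filesystem; it is not the procedure used in this report (that was "strace" and "lsof"), and the function names parse_stat_line, group_by_parent and read_stat_lines are made up for illustration.

```python
import os
from collections import defaultdict


def parse_stat_line(line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    # After the command name come the state letter and the parent PID.
    return pid, comm, rest[0], int(rest[1])


def group_by_parent(stat_lines, comm="crond"):
    """Map parent PID -> [(pid, state), ...] for processes named `comm`."""
    groups = defaultdict(list)
    for line in stat_lines:
        pid, name, state, ppid = parse_stat_line(line)
        if name == comm:
            groups[ppid].append((pid, state))
    return dict(groups)


def read_stat_lines(proc_root="/proc"):
    """Collect the stat line of every process currently visible in /proc."""
    lines = []
    if not os.path.isdir(proc_root):
        return lines  # not a Linux system, nothing to read
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue  # process exited between listdir() and open()
        if text.strip():
            lines.append(text)
    return lines


if __name__ == "__main__":
    for ppid, kids in sorted(group_by_parent(read_stat_lines()).items()):
        print(ppid, kids)
```

The state letter printed next to each PID (for example S for sleeping or D for uninterruptible sleep) gives a quick hint whether the processes are idle or blocked, before reaching for "strace" or "lsof" on an individual PID.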
Now I have disabled the audit service and restarted the system so I can test
whether audit is responsible for the high load average. If so, I guess I
should report it as an audit package bug, shouldn't I?
Please turn auditing off unless you want to run in a CAPP EAL3 environment. Auditing is enabled by default and it will impact system performance.
Larry Woodman