Bug 132562
Summary: | mrtg triggers the oom-killer | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Jay Fenlason <fenlason>
Component: | kernel | Assignee: | Rik van Riel <riel>
Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 3 | CC: | davej, jfeeney, jroyse
Hardware: | i386 | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2005-10-03 00:05:58 UTC
Description
Jay Fenlason
2004-09-14 18:50:31 UTC
Created attachment 103840
output of vmstat 1 on the rawhide system
The output of top is too large to attach. It's located on fenlason-desk.boston.redhat.com: /local/home/hack/top.out

I cannot reproduce this... I have:
* set instances=100, cps = 1000 10 in /etc/xinetd.d/telnet
* run
    mkfifo fifo
    cat > fifo (on a separate terminal)
    while :; do telnet localhost < fifo > /dev/null & usleep 10; done

There were 100 in.telnetd and at least 100 telnet processes running all the time. Running mrtg (with the default almost-empty config file) had no effect, whether run manually or from cron. I have, however, triggered the OOM killer when the usleep was not there; that's the usual fork-bomb, nothing to blame mrtg for. mrtg also appears 32 times in your top.out, never using more than 3.8% of memory, with 24 different PIDs, so I feel pretty sure mrtg is not to blame. Have I missed something during my attempt to reproduce the problem?

Jay, can you still reproduce the OOM?

I haven't tried recently (busy). riel thinks it's a kernel bug, and wants to investigate it, but he won't get a chance until after Fedora bug week is over. If it's a kernel bug, I'll reassign this to him as soon as I can.

Fedora Core release 2 (Tettnang)
mrtg-2.10.5-3

I believe the telnet sessions are a red herring: I started getting oom-killer messages as soon as I started MRTG on a previously stable system. Since enabling MRTG the system has gone unstable: it won't reboot, but services are missing every time I look, thanks to the oom-killer.

Miloslav Trmac: I would recommend configuring MRTG. An empty configuration file is the default as-shipped state and does not enable MRTG to do anything. Run
    cfgmaker public@myrouter > /etc/mrtg/mrtg.cfg
and you'll have recreated the MRTG part. I plan on disabling MRTG by replacing the config with the stock one. I'll write back in a day or two.

OTOH, I've managed to trigger the oom-killer-cascading-failure on my rawhide box after I removed mrtg completely. It looks to me like something (heavy network load? Lots of fork()/exec()/exit()s?) makes the box fragile, so that even a small rise in CPU usage (mrtg, anything else started by cron, etc.) throws the box into the death spiral.

Jay Fenlason: Yeah, I turned off MRTG, and the oom-killer once again woke up and killed other processes. So network activity seems to be the culprit, like you said. Should the component be moved to "kernel", version "RC2"?

OK, assigning to the kernel.

Sorry for the late response. You should get some debug spew in dmesg when you get an OOM kill. Please paste it. Also, make sure it's repeatable on the latest errata kernel, as there have been numerous improvements to OOM handling in 2.6.11.

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them. Thank you.
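
For anyone reopening this bug, the dmesg output requested above could be gathered with something like the sketch below. The helper name oom_lines and the match patterns are assumptions, not taken from this report; the exact wording of OOM-killer messages differs between kernel versions, so the patterns may need adjusting.

```shell
# Hypothetical helper: read kernel log text on stdin and keep only the
# lines that look like OOM-killer reports. The patterns are a guess at
# common wordings and are not an exhaustive list.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|oom_kill'
}

# Typical use on the affected machine (may require root to read the log):
#   dmesg | oom_lines > oom-report.txt
```

Since grep prints nothing and returns non-zero when no line matches, an empty oom-report.txt would simply mean no OOM-killer message matching these patterns was in the kernel ring buffer at that time.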