Bug 132562

Summary: mrtg triggers the oom-killer
Product: [Fedora] Fedora
Reporter: Jay Fenlason <fenlason>
Component: kernel
Assignee: Rik van Riel <riel>
Status: CLOSED CANTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3
CC: davej, jfeeney, jroyse
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-10-03 00:05:58 UTC
Attachments:
output of vmstat 1 on the rawhide system (flags: none)

Description Jay Fenlason 2004-09-14 18:50:31 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20040803

Description of problem:
When running an xinetd test program, cron started mrtg and the oom
killer started blowing things away.  The box has 256M of memory, and
512M of swap.

Version-Release number of selected component (if applicable):
mrtg-2.10.15-1

How reproducible:
Always

Steps to Reproduce:
1. Configure xinetd to accept 100 telnet connections.
2. Run a test program that continually spawns telnet connections,
   looking to see if xinetd ever accepts more than 100 of them.
3. Wait for mrtg to start. Notice that the oom killer has killed all
   your ssh connections.
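
The test program from step 2 is not attached to this report. The following is only a rough Python sketch of what such a check might look like, not the program actually used. The function name `count_open_connections` and its parameters are hypothetical, and "the peer kept the connection open" is only an approximation of "xinetd accepted the connection".

```python
import socket


def count_open_connections(host, port, attempts, timeout=0.5):
    """Open `attempts` TCP connections and return how many the peer keeps open.

    Hypothetical helper: a connection counts as "kept open" if the connect
    succeeds and the peer has not closed or reset it within `timeout` seconds.
    """
    held = []
    try:
        # Open all connections first and hold them, so they are concurrent.
        for _ in range(attempts):
            try:
                sock = socket.create_connection((host, port), timeout=timeout)
            except OSError:
                continue  # refused or timed out: not accepted
            held.append(sock)

        still_open = 0
        for sock in held:
            sock.settimeout(timeout)
            try:
                # Peek so that any banner data is left unread.
                data = sock.recv(1, socket.MSG_PEEK)
            except socket.timeout:
                still_open += 1  # no data and no close: still held open
                continue
            except OSError:
                continue  # connection reset by the peer
            if data:
                still_open += 1  # peer sent data, so it is serving us
            # An empty read means the peer closed the connection.
        return still_open
    finally:
        for sock in held:
            sock.close()
```

With xinetd limited to 100 instances, a call such as `count_open_connections("localhost", 23, 150)` returning more than 100 would suggest the limit is not being enforced. The host, port, and attempt count here are illustrative values only.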
    

Additional info:

Comment 1 Jay Fenlason 2004-09-14 18:53:15 UTC
Created attachment 103840
output of vmstat 1 on the rawhide system

Comment 2 Jay Fenlason 2004-09-14 18:54:16 UTC
The output of top is too large to attach.  It's located on
fenlason-desk.boston.redhat.com: /local/home/hack/top.out

Comment 3 Miloslav Trmač 2004-09-15 00:06:30 UTC
I cannot reproduce this...

I have:
* set instances=100, cps = 1000 10 in /etc/xinetd.d/telnet
* run
    mkfifo fifo
    cat > fifo  (on a separate terminal)
    while :; do telnet localhost < fifo > /dev/null & usleep 10; done

There were 100 in.telnetd and at least 100 telnet processes running
the whole time. Running mrtg (with the default almost-empty config
file) had no effect, whether it was started manually or from cron.

I have, however, triggered the OOM killer when the usleep was not
there; that's the usual fork-bomb, nothing to blame mrtg for.


mrtg also appears 32 times in your top.out, with 24 different PIDs,
never using more than 3.8% of memory, so I feel pretty sure mrtg is
not to blame.

Have I missed something during my attempt to reproduce the problem?

Comment 4 Miloslav Trmač 2004-09-29 21:13:57 UTC
Jay, can you still reproduce the OOM?

Comment 5 Jay Fenlason 2004-09-30 14:44:56 UTC
I haven't tried recently (busy).  riel thinks it's a kernel bug, and 
wants to investigate it, but he won't get a chance until after 
Fedora bug week is over.  If it's a kernel bug, I'll reassign this 
to him as soon as I can. 

Comment 6 Josiah Royse 2004-11-01 01:43:19 UTC
Fedora Core release 2 (Tettnang)
mrtg-2.10.5-3

I believe the telnet sessions are a red herring:  I have just started
getting oom-killer messages once I started MRTG on a previously stable
system.  

Since I enabled MRTG the system has become unstable: it won't reboot,
but services are missing every time I look, thanks to the oom-killer.

Miloslav Trmac: I would recommend configuring MRTG. An empty
configuration file is the default shipped state, which does not
enable MRTG to do anything. Run cfgmaker public@myrouter >
/etc/mrtg/mrtg.cfg and you'll have recreated the MRTG part.

I plan on disabling MRTG by replacing the config with the stock one. 
I'll write back in a day or two.

Comment 7 Jay Fenlason 2004-11-01 15:17:52 UTC
OTOH, I've managed to trigger the oom-killer-cascading-failure on my 
rawhide box after I removed mrtg completely.  It looks to me like 
something (heavy network load?  Lots of fork()/exec()/exit()s) makes 
the box fragile, so that even a small rise in cpu usage (mrtg, 
anything else started by cron, etc) throws the box into the death 
spiral. 

Comment 8 Josiah Royse 2004-11-02 16:53:05 UTC
Jay Fenlason: Yeah, I turned off MRTG, and the oom-killer once again
woke up and killed other processes.  So network activity seems to be
the culprit, like you said.  Should the component be moved to
"kernel", version "RC2"?

Comment 9 Miloslav Trmač 2005-04-17 08:31:23 UTC
OK, assigning to the kernel. Sorry for the late response.

Comment 10 Dave Jones 2005-04-18 02:09:08 UTC
You should get some debug spew in dmesg when the OOM killer fires. Please
paste it.

Also, make sure it's repeatable on the latest errata kernel, as there have been
numerous improvements to OOM handling in 2.6.11.
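
For a long kernel log, picking out the relevant lines can be scripted. The sketch below is one possible way to do that in Python. The function name `extract_oom_lines` is hypothetical, and the two marker strings it searches for are assumptions about how the kernel words its OOM messages, which varies between kernel versions.

```python
import re

# Assumed marker strings; the exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def extract_oom_lines(log_text):
    """Return the lines of a kernel log that mention the OOM killer."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]
```

For example, after saving the log with `dmesg > dmesg.txt`, passing the contents of that file to `extract_oom_lines` would give the lines most worth pasting into the bug. The surrounding lines may still matter, so attaching the full log as well is the safer choice.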

Comment 11 Dave Jones 2005-07-15 17:35:09 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 12 Dave Jones 2005-10-03 00:05:58 UTC
This bug has been automatically closed as part of a mass update.
It had been in NEEDINFO state since July 2005.
If this bug still exists in current errata kernels, please reopen this bug.

There are a large number of inactive bugs in the database, and this is the only
way to purge them.

Thank you.