Bug 132562 - mrtg triggers the oom-killer
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Rik van Riel
QA Contact: Brian Brock
Reported: 2004-09-14 14:50 EDT by Jay Fenlason
Modified: 2014-08-31 19:26 EDT (History)
Last Closed: 2005-10-02 20:05:58 EDT


Attachments
output of vmstat 1 on the rawhide system (628.76 KB, text/plain)
2004-09-14 14:53 EDT, Jay Fenlason

Description Jay Fenlason 2004-09-14 14:50:31 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20040803

Description of problem:
When running an xinetd test program, cron started mrtg and the oom
killer started blowing things away.  The box has 256M of memory, and
512M of swap.

Version-Release number of selected component (if applicable):
mrtg-2.10.15-1

How reproducible:
Always

Steps to Reproduce:
1. Configure xinetd to accept 100 telnet connections.
2. Run a test program that continually spawns telnet connections,
   looking to see if xinetd ever accepts more than 100 of them.
3. Wait for mrtg to start.  Notice that the oom killer has killed all
   your ssh connections.

Additional info:
Comment 1 Jay Fenlason 2004-09-14 14:53:15 EDT
Created attachment 103840 [details]
output of vmstat 1 on the rawhide system
Comment 2 Jay Fenlason 2004-09-14 14:54:16 EDT
The output of top is too large to attach.  It's located on
fenlason-desk.boston.redhat.com: /local/home/hack/top.out
Comment 3 Miloslav Trmač 2004-09-14 20:06:30 EDT
I cannot reproduce this...

I have:
* set instances = 100, cps = 1000 10 in /etc/xinetd.d/telnet
* run
    mkfifo fifo
    cat > fifo    # in a separate terminal; holds the fifo open for writing
    while :; do telnet localhost < fifo > /dev/null & usleep 10; done

There were 100 in.telnetd and at least 100 telnet processes running
all the time, running mrtg (with the default almost-empty config file)
had no effect, running it manually or from cron.
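
One way to check the in.telnetd count while the loop runs is sketched below. It is only an illustration: count_name is a made-up helper name, and the usage line assumes a procps-style ps that accepts -e -o comm=.

```shell
# count_name NAME: count the stdin lines that are exactly NAME.
# -F treats NAME as a fixed string (so the "." in in.telnetd is literal),
# -x requires a whole-line match, -c prints the number of matching lines.
count_name() {
    grep -c -x -F -- "$1"
}

# Usage (assumes procps ps; comm= prints bare command names, no header):
#   ps -e -o comm= | count_name in.telnetd
```

If the instances = 100 limit is being honoured, the reported count should not climb past 100.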

I have, however, triggered the OOM killer when the usleep was not
there; that's the usual fork-bomb, nothing to blame mrtg for.


mrtg also appears 32 times in your top.out, never using more than 3.8%
of memory, with 24 different PIDs, so I feel pretty sure mrtg is
not to blame.

Have I missed something during my attempt to reproduce the problem?
Comment 4 Miloslav Trmač 2004-09-29 17:13:57 EDT
Jay, can you still reproduce the OOM?
Comment 5 Jay Fenlason 2004-09-30 10:44:56 EDT
I haven't tried recently (busy).  riel thinks it's a kernel bug, and 
wants to investigate it, but he won't get a chance until after 
Fedora bug week is over.  If it's a kernel bug, I'll reassign this 
to him as soon as I can. 
Comment 6 Josiah Royse 2004-10-31 20:43:19 EST
Fedora Core release 2 (Tettnang)
mrtg-2.10.5-3

I believe the telnet sessions are a red herring:  I have just started
getting oom-killer messages once I started MRTG on a previously stable
system.  

Since enabling MRTG the system has become unstable: it won't reboot, but
there are missing services every time I look, thanks to the oom-killer.

Miloslav Trmac: I would recommend configuring MRTG; an empty
configuration file is the default shipped state, which does not enable
MRTG to do anything.  Run cfgmaker public@myrouter > /etc/mrtg/mrtg.cfg and
you'll have recreated the MRTG part.

I plan on disabling MRTG by replacing the config with the stock one. 
I'll write back in a day or two.
Comment 7 Jay Fenlason 2004-11-01 10:17:52 EST
OTOH, I've managed to trigger the oom-killer-cascading-failure on my 
rawhide box after I removed mrtg completely.  It looks to me like 
something (heavy network load?  Lots of fork()/exec()/exit()s) makes 
the box fragile, so that even a small rise in cpu usage (mrtg, 
anything else started by cron, etc) throws the box into the death 
spiral. 
Comment 8 Josiah Royse 2004-11-02 11:53:05 EST
Jay Fenlason: Yeah, I turned off MRTG, and the oom-killer once again woke
up and killed other processes.  So network activity seems to be the
culprit, like you said.  Should the component be moved to "kernel",
version "RC2"?
Comment 9 Miloslav Trmač 2005-04-17 04:31:23 EDT
OK, assigning to the kernel. Sorry for the late response.
Comment 10 Dave Jones 2005-04-17 22:09:08 EDT
You should get some debug spew in dmesg when you get an OOM kill. Please
paste it.

Also, make sure it's repeatable on the latest errata kernel, as there have been
numerous improvements to OOM handling in 2.6.11.
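
A rough sketch of pulling the relevant lines out of dmesg for pasting here. The oom_lines name is hypothetical, and the patterns are an assumption: the exact wording of the kernel's OOM messages differs between kernel versions, so adjust them to whatever your log actually shows.

```shell
# oom_lines [N]: print stdin lines that mention the OOM killer, each
# followed by N lines of trailing context (default 20).
# The two patterns are assumptions about the message text, not a fixed format.
oom_lines() {
    grep -i -A "${1:-20}" -E 'oom-killer|out of memory'
}

# Usage:
#   dmesg | oom_lines 30 > oom-report.txt
```

The trailing context is there because the memory statistics the kernel prints usually follow the first matching line rather than sit on it.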
Comment 11 Dave Jones 2005-07-15 13:35:09 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 12 Dave Jones 2005-10-02 20:05:58 EDT
This bug has been automatically closed as part of a mass update.
It had been in NEEDINFO state since July 2005.
If this bug still exists in current errata kernels, please reopen this bug.

There are a large number of inactive bugs in the database, and this is the only
way to purge them.

Thank you.
