Bug 177593 - Memory and swap usage rises and system crashes
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Reported: 2006-01-11 20:11 EST by Nathan G. Grennan
Modified: 2015-01-04 17:24 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2006-05-05 08:54:43 EDT


Attachments: None
Description Nathan G. Grennan 2006-01-11 20:11:46 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050923 Galeon/2.0.0

Description of problem:
  I am seeing a similar problem with 2.6.15-1.1823_FC4. It seems to have issues with memory management. A mail server with 1 GB of memory and 2 GB of swap normally uses around 512 MB of memory, excluding buffers and cache, after a few days. While running 2.6.15-1.1823_FC4, after about a day the amount goes up to around 950 MB or more, and the system starts to use swap. Swap usage keeps going up. It seems that when around 20% of swap is in use, all programs stop functioning and the console shows lots of errors about memory and swap.

  I tried using top and ps to see if any programs were using an excessive amount of memory, and didn't see any with unusual amounts. I also tried restarting just about every daemon, and nothing made a big dent in the memory usage.

  It had been running for 105 days without problems before being rebooted into 2.6.15-1.1823_FC4. I think it was running kernel-2.6.13-1.1526_FC4 before.

Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1823_FC4

How reproducible:
Always

Steps to Reproduce:
1. Boot 2.6.15-1.1823_FC4
2. Stress the system with a strong workload
  

Actual Results:  memory and swap usage go up and server crashes

Expected Results:  memory and swap usage not to keep rising, and the server not to crash

Additional info:
Comment 1 Dave Jones 2006-01-12 20:07:44 EST
1824 should reach updates-testing soon, which fixes 1-2 memory management bugs,
and should be a lot more stable.
Comment 2 Nathan G. Grennan 2006-01-13 11:29:04 EST
I noticed it had come out last night and read the changelog. I plan to do some
testing on a desktop today, and look forward to installing 1824 to see if it
fixes the issue I saw with 1823.
Comment 3 Nathan G. Grennan 2006-01-13 12:51:03 EST
1824 doesn't seem any better with the problem I am seeing. I have 1824 running
on an x86_64 desktop. After a fresh reboot, usage was around 220mb after logging
in to X. I then started normal usage and it was up to 350mb fairly quickly. I was
a little suspicious, but didn't think much of it. Then I used su to become root
and ran ls -alR /. Using free, I watched the used-minus-buffers-and-cache amount
continue to climb. It went up to 450mb. I then closed all four of my
gnome-terminal windows, and the figure didn't drop. I also used top to check the
highest users of RAM. X was the top at 5.2%. I then repeated ls -alR / while
watching the top memory percentages and free. Memory usage just kept going up.
It went up to 670mb before I stopped it. X usage didn't go up and nothing
overtook it in memory usage.
Comment 4 Dave Jones 2006-01-13 16:38:23 EST
Memory increasing this way is normal behaviour. Instead of leaving RAM empty,
Unix behaviour is to use this spare RAM as cache, and to purge it only
when something requests it. So closing gnome-terminal makes no difference at
all, and shouldn't be expected to free anything.

X memory usage in top is also always over-inflated, as it also accounts for
memory-mapped areas such as video RAM, chipset registers, etc.

I need to see some of those console errors you mentioned to further diagnose this.
The crash is obviously a bug, but there's not enough info to work on right now.
Comment 5 Nathan G. Grennan 2006-01-13 18:30:48 EST
As I said before, I am looking at the figure without buffers and cache counted.

             total       used       free     shared    buffers     cached
Mem:       1534400    1407216     127184          0     220948     468524
                      ^^^^^^^--Not this number
-/+ buffers/cache:     717744     816656
                       ^^^^^^--This number
                      

I will try the same situation with 2.6.14-1.1656 to make sure it isn't
something else.
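For reference, the "-/+ buffers/cache" line that procps free printed at the time is derived directly from the Mem: line. Its used column is used minus buffers minus cached, and its free column is free plus buffers plus cached. Below is a minimal Python sketch of that arithmetic, checked against the numbers above. The function name buffers_cache_row is made up for illustration.

```python
def buffers_cache_row(used_kb, free_kb, buffers_kb, cached_kb):
    """Recompute the "-/+ buffers/cache" line of procps `free` from its Mem: line.

    Returns (used, free) in kB: memory held by programs once buffers and
    page cache are subtracted, and memory available once that cache is
    counted as free.
    """
    used_by_programs = used_kb - buffers_kb - cached_kb
    free_for_programs = free_kb + buffers_kb + cached_kb
    return used_by_programs, free_for_programs


if __name__ == "__main__":
    # Mem: line from the desktop output above (values in kB).
    print(buffers_cache_row(1407216, 127184, 220948, 468524))  # (717744, 816656)
```

The same function reproduces the server's "-/+ buffers/cache" line in Comment 7 from its Mem: line.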
Comment 6 Nathan G. Grennan 2006-01-13 21:49:20 EST
I think I figured out what was causing the high memory usage from ls -alR / on
the desktop. I will be trying 1824 on the server.
Comment 7 Nathan G. Grennan 2006-01-15 03:13:13 EST
I am still seeing the issue with 1824 on the server. After a day the memory is
up to about 870mb for no obvious reason.

 00:13:17 up 1 day,  5:11,  1 user,  load average: 1.01, 1.10, 1.10

             total       used       free     shared    buffers     cached
Mem:       1034964    1015400      19564          0      18916     105236
-/+ buffers/cache:     891248     143716
Swap:      2104496        616    2103880


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12512 amavis    16   0 86652  72m 2948 S  0.0  7.2   0:06.31 amavisd
12607 amavis    17   0 84492  70m 2964 S  0.0  6.9   0:04.24 amavisd
12679 amavis    16   0 84436  70m 2948 S  0.0  6.9   0:02.38 amavisd
13161 amavis    16   0 83304  68m 2764 S  0.0  6.8   0:00.79 amavisd
 2084 amavis    16   0 81780  67m 2512 S  0.0  6.7   0:04.93 amavisd
13218 amavis    16   0 82516  67m 1712 S  0.0  6.6   0:00.01 amavisd
 1884 mysql     16   0  134m  31m 4768 S  0.0  3.1   1:37.44 mysqld
 2483 root      39  19 21576  16m 1376 R 99.4  1.6   1458:49 sb
 2268 apache    15   0 55892  13m 9.8m S  0.0  1.4   0:04.01 httpd
 2261 apache    16   0 55832  13m 9.8m S  0.0  1.4   0:03.81 httpd
 2260 apache    16   0 55600  13m  10m S  0.0  1.4   0:03.75 httpd
 2264 apache    16   0 55768  13m 9988 S  0.0  1.4   0:03.52 httpd
 2263 apache    15   0 55656  13m 9908 S  0.0  1.3   0:03.42 httpd
 2262 apache    16   0 55772  13m 9760 S  0.0  1.3   0:03.40 httpd
 2265 apache    15   0 55652  13m 9716 S  0.0  1.3   0:03.88 httpd
 2267 apache    15   0 55524  13m 9696 S  0.0  1.3   0:03.00 httpd
20570 apache    16   0 55440  11m 8316 S  0.0  1.2   0:01.42 httpd
 1787 clamav    16   0 45008 9.9m 1140 S  0.0  1.0   1:43.60 clamd
 2215 root      16   0 52132 7088 4668 S  0.0  0.7   0:00.21 httpd
 2381 mailman   16   0 11492 5804 2360 S  0.0  0.6   0:00.32 python
 2380 mailman   16   0 11504 5760 2360 S  0.0  0.6   0:00.38 python
 2378 mailman   16   0 11496 5740 2360 S  0.0  0.6   0:00.29 python
 2376 mailman   16   0 11492 5736 2360 S  0.0  0.6   0:00.33 python
 2377 mailman   16   0 11476 5736 2360 S  0.0  0.6   0:00.36 python
 2382 mailman   16   0 11492 5736 2360 S  0.0  0.6   0:00.29 python


MemTotal:      1034964 kB
MemFree:         48468 kB
Buffers:         19144 kB
Cached:         105564 kB
SwapCached:          0 kB
Active:         420584 kB
Inactive:        38316 kB
HighTotal:      130240 kB
HighFree:          252 kB
LowTotal:       904724 kB
LowFree:         48216 kB
SwapTotal:     2104496 kB
SwapFree:      2103880 kB
Dirty:            1048 kB
Writeback:           0 kB
Mapped:         364228 kB
Slab:           512440 kB
CommitLimit:   2621976 kB
Committed_AS:   807628 kB
PageTables:       4356 kB
VmallocTotal:   114680 kB
VmallocUsed:      3356 kB
VmallocChunk:   110972 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB
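The free output does not show where the non-cache memory went, but /proc/meminfo does. Here is a minimal sketch that parses text in the "Key: value kB" layout shown above and reports Slab as a share of MemTotal. The helper parse_meminfo is hypothetical. On a live system, the text would come from reading /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ("Key:   value kB") into {key: int}.

    Values keep the file's own units: kB for most fields, plain counts
    of huge pages for HugePages_Total and HugePages_Free.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


# A few lines copied from the 1824 snapshot above.
SNAPSHOT = """\
MemTotal:      1034964 kB
Buffers:         19144 kB
Cached:         105564 kB
Slab:           512440 kB
HugePages_Total:     0
"""

if __name__ == "__main__":
    info = parse_meminfo(SNAPSHOT)
    share = info["Slab"] / info["MemTotal"]
    print(f"Slab: {info['Slab']} kB = {share:.1%} of MemTotal")  # 49.5%
    # On a live system: parse_meminfo(open("/proc/meminfo").read())
```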
Comment 8 Nathan G. Grennan 2006-01-16 11:56:56 EST
Here is the current state of the system, now running 2.6.14-1.1656_FC4, after
being up for slightly longer than before.

08:59:28 up 1 day,  8:40,  1 user,  load average: 1.22, 1.27, 1.25


             total       used       free     shared    buffers     cached
Mem:       1035116     977888      57228          0     112812     368244
-/+ buffers/cache:     496832     538284
Swap:      2104496        748    2103748



MemTotal:      1035116 kB
MemFree:         46444 kB
Buffers:        112844 kB
Cached:         369436 kB
SwapCached:          0 kB
Active:         654692 kB
Inactive:       210396 kB
HighTotal:      130240 kB
HighFree:          120 kB
LowTotal:       904876 kB
LowFree:         46324 kB
SwapTotal:     2104496 kB
SwapFree:      2103748 kB
Dirty:            3352 kB
Writeback:           0 kB
Mapped:         412192 kB
Slab:           105384 kB
CommitLimit:   2622052 kB
Committed_AS:   915192 kB
PageTables:       6124 kB
VmallocTotal:   114680 kB
VmallocUsed:      3448 kB
VmallocChunk:   110868 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB
Comment 9 Nathan G. Grennan 2006-01-16 12:06:33 EST
The big difference I see is Slab. 

1824 Slab:          512440 kB
1656 Slab:          105384 kB

512440 - 105384 = 407056

1824 -/+ buffers/cache:     891248
1656 -/+ buffers/cache:     496832


891248 - 496832 = 394416


407056 kB and 394416 kB are fairly close. Any ideas on why Slab is so much higher
with 1824?
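The comparison above works because slab memory is counted in neither Buffers nor Cached. Any growth in slab therefore shows up directly in free's "-/+ buffers/cache" used figure. The sketch below repeats the same arithmetic using only the kB values quoted in this comment; the dictionary layout is illustrative. In each run, the free and /proc/meminfo outputs appear to have been captured at slightly different moments, so exact agreement is not expected.

```python
# kB values quoted above. "1824" is kernel-2.6.15-1.1824_FC4 (Comment 7);
# "1656" is kernel-2.6.14-1.1656_FC4 (Comment 8).
SNAPSHOTS = {
    "1824": {"slab": 512440, "used_minus_buffers_cache": 891248},
    "1656": {"slab": 105384, "used_minus_buffers_cache": 496832},
}


def growth(a, b):
    """Return a[key] - b[key] for every key in `a`."""
    return {key: a[key] - b[key] for key in a}


if __name__ == "__main__":
    diff = growth(SNAPSHOTS["1824"], SNAPSHOTS["1656"])
    print(diff)  # {'slab': 407056, 'used_minus_buffers_cache': 394416}
    ratio = diff["slab"] / diff["used_minus_buffers_cache"]
    print(f"slab growth / non-cache growth = {ratio:.2f}")  # 1.03
```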
Comment 10 Nathan G. Grennan 2006-01-16 12:18:52 EST
https://www.redhat.com/archives/fedora-devel-list/2005-February/msg00249.html

This e-mail from you suggests that it may be related to debugging options
being turned on.

config-2.6.14-1.1656_FC4:# CONFIG_DEBUG_SLAB is not set
config-2.6.15-1.1824_FC4:CONFIG_DEBUG_SLAB=y

Looks like the slab debugging is the key difference, and is causing the problem.
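The config comparison above can also be scripted. This is a minimal sketch that assumes the kernel configs are plain .config text, like the two config files quoted here (Fedora installs them as /boot/config-<version>). The helper config_value is hypothetical. It treats "# OPTION is not set" and a missing line the same way.

```python
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of `option` in kernel .config text.

    Returns None both for "# OPTION is not set" and when the option
    does not appear at all.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return None
        # Require "=" right after the name so CONFIG_DEBUG_SLAB does not
        # match CONFIG_DEBUG_SLAB_LEAK.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Stand-ins for the two config files quoted above.
    configs = {
        "config-2.6.14-1.1656_FC4": "# CONFIG_DEBUG_SLAB is not set\n",
        "config-2.6.15-1.1824_FC4": "CONFIG_DEBUG_SLAB=y\n",
    }
    for name, text in configs.items():
        value = config_value(text, "CONFIG_DEBUG_SLAB")
        print(name, "not set" if value is None else value)
```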
Comment 11 Dave Jones 2006-02-03 01:11:05 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 12 John Thacker 2006-05-05 08:54:43 EDT
Closing per previous comment.
