Bug 147832 - (IT_72574) oom-killer triggered during Red Hat Cert
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
4.0
i686 Linux
Priority: medium  Severity: high
Assigned To: Larry Woodman
Brian Brock
Depends On:
Blocks:
Reported: 2005-02-11 14:21 EST by Robert Hentosh
Modified: 2007-11-30 17:07 EST (History)
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-06-08 11:13:55 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
/var/log/messages file from system. (863.28 KB, text/plain)
2005-02-11 14:21 EST, Robert Hentosh
sysreport (508.89 KB, application/octet-stream)
2005-04-27 15:09 EDT, Danny Trinh
dmesg log (15.00 KB, application/octet-stream)
2005-04-27 15:10 EDT, Danny Trinh
/var/log/messages (386.82 KB, text/plain)
2005-04-27 15:11 EDT, Danny Trinh
results*.rpm package when running on SMP kernel (721.33 KB, application/octet-stream)
2005-04-27 15:14 EDT, Danny Trinh
result*.rpm package when running on hugemem kernel (700.94 KB, application/octet-stream)
2005-04-27 15:15 EDT, Danny Trinh
result*.rpm when running with rhr2-1.1-3 (788.09 KB, application/x-rpm)
2005-04-28 11:35 EDT, Danny Trinh
/proc/slabinfo from sysreport (13.37 KB, text/plain)
2005-05-10 16:56 EDT, Amit Bhutani

Description Robert Hentosh 2005-02-11 14:21:19 EST
Description of problem:

After running the Certification Suite for 2 hours on a PE4125 with SATA drives
and 12 GB of RAM, the oom-killer starts killing off processes. Eventually the
system locks up.

Version-Release number of selected component (if applicable):


How reproducible:
Every run

Steps to Reproduce:
1. Install RHEL4 RC1 on PE1425 with SATA drives and 12GB of RAM.
2. Run Red Hat Certification (redhat-ready)
3. Sit back.
  
Actual results:

oom-killer will start killing off processes

Expected results:

No oom-killer.
Additional info:
Comment 1 Robert Hentosh 2005-02-11 14:21:20 EST
Created attachment 110985 [details]
/var/log/messages file from system.
Comment 2 Robert Hentosh 2005-02-11 14:23:02 EST
This was originally thought to be related to BZ #141173

See Larry Woodman's comment:

https://bugzilla.redhat.com/beta/show_bug.cgi?id=141173#c120

Comment 3 Robert Hentosh 2005-02-11 14:26:38 EST
Also, we were unable to obtain a failure when the amount of memory was
reduced to 5GB of RAM.
Comment 4 Larry Woodman 2005-02-11 15:27:36 EST
This is being caused by *someone* allocating all of lowmem!  There are ~256K
pages of lowmem; ~45K pages are in the slabcache and ~4K pages are on the
lists here:

Normal free:696kB active:8300kB inactive:7268kB present:901120kB

Given that this only happens when there is lots of highmem (this is a
13GB system with 12GB of highmem/9GB of non-DMA-able memory) and not on smaller
systems (a 5GB system with 4GB of highmem/1GB of non-DMA-able memory), I would
guess that the other ~200K pages of lowmem are being used in bounce buffers.
I'll add bounce buffer accounting to a test kernel so we can see if that's
where they are, and we'll have to proceed from there.

Larry Woodman
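[Editor's note] The accounting above can be sanity-checked with a quick back-of-envelope script. The zone numbers come from the "Normal" line quoted in the comment; the ~45K slab-page figure is the comment's estimate, not a measured value:

```shell
# Back-of-envelope lowmem accounting from the OOM report quoted above.
# slab_pages is the ~45K estimate from the comment, not a measured value.
present_kb=901120; free_kb=696; active_kb=8300; inactive_kb=7268
slab_pages=45000
page_kb=4

present_pages=$((present_kb / page_kb))
list_pages=$(( (free_kb + active_kb + inactive_kb) / page_kb ))
unaccounted=$((present_pages - list_pages - slab_pages))

echo "lowmem pages:                  $present_pages"
echo "free+active+inactive pages:    $list_pages"
echo "slab pages (estimate):         $slab_pages"
echo "unaccounted (bounce buffers?): $unaccounted"
```

The ~176K unaccounted pages it prints are roughly the "other ~200K pages" attributed to bounce buffers.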
Comment 5 Larry Woodman 2005-02-28 17:47:27 EST
I think this problem has been fixed in the pre-RHEL4-U1 kernel.  Basically the
bio ref counting was wrong, which caused bounce pages to leak from lowmem.
This is exactly what we are seeing on this system.

Please get me a /proc/slabinfo output and try the latest RHEL4-U1 kernel ASAP.

Larry Woodman
Comment 6 Susan Denham 2005-03-31 09:48:02 EST
U1 kernel (2.6.9-6.37.EL) being given to Dell today; U1 ISOs should be available
tomorrow.

Please test with the U1 kernel and report status here.
Comment 8 Danny Trinh 2005-04-27 09:50:13 EDT
oom-killer still appears in 2.6.9-6.37.EL SMP kernel. However, I didn't see 
this problem on hugemem kernel.
Comment 10 Danny Trinh 2005-04-27 15:09:59 EDT
Created attachment 113728 [details]
sysreport
Comment 11 Danny Trinh 2005-04-27 15:10:54 EDT
Created attachment 113729 [details]
dmesg log
Comment 12 Danny Trinh 2005-04-27 15:11:54 EDT
Created attachment 113730 [details]
/var/log/messages
Comment 13 Danny Trinh 2005-04-27 15:14:13 EDT
Created attachment 113731 [details]
results*.rpm package when running on SMP kernel

The first failure occurred after about 2 hours.
Comment 14 Danny Trinh 2005-04-27 15:15:28 EDT
Created attachment 113732 [details]
result*.rpm package when running on hugemem kernel

There is no oom-killer.
Comment 15 Danny Trinh 2005-04-28 11:35:16 EDT
Created attachment 113790 [details]
result*.rpm when running with rhr2-1.1-3

oom-killer still appears.
Comment 16 Amit Bhutani 2005-05-10 11:37:21 EDT
Has RH had a chance to look at the most recent failure logs from the
regression tests on the U1 Beta kernel that Danny has posted?

As has been communicated earlier, this is being tracked as a U1 MUSTFIX.
Comment 17 Larry Woodman 2005-05-10 13:54:31 EDT
Can someone simply get me the dmesg output that appears when you get the
oom-kills?  Thanks; I can't seem to look at the rpm that was attached.
Also, please get me "uname -a" output so I can see the exact kernel version
and track down the exact patch set it includes.

Thanks, Larry Woodman
Comment 18 Amit Bhutani 2005-05-10 15:10:32 EDT
> Can someone simply get me the dmesg output that appears when you get the
oom-kills?

See comment #11 for dmesg output

> Also, please get me a "uname -a" output so I can see the exact kernel version

Sysreport is attached in comment #10 and should have comprehensive
information on the state of the system. Here is the requested info anyway:
Kernel version: 2.6.9-6.37.ELsmp
Arch: x86
Comment 19 Larry Woodman 2005-05-10 15:56:38 EDT
OK, this is a 13GB system? (3342336 pages of RAM) running the SMP kernel (3G/1G).
While we do officially support this, you are much better off running the
Hugemem (4G/4G) kernel, because the cause of the OOM kills is lowmem exhaustion
(Normal free:688kB active:8800kB inactive:8152kB present:901120kB).

Having said all that, I would guess that you are running some sort of driver
that is either leaking memory or using memory as a cache instead of using the
slabcache.  This is consuming all of lowmem, which combined with running the
SMP kernel is causing the OOM kills.

>>>writeback:3348 slab:36667
>>>Normal free:688kB active:8800kB inactive:8152kB present:901120kB

To help debug the lowmem leakage:

1.) reboot the SMP kernel and get an AltSysrq-M output before running anything.

2.) get me a /proc/slabinfo output as soon as an OOM kill occurs.

3.) get an lsmod so I can see what drivers are being used. 
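[Editor's note] A minimal capture script for these three steps might look like the following sketch. `PROC_ROOT` and `OUT` are hypothetical parameters (not part of the bug) added so the script can also be pointed at a copied /proc snapshot instead of the live system:

```shell
# Sketch of the three debug-capture steps requested above.
# PROC_ROOT and OUT are hypothetical knobs for dry-running against a snapshot.
PROC_ROOT="${PROC_ROOT:-/proc}"
OUT="${OUT:-/tmp/oom-debug}"

capture_oom_state() {
    mkdir -p "$OUT"
    # 1.) AltSysrq-M equivalent: dump memory info into the kernel log (root only).
    if [ -w "$PROC_ROOT/sysrq-trigger" ]; then
        echo m > "$PROC_ROOT/sysrq-trigger"
    fi
    # 2.) slab usage as close as possible to the moment of the OOM kill.
    if [ -r "$PROC_ROOT/slabinfo" ]; then
        cat "$PROC_ROOT/slabinfo" > "$OUT/slabinfo.$(date +%s)"
    fi
    # 3.) loaded drivers (lsmod, falling back to the raw /proc/modules list).
    lsmod > "$OUT/lsmod.txt" 2>/dev/null || cat "$PROC_ROOT/modules" > "$OUT/lsmod.txt"
}
```

Running `capture_oom_state` from a cron job or a watchdog loop is one way to catch the slabinfo state before the machine locks up.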
Comment 20 Amit Bhutani 2005-05-10 16:54:15 EDT
> OK, this is a 13GB system?

This is a SC1425 system with 12 GB RAM. To be precise, it has 6x2GB
single-rank DIMMs.


> running the SMP kernel(3G/1G)

Yes. Hugemem kernel passes fine.


> 3.) get an lsmod so I can see what drivers are being used. 

This was an untainted kernel and nothing outside of what was on the RHEL 4 
media was installed. Extracting the lsmod output from sysreport attached in 
comment #10:
Module                  Size  Used by
iptable_nat            27236  0
ip_conntrack           45701  1 iptable_nat
iptable_mangle          6721  0
iptable_filter          6721  0
ip_tables              21441  3 iptable_nat,iptable_mangle,iptable_filter
nfsd                  205281  9
exportfs               10049  1 nfsd
lockd                  65257  2 nfsd
md5                     8001  1
ipv6                  238817  20
parport_pc             27905  0
lp                     15405  0
parport                37641  2 parport_pc,lp
autofs4                22085  0
i2c_dev                14273  0
i2c_core               25921  1 i2c_dev
sunrpc                138789  19 nfsd,lockd
dm_mod                 58949  0
button                 10449  0
battery                12869  0
ac                      8773  0
uhci_hcd               32729  0
ehci_hcd               31813  0
hw_random               9557  0
e1000                  83989  0
ext3                  118729  2
jbd                    59481  1 ext3
ata_piix               13125  4
libata                 47133  1 ata_piix
sd_mod                 20545  5
scsi_mod              116429  2 libata,sd_mod

> 1.) reboot the SMP kernel and get an AltSysrq-M output before running 
anything.

System not available anymore. Sysreport (from comment #10) does
have /proc/meminfo; it was captured after the failures, though.
Pasting /proc/meminfo output:
MemTotal:       515260 kB
MemFree:          8860 kB
Buffers:         93652 kB
Cached:         225828 kB
SwapCached:          0 kB
Active:         389520 kB
Inactive:        44192 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       515260 kB
LowFree:          8860 kB
SwapTotal:     2097136 kB
SwapFree:      2096960 kB
Dirty:              68 kB
Writeback:           0 kB
Mapped:         151136 kB
Slab:            62236 kB
Committed_AS:   421136 kB
PageTables:       3512 kB
VmallocTotal:   499704 kB
VmallocUsed:      4568 kB
VmallocChunk:   491692 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB

Output of 'free' from sysreport:
total       used       free     shared    buffers     cached
Mem:      12472132     295404   12176728          0      53812      87888
-/+ buffers/cache:     153704   12318428
Swap:      8385912          0    8385912

> 2.) get me a /proc/slabinfo output as soon as an OOM kill occurs.

I am trying to acquire the system so we can restart the test and capture this
stateful information. At the moment, what I have is what was captured in
sysreport (comment #10), probably several hours after OOM was invoked and the
stress had stopped, so it may not be very useful, but I have attached it
anyway (slabinfo).
Comment 21 Amit Bhutani 2005-05-10 16:56:44 EDT
Created attachment 114228 [details]
/proc/slabinfo from sysreport
Comment 22 Larry Woodman 2005-05-10 22:43:46 EDT
The problem here is that after booting the SMP kernel on a system with this much
RAM (3342336 pages) there is only half of lowmem available (LowTotal: 515260 kB).
Based on the slabinfo output, over 400MB of that lowmem is wired in the
slabcache.  Add a few hundred processes, buffers allocated by the drivers, and
lots of bounce buffers for a system with this high a ratio of highmem to
lowmem, and avoiding OOM kills will be difficult if not impossible with the SMP
kernel.  I'm afraid the only real answer is to run the Hugemem kernel.  Is this
a problem?

Larry Woodman
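[Editor's note] The "over 400MB wired in the slabcache" figure can be re-derived from any slabinfo-2.0 dump by summing num_objs * objsize across the data rows. A sketch; `slab_kb` is a hypothetical helper, not part of any tool mentioned in this bug:

```shell
# Sketch: approximate total slab memory (in kB) from a slabinfo-2.0 dump.
# Data rows are: name active_objs num_objs objsize objperslab pagesperslab ...
# so we sum field 3 (num_objs) * field 4 (objsize, bytes) and skip headers.
slab_kb() {
    awk '!/^slabinfo/ && !/^#/ { kb += $3 * $4 / 1024 } END { printf "%d\n", kb }' "$1"
}
```

Applied to the attached slabinfo this gives a rough lower bound; it ignores per-slab padding, so the true wired total is somewhat higher.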
Comment 23 Amit Bhutani 2005-05-17 16:17:57 EDT
> I'm afraid that the only real answer is to run the Hugemem kernel.  Is this 
a problem?

It's not a real problem, rather an obscure messaging problem. Dell is currently
communicating to its customers that you only need to run the Hugemem kernel
if you've got >16GB. In this case, you are suggesting that we run Hugemem
when we've got 12GB. There's the disconnect. The question is: what is RH
*officially* messaging to its customers on usage of the SMP vs. Hugemem kernel?
If we can fix the messaging, we can close this issue as "Working as Designed".
Comment 24 Larry Woodman 2005-05-17 16:53:09 EDT
The reality of the situation is when the ratio between Highmem and Lowmem
exceeds about 10 to 1 the possibility of OOM kills increases significantly.

Larry Woodman
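[Editor's note] That ~10:1 rule of thumb can be checked mechanically from /proc/meminfo. A sketch; `highlow_ratio` is a hypothetical helper name, and the file argument lets it run against a saved meminfo instead of the live system:

```shell
# Sketch of the rule of thumb above: HighTotal/LowTotal from a
# /proc/meminfo-style file. A ratio much above 10 on a 32-bit SMP
# kernel suggests the hugemem kernel per the comment above.
highlow_ratio() {
    awk '/^HighTotal:/ { hi = $2 } /^LowTotal:/ { lo = $2 }
         END { if (lo > 0) printf "%.1f\n", hi / lo }' "${1:-/proc/meminfo}"
}
```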
Comment 25 Amit Bhutani 2005-05-17 17:09:22 EDT
I agree and am not arguing with your theory of why this is happening. All I
am saying is that the system config on which this readily happens is an
average config that Dell sells a lot of, i.e. an EM64T system with 12 GB
RAM. Since we cannot document the actual technical HIGHMEM/LOWMEM ratio to
our customers as the criterion for choosing the SMP vs. Hugemem kernel, what
do you propose we document? Currently we say UP and SMP for up to 16GB and
Hugemem for anything >16GB.
Comment 26 Larry Woodman 2005-05-17 17:15:11 EDT
Amit, I'm a bit confused!  Is this an EM64T running an x86 SMP kernel?  


Larry
Comment 27 Amit Bhutani 2005-05-17 17:19:14 EDT
I'm sorry, I should clarify. The remark "EM64T system with 12 GB RAM" should
have read "EM64T-capable system with 12 GB RAM". The OS is very much x86.
The system (SC1425) is 32/64-capable since it has Intel EM64T procs. Sorry
for the confusion.
Comment 28 Amit Bhutani 2005-05-19 13:57:20 EDT
As per the discussion in today's con call, we can close this issue once RH
issues a KB article on proper usage of the Hugemem kernel in scenarios prone
to OOM-killer invocations such as the one described in this issue.
Comment 29 jlewis 2005-05-20 23:33:38 EDT
I seem to be having this same problem with 2.6.9-5.0.5.ELsmp on a server with
only 1GB RAM.

# free
             total       used       free     shared    buffers     cached
Mem:       1034676     485980     548696          0     171348     160672
-/+ buffers/cache:     153960     880716
Swap:      2104432          0    2104432

# lsmod
Module                  Size  Used by
nfsd                  205153  9
exportfs               10049  1 nfsd
lockd                  65129  2 nfsd
sunrpc                137637  19 nfsd,lockd
md5                     8001  1
ipv6                  238945  32
ipt_REJECT             10561  1
ipt_state               5825  5
iptable_filter          6721  1
iptable_nat            27237  1
ip_conntrack           45701  2 ipt_state,iptable_nat
ip_tables              21441  4 ipt_REJECT,ipt_state,iptable_filter,iptable_nat
dm_mod                 57157  0
button                 10449  0
battery                12869  0
ac                      8773  0
uhci_hcd               32473  0
e1000                  82253  0
e100                   35781  0
mii                     8641  1 e100
floppy                 58065  0
qla2200                90817  0
ext3                  118473  3
jbd                    59481  1 ext3
raid5                  24129  1
xor                    17609  1 raid5
raid1                  19521  3
qla2100                83393  0
qla2xxx               109664  16 qla2200,qla2100
scsi_transport_fc      11713  1 qla2xxx
sd_mod                 20545  28
scsi_mod              116301  3 qla2xxx,scsi_transport_fc,sd_mod

Here's a slabinfo from about 36min after a reboot.  We don't generally get much
time to look at things once the oom-killer goes nuts.

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
: tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs>
<num_slabs> <sharedavail>
rpc_buffers            8      8   2048    2    1 : tunables   24   12    8 :
slabdata      4      4      0
rpc_tasks              8     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
rpc_inode_cache        6      7    512    7    1 : tunables   54   27    8 :
slabdata      1      1      0
fib6_nodes             7    119     32  119    1 : tunables  120   60    8 :
slabdata      1      1      0
ip6_dst_cache          7     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
ndisc_cache            1     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
rawv6_sock             6     10    768    5    1 : tunables   54   27    8 :
slabdata      2      2      0
udpv6_sock             1      5    768    5    1 : tunables   54   27    8 :
slabdata      1      1      0
tcpv6_sock             5      6   1280    3    1 : tunables   24   12    8 :
slabdata      2      2      0
ip_fib_alias          14    226     16  226    1 : tunables  120   60    8 :
slabdata      1      1      0
ip_fib_hash           14    119     32  119    1 : tunables  120   60    8 :
slabdata      1      1      0
ip_conntrack_expect      0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
ip_conntrack        2708   3400    384   10    1 : tunables   54   27    8 :
slabdata    340    340      0
dm_tio                 0      0     16  226    1 : tunables  120   60    8 :
slabdata      0      0      0
dm_io                  0      0     16  226    1 : tunables  120   60    8 :
slabdata      0      0      0
raid5/md2            256    258   1344    3    1 : tunables   24   12    8 :
slabdata     86     86      0
uhci_urb_priv          0      0     44   88    1 : tunables  120   60    8 :
slabdata      0      0      0
scsi_cmd_cache       215    240    384   10    1 : tunables   54   27    8 :
slabdata     24     24    135
ext3_inode_cache   54782  54782    552    7    1 : tunables   54   27    8 :
slabdata   7826   7826      0
ext3_xattr             0      0     48   81    1 : tunables  120   60    8 :
slabdata      0      0      0
journal_handle       199    405     28  135    1 : tunables  120   60    8 :
slabdata      3      3     15
journal_head        1255   3159     48   81    1 : tunables  120   60    8 :
slabdata     39     39    300
revoke_table           6    290     12  290    1 : tunables  120   60    8 :
slabdata      1      1      0
revoke_record         37    226     16  226    1 : tunables  120   60    8 :
slabdata      1      1      0
qla2xxx_srbs         432    496    128   31    1 : tunables  120   60    8 :
slabdata     16     16    120
sgpool-128            32     33   2560    3    2 : tunables   24   12    8 :
slabdata     11     11      0
sgpool-64             32     33   1280    3    1 : tunables   24   12    8 :
slabdata     11     11      0
sgpool-32             33     36    640    6    1 : tunables   54   27    8 :
slabdata      6      6      0
sgpool-16             59     60    384   10    1 : tunables   54   27    8 :
slabdata      6      6      0
sgpool-8             234    330    256   15    1 : tunables  120   60    8 :
slabdata     22     22     60
unix_sock            105    105    512    7    1 : tunables   54   27    8 :
slabdata     15     15      0
ip_mrt_cache           0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
tcp_tw_bucket        606    744    128   31    1 : tunables  120   60    8 :
slabdata     24     24      0
tcp_bind_bucket      858   1582     16  226    1 : tunables  120   60    8 :
slabdata      7      7     60
tcp_open_request      24     31    128   31    1 : tunables  120   60    8 :
slabdata      1      1      0
inet_peer_cache       99    122     64   61    1 : tunables  120   60    8 :
slabdata      2      2      0
secpath_cache          0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
xfrm_dst_cache         0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
ip_dst_cache         174    195    256   15    1 : tunables  120   60    8 :
slabdata     13     13      0
arp_cache              6     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
raw_sock               5      6    640    6    1 : tunables   54   27    8 :
slabdata      1      1      0
udp_sock              11     24    640    6    1 : tunables   54   27    8 :
slabdata      4      4      0
tcp_sock             323    616   1152    7    2 : tunables   24   12    8 :
slabdata     88     88     84
flow_cache             0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
mqueue_inode_cache      1      6    640    6    1 : tunables   54   27    8 :
slabdata      1      1      0
isofs_inode_cache      0      0    372   10    1 : tunables   54   27    8 :
slabdata      0      0      0
hugetlbfs_inode_cache      1     11    344   11    1 : tunables   54   27    8 :
slabdata      1      1      0
ext2_inode_cache       0      0    488    8    1 : tunables   54   27    8 :
slabdata      0      0      0
ext2_xattr             0      0     48   81    1 : tunables  120   60    8 :
slabdata      0      0      0
dquot                  0      0    144   27    1 : tunables  120   60    8 :
slabdata      0      0      0
eventpoll_pwq          3    107     36  107    1 : tunables  120   60    8 :
slabdata      1      1      0
eventpoll_epi          3     31    128   31    1 : tunables  120   60    8 :
slabdata      1      1      0
kioctx                 0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
kiocb                  0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
dnotify_cache          1    185     20  185    1 : tunables  120   60    8 :
slabdata      1      1      0
fasync_cache           0      0     16  226    1 : tunables  120   60    8 :
slabdata      0      0      0
shmem_inode_cache    307    333    444    9    1 : tunables   54   27    8 :
slabdata     37     37      0
posix_timers_cache      0      0    112   35    1 : tunables  120   60    8 :
slabdata      0      0      0
uid_cache             13     61     64   61    1 : tunables  120   60    8 :
slabdata      1      1      0
cfq_pool             332    357     32  119    1 : tunables  120   60    8 :
slabdata      3      3    120
crq_pool             525    960     40   96    1 : tunables  120   60    8 :
slabdata     10     10    208
deadline_drq           0      0     52   75    1 : tunables  120   60    8 :
slabdata      0      0      0
as_arq                 0      0     64   61    1 : tunables  120   60    8 :
slabdata      0      0      0
blkdev_ioc           130    370     20  185    1 : tunables  120   60    8 :
slabdata      2      2      0
blkdev_queue          38     56    488    8    1 : tunables   54   27    8 :
slabdata      7      7      0
blkdev_requests      537    825    160   25    1 : tunables  120   60    8 :
slabdata     33     33    224
biovec-(256)         256    256   3072    2    2 : tunables   24   12    8 :
slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    8 :
slabdata     52     52      0
biovec-64            270    270    768    5    1 : tunables   54   27    8 :
slabdata     54     54      0
biovec-16            259    285    256   15    1 : tunables  120   60    8 :
slabdata     19     19      0
biovec-4             270    305     64   61    1 : tunables  120   60    8 :
slabdata      5      5      0
biovec-1           32590  32996     16  226    1 : tunables  120   60    8 :
slabdata    146    146    180
bio                32587  32767    128   31    1 : tunables  120   60    8 :
slabdata   1057   1057    180
file_lock_cache      123    123     96   41    1 : tunables  120   60    8 :
slabdata      3      3      0
sock_inode_cache     454    735    512    7    1 : tunables   54   27    8 :
slabdata    105    105     77
skbuff_head_cache   1800   1860    256   15    1 : tunables  120   60    8 :
slabdata    124    124    240
sock                   5     10    384   10    1 : tunables   54   27    8 :
slabdata      1      1      0
proc_inode_cache    6655   6655    360   11    1 : tunables   54   27    8 :
slabdata    605    605      0
sigqueue             295    297    148   27    1 : tunables  120   60    8 :
slabdata     11     11      0
radix_tree_node    18650  18802    276   14    1 : tunables   54   27    8 :
slabdata   1343   1343     27
bdev_cache            58     63    512    7    1 : tunables   54   27    8 :
slabdata      9      9      0
mnt_cache             29     62    128   31    1 : tunables  120   60    8 :
slabdata      2      2      0
inode_cache         1999   2035    344   11    1 : tunables   54   27    8 :
slabdata    185    185      0
dentry_cache      127110 127140    152   26    1 : tunables  120   60    8 :
slabdata   4890   4890     30
filp                1451   2610    256   15    1 : tunables  120   60    8 :
slabdata    174    174    264
names_cache           47     47   4096    1    1 : tunables   24   12    8 :
slabdata     47     47      0
avc_node              12    300     52   75    1 : tunables  120   60    8 :
slabdata      4      4      0
idr_layer_cache       88    116    136   29    1 : tunables  120   60    8 :
slabdata      4      4      0
buffer_head        62085  62100     52   75    1 : tunables  120   60    8 :
slabdata    828    828    120
mm_struct            261    435    768    5    1 : tunables   54   27    8 :
slabdata     87     87      0
vm_area_struct      4228   8325     88   45    1 : tunables  120   60    8 :
slabdata    185    185    360
fs_cache             465    671     64   61    1 : tunables  120   60    8 :
slabdata     11     11      0
files_cache          297    448    512    7    1 : tunables   54   27    8 :
slabdata     64     64     33
signal_cache         468    713    128   31    1 : tunables  120   60    8 :
slabdata     23     23      0
sighand_cache        344    470   1408    5    2 : tunables   24   12    8 :
slabdata     94     94     88
task_struct         2407   2530   1408    5    2 : tunables   24   12    8 :
slabdata    506    506     88
anon_vma            1315   3616     16  226    1 : tunables  120   60    8 :
slabdata     16     16    344
pgd                  414    714     32  119    1 : tunables  120   60    8 :
slabdata      6      6      7
pmd                  618    629   4096    1    1 : tunables   24   12    8 :
slabdata    618    629     96
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 :
slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 :
slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 :
slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 :
slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 :
slabdata      0      0      0
size-32768             3      3  32768    1    8 : tunables    8    4    0 :
slabdata      3      3      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 :
slabdata      0      0      0
size-16384             1      1  16384    1    4 : tunables    8    4    0 :
slabdata      1      1      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 :
slabdata      0      0      0
size-8192              7      7   8192    1    2 : tunables    8    4    0 :
slabdata      7      7      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    8 :
slabdata      0      0      0
size-4096           3564   3578   4096    1    1 : tunables   24   12    8 :
slabdata   3564   3578     72
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    8 :
slabdata      0      0      0
size-2048            258    258   2048    2    1 : tunables   24   12    8 :
slabdata    129    129     36
size-1620(DMA)         0      0   1664    4    2 : tunables   24   12    8 :
slabdata      0      0      0
size-1620             35     36   1664    4    2 : tunables   24   12    8 :
slabdata      9      9      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    8 :
slabdata      0      0      0
size-1024            215    252   1024    4    1 : tunables   54   27    8 :
slabdata     63     63      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    8 :
slabdata      0      0      0
size-512             934   2312    512    8    1 : tunables   54   27    8 :
slabdata    289    289    216
size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
size-256             607   1800    256   15    1 : tunables  120   60    8 :
slabdata    120    120      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
size-128            3279   4712    128   31    1 : tunables  120   60    8 :
slabdata    152    152     60
size-64(DMA)           0      0     64   61    1 : tunables  120   60    8 :
slabdata      0      0      0
size-64            48922  48922     64   61    1 : tunables  120   60    8 :
slabdata    802    802     60
size-32(DMA)           0      0     32  119    1 : tunables  120   60    8 :
slabdata      0      0      0
size-32             7728   7973     32  119    1 : tunables  120   60    8 :
slabdata     67     67    273
kmem_cache           165    165    256   15    1 : tunables  120   60    8 :
slabdata     11     11      0

May 20 20:32:54 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:32:55 romulus kernel: DMA per-cpu:
May 20 20:32:55 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: Normal per-cpu:
May 20 20:32:55 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:32:55 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:01 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:04 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:06 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:07 romulus crond(pam_unix)[1481]: session opened for user root by
(uid=0)
May 20 20:33:07 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:08 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:10 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:11 romulus kernel: HighMem per-cpu:
May 20 20:33:13 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:15 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:15 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:16 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:17 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:18 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:19 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:20 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:20 romulus kernel:
May 20 20:33:20 romulus kernel: Free pages:        9564kB (252kB HighMem)
May 20 20:33:20 romulus kernel: Active:28055 inactive:310 dirty:0 writeback:26
unstable:0 free:2391 slab:217669 mapped:27615 pagetables:4796
May 20 20:33:22 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB
active:0kB inactive:0kB present:16384kB
May 20 20:33:24 romulus kernel: protections[]: 0 0 0
May 20 20:33:24 romulus crond(pam_unix)[1481]: session closed for user root
May 20 20:33:24 romulus kernel: Normal free:9296kB min:936kB low:1872kB
high:2808kB active:1188kB inactive:376kB present:901120kB
May 20 20:33:24 romulus kernel: protections[]: 0 0 0
May 20 20:33:24 romulus kernel: HighMem free:252kB min:128kB low:256kB
high:384kB active:111032kB inactive:864kB present:131008kB
May 20 20:33:25 romulus kernel: protections[]: 0 0 0
May 20 20:33:25 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:25 romulus kernel: Normal: 2324*4kB 0*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9296kB
May 20 20:33:26 romulus kernel: HighMem: 21*4kB 5*8kB 2*16kB 1*32kB 1*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 252kB
May 20 20:33:26 romulus kernel: Swap cache: add 133090, delete 131714, find
294133/298432, race 0+18
May 20 20:33:26 romulus kernel: Out of Memory: Killed process 555 (sshd).
May 20 20:33:26 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:26 romulus kernel: DMA per-cpu:
May 20 20:33:26 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:26 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:26 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:26 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:27 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:27 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:28 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:30 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:32 romulus kernel: Normal per-cpu:
May 20 20:33:32 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:33 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:35 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:36 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:38 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:39 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:39 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:39 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:39 romulus kernel: HighMem per-cpu:
May 20 20:33:39 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel:
May 20 20:33:39 romulus kernel: Free pages:        8968kB (560kB HighMem)
May 20 20:33:39 romulus kernel: Active:13979 inactive:14354 dirty:2
writeback:480 unstable:0 free:2242 slab:217813 mapped:20388 pagetables:4802
May 20 20:33:39 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB
active:0kB inactive:0kB present:16384kB
May 20 20:33:39 romulus kernel: protections[]: 0 0 0
May 20 20:33:39 romulus kernel: Normal free:8392kB min:936kB low:1872kB
high:2808kB active:432kB inactive:1160kB present:901120kB
May 20 20:33:39 romulus kernel: protections[]: 0 0 0
May 20 20:33:39 romulus kernel: HighMem free:560kB min:128kB low:256kB
high:384kB active:55416kB inactive:56280kB present:131008kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:40 romulus kernel: Normal: 2098*4kB 0*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8392kB
May 20 20:33:40 romulus kernel: HighMem: 28*4kB 16*8kB 8*16kB 4*32kB 1*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 560kB
May 20 20:33:40 romulus kernel: Swap cache: add 141570, delete 133003, find
294450/298900, race 0+18
May 20 20:33:40 romulus kernel: Out of Memory: Killed process 10100 (sshd).
May 20 20:33:40 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:40 romulus kernel: DMA per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: Normal per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: HighMem per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel:
May 20 20:33:40 romulus kernel: Free pages:        9688kB (840kB HighMem)
May 20 20:33:40 romulus kernel: Active:14323 inactive:13859 dirty:0 writeback:6
unstable:0 free:2422 slab:217819 mapped:20454 pagetables:4688
May 20 20:33:40 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB
active:0kB inactive:0kB present:16384kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: Normal free:8832kB min:936kB low:1872kB
high:2808kB active:664kB inactive:624kB present:901120kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: HighMem free:840kB min:128kB low:256kB
high:384kB active:57212kB inactive:54484kB present:131008kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:40 romulus kernel: Normal: 2208*4kB 0*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8832kB
May 20 20:33:40 romulus kernel: HighMem: 72*4kB 31*8kB 5*16kB 5*32kB 1*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 840kB
May 20 20:33:40 romulus kernel: Swap cache: add 143367, delete 135261, find
294951/299685, race 0+18
May 20 20:33:40 romulus kernel: Out of Memory: Killed process 1236 (imap-login).
May 20 20:33:40 romulus kernel: Fixed up OOM kill of mm-less task
May 20 20:33:40 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:40 romulus kernel: DMA per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: Normal per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: HighMem per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel:
May 20 20:33:41 romulus kernel: Free pages:       10872kB (952kB HighMem)
May 20 20:33:41 romulus kernel: Active:15073 inactive:13109 dirty:22
writeback:28 unstable:0 free:2718 slab:217672 mapped:20860 pagetables:4505
May 20 20:33:41 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB
active:0kB inactive:0kB present:16384kB
May 20 20:33:41 romulus kernel: protections[]: 0 0 0
May 20 20:33:41 romulus kernel: Normal free:9904kB min:936kB low:1872kB
high:2808kB active:308kB inactive:368kB present:901120kB
May 20 20:33:41 romulus kernel: protections[]: 0 0 0
May 20 20:33:41 romulus kernel: HighMem free:952kB min:128kB low:256kB
high:384kB active:60008kB inactive:51932kB present:131008kB
May 20 20:33:41 romulus kernel: protections[]: 0 0 0
May 20 20:33:41 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:41 romulus kernel: Normal: 2476*4kB 0*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9904kB
May 20 20:33:41 romulus kernel: HighMem: 68*4kB 35*8kB 9*16kB 6*32kB 1*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 952kB
May 20 20:33:41 romulus kernel: Swap cache: add 144163, delete 136573, find
295354/300210, race 0+18
May 20 20:33:41 romulus kernel: Out of Memory: Killed process 879 (vdelivermail).
May 20 20:33:41 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:41 romulus kernel: DMA per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: Normal per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:42 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:42 romulus kernel: HighMem per-cpu:
May 20 20:33:42 romulus kernel: cpu 0 hot: low 14, high 42, batch 7

It goes on and on... let me know if more is desired.
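(Editor's note: for anyone reading these dumps, each per-zone free total is just the sum over buddy-allocator orders, count*chunk-size. A minimal sketch that checks one of the "Normal:" lines above; the line text is copied from the log, everything else is illustrative.)

```python
import re

def buddy_free_kb(line):
    """Sum the per-order free chunks ("count*sizekB") in a buddy-allocator line."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))

# One of the "Normal:" zone lines from the dump above:
line = ("Normal: 2324*4kB 0*8kB 0*16kB 0*32kB 0*64kB "
        "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB")

print(buddy_free_kb(line))  # 2324*4 = 9296, matching the "= 9296kB" in the log
```

Note that only order-0 (4 kB) pages are free in the Normal zone here, i.e. lowmem is badly fragmented as well as nearly exhausted.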

[root@romulus ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0              95500532   2564248  88085064   3% /
/dev/md2             385018324  69642340 295818128  20% /home
none                    517336         0    517336   0% /dev/shm
/dev/md1              95492532    797288  89844424   1% /var

[root@romulus ~]# mount
/dev/md0 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/md2 on /home type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/md1 on /var type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)


I suppose we can give the hugemem kernel a try.
Comment 30 jlewis 2005-05-25 09:03:13 EDT
We ran 2.6.9-5.0.5.ELhugemem for about 4 days before this happened again. 
Unfortunately, I haven't been able to get into the system to do any
troubleshooting once this starts.  We generally have to just have someone power
cycle it and then look through the logs to see what happened.  We have new RAM
to swap in, but I don't really expect that to help.
Comment 31 jlewis 2005-05-25 09:24:36 EDT
BTW...this looks like it might be similar to, if not the same bug as 149609 and
132562.
Comment 32 Larry Woodman 2005-05-25 13:20:56 EDT
The problem here is that the slab cache is consuming just about all of
lowmem (slab:217819).  The above /proc/slabinfo output shows most of the memory
in the dentry cache and buffer heads, both of which should have been shrunk by
kswapd.

Please attach an Alt-SysRq-T output when this happens so I can see what kswapd
is doing, and a /proc/slabinfo output from the hugemem kernel when the OOM
kills happen, so I can verify that the same problem is happening with that
kernel.

Thanks for your help and patience, Larry Woodman
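(Editor's note: the /proc/slabinfo capture Larry asks for can be ranked by approximate footprint to spot the dentry_cache/buffer_head growth quickly. A hedged sketch; the sample lines and object counts below are made up for illustration, and the field layout assumed is name / active_objs / num_objs / objsize, as in 2.6-era slabinfo.)

```python
# Rank slab caches by approximate memory footprint from /proc/slabinfo text.
# SAMPLE is hypothetical; in practice feed it open("/proc/slabinfo").read()
# with the two header lines skipped.
SAMPLE = """\
dentry_cache  812345 812400 140 28 1
buffer_head   623000 623100  52 75 1
inode_cache    12000  12050 512  8 1
"""

def top_slabs(text, n=3):
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        rows.append((name, num_objs * objsize // 1024))  # approximate kB
    return sorted(rows, key=lambda r: -r[1])[:n]

for name, kb in top_slabs(SAMPLE):
    print(f"{name:14s} ~{kb} kB")
```

A dump like the one in this bug, where dentry_cache and buffer_head together account for most of the ~850 MB of slab, would show up at the top of this ranking.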
Comment 33 Dave Jones 2005-05-25 15:09:38 EDT
John is running 5.0.5, which doesn't have any of the leak fixes that went into
U1, so this is probably an issue unrelated to this bug.

Comment 35 Tim Powers 2005-06-08 11:13:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html
