Description of problem:
After running the Certification Suite for 2 hours on a PE4125 with SATA drives and 12 GB of RAM, the oom-killer starts killing off processes. Eventually the system locks up.

Version-Release number of selected component (if applicable):

How reproducible:
Every run

Steps to Reproduce:
1. Install RHEL4 RC1 on a PE1425 with SATA drives and 12 GB of RAM.
2. Run Red Hat Certification (redhat-ready).
3. Sit back.

Actual results:
oom-killer starts killing off processes.

Expected results:
No oom-killer.

Additional info:
Created attachment 110985 [details] /var/log/messages file from system.
This was originally thought to be related to BZ #141173. See Larry Woodman's comment: https://bugzilla.redhat.com/beta/show_bug.cgi?id=141173#c120
Also, we were unable to reproduce the failure when the amount of memory was reduced to 5 GB of RAM.
This is being caused by *someone* allocating all of lowmem! There are ~256K pages of lowmem; ~45K pages are in the slabcache and ~4K pages are on the lists here:

Normal free:696kB active:8300kB inactive:7268kB present:901120kB

Given that this only happens when there is lots of highmem (this is a 13GB system with 12GB of highmem/9GB of non-DMA-able memory) and not on smaller systems (a 5GB system with 4GB of highmem/1GB of non-DMA-able memory), I would guess that the other ~200K pages of lowmem are being used in bounce buffers. I'll add bounce buffer accounting to a test kernel so we can see if that's where they are, and we'll have to proceed from there.

Larry Woodman
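Not part of the original comment, but a simple way to watch the suspected lowmem exhaustion while the suite runs (assuming a stock RHEL4 /proc layout; the log path is only a suggestion) is a loop like:

  # sample LowTotal/LowFree once a minute; if lowmem is leaking,
  # LowFree should trend toward zero over the course of the run
  while true; do
      date
      grep -E '^(LowTotal|LowFree)' /proc/meminfo
      sleep 60
  done >> /var/tmp/lowmem-watch.log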
I think this problem has been fixed in the pre-RHEL4-U1 kernel. Basically the bio ref counting was wrong, and that caused bounce pages to leak from lowmem. This is exactly what we are seeing on this system. Please get me a /proc/slabinfo output and try the latest RHEL4-U1 kernel ASAP.

Larry Woodman
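For reference, a minimal way to capture what is being asked for here (file names are only suggestions):

  # record the exact kernel version and snapshot the slab cache
  uname -a > /var/tmp/uname.txt
  cat /proc/slabinfo > /var/tmp/slabinfo-$(date +%Y%m%d-%H%M%S).txt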
The U1 kernel (2.6.9-6.37.EL) is being given to Dell today; U1 ISOs should be available tomorrow. Please test with the U1 kernel and report status here.
The oom-killer still appears with the 2.6.9-6.37.EL SMP kernel. However, I did not see this problem with the hugemem kernel.
Created attachment 113728 [details] sysreport
Created attachment 113729 [details] dmesg log
Created attachment 113730 [details] /var/log/messages
Created attachment 113731 [details] results*.rpm package when running on the SMP kernel

The first failure occurred approximately 2 hours into the run.
Created attachment 113732 [details] result*.rpm package when running on the hugemem kernel

There is no oom-killer.
Created attachment 113790 [details] result*.rpm when running with rhr2-1.1-3

The oom-killer still appears.
Has RH had a chance to look at the most recent failure logs that Danny posted from the regression tests on the U1 Beta kernel? As communicated earlier, this is being tracked as a U1 MUSTFIX.
Can someone simply get me the dmesg output that appears when you get the oom-kills? I can't seem to look at the rpm that was attached.

Also, please get me a "uname -a" output so I can see the exact kernel version and track down the exact patch set it includes.

Thanks, Larry Woodman
> Can someone simply get me the dmesg output that appears when you get the oom-kills?

See comment #11 for dmesg output.

> Also, please get me a "uname -a" output so I can see the exact kernel version

The sysreport attached in comment #10 should have comprehensive information on the state of the system. Here is the requested info anyway:

Kernel version: 2.6.9-6.37.ELsmp
Arch: x86
OK, this is a 13GB system (3342336 pages of RAM) running the SMP kernel (3G/1G). While we do officially support this, you are much better off running the Hugemem (4G/4G) kernel, because the cause of the OOM kills is lowmem exhaustion (Normal free:688kB active:8800kB inactive:8152kB present:901120kB).

Having said all that, I would guess that you are running some sort of driver that is either leaking memory or using memory as a cache instead of using the slabcache. This is consuming all of lowmem, which combined with running the SMP kernel is causing the OOM kills.

>>>writeback:3348 slab:36667
>>>Normal free:688kB active:8800kB inactive:8152kB present:901120kB

To help debug the lowmem leakage:

1.) reboot the SMP kernel and get an AltSysrq-M output before running anything.
2.) get me a /proc/slabinfo output as soon as an OOM kill occurs.
3.) get an lsmod so I can see what drivers are being used.
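A sketch of how the three steps above could be collected from a shell on the test system (the write to /proc/sysrq-trigger is equivalent to pressing Alt-SysRq-M on the console; paths and file names are only suggestions):

  echo 1 > /proc/sys/kernel/sysrq          # make sure magic SysRq is enabled
  echo m > /proc/sysrq-trigger             # step 1: dump memory info right after boot
  dmesg > /var/tmp/sysrq-m-baseline.txt
  lsmod > /var/tmp/lsmod.txt               # step 3: loaded drivers
  # step 2: as soon as an OOM kill shows up in /var/log/messages
  cat /proc/slabinfo > /var/tmp/slabinfo-after-oom.txt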
> OK, this is a 13GB system?

This is a SC1425 system with 12 GB RAM. To be precise, it has 6x2GB single-rank DIMMs.

> running the SMP kernel(3G/1G)

Yes. The Hugemem kernel passes fine.

> 3.) get an lsmod so I can see what drivers are being used.

This was an untainted kernel and nothing outside of what was on the RHEL 4 media was installed. Extracting the lsmod output from the sysreport attached in comment #10:

Module                  Size  Used by
iptable_nat            27236  0
ip_conntrack           45701  1 iptable_nat
iptable_mangle          6721  0
iptable_filter          6721  0
ip_tables              21441  3 iptable_nat,iptable_mangle,iptable_filter
nfsd                  205281  9
exportfs               10049  1 nfsd
lockd                  65257  2 nfsd
md5                     8001  1
ipv6                  238817  20
parport_pc             27905  0
lp                     15405  0
parport                37641  2 parport_pc,lp
autofs4                22085  0
i2c_dev                14273  0
i2c_core               25921  1 i2c_dev
sunrpc                138789  19 nfsd,lockd
dm_mod                 58949  0
button                 10449  0
battery                12869  0
ac                      8773  0
uhci_hcd               32729  0
ehci_hcd               31813  0
hw_random               9557  0
e1000                  83989  0
ext3                  118729  2
jbd                    59481  1 ext3
ata_piix               13125  4
libata                 47133  1 ata_piix
sd_mod                 20545  5
scsi_mod              116429  2 libata,sd_mod

> 1.) reboot the SMP kernel and get an AltSysrq-M output before running anything.

The system is not available anymore. The sysreport (from comment #10) does have /proc/meminfo; this was captured after the failures, though. Pasting the /proc/meminfo output:

MemTotal:       515260 kB
MemFree:          8860 kB
Buffers:         93652 kB
Cached:         225828 kB
SwapCached:          0 kB
Active:         389520 kB
Inactive:        44192 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       515260 kB
LowFree:          8860 kB
SwapTotal:     2097136 kB
SwapFree:      2096960 kB
Dirty:              68 kB
Writeback:           0 kB
Mapped:         151136 kB
Slab:            62236 kB
Committed_AS:   421136 kB
PageTables:       3512 kB
VmallocTotal:   499704 kB
VmallocUsed:      4568 kB
VmallocChunk:   491692 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB

Output of 'free' from sysreport:

             total       used       free     shared    buffers     cached
Mem:      12472132     295404   12176728          0      53812      87888
-/+ buffers/cache:      153704   12318428
Swap:      8385912          0    8385912

> 2.) get me a /proc/slabinfo output as soon as an OOM kill occurs.

I am trying to acquire the system so we can restart the test and capture this stateful information. At the moment, what I have is what was captured in the sysreport (comment #10), probably several hours after the OOM was invoked and after the stress was stopped, so it may not be very useful, but I have attached it anyway (slabinfo).
Created attachment 114228 [details] /proc/slabinfo from sysreport
The problem here is that after booting the SMP kernel on a system with this much RAM (3342336 pages), there is only 1/2 of lowmem available (LowTotal: 515260 kB). Based on the slabinfo output, over 400MB of that lowmem is wired in the slabcache. Add a few hundred processes, buffers allocated by the drivers, and lots of bounce buffers for a system with this high a ratio of highmem to lowmem, and avoiding OOM kills will be difficult if not impossible with the SMP kernel. I'm afraid that the only real answer is to run the Hugemem kernel. Is this a problem?

Larry Woodman
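For what it's worth, the per-cache lowmem usage can be estimated directly from a slabinfo dump. Per the standard slabinfo 2.0 header, num_slabs is the 15th whitespace-separated field and pagesperslab the 6th; assuming 4 kB pages, something like:

  # rough kB consumed per slab cache, largest first
  # (point it at the attached slabinfo file instead of /proc/slabinfo if desired)
  awk 'NR > 2 { print $15 * $6 * 4, $1 }' /proc/slabinfo | sort -rn | head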
> I'm afraid that the only real answer is to run the Hugemem kernel. Is this a problem?

It's not a real problem, rather an obscure messaging problem. Dell is currently communicating to its customers that you only need to run the Hugemem kernel if you've got >16GB. In this case, you are suggesting that we run Hugemem when you've got 12GB. There's the disconnect.

The question is: what is RH *officially* messaging to its customers on usage of the SMP vs. Hugemem kernel? If we can fix the messaging, we can close this issue as "Working as Designed".
The reality of the situation is that when the ratio between Highmem and Lowmem exceeds about 10 to 1, the possibility of OOM kills increases significantly.

Larry Woodman
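As a rough check of that ratio on a running box (reading HighTotal and LowTotal straight out of /proc/meminfo):

  awk '/^HighTotal:/ { h = $2 } /^LowTotal:/ { l = $2 }
       END { printf "HighMem/LowMem ratio: %.1f\n", h / l }' /proc/meminfo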
I agree and am not arguing about your theory of why this is happening. All I am saying is that the system config on which this is readily happening happens to be an average config which Dell sells a lot of, i.e. an EM64T system with 12 GB RAM.

Since we cannot document the actual technical HIGHMEM/LOWMEM ratio to our customers as the criterion for using the SMP vs. Hugemem kernel, what do you propose we document? Currently we say UP and SMP for up to 16GB and Hugemem for anything > 16GB.
Amit, I'm a bit confused! Is this an EM64T running an x86 SMP kernel? Larry
I'm sorry, I should clarify. The remark "EM64T system with 12 GB RAM" should have stated "EM64T-capable system with 12 GB RAM". The OS is very much x86; the system (SC1425) is 32/64-bit capable since it has Intel EM64T processors. Sorry for the confusion.
As per the discussion in today's con call, we can close this issue once RH issues a KB article on proper usage of the Hugemem kernel in scenarios prone to OOM-killer invocations, such as the one described in this issue.
I seem to be having this same problem with 2.6.9-5.0.5.ELsmp on a server with only 1GB RAM.

# free
             total       used       free     shared    buffers     cached
Mem:       1034676     485980     548696          0     171348     160672
-/+ buffers/cache:      153960     880716
Swap:      2104432          0    2104432

# lsmod
Module                  Size  Used by
nfsd                  205153  9
exportfs               10049  1 nfsd
lockd                  65129  2 nfsd
sunrpc                137637  19 nfsd,lockd
md5                     8001  1
ipv6                  238945  32
ipt_REJECT             10561  1
ipt_state               5825  5
iptable_filter          6721  1
iptable_nat            27237  1
ip_conntrack           45701  2 ipt_state,iptable_nat
ip_tables              21441  4 ipt_REJECT,ipt_state,iptable_filter,iptable_nat
dm_mod                 57157  0
button                 10449  0
battery                12869  0
ac                      8773  0
uhci_hcd               32473  0
e1000                  82253  0
e100                   35781  0
mii                     8641  1 e100
floppy                 58065  0
qla2200                90817  0
ext3                  118473  3
jbd                    59481  1 ext3
raid5                  24129  1
xor                    17609  1 raid5
raid1                  19521  3
qla2100                83393  0
qla2xxx               109664  16 qla2200,qla2100
scsi_transport_fc      11713  1 qla2xxx
sd_mod                 20545  28
scsi_mod              116301  3 qla2xxx,scsi_transport_fc,sd_mod

Here's a slabinfo from about 36min after a reboot. We don't generally get much time to look at things once the oom-killer goes nuts.

slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4 0
rpc_tasks 8 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
rpc_inode_cache 6 7 512 7 1 : tunables 54 27 8 : slabdata 1 1 0
fib6_nodes 7 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
ip6_dst_cache 7 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
ndisc_cache 1 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
rawv6_sock 6 10 768 5 1 : tunables 54 27 8 : slabdata 2 2 0
udpv6_sock 1 5 768 5 1 : tunables 54 27 8 : slabdata 1 1 0
tcpv6_sock 5 6 1280 3 1 : tunables 24 12 8 : slabdata 2 2 0
ip_fib_alias 14 226 16 226 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_hash 14 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
ip_conntrack_expect 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
ip_conntrack 2708 3400 384 10 1 : tunables 54 27 8 : slabdata 340 340 0
dm_tio 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0 0
dm_io 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0 0
raid5/md2 256 258 1344 3 1 : tunables 24 12 8 : slabdata 86 86 0
uhci_urb_priv 0 0 44 88 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_cmd_cache 215 240 384 10 1 : tunables 54 27 8 : slabdata 24 24 135
ext3_inode_cache 54782 54782 552 7 1 : tunables 54 27 8 : slabdata 7826 7826 0
ext3_xattr 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0 0
journal_handle 199 405 28 135 1 : tunables 120 60 8 : slabdata 3 3 15
journal_head 1255 3159 48 81 1 : tunables 120 60 8 : slabdata 39 39 300
revoke_table 6 290 12 290 1 : tunables 120 60 8 : slabdata 1 1 0
revoke_record 37 226 16 226 1 : tunables 120 60 8 : slabdata 1 1 0
qla2xxx_srbs 432 496 128 31 1 : tunables 120 60 8 : slabdata 16 16 120
sgpool-128 32 33 2560 3 2 : tunables 24 12 8 : slabdata 11 11 0
sgpool-64 32 33 1280 3 1 : tunables 24 12 8 : slabdata 11 11 0
sgpool-32 33 36 640 6 1 : tunables 54 27 8 : slabdata 6 6 0
sgpool-16 59 60 384 10 1 : tunables 54 27 8 : slabdata 6 6 0
sgpool-8 234 330 256 15 1 : tunables 120 60 8 : slabdata 22 22 60
unix_sock 105 105 512 7 1 : tunables 54 27 8 : slabdata 15 15 0
ip_mrt_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
tcp_tw_bucket 606 744 128 31 1 : tunables 120 60 8 : slabdata 24 24 0
tcp_bind_bucket 858 1582 16 226 1 : tunables 120 60 8 : slabdata 7 7 60
tcp_open_request 24 31 128 31 1 : tunables 120 60 8 : slabdata 1 1 0
inet_peer_cache 99 122 64 61 1 : tunables 120 60 8 : slabdata 2 2 0
secpath_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
ip_dst_cache 174 195 256 15 1 : tunables 120 60 8 : slabdata 13 13 0
arp_cache 6 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
raw_sock 5 6 640 6 1 : tunables 54 27 8 : slabdata 1 1 0
udp_sock 11 24 640 6 1 : tunables 54 27 8 : slabdata 4 4 0
tcp_sock 323 616 1152 7 2 : tunables 24 12 8 : slabdata 88 88 84
flow_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
mqueue_inode_cache 1 6 640 6 1 : tunables 54 27 8 : slabdata 1 1 0
isofs_inode_cache 0 0 372 10 1 : tunables 54 27 8 : slabdata 0 0 0
hugetlbfs_inode_cache 1 11 344 11 1 : tunables 54 27 8 : slabdata 1 1 0
ext2_inode_cache 0 0 488 8 1 : tunables 54 27 8 : slabdata 0 0 0
ext2_xattr 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0 0
dquot 0 0 144 27 1 : tunables 120 60 8 : slabdata 0 0 0
eventpoll_pwq 3 107 36 107 1 : tunables 120 60 8 : slabdata 1 1 0
eventpoll_epi 3 31 128 31 1 : tunables 120 60 8 : slabdata 1 1 0
kioctx 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
kiocb 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_cache 1 185 20 185 1 : tunables 120 60 8 : slabdata 1 1 0
fasync_cache 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0 0
shmem_inode_cache 307 333 444 9 1 : tunables 54 27 8 : slabdata 37 37 0
posix_timers_cache 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 13 61 64 61 1 : tunables 120 60 8 : slabdata 1 1 0
cfq_pool 332 357 32 119 1 : tunables 120 60 8 : slabdata 3 3 120
crq_pool 525 960 40 96 1 : tunables 120 60 8 : slabdata 10 10 208
deadline_drq 0 0 52 75 1 : tunables 120 60 8 : slabdata 0 0 0
as_arq 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0 0
blkdev_ioc 130 370 20 185 1 : tunables 120 60 8 : slabdata 2 2 0
blkdev_queue 38 56 488 8 1 : tunables 54 27 8 : slabdata 7 7 0
blkdev_requests 537 825 160 25 1 : tunables 120 60 8 : slabdata 33 33 224
biovec-(256) 256 256 3072 2 2 : tunables 24 12 8 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 8 : slabdata 52 52 0
biovec-64 270 270 768 5 1 : tunables 54 27 8 : slabdata 54 54 0
biovec-16 259 285 256 15 1 : tunables 120 60 8 : slabdata 19 19 0
biovec-4 270 305 64 61 1 : tunables 120 60 8 : slabdata 5 5 0
biovec-1 32590 32996 16 226 1 : tunables 120 60 8 : slabdata 146 146 180
bio 32587 32767 128 31 1 : tunables 120 60 8 : slabdata 1057 1057 180
file_lock_cache 123 123 96 41 1 : tunables 120 60 8 : slabdata 3 3 0
sock_inode_cache 454 735 512 7 1 : tunables 54 27 8 : slabdata 105 105 77
skbuff_head_cache 1800 1860 256 15 1 : tunables 120 60 8 : slabdata 124 124 240
sock 5 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 6655 6655 360 11 1 : tunables 54 27 8 : slabdata 605 605 0
sigqueue 295 297 148 27 1 : tunables 120 60 8 : slabdata 11 11 0
radix_tree_node 18650 18802 276 14 1 : tunables 54 27 8 : slabdata 1343 1343 27
bdev_cache 58 63 512 7 1 : tunables 54 27 8 : slabdata 9 9 0
mnt_cache 29 62 128 31 1 : tunables 120 60 8 : slabdata 2 2 0
inode_cache 1999 2035 344 11 1 : tunables 54 27 8 : slabdata 185 185 0
dentry_cache 127110 127140 152 26 1 : tunables 120 60 8 : slabdata 4890 4890 30
filp 1451 2610 256 15 1 : tunables 120 60 8 : slabdata 174 174 264
names_cache 47 47 4096 1 1 : tunables 24 12 8 : slabdata 47 47 0
avc_node 12 300 52 75 1 : tunables 120 60 8 : slabdata 4 4 0
idr_layer_cache 88 116 136 29 1 : tunables 120 60 8 : slabdata 4 4 0
buffer_head 62085 62100 52 75 1 : tunables 120 60 8 : slabdata 828 828 120
mm_struct 261 435 768 5 1 : tunables 54 27 8 : slabdata 87 87 0
vm_area_struct 4228 8325 88 45 1 : tunables 120 60 8 : slabdata 185 185 360
fs_cache 465 671 64 61 1 : tunables 120 60 8 : slabdata 11 11 0
files_cache 297 448 512 7 1 : tunables 54 27 8 : slabdata 64 64 33
signal_cache 468 713 128 31 1 : tunables 120 60 8 : slabdata 23 23 0
sighand_cache 344 470 1408 5 2 : tunables 24 12 8 : slabdata 94 94 88
task_struct 2407 2530 1408 5 2 : tunables 24 12 8 : slabdata 506 506 88
anon_vma 1315 3616 16 226 1 : tunables 120 60 8 : slabdata 16 16 344
pgd 414 714 32 119 1 : tunables 120 60 8 : slabdata 6 6 7
pmd 618 629 4096 1 1 : tunables 24 12 8 : slabdata 618 629 96
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 3 3 32768 1 8 : tunables 8 4 0 : slabdata 3 3 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 1 1 16384 1 4 : tunables 8 4 0 : slabdata 1 1 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 7 7 8192 1 2 : tunables 8 4 0 : slabdata 7 7 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 3564 3578 4096 1 1 : tunables 24 12 8 : slabdata 3564 3578 72
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 258 258 2048 2 1 : tunables 24 12 8 : slabdata 129 129 36
size-1620(DMA) 0 0 1664 4 2 : tunables 24 12 8 : slabdata 0 0 0
size-1620 35 36 1664 4 2 : tunables 24 12 8 : slabdata 9 9 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 215 252 1024 4 1 : tunables 54 27 8 : slabdata 63 63 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 934 2312 512 8 1 : tunables 54 27 8 : slabdata 289 289 216
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 607 1800 256 15 1 : tunables 120 60 8 : slabdata 120 120 0
size-128(DMA) 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
size-128 3279 4712 128 31 1 : tunables 120 60 8 : slabdata 152 152 60
size-64(DMA) 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0 0
size-64 48922 48922 64 61 1 : tunables 120 60 8 : slabdata 802 802 60
size-32(DMA) 0 0 32 119 1 : tunables 120 60 8 : slabdata 0 0 0
size-32 7728 7973 32 119 1 : tunables 120 60 8 : slabdata 67 67 273
kmem_cache 165 165 256 15 1 : tunables 120 60 8 : slabdata 11 11 0

May 20 20:32:54 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:32:55 romulus kernel: DMA per-cpu:
May 20 20:32:55 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:32:55 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:32:55 romulus kernel: Normal per-cpu:
May 20 20:32:55 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:32:55 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:01 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:04 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:06 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:07 romulus crond(pam_unix)[1481]: session opened for user root by (uid=0)
May 20 20:33:07 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:08 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:10 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:11 romulus kernel: HighMem per-cpu:
May 20 20:33:13 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:15 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:15 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:16 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:17 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:18 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:19 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:20 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:20 romulus kernel:
May 20 20:33:20 romulus kernel: Free pages: 9564kB (252kB HighMem)
May 20 20:33:20 romulus kernel: Active:28055 inactive:310 dirty:0 writeback:26 unstable:0 free:2391 slab:217669 mapped:27615 pagetables:4796
May 20 20:33:22 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
May 20 20:33:24 romulus kernel: protections[]: 0 0 0
May 20 20:33:24 romulus crond(pam_unix)[1481]: session closed for user root
May 20 20:33:24 romulus kernel: Normal free:9296kB min:936kB low:1872kB high:2808kB active:1188kB inactive:376kB present:901120kB
May 20 20:33:24 romulus kernel: protections[]: 0 0 0
May 20 20:33:24 romulus kernel: HighMem free:252kB min:128kB low:256kB high:384kB active:111032kB inactive:864kB present:131008kB
May 20 20:33:25 romulus kernel: protections[]: 0 0 0
May 20 20:33:25 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:25 romulus kernel: Normal: 2324*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9296kB
May 20 20:33:26 romulus kernel: HighMem: 21*4kB 5*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 252kB
May 20 20:33:26 romulus kernel: Swap cache: add 133090, delete 131714, find 294133/298432, race 0+18
May 20 20:33:26 romulus kernel: Out of Memory: Killed process 555 (sshd).
May 20 20:33:26 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:26 romulus kernel: DMA per-cpu:
May 20 20:33:26 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:26 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:26 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:26 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:27 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:27 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:28 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:30 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:32 romulus kernel: Normal per-cpu:
May 20 20:33:32 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:33 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:35 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:36 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:38 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:39 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:39 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:39 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:39 romulus kernel: HighMem per-cpu:
May 20 20:33:39 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:39 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:39 romulus kernel:
May 20 20:33:39 romulus kernel: Free pages: 8968kB (560kB HighMem)
May 20 20:33:39 romulus kernel: Active:13979 inactive:14354 dirty:2 writeback:480 unstable:0 free:2242 slab:217813 mapped:20388 pagetables:4802
May 20 20:33:39 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
May 20 20:33:39 romulus kernel: protections[]: 0 0 0
May 20 20:33:39 romulus kernel: Normal free:8392kB min:936kB low:1872kB high:2808kB active:432kB inactive:1160kB present:901120kB
May 20 20:33:39 romulus kernel: protections[]: 0 0 0
May 20 20:33:39 romulus kernel: HighMem free:560kB min:128kB low:256kB high:384kB active:55416kB inactive:56280kB present:131008kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:40 romulus kernel: Normal: 2098*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8392kB
May 20 20:33:40 romulus kernel: HighMem: 28*4kB 16*8kB 8*16kB 4*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 560kB
May 20 20:33:40 romulus kernel: Swap cache: add 141570, delete 133003, find 294450/298900, race 0+18
May 20 20:33:40 romulus kernel: Out of Memory: Killed process 10100 (sshd).
May 20 20:33:40 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:40 romulus kernel: DMA per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: Normal per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:40 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:40 romulus kernel: HighMem per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:40 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:40 romulus kernel:
May 20 20:33:40 romulus kernel: Free pages: 9688kB (840kB HighMem)
May 20 20:33:40 romulus kernel: Active:14323 inactive:13859 dirty:0 writeback:6 unstable:0 free:2422 slab:217819 mapped:20454 pagetables:4688
May 20 20:33:40 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: Normal free:8832kB min:936kB low:1872kB high:2808kB active:664kB inactive:624kB present:901120kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: HighMem free:840kB min:128kB low:256kB high:384kB active:57212kB inactive:54484kB present:131008kB
May 20 20:33:40 romulus kernel: protections[]: 0 0 0
May 20 20:33:40 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:40 romulus kernel: Normal: 2208*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8832kB
May 20 20:33:40 romulus kernel: HighMem: 72*4kB 31*8kB 5*16kB 5*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 840kB
May 20 20:33:40 romulus kernel: Swap cache: add 143367, delete 135261, find 294951/299685, race 0+18
May 20 20:33:40 romulus kernel: Out of Memory: Killed process 1236 (imap-login).
May 20 20:33:40 romulus kernel: Fixed up OOM kill of mm-less task
May 20 20:33:40 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:40 romulus kernel: DMA per-cpu:
May 20 20:33:40 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:40 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:40 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: Normal per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: HighMem per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel: cpu 1 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel: cpu 2 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel: cpu 3 hot: low 14, high 42, batch 7
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 14, batch 7
May 20 20:33:41 romulus kernel:
May 20 20:33:41 romulus kernel: Free pages: 10872kB (952kB HighMem)
May 20 20:33:41 romulus kernel: Active:15073 inactive:13109 dirty:22 writeback:28 unstable:0 free:2718 slab:217672 mapped:20860 pagetables:4505
May 20 20:33:41 romulus kernel: DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
May 20 20:33:41 romulus kernel: protections[]: 0 0 0
May 20 20:33:41 romulus kernel: Normal free:9904kB min:936kB low:1872kB high:2808kB active:308kB inactive:368kB present:901120kB
May 20 20:33:41 romulus kernel: protections[]: 0 0 0
May 20 20:33:41 romulus kernel: HighMem free:952kB min:128kB low:256kB high:384kB active:60008kB inactive:51932kB present:131008kB
May 20 20:33:41 romulus kernel: protections[]: 0 0 0
May 20 20:33:41 romulus kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
May 20 20:33:41 romulus kernel: Normal: 2476*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9904kB
May 20 20:33:41 romulus kernel: HighMem: 68*4kB 35*8kB 9*16kB 6*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 952kB
May 20 20:33:41 romulus kernel: Swap cache: add 144163, delete 136573, find 295354/300210, race 0+18
May 20 20:33:41 romulus kernel: Out of Memory: Killed process 879 (vdelivermail).
May 20 20:33:41 romulus kernel: oom-killer: gfp_mask=0xd0
May 20 20:33:41 romulus kernel: DMA per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 1 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 2 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: cpu 3 hot: low 2, high 6, batch 1
May 20 20:33:41 romulus kernel: cpu 3 cold: low 0, high 2, batch 1
May 20 20:33:41 romulus kernel: Normal per-cpu:
May 20 20:33:41 romulus kernel: cpu 0 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 0 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 1 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 1 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 2 hot: low 32, high 96, batch 16
May 20 20:33:41 romulus kernel: cpu 2 cold: low 0, high 32, batch 16
May 20 20:33:41 romulus kernel: cpu 3 hot: low 32, high 96, batch 16
May 20 20:33:42 romulus kernel: cpu 3 cold: low 0, high 32, batch 16
May 20 20:33:42 romulus kernel: HighMem per-cpu:
May 20 20:33:42 romulus kernel: cpu 0 hot: low 14, high 42, batch 7

it goes on and on...let me know if more is desired.

[root@romulus ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0              95500532   2564248  88085064   3% /
/dev/md2             385018324  69642340 295818128  20% /home
none                    517336         0    517336   0% /dev/shm
/dev/md1              95492532    797288  89844424   1% /var

[root@romulus ~]# mount
/dev/md0 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/md2 on /home type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/md1 on /var type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

I suppose we can give the hugemem kernel a try.
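If it helps, a sketch of switching the default boot kernel to hugemem (assuming a kernel-hugemem package matching the running kernel version is already installed; the version string below is just the one mentioned in this comment):

  rpm -q kernel-hugemem                                        # confirm the package is installed
  grubby --set-default=/boot/vmlinuz-2.6.9-5.0.5.ELhugemem     # then reboot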
We ran 2.6.9-5.0.5.ELhugemem for about 4 days before this happened again. Unfortunately, I haven't been able to get into the system to do any troubleshooting once this starts. We generally have to just have someone power cycle it and then look through the logs to see what happened. We have new RAM to swap in, but I don't really expect that to help.
BTW, this looks like it might be similar to, if not the same bug as, bugs 149609 and 132562.
The problem here is that the slabcache is consuming just about all of lowmem (slab:217819). The above /proc/slabinfo output shows most of the memory in the dentry cache and buffer heads, both of which should have been shrunk by kswapd. Please attach an AltSysrq-T output when this happens so I can see what kswapd is doing, and a /proc/slabinfo output from the hugemem kernel when the OOM kills happen so I can verify that the same problem is happening with that kernel.

Thanks for your help and patience, Larry Woodman
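A sketch of capturing both of those when the hugemem kernel hits the OOM kills (Alt-SysRq-T on the console works too; the trigger file just makes it scriptable, and the output file names are only suggestions):

  echo 1 > /proc/sys/kernel/sysrq
  echo t > /proc/sysrq-trigger             # AltSysrq-T: dump all task stacks, including kswapd
  dmesg > /var/tmp/sysrq-t-hugemem.txt
  cat /proc/slabinfo > /var/tmp/slabinfo-hugemem-oom.txt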
John is running 5.0.5, which doesn't have any of the leak fixes that went into U1, so this is probably an issue unrelated to this bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-420.html