Bug 112891
Summary: | OOM killer strikes with lots of free swap space | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Martin Wilck <martin.wilck> | ||||
Component: | kernel | Assignee: | Larry Woodman <lwoodman> | ||||
Status: | CLOSED ERRATA | QA Contact: | |||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 3.0 | CC: | chrisw, david.browne, hfuchi, michigantech, nobody+wcheng, petrides, raimondi, riel, rperkins, tao | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i686 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2004-08-05 21:26:33 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 107562, 114888 | ||||||
Attachments: |
|
Description
Martin Wilck
2004-01-05 14:43:25 UTC
Created attachment 96768 [details]
Typical system log showing OOM messages
1) please repeat with the latest available kernel instead 2) the dpt_i2o driver isn't too good and can't do direct dma over 1Gb and uses/allocated bounce buffers; on a 8Gb machine this is quite inefficient. We were able to alleviate the problem partially by re-adding the lines + /* + * Enough swap space left? Not OOM. + */ + if (nr_swap_pages > 0) + return; at the beginning of out_of_memory() in mm/oom_kill.c. These lines are deleted by the rmap patch (linux-2.4.21-rmap.patch) of the 2.4.21-4.EL kernel. More precisely, they were removed with rmap-14b as of 2002/09/17. With these lines re-added, the overall system behaviour is more graceful. Processes aren't killed anymore. However the system still runs into extreme memory pressure which is hard to explain under the given load conditions. The newest EL3.0 kernel I know about is -4.0.1EL. The 5 added lines in that kernel are unlikely to have solved this problem AFAICT. (We will try it nonetheless). Or do you want us to try the beta release? We know about the bounce buffer problem. But our Japanese colleagues need to release new machines with this controller and driver. The machine I sent the log for had 3GB which is pretty common in today's servers. Anyway, earlier RedHat kernels did not show this problem (the current AS2.1 kernel doesn't, for example). We checked -6.EL now and it shows the same behavior as -4.EL, i.e. no improvement. Processes are still killed, although there is always free memory available (both high and low), and swap isn't used at all. More info tomorrow. Took me somewhat longer to get information. First of all, I confirm that the problem still occurs with -6.EL. My most impressive finding is that the number of inactive_clean pages in zone "Normal" becomes _negative_. Yes, negative. I have put a call to show_mem into the oom_kill function in order to see what's happening, here is the output when the killer strikes. See the "ic:-4" below. Jan 15 00:22:53 egal kernel: ENOMEM in journal_get_undo_access_Rsmp_6317abf4, retrying. Jan 15 00:22:53 egal kernel: ENOMEM in journal_alloc_journal_head, retrying. Jan 15 00:22:54 egal kernel: OOM Jan 15 00:22:54 egal kernel: Mem-info: Jan 15 00:22:54 egal kernel: Zone:DMA freepages: 2099 min: 0 low: 0 high: 0 Jan 15 00:22:54 egal kernel: Zone:Normal freepages: 320 min: 1279 low: 4544 high: 6304 Jan 15 00:22:54 egal kernel: Zone:HighMem freepages:526920 min: 255 low: 12282 high: 18423 Jan 15 00:22:54 egal kernel: Free pages: 529339 (526920 HighMem) Jan 15 00:22:54 egal kernel: ( Active: 261343/151150, inactive_laundry: 1881, inactive_clean: 2684, free: 529339 ) Jan 15 00:22:54 egal kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2099 Jan 15 00:22:54 egal kernel: aa:5 ac:100069 id:59756 il:0 ic:-4 fr:320 Jan 15 00:22:54 egal kernel: aa:15713 ac:145556 id:91394 il:1881 ic:2688 fr:526920 Jan 15 00:22:54 egal kernel: 1*4kB 1*8kB 2*16kB 3*32kB 1*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8396kB) Jan 15 00:22:55 egal kernel: 0*4kB 0*8kB 0*16kB 4*32kB 4*64kB 3*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1280kB) Jan 15 00:22:56 egal kernel: 79722*4kB 66197*8kB 35599*16kB 11665*32kB 2523*64kB 798*128kB 28*256kB 9*512kB 4*1024kB 0*2048kB 9*4096kB = 2107680kB) Jan 15 00:22:57 egal kernel: Swap cache: add 491, delete 335, find 768/953, race 0+0 Jan 15 00:22:57 egal kernel: 25438 pages of slabcache Jan 15 00:22:57 egal kernel: 176 pages of kernel stacks Jan 15 00:22:57 egal kernel: 0 lowmem pagetables, 405 highmem pagetables Jan 15 00:22:57 egal kernel: Free swap: 2095680kB Jan 15 00:22:57 egal kernel: 1015536 pages of RAM Jan 15 00:22:57 egal kernel: 786160 pages of HIGHMEM Jan 15 00:22:57 egal kernel: 16106 reserved pages Jan 15 00:22:57 egal kernel: 127598 pages shared Jan 15 00:22:57 egal kernel: 156 pages swap cached Jan 15 00:22:57 egal kernel: f6549e40 c01590ab c02b8704 004a5b9c 00000000 c015919d 00000759 c01536ed Jan 15 00:22:57 egal kernel: ffffffff 00000000 f6548038 f6548000 00000001 000001f0 00000000 c0153dc1 Jan 15 00:22:57 egal kernel: 000001f0 c03a7608 00000000 c0155b47 c03a7600 00000000 00000000 00000001 Jan 15 00:22:57 egal kernel: Call Trace: [<c01590ab>] oom_kill [kernel] 0x1b (0xf6549e44) Jan 15 00:22:57 egal kernel: [<c015919d>] out_of_memory [kernel] 0x5d (0xf6549e54) Jan 15 00:22:57 egal kernel: [<c01536ed>] do_try_to_free_pages [kernel] 0x13d (0xf6549e5c) Jan 15 00:22:57 egal kernel: [<c0153dc1>] try_to_free_pages [kernel] 0x51 (0xf6549e7c) Jan 15 00:22:59 egal kernel: [<c0155b47>] __alloc_pages [kernel] 0x187 (0xf6549e8c) Jan 15 00:23:00 egal kernel: [<c0220960>] sock_wfree [kernel] 0x0 (0xf6549ea8) Jan 15 00:23:00 egal kernel: [<c0221f59>] __kfree_skb [kernel] 0x139 (0xf6549eac) Jan 15 00:23:01 egal kernel: [<c0155d8c>] __get_free_pages [kernel] 0x1c (0xf6549ed0) Jan 15 00:23:03 egal kernel: [<c0175101>] __pollwait [kernel] 0x31 (0xf6549ed4) Jan 15 00:23:07 egal kernel: [<c024a9a6>] tcp_poll [kernel] 0x36 (0xf6549ee8) Jan 15 00:23:07 egal kernel: [<c021e33c>] sock_poll [kernel] 0x2c (0xf6549f08) Jan 15 00:23:07 egal kernel: [<c01754de>] do_select [kernel] 0x24e (0xf6549f1c) Jan 15 00:23:08 egal kernel: [<c017586e>] sys_select [kernel] 0x34e (0xf6549f60) Jan 15 00:23:09 egal kernel: Jan 15 00:23:10 egal kernel: Out of Memory: Killed process 21152 (nautilus). Jan 15 00:23:10 egal kernel: Out of Memory: Killed process 21165 (nautilus). I think I know why the zone->inactive_clean_pages counter becomes negative. It is this part of reclaim_page() (mm/vmscan.c): /* You might think that we should never ever get here. * But you'd be wrong. * * The VM can grab temporary page references while * scanning pages. If the page gets unlinked while such * a reference is held, we end up here with the * page->count == 1 (the temp ref), but no mapping. */ del_page_from_inactive_clean_list(page); add_page_to_inactive_dirty_list(page); zone->inactive_clean_pages--; del_page_from_inactive_clean_list() decrements the inactive_clean_pages counter, and then it is decremented again two lines below! I am just trying out if this is really the reason for our problem. Confirmed. Whenever the above code is executed, the zone->inactive_clean_pages counter goes negative. I have put a message and a correction there. It appears that the OOM killer doesn' strike anymore. However we need to continue this test to be sure. Anyway I strongly recommend to remove the line zone->inactive_clean_pages--; from vmscan.c. I wonder why it was inserted there in the first place. Just another pick: the signedness of the counters for pages in different LRU lists is not treated consistently. Eg they are "unsigned long" in the zone struct but "int" in functions such as zone_free_limit() (mm_inline.h). Fixed and commited the patch. It will be in the U2 update. Larry Please note that the same bug appears to be in RH9 kernels, too. Bad news: an over-the-weekend test of our test load got stock after about 12h (ping works, but nothing else) with the -6.EL kernel and the "zone->inactive_clean_pages--;" removed. I don't know the reason yet. Next thing we'll try is -9.EL with that patch applied. Regards, Martin reclaim_page() decrements the zone->inactive_clean_pages twice,causing it to go negative. This results in incorrect oom kills. --- linux-2.4.21/mm/vmscan.c.orig 2004-01-16 09:52:10.000000000 -0500 +++ linux-2.4.21/mm/vmscan.c 2004-01-16 09:52:16.000000000 -0500 @@ -209,7 +209,6 @@ */ del_page_from_inactive_clean_list(page); add_page_to_inactive_dirty_list(page); - zone->inactive_clean_pages--; pte_chain_unlock(page); UnlockPage(page); } I fixed this last week so it will be in U2. Here is the patch for that fix. Larry Yes, I know, thanks. When is "U2" going to be released? The problems reported in comment #11 occured with -6.EL with that patch already applied. I wasn't able to reproduce them so far, though. Regards, Martin It would be nice if you could make bug 107562 readable for me. Bug #107562 is an internal-only tracking bug that we use to group certain bugs together. Ok thanks. That means there are no real bugs that depend on this one? Martin Wilck- yes, you are correct, there is not another bug that depends upon this one. FYI, I am Redhat's new partner manager for FSC. Can we get someone at Fujitsu-Siemens to verify that this patch applied to the RHEL3-U1 kernel fixes the OOM kill problems discribed in this bug? The same patch will apply to the 2.4.21-9.EL kernel without any problems. Thank You, Larry Woodman Yes, the patch does fix those problems. We verified this on several machines, running under high load for several days. Sorry, I just realized we didn't communicate that clearly yet. Please forget my comment #11, I have no clue what happened there and I haven't been able to reproduce it. Martin Martin has confirmed with me that the patch as applied to the 2.4.21-9.EL kernel does indeed resolve this problem. Larry's fix will be committed to the RHEL3 U2 patch pool this week. I have put this bug back into the "assigned" state until the fix is committed, at which time I'll change it to "modified" to confirm the commit. We also intend to include Larry's fix in the next available RHEL3 security errata. The availability date for this has not yet been determined, but it is likely to be sometime later this month. Thank you. The fix for this problem was committed to the RHEL3 U2 patch pool last night, and is initially available internally at Red Hat in the Engineering build of kernel version 2.4.21-9.6.EL. In order to honor Fujitsu's request for early availability of this patch, this fix was also committed to the build of kernel version 2.4.21-9.0.1.EL, which will be made available at the end of next week as part of the next Red Hat Security Advisory (RHSA-2004:066). This fix is now available to customers via RHN in today's release of Red Hat Security Advisory RHSA-2004:066. FROM ISSUE TRACKER 29298 Event posted 02-21-2004 07:08pm by glen.foster with duration of 0.00 Issue Tracker 29298 documents the code changes needed for certification tests to run, so I believe I am not the owner of this issue anymore. My "OOM" was a red herring, but clearly there are others experiencing real OOM issues... so this issue should remain open. I realize the above bug is fixed, and I've verified that the above patch is in my U2 kernel, and, my inactive clean pages is not dropping below zero, and I'm not getting any messages logged at all when my nodes hang. Otherwise, I seem to be having the same problem ;) As this is the closest bug to the problem I'm experiencing, I'm guessing I'm doing something wrong... HPC apps tend to hog RAM and CPU cycles. In previous distributions (i.e. 7.3), hogging too much memory leads to swapping and a badly performing HPC app (that doesn't hog lots of CPU while swapping). In RHEL, with the -15smp kernel, if an app starts requesting more memory than is physically available, then, rather than swapping, the node hangs hard (even though there's plenty of swap space to handle these apps going overboard). These are dual XEONs w/ 2GB RAM. Although the node is hung, I can get a Magic SysReq on memory: SysRq : Show Memory Mem-info: Zone:DMA freepages: 3131 min: 0 low: 0 high: 0 Zone:Normal freepages: 320 min: 1279 low: 4544 high: 6304 Zone:HighMem freepages: 254 min: 255 low: 4608 high: 6912 Free pages: 3705 ( 254 HighMem) ( Active: 484476/5763, inactive_laundry: 28, inactive_clean: 0, free: 3705 ) aa:0 ac:0 id:0 il:0 ic:0 fr:3131 aa:198239 ac:3771 id:2684 il:28 ic:0 fr:320 aa:279423 ac:2805 id:3318 il:0 ic:0 fr:254 3*4kB 2*8kB 3*16kB 3*32kB 1*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12524kB) 4*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1280kB) 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1016kB) Swap cache: add 44665, delete 44665, find 1415/10133, race 0+0 8364 pages of slabcache 150 pages of kernel stacks 0 lowmem pagetables, 1220 highmem pagetables Free swap: 0kB 524288 pages of RAM 294912 pages of HIGHMEM 8672 reserved pages 48835 pages shared 0 pages swap cached This is as compared to like-configured node that is not experiencing problems: Mem-info: Zone:DMA freepages: 3131 min: 0 low: 0 high: 0 Zone:Normal freepages: 59724 min: 1279 low: 4544 high: 6304 Zone:HighMem freepages: 4325 min: 255 low: 4608 high: 6912 Free pages: 67180 ( 4325 HighMem) ( Active: 371202/42460, inactive_laundry: 12345, inactive_clean: 468, free: 67180 ) aa:0 ac:0 id:0 il:0 ic:0 fr:3131 aa:133220 ac:5279 id:4586 il:1415 ic:4 fr:59724 aa:228287 ac:4416 id:37874 il:10930 ic:464 fr:4325 3*4kB 2*8kB 3*16kB 3*32kB 1*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12524kB) 64*4kB 4*8kB 129*16kB 84*32kB 34*64kB 14*128kB 18*256kB 14*512kB 7*1024kB 11*2048kB 46*4096kB = 238896kB) 9*4kB 2*8kB 2*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 4*4096kB = 17300kB) Swap cache: add 9229, delete 122, find 355/355, race 0+0 8650 pages of slabcache 150 pages of kernel stacks 0 lowmem pagetables, 1044 highmem pagetables Free swap: 67984kB 524288 pages of RAM 294912 pages of HIGHMEM 8672 reserved pages 34404 pages shared 9107 pages swap cached Can anybody see what I'm doing wrong? Nevermind. It looks like I had an order of magnitude less swap than I thought I had. Hi, We are on RHAS 3.0 2.4.21-9.0.1.ELsmp #1 SMP. The story goes that we are using Oracle and the Oracle Clustered File System (OCFS). I am simply trying to cp file from an ext filesystem to the OCFS and the boxing keeps running out of memory...we have 12GB of RAM and shed loads of swap which is never being used ?? There is nothing else running on the box . Heres the messages file extract Aug 3 12:03:23 dev1 kernel: aacraid:Battery charge is now OK Aug 3 12:14:20 dev1 sshd(pam_unix)[2749]: session closed for user root Aug 3 12:25:25 dev1 sshd(pam_unix)[2943]: session opened for user root by (uid=0) Aug 3 12:46:09 dev1 sshd(pam_unix)[3016]: session opened for user root by (uid=0) Aug 3 12:51:19 dev1 kernel: Out of Memory: Killed process 2620 (httpd). Aug 3 12:56:28 dev1 EV_AGENT[2171]: Agent Main --Estream construct failed. Err: RWSockErr: in RWSocket::getpeername: NOTCONN Aug 3 12:56:28 dev1 EV_AGENT[2171]: Agent Main --Estream construct failed. Err: RWSockErr: in RWSocket::getpeername: NOTCONN Aug 3 12:56:28 dev1 kernel: ENOMEM in journal_get_undo_access_Rsmp_6317abf4, retrying. Aug 3 12:56:28 dev1 kernel: Out of Memory: Killed process 2621 (httpd). Aug 3 12:56:33 dev1 kernel: Out of Memory: Killed process 2622 (httpd). Aug 3 12:56:38 dev1 kernel: Out of Memory: Killed process 2623 (httpd). Aug 3 12:56:43 dev1 kernel: Out of Memory: Killed process 2624 (httpd). Aug 3 12:56:48 dev1 kernel: Out of Memory: Killed process 2625 (httpd). Aug 3 12:56:53 dev1 kernel: Out of Memory: Killed process 2626 (httpd). Aug 3 12:56:58 dev1 kernel: Out of Memory: Killed process 2627 (httpd). Aug 3 12:57:03 dev1 kernel: Out of Memory: Killed process 2561 (sendmail). Aug 3 12:57:08 dev1 kernel: Out of Memory: Killed process 2679 (xfs). Aug 3 13:02:29 dev1 kernel: Out of Memory: Killed process 3076 (httpd). Aug 3 13:02:29 dev1 EV_AGENT[2171]: Agent Main --Estream construct failed. Err: RWSockErr: in RWSocket::getpeername: NOTCONN Aug 3 13:02:29 dev1 kernel: ENOMEM in journal_get_undo_access_Rsmp_6317abf4, retrying. Aug 3 13:02:30 dev1 sshd(pam_unix)[2943]: session closed for user root Aug 3 13:02:34 dev1 kernel: Out of Memory: Killed process 3077 (httpd). Aug 3 13:02:39 dev1 kernel: Out of Memory: Killed process 3078 (httpd). Aug 3 13:02:44 dev1 kernel: Out of Memory: Killed process 3079 (httpd). Aug 3 13:02:49 dev1 kernel: Out of Memory: Killed process 1895 (rpc.statd). Aug 3 13:02:54 dev1 kernel: Out of Memory: Killed process 1876 (portmap). Aug 3 13:02:59 dev1 kernel: Out of Memory: Killed process 2581 (httpd). Aug 3 13:03:04 dev1 kernel: Out of Memory: Killed process 2294 (ntpd). Aug 3 13:03:09 dev1 kernel: Out of Memory: Killed process 2731 (mdmpd). Aug 3 13:03:09 dev1 kernel: Out of Memory: Killed process 2732 (mdmpd). Aug 3 13:03:14 dev1 kernel: Out of Memory: Killed process 2171 (naviagent). Aug 3 13:03:14 dev1 EV_AGENT[2171]: Signal TERM received Aug 3 13:03:14 dev1 EV_AGENT[2171]: Agent is going down Aug 3 13:45:40 dev1 syslogd 1.4.1: restart. Aug 3 13:45:40 dev1 syslog: syslogd startup succeeded Update 2 has been applied to this box. David, the 2.4.21-9 kernel you are running is quite old. Since that kernel was distributed, we have made several changes to eliminate or at lease reduce the occurance or OOM kills. In addition, we have added more debugging code the the OOM kill path so that memory statistics are written out to the messages file when an OOK kill occurs. Please try out the lastest RHEL3-U3 kernel and see if that helps these problems. Its located here: http://people.redhat.com/~lwoodman/.RHEL3/ Larry Woodman Thanks Larry. Doing this today will let you know how it goes. Dave. The fix for the problem (as notated in comment #29) first became available in advisory RHSA-2004:066, so I'm closing this bug. This level of the kernel is too high for our QLA cards (ie. EMC not recognised at this level) so we have had to go back to 2.4.21.15.0.4.ELsmp. As this fix is not included in this kernel - could you advise what we should do now ? OK forget this. We are going with this level as this is the RedHat supported version. Thanks, even 2.4.21-27EL-smp kernel i found out the same problem while running oracle. oracle bk processes would be killed automatically. how can we expect something fixed for this bug shortly. |