Bug 633825
Summary: kswapd0 100%
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Target Milestone: rc
Keywords: ZStream
Fixed In Version: kernel-2.6.32-112.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 12:19:12 UTC
Bug Blocks: 694186
Reporter: Leslie <lphartm>
Assignee: Johannes Weiner <jweiner>
QA Contact: Caspar Zhang <czhang>
CC: dhoward, esandeen, ian.chard, jweiner, jwest, lwang, lwoodman, mzywusko, qcai, riel
Description
Leslie
2010-09-14 13:46:52 UTC
sysrq-t may give you a backtrace of kswapd to see where it's at.

(In reply to comment #2)
> sysrq-t may give you a backtrace of kswapd to see where it's at.

What is the key sequence for sysrq-t?

Using sysrq is described in the kernel docs, i.e. /usr/share/doc/kernel-doc-2.6.32/Documentation/sysrq.txt from the kernel-doc rpm. It depends on whether you're on the physical console, etc. Output will go to dmesg and/or /var/log/messages.

Thanks,
-Eric

Created attachment 447756 [details]
/var/log/messages output
Output from sysrq-t
So, kswapd is here:

Sep 16 07:05:51 zebra kernel: kswapd0 R running task 0 34 2 0x00000000
...
Sep 16 07:05:51 zebra kernel: Call Trace:
Sep 16 07:05:51 zebra kernel: [<ffffffff810668ea>] __cond_resched+0x2a/0x40
Sep 16 07:05:51 zebra kernel: [<ffffffff814d8800>] _cond_resched+0x30/0x40
Sep 16 07:05:51 zebra kernel: [<ffffffff81124d05>] balance_pgdat+0x335/0x760
Sep 16 07:05:51 zebra kernel: [<ffffffff811253f0>] ? isolate_pages_global+0x0/0x250
Sep 16 07:05:51 zebra kernel: [<ffffffff8112524e>] kswapd+0x11e/0x2c0
Sep 16 07:05:51 zebra kernel: [<ffffffff81090d50>] ? autoremove_wake_function+0x0/0x40
Sep 16 07:05:51 zebra kernel: [<ffffffff81125130>] ? kswapd+0x0/0x2c0
Sep 16 07:05:51 zebra kernel: [<ffffffff810909e6>] kthread+0x96/0xa0
Sep 16 07:05:51 zebra kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Sep 16 07:05:51 zebra kernel: [<ffffffff81090950>] ? kthread+0x0/0xa0

If we're in cond_resched, we're here:

out:
	if (!all_zones_ok) {
		cond_resched();
		...
		goto loop_again;
	}

So it looks like we're never getting out of this function. I'm no expert here; I'll pass this off to someone who is :)

-Eric

Leslie,

would you have output of /proc/zoneinfo so we can see if any of the memory zones really are low on memory?

Also, are you running any programs on the system that could be using the memory?

Created attachment 448008 [details]
/proc/zoneinfo
(In reply to comment #7)
> Also, are you running any programs on the system that could be using the
> memory?

Rik: I assigned 3 GB to the VM and have not run anything but a browser. Top is reporting over 2 GB free.

Mem: 2914984k total, 790280k used, 2124704k free, 183660k buffers
Swap: 5144568k total, 0k used, 5144568k free, 188704k cached

Leslie

Looks like we found the problem. This virtual machine has a tiny (24MB) ZONE_NORMAL, which has been almost completely filled with unreclaimable slab pages.

As a consequence, kswapd tries to free pages from this zone but does not succeed.

Node 0, zone Normal
  pages free     0
        min      131
        low      163
        high     196
        scanned  0
        spanned  6144
        present  6060
    nr_free_pages 0
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock 0
    nr_anon_pages 0
    nr_mapped 0
    nr_file_pages 0
    nr_dirty 0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 6136

As a test, could you decrease the amount of memory the virtual machine has by 50MB and see if the issue still happens?

Also, could you attach the full output of "dmesg", so we can see the memory layout in the virtual machine?

Created attachment 448023 [details]
/var/log/messages
(In reply to comment #10)
> As a test, could you decrease the amount of memory the virtual machine has by
> 50MB and see if the issue still happens?

Rik: Sure enough, I reduced the memory by 50 MB and kswapd stopped running at 100%.

Top:
Tasks: 174 total, 3 running, 171 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.7%us, 2.8%sy, 0.0%ni, 91.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2933932k total, 730124k used, 2203808k free, 36252k buffers
Swap: 5144568k total, 0k used, 5144568k free, 268932k cached

Created attachment 449028 [details]
[patch] mm: skip rebalance on hopeless zone
Leslie,
could you test this patch and tell us if it fixes the problem?
@@ -2320,7 +2338,7 @@ void wakeup_kswapd(struct zone *zone, int order)
 		return;
 
 	pgdat = zone->zone_pgdat;
-	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
+	if (zone_needs_scan(zone, order, low_wmark_pages(zone), 0))
 		return;
 	if (pgdat->kswapd_max_order < order)
 		pgdat->kswapd_max_order = order;

Johannes, shouldn't the above be "if (!zone_needs_scan(zone, ...."?

Created attachment 449070 [details]
[patch v2] mm: skip rebalance on hopeless zone
D'oh, you are right, Rik, thanks for spotting it. Rebasing manually was a bad idea. Here is the revision.
Johannes:

How do I apply the patch?

Thanks.
Leslie Hartman

(In reply to comment #16)
> Johannes:
>
> How do I apply the patch?

I prebuilt a kernel for you; please find it at http://people.redhat.com/~jweiner/bz633825/ . `rpm -i kernel*.rpm' should install it in parallel to the old kernel and also set the bootloader to choose this kernel by default.

I applied the patch and tried a number of different memory settings. It worked correctly every time. I noticed that VMware now uses multiples of 4 in their latest version, so I was not able to get the exact amount of memory I had selected the first time. Anyway, I think you have resolved the problem. Thank you for your assistance.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available in kernel-2.6.32-112.el6

Re-tested many times and could not reproduce the problem. Confirmed the patch is included in 131.0.9.el6. Marking SanityOnly. Leslie, can you help verify?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html