Bug 767127 - swapper: page allocation failure. order:1, mode:0x20
Summary: swapper: page allocation failure. order:1, mode:0x20
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jerome Marchand
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 846704
 
Reported: 2011-12-13 10:30 UTC by Andre ten Bohmer
Modified: 2018-12-01 18:27 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-20 09:12:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Kernel ring output this morning (64.70 KB, text/plain)
2011-12-13 10:30 UTC, Andre ten Bohmer
no flags Details
iostat output (46.52 KB, text/plain)
2011-12-13 10:33 UTC, Andre ten Bohmer
no flags Details
sister server : Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1 (14.74 KB, text/plain)
2011-12-20 12:06 UTC, Andre ten Bohmer
no flags Details
Kernel ring output this morning (14.89 KB, text/plain)
2011-12-22 11:16 UTC, Andre ten Bohmer
no flags Details
2011-01-24 OOM crash (42.82 KB, image/png)
2012-01-24 09:03 UTC, Andre ten Bohmer
no flags Details

Description Andre ten Bohmer 2011-12-13 10:30:22 UTC
Created attachment 546142 [details]
Kernel ring output this morning

Description of problem:
"swapper: page allocation failure. order:1, mode:0x20" kernel ring messages during heavy io load


Version-Release number of selected component (if applicable):


How reproducible:
Stress this NFS server with lots of I/O from the HPC cluster

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
HP ProLiant BL460c G6
4G memory
1x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
System disk : BootFromSan 50G (HP EVA 8400), LVM2, ext4 partitions
Data disk : 1) 46 TB HP MDS 40 RAID6 LUNS sriped via lvm2
                $ lvcreate -i 40 -I 256  -n Ldata -l 11919320 Vdata
                $ mkfs.xfs -d su=256k,sw=40 /dev/Vdata/Ldata
            2) 6 TB HP EVA LUN, xfs filesystem

MDS /dev/mapper/Vdata-Ldata on /srv/nfs02 type xfs
(rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)
EVA /dev/mapper/mpathap on /srv/nfs03 type xfs
(rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)
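A side note (my arithmetic, not from the report): the mkfs.xfs geometry above is consistent with the LVM stripe set, since `-I 256` is a 256 KiB per-device stripe and `-i 40` the stripe count:

```shell
# Sanity check of the stripe geometry quoted above (illustrative only).
stripe_kb=256   # lvcreate -I 256 (KiB per stripe)
stripes=40      # lvcreate -i 40
su_kb=256       # mkfs.xfs -d su=256k
sw=40           # mkfs.xfs -d sw=40

[ "$su_kb" -eq "$stripe_kb" ] && [ "$sw" -eq "$stripes" ] &&
  echo "geometry matches: full stripe = $((su_kb * sw)) KiB"
```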


Red Hat Enterprise Linux Server release 6.2 (Santiago)
Linux 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
NFS exports to serve as HPC data server

Comment 1 Andre ten Bohmer 2011-12-13 10:33:35 UTC
Created attachment 546143 [details]
iostat output

This iostat output shows 40 cciss LUNs which are connected together via a striped LVM logical volume into a 46 TB XFS file system. This load is not even that extreme; we've seen iostat r/s and w/s values > 2000 and rsec/s and wsec/s > 500,000

Comment 3 Jerome Marchand 2011-12-14 13:38:19 UTC
There apparently are a lot of high-priority allocations being made, and they deplete the available memory. You can try increasing the value of /proc/sys/vm/min_free_kbytes. It may have the side effect of increased swapping.
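A sketch of that tuning, assuming the standard /proc and sysctl interfaces (the value 135168 is only an example, double the RHEL default quoted in this bug):

```shell
# Inspect the current reserve.
cat /proc/sys/vm/min_free_kbytes

# Raise it at runtime (requires root); 135168 doubles the 67584 default here.
sysctl -w vm.min_free_kbytes=135168

# Persist the change across reboots.
echo 'vm.min_free_kbytes = 135168' >> /etc/sysctl.conf
```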

Comment 4 Andre ten Bohmer 2011-12-14 13:55:47 UTC
Thanks!
This is the current setting:
]# sysctl -a | grep vm.min_free_kbytes
vm.min_free_kbytes = 67584

I've found some discussion about increasing this value, but it was all about values well below 67584, so do you have an idea of how much I should increase it?

Maybe a side step, but could it also be related to the NIC driver, and scatter/gather specifically? We use the Red Hat bnx2x driver with a Broadcom Corporation NetXtreme II BCM57711E 10-Gigabit PCIe NIC.

Comment 5 Andre ten Bohmer 2011-12-17 14:09:16 UTC
I've raised vm.min_free_kbytes from default 67584 to 135168. Now I'm seeing these messages in the kernel ring:

XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

Getting closer to the real problem/bug/... ?
Thanks!
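For what it's worth, the mode values in these messages can be decoded against the GFP flag bits of a 2.6.32-era include/linux/gfp.h (the flag values below are my assumption from that kernel generation, not from this report): mode:0x20 is plain GFP_ATOMIC (__GFP_HIGH), and mode:0x250 is GFP_NOFS (__GFP_WAIT|__GFP_IO) plus __GFP_NOWARN. A small decoder sketch:

```shell
# Decode a "page allocation failure ... mode:0x..." GFP mask.
# Bit positions follow include/linux/gfp.h circa 2.6.32 (assumption).
decode_gfp() {
  local mode=$(( $1 )) out="" i
  local names=(__GFP_DMA __GFP_HIGHMEM __GFP_DMA32 __GFP_MOVABLE \
               __GFP_WAIT __GFP_HIGH __GFP_IO __GFP_FS \
               __GFP_COLD __GFP_NOWARN)
  for i in "${!names[@]}"; do
    if (( mode & (1 << i) )); then out="$out ${names[$i]}"; fi
  done
  echo "${out# }"
}

decode_gfp 0x20    # __GFP_HIGH, i.e. GFP_ATOMIC
decode_gfp 0x250   # __GFP_WAIT __GFP_IO __GFP_NOWARN, i.e. GFP_NOFS|__GFP_NOWARN
```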

Comment 6 Andre ten Bohmer 2011-12-20 12:06:23 UTC
Created attachment 548821 [details]
sister server : Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1

Comment 7 Andre ten Bohmer 2011-12-20 12:10:42 UTC
Yesterday the server crashed again under a not-so-heavy I/O load. Because this HP blade server is connected to HP Virtual Connect, it has eight NICs bonded together in Linux. Only two NICs are in use, so in the hope of preserving memory I've disabled the other six (ifconfig eth2 down, etc.).
As seen in the 'sister server' kernel ring attachment (comment 6), bnx2x_start_xmit is logged quite often, so maybe it's a bnx2x driver issue? TIA

Comment 8 Andre ten Bohmer 2011-12-22 11:16:02 UTC
Created attachment 549172 [details]
Kernel ring output this morning

Dec 22 03:19:39 scomp1110 kernel: swapper: page allocation failure. order:1, mode:0x20
Dec 22 03:19:39 scomp1110 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1
Dec 22 03:19:39 scomp1110 kernel: Call Trace:
Dec 22 03:19:39 scomp1110 kernel: <IRQ>  [<ffffffff81123f0f>] ? __alloc_pages_nodemask+0x77f/0x940
Dec 22 03:19:39 scomp1110 kernel: [<ffffffff8142c700>] ? dev_hard_start_xmit+0x1e0/0x3f0

Comment 9 Andre ten Bohmer 2011-12-24 12:54:07 UTC
After reboot with latest kernel release 2.6.32-220.2.1.el6.x86_64:

------------[ cut here ]------------
WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Not tainted)
Hardware name: ProLiant BL460c G6
Modules linked in: bridge mptctl mptbase ipmi_devintf nfsd autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc xt_NOTRACK iptable_raw ipt_LOG xt_multiport xt_limit ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables 8021q garp stp llc bonding ipv6 xfs exportfs ext2 power_meter hpilo ipmi_si ipmi_msghandler hpwdt sg bnx2x libcrc32c mdio microcode serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif hpsa(U) cciss(U) qla2xxx scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 2045, comm: bnx2x Not tainted 2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff814eccc5>] ? thread_return+0x232/0x79d
 [<ffffffffa036dd20>] ? bnx2x_sp_task+0x0/0x1b10 [bnx2x]
 [<ffffffff8108b15c>] ? worker_thread+0x1fc/0x2a0
 [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
 [<ffffffff810906a6>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81090610>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace f1bf66b476766cc6 ]---
cciss 0000:09:00.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.

Comment 10 Madison Kelly 2011-12-25 21:29:20 UTC
I'm now seeing the same oops as comment #9, but I don't run NFS. I do have a 2-node cman cluster running GFS2 on DRBD 8.3.12.

Dec 25 15:40:12 an-node01 kernel: ------------[ cut here ]------------
Dec 25 15:40:12 an-node01 kernel: WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Tainted: G        W  ----------------  )
Dec 25 15:40:12 an-node01 kernel: Hardware name: empty
Dec 25 15:40:12 an-node01 kernel: Modules linked in: gfs2 iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 drbd(U) dlm configfs ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc bonding ipv6 vhost_net macvtap macvlan tun kvm_intel kvm microcode shpchp i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Dec 25 15:40:12 an-node01 kernel: Pid: 1343, comm: bond1 Tainted: G        W  ----------------   2.6.32-220.2.1.el6.x86_64 #1
Dec 25 15:40:12 an-node01 kernel: Call Trace:
Dec 25 15:40:12 an-node01 kernel: [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
Dec 25 15:40:12 an-node01 kernel: [<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
Dec 25 15:40:12 an-node01 kernel: [<ffffffff814eccc5>] ? thread_return+0x232/0x79d
Dec 25 15:40:12 an-node01 kernel: [<ffffffff8107d068>] ? add_timer+0x18/0x30
Dec 25 15:40:12 an-node01 kernel: [<ffffffff8108be79>] ? queue_delayed_work_on+0xb9/0x120
Dec 25 15:40:12 an-node01 kernel: [<ffffffffa0269650>] ? bond_mii_monitor+0x0/0x610 [bonding]
Dec 25 15:40:12 an-node01 kernel: [<ffffffff8108b15c>] ? worker_thread+0x1fc/0x2a0
Dec 25 15:40:12 an-node01 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Dec 25 15:40:12 an-node01 kernel: [<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
Dec 25 15:40:12 an-node01 kernel: [<ffffffff810906a6>] ? kthread+0x96/0xa0
Dec 25 15:40:12 an-node01 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Dec 25 15:40:12 an-node01 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Dec 25 15:40:12 an-node01 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Dec 25 15:40:12 an-node01 kernel: ---[ end trace 705d5c1db0fb1e00 ]---

The nodes are also Intel Xeons, but they are the much more modest E3-1220. The mainboard is a Tyan S5510 with 8GB of DDR3 ECC memory.

Comment 11 Moritz Baumann 2012-01-05 08:42:41 UTC
We get this (from dmesg) as well:

swapper: page allocation failure. order:1, mode:0x20
Pid: 0, comm: swapper Tainted: G        W  ----------------   2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81123d2f>] ? __alloc_pages_nodemask+0x77f/0x940
 [<ffffffff8115dbe2>] ? kmem_getpages+0x62/0x170
 [<ffffffff8115e7fa>] ? fallback_alloc+0x1ba/0x270
 [<ffffffff8115e24f>] ? cache_grow+0x2cf/0x320
 [<ffffffff8115e579>] ? ____cache_alloc_node+0x99/0x160
 [<ffffffff8115f35b>] ? kmem_cache_alloc+0x11b/0x190
 [<ffffffff8141f5b8>] ? sk_prot_alloc+0x48/0x1c0
 [<ffffffff8141f842>] ? sk_clone+0x22/0x2e0
 [<ffffffff8146cab6>] ? inet_csk_clone+0x16/0xd0
 [<ffffffff81485983>] ? tcp_create_openreq_child+0x23/0x450
 [<ffffffff8148336d>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
 [<ffffffff81485741>] ? tcp_check_req+0x201/0x420
 [<ffffffff81482d8b>] ? tcp_v4_do_rcv+0x35b/0x430
 [<ffffffffa045b557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
 [<ffffffff81484501>] ? tcp_v4_rcv+0x4e1/0x860
 [<ffffffff814621b0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8146228d>] ? ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81462518>] ? ip_local_deliver+0x98/0xa0
 [<ffffffff814619dd>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81461f65>] ? ip_rcv+0x275/0x350
 [<ffffffff8142bf6b>] ? __netif_receive_skb+0x49b/0x6e0
 [<ffffffff8142e018>] ? netif_receive_skb+0x58/0x60
 [<ffffffff8142e120>] ? napi_skb_finish+0x50/0x70
 [<ffffffff814307a9>] ? napi_gro_receive+0x39/0x50
 [<ffffffffa01e2e71>] ? ixgbe_clean_rx_irq+0x531/0x8b0 [ixgbe]
 [<ffffffffa01e35ff>] ? ixgbe_clean_rxtx_many+0x10f/0x220 [ixgbe]
 [<ffffffff814308c3>] ? net_rx_action+0x103/0x2f0
 [<ffffffff81071f81>] ? __do_softirq+0xc1/0x1d0
 [<ffffffff810d9310>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
 [<ffffffff81071d65>] ? irq_exit+0x85/0x90
 [<ffffffff814f4dd5>] ? do_IRQ+0x75/0xf0
 [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff812c4ade>] ? intel_idle+0xde/0x170
 [<ffffffff812c4ac1>] ? intel_idle+0xc1/0x170
 [<ffffffff813f9ff7>] ? cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
 [<ffffffff814e5fbb>] ? start_secondary+0x202/0x245
swapper: page allocation failure. order:1, mode:0x20
Pid: 0, comm: swapper Tainted: G        W  ----------------   2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81123d2f>] ? __alloc_pages_nodemask+0x77f/0x940
 [<ffffffff8115dbe2>] ? kmem_getpages+0x62/0x170
 [<ffffffff8115e7fa>] ? fallback_alloc+0x1ba/0x270
 [<ffffffff8115e24f>] ? cache_grow+0x2cf/0x320
 [<ffffffff8115e579>] ? ____cache_alloc_node+0x99/0x160
 [<ffffffff8115f35b>] ? kmem_cache_alloc+0x11b/0x190
 [<ffffffff8141f5b8>] ? sk_prot_alloc+0x48/0x1c0
 [<ffffffff8141f842>] ? sk_clone+0x22/0x2e0
 [<ffffffff8146cab6>] ? inet_csk_clone+0x16/0xd0
 [<ffffffff81485983>] ? tcp_create_openreq_child+0x23/0x450
 [<ffffffff8148336d>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
 [<ffffffff81485741>] ? tcp_check_req+0x201/0x420
 [<ffffffff81482d8b>] ? tcp_v4_do_rcv+0x35b/0x430
 [<ffffffffa045b557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
 [<ffffffff81484501>] ? tcp_v4_rcv+0x4e1/0x860
 [<ffffffff814621b0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8146228d>] ? ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81462518>] ? ip_local_deliver+0x98/0xa0
 [<ffffffff814619dd>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81461f65>] ? ip_rcv+0x275/0x350
 [<ffffffff8142bf6b>] ? __netif_receive_skb+0x49b/0x6e0
 [<ffffffff8142e018>] ? netif_receive_skb+0x58/0x60
 [<ffffffff8142e120>] ? napi_skb_finish+0x50/0x70
 [<ffffffff814307a9>] ? napi_gro_receive+0x39/0x50
 [<ffffffffa01e2e71>] ? ixgbe_clean_rx_irq+0x531/0x8b0 [ixgbe]
 [<ffffffffa01e35ff>] ? ixgbe_clean_rxtx_many+0x10f/0x220 [ixgbe]
 [<ffffffff814308c3>] ? net_rx_action+0x103/0x2f0
 [<ffffffff81071f81>] ? __do_softirq+0xc1/0x1d0
 [<ffffffff810d9310>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
 [<ffffffff81071d65>] ? irq_exit+0x85/0x90
 [<ffffffff814f4dd5>] ? do_IRQ+0x75/0xf0
 [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff812c4ade>] ? intel_idle+0xde/0x170
 [<ffffffff812c4ac1>] ? intel_idle+0xc1/0x170
 [<ffffffff813f9ff7>] ? cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
 [<ffffffff814e5fbb>] ? start_secondary+0x202/0x245

We also have 10 GbE and use XFS.
The error occurred during an rsync over ssh onto the XFS filesystem. The node has 24 GB of RAM.

Comment 12 Moritz Baumann 2012-01-23 15:42:57 UTC
FYI:
slabtop showed 

330807 330807 100%    1.02K 110269  	  3    441076K nfs_inode_cache
359460 359460 100%    1.00K  89865	  4    359460K xfs_inode

ever growing and never being flushed.

Setting
sysctl vm.zone_reclaim_mode=1

fixed this. This used to be the default up to RHEL 6.1 (and for RHEL 5.x as well).
No idea why this default was switched, or why having this value at 1 (which some googling indicates is bad) fixed the issue with slab memory.

Fast inode growth occurred in our setup because we rsynced millions of files over NFS in a dirvish-like way.
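As a cross-check of that slabtop snippet (my arithmetic, not from the original comment): the CACHE SIZE column is simply SLABS multiplied by the 4 KiB page size, which adds up to roughly 800 MB tied up in the two inode caches:

```shell
# CACHE SIZE = SLABS x 4 KiB page size (values from the slabtop lines above).
page_kb=4
nfs_slabs=110269   # nfs_inode_cache
xfs_slabs=89865    # xfs_inode

echo "nfs_inode_cache: $((nfs_slabs * page_kb))K"
echo "xfs_inode:       $((xfs_slabs * page_kb))K"
echo "total:           $(( (nfs_slabs + xfs_slabs) * page_kb ))K"
```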

Comment 13 Andre ten Bohmer 2012-01-24 09:01:49 UTC
Today the server crashed again, and indeed it seems to be a memory problem in the lower memory region; see the attached iLO screenshot (2011-01-24)

$ sar -r
12:00:02 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
07:10:01 AM   1004068   2905432     74.32   1190296    936964    451272      5.71
07:20:01 AM   1118908   2790592     71.38   1190804    817912    476800      6.03
07:30:01 AM   1110092   2799408     71.61   1191608    829932    453084      5.73
07:40:01 AM   1212780   2696720     68.98   1192316    727104    451396      5.71
Average:      1196440   2713060     69.40    774475   1223061    456965      5.78

08:37:56 AM       LINUX RESTART

08:40:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
08:50:01 AM    821604   3087896     78.98      2232   2526304    430820      5.45
09:00:02 AM    790396   3119104     79.78      2380   2569040    430924      5.45

$ tail /etc/sysctl.conf
# raised default value in response to bug report
# https://bugzilla.redhat.com/show_bug.cgi?id=767127
# default 67584 KB
vm.min_free_kbytes = 524288
# https://bugzilla.redhat.com/show_bug.cgi?id=767127
vm.zone_reclaim_mode=1

Comment 14 Andre ten Bohmer 2012-01-24 09:03:33 UTC
Created attachment 557168 [details]
2011-01-24 OOM crash

Comment 16 Orion Poplawski 2012-03-28 15:37:17 UTC
Me too.  No crashes though.  Will try setting zone_reclaim_mode=1.

Comment 17 Jerome Marchand 2012-04-25 12:27:32 UTC
(In reply to comment #14)
> Created attachment 557168 [details]
> 2011-01-24 OOM crash

Hi Andre,

Can you run slabtop when you get the warnings?

Comment 18 Jerome Marchand 2012-04-25 12:35:55 UTC
(In reply to comment #11)
> We get this (from dmesg) as well
> [...]

Does it crash too? These kinds of warnings from network drivers are not necessarily a symptom of something bad happening. Network drivers use GFP_ATOMIC a lot and know how to handle a failed allocation gracefully (by dropping a packet, for instance).
There has been some discussion in the past about dropping these warnings. Apparently, nothing has been done.

Comment 19 Moritz Baumann 2012-04-25 12:52:40 UTC
Hi Jerome,

For some time (and since we still use zone_reclaim_mode=1) we have been getting many process hangs of > 120 secs, but the system remains stable.
I have a separate case open with Red Hat for that, though. So no, I have been unable to reproduce this for the last month, and my dmesg output might be completely unrelated to Andre's issue.

Comment 20 Moritz Baumann 2012-04-25 12:57:40 UTC
maybe this is related to 

https://bugzilla.redhat.com/show_bug.cgi?id=770545

?

Comment 21 Andre ten Bohmer 2012-04-25 13:38:54 UTC
Hi Jerome,
We've increased system RAM from a very low 4 GB up to 16 GB, and I've disabled a
daily XFS defrag job; ever since, it has been running stable
(2.6.32-220.7.1.el6.x86_64) without crashing or even reporting a swapper
message.

2012-01-11 01:30:38 0 4 scomp1110  swapper: page allocation failure. order:1,
mode:0x20 kernel:
2012-01-11 01:30:38 0 4 scomp1110  Pid: 0, comm: swapper Tainted: G        W 
----------------   2.6.32-220.2.1.el6.x86_64 #1 kernel:

Comment 22 RHEL Program Management 2012-05-03 05:29:31 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 23 Orion Poplawski 2012-05-16 16:29:31 UTC
# sysctl vm.zone_reclaim_mode=1
error: "vm.zone_reclaim_mode" is an unknown key

It seems not to be present in the 32-bit kernel.

swapper: page allocation failure. order:0, mode:0x20
Pid: 0, comm: swapper Not tainted 2.6.32-220.13.1.el6.i686 #1
Call Trace:
 [<c04f0ddc>] ? __alloc_pages_nodemask+0x6bc/0x870
 [<c051d09c>] ? cache_alloc_refill+0x2bc/0x510
 [<c051d432>] ? __kmalloc+0x142/0x180
 [<c07b55ff>] ? ip_local_deliver_finish+0x9f/0x260
 [<c078003f>] ? __alloc_skb+0x4f/0x140
 [<c078003f>] ? __alloc_skb+0x4f/0x140
 [<c078017b>] ? __netdev_alloc_skb+0x1b/0x40
 [<f8ceb06a>] ? e1000_alloc_rx_buffers+0x28a/0x480 [e1000]
 [<f8ce979a>] ? e1000_clean_rx_irq+0x2fa/0x4b0 [e1000]
 [<f8ce7984>] ? e1000_clean+0x194/0x530 [e1000]
 [<c0465975>] ? run_timer_softirq+0x35/0x2c0
 [<c0831715>] ? apic_timer_interrupt+0x31/0x38
 [<c078ccbe>] ? net_rx_action+0xde/0x280
 [<c045c16a>] ? __do_softirq+0x8a/0x1a0
 [<c042a5cf>] ? ack_apic_level+0x5f/0x1f0
 [<c04b6555>] ? handle_fasteoi_irq+0x85/0xc0
 [<c045c2bd>] ? do_softirq+0x3d/0x50
 [<c045c415>] ? irq_exit+0x65/0x70
 [<c040b110>] ? do_IRQ+0x50/0xc0
 [<c0428793>] ? smp_apic_timer_interrupt+0x53/0x90
 [<c0409ff0>] ? common_interrupt+0x30/0x38
 [<c0431742>] ? native_safe_halt+0x2/0x10
 [<c04110dd>] ? default_idle+0x4d/0xc0
 [<c0408964>] ? cpu_idle+0x94/0xd0
 [<c082aa50>] ? start_secondary+0x20d/0x252
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:  84
CPU    1: hi:  186, btch:  31 usd:  80
CPU    2: hi:  186, btch:  31 usd: 124
CPU    3: hi:  186, btch:  31 usd: 166
HighMem per-cpu:
CPU    0: hi:  186, btch:  31 usd: 134
CPU    1: hi:  186, btch:  31 usd: 177
CPU    2: hi:  186, btch:  31 usd: 155
CPU    3: hi:  186, btch:  31 usd:  35
active_anon:18889 inactive_anon:2706 isolated_anon:0
 active_file:153064 inactive_file:2676003 isolated_file:0
 unevictable:0 dirty:7802 writeback:1 unstable:0
 free:121839 slab_reclaimable:62957 slab_unreclaimable:21028
 mapped:4551 shmem:304 pagetables:596 bounce:2
DMA free:3480kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:3108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15788kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:636kB slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:8kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 863 12174 12174
Normal free:1388kB min:3724kB low:4652kB high:5584kB active_anon:0kB inactive_anon:0kB active_file:94520kB inactive_file:123252kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:883912kB mlocked:0kB dirty:296kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:251192kB slab_unreclaimable:83520kB kernel_stack:1960kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 90490 90490
HighMem free:482488kB min:512kB low:12716kB high:24920kB active_anon:75556kB inactive_anon:10824kB active_file:517652kB inactive_file:10577456kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:11582836kB mlocked:0kB dirty:30912kB writeback:4kB mapped:18200kB shmem:1216kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:2384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 2*4kB 2*8kB 0*16kB 0*32kB 2*64kB 0*128kB 3*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3480kB
Normal: 1*4kB 1*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1388kB
HighMem: 148*4kB 52*8kB 28*16kB 14*32kB 9*64kB 8*128kB 8*256kB 8*512kB 10*1024kB 8*2048kB 109*4096kB = 482736kB
2828818 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 14450680kB
Total swap = 14450680kB
3178480 pages RAM
2951682 pages HighMem
93675 pages reserved
211271 pages shared
2748117 pages non-shared
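Reading the Mem-Info dump above (my interpretation, not from the original comment): the Normal (lowmem) zone is below its min watermark, which is exactly the regime where GFP_ATOMIC allocations, which cannot wait for reclaim, start failing:

```shell
# Values taken from the "Normal" zone line in the Mem-Info dump above.
free_kb=1388
min_kb=3724

if [ "$free_kb" -lt "$min_kb" ]; then
  echo "Normal zone below min watermark ($free_kb kB < $min_kb kB):"
  echo "atomic allocations may fail until kswapd reclaims lowmem"
fi
```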

Comment 25 RHEL Program Management 2012-12-14 08:28:48 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 26 Jerome Marchand 2013-05-20 09:12:16 UTC
(In reply to Jerome Marchand from comment #18)
> (In reply to comment #11)
> > We get this (from dmesg) as well
> > [...]
> 
> Does it crashes too? These kind of warning from network drivers are not
> necessary the symptom of something bad happening. Net drivers use GFP_ATOMIC
> a lot and know how to handle gracefully a failed allocation (by dropping a
> packet or something).
> There has been some discussion in the past about dropping these warnings.
> Apparently, nothing has been done.

By all appearances, this is just a failed GFP_ATOMIC allocation, which isn't a bug.

