Bug 761442

Summary: swapper: page allocation failure. order:2, mode:0x20
Product: Red Hat Enterprise Linux 6 Reporter: Andre ten Bohmer <andre.tenbohmer>
Component: kernel    Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.2    CC: aquini, baumanmo, cww, lwang, mishu, nitinics, orion
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-04 19:05:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1359574    
Attachments:
Description | Flags
after server boot and low IO load | none
dmesg output when under IO stress | none
nfsd: page allocation failure. order:3, mode:0x20 | none
Starting udev: multipath invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=-17, oom_score_adj=-1000 | none

Description Andre ten Bohmer 2011-12-08 10:53:39 UTC
Created attachment 542478
after server boot and low IO load

Description of problem:
The NFS data server (RHEL 6.2, x86_64) shows new messages in the kernel ring buffer:
swapper: page allocation failure. order:2, mode:0x20
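The `mode` field in this message is the GFP mask of the failed request. A minimal sketch for decoding it; the flag table is a partial transcription of 2.6.32-era `include/linux/gfp.h` (the RHEL 6 kernel base) and is an assumption to be checked against the headers of the running kernel, and `decode_gfp` is a hypothetical helper name:

```python
# Subset of GFP flag bits, transcribed (as an assumption) from 2.6.32-era
# include/linux/gfp.h; check the headers of the kernel actually running.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",        # caller may sleep, so reclaim is allowed
    0x20: "__GFP_HIGH",        # caller may dip into emergency reserves
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
    0x10000: "__GFP_NOMEMALLOC",
    0x20000: "__GFP_HARDWALL",
    0x40000: "__GFP_THISNODE",
    0x80000: "__GFP_RECLAIMABLE",
}
KNOWN_BITS = sum(GFP_FLAGS)  # keys are distinct single bits, so sum == bitwise OR


def decode_gfp(mask: int) -> list[str]:
    """Return the names of the flags set in mask; unknown bits are reported in hex."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]
    if mask & ~KNOWN_BITS:
        names.append(hex(mask & ~KNOWN_BITS))
    return names


mode = 0x20
print(decode_gfp(mode))                 # ['__GFP_HIGH']
print("may sleep:", bool(mode & 0x10))  # may sleep: False
```

Only `__GFP_HIGH` is set, which is how `GFP_ATOMIC` was defined in that kernel generation: the request may use emergency reserves but cannot sleep to reclaim memory, so it fails as soon as no suitably sized free block exists. That fits the failing task being `swapper` (the idle task), since allocations attributed to it typically come from interrupt handling such as network receive.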

Version-Release number of selected component (if applicable):


How reproducible:
Put some load on the exported NFS file system.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
This server was occasionally unstable under RHEL 5.7:
server login: INFO: task xfsdatad/2:3426 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154db9     0  3426     71          3427  3425 (L-TLB)
 ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
 000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
Call Trace:
 [<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e87f>] default_wake_function+0x0/0xe
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

So we built a fresh installation based on RHEL 6.2:
HP ProLiant BL460c G6
4G memory
1x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
System disk : BootFromSan 50G (HP EVA 8400), LVM2, ext4 partitions
Data disk : 1) 46 TB HP MDS, 40 RAID6 LUNs striped via LVM2
                $ lvcreate -i 40 -I 256  -n Ldata -l 11919320 Vdata
                $ mkfs.xfs -d su=256k,sw=40 /dev/Vdata/Ldata
            2) 6 TB HP EVA LUN, xfs filesystem

MDS /dev/mapper/Vdata-Ldata on /srv/nfs02 type xfs (rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)
EVA /dev/mapper/mpathap on /srv/nfs03 type xfs (rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)
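The lvcreate and mkfs.xfs options above are meant to line up: LVM stripes across 40 LUNs in 256 KiB chunks, and XFS is given the same geometry through `su`/`sw`. A small sketch of that arithmetic, with hypothetical helper names. The 4 MiB extent size is an assumption (the LVM2 default, not stated in the report); with it, the 11919320 extents come to about 45.5 TiB, close to the "46 TB" quoted.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB
PE_SIZE = 4 * MIB  # assumption: LVM2 default extent size; the report does not state it


def xfs_stripe_geometry(stripes: int, stripe_kib: int) -> dict:
    """Translate `lvcreate -i <stripes> -I <stripe_kib>` into mkfs.xfs data-section values."""
    su = stripe_kib * KIB                  # one stripe unit, in bytes
    return {
        "su": f"{stripe_kib}k",            # mkfs.xfs -d su=
        "sw": stripes,                     # mkfs.xfs -d sw= (stripe units per full stripe)
        "sunit": su // 512,                # same stripe unit in 512-byte units (-d sunit=)
        "swidth": su * stripes // 512,     # full stripe width in 512-byte units (-d swidth=)
        "full_stripe_mib": su * stripes / MIB,
    }


def lv_size_tib(extents: int) -> float:
    """Logical volume size for `lvcreate -l <extents>`, given the assumed PE_SIZE."""
    return extents * PE_SIZE / TIB


geo = xfs_stripe_geometry(stripes=40, stripe_kib=256)
print(geo["su"], geo["sw"], geo["sunit"], geo["swidth"], geo["full_stripe_mib"])
print(round(lv_size_tib(11919320), 2), "TiB; multiple of 40 stripes:", 11919320 % 40 == 0)
```

With both the LVM stripe size and the XFS stripe unit at 256 KiB, a full stripe spans 10 MiB across the 40 LUNs, and the extent count is an exact multiple of the stripe count, so lvcreate has nothing to round.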


Red Hat Enterprise Linux Server release 6.2 (Santiago)
2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
Role: HPC data server, exporting these file systems over NFS.

Comment 1 Andre ten Bohmer 2011-12-08 10:55:16 UTC
Created attachment 542479
dmesg output when under IO stress

Comment 3 Andre ten Bohmer 2011-12-08 12:39:09 UTC
The system crashed (the kdump console got stuck on a ping test because Ctrl-C did not work, so there is no vmcore... sigh), but I have now enabled TSO and so far so good: no "swapper: page allocation failure. order:2, mode:0x20" messages have appeared, even when the IO stress hits the server again.

$ cat /etc/modprobe.d/bnx2x.conf 
options bnx2x disable_tpa=0 debug=0

$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
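To check that this workaround is still in place after a reboot or a driver reload, the ethtool output can be tested from a script. A minimal sketch with a hypothetical function name; it relies only on the `feature: on|off` line format shown above. Note also that bnx2x's `disable_tpa=0` leaves TPA (the driver's hardware LRO) enabled, which matches `large-receive-offload: on`.

```python
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
"""


def parse_offloads(text: str) -> dict[str, bool]:
    """Turn `ethtool -k <iface>` output into {feature: enabled}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skips the header line (nothing after the colon); tolerates suffixes like "[fixed]".
        if sep and words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


offloads = parse_offloads(SAMPLE)
print(offloads["tcp-segmentation-offload"], offloads["large-receive-offload"])
```

On a live system the input would come from something like `subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True, check=True).stdout`.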

Comment 4 Andre ten Bohmer 2011-12-14 08:43:51 UTC
Created attachment 546627
nfsd: page allocation failure. order:3, mode:0x20

nfsd: page allocation failure. order:3, mode:0x20
swapper: page allocation failure. order:1, mode:0x20
swapper: page allocation failure. order:3, mode:0x20
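When these failures come in bursts, it helps to know which tasks and orders dominate. A minimal sketch (hypothetical helper) that tallies the messages from saved dmesg or syslog text; the regular expression assumes the 2.6.32-era wording seen in this report, and newer kernels may phrase the message differently.

```python
import re
from collections import Counter

# Matches the 2.6.32-era wording, e.g. "nfsd: page allocation failure. order:3, mode:0x20".
FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def tally_failures(log: str) -> Counter:
    """Count allocation failures per (task, order, mode) in dmesg or syslog text."""
    counts = Counter()
    for m in FAILURE_RE.finditer(log):
        counts[(m["comm"], int(m["order"]), int(m["mode"], 16))] += 1
    return counts


log = """\
nfsd: page allocation failure. order:3, mode:0x20
swapper: page allocation failure. order:1, mode:0x20
swapper: page allocation failure. order:3, mode:0x20
Oct  8 10:31:37 alexandria kernel: nfsd: page allocation failure. order:5, mode:0x20
"""
for (comm, order, mode), n in sorted(tally_failures(log).items()):
    print(f"{comm:8} order:{order} mode:{mode:#x} x{n}")
```

The syslog-prefixed line shows that timestamps and hostnames in front of the message do not get in the way.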

Comment 5 Andre ten Bohmer 2012-01-23 10:07:09 UTC
------------[ cut here ]------------
WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Not tainted)
Hardware name: ProLiant BL460c G6
Modules linked in: mptctl mptbase ipmi_devintf nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc xt_NOTRACK iptable_raw ipt_LOG xt_multiport xt_limit ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables 8021q garp stp llc bonding ipv6 xfs exportfs ext2 power_meter ipmi_si ipmi_msghandler hpilo hpwdt sg bnx2x libcrc32c mdio microcode serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif hpsa(U) cciss(U) qla2xxx scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 2658, comm: xfsdatad/1 Not tainted 2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff814eccc5>] ? thread_return+0x232/0x79d
 [<ffffffff8107bf0c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff814ed902>] ? schedule_timeout+0x192/0x2e0
 [<ffffffff8107c020>] ? process_timeout+0x0/0x10
 [<ffffffffa0468600>] ? xfs_end_io+0x0/0xb0 [xfs]
 [<ffffffff814eda6e>] ? schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa04686a0>] ? xfs_end_io+0xa0/0xb0 [xfs]
 [<ffffffff8108b0d0>] ? worker_thread+0x170/0x2a0
 [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
 [<ffffffff810906a6>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81090610>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace 5ce70fd41350c32d ]---

Comment 6 Moritz Baumann 2012-01-23 15:51:21 UTC
Hi Andre,

For me, sysctl -w vm.zone_reclaim_mode=1 fixed this.
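`sysctl -w` changes the value only until the next reboot; to keep it, the usual approach is a `vm.zone_reclaim_mode = 1` line in /etc/sysctl.conf. The knob is only present on kernels built with NUMA support, which would likely explain why it is missing on a 32-bit machine. A minimal sketch (hypothetical helper names) that reads the current value through /proc/sys and spells out its bits as described in the kernel's `Documentation/sysctl/vm.txt`:

```python
from pathlib import Path
from typing import Optional

# Bit meanings as listed for zone_reclaim_mode in the kernel's Documentation/sysctl/vm.txt.
ZONE_RECLAIM_BITS = {1: "zone reclaim on", 2: "reclaim writes dirty pages out", 4: "reclaim swaps pages"}


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> Optional[int]:
    """Read an integer sysctl such as vm.zone_reclaim_mode; None if the kernel lacks it."""
    path = root / name.replace(".", "/")
    if not path.exists():
        return None
    return int(path.read_text().split()[0])


def describe_zone_reclaim(value: int) -> list[str]:
    """List the behaviours enabled by a zone_reclaim_mode value."""
    return [text for bit, text in ZONE_RECLAIM_BITS.items() if value & bit] or ["off"]


value = read_sysctl("vm.zone_reclaim_mode")
print("not available on this kernel" if value is None else describe_zone_reclaim(value))
```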

Comment 7 Andre ten Bohmer 2012-01-23 16:11:53 UTC
Hi Moritz,
OK, thanks, we'll give it a go!

Comment 8 Andre ten Bohmer 2012-02-06 16:07:52 UTC
Created attachment 559671
Starting udev: multipath invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=-17, oom_score_adj=-1000

After increasing memory from 4 GB to 16 GB, we captured this console log: multipath invokes the OOM killer, which finally results in an unresponsive system.
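The numbers in the header of that OOM report can be decoded as well. A sketch assuming 2.6.32-era values from `include/linux/gfp.h` and `include/linux/oom.h` (helper name hypothetical): 0x280da decomposes exactly into `GFP_HIGHUSER_MOVABLE` plus `__GFP_ZERO`, i.e. an ordinary order-0 zeroed page for user space rather than a large contiguous request, and `oom_adj=-17` / `oom_score_adj=-1000` mark multipath itself as exempt from being killed, so the OOM killer has to pick some other task.

```python
# Assumed 2.6.32-era bit values from include/linux/gfp.h.
GFP = {
    "__GFP_HIGHMEM": 0x02, "__GFP_MOVABLE": 0x08, "__GFP_WAIT": 0x10,
    "__GFP_IO": 0x40, "__GFP_FS": 0x80, "__GFP_ZERO": 0x8000, "__GFP_HARDWALL": 0x20000,
}
# GFP_HIGHUSER_MOVABLE as composed in that header: a sleeping, user-space page allocation.
GFP_HIGHUSER_MOVABLE = (GFP["__GFP_WAIT"] | GFP["__GFP_IO"] | GFP["__GFP_FS"]
                        | GFP["__GFP_HARDWALL"] | GFP["__GFP_HIGHMEM"] | GFP["__GFP_MOVABLE"])
OOM_DISABLE = -17          # legacy /proc/<pid>/oom_adj value: never select this task
OOM_SCORE_ADJ_MIN = -1000  # the oom_score_adj equivalent


def explain_oom_header(gfp_mask: int, oom_adj: int, oom_score_adj: int) -> dict:
    """Interpret the gfp_mask and oom_adj fields of an 'invoked oom-killer' line."""
    expected = GFP_HIGHUSER_MOVABLE | GFP["__GFP_ZERO"]
    return {
        "highuser_movable": (gfp_mask & GFP_HIGHUSER_MOVABLE) == GFP_HIGHUSER_MOVABLE,
        "zeroed_page": bool(gfp_mask & GFP["__GFP_ZERO"]),
        "other_bits": hex(gfp_mask & ~expected),
        "invoker_exempt_from_kill": oom_adj == OOM_DISABLE or oom_score_adj == OOM_SCORE_ADJ_MIN,
    }


print(explain_oom_header(0x280da, oom_adj=-17, oom_score_adj=-1000))
```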

Comment 10 RHEL Program Management 2012-05-03 05:27:55 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 11 Orion Poplawski 2012-10-08 16:48:31 UTC
Oct  8 10:31:37 alexandria kernel: nfsd: page allocation failure. order:5, mode:0x20

What does the higher order mean?  Unfortunately this is a 32-bit machine and does not have the vm.zone_reclaim_mode option.

Comment 12 Jes Sorensen 2012-12-06 14:09:08 UTC
order:5 means it is trying to allocate 2^5 pages, i.e. 32 pages, or a total of
128 KB of contiguous memory (assuming 4 KB pages).
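The same arithmetic for every order that shows up in this report, as a small sketch assuming 4 KiB base pages (`order_to_bytes` is a hypothetical helper):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on both x86 and x86_64


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of one buddy-allocator block of the given order: 2**order contiguous pages."""
    return (1 << order) * page_size


# Orders that appear in this report (swapper: 1, 2, 3; nfsd: 3, 5).
for order in (1, 2, 3, 5):
    print(f"order:{order} -> {1 << order:2d} pages = {order_to_bytes(order) // 1024:3d} KiB")
```

Each step up in order doubles the contiguous block required, which is why higher-order atomic requests are the first to fail once free memory is fragmented.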

Comment 13 Nitin Sharma 2013-01-12 18:43:58 UTC
Is this bug specific to XFS? I saw similar traces on my 2.6.32-279.14.1.el6.x86_64 kernel.

Comment 14 Chris Williams 2016-08-04 19:05:01 UTC
When Red Hat shipped 6.8 on May 10, 2016, RHEL 6 entered Production Phase 2.
https://access.redhat.com/support/policy/updates/errata#Production_2_Phase
That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released".
This BZ is now going to be closed as it does not appear to meet Phase 2 criteria. 
If this BZ is deemed critical to the customer please open a support case in the Red Hat Customer Portal and ask that this BZ be re-opened.