Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 761442

Summary:

swapper: page allocation failure. order:2, mode:0x20

Product:

Red Hat Enterprise Linux 6

Reporter:

Andre ten Bohmer <andre.tenbohmer>

Component:

kernel

Assignee:

Red Hat Kernel Manager <kernel-mgr>

Status:

CLOSED WONTFIX

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

6.2

CC:

aquini, baumanmo, cww, lwang, mishu, nitinics, orion

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-08-04 19:05:01 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1359574

Attachments:

Description	Flags
after server boot and low IO load	none
dmesg output when under IO stress	none
nfsd: page allocation failure. order:3, mode:0x20	none
Starting udev: multipath invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=-17, oom_score_adj=-1000	none

Description Andre ten Bohmer 2011-12-08 10:53:39 UTC

Created attachment 542478 [details]
after server boot and low IO load

Description of problem:
An NFS data server running RH 6.2 x64 shows new messages in the kernel ring buffer:
swapper: page allocation failure. order:2, mode:0x20

Version-Release number of selected component (if applicable):


How reproducible:
Put some load on the exported NFS file system

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
This server was sometimes unstable with RH5.7:
serevr login: INFO: task xfsdatad/2:3426 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154db9     0  3426     71          3427  3425 (L-TLB)
 ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
 000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
Call Trace:
 [<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e87f>] default_wake_function+0x0/0xe
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

So we built a new RH 6.2-based install.
HP ProLiant BL460c G6
4G memory
1x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
System disk : BootFromSan 50G (HP EVA 8400), LVM2, ext4 partitions
Data disk : 1) 46 TB HP MDS 40 RAID6 LUNs striped via lvm2
                $ lvcreate -i 40 -I 256  -n Ldata -l 11919320 Vdata
                $ mkfs.xfs -d su=256k,sw=40 /dev/Vdata/Ldata
            2) 6 TB HP EVA LUN, xfs filesystem

MDS /dev/mapper/Vdata-Ldata on /srv/nfs02 type xfs (rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)
EVA /dev/mapper/mpathap on /srv/nfs03 type xfs (rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)


Red Hat Enterprise Linux Server release 6.2 (Santiago)
2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
NFS exports to serve as HPC data server

Comment 1 Andre ten Bohmer 2011-12-08 10:55:16 UTC

Created attachment 542479 [details]
dmesg output when under IO stress

Comment 3 Andre ten Bohmer 2011-12-08 12:39:09 UTC

The system crashed (the kdump console got stuck on a ping test because Ctrl-C did not work, so no vmcore ... sigh), but I have now enabled TSO and so far so good with regard to the "swapper: page allocation failure. order:2, mode:0x20" messages: none seen so far, even when the IO stress hits the server again.

$ cat /etc/modprobe.d/bnx2x.conf 
options bnx2x disable_tpa=0 debug=0

$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on

Comment 4 Andre ten Bohmer 2011-12-14 08:43:51 UTC

Created attachment 546627 [details]
nfsd: page allocation failure. order:3, mode:0x20

nfsd: page allocation failure. order:3, mode:0x20
swapper: page allocation failure. order:1, mode:0x20
swapper: page allocation failure. order:3, mode:0x20

Comment 5 Andre ten Bohmer 2012-01-23 10:07:09 UTC

------------[ cut here ]------------
WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Not tainted)
Hardware name: ProLiant BL460c G6
Modules linked in: mptctl mptbase ipmi_devintf nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc xt_NOTRACK iptable_raw ipt_LOG xt_multiport xt_limit ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables 8021q garp stp llc bonding ipv6 xfs exportfs ext2 power_meter ipmi_si ipmi_msghandler hpilo hpwdt sg bnx2x libcrc32c mdio microcode serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif hpsa(U) cciss(U) qla2xxx scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 2658, comm: xfsdatad/1 Not tainted 2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff814eccc5>] ? thread_return+0x232/0x79d
 [<ffffffff8107bf0c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff814ed902>] ? schedule_timeout+0x192/0x2e0
 [<ffffffff8107c020>] ? process_timeout+0x0/0x10
 [<ffffffffa0468600>] ? xfs_end_io+0x0/0xb0 [xfs]
 [<ffffffff814eda6e>] ? schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa04686a0>] ? xfs_end_io+0xa0/0xb0 [xfs]
 [<ffffffff8108b0d0>] ? worker_thread+0x170/0x2a0
 [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
 [<ffffffff810906a6>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81090610>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace 5ce70fd41350c32d ]---

Comment 6 Moritz Baumann 2012-01-23 15:51:21 UTC

Hi Andre,

for me sysctl -w vm.zone_reclaim_mode=1 fixed this.
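
A minimal sketch for checking the current value before changing it. The helper name `show_zone_reclaim_mode` is hypothetical, and the optional path argument exists only so the function can be exercised against a test file. The `/proc/sys/vm/zone_reclaim_mode` file is only present on kernels built with NUMA support.

```shell
# Hypothetical helper: print the current vm.zone_reclaim_mode value,
# or a note when the kernel does not expose the tunable.
# $1 (optional): path to read; defaults to the real procfs file.
show_zone_reclaim_mode() {
    if [ -r "${1:-/proc/sys/vm/zone_reclaim_mode}" ]; then
        echo "vm.zone_reclaim_mode = $(cat "${1:-/proc/sys/vm/zone_reclaim_mode}")"
    else
        echo "vm.zone_reclaim_mode not available"
    fi
}

show_zone_reclaim_mode
```

A value set with `sysctl -w` is lost on reboot. To keep it, the line `vm.zone_reclaim_mode = 1` can be added to /etc/sysctl.conf and loaded with `sysctl -p`.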

Comment 7 Andre ten Bohmer 2012-01-23 16:11:53 UTC

Hi Moritz,
OK thanks, we'll give it a go!

Comment 8 Andre ten Bohmer 2012-02-06 16:07:52 UTC

Created attachment 559671 [details]
Starting udev: multipath invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=-17, oom_score_adj=-1000

After increasing memory from 4GB to 16GB, we captured this console log. multipath invokes the OOM killer, which finally results in an unresponsive system.

Comment 10 RHEL Program Management 2012-05-03 05:27:55 UTC

Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 11 Orion Poplawski 2012-10-08 16:48:31 UTC

Oct  8 10:31:37 alexandria kernel: nfsd: page allocation failure. order:5, mode:0x20

What does the higher order mean?  Unfortunately this is a 32-bit machine and does not have the vm.zone_reclaim_mode option.

Comment 12 Jes Sorensen 2012-12-06 14:09:08 UTC

order:5 means it is trying to allocate 2^5 pages, i.e. 32 pages, or a total of
128KB of contiguous memory (with 4KB pages).
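
The same arithmetic can be sketched for any of the orders reported in this bug. The function names `order_to_bytes` and `order_from_log` are hypothetical helpers, and a 4 KiB page size is assumed unless another one is passed (on a live system, `getconf PAGESIZE` reports the actual value).

```shell
# Hypothetical helper: bytes of contiguous memory requested by an
# allocation of the given order.
# $1: order, $2 (optional): page size in bytes, assumed 4096 by default.
order_to_bytes() {
    echo $(( (1 << $1) * ${2:-4096} ))
}

# Hypothetical helper: extract the order number from a
# "page allocation failure" kernel log line.
order_from_log() {
    printf '%s\n' "$1" |
        sed -n 's/.*page allocation failure\. order:\([0-9][0-9]*\).*/\1/p'
}

line='nfsd: page allocation failure. order:5, mode:0x20'
order=$(order_from_log "$line")
echo "order $order = $(( 1 << order )) pages = $(order_to_bytes "$order") bytes"
```

Under the 4 KiB assumption, the order:2 failures from the original report correspond to 4 contiguous pages (16KB) and the order:3 failures to 8 contiguous pages (32KB).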

Comment 13 Nitin Sharma 2013-01-12 18:43:58 UTC

Is this bug specific to xfs? I saw similar traces on my 2.6.32-279.14.1.el6.x86_64

Comment 14 Chris Williams 2016-08-04 19:05:01 UTC

When Red Hat shipped 6.8 on May 10, 2016, RHEL 6 entered Production Phase 2.
https://access.redhat.com/support/policy/updates/errata#Production_2_Phase
That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released"
This BZ is now going to be closed as it does not appear to meet Phase 2 criteria. 
If this BZ is deemed critical to the customer please open a support case in the Red Hat Customer Portal and ask that this BZ be re-opened.