Bug 761442
Summary: | swapper: page allocation failure. order:2, mode:0x20 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Andre ten Bohmer <andre.tenbohmer> |
Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.2 | CC: | aquini, baumanmo, cww, lwang, mishu, nitinics, orion |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-08-04 19:05:01 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1359574 | ||
Attachments: |
Created attachment 542479 [details]
dmesg output when under IO stress
System crashed (the kdump console got stuck on a ping test because ctrl-c did not work, so no vmcore ...sigh), but I have now enabled TSO and so far so good with regard to the "swapper: page allocation failure. order:2, mode:0x20" messages: none seen so far, even when the IO stress hits the server again.

$ cat /etc/modprobe.d/bnx2x.conf
options bnx2x disable_tpa=0 debug=0

$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on

Created attachment 546627 [details]
nfsd: page allocation failure. order:3, mode:0x20
nfsd: page allocation failure. order:3, mode:0x20
swapper: page allocation failure. order:1, mode:0x20
swapper: page allocation failure. order:3, mode:0x20
------------[ cut here ]------------
WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Not tainted)
Hardware name: ProLiant BL460c G6
Modules linked in: mptctl mptbase ipmi_devintf nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc xt_NOTRACK iptable_raw ipt_LOG xt_multiport xt_limit ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables 8021q garp stp llc bonding ipv6 xfs exportfs ext2 power_meter ipmi_si ipmi_msghandler hpilo hpwdt sg bnx2x libcrc32c mdio microcode serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif hpsa(U) cciss(U) qla2xxx scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 2658, comm: xfsdatad/1 Not tainted 2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
[<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff814eccc5>] ? thread_return+0x232/0x79d
[<ffffffff8107bf0c>] ? lock_timer_base+0x3c/0x70
[<ffffffff814ed902>] ? schedule_timeout+0x192/0x2e0
[<ffffffff8107c020>] ? process_timeout+0x0/0x10
[<ffffffffa0468600>] ? xfs_end_io+0x0/0xb0 [xfs]
[<ffffffff814eda6e>] ? schedule_timeout_uninterruptible+0x1e/0x20
[<ffffffffa04686a0>] ? xfs_end_io+0xa0/0xb0 [xfs]
[<ffffffff8108b0d0>] ? worker_thread+0x170/0x2a0
[<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
[<ffffffff810906a6>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff81090610>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace 5ce70fd41350c32d ]---

Hi Andre, for me sysctl -w vm.zone_reclaim_mode=1 fixed this.

Hi Moritz, ok, thanks, we'll give it a go!

Created attachment 559671 [details]
Starting udev: multipath invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=-17, oom_score_adj=-1000
After increasing memory from 4 GB to 16 GB, we captured this console log. multipath invokes the OOM killer, which finally results in an unresponsive system.
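For anyone wanting to persist the vm.zone_reclaim_mode workaround mentioned earlier across reboots, here is a minimal sketch of a sysctl.conf entry. This assumes the stock /etc/sysctl.conf mechanism; note the option only exists on NUMA-capable 64-bit kernels, and it trades allocation latency for node-local reclaim:

```
# /etc/sysctl.conf -- persist the workaround (equivalent to the one-shot
# "sysctl -w vm.zone_reclaim_mode=1" suggested in the comments above).
# Only available on NUMA-capable kernels.
vm.zone_reclaim_mode = 1
```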
Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Oct 8 10:31:37 alexandria kernel: nfsd: page allocation failure. order:5, mode:0x20

What does the higher order mean? Unfortunately this is a 32-bit machine and does not have the vm.zone_reclaim_mode option.

order:5 means it is trying to allocate 2^5 pages, i.e. 32 pages or a total of 128 KB of contiguous memory.

Is this bug specific to xfs? I saw similar traces on my 2.6.32-279.14.1.el6.x86_64.

When Red Hat shipped 6.8 on May 10, 2016, RHEL 6 entered Production Phase 2. https://access.redhat.com/support/policy/updates/errata#Production_2_Phase That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released". This BZ is now going to be closed as it does not appear to meet Phase 2 criteria. If this BZ is deemed critical to the customer, please open a support case in the Red Hat Customer Portal and ask that this BZ be re-opened. |
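The arithmetic behind these "order:N, mode:0x20" messages can be sketched in shell. The 4 KB page size and the GFP flag interpretation are assumptions, not stated in the log itself: 4 KB is the x86 default, and on 2.6.32-era kernels GFP_ATOMIC expands to __GFP_HIGH (0x20), i.e. an allocation that may not sleep or perform direct reclaim:

```shell
# order_to_kb: size of an order-N buddy allocation in KB,
# assuming 4 KB pages (the x86 default page size).
order_to_kb() {
    echo $(( (1 << $1) * 4 ))
}

order_to_kb 2   # the swapper failures: 2^2 = 4 pages, 16 KB contiguous
order_to_kb 5   # the nfsd failure above: 2^5 = 32 pages, 128 KB contiguous

# mode:0x20 matches __GFP_HIGH, which is what GFP_ATOMIC was defined as
# on 2.6.32-era kernels; higher orders fail more easily under fragmentation
# because the allocator cannot wait or reclaim in this context.
```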
Created attachment 542478 [details]
after server boot and low IO load

Description of problem:
NFS data server, RH 6.2 x64, shows new messages in the kernel ring buffer:
swapper: page allocation failure. order:2, mode:0x20

Version-Release number of selected component (if applicable):

How reproducible:
Put some strain on the exported NFS file system

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
This server was sometimes unstable with RH 5.7:

serevr login: INFO: task xfsdatad/2:3426 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2 D ffffffff80154db9 0 3426 71 3427 3425 (L-TLB)
ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
Call Trace:
[<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
[<ffffffff800645e3>] __down_write_nested+0x7a/0x92
[<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
[<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
[<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
[<ffffffff80049b3d>] worker_thread+0x0/0x122
[<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
[<ffffffff80049c2d>] worker_thread+0xf0/0x122
[<ffffffff8008e87f>] default_wake_function+0x0/0xe
[<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
[<ffffffff8003270f>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032611>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11

So we built a fresh install based on RH 6.2.
HP ProLiant BL460c G6, 4 GB memory, 1x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz

System disk: BootFromSan 50G (HP EVA 8400), LVM2, ext4 partitions

Data disks:
1) 46 TB HP MDS, 40 RAID6 LUNs striped via LVM2
$ lvcreate -i 40 -I 256 -n Ldata -l 11919320 Vdata
$ mkfs.xfs -d su=256k,sw=40 /dev/Vdata/Ldata
2) 6 TB HP EVA LUN, xfs filesystem

MDS /dev/mapper/Vdata-Ldata on /srv/nfs02 type xfs (rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)
EVA /dev/mapper/mpathap on /srv/nfs03 type xfs (rw,nosuid,nodev,noatime,nodiratime,nobarrier,largeio)

Red Hat Enterprise Linux Server release 6.2 (Santiago)
Linux 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

NFS exports to serve as HPC data server.
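As a sanity check on the stripe geometry above, the mkfs.xfs su/sw values mirror the lvcreate flags, and the full-stripe size works out as follows. This is a sketch; the 10 MB figure is derived from those flags, not taken from the report:

```shell
# Stripe geometry implied by "lvcreate -i 40 -I 256" and "mkfs.xfs su=256k,sw=40":
stripes=40        # -i 40 / sw=40: number of data stripes
stripe_kb=256     # -I 256 / su=256k: stripe unit per device, in KB
full_stripe_kb=$(( stripes * stripe_kb ))
echo "full stripe: ${full_stripe_kb} KB"   # 10240 KB, i.e. 10 MB per full-width write
```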