Created attachment 479686 [details]
kernel logs to /var/log/messages

Description of problem:
I am unable to do a full filesystem backup of a 3 TB RAID5 array using either XFS or EXT4 as the filesystem. The backup will run for a while, then kernel traces appear and the system eventually locks up. I had no trouble backing up this filesystem on RHEL5 on the same hardware.

Version-Release number of selected component (if applicable):
Fully updated RHEL6 Server, no fastrack. 2.6.32-71.14.1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run xfsdump or dump
2. System locks up

Actual results:
System lock up

Expected results:
I have backups and can sleep easier. thanks!
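For reference, the backup is kicked off with a plain level-0 dump along these lines (the paths and device below are illustrative examples, not the exact ones used):

# XFS filesystem, dumping to an NFS-mounted target:
xfsdump -l 0 -f /mnt/backup/host0.xfsdump /data
# EXT4 equivalent with dump:
dump -0u -f /mnt/backup/host0.dump /dev/md0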
Thanks for the report! While we look into this, you might consider using tar to back up your data.
Are you dumping over the network? Can you test your backup with local storage (assuming you have something large enough)? Thanks!
I am backing up over NFS. I don't have the ability to do local backups. I have tried a network rsync backup too; that also dumps kernel traces, but the system doesn't lock up. I figured this was all related to the kernel memory leaks reported on the rhel6 mailing list, but I am not sure. I have had nothing but problems with RHEL6 and NFSv3 and v4, so maybe that is related as well. Sorry that I am not of more help.
For some reason the free memory is so fragmented that the allocator can't satisfy an order-1 allocation. Since the GFP mask doesn't include GFP_FS or GFP_IO for sockets, it won't try to reclaim pagecache pages:

Node 0 DMA32: 26783*4kB 718*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 112876kB

I'm not sure why sk_prot_alloc() would result in an order-1 allocation; every socket-related slab on my RHEL6 system has one page per slab, which would result in an order-0 allocation. Daryl, can you get us a /proc/slabinfo output from your system?

Larry Woodman
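(Side note for anyone following along: that "Node 0 DMA32" line tells the story — nearly all the free memory is in 4 kB and 8 kB blocks, with nothing of 16 kB or larger. The same per-order free counts can be watched live; a minimal sketch:)

# Columns are free-block counts for order 0..10 (4 kB, 8 kB, ..., 4 MB):
watch -n1 cat /proc/buddyinfo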
Created attachment 479922 [details]
/proc/slabinfo

Thank you, attached is my /proc/slabinfo. I have 4 GB of memory and 6 GB of swap. The system is a Dell PowerEdge 1900 tower.
The problem is we are in an IRQ receiving a TCP packet, and sk_prot_alloc() calls the slab allocator for a struct that is allocated out of a slab with more than one page:

static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority, int family)
{
        struct sock *sk;
        struct kmem_cache *slab;

        slab = prot->slab;
        if (slab != NULL) {
                sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
                ...

Feb 18 21:15:10 xxx kernel: Call Trace:
Feb 18 21:15:10 xxx kernel: <IRQ> [<ffffffff8111ea06>] __alloc_pages_nodemask+0x706/0x850
Feb 18 21:15:10 xxx kernel: [<ffffffff811560e2>] kmem_getpages+0x62/0x170
Feb 18 21:15:10 xxx kernel: [<ffffffff81156cfa>] fallback_alloc+0x1ba/0x270
Feb 18 21:15:10 xxx kernel: [<ffffffff8115674f>] ? cache_grow+0x2cf/0x320
Feb 18 21:15:10 xxx kernel: [<ffffffff81156a79>] ____cache_alloc_node+0x99/0x160
Feb 18 21:15:10 xxx kernel: [<ffffffff811573db>] kmem_cache_alloc+0x11b/0x190
Feb 18 21:15:10 xxx kernel: [<ffffffff81403fc8>] sk_prot_alloc+0x48/0x180
Feb 18 21:15:10 xxx kernel: [<ffffffff81404212>] sk_clone+0x22/0x2a0
Feb 18 21:15:10 xxx kernel: [<ffffffff8144bd96>] inet_csk_clone+0x16/0xd0
Feb 18 21:15:10 xxx kernel: [<ffffffff81464ce3>] tcp_create_openreq_child+0x23/0x450
Feb 18 21:15:10 xxx kernel: [<ffffffff8146272d>] tcp_v4_syn_recv_sock+0x4d/0x280
Feb 18 21:15:10 xxx kernel: [<ffffffff81464aa1>] tcp_check_req+0x201/0x420
Feb 18 21:15:10 xxx kernel: [<ffffffff8146214b>] tcp_v4_do_rcv+0x35b/0x430
Feb 18 21:15:10 xxx kernel: [<ffffffff81463960>] tcp_v4_rcv+0x5b0/0x7e0
Feb 18 21:15:10 xxx kernel: [<ffffffff814418c0>] ? ip_local_deliver_finish+0x0/0x2d0
Feb 18 21:15:10 xxx kernel: [<ffffffff81436d74>] ? nf_hook_slow+0x74/0x100
Feb 18 21:15:10 xxx kernel: [<ffffffff814418c0>] ? ip_local_deliver_finish+0x0/0x2d0
Feb 18 21:15:10 xxx kernel: [<ffffffff8144199d>] ip_local_deliver_finish+0xdd/0x2d0
Feb 18 21:15:10 xxx kernel: [<ffffffff81441c28>] ip_local_deliver+0x98/0xa0
Feb 18 21:15:10 xxx kernel: [<ffffffff814410ed>] ip_rcv_finish+0x12d/0x440
Feb 18 21:15:10 xxx kernel: [<ffffffff81441675>] ip_rcv+0x275/0x350
Feb 18 21:15:10 xxx kernel: [<ffffffff8140fb4b>] netif_receive_skb+0x38b/0x670
Feb 18 21:15:10 xxx kernel: [<ffffffffa022b238>] bnx2_poll_work+0xd18/0x1240 [bnx2]
Feb 18 21:15:10 xxx kernel: [<ffffffff8126d470>] ? swiotlb_map_page+0x0/0x100
Feb 18 21:15:10 xxx kernel: [<ffffffffa022b889>] bnx2_poll+0x69/0x2d8 [bnx2]
Feb 18 21:15:10 xxx kernel: [<ffffffff814106d3>] net_rx_action+0x103/0x210
Feb 18 21:15:10 xxx kernel: [<ffffffff81095373>] ? hrtimer_get_next_event+0xc3/0x100
Feb 18 21:15:10 xxx kernel: [<ffffffff81073cd7>] __do_softirq+0xb7/0x1e0
Feb 18 21:15:10 xxx kernel: [<ffffffff810d8950>] ? handle_IRQ_event+0x60/0x170
Feb 18 21:15:10 xxx kernel: [<ffffffff810142cc>] call_softirq+0x1c/0x30
Feb 18 21:15:10 xxx kernel: [<ffffffff81015f35>] do_softirq+0x65/0xa0
Feb 18 21:15:10 xxx kernel: [<ffffffff81073ad5>] irq_exit+0x85/0x90
Feb 18 21:15:10 xxx kernel: [<ffffffff814d0425>] do_IRQ+0x75/0xf0
Feb 18 21:15:10 xxx kernel: [<ffffffff81013ad3>] ret_from_intr+0x0/0x11

When this happens the GFP mask is only __GFP_HIGH (0x20), so we will not try any page reclamation, even for defragmentation; therefore the allocation fails when there is not enough free contiguous memory, even though there is plenty free. We need to get someone more familiar with the network structure requirements to see why.

Larry Woodman
The exact cause of this failure: inet_init() calls proto_register(&tcp_prot, 1), which causes sk_alloc() to call kmem_cache_alloc() for a TCP struct. Since the TCP slab has 2 pages per slab, cache_grow() calls kmem_getpages() for an order-1 allocation. Since memory is so fragmented, the allocation fails.

cat /proc/slabinfo
slabinfo - version: 2.1
# name        <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
TCP                41           72       1664        4             2

Larry
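A quick way to spot every slab that, like TCP here, needs multi-page (order >= 1) backing allocations; a sketch that assumes the slabinfo version 2.1 column layout shown above:

# Field 6 is <pagesperslab>; the first two lines are headers.
awk 'NR > 2 && $6 > 1 { print $1 ": " $6 " pages per slab (order >= 1)" }' /proc/slabinfo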
Created attachment 480781 [details]
2.6.32-71.18.1 kernel logs

Thanks for the very kind help. For what it is worth, I tried the latest errata kernel and have attached the messages that appeared during the dump. I was watching closely and stopped the backup process when the messages appeared, so I think I may have saved the system from locking up so far :) I'd be happy to try any suggestions you may have.
I tried the RHEL6.1 beta kernel and was able to run dump over NFS for 12 hours without a kernel trace. Unfortunately, the dump was going so slowly (3 MB/s) that it would have taken a week to finish :(
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Hello, I am fine with closing this bugzilla. I have not seen any issues with RHEL6.1 kernels. NFS IO with Solaris stinks, but that is a different issue :) thanks! daryl
Thanks for the report and testing!
Hi,

Sorry about this, but I was confused yesterday; this issue still exists on the most recent 6.1 kernel. The system did not lock up, but it was dumping kernel traces that I had not noticed, sorry.

swapper: page allocation failure. order:1, mode:0x20
Pid: 0, comm: swapper Tainted: G ---------------- T 2.6.32-131.0.15.el6.x86_64 #1

daryl
Hi, I'm probably going to downgrade to RHEL5 this week so that I can have backups again :( Is there anything you wish me to test before doing so? thanks! daryl
We're seeing a similar problem on one of our FTP servers running RHEL 6.1, which has NFS directories mounted on it. We're not doing any dump/backup operation as such, just accessing files. Will attach a trace shortly.
Created attachment 517038 [details]
Stack trace from 2.6.32-71.el6.x86_64

This is the trace we're seeing on our end. Unfortunately, we had to downgrade one of the servers to RHEL 5.x to keep the cluster up, and that machine hasn't been seeing any errors whatsoever. I would be happy to provide more details, if needed. TIA for looking into this!
Is there interest in trying to resolve this? I'd be happy to do whatever is suggested.
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
I haven't seen issues with RHEL6.2 2.6.32-220.el6.x86_64 yet.
I saw a similar problem with the RHEL6.2 kernel 2.6.32-220.4.1.el6.x86_64:

swapper: page allocation failure. order:2, mode:0x20
Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1
Call Trace:
<IRQ> [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940
[<ffffffff8115dc62>] ? kmem_getpages+0x62/0x170
[<ffffffff8115e87a>] ? fallback_alloc+0x1ba/0x270
[<ffffffff8115e2cf>] ? cache_grow+0x2cf/0x320
[<ffffffff8115e5f9>] ? ____cache_alloc_node+0x99/0x160
[<ffffffff8142186a>] ? __alloc_skb+0x7a/0x180
[<ffffffff8115f4bf>] ? kmem_cache_alloc_node_notrace+0x6f/0x130
[<ffffffff814620b0>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8115f6fb>] ? __kmalloc_node+0x7b/0x100
[<ffffffff8142186a>] ? __alloc_skb+0x7a/0x180
[<ffffffff8142198d>] ? dev_alloc_skb+0x1d/0x40
[<ffffffffa01b36e7>] ? netxen_alloc_rx_skb+0x37/0x1d0 [netxen_nic]
[<ffffffffa01b3cde>] ? netxen_process_rcv_ring+0x32e/0xa30 [netxen_nic]
[<ffffffff81421267>] ? __kfree_skb+0x47/0xa0
[<ffffffff814212fb>] ? consume_skb+0x3b/0x80
[<ffffffffa01b69e5>] ? netxen_process_cmd_ring+0x185/0x2a0 [netxen_nic]
[<ffffffffa01b08d5>] ? netxen_nic_poll+0x45/0xc0 [netxen_nic]
[<ffffffff814307c3>] ? net_rx_action+0x103/0x2f0
[<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
[<ffffffff810d93c2>] ? handle_IRQ_event+0x92/0x170
[<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
[<ffffffff8100de85>] ? do_softirq+0x65/0xa0
[<ffffffff81071de5>] ? irq_exit+0x85/0x90
[<ffffffff814f4c85>] ? do_IRQ+0x75/0xf0
[<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff812c49de>] ? intel_idle+0xde/0x170
[<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
[<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
[<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
[<ffffffff814e5ebc>] ? start_secondary+0x202/0x245
---------------- ~~~ ----------------
swapper: page allocation failure. order:2, mode:0x20
Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1
Call Trace:
<IRQ> [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940
[<ffffffff8115dc62>] ? kmem_getpages+0x62/0x170
[<ffffffff8115e87a>] ? fallback_alloc+0x1ba/0x270
[<ffffffff8115e2cf>] ? cache_grow+0x2cf/0x320
[<ffffffff8115e5f9>] ? ____cache_alloc_node+0x99/0x160
[<ffffffff8142186a>] ? __alloc_skb+0x7a/0x180
[<ffffffff8115f4bf>] ? kmem_cache_alloc_node_notrace+0x6f/0x130
[<ffffffff8115f6fb>] ? __kmalloc_node+0x7b/0x100
[<ffffffff8142186a>] ? __alloc_skb+0x7a/0x180
[<ffffffff8142198d>] ? dev_alloc_skb+0x1d/0x40
[<ffffffffa01b36e7>] ? netxen_alloc_rx_skb+0x37/0x1d0 [netxen_nic]
[<ffffffff8142e415>] ? __napi_gro_receive+0x55/0x100
[<ffffffffa01b3cde>] ? netxen_process_rcv_ring+0x32e/0xa30 [netxen_nic]
[<ffffffff8126c977>] ? kobject_put+0x27/0x60
[<ffffffff81341a47>] ? put_device+0x17/0x20
[<ffffffffa01b68a4>] ? netxen_process_cmd_ring+0x44/0x2a0 [netxen_nic]
[<ffffffffa01b08d5>] ? netxen_nic_poll+0x45/0xc0 [netxen_nic]
[<ffffffff814307c3>] ? net_rx_action+0x103/0x2f0
[<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
[<ffffffff810d93c2>] ? handle_IRQ_event+0x92/0x170
[<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
[<ffffffff8100de85>] ? do_softirq+0x65/0xa0
[<ffffffff81071de5>] ? irq_exit+0x85/0x90
[<ffffffff814f4c85>] ? do_IRQ+0x75/0xf0
[<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff812c49de>] ? intel_idle+0xde/0x170
[<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
[<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
[<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
[<ffffffff814e5ebc>] ? start_secondary+0x202/0x245

A very frequent occurrence, more than 50 times over 30 minutes, but it only lasted for those 30 minutes.
It also has something to do with the network driver (netxen_nic here). Both the stock netxen_nic driver and the HP nx_nic driver suffer from the same issue. An earlier post in this thread, comment 7, also has a network driver module in its backtrace. Could it be some kind of attack?
Debian with a 2.6.32 kernel also has a similar network-related problem. http://lists.debian.org/debian-backports/2011/01/msg00037.html
I spoke too soon in Comment #20; I am still seeing this with 2.6.32-220.7.1.el6.x86_64:

swapper: page allocation failure. order:1, mode:0x20
Pid: 0, comm: swapper Not tainted 2.6.32-220.7.1.el6.x86_64 #1
Call Trace:
<IRQ> [<ffffffff81123ebf>] ? __alloc_pages_nodemask+0x77f/0x940
[<ffffffff8142c800>] ? dev_hard_start_xmit+0x240/0x3f0
[<ffffffff8115dd72>] ? kmem_getpages+0x62/0x170
[<ffffffff8115e98a>] ? fallback_alloc+0x1ba/0x270
[<ffffffff8115e3df>] ? cache_grow+0x2cf/0x320
[<ffffffff8115e709>] ? ____cache_alloc_node+0x99/0x160
[<ffffffff8115f4eb>] ? kmem_cache_alloc+0x11b/0x190
[<ffffffff8141f5c8>] ? sk_prot_alloc+0x48/0x1c0
[<ffffffff8141f852>] ? sk_clone+0x22/0x2e0
[<ffffffff8146cac6>] ? inet_csk_clone+0x16/0xd0
[<ffffffff81485993>] ? tcp_create_openreq_child+0x23/0x450
[<ffffffff8148337d>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
[<ffffffff81485751>] ? tcp_check_req+0x201/0x420
[<ffffffff81482d9b>] ? tcp_v4_do_rcv+0x35b/0x430
[<ffffffffa03d1557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
[<ffffffff81484511>] ? tcp_v4_rcv+0x4e1/0x860
[<ffffffff814621c0>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8146229d>] ? ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81462528>] ? ip_local_deliver+0x98/0xa0
[<ffffffff814619ed>] ? ip_rcv_finish+0x12d/0x440
[<ffffffff81461f75>] ? ip_rcv+0x275/0x350
[<ffffffff8142bf7b>] ? __netif_receive_skb+0x49b/0x6e0
[<ffffffff8142e028>] ? netif_receive_skb+0x58/0x60
[<ffffffff8142e130>] ? napi_skb_finish+0x50/0x70
[<ffffffff814307b9>] ? napi_gro_receive+0x39/0x50
[<ffffffffa0289d4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
[<ffffffff8127fab0>] ? swiotlb_map_page+0x0/0x100
[<ffffffffa028a399>] ? bnx2_poll+0x69/0x2d8 [bnx2]
[<ffffffff814308d3>] ? net_rx_action+0x103/0x2f0
[<ffffffff81095963>] ? __hrtimer_start_range_ns+0x1a3/0x460
[<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
[<ffffffff810d94a0>] ? handle_IRQ_event+0x60/0x170
[<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
[<ffffffff8100de85>] ? do_softirq+0x65/0xa0
[<ffffffff81071de5>] ? irq_exit+0x85/0x90
[<ffffffff814f4dc5>] ? do_IRQ+0x75/0xf0
[<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff81014687>] ? mwait_idle+0x77/0xd0
[<ffffffff814f265a>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
[<ffffffff814e5ffc>] ? start_secondary+0x202/0x245

The machine does not lock up or crash, so that's good at least :)
A "me too" post ... Using 2.6.32-220.4.1 on Lustre storage nodes, connected via two gigE links, spewing this as soon as some real IO on Lustre comes up: kswapd0: page allocation failure. order:2, mode:0x20 Pid: 38, comm: kswapd0 Not tainted 2.6.32-220.4.2.el6_lustre.x86_64 #1 Call Trace: <IRQ> [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940 [<ffffffff8115dc62>] ? kmem_getpages+0x62/0x170 [<ffffffff8115e87a>] ? fallback_alloc+0x1ba/0x270 [<ffffffff8115e2cf>] ? cache_grow+0x2cf/0x320 [<ffffffff8115e5f9>] ? ____cache_alloc_node+0x99/0x160 [<ffffffff81421bca>] ? __alloc_skb+0x7a/0x180 [<ffffffff8115f4bf>] ? kmem_cache_alloc_node_notrace+0x6f/0x130 [<ffffffff81462410>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8115f6fb>] ? __kmalloc_node+0x7b/0x100 [<ffffffff814624ed>] ? ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81421bca>] ? __alloc_skb+0x7a/0x180 [<ffffffff81421d46>] ? __netdev_alloc_skb+0x36/0x60 [<ffffffffa03c0f5a>] ? tg3_alloc_rx_data+0x9a/0x250 [tg3] [<ffffffffa03c6308>] ? tg3_poll_work+0xdc8/0x1050 [tg3] [<ffffffffa03d4c81>] ? tg3_poll+0x71/0x450 [tg3] [<ffffffff81430b23>] ? net_rx_action+0x103/0x2f0 [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0 [<ffffffff810d9390>] ? handle_IRQ_event+0x60/0x170 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0 [<ffffffff81071de5>] ? irq_exit+0x85/0x90 [<ffffffff814f4fe5>] ? do_IRQ+0x75/0xf0 [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11 <EOI> [<ffffffff8112a4ee>] ? shrink_inactive_list+0x2de/0x740 [<ffffffff8100bdae>] ? call_function_single_interrupt+0xe/0x20 [<ffffffffa048e634>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs] [<ffffffffa0490522>] ? cfs_hash_lookup+0x82/0xa0 [libcfs] [<ffffffff8112b21f>] ? shrink_zone+0x38f/0x520 [<ffffffff81120f0d>] ? zone_watermark_ok_safe+0xad/0xc0 [<ffffffff8112c769>] ? balance_pgdat+0x6e9/0x7e0 [<ffffffff8112cc10>] ? isolate_pages_global+0x0/0x350 [<ffffffff81133486>] ? set_pgdat_percpu_threshold+0xa6/0xd0 [<ffffffff81133670>] ? calculate_pressure_threshold+0x0/0x50 [<ffffffff8112c996>] ? kswapd+0x136/0x3b0 [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8112c860>] ? kswapd+0x0/0x3b0 [<ffffffff81090726>] ? kthread+0x96/0xa0 [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [<ffffffff81090690>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20 As I understand, this means dropped packets and if this is true, it is the likely reason for the sporiadic Lustre failures we're observing. Are there any tweaks to either decrease the likelihood of these events or get rid of them completely?
I tested the memory fragmentation hypothesis by dropping caches (echo 3 > /proc/sys/vm/drop_caches). So far it appears to help. I'm also raising vm.min_free_kbytes; I expect that to help too. Is there a way to figure out the largest contiguous chunk of memory available?
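/proc/buddyinfo answers that question: each column is the count of free blocks of order 0 through 10, so the rightmost non-zero column gives the largest free contiguous block per zone. A sketch assuming the usual 4 kB page size:

# Columns 5..NF are free-block counts for order 0..10; largest free
# contiguous block = 4 kB * 2^order of the last non-zero column.
awk '{ for (i = NF; i >= 5; i--) if ($i > 0) { print "node " $2 " zone " $4 ": largest free block = " 4 * 2^(i-5) " kB"; break } }' /proc/buddyinfo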
Sorry to ask, but is anyone still interested in this bug? It has been idle for more than 2 years now. Daryl?
I don't have any current reproducers, so I won't be of help with this.
Probably not this particular bug anymore, but I see that XFS in general still suffers from memory fragmentation. Even on machines with 512 GB of RAM, with about 90% of it in buffers/cache, I still see "XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)" on el7.1.
(In reply to Jure Pečar from comment #30)
> Probably not this particular bug anymore, but I see that XFS in general
> still suffers from memory fragmentation. Even on machines with 512 GB of
> RAM, with about 90% of it in buffers/cache, I still see "XFS: possible
> memory allocation deadlock in kmem_alloc (mode:0x250)" on el7.1.

Okay. Please file another bug for that, Jure. Thanks.
RHEL6 had known issues with memory compaction, which were probably one cause of this issue, if not the main one. As this bz unfortunately lost traction, and now with the lack of reproducers, we have no choice but to close this bug. Thank you all for the time and patience invested here.