Bug 676022

Summary: RHEL6 x86_64 nfsd bug causes system hang
Product: Red Hat Enterprise Linux 6
Reporter: Andre ten Bohmer <andre.tenbohmer>
Component: kernel
Assignee: J. Bruce Fields <bfields>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Filesystem QE <fs-qe>
Severity: high
Priority: unspecified
Version: 6.0
CC: bfields, dchinner, dhowells, jlayton, kzhang, mschmidt, pasteur, rwheeler, sprabhu, steved, syeghiay, yanwang
Target Milestone: rc
Keywords: Reopened
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-02-17 22:35:29 UTC

Attachments: XFS quota enabled causes server crash (flags: none)

Description Andre ten Bohmer 2011-02-08 16:21:04 UTC
Description of problem:
RHEL6 x86_64 server running as an NFS server with about 10 clients:
Console message:
BUG:scheduling while atomic: nfsd/3538/0xffffffff
BUG: unable to handle kernel paging request at 000000038ab00bc0
IP: [<ffffffff81056fd>] task_rq_lock+0x4d/0xa0
PGD 0
Oops: 0000 {#1} SMP

Version-Release number of selected component (if applicable):
nfs-utils-1.2.2-7.el6.x86_64
nfs-utils-lib-1.1.5-1.el6.x86_64


How reproducible:
I've seen the "scheduling while atomic" message once before, during an XFS defrag on this server (46 TB XFS file system), which also caused a system hang. Running a defrag again might reproduce the hang.
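
For reference, the XFS defragmenter in question is xfs_fsr; an illustrative invocation against the mount point (the path is assumed, not taken from the report):
]# xfs_fsr -v /export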

Steps to Reproduce:
  
Actual results:
System hang (no response to ping, no keyboard response at the system console)

Expected results:
No system hang

Additional info:
]# lsb_release -a
LSB Version:	:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 6.0 (Santiago)
Release:	6.0
Codename:	Santiago

]# uname -a
Linux scomp1110 2.6.32-71.14.1.el6.x86_64 #1 SMP Wed Jan 5 17:01:01 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

Manufacturer: HP
Product Name: ProLiant BL460c G6

NFS export on a 46 TB LVM striped volume group with an XFS file system.
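
For illustration, such a stack is typically assembled along these lines (device names, stripe parameters, and export options here are assumptions, not values from this report):
]# lvcreate -i 4 -I 256 -L 46T -n export vg_data
]# mkfs.xfs /dev/vg_data/export
]# mount /dev/vg_data/export /export
]# echo '/export *(rw,sync,no_root_squash)' >> /etc/exports
]# exportfs -ra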

Comment 2 Steve Dickson 2011-02-17 16:18:17 UTC
What is the kernel version?

Comment 3 Andre ten Bohmer 2011-02-17 16:23:08 UTC
See "uname -a" part

]# uname -a
Linux scomp1110 2.6.32-71.14.1.el6.x86_64 #1 SMP Wed Jan 5 17:01:01 EST 2011
x86_64 x86_64 x86_64 GNU/Linux

Comment 4 J. Bruce Fields 2011-02-28 21:25:40 UTC
Is there any more to that console message?  Normally I'd expect a backtrace to follow.

Comment 5 Andre ten Bohmer 2011-03-01 21:09:10 UTC
Created attachment 481720 [details]
XFS quota enabled causes server crash

XFS seems to be buggy on this system. After enabling quota, this showed up on the remote console.
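
For reference, XFS quotas on these kernels are enabled via mount options; an illustrative mount (device and mount point are assumed):
]# mount -o uquota,gquota /dev/vg_data/export /export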

Comment 6 Andre ten Bohmer 2011-03-04 10:35:53 UTC
Kernel ring buffer message from the 'sister' server; this one did not cause a system hang:

nfsd: page allocation failure. order:4, mode:0x20
Pid: 4187, comm: nfsd Not tainted 2.6.32-71.14.1.el6.x86_64 #1
Call Trace:
 [<ffffffff8111ea06>] __alloc_pages_nodemask+0x706/0x850
 [<ffffffff811560e2>] kmem_getpages+0x62/0x170
 [<ffffffff81156cfa>] fallback_alloc+0x1ba/0x270
 [<ffffffff8115674f>] ? cache_grow+0x2cf/0x320
 [<ffffffff81156a79>] ____cache_alloc_node+0x99/0x160
 [<ffffffff814063df>] ? pskb_expand_head+0x5f/0x1e0
 [<ffffffff81157809>] __kmalloc+0x189/0x220
 [<ffffffff814063df>] pskb_expand_head+0x5f/0x1e0
 [<ffffffff8140881a>] __pskb_pull_tail+0x2aa/0x360
 [<ffffffffa028a6ce>] bnx2x_start_xmit+0x19e/0xf50 [bnx2x]
 [<ffffffffa028a8cf>] ? bnx2x_start_xmit+0x39f/0xf50 [bnx2x]
 [<ffffffffa03b55f0>] ? xfs_iomap_eof_want_preallocate+0xd0/0x150 [xfs]
 [<ffffffff81410da8>] dev_hard_start_xmit+0x2b8/0x370
 [<ffffffff814291ba>] sch_direct_xmit+0x15a/0x1c0
 [<ffffffff81414338>] dev_queue_xmit+0x378/0x4a0
 [<ffffffffa02bb0b5>] ? ipt_do_table+0x295/0x678 [ip_tables]
 [<ffffffffa0510615>] bond_dev_queue_xmit+0x45/0x1b0 [bonding]
 [<ffffffffa0510b32>] bond_start_xmit+0x3b2/0x4a0 [bonding]
 [<ffffffff81410da8>] dev_hard_start_xmit+0x2b8/0x370
 [<ffffffff81414386>] dev_queue_xmit+0x3c6/0x4a0
 [<ffffffffa05939d4>] vlan_dev_hwaccel_hard_start_xmit+0x84/0xb0 [8021q]
 [<ffffffff81410da8>] dev_hard_start_xmit+0x2b8/0x370
 [<ffffffff81414386>] dev_queue_xmit+0x3c6/0x4a0
 [<ffffffff8144758c>] ip_finish_output+0x13c/0x310
 [<ffffffff81447818>] ip_output+0xb8/0xc0
 [<ffffffff8144676f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff814467a5>] ip_local_out+0x25/0x30
 [<ffffffff81446ff0>] ip_queue_xmit+0x190/0x420
 [<ffffffff81447818>] ? ip_output+0xb8/0xc0
 [<ffffffff8144676f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff814467a5>] ? ip_local_out+0x25/0x30
 [<ffffffff8145bca1>] tcp_transmit_skb+0x3f1/0x790
 [<ffffffff8145e017>] tcp_write_xmit+0x1e7/0x9e0
 [<ffffffff8145e9a0>] __tcp_push_pending_frames+0x30/0xe0
 [<ffffffff81456613>] tcp_data_snd_check+0x33/0x100
 [<ffffffff8145a310>] tcp_rcv_established+0x5c0/0x820
 [<ffffffffa03b016c>] ? xfs_iext_bno_to_ext+0x8c/0x170 [xfs]
 [<ffffffff814620d3>] tcp_v4_do_rcv+0x2e3/0x430
 [<ffffffff81264885>] ? memmove+0x45/0x50
 [<ffffffff814019a5>] release_sock+0x65/0xd0
 [<ffffffff814512c1>] tcp_recvmsg+0x821/0xe80
 [<ffffffffa0392e5d>] ? xfs_bmapi+0x1bd/0x11a0 [xfs]
CE: hpet increasing min_delta_ns to 15000 nsec
CE: hpet increasing min_delta_ns to 22500 nsec
 [<ffffffff8145dea6>] ? tcp_write_xmit+0x76/0x9e0
 [<ffffffff81401069>] sock_common_recvmsg+0x39/0x50
 [<ffffffff813fea53>] sock_recvmsg+0x133/0x160
 [<ffffffff810566d0>] ? __dequeue_entity+0x30/0x50
 [<ffffffff8105c7f6>] ? update_curr+0xe6/0x1e0
 [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81061c11>] ? dequeue_entity+0x1a1/0x1e0
 [<ffffffff810116e0>] ? __switch_to+0xd0/0x320
 [<ffffffff81059db2>] ? finish_task_switch+0x42/0xd0
 [<ffffffff814c8d96>] ? thread_return+0x4e/0x778
 [<ffffffff8105c434>] ? try_to_wake_up+0x284/0x380
 [<ffffffff814cb656>] ? _spin_lock_bh+0x16/0x40
 [<ffffffff813feac4>] kernel_recvmsg+0x44/0x60
 [<ffffffffa053b885>] svc_recvfrom+0x65/0xa0 [sunrpc]
 [<ffffffff81472430>] ? inet_ioctl+0x30/0xa0
 [<ffffffffa053c412>] svc_tcp_recvfrom+0x192/0x660 [sunrpc]
 [<ffffffffa0548acb>] svc_recv+0x7fb/0x830 [sunrpc]
 [<ffffffff8105c530>] ? default_wake_function+0x0/0x20
 [<ffffffffa0599b45>] nfsd+0xa5/0x160 [nfsd]
 [<ffffffffa0599aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81091a76>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff810919e0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
CPU    4: hi:    0, btch:   1 usd:   0
CPU    5: hi:    0, btch:   1 usd:   0
CPU    6: hi:    0, btch:   1 usd:   0
CPU    7: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 182
CPU    1: hi:  186, btch:  31 usd: 170
CPU    2: hi:  186, btch:  31 usd:  30
CPU    3: hi:  186, btch:  31 usd: 114
CPU    4: hi:  186, btch:  31 usd:  43
CPU    5: hi:  186, btch:  31 usd:  29
CPU    6: hi:  186, btch:  31 usd:  15
CPU    7: hi:  186, btch:  31 usd: 110
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 157
CPU    1: hi:  186, btch:  31 usd: 182
CPU    2: hi:  186, btch:  31 usd:  74
CPU    3: hi:  186, btch:  31 usd:  37
CPU    4: hi:  186, btch:  31 usd:  36
CPU    5: hi:  186, btch:  31 usd:  67
CPU    6: hi:  186, btch:  31 usd:  49
CPU    7: hi:  186, btch:  31 usd: 131
active_anon:8275 inactive_anon:2222 isolated_anon:0
 active_file:648839 inactive_file:2098774 isolated_file:0
 unevictable:1475 dirty:16782 writeback:45649 unstable:0
 free:48629 slab_reclaimable:65582 slab_unreclaimable:146436
 mapped:5001 shmem:223 pagetables:1652 bounce:0
Node 0 DMA free:15700kB min:80kB low:100kB high:120kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15308kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3502 12087 12087
Node 0 DMA32 free:82484kB min:19556kB low:24444kB high:29332kB active_anon:80kB inactive_anon:2448kB active_file:758724kB inactive_file:2384696kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3586464kB mlocked:0kB dirty:17772kB writeback:2064kB mapped:336kB shmem:0kB slab_reclaimable:121300kB slab_unreclaimable:28720kB kernel_stack:24kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 8584 8584
Node 0 Normal free:96332kB min:47940kB low:59924kB high:71908kB active_anon:33020kB inactive_anon:6440kB active_file:1836632kB inactive_file:6010400kB unevictable:5900kB isolated(anon):0kB isolated(file):0kB present:8791036kB mlocked:5900kB dirty:49356kB writeback:180532kB mapped:19668kB shmem:892kB slab_reclaimable:141028kB slab_unreclaimable:557024kB kernel_stack:3272kB pagetables:6608kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15700kB
Node 0 DMA32: 11572*4kB 2347*8kB 373*16kB 176*32kB 69*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 82616kB
Node 0 Normal: 14879*4kB 1103*8kB 707*16kB 297*32kB 62*64kB 1*128kB 3*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 96580kB
2748831 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 8388600kB
Total swap = 8388600kB
3145726 pages RAM
66706 pages reserved
1230916 pages shared
1749857 pages non-shared

Comment 7 Andre ten Bohmer 2011-04-05 11:53:13 UTC
Last Friday, this server crashed multiple times under heavy I/O stress via NFS. We decided to rebuild the server with Red Hat 5.6 x86_64, and it has been stable so far.

Comment 9 Michal Schmidt 2011-04-07 09:12:40 UTC
(In reply to comment #6)
> nfsd: page allocation failure. order:4, mode:0x20
> Pid: 4187, comm: nfsd Not tainted 2.6.32-71.14.1.el6.x86_64 #1
> [...]

So bnx2x was attempting an order-4 (64 KiB) atomic allocation. It must be the skb_linearize() call that bnx2x makes when it is asked to transmit an skb with more frags than the card can handle.
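
For illustration, a minimal sketch of the driver pattern described here (the fragment limit and all names are assumptions for illustration, not the literal bnx2x code):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define EXAMPLE_MAX_HW_FRAGS 13		/* assumed hardware descriptor limit */

static netdev_tx_t example_start_xmit(struct sk_buff *skb,
				      struct net_device *dev)
{
	/* More fragments than the hardware ring can take: copy the skb
	 * into a single linear buffer.  That copy is a high-order
	 * GFP_ATOMIC kmalloc (order 4 for a ~64 KiB skb), which can fail
	 * under memory fragmentation and produce a report like the
	 * "page allocation failure. order:4, mode:0x20" above. */
	if (skb_shinfo(skb)->nr_frags > EXAMPLE_MAX_HW_FRAGS &&
	    skb_linearize(skb)) {
		dev_kfree_skb_any(skb);		/* allocation failed: drop */
		dev->stats.tx_dropped++;
		return NETDEV_TX_OK;
	}
	/* ... DMA-map the buffer and hand it to the NIC ... */
	return NETDEV_TX_OK;
}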

What are the offload settings for the card (ethtool -k ethX)?
From the stack trace I can see that VLANs and bonding are involved. I'll try to reproduce the allocation failures.

I don't yet see how the allocation failure would relate to the BUGs
from the original description though:
> BUG:scheduling while atomic: nfsd/3538/0xffffffff
> BUG: unable to handle kernel paging request at 000000038ab00bc0

Comment 10 Jeff Layton 2011-04-07 12:48:07 UTC
Agreed. I think there are likely several bugs here that may or may not be related. We'll likely need a stack trace from the scheduling while atomic message in order to know what that is.

Comment 12 Andre ten Bohmer 2011-04-22 07:23:38 UTC
On a 'sister' server running RHEL6 (stable) :
]# ethtool -k eth14
Offload parameters for eth14:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

On the 'problem' server, now running RHEL5:
]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: on

I'm sorry, but I can't do any further research on the 'problem' server. It's running production, and RHEL6 on this system was too unstable. It has exactly the same setup as the 'sister' server, which in fact is running RHEL6 very stably. The only big difference is that the problem server boots from SAN while the sister server boots from local disks.

Comment 13 Andre ten Bohmer 2011-06-28 09:40:09 UTC
The RHEL 5.6 installation is more stable, but last weekend it also crashed several times. I managed to set up a serial console monitor via the iLO2 board and am now testing with heavy I/O loads, hoping the server crashes again so I can provide you with a dump.

Comment 14 J. Bruce Fields 2011-09-13 21:23:28 UTC
Looks like we're waiting on more information from the reporter?

Comment 16 Ric Wheeler 2011-09-13 23:29:35 UTC
I am going to close this specific BZ pending an update from the reporter. Please reopen it if this issue happens again, or open a new one to track the other issues you mention in comment 13 (https://bugzilla.redhat.com/show_bug.cgi?id=676022#c13).

Thanks!

Comment 17 Andre ten Bohmer 2011-12-05 15:28:14 UTC
Hello,
Today I finally managed to catch a console dump:

Red Hat Enterprise Linux Server release 5.7 (Tikanga)
Kernel 2.6.18-274.7.1.el5 on an x86_64

serevr login: INFO: task xfsdatad/2:3426 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154db9     0  3426     71          3427  3425 (L-TLB)
 ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
 000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
Call Trace:
 [<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e87f>] default_wake_function+0x0/0xe
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

INFO: task nfsd:5298 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D ffffffff80154db9     0  5298      1          5299  5297 (L-TLB)
 ffff8100d9a799d0 0000000000000046 ffff8100c8ad3a9c ffff8100d9fc4440
 ffffffffffffff5d 000000000000000a ffff8100d9957080 ffff81011fe39100
 000000f44b13bfd6 0000000000006a2e ffff8100d9957268 000000078003f92f
Call Trace:
 [<ffffffff800ef670>] inode_wait+0x0/0xd
 [<ffffffff800ef679>] inode_wait+0x9/0xd
 [<ffffffff800639fa>] __wait_on_bit+0x40/0x6e
 [<ffffffff800ef670>] inode_wait+0x0/0xd
 [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a2e7f>] wake_bit_function+0x0/0x23
 [<ffffffff80031ab5>] sock_common_recvmsg+0x2d/0x43
 [<ffffffff8003d988>] ifind_fast+0x6e/0x83
 [<ffffffff8002355d>] iget_locked+0x59/0x149
 [<ffffffff885b6bd9>] :xfs:xfs_iget+0x4f/0x17a
 [<ffffffff885d519c>] :xfs:xfs_fs_get_dentry+0x3e/0xae
 [<ffffffff887f536d>] :exportfs:find_exported_dentry+0x43/0x486
 [<ffffffff88802739>] :nfsd:nfsd_acceptable+0x0/0xdc
 [<ffffffff8880680b>] :nfsd:exp_get_by_name+0x5b/0x71
 [<ffffffff88806dfa>] :nfsd:exp_find_key+0x89/0x9c
 [<ffffffff8008cca4>] __wake_up_common+0x3e/0x68
 [<ffffffff88802739>] :nfsd:nfsd_acceptable+0x0/0xdc
 [<ffffffff885d5042>] :xfs:xfs_fs_decode_fh+0xce/0xd8
 [<ffffffff88802ab1>] :nfsd:fh_verify+0x29c/0x4cf
 [<ffffffff88803d1f>] :nfsd:nfsd_open+0x2c/0x196
 [<ffffffff88804051>] :nfsd:nfsd_write+0x89/0xd5
 [<ffffffff8880abae>] :nfsd:nfsd3_proc_write+0xea/0x109
 [<ffffffff888001db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff8877f80d>] :sunrpc:svc_process+0x44c/0x713
 [<ffffffff80064614>] __down_read+0x12/0x92
 [<ffffffff88800580>] :nfsd:nfsd+0x0/0x2c8
 [<ffffffff88800725>] :nfsd:nfsd+0x1a5/0x2c8
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff88800580>] :nfsd:nfsd+0x0/0x2c8
 [<ffffffff88800580>] :nfsd:nfsd+0x0/0x2c8
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Comment 18 Andre ten Bohmer 2011-12-06 14:03:15 UTC
Hi,
Found this via google:
http://comments.gmane.org/gmane.comp.file-systems.xfs.general/32747
Seems to be a bug solved in newer kernels as of 2.6.34?
Will RedHat reverse engineer this in 2.6.18 (RH5.7) and 2.6.32 (RH6.1) ?

Comment 19 J. Bruce Fields 2011-12-06 16:53:44 UTC
(In reply to comment #18)
> Found this via google:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/32747

That does look similar to what you report in Comment #17, though I'm less certain it's related to the bug originally reported here--probably this deserves a new bug if it's not already fixed.

I don't see any xfs people on the cc, so adding Dave Chinner to see what he thinks.

Comment 20 Dave Chinner 2011-12-06 22:08:24 UTC
(In reply to comment #18)
> Hi,
> Found this via google:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/32747
> Seems to be a bug solved in newer kernels as of 2.6.34?

It's a different problem and completely irrelevant. xfstests 104 is testing online filesystem growing functionality, which used to deadlock in the allocator code under extreme stress.

> Will RedHat reverse engineer this in 2.6.18 (RH5.7)

Very unlikely because it's an extremely rare problem in production systems and the fix is very intrusive. And in most cases, growing a filesystem is done during scheduled downtime, so it's not likely to be a serious problem even if the deadlock is tripped.

> and 2.6.32 (RH6.1) ?

RHEL6.0 already has this fixed.

Comment 21 Andre ten Bohmer 2011-12-06 22:18:08 UTC
(In reply to comment #20)
> (In reply to comment #18)
> > Hi,
> > Found this via google:
> > http://comments.gmane.org/gmane.comp.file-systems.xfs.general/32747
> > Seems to be a bug solved in newer kernels as of 2.6.34?
> 
> It's a different problem and completely irrelevant. xfstests 104 is testing
> online filesystem growing functionality, which used to deadlock in the
> allocator
> code under extreme stress.
That's the nature of intensive HPC jobs: extreme stress.

> > Will RedHat reverse engineer this in 2.6.18 (RH5.7)
> 
> Very unlikely because it's an extremely rare problem in production systems and
> the fix is very intrusive. And in most cases, growing a filesystem is done
> during scheduled downtime, so it's not likely to be a serious problem even if
> the deadlock is tripped.
So it's not advisable to grow file systems online?

> > and 2.6.32 (RH6.1) ?
> 
> RHEL6.0 already has this fixed.
Since when? Otherwise I'll rebuild this system with RH 6.2

Thanks for your time.

Comment 22 Dave Chinner 2011-12-06 22:23:00 UTC
(In reply to comment #17)
> Hello,
> Today finaly managed to catch a console dump:
> 
> Red Hat Enterprise Linux Server release 5.7 (Tikanga)
> Kernel 2.6.18-274.7.1.el5 on an x86_64
> 
> serevr login: INFO: task xfsdatad/2:3426 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> xfsdatad/2    D ffffffff80154db9     0  3426     71          3427  3425 (L-TLB)
>  ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
>  0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
>  000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
> Call Trace:
>  [<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
>  [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
>  [<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
>  [<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
>  [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
>  [<ffffffff80049b3d>] worker_thread+0x0/0x122
>  [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80049c2d>] worker_thread+0xf0/0x122
>  [<ffffffff8008e87f>] default_wake_function+0x0/0xe
>  [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff8003270f>] kthread+0xfe/0x132
>  [<ffffffff8005dfb1>] child_rip+0xa/0x11
>  [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80032611>] kthread+0x0/0x132
>  [<ffffffff8005dfa7>] child_rip+0x0/0x11

This implies that something else is holding the XFS inode ilock and not letting it go. That's one step closer to the potential root cause of the problem - "echo w > /proc/sysrq-trigger" should dump the traces of all the currently blocked processes and will probably tell us who is holding the ilock and why they are not letting it go. Then we'll know whether this is caused by your OOM problem or is itself the cause of it.
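
For reference, a minimal way to capture that dump, assuming the magic sysrq interface is available and the output is readable via dmesg or the serial console:
]# echo 1 > /proc/sys/kernel/sysrq
]# echo w > /proc/sysrq-trigger
]# dmesg | tail -200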

Comment 23 Dave Chinner 2011-12-06 23:07:15 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #18)
> > > Will RedHat reverse engineer this in 2.6.18 (RH5.7)
> > 
> > Very unlikely because it's an extremely rare problem in production systems and
> > the fix is very intrusive. And in most cases, growing a filesystem is done
> > during scheduled downtime, so it's not likely to be a serious problem even if
> > the deadlock is tripped.
> So it's not advisable to grow file systems online?

It's never advisable to do filesystem/storage administration tasks while your system is under extreme stress. Best practice implies that you make config changes when there is the least chance of something going wrong, regardless of whether you need to take something offline to make the change. Online filesystem growing simply reduces the downtime needed for the operation; it doesn't remove the need to schedule or perform that operation in a safe manner...
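
For illustration, an online XFS grow on an LVM-backed volume is a short, schedulable sequence (VG/LV names and mount point are assumed):
]# lvextend -L +1T /dev/vg_data/export
]# xfs_growfs /export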

Indeed, this problem, before it was fixed in January 2010, had been present in XFS for more than 10 years and I don't recall ever seeing a bug report from anything other than test 104 about the problem....

Anyway, the fact that we have a test in a regression test suite that can trip a bug doesn't mean everyone who does that operation will trip that very bug. That's because we devise stress tests specifically to trip over such known issues. There isn't a workload on the planet that looks like the load that test 104 is generating - how many workloads do you know that repeatedly grow the filesystem while doing hundreds of concurrent operations known to be specifically problematic for the grow operation?

> > > and 2.6.32 (RH6.1) ?
> > 
> > RHEL6.0 already has this fixed.
> Since when? Otherwise I'll rebuild this system with RH 6.2

The fixes were in the original RHEL 6.0 release.

Comment 24 Andre ten Bohmer 2011-12-07 08:40:40 UTC
We'll see, now rebuilding this server with RH 6.2. Thanks for your time.

Comment 25 Dave Chinner 2011-12-07 11:25:27 UTC
(In reply to comment #24)
> We'll see, now rebuilding this server with RH 6.2. Thanks for your time.

Just to be clear - the fixes for the completely unrelated XFS growfs deadlock problem you pointed to are in RHEL 6.x. The problem you actually reported is still completely unknown at this point....

Comment 26 Andre ten Bohmer 2011-12-07 12:00:18 UTC
Ok thanks, sorry for mixing things up, but the main issue was an unstable RHEL 5/6 system under heavy I/O stress on an NFS export. The first issue was indeed XFS related, I guess (becoming unresponsive/crashing when running the xfs defrag tool); later on it also seemed related to a problem in the kernel which rears its ugly head when there is a lot of I/O stress on the system. Should I file a new bug report for this one to clear things up?

Comment 28 J. Bruce Fields 2012-02-17 22:35:29 UTC
(In reply to comment #26)
> Ok thanks, sorry for mixing things up, but the main issue was an unstable
> RHEL 5/6 system under heavy I/O stress on an NFS export. The first issue was
> indeed XFS related, I guess (becoming unresponsive/crashing when running the
> xfs defrag tool); later on it also seemed related to a problem in the kernel
> which rears its ugly head when there is a lot of I/O stress on the system.
> Should I file a new bug report for this one to clear things up?

Yes, if you haven't already done that, please do.  I've lost track of what exactly the problem is here.

Assuming this one should be closed for now.