Bug 783973 - [4e92c58d27b5cea9d7346d6dd88be9b3479c9e3b]: application on nfs client hung
Summary: [4e92c58d27b5cea9d7346d6dd88be9b3479c9e3b]: application on nfs client hung
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: Raghavendra Bhat
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-01-23 12:11 UTC by Raghavendra Bhat
Modified: 2012-06-13 09:54 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-13 09:54:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)
statedump of nfs server process (1.05 MB, application/octet-stream)
2012-01-23 12:13 UTC, Raghavendra Bhat
no flags Details
program of fs-perf-test tool which caused this issue (2.76 KB, text/x-csrc)
2012-01-23 12:13 UTC, Raghavendra Bhat
no flags Details

Description Raghavendra Bhat 2012-01-23 12:11:20 UTC
Description of problem:
On a replicate volume with replica count 2, had 1 fuse and 1 nfs client. On nfs client running fs-perf-test with 1024 fds as argument. After some time the tests were hung.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
nfs server log messages:

[2012-01-23 16:41:54.936212] E [client3_1-fops.c:1263:client3_1_finodelk_cbk] 0-mirror-client-1: XDR decoding failed
[2012-01-23 16:41:54.936225] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.936309] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x29f) [0x7f57af319d5c] (-->/
usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x20e) [0x7f57af3199fa] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_
1_finodelk_cbk+0x92) [0x7f57ac5d5491]))) 0-xdr: XDR decoding failed
[2012-01-23 16:41:54.936328] E [client3_1-fops.c:1263:client3_1_finodelk_cbk] 0-mirror-client-1: XDR decoding failed
[2012-01-23 16:41:54.936341] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.936446] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x29f) [0x7f57af319d5c] (-->/
usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x20e) [0x7f57af3199fa] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_
1_finodelk_cbk+0x92) [0x7f57ac5d5491]))) 0-xdr: XDR decoding failed
[2012-01-23 16:41:54.936466] E [client3_1-fops.c:1263:client3_1_finodelk_cbk] 0-mirror-client-1: XDR decoding failed
[2012-01-23 16:41:54.936479] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.936560] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x29f) [0x7f57af319d5c] (-->/
usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x20e) [0x7f57af3199fa] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_
1_finodelk_cbk+0x92) [0x7f57ac5d5491]))) 0-xdr: XDR decoding failed
[2012-01-23 16:41:54.936580] E [client3_1-fops.c:1263:client3_1_finodelk_cbk] 0-mirror-client-1: XDR decoding failed
[2012-01-23 16:41:54.936592] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.936675] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x29f) [0x7f57af319d5c] (-->/
usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x20e) [0x7f57af3199fa] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_
1_finodelk_cbk+0x92) [0x7f57ac5d5491]))) 0-xdr: XDR decoding failed
[2012-01-23 16:41:54.936704] E [client3_1-fops.c:1263:client3_1_finodelk_cbk] 0-mirror-client-1: XDR decoding failed
[2012-01-23 16:41:54.936718] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.936771] W [client-lk.c:384:delete_granted_locks_owner] 0-mirror-client-0: fdctx not valid
[2012-01-23 16:41:54.936834] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x29f) [0x7f57af319d5c] (-->/
usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x20e) [0x7f57af3199fa] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_
1_finodelk_cbk+0x92) [0x7f57ac5d5491]))) 0-xdr: XDR decoding failed
[2012-01-23 16:41:54.936853] E [client3_1-fops.c:1263:client3_1_finodelk_cbk] 0-mirror-client-1: XDR decoding failed
[2012-01-23 16:41:54.936866] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.936941] I [client-handshake.c:917:client_setvolume_cbk] 0-mirror-client-1: Connected to 127.0.0.1:24010, attached to remo
te volume '/mnt/sda8/export2'.
[2012-01-23 16:41:54.936961] I [afr-common.c:3473:afr_notify] 0-mirror-replicate-0: subvol 1 came up, start crawl
[2012-01-23 16:41:54.937017] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.937034] E [afr-lk-common.c:567:afr_unlock_inodelk_cbk] 0-mirror-replicate-0: (null): unlock failed on 1, reason: Invalid 
argument
[2012-01-23 16:41:54.937104] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.937120] E [afr-lk-common.c:567:afr_unlock_inodelk_cbk] 0-mirror-replicate-0: (null): unlock failed on 1, reason: Invalid 
argument
[2012-01-23 16:41:54.937170] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.937187] E [afr-lk-common.c:567:afr_unlock_inodelk_cbk] 0-mirror-replicate-0: (null): unlock failed on 1, reason: Invalid 
argument
[2012-01-23 16:41:54.937264] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
[2012-01-23 16:41:54.937282] E [afr-lk-common.c:567:afr_unlock_inodelk_cbk] 0-mirror-replicate-0: (null): unlock failed on 1, reason: Invalid 
argument
[2012-01-23 16:41:54.937337] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-mirror-client-1: remote operation failed: Invalid argument
:


Expected results:


Additional info:

dmesg says this:
[25283.748598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25283.748601] glusterfsd      D 0000000000000001     0  5971      1 0x00000000
[25283.748606]  ffff8800af3d5468 0000000000000086 ffff8800af3d5fd8 ffff8800af3d4000
[25283.748610]  0000000000013d00 ffff880094aa5f38 ffff8800af3d5fd8 0000000000013d00
[25283.748614]  ffff880137b38000 ffff880094aa5b80 ffff8800bac876e0 ffff8800baa53d00
[25283.748618] Call Trace:
[25283.748634]  [<ffffffffa05fe6b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[25283.748639]  [<ffffffff815c0980>] io_schedule+0x70/0xc0
[25283.748647]  [<ffffffffa05fe6be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[25283.748651]  [<ffffffff815c12ff>] __wait_on_bit+0x5f/0x90
[25283.748658]  [<ffffffffa05fe6b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[25283.748662]  [<ffffffff815c13ac>] out_of_line_wait_on_bit+0x7c/0x90
[25283.748666]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
[25283.748670]  [<ffffffff81124507>] ? vma_prio_tree_next+0x47/0x70
[25283.748678]  [<ffffffffa05fea66>] nfs_wait_on_request+0x36/0x40 [nfs]
[25283.748686]  [<ffffffffa0603038>] nfs_find_and_lock_request+0x88/0xb0 [nfs]
[25283.748694]  [<ffffffffa0604e51>] nfs_migrate_page+0x41/0x100 [nfs]
[25283.748699]  [<ffffffff8115957a>] move_to_new_page+0x8a/0x120
[25283.748703]  [<ffffffff81159aa6>] unmap_and_move+0x2f6/0x350
[25283.748707]  [<ffffffff8114e7b0>] ? compaction_alloc+0x0/0x70
[25283.748710]  [<ffffffff81159cf2>] migrate_pages+0xa2/0x150
[25283.748714]  [<ffffffff8114e984>] compact_zone+0xa4/0x220
[25283.748717]  [<ffffffff8114eccc>] compact_zone_order+0xac/0xf0
[25283.748721]  [<ffffffff8111344e>] ? prep_new_page+0x18e/0x1b0
[25283.748725]  [<ffffffff8114eddd>] try_to_compact_pages+0xcd/0x100
[25283.748728]  [<ffffffff811140ff>] __alloc_pages_direct_compact+0xcf/0x190
[25283.748732]  [<ffffffff811145fd>] __alloc_pages_nodemask+0x43d/0x840
[25283.748736]  [<ffffffff81149f45>] alloc_pages_current+0xa5/0x110
[25283.748739]  [<ffffffff81153695>] new_slab+0x1f5/0x290
[25283.748742]  [<ffffffff81155132>] __slab_alloc+0x1b2/0x390
[25283.748747]  [<ffffffff81513321>] ? sk_stream_alloc_skb+0x41/0x130
[25283.748750]  [<ffffffff811584eb>] __kmalloc_node_track_caller+0x9b/0x1a0
[25283.748754]  [<ffffffff81513321>] ? sk_stream_alloc_skb+0x41/0x130
[25283.748758]  [<ffffffff814c5473>] __alloc_skb+0x83/0x170
[25283.748762]  [<ffffffff81513321>] sk_stream_alloc_skb+0x41/0x130
[25283.748766]  [<ffffffff815137ec>] tcp_sendmsg+0x3dc/0xc80
[25283.748770]  [<ffffffff81057310>] ? reweight_entity+0x150/0x160
[25283.748774]  [<ffffffff81538d04>] inet_sendmsg+0x64/0xb0
[25283.748779]  [<ffffffff812ad947>] ? apparmor_socket_sendmsg+0x17/0x20
[25283.748782]  [<ffffffff814bab4e>] sock_aio_write+0x14e/0x160
[25283.748785]  [<ffffffff812e0f7e>] ? rb_erase+0xce/0x150
[25283.748788]  [<ffffffff814baa00>] ? sock_aio_write+0x0/0x160
[25283.748792]  [<ffffffff81165383>] do_sync_readv_writev+0xd3/0x110
[25283.748795]  [<ffffffff8109b2e4>] ? futex_wait+0x204/0x300
[25283.748800]  [<ffffffff812ae7e8>] ? apparmor_file_permission+0x18/0x20
[25283.748804]  [<ffffffff81279bac>] ? security_file_permission+0x2c/0xb0
[25283.748807]  [<ffffffff81164ab1>] ? rw_verify_area+0x61/0xf0
[25283.748810]  [<ffffffff81165654>] do_readv_writev+0xd4/0x1e0
[25283.748814]  [<ffffffff8109ce7b>] ? do_futex+0xbb/0x210
[25283.748817]  [<ffffffff811657a8>] vfs_writev+0x48/0x60
[25283.748820]  [<ffffffff81165931>] sys_writev+0x51/0xc0
[25283.748824]  [<ffffffff8100c002>] system_call_fastpath+0x16/0x1b

Comment 1 Raghavendra Bhat 2012-01-23 12:13:13 UTC
Created attachment 556947 [details]
statedump of nfs server process

Comment 2 Raghavendra Bhat 2012-01-23 12:13:57 UTC
Created attachment 556951 [details]
program of fs-perf-test tool which caused this issue

Comment 3 Rajesh 2012-03-15 10:45:50 UTC
Hi johnny, is it consistently reproducible?

Comment 4 Rajesh 2012-04-04 11:15:42 UTC
this looks like a bug in afr. reassigning to pranith.

Comment 5 Pranith Kumar K 2012-05-25 10:31:17 UTC
This bug is never recreated. Removing it from the blockers list.

Comment 6 Pranith Kumar K 2012-06-13 09:54:28 UTC
I dont see any problem with the afr [f]inodelk code at the moment and it never happened after the very first time. Closing with works for me.


Note You need to log in before you can comment on or make changes to this bug.