Bug 801996

Summary: XDR decoding failure
Product: [Community] GlusterFS Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: coreAssignee: Amar Tumballi <amarts>
Status: CLOSED WORKSFORME QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainlineCC: amarts, gluster-bugs, spandura, vbellur, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-21 09:32:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
brick log form which XDR decoding failure got reported
none
Mount log file none

Description Shwetha Panduranga 2012-03-10 05:40:27 UTC
Description of problem:
ping_pong test failed with error "Invalid argument, No such file or directory" when one of the bricks in a distribute-replicate volume came live. 

The following is the client log message:-
----------------------------------------
[2012-03-09 16:10:18.691597] D [client-handshake.c:193:client_start_ping] 1-dstore-client-3: returning as transport is already disconnected OR there are no frames (2 ||
 1)
[2012-03-09 16:10:18.693439] I [client-handshake.c:1533:select_server_supported_programs] 1-dstore-client-3: Using Program GlusterFS 3.3.0qa26, Num (1298437), Version (
330)
[2012-03-09 16:10:18.693535] D [client-handshake.c:193:client_start_ping] 1-dstore-client-3: returning as transport is already disconnected OR there are no frames (2 ||
 1)
[2012-03-09 16:10:18.695223] D [afr-lk-common.c:1427:afr_nonblocking_inodelk] 1-dstore-replicate-1: attempting data lock range 1 1 by ffff62ffff7f0000
[2012-03-09 16:10:18.696461] E [client3_1-fops.c:1764:client3_1_lk_cbk] 1-dstore-client-3: XDR decoding failed
[2012-03-09 16:10:18.696505] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: Invalid argument
[2012-03-09 16:10:18.696535] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194469: ERR => -1 (Invalid argument)
[2012-03-09 16:10:18.697236] D [client-handshake.c:193:client_start_ping] 1-dstore-client-3: returning as transport is already disconnected OR there are no frames (2 || 1)
[2012-03-09 16:10:18.698479] I [client-handshake.c:1308:client_setvolume_cbk] 1-dstore-client-3: clnt-lk-version = 59, server-lk-version = 0
[2012-03-09 16:10:18.698519] I [client-handshake.c:1334:client_setvolume_cbk] 1-dstore-client-3: Connected to 192.168.2.35:24010, attached to remote volume '/export2'.
[2012-03-09 16:10:18.698543] D [client-handshake.c:1195:client_post_handshake] 1-dstore-client-3: no fds to open - notifying all parents child up
[2012-03-09 16:10:18.698564] D [client-handshake.c:452:client_set_lk_version] 1-dstore-client-3: Sending SET_LK_VERSION
[2012-03-09 16:10:18.698631] D [fuse-bridge.c:4113:notify] 0-fuse: got event 8 on graph 1
[2012-03-09 16:10:18.699612] D [afr-lk-common.c:406:transaction_lk_op] 1-dstore-replicate-1: lk op is for a transaction
[2012-03-09 16:10:18.699662] D [afr-lk-common.c:607:afr_unlock_inodelk] 1-dstore-replicate-1: attempting data unlock range 0 1 by ff1163ffff7f0000
[2012-03-09 16:10:18.700523] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: No such file or directory
[2012-03-09 16:10:18.700574] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194472: ERR => -1 (No such file or directory)
[2012-03-09 16:10:18.702412] D [client-handshake.c:429:client_set_lk_version_cbk] 1-dstore-client-3: Server lk version = 59
[2012-03-09 16:10:18.703122] D [afr-transaction.c:1008:afr_post_nonblocking_inodelk_cbk] 1-dstore-replicate-1: Non blocking inodelks done. Proceeding to FOP
[2012-03-09 16:10:18.704353] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: No such file or directory
[2012-03-09 16:10:18.704399] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194473: ERR => -1 (No such file or directory)
[2012-03-09 16:10:18.708176] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: No such file or directory
[2012-03-09 16:10:18.708232] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194476: ERR => -1 (No such file or directory)

Version-Release number of selected component (if applicable):
3.3.0qa26

How reproducible:


Steps to Reproduce:
1.create a distribute-replicate volume (2 x 3, 150GB total available space). Start the mount
2.Create 2 gluster mounts and one nfs mount
3.Start dd on one of the gluster mount and nfs mount
4.start ping_pong on a file on other gluster mount
5.Bring down 2 bricks (one from each replicate pair)
6.Bring back the bricks. 

The quota limit-usage is set to 150GB. 

Actual results:
[03/09/12 - 15:54:13 root@Shwetha-Laptop gfsc1]# ping_pong -rw file1 3
read failed at 0
read failed at 1
read failed at 2
data increment = 1
lock at 0 failed! - Invalid argument
unlock at 2 failed! - No such file or directory
lock at 1 failed! - No such file or directory
unlock at 0 failed! - No such file or directory
lock at 2 failed! - No such file or directory
unlock at 1 failed! - No such file or directory
lock at 0 failed! - No such file or directory
unlock at 2 failed! - No such file or directory
lock at 1 failed! - No such file or directory
unlock at 0 failed! - No such file or directory
lock at 2 failed! - No such file or directory
unlock at 1 failed! - No such file or directory

Expected results:


Additional info:

Comment 1 Raghavendra G 2012-04-12 05:36:45 UTC
Is this bug still reproducible on master? on v3.3.0qa26, server_lk_cbk doesn't receive xdata as its argument and hence xdata will be NULL when rsp is decoded client3_1_lk_cbk. This is fixed in 9d3af972.

However, the puzzling thing is that both server_lk_cbk and client_lk_cbk use same xdr_proc to encode/decode. So, we should've seen similar log messages in server log too. But, there are no server logs attached here. I've asked for server side logs from QA.

To summarize, this bug is likely have been fixed on current master (6a995ab3300a5ee0ee79a4d7d75281a79deec96e).

regards,
Raghavendra.

Comment 2 Shwetha Panduranga 2012-04-13 05:23:48 UTC
I was able to recreate the bug on 3.3.0qa34.

client log:-
------------
[2012-04-13 16:07:50.459379] I [client-handshake.c:1632:select_server_supported_programs] 2-dstore-client-5: Using Program GlusterFS 3.3.0qa34, Num (1298437), Version (330)
[2012-04-13 16:07:50.702409] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.702555] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.702648] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by ace3a6fa487f0000
[2012-04-13 16:07:50.702795] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.702842] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.702876] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by b838a6fa487f0000
[2012-04-13 16:07:50.702979] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.703038] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.703761] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by f4f3a6fa487f0000
[2012-04-13 16:07:50.704372] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.704445] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.704503] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by 88dba6fa487f0000
[2012-04-13 16:07:50.706048] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.706137] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.707292] I [client-handshake.c:1429:client_setvolume_cbk] 2-dstore-client-5: Connected to 192.168.2.37:24011, attached to remote volume '/export2/dstore2'.

Server Log:-
-----------

[2012-04-13 16:07:50.678671] E [server-helpers.c:763:server_alloc_frame] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x360) [0x7fd83044d102] (-->/usr/local
/lib/glusterfs/3.3.0qa34/xlator/protocol/server.so(server_finodelk+0xd4) [0x7fd827156f74] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/protocol/server.so(get_frame_fr
om_request+0x89) [0x7fd82713d4ff]))) 0-server: invalid argument: conn
[2012-04-13 16:07:50.682952] I [server-handshake.c:571:server_setvolume] 0-dstore-server: accepted client from 192.168.2.34:997 (version: 3.3.0qa34)
[2012-04-13 16:07:50.683696] I [server-resolve.c:481:server_resolve_fd] 0-: fd not found in context
[2012-04-13 16:07:50.683741] I [server3_1-fops.c:1320:server_flush_cbk] 0-dstore-server: 93885: FLUSH 1 (--) ==> -1 (Bad file descriptor)
[2012-04-13 16:07:50.806532] I [server3_1-fops.c:1523:server_open_cbk] 0-dstore-server: 93890: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-04-13 16:07:50.822899] E [marker-quota-helper.c:230:mq_dict_set_contribution] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/debug/io-stats.so(io_stats_lookup+0x2
8c) [0x7fd827377b86] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/features/marker.so(marker_lookup+0x142) [0x7fd827595fc5] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlat
or/features/marker.so(mq_req_xattr+0x123) [0x7fd8275a11e6]))) 0-marker: invalid argument: loc->parent
[2012-04-13 16:07:50.823221] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-dstore-marker: cannot add a new contribution node
[2012-04-13 16:07:50.870252] E [marker-quota-helper.c:230:mq_dict_set_contribution] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/debug/io-stats.so(io_stats_lookup+0x2
8c) [0x7fd827377b86] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/features/marker.so(marker_lookup+0x142) [0x7fd827595fc5] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlat
or/features/marker.so(mq_req_xattr+0x123) [0x7fd8275a11e6]))) 0-marker: invalid argument: loc->parent
[2012-04-13 16:07:50.870551] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-dstore-marker: cannot add a new contribution node
[2012-04-13 16:07:51.116988] I [server3_1-fops.c:1523:server_open_cbk] 0-dstore-server: 93896: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-04-13 16:07:51.127612] I [server3_1-fops.c:1320:server_flush_cbk] 0-dstore-server: 93899: FLUSH -2 (--) ==> -1 (No such file or directory)
[2012-04-13 16:07:51.331096] W [marker-quota.c:1284:mq_get_parent_inode_local] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/performance/io-threads.so(iot_inodelk_cbk+
0x158) [0x7fd8279c70a7] (-->/usr/local/lib/libglusterfs.so.0(default_inodelk_cbk+0x158) [0x7fd83068083e] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/features/marker.
so(mq_inodelk_cbk+0x1d0) [0x7fd82759d912]))) 0-dstore-marker: contribution node list is empty which is an error

Comment 3 Shwetha Panduranga 2012-04-13 05:26:05 UTC
Created attachment 577223 [details]
brick log form which XDR decoding failure got reported

Comment 4 Shwetha Panduranga 2012-04-13 05:26:29 UTC
Created attachment 577224 [details]
Mount log file

Comment 5 Vijay Bellur 2012-05-18 12:50:28 UTC
Not consistently reproducible. Removing it from the blocker list.