Bug 801996 - XDR decoding failure
XDR decoding failure
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: core (Show other bugs)
mainline
Unspecified Unspecified
high Severity high
: ---
: ---
Assigned To: Amar Tumballi
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-10 00:40 EST by Shwetha Panduranga
Modified: 2013-12-18 19:07 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-21 04:32:04 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
brick log form which XDR decoding failure got reported (3.30 MB, text/x-log)
2012-04-13 01:26 EDT, Shwetha Panduranga
no flags Details
Mount log file (3.47 MB, text/x-log)
2012-04-13 01:26 EDT, Shwetha Panduranga
no flags Details

  None (edit)
Description Shwetha Panduranga 2012-03-10 00:40:27 EST
Description of problem:
ping_pong test failed with error "Invalid argument, No such file or directory" when one of the bricks in a distribute-replicate volume came live. 

The following is the client log message:-
----------------------------------------
[2012-03-09 16:10:18.691597] D [client-handshake.c:193:client_start_ping] 1-dstore-client-3: returning as transport is already disconnected OR there are no frames (2 ||
 1)
[2012-03-09 16:10:18.693439] I [client-handshake.c:1533:select_server_supported_programs] 1-dstore-client-3: Using Program GlusterFS 3.3.0qa26, Num (1298437), Version (
330)
[2012-03-09 16:10:18.693535] D [client-handshake.c:193:client_start_ping] 1-dstore-client-3: returning as transport is already disconnected OR there are no frames (2 ||
 1)
[2012-03-09 16:10:18.695223] D [afr-lk-common.c:1427:afr_nonblocking_inodelk] 1-dstore-replicate-1: attempting data lock range 1 1 by ffff62ffff7f0000
[2012-03-09 16:10:18.696461] E [client3_1-fops.c:1764:client3_1_lk_cbk] 1-dstore-client-3: XDR decoding failed
[2012-03-09 16:10:18.696505] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: Invalid argument
[2012-03-09 16:10:18.696535] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194469: ERR => -1 (Invalid argument)
[2012-03-09 16:10:18.697236] D [client-handshake.c:193:client_start_ping] 1-dstore-client-3: returning as transport is already disconnected OR there are no frames (2 || 1)
[2012-03-09 16:10:18.698479] I [client-handshake.c:1308:client_setvolume_cbk] 1-dstore-client-3: clnt-lk-version = 59, server-lk-version = 0
[2012-03-09 16:10:18.698519] I [client-handshake.c:1334:client_setvolume_cbk] 1-dstore-client-3: Connected to 192.168.2.35:24010, attached to remote volume '/export2'.
[2012-03-09 16:10:18.698543] D [client-handshake.c:1195:client_post_handshake] 1-dstore-client-3: no fds to open - notifying all parents child up
[2012-03-09 16:10:18.698564] D [client-handshake.c:452:client_set_lk_version] 1-dstore-client-3: Sending SET_LK_VERSION
[2012-03-09 16:10:18.698631] D [fuse-bridge.c:4113:notify] 0-fuse: got event 8 on graph 1
[2012-03-09 16:10:18.699612] D [afr-lk-common.c:406:transaction_lk_op] 1-dstore-replicate-1: lk op is for a transaction
[2012-03-09 16:10:18.699662] D [afr-lk-common.c:607:afr_unlock_inodelk] 1-dstore-replicate-1: attempting data unlock range 0 1 by ff1163ffff7f0000
[2012-03-09 16:10:18.700523] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: No such file or directory
[2012-03-09 16:10:18.700574] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194472: ERR => -1 (No such file or directory)
[2012-03-09 16:10:18.702412] D [client-handshake.c:429:client_set_lk_version_cbk] 1-dstore-client-3: Server lk version = 59
[2012-03-09 16:10:18.703122] D [afr-transaction.c:1008:afr_post_nonblocking_inodelk_cbk] 1-dstore-replicate-1: Non blocking inodelks done. Proceeding to FOP
[2012-03-09 16:10:18.704353] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: No such file or directory
[2012-03-09 16:10:18.704399] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194473: ERR => -1 (No such file or directory)
[2012-03-09 16:10:18.708176] W [client3_1-fops.c:1794:client3_1_lk_cbk] 1-dstore-client-3: remote operation failed: No such file or directory
[2012-03-09 16:10:18.708232] W [fuse-bridge.c:3162:fuse_setlk_cbk] 0-glusterfs-fuse: 194476: ERR => -1 (No such file or directory)

Version-Release number of selected component (if applicable):
3.3.0qa26

How reproducible:


Steps to Reproduce:
1.create a distribute-replicate volume (2 x 3, 150GB total available space). Start the mount
2.Create 2 gluster mounts and one nfs mount
3.Start dd on one of the gluster mount and nfs mount
4.start ping_pong on a file on other gluster mount
5.Bring down 2 bricks (one from each replicate pair)
6.Bring back the bricks. 

The quota limit-usage is set to 150GB. 

Actual results:
[03/09/12 - 15:54:13 root@Shwetha-Laptop gfsc1]# ping_pong -rw file1 3
read failed at 0
read failed at 1
read failed at 2
data increment = 1
lock at 0 failed! - Invalid argument
unlock at 2 failed! - No such file or directory
lock at 1 failed! - No such file or directory
unlock at 0 failed! - No such file or directory
lock at 2 failed! - No such file or directory
unlock at 1 failed! - No such file or directory
lock at 0 failed! - No such file or directory
unlock at 2 failed! - No such file or directory
lock at 1 failed! - No such file or directory
unlock at 0 failed! - No such file or directory
lock at 2 failed! - No such file or directory
unlock at 1 failed! - No such file or directory

Expected results:


Additional info:
Comment 1 Raghavendra G 2012-04-12 01:36:45 EDT
Is this bug still reproducible on master? on v3.3.0qa26, server_lk_cbk doesn't receive xdata as its argument and hence xdata will be NULL when rsp is decoded client3_1_lk_cbk. This is fixed in 9d3af972.

However, the puzzling thing is that both server_lk_cbk and client_lk_cbk use same xdr_proc to encode/decode. So, we should've seen similar log messages in server log too. But, there are no server logs attached here. I've asked for server side logs from QA.

To summarize, this bug is likely have been fixed on current master (6a995ab3300a5ee0ee79a4d7d75281a79deec96e).

regards,
Raghavendra.
Comment 2 Shwetha Panduranga 2012-04-13 01:23:48 EDT
I was able to recreate the bug on 3.3.0qa34.

client log:-
------------
[2012-04-13 16:07:50.459379] I [client-handshake.c:1632:select_server_supported_programs] 2-dstore-client-5: Using Program GlusterFS 3.3.0qa34, Num (1298437), Version (330)
[2012-04-13 16:07:50.702409] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.702555] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.702648] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by ace3a6fa487f0000
[2012-04-13 16:07:50.702795] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.702842] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.702876] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by b838a6fa487f0000
[2012-04-13 16:07:50.702979] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.703038] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.703761] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by f4f3a6fa487f0000
[2012-04-13 16:07:50.704372] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.704445] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.704503] I [afr-lk-common.c:572:afr_unlock_inodelk_cbk] 2-dstore-replicate-1: <gfid:dfee4a39-dd8b-4ba4-abfc-c192d39b9a52>/dir.4/file.1: unlock failed on 2 unlock by 88dba6fa487f0000
[2012-04-13 16:07:50.706048] E [client3_1-fops.c:1525:client3_1_finodelk_cbk] 2-dstore-client-5: XDR decoding failed
[2012-04-13 16:07:50.706137] W [client3_1-fops.c:1539:client3_1_finodelk_cbk] 2-dstore-client-5: remote operation failed: Invalid argument
[2012-04-13 16:07:50.707292] I [client-handshake.c:1429:client_setvolume_cbk] 2-dstore-client-5: Connected to 192.168.2.37:24011, attached to remote volume '/export2/dstore2'.

Server Log:-
-----------

[2012-04-13 16:07:50.678671] E [server-helpers.c:763:server_alloc_frame] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x360) [0x7fd83044d102] (-->/usr/local
/lib/glusterfs/3.3.0qa34/xlator/protocol/server.so(server_finodelk+0xd4) [0x7fd827156f74] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/protocol/server.so(get_frame_fr
om_request+0x89) [0x7fd82713d4ff]))) 0-server: invalid argument: conn
[2012-04-13 16:07:50.682952] I [server-handshake.c:571:server_setvolume] 0-dstore-server: accepted client from 192.168.2.34:997 (version: 3.3.0qa34)
[2012-04-13 16:07:50.683696] I [server-resolve.c:481:server_resolve_fd] 0-: fd not found in context
[2012-04-13 16:07:50.683741] I [server3_1-fops.c:1320:server_flush_cbk] 0-dstore-server: 93885: FLUSH 1 (--) ==> -1 (Bad file descriptor)
[2012-04-13 16:07:50.806532] I [server3_1-fops.c:1523:server_open_cbk] 0-dstore-server: 93890: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-04-13 16:07:50.822899] E [marker-quota-helper.c:230:mq_dict_set_contribution] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/debug/io-stats.so(io_stats_lookup+0x2
8c) [0x7fd827377b86] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/features/marker.so(marker_lookup+0x142) [0x7fd827595fc5] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlat
or/features/marker.so(mq_req_xattr+0x123) [0x7fd8275a11e6]))) 0-marker: invalid argument: loc->parent
[2012-04-13 16:07:50.823221] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-dstore-marker: cannot add a new contribution node
[2012-04-13 16:07:50.870252] E [marker-quota-helper.c:230:mq_dict_set_contribution] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/debug/io-stats.so(io_stats_lookup+0x2
8c) [0x7fd827377b86] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/features/marker.so(marker_lookup+0x142) [0x7fd827595fc5] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlat
or/features/marker.so(mq_req_xattr+0x123) [0x7fd8275a11e6]))) 0-marker: invalid argument: loc->parent
[2012-04-13 16:07:50.870551] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-dstore-marker: cannot add a new contribution node
[2012-04-13 16:07:51.116988] I [server3_1-fops.c:1523:server_open_cbk] 0-dstore-server: 93896: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-04-13 16:07:51.127612] I [server3_1-fops.c:1320:server_flush_cbk] 0-dstore-server: 93899: FLUSH -2 (--) ==> -1 (No such file or directory)
[2012-04-13 16:07:51.331096] W [marker-quota.c:1284:mq_get_parent_inode_local] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/performance/io-threads.so(iot_inodelk_cbk+
0x158) [0x7fd8279c70a7] (-->/usr/local/lib/libglusterfs.so.0(default_inodelk_cbk+0x158) [0x7fd83068083e] (-->/usr/local/lib/glusterfs/3.3.0qa34/xlator/features/marker.
so(mq_inodelk_cbk+0x1d0) [0x7fd82759d912]))) 0-dstore-marker: contribution node list is empty which is an error
Comment 3 Shwetha Panduranga 2012-04-13 01:26:05 EDT
Created attachment 577223 [details]
brick log form which XDR decoding failure got reported
Comment 4 Shwetha Panduranga 2012-04-13 01:26:29 EDT
Created attachment 577224 [details]
Mount log file
Comment 5 Vijay Bellur 2012-05-18 08:50:28 EDT
Not consistently reproducible. Removing it from the blocker list.

Note You need to log in before you can comment on or make changes to this bug.