Bug 765130 (GLUSTER-3398)

Summary: crash due to pending frames
Product: [Community] GlusterFS Reporter: M S Vishwanath Bhat <vbhat>
Component: coreAssignee: Venky Shankar <vshankar>
Status: CLOSED DUPLICATE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: pre-releaseCC: gluster-bugs, mzywusko, pkarampu
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description M S Vishwanath Bhat 2011-08-11 15:27:52 UTC
While running a map reduce application, mount point got hung. I have pasted the client log below.


[2011-08-11 00:14:17.495328] I [client-handshake.c:916:client_setvolume_cbk] 0-hosdu-client-4: Connected to 10.1.11.33:24009, attached to remote volume '/data/brick'.
[2011-08-11 00:14:17.495365] I [client-handshake.c:916:client_setvolume_cbk] 0-hosdu-client-6: Connected to 10.1.11.35:24009, attached to remote volume '/data/brick'.
[2011-08-11 00:14:17.495381] I [afr-common.c:3291:afr_notify] 0-hosdu-replicate-3: Subvolume 'hosdu-client-6' came back up; going online.
[2011-08-11 00:14:17.495404] I [client-handshake.c:916:client_setvolume_cbk] 0-hosdu-client-7: Connected to 10.1.11.36:24009, attached to remote volume '/data/brick'.
[2011-08-11 00:14:17.499702] I [client-handshake.c:916:client_setvolume_cbk] 0-hosdu-client-1: Connected to 10.1.11.22:24009, attached to remote volume '/data/brick'.
[2011-08-11 00:14:17.505562] I [fuse-bridge.c:3336:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-08-11 00:14:17.505757] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2011-08-11 00:14:17.507336] I [afr-common.c:1632:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-0: added root inode
[2011-08-11 00:14:17.507443] I [afr-common.c:1632:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-1: added root inode
[2011-08-11 00:14:17.510505] I [afr-common.c:1632:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-2: added root inode
[2011-08-11 00:14:17.510565] I [afr-common.c:1632:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-3: added root inode
[2011-08-11 06:59:13.127159] W [socket.c:205:__socket_rwv] 0-hosdu-client-4: readv failed (Connection timed out)
[2011-08-11 06:59:13.141706] W [socket.c:1511:__socket_proto_state_machine] 0-hosdu-client-4: reading from socket failed. Error (Connection timed out), peer (10.1.11.33:24009)
[2011-08-11 06:59:13.141758] I [client.c:1883:client_rpc_notify] 0-hosdu-client-4: disconnected
[2011-08-11 06:59:35.277040] E [socket.c:1714:socket_connect_finish] 0-hosdu-client-4: connection to 10.1.11.33:24009 failed (No route to host)
[2011-08-11 07:39:31.608399] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-hosdu-client-5: remote operation failed: Directory not empty
[2011-08-11 07:39:31.615043] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-6: remote operation failed: File exists
[2011-08-11 07:39:31.616316] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-5: remote operation failed: File exists
[2011-08-11 07:39:31.616516] E [afr-common.c:379:afr_inode_set_read_ctx] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fc6ffefc6a5] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0) [0x7fc6fa890400] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c) [0x7fc6fa61aa5c]))) 0-: Assertion failed: read_child >= 0
[2011-08-11 07:39:31.628356] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-7: remote operation failed: File exists
[2011-08-11 07:39:31.628407] E [afr-common.c:379:afr_inode_set_read_ctx] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fc6ffefc6a5] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0) [0x7fc6fa890400] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c) [0x7fc6fa61aa5c]))) 0-: Assertion failed: read_child >= 0
[2011-08-11 07:44:06.463144] I [afr-common.c:1135:afr_launch_self_heal] 0-hosdu-replicate-3: background  entry self-heal triggered. path: /tmp/hadoop-root/mapred/system
[2011-08-11 07:44:06.465936] I [afr-self-heal-common.c:1992:afr_self_heal_completion_cbk] 0-hosdu-replicate-3: background  entry self-heal completed on /tmp/hadoop-root/mapred/system
[2011-08-11 07:44:06.652971] W [client3_1-fops.c:4350:client3_1_getxattr] 0-hosdu-client-4: failed to send the fop: Transport endpoint is not connected
[2011-08-11 07:44:06.655843] W [client3_1-fops.c:4350:client3_1_getxattr] 0-hosdu-client-4: failed to send the fop: Transport endpoint is not connected
[2011-08-11 07:44:28.230041] I [afr-common.c:1135:afr_launch_self_heal] 0-hosdu-replicate-3: background  entry self-heal triggered. path: /tmp/hadoop-root/mapred/system
[2011-08-11 07:44:28.232255] I [afr-self-heal-common.c:1992:afr_self_heal_completion_cbk] 0-hosdu-replicate-3: background  entry self-heal completed on /tmp/hadoop-root/mapred/system
[2011-08-11 07:44:28.259581] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-hosdu-client-5: remote operation failed: Directory not empty
[2011-08-11 07:44:28.264018] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 
0-hosdu-client-6: remote operation failed: File exists
[2011-08-11 07:44:28.264072] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-7: remote operation failed: File exists
[2011-08-11 07:44:28.264151] E [afr-common.c:379:afr_inode_set_read_ctx] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fc6ffefc6a5] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0) [0x7fc6fa890400] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c) [0x7fc6fa61aa5c]))) 0-: Assertion failed: read_child >= 0
[2011-08-11 07:44:28.266214] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-5: remote operation failed: File exists
[2011-08-11 07:44:28.266263] E [afr-common.c:379:afr_inode_set_read_ctx] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fc6ffefc6a5] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0) [0x7fc6fa890400] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c) [0x7fc6fa61aa5c]))) 0-: Assertion failed: read_child >= 0
[2011-08-11 07:44:29.85593] W [client3_1-fops.c:4350:client3_1_getxattr] 0-hosdu-client-4: failed to send the fop: Transport endpoint is not connected
pending frames:
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6

Comment 1 M S Vishwanath Bhat 2011-08-11 16:08:37 UTC
After the above incident I killed the client process and remounted it. Not it's not allowing me to delete a directory. rm -rf on a empty directory says "directory not empty". Following error was seen in the client log. 

[2011-08-11 08:42:40.499847] I [afr-common.c:1135:afr_launch_self_heal] 0-hosdu-replicate-3: background  entry self-heal triggered. path: /PiEstimator_TMP_3_141592654/out/_temporary
[2011-08-11 08:42:40.502585] I [afr-self-heal-common.c:1992:afr_self_heal_completion_cbk] 0-hosdu-replicate-3: background  entry self-heal completed on /PiEstimator_TMP_3_141592654/out/_temporary
[2011-08-11 08:42:40.544005] I [client3_1-fops.c:547:client3_1_rmdir_cbk] 0-hosdu-client-5: remote operation failed: Directory not empty
[2011-08-11 08:42:40.549590] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-6: remote operation failed: File exists
[2011-08-11 08:42:40.549889] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-7: remote operation failed: File exists
[2011-08-11 08:42:40.550441] E [afr-common.c:379:afr_inode_set_read_ctx] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f34bf5476a5] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0) [0x7f34b9edb400] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c) [0x7f34b9c65a5c]))) 0-: Assertion failed: read_child >= 0
[2011-08-11 08:42:40.554188] I [client3_1-fops.c:289:client3_1_mkdir_cbk] 0-hosdu-client-5: remote operation failed: File exists
[2011-08-11 08:42:40.554481] E [afr-common.c:379:afr_inode_set_read_ctx] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f34bf5476a5] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0) [0x7f34b9edb400] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c) [0x7f34b9c65a5c]))) 0-: Assertion failed: read_child >= 0

Comment 2 Pranith Kumar K 2011-08-13 00:19:20 UTC
(In reply to comment #1)
> After the above incident I killed the client process and remounted it. Not it's
> not allowing me to delete a directory. rm -rf on a empty directory says
> "directory not empty". Following error was seen in the client log. 
> 
> [2011-08-11 08:42:40.499847] I [afr-common.c:1135:afr_launch_self_heal]
> 0-hosdu-replicate-3: background  entry self-heal triggered. path:
> /PiEstimator_TMP_3_141592654/out/_temporary
> [2011-08-11 08:42:40.502585] I
> [afr-self-heal-common.c:1992:afr_self_heal_completion_cbk] 0-hosdu-replicate-3:
> background  entry self-heal completed on
> /PiEstimator_TMP_3_141592654/out/_temporary
> [2011-08-11 08:42:40.544005] I [client3_1-fops.c:547:client3_1_rmdir_cbk]
> 0-hosdu-client-5: remote operation failed: Directory not empty
> [2011-08-11 08:42:40.549590] I [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-hosdu-client-6: remote operation failed: File exists
> [2011-08-11 08:42:40.549889] I [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-hosdu-client-7: remote operation failed: File exists
> [2011-08-11 08:42:40.550441] E [afr-common.c:379:afr_inode_set_read_ctx]
> (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f34bf5476a5]
> (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0)
> [0x7f34b9edb400]
> (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c)
> [0x7f34b9c65a5c]))) 0-: Assertion failed: read_child >= 0
> [2011-08-11 08:42:40.554188] I [client3_1-fops.c:289:client3_1_mkdir_cbk]
> 0-hosdu-client-5: remote operation failed: File exists
> [2011-08-11 08:42:40.554481] E [afr-common.c:379:afr_inode_set_read_ctx]
> (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f34bf5476a5]
> (-->/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_mkdir_cbk+0x8f0)
> [0x7f34b9edb400]
> (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0x11c)
> [0x7f34b9c65a5c]))) 0-: Assertion failed: read_child >= 0

Johnny already sent a patch for this long ago, I think its not been accepted.
http://review.gluster.com/#change,104

The patch above needs to be merged

Pranith

Comment 3 Pranith Kumar K 2011-08-14 11:31:30 UTC

*** This bug has been marked as a duplicate of bug 3251 ***