Description of problem:
=========================
On a pure replicate volume (1 x 2), while running dd on a file from four fuse mounts, the dd on one of the mounts hung.

Brick log messages:
===================
[2013-09-06 12:06:07.514045] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-09-06 12:06:07.514155] E [server-helpers.c:779:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103) [0x7f480a910773] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x245) [0x7f480a910625] (-->/usr/lib64/glusterfs/3.4.0.31rhs/xlator/protocol/server.so(server3_3_finodelk+0x8a) [0x7f4804c3471a]))) 0-server: invalid argument: conn

Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.4.0.31rhs built on Sep 5 2013 08:23:59

How reproducible:
==================
Executed the case only once.

Steps to Reproduce:
=====================
1. Create a replicate volume (1 x 2). Start the volume.
2. Create 4 fuse mounts. From all the mounts, start dd on a file:
   "dd if=/dev/urandom of=./test_file1 bs=1K count=20480000"
3. While dd is in progress, bring down a brick.
4. Bring the brick back online while dd is still in progress.

Actual results:
=====================
dd on 3 of the mounts completed successfully. dd on one of the mounts hung.

Expected results:
==================
dd should complete successfully on all mounts.
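The reproduction steps above can be sketched as the following script. This is a hedged outline, not a verified reproducer: the volume name, hostnames, and brick paths are taken from this report, the mount-point names (/mnt/gm1..4) and the brick PID placeholder are assumptions, and it requires a live gluster cluster, so it is not runnable standalone.

```shell
#!/bin/sh
# Sketch of the reproduction steps; adapt hostnames/paths to your setup.

# 1. Create a 1 x 2 replicate volume and start it
gluster volume create vol_dis_1_rep_2 replica 2 \
    fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0 \
    mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
gluster volume start vol_dis_1_rep_2

# 2. Create 4 fuse mounts and start dd from each of them in the background
for i in 1 2 3 4; do
    mkdir -p /mnt/gm$i
    mount -t glusterfs mia.lab.eng.blr.redhat.com:/vol_dis_1_rep_2 /mnt/gm$i
    ( cd /mnt/gm$i && dd if=/dev/urandom of=./test_file1 bs=1K count=20480000 ) &
done

# 3. While dd is in progress, bring down one brick
#    (brick PID is hypothetical here; take it from 'gluster volume status')
# kill -TERM "$BRICK_PID"

# 4. Bring the brick back online while dd is still in progress
gluster volume start vol_dis_1_rep_2 force

wait   # the bug: one dd never completes
```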
Additional info:
===================
root@fan [Sep-06-2013-14:42:49] >gluster v info

Volume Name: vol_dis_1_rep_2
Type: Replicate
Volume ID: f5c43519-b5eb-4138-8219-723c064af71c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0
Brick2: mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
Options Reconfigured:
cluster.self-heal-daemon: on
performance.write-behind: on
performance.stat-prefetch: off
server.allow-insecure: on

root@fan [Sep-06-2013-14:42:53] >gluster v status
Status of volume: vol_dis_1_rep_2
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_
rep_2_b0                                                49152   Y       29411
Brick mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_
rep_2_b1                                                49152   Y       3625
NFS Server on localhost                                 2049    Y       2996
Self-heal Daemon on localhost                           N/A     Y       3006
NFS Server on mia.lab.eng.blr.redhat.com                2049    Y       3637
Self-heal Daemon on mia.lab.eng.blr.redhat.com          N/A     Y       3645

There are no active volume tasks
root@fan [Sep-06-2013-14:43:09] >
Mount process on which dd hung:
================================
root@darrel [Sep-06-2013-14:48:41] >ps -ef | grep gm4
root     18605     1  4 10:14 ?        00:13:01 /usr/sbin/glusterfs --volfile-id=/vol_dis_1_rep_2 --volfile-server=mia /mnt/gm4
root     20028 19597  0 14:48 pts/0    00:00:00 grep gm4

SOS Reports and statedumps:
===========================
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1005272/
We were able to figure out why the mount could have hung after looking at the logs in bug https://bugzilla.redhat.com/show_bug.cgi?id=1005272. Similar logs are present in the sosreports:

[2013-03-20 06:24:48.320459] E [server-helpers.c:763:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x93) [0x333160a8a3] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x293) [0x333160a733] (-->/usr/lib64/glusterfs/3.3.0.6rhs/xlator/protocol/server.so(server_finodelk+0xf8) [0x7f88bde218d8]))) 0-server: invalid argument: conn
[2013-03-20 06:24:48.320563] E [server-helpers.c:763:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x93) [0x333160a8a3] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x293) [0x333160a733] (-->/usr/lib64/glusterfs/3.3.0.6rhs/xlator/protocol/server.so(server_finodelk+0xf8) [0x7f88bde218d8]))) 0-server: invalid argument: conn
[2013-03-20 06:24:48.320690] E [server-helpers.c:763:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x93) [0x333160a8a3] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x293) [0x333160a733] (-->/usr/lib64/glusterfs/3.3.0.6rhs/xlator/protocol/server.so(server_finodelk+0xf8) [0x7f88bde218d8]))) 0-server: invalid argument: conn

Marking 1005272 as a duplicate of this bug.

*** This bug has been marked as a duplicate of bug 923809 ***
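To check whether a given setup hit the same failure signature (server_alloc_frame rejecting a finodelk because the connection, "conn", is already invalid), the brick logs can be grepped for the error string. A minimal sketch, using a sample line copied from this report; on a real install you would point the grep at the brick logs under /var/log/glusterfs/bricks/ instead of a temp file:

```shell
# Write one sample log line (taken from this report) to a temp file,
# then count occurrences of the failure signature.
log=$(mktemp)
printf '%s\n' \
  '[2013-03-20 06:24:48.320459] E [server-helpers.c:763:server_alloc_frame] 0-server: invalid argument: conn' \
  > "$log"
grep -c 'server_alloc_frame.*invalid argument: conn' "$log"
```

A non-zero count suggests the brick dropped RPC requests on a dead connection, matching the hang analyzed here.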