Description of problem:
On the client, an ls operation hangs after the first brick is killed for a distributed-disperse volume.

Version-Release number of selected component (if applicable):
glusterfs 3.6.3beta1 built on Apr  1 2015 20:11:53
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed-disperse volume (see the command sketch below):

Volume Name: dht-ec-vol
Type: Distributed-Disperse
Volume ID: 5697b709-4187-47ce-970b-b8c19d610055
Status: Started
Number of Bricks: 2 x (3 + 1) = 8
Transport-type: tcp
Bricks:
Brick1: cn4:/home/wtb/brick/bri1
Brick2: cn4:/home/wtb/brick/bri2
Brick3: cn4:/home/wtb/brick/bri3
Brick4: cn4:/home/wtb/brick/bri4
Brick5: cn4:/home/wtb/brick/bri5
Brick6: cn4:/home/wtb/brick/bri6
Brick7: cn4:/home/wtb/brick/bri7
Brick8: cn4:/home/wtb/brick/bri8

2. Mount the volume and create some files:

mkdir -p /mnt/wtb/dht-ec-client
mount -t glusterfs cn4:/dht-ec-vol /mnt/wtb/dht-ec-client
touch /mnt/wtb/dht-ec-client/file{0..99}

3. Kill the first brick: run 'gluster v status dht-ec-vol', find the PID of Brick1 and kill that process.

4. ls /mnt/wtb/dht-ec-client

Actual results:
ls hangs and shows nothing.

Expected results:
file{0..99} should be displayed.

Additional info:
The ls operation works fine if a brick other than the first one is killed.
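For completeness, here is a minimal command sketch of steps 1 and 3. The hostname and brick paths are taken from the volume info above; the 'force' flag, the awk pattern and the variable name BRICK1_PID are assumptions for illustration. 'disperse 4 redundancy 1' over 8 bricks gives the 2 x (3 + 1) layout shown above.

------------------------------------------------
# Step 1: create and start the volume (2 x (3 + 1) = 8 bricks).
gluster volume create dht-ec-vol disperse 4 redundancy 1 \
    cn4:/home/wtb/brick/bri{1..8} force
gluster volume start dht-ec-vol

# Step 3: the PID is the last column of the 'gluster v status'
# line for bri1; kill that brick process.
BRICK1_PID=$(gluster volume status dht-ec-vol | awk '/bri1 /{print $NF}')
kill -9 "$BRICK1_PID"
------------------------------------------------

After the brick is killed, step 4 (ls on the mount point) reproduces the hang.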
root@pranithk-laptop - ~ 22:22:12 :) ⚡ glusterd && gluster volume create ec2 disperse 4 redundancy 1 `hostname`:/home/gfs/ec_{0..7} force && gluster volume start ec2 && mount -t glusterfs `hostname`:/ec2 /mnt/ec2
This configuration is not optimal on most workloads. Do you want to use it ? (y/n) y
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Do you still want to continue creating the volume? (y/n) y
volume create: ec2: success: please start the volume to access data
volume start: ec2: success
root@pranithk-laptop - ~ 22:22:23 :) ⚡ cd /mnt/ec2
root@pranithk-laptop - /mnt/ec2 22:22:25 :) ⚡ mkdir -p dht-ec-client
root@pranithk-laptop - /mnt/ec2 22:22:37 :) ⚡ cd dht-ec-client
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:22:38 :) ⚡ touch file{0..99}
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:22:55 :) ⚡ ls -l
total 0
-rw-r--r--. 1 root root 0 May  9 22:22 file0
-rw-r--r--. 1 root root 0 May  9 22:22 file1
-rw-r--r--. 1 root root 0 May  9 22:22 file10
-rw-r--r--. 1 root root 0 May  9 22:22 file11
-rw-r--r--. 1 root root 0 May  9 22:22 file12
-rw-r--r--. 1 root root 0 May  9 22:22 file13
-rw-r--r--. 1 root root 0 May  9 22:22 file14
-rw-r--r--. 1 root root 0 May  9 22:22 file15
-rw-r--r--. 1 root root 0 May  9 22:22 file16
-rw-r--r--. 1 root root 0 May  9 22:22 file17
-rw-r--r--. 1 root root 0 May  9 22:22 file18
-rw-r--r--. 1 root root 0 May  9 22:22 file19
-rw-r--r--. 1 root root 0 May  9 22:22 file2
-rw-r--r--. 1 root root 0 May  9 22:22 file20
-rw-r--r--. 1 root root 0 May  9 22:22 file21
-rw-r--r--. 1 root root 0 May  9 22:22 file22
-rw-r--r--. 1 root root 0 May  9 22:22 file23
-rw-r--r--. 1 root root 0 May  9 22:22 file24
-rw-r--r--. 1 root root 0 May  9 22:22 file25
-rw-r--r--. 1 root root 0 May  9 22:22 file26
-rw-r--r--. 1 root root 0 May  9 22:22 file27
-rw-r--r--. 1 root root 0 May  9 22:22 file28
-rw-r--r--. 1 root root 0 May  9 22:22 file29
-rw-r--r--. 1 root root 0 May  9 22:22 file3
-rw-r--r--. 1 root root 0 May  9 22:22 file30
-rw-r--r--. 1 root root 0 May  9 22:22 file31
-rw-r--r--. 1 root root 0 May  9 22:22 file32
-rw-r--r--. 1 root root 0 May  9 22:22 file33
-rw-r--r--. 1 root root 0 May  9 22:22 file34
-rw-r--r--. 1 root root 0 May  9 22:22 file35
-rw-r--r--. 1 root root 0 May  9 22:22 file36
-rw-r--r--. 1 root root 0 May  9 22:22 file37
-rw-r--r--. 1 root root 0 May  9 22:22 file38
-rw-r--r--. 1 root root 0 May  9 22:22 file39
-rw-r--r--. 1 root root 0 May  9 22:22 file4
-rw-r--r--. 1 root root 0 May  9 22:22 file40
-rw-r--r--. 1 root root 0 May  9 22:22 file41
-rw-r--r--. 1 root root 0 May  9 22:22 file42
-rw-r--r--. 1 root root 0 May  9 22:22 file43
-rw-r--r--. 1 root root 0 May  9 22:22 file44
-rw-r--r--. 1 root root 0 May  9 22:22 file45
-rw-r--r--. 1 root root 0 May  9 22:22 file46
-rw-r--r--. 1 root root 0 May  9 22:22 file47
-rw-r--r--. 1 root root 0 May  9 22:22 file48
-rw-r--r--. 1 root root 0 May  9 22:22 file49
-rw-r--r--. 1 root root 0 May  9 22:22 file5
-rw-r--r--. 1 root root 0 May  9 22:22 file50
-rw-r--r--. 1 root root 0 May  9 22:22 file51
-rw-r--r--. 1 root root 0 May  9 22:22 file52
-rw-r--r--. 1 root root 0 May  9 22:22 file53
-rw-r--r--. 1 root root 0 May  9 22:22 file54
-rw-r--r--. 1 root root 0 May  9 22:22 file55
-rw-r--r--. 1 root root 0 May  9 22:22 file56
-rw-r--r--. 1 root root 0 May  9 22:22 file57
-rw-r--r--. 1 root root 0 May  9 22:22 file58
-rw-r--r--. 1 root root 0 May  9 22:22 file59
-rw-r--r--. 1 root root 0 May  9 22:22 file6
-rw-r--r--. 1 root root 0 May  9 22:22 file60
-rw-r--r--. 1 root root 0 May  9 22:22 file61
-rw-r--r--. 1 root root 0 May  9 22:22 file62
-rw-r--r--. 1 root root 0 May  9 22:22 file63
-rw-r--r--. 1 root root 0 May  9 22:22 file64
-rw-r--r--. 1 root root 0 May  9 22:22 file65
-rw-r--r--. 1 root root 0 May  9 22:22 file66
-rw-r--r--. 1 root root 0 May  9 22:22 file67
-rw-r--r--. 1 root root 0 May  9 22:22 file68
-rw-r--r--. 1 root root 0 May  9 22:22 file69
-rw-r--r--. 1 root root 0 May  9 22:22 file7
-rw-r--r--. 1 root root 0 May  9 22:22 file70
-rw-r--r--. 1 root root 0 May  9 22:22 file71
-rw-r--r--. 1 root root 0 May  9 22:22 file72
-rw-r--r--. 1 root root 0 May  9 22:22 file73
-rw-r--r--. 1 root root 0 May  9 22:22 file74
-rw-r--r--. 1 root root 0 May  9 22:22 file75
-rw-r--r--. 1 root root 0 May  9 22:22 file76
-rw-r--r--. 1 root root 0 May  9 22:22 file77
-rw-r--r--. 1 root root 0 May  9 22:22 file78
-rw-r--r--. 1 root root 0 May  9 22:22 file79
-rw-r--r--. 1 root root 0 May  9 22:22 file8
-rw-r--r--. 1 root root 0 May  9 22:22 file80
-rw-r--r--. 1 root root 0 May  9 22:22 file81
-rw-r--r--. 1 root root 0 May  9 22:22 file82
-rw-r--r--. 1 root root 0 May  9 22:22 file83
-rw-r--r--. 1 root root 0 May  9 22:22 file84
-rw-r--r--. 1 root root 0 May  9 22:22 file85
-rw-r--r--. 1 root root 0 May  9 22:22 file86
-rw-r--r--. 1 root root 0 May  9 22:22 file87
-rw-r--r--. 1 root root 0 May  9 22:22 file88
-rw-r--r--. 1 root root 0 May  9 22:22 file89
-rw-r--r--. 1 root root 0 May  9 22:22 file9
-rw-r--r--. 1 root root 0 May  9 22:22 file90
-rw-r--r--. 1 root root 0 May  9 22:22 file91
-rw-r--r--. 1 root root 0 May  9 22:22 file92
-rw-r--r--. 1 root root 0 May  9 22:22 file93
-rw-r--r--. 1 root root 0 May  9 22:22 file94
-rw-r--r--. 1 root root 0 May  9 22:22 file95
-rw-r--r--. 1 root root 0 May  9 22:22 file96
-rw-r--r--. 1 root root 0 May  9 22:22 file97
-rw-r--r--. 1 root root 0 May  9 22:22 file98
-rw-r--r--. 1 root root 0 May  9 22:22 file99
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:22:57 :) ⚡ /home/pk1/.scripts/gfs -c k -v ec2 -k 0
Dir: /var/lib/glusterd/vols/ec2/run/
kill -9 2800
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:23:02 :) ⚡ gluster v status
Status of volume: ec2
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick pranithk-laptop:/home/gfs/ec_0            N/A     N       2800
Brick pranithk-laptop:/home/gfs/ec_1            49156   Y       2811
Brick pranithk-laptop:/home/gfs/ec_2            49157   Y       2822
Brick pranithk-laptop:/home/gfs/ec_3            49158   Y       2833
Brick pranithk-laptop:/home/gfs/ec_4            49159   Y       2844
Brick pranithk-laptop:/home/gfs/ec_5            49160   Y       2855
Brick pranithk-laptop:/home/gfs/ec_6            49161   Y       2866
Brick pranithk-laptop:/home/gfs/ec_7            49162   Y       2877
NFS Server on localhost                         2049    Y       2890

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:23:05 :) ⚡ ls
file0   file14  file2   file25  file30  file36  file41  file47  file52  file58  file63  file69  file74  file8   file85  file90  file96
file1   file15  file20  file26  file31  file37  file42  file48  file53  file59  file64  file7   file75  file80  file86  file91  file97
file10  file16  file21  file27  file32  file38  file43  file49  file54  file6   file65  file70  file76  file81  file87  file92  file98
file11  file17  file22  file28  file33  file39  file44  file5   file55  file60  file66  file71  file77  file82  file88  file93  file99
file12  file18  file23  file29  file34  file4   file45  file50  file56  file61  file67  file72  file78  file83  file89  file94
file13  file19  file24  file3   file35  file40  file46  file51  file57  file62  file68  file73  file79  file84  file9   file95
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:23:11 :) ⚡
I tested the latest 3.6 commit, 24b3db07685fcc39f08e15b1b5e5cd1262b91590, and the bug still exists. The following is my test script:

------------------------------------------------
# cat tests/bugs/bug-1208564.t
#!/bin/bash

. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

function file_count()
{
    ls $1 | wc -l
}

cleanup

TEST glusterd
TEST pidof glusterd
#TEST mkdir -p $B0/${V0}{0,1,2,3,4,5,6,7}
TEST $CLI volume create $V0 disperse 4 redundancy 1 $H0:$B0/${v0}{0,1,2,3,4,5,6,7}
EXPECT "$V0" volinfo_field $V0 'Volume Name'
EXPECT 'Created' volinfo_field $V0 'Status'
EXPECT '8' brick_count $V0

TEST $CLI volume start $V0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT 'Started' volinfo_field $V0 'Status'

TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
TEST touch $M0/file{0..99}
TEST kill_brick $V0 $H0 $B0/0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT '100' file_count $M0

cleanup
--------------------------------------------

It gets stuck at the last file_count check.

More details of my test:
----------
OS version: Red Hat Enterprise Linux Server release 6.5 (Santiago) with libc-2.12, and CentOS release 6.6 (Final) with libc-2.12

# gluster --version
glusterfs 3.6.4beta1 built on May 20 2015 22:16:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

# gluster v info all

Volume Name: patchy
Type: Distributed-Disperse
Volume ID: 84374ec8-1aa5-4126-908d-c458d605d905
Status: Started
Number of Bricks: 2 x (3 + 1) = 8
Transport-type: tcp
Bricks:
Brick1: mn1:/d/backends/0
Brick2: mn1:/d/backends/1
Brick3: mn1:/d/backends/2
Brick4: mn1:/d/backends/3
Brick5: mn1:/d/backends/4
Brick6: mn1:/d/backends/5
Brick7: mn1:/d/backends/6
Brick8: mn1:/d/backends/7

# gluster v status
Status of volume: patchy
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick mn1:/d/backends/0                         N/A     N       16557
Brick mn1:/d/backends/1                         49153   Y       16568
Brick mn1:/d/backends/2                         49154   Y       16579
Brick mn1:/d/backends/3                         49155   Y       16590
Brick mn1:/d/backends/4                         49156   Y       16601
Brick mn1:/d/backends/5                         49157   Y       16612
Brick mn1:/d/backends/6                         49158   Y       16623
Brick mn1:/d/backends/7                         49159   Y       16634
NFS Server on localhost                         2049    Y       16652

Task Status of Volume patchy
------------------------------------------------------------------------------
There are no active volume tasks

# df
Filesystem                       1K-blocks      Used  Available Use% Mounted on
/dev/mapper/35000c5005e5a3337p5 1784110888  85086720 1608389928   6% /
tmpfs                             16416508         4   16416504   1% /dev/shm
/dev/mapper/35000c5005e5a3337p1     999320     72752     874140   8% /boot
/dev/mapper/35000c5005e5a3337p2  103081248     61028   97777340   1% /test
mn1:patchy                      10704665216 510520320 9650339456   6% /mnt/glusterfs/0

# ls /mnt/glusterfs/0/
The `ls` command hangs here.

PS: On the master branch the test passes.
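If anyone wants to rerun this, the script above is intended to be dropped into the GlusterFS source tree under tests/bugs/ and run with the regression test harness. A minimal sketch, assuming a built source tree and that tests are executed via 'prove' (as the project's run-tests.sh does); the file name bug-1208564.t is the one used above:

------------------------------------------------
# From the root of the glusterfs source tree.
prove -vf tests/bugs/bug-1208564.t
------------------------------------------------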
Here is some log information from /var/log/glusterfs/mnt-glusterfs-0.log, which may be helpful:

[2015-05-20 14:19:10.887997] I [client-handshake.c:188:client_set_lk_version_cbk] 0-patchy-client-4: Server lk version = 1
[2015-05-20 14:19:11.684083] W [socket.c:620:__socket_rwv] 0-patchy-client-0: readv on 192.168.1.31:49152 failed (No data available)
[2015-05-20 14:19:11.684193] I [client.c:2215:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2015-05-20 14:19:11.688660] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.689057] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.689665] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.690359] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.691292] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.693141] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.693168] E [ec-common.c:443:ec_child_select] 0-patchy-disperse-0: Insufficient available childs for this request (have 0, need 1)
[2015-05-20 14:19:12.789888] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:12.790018] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
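If someone hits this again, a statedump of the hung FUSE client may show which fop is pending in the disperse xlator. A minimal sketch, assuming the default statedump directory /var/run/gluster and that the mount point matches the one above (the pgrep pattern and CLIENT_PID variable are illustrative, not part of any gluster tooling):

------------------------------------------------
# Find the PID of the FUSE client for this mount (adjust the path
# to your own mount point).
CLIENT_PID=$(pgrep -f 'glusterfs.*/mnt/glusterfs/0')

# glusterfs processes write a statedump when they receive SIGUSR1.
kill -USR1 "$CLIENT_PID"

# The newest glusterdump.*.dump.* file in the statedump directory is
# the client dump; inspect it for pending call frames.
ls -lt /var/run/gluster/ | head
------------------------------------------------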
This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. If you still face this bug with a newer GlusterFS version, please open a new bug.