Bug 1208564
| Summary: | ls operation hangs after the first brick is killed for distributed-disperse volume | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | Fang Huang <fanghuang.data> |
| Component: | disperse | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.2 | CC: | bugs, kaushal, rabhat |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 3.6.3 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-08-16 13:08:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Fang Huang
2015-04-02 14:19:44 UTC
The ls operation works well if the other brick is killed.

```
root@pranithk-laptop - ~ 22:22:12 :) ⚡ glusterd && gluster volume create ec2 disperse 4 redundancy 1 `hostname`:/home/gfs/ec_{0..7} force && gluster volume start ec2 && mount -t glusterfs `hostname`:/ec2 /mnt/ec2
This configuration is not optimal on most workloads. Do you want to use it ? (y/n) y
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Do you still want to continue creating the volume? (y/n) y
volume create: ec2: success: please start the volume to access data
volume start: ec2: success
root@pranithk-laptop - ~ 22:22:23 :) ⚡ cd /mnt/ec2
root@pranithk-laptop - /mnt/ec2 22:22:25 :) ⚡ mkdir -p dht-ec-client
root@pranithk-laptop - /mnt/ec2 22:22:37 :) ⚡ cd dht-ec-client
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:22:38 :) ⚡ touch file{0..99}
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:22:55 :) ⚡ ls -l
total 0
-rw-r--r--. 1 root root 0 May 9 22:22 file0
-rw-r--r--. 1 root root 0 May 9 22:22 file1
-rw-r--r--. 1 root root 0 May 9 22:22 file10
-rw-r--r--. 1 root root 0 May 9 22:22 file11
-rw-r--r--. 1 root root 0 May 9 22:22 file12
-rw-r--r--. 1 root root 0 May 9 22:22 file13
-rw-r--r--. 1 root root 0 May 9 22:22 file14
-rw-r--r--. 1 root root 0 May 9 22:22 file15
-rw-r--r--. 1 root root 0 May 9 22:22 file16
-rw-r--r--. 1 root root 0 May 9 22:22 file17
-rw-r--r--. 1 root root 0 May 9 22:22 file18
-rw-r--r--. 1 root root 0 May 9 22:22 file19
-rw-r--r--. 1 root root 0 May 9 22:22 file2
-rw-r--r--. 1 root root 0 May 9 22:22 file20
-rw-r--r--. 1 root root 0 May 9 22:22 file21
-rw-r--r--. 1 root root 0 May 9 22:22 file22
-rw-r--r--. 1 root root 0 May 9 22:22 file23
-rw-r--r--. 1 root root 0 May 9 22:22 file24
-rw-r--r--. 1 root root 0 May 9 22:22 file25
-rw-r--r--. 1 root root 0 May 9 22:22 file26
-rw-r--r--. 1 root root 0 May 9 22:22 file27
-rw-r--r--. 1 root root 0 May 9 22:22 file28
-rw-r--r--. 1 root root 0 May 9 22:22 file29
-rw-r--r--. 1 root root 0 May 9 22:22 file3
-rw-r--r--. 1 root root 0 May 9 22:22 file30
-rw-r--r--. 1 root root 0 May 9 22:22 file31
-rw-r--r--. 1 root root 0 May 9 22:22 file32
-rw-r--r--. 1 root root 0 May 9 22:22 file33
-rw-r--r--. 1 root root 0 May 9 22:22 file34
-rw-r--r--. 1 root root 0 May 9 22:22 file35
-rw-r--r--. 1 root root 0 May 9 22:22 file36
-rw-r--r--. 1 root root 0 May 9 22:22 file37
-rw-r--r--. 1 root root 0 May 9 22:22 file38
-rw-r--r--. 1 root root 0 May 9 22:22 file39
-rw-r--r--. 1 root root 0 May 9 22:22 file4
-rw-r--r--. 1 root root 0 May 9 22:22 file40
-rw-r--r--. 1 root root 0 May 9 22:22 file41
-rw-r--r--. 1 root root 0 May 9 22:22 file42
-rw-r--r--. 1 root root 0 May 9 22:22 file43
-rw-r--r--. 1 root root 0 May 9 22:22 file44
-rw-r--r--. 1 root root 0 May 9 22:22 file45
-rw-r--r--. 1 root root 0 May 9 22:22 file46
-rw-r--r--. 1 root root 0 May 9 22:22 file47
-rw-r--r--. 1 root root 0 May 9 22:22 file48
-rw-r--r--. 1 root root 0 May 9 22:22 file49
-rw-r--r--. 1 root root 0 May 9 22:22 file5
-rw-r--r--. 1 root root 0 May 9 22:22 file50
-rw-r--r--. 1 root root 0 May 9 22:22 file51
-rw-r--r--. 1 root root 0 May 9 22:22 file52
-rw-r--r--. 1 root root 0 May 9 22:22 file53
-rw-r--r--. 1 root root 0 May 9 22:22 file54
-rw-r--r--. 1 root root 0 May 9 22:22 file55
-rw-r--r--. 1 root root 0 May 9 22:22 file56
-rw-r--r--. 1 root root 0 May 9 22:22 file57
-rw-r--r--. 1 root root 0 May 9 22:22 file58
-rw-r--r--. 1 root root 0 May 9 22:22 file59
-rw-r--r--. 1 root root 0 May 9 22:22 file6
-rw-r--r--. 1 root root 0 May 9 22:22 file60
-rw-r--r--. 1 root root 0 May 9 22:22 file61
-rw-r--r--. 1 root root 0 May 9 22:22 file62
-rw-r--r--. 1 root root 0 May 9 22:22 file63
-rw-r--r--. 1 root root 0 May 9 22:22 file64
-rw-r--r--. 1 root root 0 May 9 22:22 file65
-rw-r--r--. 1 root root 0 May 9 22:22 file66
-rw-r--r--. 1 root root 0 May 9 22:22 file67
-rw-r--r--. 1 root root 0 May 9 22:22 file68
-rw-r--r--. 1 root root 0 May 9 22:22 file69
-rw-r--r--. 1 root root 0 May 9 22:22 file7
-rw-r--r--. 1 root root 0 May 9 22:22 file70
-rw-r--r--. 1 root root 0 May 9 22:22 file71
-rw-r--r--. 1 root root 0 May 9 22:22 file72
-rw-r--r--. 1 root root 0 May 9 22:22 file73
-rw-r--r--. 1 root root 0 May 9 22:22 file74
-rw-r--r--. 1 root root 0 May 9 22:22 file75
-rw-r--r--. 1 root root 0 May 9 22:22 file76
-rw-r--r--. 1 root root 0 May 9 22:22 file77
-rw-r--r--. 1 root root 0 May 9 22:22 file78
-rw-r--r--. 1 root root 0 May 9 22:22 file79
-rw-r--r--. 1 root root 0 May 9 22:22 file8
-rw-r--r--. 1 root root 0 May 9 22:22 file80
-rw-r--r--. 1 root root 0 May 9 22:22 file81
-rw-r--r--. 1 root root 0 May 9 22:22 file82
-rw-r--r--. 1 root root 0 May 9 22:22 file83
-rw-r--r--. 1 root root 0 May 9 22:22 file84
-rw-r--r--. 1 root root 0 May 9 22:22 file85
-rw-r--r--. 1 root root 0 May 9 22:22 file86
-rw-r--r--. 1 root root 0 May 9 22:22 file87
-rw-r--r--. 1 root root 0 May 9 22:22 file88
-rw-r--r--. 1 root root 0 May 9 22:22 file89
-rw-r--r--. 1 root root 0 May 9 22:22 file9
-rw-r--r--. 1 root root 0 May 9 22:22 file90
-rw-r--r--. 1 root root 0 May 9 22:22 file91
-rw-r--r--. 1 root root 0 May 9 22:22 file92
-rw-r--r--. 1 root root 0 May 9 22:22 file93
-rw-r--r--. 1 root root 0 May 9 22:22 file94
-rw-r--r--. 1 root root 0 May 9 22:22 file95
-rw-r--r--. 1 root root 0 May 9 22:22 file96
-rw-r--r--. 1 root root 0 May 9 22:22 file97
-rw-r--r--. 1 root root 0 May 9 22:22 file98
-rw-r--r--. 1 root root 0 May 9 22:22 file99
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:22:57 :) ⚡ /home/pk1/.scripts/gfs -c k -v ec2 -k 0
Dir: /var/lib/glusterd/vols/ec2/run/
kill -9 2800
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:23:02 :) ⚡ gluster v status
Status of volume: ec2
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick pranithk-laptop:/home/gfs/ec_0                    N/A     N       2800
Brick pranithk-laptop:/home/gfs/ec_1                    49156   Y       2811
Brick pranithk-laptop:/home/gfs/ec_2                    49157   Y       2822
Brick pranithk-laptop:/home/gfs/ec_3                    49158   Y       2833
Brick pranithk-laptop:/home/gfs/ec_4                    49159   Y       2844
Brick pranithk-laptop:/home/gfs/ec_5                    49160   Y       2855
Brick pranithk-laptop:/home/gfs/ec_6                    49161   Y       2866
Brick pranithk-laptop:/home/gfs/ec_7                    49162   Y       2877
NFS Server on localhost                                 2049    Y       2890

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:23:05 :) ⚡ ls
file0   file14  file2   file25  file30  file36  file41  file47  file52  file58  file63  file69  file74  file8   file85  file90  file96
file1   file15  file20  file26  file31  file37  file42  file48  file53  file59  file64  file7   file75  file80  file86  file91  file97
file10  file16  file21  file27  file32  file38  file43  file49  file54  file6   file65  file70  file76  file81  file87  file92  file98
file11  file17  file22  file28  file33  file39  file44  file5   file55  file60  file66  file71  file77  file82  file88  file93  file99
file12  file18  file23  file29  file34  file4   file45  file50  file56  file61  file67  file72  file78  file83  file89  file94
file13  file19  file24  file3   file35  file40  file46  file51  file57  file62  file68  file73  file79  file84  file9   file95
root@pranithk-laptop - /mnt/ec2/dht-ec-client 22:23:11 :) ⚡
```

I tested the latest 3.6 commit, 24b3db07685fcc39f08e15b1b5e5cd1262b91590, and the bug still exists. The following is my test script:

```
# cat tests/bugs/bug-1208564.t
#!/bin/bash

. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

function file_count()
{
        ls $1 | wc -l
}

cleanup

TEST glusterd
TEST pidof glusterd

#TEST mkdir -p $B0/${V0}{0,1,2,3,4,5,6,7}
TEST $CLI volume create $V0 disperse 4 redundancy 1 $H0:$B0/${v0}{0,1,2,3,4,5,6,7}
EXPECT "$V0" volinfo_field $V0 'Volume Name'
EXPECT 'Created' volinfo_field $V0 'Status'
EXPECT '8' brick_count $V0

TEST $CLI volume start $V0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT 'Started' volinfo_field $V0 'Status'
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0

TEST touch $M0/file{0..99}

TEST kill_brick $V0 $H0 $B0/0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT '100' file_count $M0

cleanup
```

The test gets stuck at the final file_count check.
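Since a hang at this point wedges the whole regression run, one option while debugging is to guard the `ls` with a timeout so the check eventually fails instead of blocking forever. A minimal sketch of that variant, assuming GNU coreutils `timeout` is available on the test machine (the 30-second limit is an arbitrary choice, and this helper is not part of the test as submitted):

```bash
# Hypothetical variant of the file_count helper above: `timeout` kills ls
# if it blocks on the mount, so EXPECT_WITHIN fails after its own timeout
# instead of hanging the run indefinitely.
function file_count()
{
        timeout 30 ls $1 | wc -l
}

# Used exactly like the original helper:
# EXPECT_WITHIN $PROCESS_UP_TIMEOUT '100' file_count $M0
```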
More details of my test environment:

OS version: Red Hat Enterprise Linux Server release 6.5 (Santiago) with libc-2.12, and CentOS release 6.6 (Final) with libc-2.12.

```
# gluster --version
glusterfs 3.6.4beta1 built on May 20 2015 22:16:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

# gluster v info all

Volume Name: patchy
Type: Distributed-Disperse
Volume ID: 84374ec8-1aa5-4126-908d-c458d605d905
Status: Started
Number of Bricks: 2 x (3 + 1) = 8
Transport-type: tcp
Bricks:
Brick1: mn1:/d/backends/0
Brick2: mn1:/d/backends/1
Brick3: mn1:/d/backends/2
Brick4: mn1:/d/backends/3
Brick5: mn1:/d/backends/4
Brick6: mn1:/d/backends/5
Brick7: mn1:/d/backends/6
Brick8: mn1:/d/backends/7

# gluster v status
Status of volume: patchy
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick mn1:/d/backends/0                         N/A     N       16557
Brick mn1:/d/backends/1                         49153   Y       16568
Brick mn1:/d/backends/2                         49154   Y       16579
Brick mn1:/d/backends/3                         49155   Y       16590
Brick mn1:/d/backends/4                         49156   Y       16601
Brick mn1:/d/backends/5                         49157   Y       16612
Brick mn1:/d/backends/6                         49158   Y       16623
Brick mn1:/d/backends/7                         49159   Y       16634
NFS Server on localhost                         2049    Y       16652

Task Status of Volume patchy
------------------------------------------------------------------------------
There are no active volume tasks

# df
Filesystem                       1K-blocks      Used   Available Use% Mounted on
/dev/mapper/35000c5005e5a3337p5 1784110888  85086720  1608389928   6% /
tmpfs                             16416508         4    16416504   1% /dev/shm
/dev/mapper/35000c5005e5a3337p1     999320     72752      874140   8% /boot
/dev/mapper/35000c5005e5a3337p2  103081248     61028    97777340   1% /test
mn1:patchy                      10704665216 510520320  9650339456   6% /mnt/glusterfs/0

# ls /mnt/glusterfs/0/
```

The `ls` command hangs here.

PS: For the master branch, the test passes.

Here is some log information from /var/log/glusterfs/mnt-glusterfs-0.log, which may be helpful:

```
[2015-05-20 14:19:10.887997] I [client-handshake.c:188:client_set_lk_version_cbk] 0-patchy-client-4: Server lk version = 1
[2015-05-20 14:19:11.684083] W [socket.c:620:__socket_rwv] 0-patchy-client-0: readv on 192.168.1.31:49152 failed (No data available)
[2015-05-20 14:19:11.684193] I [client.c:2215:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2015-05-20 14:19:11.688660] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.689057] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.689665] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.690359] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.691292] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.693141] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:11.693168] E [ec-common.c:443:ec_child_select] 0-patchy-disperse-0: Insufficient available childs for this request (have 0, need 1)
[2015-05-20 14:19:12.789888] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
[2015-05-20 14:19:12.790018] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (1)
```

This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. If you still face this bug with the newer GlusterFS versions, please open a new bug.
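For reference on the ec_child_select messages above: a disperse set of N bricks with redundancy R can reconstruct data from any N - R fragments, so a set stays readable as long as at least N - R of its bricks are up. In this 2 x (3 + 1) layout, killing one brick still leaves the 3 bricks the affected set needs, which is why `ls` is expected to keep working. A rough sketch of that arithmetic (the numbers are simply taken from the volume info above; nothing below is GlusterFS code):

```bash
# Availability arithmetic for one "disperse 4 redundancy 1" set.
N=4        # bricks per disperse set, from "2 x (3 + 1) = 8"
R=1        # redundancy (parity fragments per set)
KILLED=1   # bricks killed in the affected set

NEEDED=$((N - R))      # live bricks required to serve a request: 3
LIVE=$((N - KILLED))   # live bricks remaining after the kill:    3

echo "need ${NEEDED} live bricks, have ${LIVE}"
```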