Bug 1208564
| Summary: | ls operation hangs after the first brick is killed for distributed-disperse volume | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Fang Huang <fanghuang.data> |
| Component: | disperse | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.6.2 | CC: | bugs, kaushal, rabhat |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 3.6.3 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-08-16 13:08:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Ls operation works well if the other brick is killed. root@pranithk-laptop - ~
22:22:12 :) ⚡ glusterd && gluster volume create ec2 disperse 4 redundancy 1 `hostname`:/home/gfs/ec_{0..7} force && gluster volume start ec2 && mount -t glusterfs `hostname`:/ec2 /mnt/ec2
This configuration is not optimal on most workloads. Do you want to use it ? (y/n) y
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume? (y/n) y
volume create: ec2: success: please start the volume to access data
volume start: ec2: success
root@pranithk-laptop - ~
22:22:23 :) ⚡ cd /mnt/ec2
root@pranithk-laptop - /mnt/ec2
22:22:25 :) ⚡ mkdir -p dht-ec-client
root@pranithk-laptop - /mnt/ec2
22:22:37 :) ⚡ cd dht-ec-client
root@pranithk-laptop - /mnt/ec2/dht-ec-client
22:22:38 :) ⚡ touch file{0..99}
root@pranithk-laptop - /mnt/ec2/dht-ec-client
22:22:55 :) ⚡ ls -l
total 0
-rw-r--r--. 1 root root 0 May 9 22:22 file0
-rw-r--r--. 1 root root 0 May 9 22:22 file1
-rw-r--r--. 1 root root 0 May 9 22:22 file10
-rw-r--r--. 1 root root 0 May 9 22:22 file11
-rw-r--r--. 1 root root 0 May 9 22:22 file12
-rw-r--r--. 1 root root 0 May 9 22:22 file13
-rw-r--r--. 1 root root 0 May 9 22:22 file14
-rw-r--r--. 1 root root 0 May 9 22:22 file15
-rw-r--r--. 1 root root 0 May 9 22:22 file16
-rw-r--r--. 1 root root 0 May 9 22:22 file17
-rw-r--r--. 1 root root 0 May 9 22:22 file18
-rw-r--r--. 1 root root 0 May 9 22:22 file19
-rw-r--r--. 1 root root 0 May 9 22:22 file2
-rw-r--r--. 1 root root 0 May 9 22:22 file20
-rw-r--r--. 1 root root 0 May 9 22:22 file21
-rw-r--r--. 1 root root 0 May 9 22:22 file22
-rw-r--r--. 1 root root 0 May 9 22:22 file23
-rw-r--r--. 1 root root 0 May 9 22:22 file24
-rw-r--r--. 1 root root 0 May 9 22:22 file25
-rw-r--r--. 1 root root 0 May 9 22:22 file26
-rw-r--r--. 1 root root 0 May 9 22:22 file27
-rw-r--r--. 1 root root 0 May 9 22:22 file28
-rw-r--r--. 1 root root 0 May 9 22:22 file29
-rw-r--r--. 1 root root 0 May 9 22:22 file3
-rw-r--r--. 1 root root 0 May 9 22:22 file30
-rw-r--r--. 1 root root 0 May 9 22:22 file31
-rw-r--r--. 1 root root 0 May 9 22:22 file32
-rw-r--r--. 1 root root 0 May 9 22:22 file33
-rw-r--r--. 1 root root 0 May 9 22:22 file34
-rw-r--r--. 1 root root 0 May 9 22:22 file35
-rw-r--r--. 1 root root 0 May 9 22:22 file36
-rw-r--r--. 1 root root 0 May 9 22:22 file37
-rw-r--r--. 1 root root 0 May 9 22:22 file38
-rw-r--r--. 1 root root 0 May 9 22:22 file39
-rw-r--r--. 1 root root 0 May 9 22:22 file4
-rw-r--r--. 1 root root 0 May 9 22:22 file40
-rw-r--r--. 1 root root 0 May 9 22:22 file41
-rw-r--r--. 1 root root 0 May 9 22:22 file42
-rw-r--r--. 1 root root 0 May 9 22:22 file43
-rw-r--r--. 1 root root 0 May 9 22:22 file44
-rw-r--r--. 1 root root 0 May 9 22:22 file45
-rw-r--r--. 1 root root 0 May 9 22:22 file46
-rw-r--r--. 1 root root 0 May 9 22:22 file47
-rw-r--r--. 1 root root 0 May 9 22:22 file48
-rw-r--r--. 1 root root 0 May 9 22:22 file49
-rw-r--r--. 1 root root 0 May 9 22:22 file5
-rw-r--r--. 1 root root 0 May 9 22:22 file50
-rw-r--r--. 1 root root 0 May 9 22:22 file51
-rw-r--r--. 1 root root 0 May 9 22:22 file52
-rw-r--r--. 1 root root 0 May 9 22:22 file53
-rw-r--r--. 1 root root 0 May 9 22:22 file54
-rw-r--r--. 1 root root 0 May 9 22:22 file55
-rw-r--r--. 1 root root 0 May 9 22:22 file56
-rw-r--r--. 1 root root 0 May 9 22:22 file57
-rw-r--r--. 1 root root 0 May 9 22:22 file58
-rw-r--r--. 1 root root 0 May 9 22:22 file59
-rw-r--r--. 1 root root 0 May 9 22:22 file6
-rw-r--r--. 1 root root 0 May 9 22:22 file60
-rw-r--r--. 1 root root 0 May 9 22:22 file61
-rw-r--r--. 1 root root 0 May 9 22:22 file62
-rw-r--r--. 1 root root 0 May 9 22:22 file63
-rw-r--r--. 1 root root 0 May 9 22:22 file64
-rw-r--r--. 1 root root 0 May 9 22:22 file65
-rw-r--r--. 1 root root 0 May 9 22:22 file66
-rw-r--r--. 1 root root 0 May 9 22:22 file67
-rw-r--r--. 1 root root 0 May 9 22:22 file68
-rw-r--r--. 1 root root 0 May 9 22:22 file69
-rw-r--r--. 1 root root 0 May 9 22:22 file7
-rw-r--r--. 1 root root 0 May 9 22:22 file70
-rw-r--r--. 1 root root 0 May 9 22:22 file71
-rw-r--r--. 1 root root 0 May 9 22:22 file72
-rw-r--r--. 1 root root 0 May 9 22:22 file73
-rw-r--r--. 1 root root 0 May 9 22:22 file74
-rw-r--r--. 1 root root 0 May 9 22:22 file75
-rw-r--r--. 1 root root 0 May 9 22:22 file76
-rw-r--r--. 1 root root 0 May 9 22:22 file77
-rw-r--r--. 1 root root 0 May 9 22:22 file78
-rw-r--r--. 1 root root 0 May 9 22:22 file79
-rw-r--r--. 1 root root 0 May 9 22:22 file8
-rw-r--r--. 1 root root 0 May 9 22:22 file80
-rw-r--r--. 1 root root 0 May 9 22:22 file81
-rw-r--r--. 1 root root 0 May 9 22:22 file82
-rw-r--r--. 1 root root 0 May 9 22:22 file83
-rw-r--r--. 1 root root 0 May 9 22:22 file84
-rw-r--r--. 1 root root 0 May 9 22:22 file85
-rw-r--r--. 1 root root 0 May 9 22:22 file86
-rw-r--r--. 1 root root 0 May 9 22:22 file87
-rw-r--r--. 1 root root 0 May 9 22:22 file88
-rw-r--r--. 1 root root 0 May 9 22:22 file89
-rw-r--r--. 1 root root 0 May 9 22:22 file9
-rw-r--r--. 1 root root 0 May 9 22:22 file90
-rw-r--r--. 1 root root 0 May 9 22:22 file91
-rw-r--r--. 1 root root 0 May 9 22:22 file92
-rw-r--r--. 1 root root 0 May 9 22:22 file93
-rw-r--r--. 1 root root 0 May 9 22:22 file94
-rw-r--r--. 1 root root 0 May 9 22:22 file95
-rw-r--r--. 1 root root 0 May 9 22:22 file96
-rw-r--r--. 1 root root 0 May 9 22:22 file97
-rw-r--r--. 1 root root 0 May 9 22:22 file98
-rw-r--r--. 1 root root 0 May 9 22:22 file99
root@pranithk-laptop - /mnt/ec2/dht-ec-client
22:22:57 :) ⚡ /home/pk1/.scripts/gfs -c k -v ec2 -k 0
Dir: /var/lib/glusterd/vols/ec2/run/
kill -9 2800
root@pranithk-laptop - /mnt/ec2/dht-ec-client
22:23:02 :) ⚡ gluster v status
Status of volume: ec2
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick pranithk-laptop:/home/gfs/ec_0 N/A N 2800
Brick pranithk-laptop:/home/gfs/ec_1 49156 Y 2811
Brick pranithk-laptop:/home/gfs/ec_2 49157 Y 2822
Brick pranithk-laptop:/home/gfs/ec_3 49158 Y 2833
Brick pranithk-laptop:/home/gfs/ec_4 49159 Y 2844
Brick pranithk-laptop:/home/gfs/ec_5 49160 Y 2855
Brick pranithk-laptop:/home/gfs/ec_6 49161 Y 2866
Brick pranithk-laptop:/home/gfs/ec_7 49162 Y 2877
NFS Server on localhost 2049 Y 2890
Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks
root@pranithk-laptop - /mnt/ec2/dht-ec-client
22:23:05 :) ⚡ ls
file0 file14 file2 file25 file30 file36 file41 file47 file52 file58 file63 file69 file74 file8 file85 file90 file96
file1 file15 file20 file26 file31 file37 file42 file48 file53 file59 file64 file7 file75 file80 file86 file91 file97
file10 file16 file21 file27 file32 file38 file43 file49 file54 file6 file65 file70 file76 file81 file87 file92 file98
file11 file17 file22 file28 file33 file39 file44 file5 file55 file60 file66 file71 file77 file82 file88 file93 file99
file12 file18 file23 file29 file34 file4 file45 file50 file56 file61 file67 file72 file78 file83 file89 file94
file13 file19 file24 file3 file35 file40 file46 file51 file57 file62 file68 file73 file79 file84 file9 file95
root@pranithk-laptop - /mnt/ec2/dht-ec-client
22:23:11 :) ⚡
I tested the last 3.6 commit which is 24b3db07685fcc39f08e15b1b5e5cd1262b91590, the bug still exist. The following is my test script.
------------------------------------------------
# cat tests/bugs/bug-1208564.t
#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc
function file_count() {
ls $1 |wc -l
}
cleanup
TEST glusterd
TEST pidof glusterd
#TEST mkdir -p $B0/${V0}{0,1,2,3,4,5,6,7}
TEST $CLI volume create $V0 disperse 4 redundancy 1 $H0:$B0/${v0}{0,1,2,3,4,5,6,7}
EXPECT "$V0" volinfo_field $V0 'Volume Name'
EXPECT 'Created' volinfo_field $V0 'Status'
EXPECT '8' brick_count $V0
TEST $CLI volume start $V0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT 'Started' volinfo_field $V0 'Status'
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
TEST touch $M0/file{0..99}
TEST kill_brick $V0 $H0 $B0/0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT '100' file_count $M0
cleanup
--------------------------------------------
It stuck at the last file_count checking.
More details for my test:
----------
OS version:
Red Hat Enterprise Linux Server release 6.5 (Santiago) with libc-2.12
and CentOS release 6.6 (Final) with libc-2.12
# gluster --version
glusterfs 3.6.4beta1 built on May 20 2015 22:16:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
#gluster v info all
Volume Name: patchy
Type: Distributed-Disperse
Volume ID: 84374ec8-1aa5-4126-908d-c458d605d905
Status: Started
Number of Bricks: 2 x (3 + 1) = 8
Transport-type: tcp
Bricks:
Brick1: mn1:/d/backends/0
Brick2: mn1:/d/backends/1
Brick3: mn1:/d/backends/2
Brick4: mn1:/d/backends/3
Brick5: mn1:/d/backends/4
Brick6: mn1:/d/backends/5
Brick7: mn1:/d/backends/6
Brick8: mn1:/d/backends/7
# gluster v status
Status of volume: patchy
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick mn1:/d/backends/0 N/A N 16557
Brick mn1:/d/backends/1 49153 Y 16568
Brick mn1:/d/backends/2 49154 Y 16579
Brick mn1:/d/backends/3 49155 Y 16590
Brick mn1:/d/backends/4 49156 Y 16601
Brick mn1:/d/backends/5 49157 Y 16612
Brick mn1:/d/backends/6 49158 Y 16623
Brick mn1:/d/backends/7 49159 Y 16634
NFS Server on localhost 2049 Y 16652
Task Status of Volume patchy
------------------------------------------------------------------------------
There are no active volume tasks
#df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/35000c5005e5a3337p5
1784110888 85086720 1608389928 6% /
tmpfs 16416508 4 16416504 1% /dev/shm
/dev/mapper/35000c5005e5a3337p1
999320 72752 874140 8% /boot
/dev/mapper/35000c5005e5a3337p2
103081248 61028 97777340 1% /test
mn1:patchy 10704665216 510520320 9650339456 6% /mnt/glusterfs/0
# ls /mnt/glusterfs/0/
`ls` command hang here
PS: For the master branch the test passed.
Here is some log information from /var/log/glusterfs/mnt-glusterfs-0.log, which may be helpful. [2015-05-20 14:19:10.887997] I [client-handshake.c:188:client_set_lk_version_cbk] 0-patchy-client-4: Server lk version = 1 [2015-05-20 14:19:11.684083] W [socket.c:620:__socket_rwv] 0-patchy-client-0: readv on 192.168.1.31:49152 failed (No data available) [2015-05-20 14:19:11.684193] I [client.c:2215:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client proces s will keep trying to connect to glusterd until brick's port is available [2015-05-20 14:19:11.688660] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:11.689057] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:11.689665] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:11.690359] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:11.691292] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:11.693141] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:11.693168] E [ec-common.c:443:ec_child_select] 0-patchy-disperse-0: Insufficient available childs for this request (have 0, need 1) [2015-05-20 14:19:12.789888] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) [2015-05-20 14:19:12.790018] W [ec-common.c:407:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unava ilable (1) This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. If you still face this bug with the newer GlusterFS versions, please open a new bug. |
Description of problem: At client ls operation hangs after the first brick is killed for distributed-disperse volume Version-Release number of selected component (if applicable): glusterfs 3.6.3beta1 built on Apr 1 2015 20:11:53 Repository revision: git://git.gluster.com/glusterfs.git How reproducible: Always Steps to Reproduce: 1. Create a distributed-disperse volume Volume Name: dht-ec-vol Type: Distributed-Disperse Volume ID: 5697b709-4187-47ce-970b-b8c19d610055 Status: Started Number of Bricks: 2 x (3 + 1) = 8 Transport-type: tcp Bricks: Brick1: cn4:/home/wtb/brick/bri1 Brick2: cn4:/home/wtb/brick/bri2 Brick3: cn4:/home/wtb/brick/bri3 Brick4: cn4:/home/wtb/brick/bri4 Brick5: cn4:/home/wtb/brick/bri5 Brick6: cn4:/home/wtb/brick/bri6 Brick7: cn4:/home/wtb/brick/bri7 Brick8: cn4:/home/wtb/brick/bri8 2. mount and create some files mkdir -p /mnt/wtb/dht-ec-client mount -t glusterfs cn4:/dht-ec-vol /mnt/wtb/dht-ec-client touch /mnt/wtb/dht-ec-client/file{0..99} 3. kill the first brick execute 'gluster v status dht-ec-vol' and find the pid of Brick1 kill this process 4. ls /mnt/wtb/dht-ec-client Actual results: ls hangs on and shows nothing Expected results: file{0..99} should be displayed. Additional info: