Description of problem:
tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t fails when the test is run with https://review.gluster.org/#/c/glusterfs/+/21336/ applied on the master branch.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Additional info:
Logs can be found at https://build.gluster.org/job/centos7-regression/3150/consoleFull
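For reference, a minimal way to reproduce this locally. This is a sketch under assumptions: the Gerrit fetch ref uses a <patchset> placeholder, and running the test requires a built glusterfs source tree with the regression prerequisites installed (tests run as root).

# Fetch and apply the change under review (patchset number is a placeholder).
git fetch https://review.gluster.org/glusterfs refs/changes/36/21336/<patchset>
git cherry-pick FETCH_HEAD

# Run only this regression test via the project's test runner.
./run-tests.sh tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t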
Reason why this test case is not failing on master:

-> test case

 1 #!/bin/bash
 2
 3 . $(dirname $0)/../../include.rc
 4 . $(dirname $0)/../../cluster.rc
 5 . $(dirname $0)/../../volume.rc
 6
 7 function peer_count {
 8 eval \$CLI_$1 peer status | grep 'Peer in Cluster (Connected)' | wc -l
 9 }
10
11 cleanup;
12
13 #bug-1454418 - Setting Port number in specific range
14 sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
15
16 TEST launch_cluster 3;
17
18 #bug-1223213
19
20 # Fool the cluster to operate with 3.5 version even though binary's op-version
21 # is > 3.5. This is to ensure 3.5 code path is hit to test that volume status
22 # works when a node is upgraded from 3.5 to 3.7 or higher as mgmt_v3 lock is
23 # been introduced in 3.6 version and onwards
24
25 GD1_WD=$($CLI_1 system getwd)
26 $CLI_1 system uuid get
27 Old_op_version=$(cat ${GD1_WD}/glusterd.info | grep operating-version | cut -d '=' -f 2)
28
29 TEST sed -rnie "'s/(operating-version=)\w+/\130500/gip'" ${GD1_WD}/glusterd.info
30
31 TEST kill_glusterd 1
32 TEST start_glusterd 1
33
34 TEST $CLI_1 peer probe $H2;
35 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
36
37 TEST `sed -i "s/"30500"/${Old_op_version}/g" ${GD1_WD}/glusterd.info`
38
39 TEST kill_glusterd 1
40 TEST start_glusterd 1
41
42 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
43 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 2
44
45 #bug-1454418
46 sysctl net.ipv4.ip_local_reserved_ports="
47 "
48
49 TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
50 TEST $CLI_1 volume start $V0
51
52 #bug-888752 - volume status --xml from peer in the cluster
53
54 TEST $CLI_1 volume status $V0 $H2:$B2/$V0 --xml
55
56 TEST $CLI_1 volume stop $V0
57 TEST $CLI_1 volume delete $V0
58
59 TEST $CLI_1 volume create $V0 $H1:$B1/$V0
60 TEST $CLI_1 volume create $V1 $H1:$B1/$V1
61
62 TEST $CLI_1 peer probe $H3;
63 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
64
65 TEST $CLI_1 volume start $V0
66 TEST $CLI_1 volume start $V1
67
68 #bug-1173414 - validate mgmt-v3-remote-lock-failure
69
70 for i in {1..20}
71 do
72 $CLI_1 volume set $V0 diagnostics.client-log-level DEBUG &
73 $CLI_1 volume set $V1 barrier on
74 $CLI_2 volume set $V0 diagnostics.client-log-level DEBUG &
75 $CLI_2 volume set $V1 barrier on
76 done
77
78 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
79 TEST $CLI_1 volume status
80 TEST $CLI_2 volume status
81
82 #bug-1293414 - validate peer detach
83
84 # peers hosting bricks cannot be detached
85 TEST ! $CLI_2 peer detach $H1
86 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
87
88 # peer not hosting bricks should be detachable
89 TEST $CLI_2 peer detach $H3
90 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
91
92 #bug-1344407 - deleting a volume when peer is down should fail
93
94 TEST kill_glusterd 2
95 TEST ! $CLI_1 volume delete $V0
96
97 cleanup

At line number 59 we have 2 nodes in the cluster, and we execute the commands below:

59 TEST $CLI_1 volume create $V0 $H1:$B1/$V0
60 TEST $CLI_1 volume create $V1 $H1:$B1/$V1
61
62 TEST $CLI_1 peer probe $H3;
63 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1

I executed the same steps on my setup:

[root@server1 glusterfs]# gluster pe stat
Number of Peers: 0

[root@server1 glusterfs]# gluster pe probe server2
peer probe: success.
[root@server1 glusterfs]# gluster v create test-vol1 server1:/tmp/b11
volume create: test-vol1: success: please start the volume to access data

[root@server1 glusterfs]# gluster v create test-vol2 server1:/tmp/b12
volume create: test-vol2: success: please start the volume to access data

[root@server1 glusterfs]# gluster pe probe server3
peer probe: success.
[root@server1 glusterfs]#

Now, checking the output of "gluster v info" and "gluster pe stat" from all 3 nodes in the cluster.

From node1:

[root@server1 glusterfs]# gluster v info

Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

[root@server1 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server2
Uuid: 917311b8-0b5c-4f22-aa1d-b1216ab192e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 6c68eb99-c9f4-4590-a022-7ef2081705b3
State: Peer in Cluster (Connected)
[root@server1 glusterfs]#

From node2:

[root@server2 glusterfs]# gluster v info

Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Brick1 VG:
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Brick1 VG:
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

[root@server2 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server1
Uuid: 8a75c6c4-865d-4805-bbdf-403234e9b5e3
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 6c68eb99-c9f4-4590-a022-7ef2081705b3
State: Peer Rejected (Connected)
[root@server2 glusterfs]#

From node3:

[root@server3 glusterfs]# gluster v info

Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on

Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on

[root@server3 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server1
Uuid: 8a75c6c4-865d-4805-bbdf-403234e9b5e3
State: Peer in Cluster (Connected)

Hostname: server2
Uuid: 917311b8-0b5c-4f22-aa1d-b1216ab192e5
State: Peer Rejected (Connected)
[root@server3 glusterfs]#

We are seeing BD xlator related entries in the output of "gluster v info" on node2 because of
https://bugzilla.redhat.com/show_bug.cgi?id=1635820

As we issued the peer probe to node3 from node1, node1's data is synced to node3, so both are in the connected state. Because of https://bugzilla.redhat.com/show_bug.cgi?id=1635820, node2 has caps=15 in its info file for the volumes. When node3 performs the handshake with node2, node2 goes into the rejected state because it has a caps field in its info file. So when we issue peer status from node2/node3, we see node3/node2 in the rejected state.

Now, at line #85, we issue:

TEST ! $CLI_2 peer detach $H1

This test passes because, when we issue "gluster pe detach node1" from node2, the detach fails with "peer detach: failed: One of the peers is probably down. Check with 'peer status'", and the command is prefixed with not (!). We see that error because, in node2's peer status, node3 is in the rejected state. We expect the detach to fail with "Brick(s) with the peer node1 exist in cluster" instead.

-> Now, why this test case fails with https://review.gluster.org/#/c/glusterfs/+/21336/

https://review.gluster.org/#/c/glusterfs/+/21336/ addresses https://bugzilla.redhat.com/show_bug.cgi?id=1635820, so all the nodes are in the connected state after executing the commands from line #59 to #63. We expect the peer detach at line #85 to fail with "Brick(s) with the peer node1 exist in cluster", but the peer detach succeeds, so the test case fails.

-> Here is why the peer detach succeeds:

Patch https://review.gluster.org/#/c/glusterfs/+/19135/ optimised the glusterd test cases by clubbing similar test cases into a single one. The test case https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t was deleted and added as a part of tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t.

In the original test case, we create a volume with two bricks, each on a separate node (node1 & node2). From another node in the cluster (node3), we try to detach a node which is hosting bricks; it fails. In the new test, we created the volume with a single brick on node1, and from another node in the cluster we tried to detach node1. We expect the peer detach to fail, but it succeeds because that node hosts all the bricks of the volume.

To fix this issue, we have to change the test case to reflect the original test case scenario.
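To illustrate, here is a minimal sketch of the original bug-1293414 scenario restored inside the clustered test, using the same $CLI_*/$H*/$B* helpers the test already relies on. This is only an illustration of the described fix; the exact lines merged via https://review.gluster.org/21368 may differ.

# Volume whose bricks live on two different peers, so neither H1 nor H2
# hosts all of the bricks of the volume.
TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
TEST $CLI_1 volume start $V0

# From a third node in the cluster, detaching a brick-hosting peer must fail.
TEST ! $CLI_3 peer detach $H1
TEST ! $CLI_3 peer detach $H2

Because each of $H1 and $H2 hosts only part of the volume's bricks, the detach validation rejects the request with "Brick(s) with the peer ... exist in cluster", which is the behaviour the original bug-1293414 test verified.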
REVIEW: https://review.gluster.org/21368 (tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t) posted (#2) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/21368 committed in master by "Atin Mukherjee" <amukherj> with a commit message- tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t

Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has optimised glusterd test cases by clubbing the similar test cases into a single test case. https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t test case has been deleted and added as a part of tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t

In the original test case, we create a volume with two bricks, each on a separate node(N1 & N2). From another node in cluster(N3), we try to detach a node which is hosting bricks. It fails.

In the new test, we created volume with single brick on N1. and from another node in cluster, we tried to detach N1. we expect peer detach to fail, but peer detach was success as the node is hosting all the bricks of volume.

Now, changing the new test case to cover the original test case scenario.

Please refer https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to understand why the new test case is not failing in centos-regression.

fixes: bz#1642597
Change-Id: Ifda12b5677143095f263fbb97a6808573f513234
Signed-off-by: Sanju Rakonde <srakonde>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/