Bug 1643075 - tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t failing
Summary: tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t failing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On: 1642597 1643078
Blocks:
 
Reported: 2018-10-25 13:18 UTC by Sanju
Modified: 2020-01-09 17:32 UTC
CC List: 1 user

Fixed In Version: glusterfs-4.1.6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1642597
Environment:
Last Closed: 2018-11-29 15:26:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21490 0 None Merged tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t 2018-10-30 19:19:55 UTC

Description Sanju 2018-10-25 13:18:35 UTC
+++ This bug was initially created as a clone of Bug #1642597 +++

Description of problem:
tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t fails when we run it with https://review.gluster.org/#/c/glusterfs/+/21336/ applied on the master branch.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Additional info:
Logs can be found at https://build.gluster.org/job/centos7-regression/3150/consoleFull
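
For reference, a minimal reproduction sketch (hedged: the Gerrit patchset number below is a placeholder, the refs/changes fetch path is assumed from Gerrit's usual layout, and prove is just one common way to drive a single regression test):

# apply the change under test on top of master (patchset "1" is a placeholder)
git fetch https://review.gluster.org/glusterfs refs/changes/36/21336/1
git cherry-pick FETCH_HEAD

# run only this test case
prove -v tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t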

--- Additional comment from Sanju on 2018-10-25 00:51:49 IST ---

Reason why this test case is not failing on master:

-> test case
  1 #!/bin/bash
  2 
  3 . $(dirname $0)/../../include.rc
  4 . $(dirname $0)/../../cluster.rc
  5 . $(dirname $0)/../../volume.rc
  6 
  7 function peer_count {
  8 eval \$CLI_$1 peer status | grep 'Peer in Cluster (Connected)' | wc -l
  9 }
 10 
 11 cleanup;
 12 
 13 #bug-1454418 -  Setting Port number in specific range
 14 sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
 15 
 16 TEST launch_cluster 3;
 17 
 18 #bug-1223213
 19 
 20 # Fool the cluster to operate with 3.5 version even though binary's op-version
 21 # is > 3.5. This is to ensure 3.5 code path is hit to test that volume status
 22 # works when a node is upgraded from 3.5 to 3.7 or higher as mgmt_v3 lock is
 23 # been introduced in 3.6 version and onwards
 24 
 25 GD1_WD=$($CLI_1 system getwd)
 26 $CLI_1 system uuid get
 27 Old_op_version=$(cat ${GD1_WD}/glusterd.info | grep operating-version | cut -d '=' -f 2)
 28 
 29 TEST sed -rnie "'s/(operating-version=)\w+/\130500/gip'" ${GD1_WD}/glusterd.info
 30 
 31 TEST kill_glusterd 1
 32 TEST start_glusterd 1
 33 
 34 TEST $CLI_1 peer probe $H2;
 35 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
 36 
 37 TEST `sed -i "s/"30500"/${Old_op_version}/g" ${GD1_WD}/glusterd.info`
 38 
 39 TEST kill_glusterd 1
 40 TEST start_glusterd 1
 41 
 42 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
 43 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 2
 44 
 45 #bug-1454418
 46 sysctl net.ipv4.ip_local_reserved_ports="
 47 "
 48 
 49 TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
 50 TEST $CLI_1 volume start $V0
 51 
 52 #bug-888752 - volume status --xml from peer in the cluster
 53 
 54 TEST $CLI_1 volume status $V0 $H2:$B2/$V0 --xml
 55 
 56 TEST $CLI_1 volume stop $V0
 57 TEST $CLI_1 volume delete $V0
 58 
 59 TEST $CLI_1 volume create $V0 $H1:$B1/$V0
 60 TEST $CLI_1 volume create $V1 $H1:$B1/$V1
 61 
 62 TEST $CLI_1 peer probe $H3;
 63 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
 64 
 65 TEST $CLI_1 volume start $V0
 66 TEST $CLI_1 volume start $V1
 67 
 68 #bug-1173414 - validate mgmt-v3-remote-lock-failure
 69 
 70 for i in {1..20}
 71 do
 72 $CLI_1 volume set $V0 diagnostics.client-log-level DEBUG &
 73 $CLI_1 volume set $V1 barrier on
 74 $CLI_2 volume set $V0 diagnostics.client-log-level DEBUG &
 75 $CLI_2 volume set $V1 barrier on
 76 done
 77 
 78 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
 79 TEST $CLI_1 volume status
 80 TEST $CLI_2 volume status
 81 
 82 #bug-1293414 - validate peer detach
 83 
 84 # peers hosting bricks cannot be detached
 85 TEST ! $CLI_2 peer detach $H1
 86 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
 87 
 88 # peer not hosting bricks should be detachable
 89 TEST $CLI_2 peer detach $H3
 90 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
 91 
 92 #bug-1344407 - deleting a volume when peer is down should fail
 93 
 94 TEST kill_glusterd 2
 95 TEST ! $CLI_1 volume delete $V0
 96 
 97 cleanup

At line number 59, we have 2 nodes in the cluster, and we execute the commands below.
 59 TEST $CLI_1 volume create $V0 $H1:$B1/$V0
 60 TEST $CLI_1 volume create $V1 $H1:$B1/$V1
 61 
 62 TEST $CLI_1 peer probe $H3;
 63 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1

I executed the same steps on my setup:
[root@server1 glusterfs]# gluster pe stat
Number of Peers: 0
[root@server1 glusterfs]# gluster pe probe server2
peer probe: success. 
[root@server1 glusterfs]# gluster v create test-vol1 server1:/tmp/b11
volume create: test-vol1: success: please start the volume to access data
[root@server1 glusterfs]# gluster v create test-vol2 server1:/tmp/b12
volume create: test-vol2: success: please start the volume to access data
[root@server1 glusterfs]# gluster pe probe server3
peer probe: success. 
[root@server1 glusterfs] #

Now, checking the output of "gluster v info" and "gluster pe stat" on all 3 nodes in the cluster.
From node1:
[root@server1 glusterfs]# gluster v info
 
Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
 
Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
[root@server1 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server2
Uuid: 917311b8-0b5c-4f22-aa1d-b1216ab192e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 6c68eb99-c9f4-4590-a022-7ef2081705b3
State: Peer in Cluster (Connected)
[root@server1 glusterfs]#

From node2:
[root@server2 glusterfs]# gluster v info
 
Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Brick1 VG: 
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
 
Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Brick1 VG: 
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
[root@server2 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server1
Uuid: 8a75c6c4-865d-4805-bbdf-403234e9b5e3
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 6c68eb99-c9f4-4590-a022-7ef2081705b3
State: Peer Rejected (Connected)
[root@server2 glusterfs]# 

From node3:
[root@server3 glusterfs]# gluster v info
 
Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on
 
Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on
[root@server3 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server1
Uuid: 8a75c6c4-865d-4805-bbdf-403234e9b5e3
State: Peer in Cluster (Connected)

Hostname: server2
Uuid: 917311b8-0b5c-4f22-aa1d-b1216ab192e5
State: Peer Rejected (Connected)
[root@server3 glusterfs]# 

We see BD xlator related entries in the output of "gluster v info" on node2 because of https://bugzilla.redhat.com/show_bug.cgi?id=1635820.

Since we issued the peer probe to node3 from node1, the data from node1 is synced to node3, so both are in the Connected state.

Because of https://bugzilla.redhat.com/show_bug.cgi?id=1635820, node2 has caps=15 in the info files of the volumes. When node3 performs a handshake with node2, since node2 has a caps field in its info file, node2 goes into the Rejected state. So, when we issue peer status from node2/node3, we see node3/node2 in the Rejected state.
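
As an illustration (assuming the default glusterd working directory /var/lib/glusterd and the volume names from the reproduction above), the offending field can be checked directly in the volume's info file on each node:

# on node2 the volume info file carries the extra capability field
grep -H 'caps' /var/lib/glusterd/vols/test-vol1/info
# expected on node2 (per this report): caps=15
# expected on node1/node3: no caps line, so the info files disagree
# and the volume handshake puts the peers into the Rejected state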

Now, at line #85, we are issuing:
TEST ! $CLI_2 peer detach $H1

The above TEST succeeds because, when we issue "gluster pe detach node1" from node2, it fails with "peer detach: failed: One of the peers is probably down. Check with 'peer status'", and we have a not (!) before the command. We see this error because, in node2's peer status, node3 is in the Rejected state. We expect the command to fail with "Brick(s) with the peer node1 exist in cluster" instead.
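
For comparison, this is the shape of the failure that the ! above negates (hostnames follow the reproduction transcript; the error text is the one quoted above):

[root@server2 glusterfs]# gluster peer detach server1
peer detach: failed: One of the peers is probably down. Check with 'peer status'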

-> Now, why this test case fails with https://review.gluster.org/#/c/glusterfs/+/21336/:
https://review.gluster.org/#/c/glusterfs/+/21336/ addresses https://bugzilla.redhat.com/show_bug.cgi?id=1635820, so all the nodes are in the Connected state after executing the commands from line #59 to #63. We expect the peer detach at line #85 to fail with "Brick(s) with the peer node1 exist in cluster", but the peer detach succeeds, so the test case fails.

-> Here's why the peer detach succeeds:
Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has
optimised the glusterd test cases by clubbing similar test
cases into a single test case.

The test case
https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t
has been deleted and its checks added as part of
tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t.

In the original test case, we create a volume with two bricks,
each on a separate node (node1 & node2). From another node in
the cluster (node3), we try to detach a node which is hosting
bricks. It fails, as expected.

In the new test, we create the volume with a single brick on
node1, and from another node in the cluster we try to detach
node1. We expect the peer detach to fail, but it succeeds,
because node1 hosts all the bricks of the volume (glusterd
blocks a detach only when the peer hosts some, but not all,
bricks of a volume).

To fix this issue, we have to change the test case to reflect the original test case scenario.
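
A minimal sketch of the kind of correction described here, reusing the helpers and variables from the test above (this is a sketch under those assumptions, not the exact merged patch):

# create the volume with bricks on two different peers, as in the
# original bug-1293414 test, so that a brick-hosting peer cannot be
# detached from any other node in the cluster
TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
TEST $CLI_1 volume start $V0

# detaching a node that hosts only some of the volume's bricks must fail
TEST ! $CLI_2 peer detach $H1
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1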

--- Additional comment from Worker Ant on 2018-10-25 01:00:42 IST ---

REVIEW: https://review.gluster.org/21368 (tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t) posted (#2) for review on master by Sanju Rakonde

--- Additional comment from Worker Ant on 2018-10-25 12:29:20 IST ---

COMMIT: https://review.gluster.org/21368 committed in master by "Atin Mukherjee" <amukherj> with a commit message- tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t

Patch https://review.gluster.org/#/c/glusterfs/+/19135/ has
optimised the glusterd test cases by clubbing similar test
cases into a single test case.

The test case
https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t
has been deleted and its checks added as part of
tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t.

In the original test case, we create a volume with two bricks,
each on a separate node (N1 & N2). From another node in the
cluster (N3), we try to detach a node which is hosting bricks.
It fails.

In the new test, we created the volume with a single brick on
N1, and from another node in the cluster we tried to detach
N1. We expected the peer detach to fail, but it succeeded, as
the node hosts all the bricks of the volume.

Now, changing the new test case to cover the original test case scenario.

Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1642597#c1 to
understand why the new test case is not failing in centos-regression.

fixes: bz#1642597

Change-Id: Ifda12b5677143095f263fbb97a6808573f513234
Signed-off-by: Sanju Rakonde <srakonde>

Comment 1 Worker Ant 2018-10-25 13:20:34 UTC
REVIEW: https://review.gluster.org/21490 (tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t) posted (#1) for review on release-4.1 by Sanju Rakonde

Comment 2 Worker Ant 2018-10-30 19:19:54 UTC
REVIEW: https://review.gluster.org/21490 (tests: correction in tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t) posted (#1) for review on release-4.1 by Sanju Rakonde

Comment 3 Shyamsundar 2018-11-29 15:26:07 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.1.6, please open a new bug report.

glusterfs-4.1.6 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/

