Bug 1847081 - add-brick functionality is completely broken
Summary: add-brick functionality is completely broken
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 3
Assignee: Srijan Sivakumar
QA Contact: milind
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-15 15:45 UTC by Kshithij Iyer
Modified: 2020-12-17 04:52 UTC
CC: 8 users

Fixed In Version: glusterfs-6.0-40
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 04:51:50 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:5603 (last updated 2020-12-17 04:52:24 UTC)

Description Kshithij Iyer 2020-06-15 15:45:44 UTC
Description of problem:
On a 6-node Gluster cluster, create a 1x3 replicate volume and start it. Then add 3 bricks to the volume; the add-brick operation fails.

###############################################################################
CLI output
###############################################################################
[root@rhsqaci-vm49 glusterd]# gluster v create rep2 replica 3 rhsqaci-vm49.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm33.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm54.lab.eng.blr.redhat.com:/bricks/brick2/rep2
volume create: rep2: success: please start the volume to access data
[root@rhsqaci-vm49 glusterd]# gluster v start rep2
volume start: rep2: success
[root@rhsqaci-vm49 glusterd]# gluster v info rep2
 
Volume Name: rep2
Type: Replicate
Volume ID: c2c36abb-f158-41f5-8292-cccffed0872e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsqaci-vm49.lab.eng.blr.redhat.com:/bricks/brick2/rep2
Brick2: rhsqaci-vm33.lab.eng.blr.redhat.com:/bricks/brick2/rep2
Brick3: rhsqaci-vm54.lab.eng.blr.redhat.com:/bricks/brick2/rep2
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
[root@rhsqaci-vm49 glusterd]# gluster v add-brick rep2 rhsqaci-vm34.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm04.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm06.lab.eng.blr.redhat.com:/bricks/brick2/rep2
volume add-brick: failed: 
[root@rhsqaci-vm49 glusterd]# gluster v info rep2
 
Volume Name: rep2
Type: Replicate
Volume ID: c2c36abb-f158-41f5-8292-cccffed0872e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsqaci-vm49.lab.eng.blr.redhat.com:/bricks/brick2/rep2
Brick2: rhsqaci-vm33.lab.eng.blr.redhat.com:/bricks/brick2/rep2
Brick3: rhsqaci-vm54.lab.eng.blr.redhat.com:/bricks/brick2/rep2
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
###############################################################################

###############################################################################
cmd_history.log
###############################################################################
[2020-06-15 12:47:03.383007]  : v create rep2 replica 3 rhsqaci-vm49.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm33.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm54.lab.eng.blr.redhat.com:/bricks/brick2/rep2 : SUCCESS
[2020-06-15 12:47:17.551139]  : v start rep2 : SUCCESS
[2020-06-15 12:49:11.397924]  : v add-brick rep2 rhsqaci-vm34.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm04.lab.eng.blr.redhat.com:/bricks/brick2/rep2 rhsqaci-vm06.lab.eng.blr.redhat.com:/bricks/brick2/rep2 : FAILED :
###############################################################################


###############################################################################
glusterd.log
###############################################################################
[2020-06-15 12:47:17.557139] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xee8ea) [0x7f3b41dc88ea] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xee3b5) [0x7f3b41dc83b5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f3b4dc4fb25] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=rep2 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2020-06-15 12:47:17.580351] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xee8ea) [0x7f3b41dc88ea] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xee3b5) [0x7f3b41dc83b5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f3b4dc4fb25] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=rep2 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2020-06-15 12:47:17.594051] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xee8ea) [0x7f3b41dc88ea] -->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xee3b5) [0x7f3b41dc83b5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f3b4dc4fb25] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S31ganesha-start.sh --volname=rep2 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2020-06-15 12:47:57.684981] I [MSGID: 106488] [glusterd-handler.c:1568:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2020-06-15 12:49:11.389515] I [MSGID: 106482] [glusterd-brick-ops.c:322:__glusterd_handle_add_brick] 0-management: Received add brick req
[2020-06-15 12:49:11.395460] E [MSGID: 106061] [glusterd-utils.c:14879:glusterd_check_brick_order] 0-management: Bricks check : Could not retrieve replica count
[2020-06-15 12:49:11.395516] E [MSGID: 106271] [glusterd-brick-ops.c:1589:glusterd_op_stage_add_brick] 0-management: Not adding brick because of bad brick order.
[2020-06-15 12:49:11.395551] W [MSGID: 106121] [glusterd-mgmt.c:169:gd_mgmt_v3_pre_validate_fn] 0-management: ADD-brick prevalidation failed.
[2020-06-15 12:49:11.395577] E [MSGID: 106121] [glusterd-mgmt.c:1083:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Add brick on local node
[2020-06-15 12:49:11.395601] E [MSGID: 106121] [glusterd-mgmt.c:2472:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
The message "I [MSGID: 106488] [glusterd-handler.c:1568:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2020-06-15 12:47:57.684981] and [2020-06-15 12:47:57.688013]
[2020-06-15 14:08:33.659544] I [MSGID: 106488] [glusterd-handler.c:1568:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
###############################################################################

Version-Release number of selected component (if applicable):
glusterfs-6.0-38

How reproducible:
Always

Steps to Reproduce:
1. Create a 6-node gluster cluster.
2. Create and start a volume of any type, for example a 1x3 replicate volume.
3. Add 3 bricks to this volume.

Actual results:
The add-brick operation fails with an empty error message.

Expected results:
The add-brick operation should succeed and the bricks should be added to the volume.


Additional info:
- From the logs, this issue appears to have been introduced by the fix for Bug 1524457.
- This issue is observed with all volume types.
- This bug causes all add-brick test cases to fail with the log below:
###############################################################################
2020-06-15 04:54:12,992 INFO (expand_volume) Adding bricks to the volume: testvol_distributed-dispersed
2020-06-15 04:54:12,992 INFO (run) root.eng.blr.redhat.com (cp): gluster volume add-brick testvol_distributed-dispersed   rhsqaci-vm49.lab.eng.blr.redhat.com:/bricks/brick2/testvol_distributed-dispersed_brick12 rhsqaci-vm22.lab.eng.blr.redhat.com:/bricks/brick2/testvol_distributed-dispersed_brick13 rhsqaci-vm56.lab.eng.blr.redhat.com:/bricks/brick2/testvol_distributed-dispersed_brick14 rhsqaci-vm23.lab.eng.blr.redhat.com:/bricks/brick2/testvol_distributed-dispersed_brick15 rhsqaci-vm24.lab.eng.blr.redhat.com:/bricks/brick2/testvol_distributed-dispersed_brick16 rhsqaci-vm25.lab.eng.blr.redhat.com:/bricks/brick2/testvol_distributed-dispersed_brick17
2020-06-15 04:54:12,993 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.eng.blr.redhat.com
2020-06-15 04:54:13,149 INFO (_log_results) ^[[34;1mRETCODE (root.eng.blr.redhat.com): 2^[[0m
2020-06-15 04:54:13,150 INFO (_log_results) ^[[31;1mSTDERR (root.eng.blr.redhat.com)...
volume add-brick: failed:
^[[0m
2020-06-15 04:54:13,150 ERROR (expand_volume) Failed to add bricks to the volume: volume add-brick: failed:

2020-06-15 04:54:13,151 INFO (tearDown) Wait for IO to complete as IO validation did not succeed in test method
###############################################################################

Comment 11 milind 2020-09-28 06:19:00 UTC
========================================================
[node1.example.com]# gluster volume create myvolume replica 3 node1:/bricks/brick0/myvolume   node2:/bricks/brick0/myvolume node3:/bricks/brick0/myvolume


[node1.example.com]# gluster v info
Volume Name: myvolume
Type: Replicate
Volume ID: 2b0862a7-705c-466b-b7d6-420520e7e1e5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.159.52:/bricks/brick0/myvolume
Brick2: 10.16.159.143:/bricks/brick0/myvolume
Brick3: 10.16.159.78:/bricks/brick0/myvolume
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

[node.example.com]# gluster v add-brick myvolume 10.16.159.52:/bricks/brick2/myvolume 10.16.159.143:/bricks/brick2/myvolume 10.16.159.78:/bricks/brick2/myvolume
volume add-brick: success

--------------------------------------------

No add-brick test cases are failing on CI; hence, marking the bug as verified.
===================================================================

Comment 13 errata-xmlrpc 2020-12-17 04:51:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603

