Bug 1572599

Summary: converting distribute to distribute-replicate is failing
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vijay Avuthu <vavuthu>
Component: glusterd
Assignee: Nikhil Ladha <nladha>
Status: CLOSED WONTFIX
QA Contact: Bala Konda Reddy M <bmekala>
Severity: medium
Docs Contact:
Priority: low
Version: rhgs-3.4
CC: amukherj, ksubrahm, rhs-bugs, sheggodu, srakonde, storage-qa-internal, vavuthu, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-10 11:59:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vijay Avuthu 2018-04-27 11:53:27 UTC
Description of problem:

Converting a distribute volume (2 bricks) to distribute-replicate (2 * 3) fails with "Commit failed on <node>".

Version-Release number of selected component (if applicable):

glusterfs-3.12.2-8.el7rhgs.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Create a distribute volume (2 bricks) and start it.
2. Add bricks so that it converts to a Distribute-Replicate (2 * 3) volume.
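
The two steps above can be sketched as a short script. The `gluster volume create` and `gluster volume start` invocations are assumptions, since the report does not show how the volume was created; the brick addresses and the add-brick command are taken from the output in this report. The `run` helper and its `DRY_RUN` switch are hypothetical and only print the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Prints commands by default;
# set DRY_RUN=0 to actually execute them on a test cluster.

# Hypothetical helper: echo the command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

reproduce() {
    # Step 1: create a 2-brick distribute volume and start it (assumed commands).
    run gluster volume create dist \
        10.70.35.61:/bricks/brick1/b0 10.70.35.174:/bricks/brick1/b1
    run gluster volume start dist

    # Step 2: add four bricks with "replica 3" to get a 2 * 3 layout
    # (command as given in the report).
    run gluster volume add-brick dist replica 3 \
        10.70.35.17:/bricks/brick1/b2 10.70.35.163:/bricks/brick1/b3 \
        10.70.35.136:/bricks/brick1/b4 10.70.35.214:/bricks/brick1/b5
}
```

Calling `reproduce` with the default settings prints the three commands without touching the cluster.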


> 
# gluster vol info
 
Volume Name: dist
Type: Distribute
Volume ID: f648ecbc-5763-4ee0-9496-b1c5379cc480
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.61:/bricks/brick1/b0
Brick2: 10.70.35.174:/bricks/brick1/b1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.localtime-logging: disable
# 

> 
#gluster vol add-brick dist replica 3 10.70.35.17:/bricks/brick1/b2 10.70.35.163:/bricks/brick1/b3 10.70.35.136:/bricks/brick1/b4 10.70.35.214:/bricks/brick1/b5

volume add-brick: failed: Commit failed on dhcp35-17.lab.eng.blr.redhat.com. Please check log file for details.
Commit failed on dhcp35-214.lab.eng.blr.redhat.com. Please check log file for details.
Commit failed on dhcp35-163.lab.eng.blr.redhat.com. Please check log file for details.
Commit failed on dhcp35-136.lab.eng.blr.redhat.com. Please check log file for details.
#

# gluster vol status 
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.61:/bricks/brick1/b0         49152     0          Y       6697 
Brick 10.70.35.17:/bricks/brick1/b2         N/A       N/A        N       N/A  
Brick 10.70.35.163:/bricks/brick1/b3        N/A       N/A        N       N/A  
Brick 10.70.35.174:/bricks/brick1/b1        49152     0          Y       5628 
Brick 10.70.35.136:/bricks/brick1/b4        N/A       N/A        N       N/A  
Brick 10.70.35.214:/bricks/brick1/b5        N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       6842 
Self-heal Daemon on dhcp35-214.lab.eng.blr.
redhat.com                                  N/A       N/A        N       N/A  
Self-heal Daemon on dhcp35-17.lab.eng.blr.r
edhat.com                                   N/A       N/A        N       N/A  
Self-heal Daemon on dhcp35-163.lab.eng.blr.
redhat.com                                  N/A       N/A        N       N/A  
Self-heal Daemon on dhcp35-136.lab.eng.blr.
redhat.com                                  N/A       N/A        N       N/A  
Self-heal Daemon on dhcp35-174.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5744 
 
Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks
 
# 
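
To pick out the bricks that did not come online from status output like the above, a small filter can help. This is a minimal sketch that assumes the column layout shown in this report (Online is the second-to-last column) and that each brick fits on a single line; brick entries wrapped across two lines because of long hostnames are not handled. The function name `offline_bricks` is hypothetical.

```shell
#!/bin/sh
# Print the bricks whose Online column is "N".
# Reads `gluster vol status` text on stdin.
offline_bricks() {
    # Brick rows look like: Brick <host:path> <tcp> <rdma> <online> <pid>
    awk '$1 == "Brick" && $(NF - 1) == "N" { print $2 }'
}
```

Usage would be along the lines of `gluster vol status dist | offline_bricks`.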

> glusterd logs from failed node:

[2018-04-27 11:19:31.807111] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x39f05) [0x7fd5810c2f05] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe29dd) [0x7fd58116b9dd] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd58c672805] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dist --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2018-04-27 11:19:31.807293] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
[2018-04-27 11:19:31.807340] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 2, need to change it
[2018-04-27 11:19:32.960558] E [MSGID: 106054] [glusterd-utils.c:13609:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Read-only file system [Read-only file system]
[2018-04-27 11:19:32.992604] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2018-04-27 11:19:32.992704] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2018-04-27 11:19:32.992755] E [MSGID: 106123] [glusterd-mgmt-handler.c:603:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
[2018-04-27 11:21:29.338876] I [MSGID: 106488] [glusterd-handler.c:1549:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
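
To isolate the failing entries from a longer log, the error-level lines can be filtered out. This is a minimal sketch that assumes the log format shown above, where the third whitespace-separated field is the severity letter. The function name `glusterd_errors` is hypothetical, and the log path in the usage note is an assumption about the node's configuration.

```shell
#!/bin/sh
# Print only error-level ("E") entries from glusterd log text.
# Reads the files given as arguments, or stdin when none are given.
glusterd_errors() {
    # Fields: "[date" "time]" severity "[MSGID:" ...
    awk '$3 == "E"' "$@"
}
```

On a failed node this could be run as `glusterd_errors /var/log/glusterfs/glusterd.log`, assuming that is where glusterd writes its log.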

Additional info:


sos reports:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/volume_convertion/

Comment 15 Atin Mukherjee 2018-11-27 03:04:46 UTC
Vijay - the needinfo has been pending since May. Can you please clear it at the earliest?

Comment 17 Atin Mukherjee 2018-12-21 08:09:17 UTC
Karthik - I'd like to see this bug addressed in 3.4 BU5. Please plan accordingly.