Bug 1275884

Summary: replace-brick commit force refuses to work when it cannot resolve source brick
Product: [Community] GlusterFS Reporter: Earl Ruby <earl>
Component: glusterd Assignee: bugs <bugs>
Status: CLOSED EOL QA Contact:
Severity: urgent Docs Contact:
Priority: medium    
Version: 3.7.5 CC: amukherj, bugs, earl, hans, nbalacha, rgowdapp, smohan, spalai
Target Milestone: --- Keywords: Triaged
Target Release: --- Flags: earl: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 966207 Environment:
Last Closed: 2017-03-08 11:00:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Earl Ruby 2015-10-28 04:33:40 UTC
+++ This bug was initially created as a clone of Bug #966207 +++

Description of problem:

After a peer probe from the cluster to a new server, a replace-brick run on either the new server or any existing server in the cluster fails if the new server cannot resolve the old server's hostname via DNS. replace-brick commit force bails out with:
brick: oldserver:/brick/a does not exist in volume: vol01

Version-Release number of selected component (if applicable):

3.3git-v3.3.2qa2-1-g45a9d1e

Actual results:

brick: oldserver:/brick/a does not exist in volume: vol01

Expected results:

replace-brick commit force should work. The old server is irrelevant (and down, in my case).
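
The failure condition described above (a node that cannot resolve the old server's hostname) can be checked directly on each peer before running replace-brick. The following is a minimal sketch, not part of the gluster CLI: unresolved_hosts is a hypothetical helper name, it assumes getent is available on the node, and a getent lookup only approximates whatever resolution glusterd performs internally.

```shell
# Print every hostname given as an argument that this node cannot resolve.
# "getent hosts" looks the name up through the system's configured name
# services (typically /etc/hosts and DNS, as set in nsswitch.conf).
# NOTE: unresolved_hosts is a hypothetical helper, not a gluster command.
unresolved_hosts() {
    for host in "$@"; do
        if ! getent hosts "$host" >/dev/null 2>&1; then
            printf '%s\n' "$host"
        fi
    done
}

# Example, run on the node where replace-brick fails:
#   unresolved_hosts oldserver
```

If the old server's name is printed, that node matches the condition in this report.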

--- Additional comment from Earl Ruby on 2015-10-21 19:12:21 EDT ---

Still happening on GlusterFS 3.7. I created a 3 node cluster (gfs1, gfs2, gfs3), 3 bricks each, then killed gfs2. I added a 4th host (gfs4) and used "peer probe" to add it to the cluster. I can't detach gfs2 because it has bricks in the cluster, and I can't replace gfs2 bricks with gfs4 bricks because it claims that the gfs2 bricks don't exist. Check it out:

root@gfs1:~# gluster volume info
Volume Name: volume1
Type: Distributed-Replicate
Volume ID: 168a6859-91a5-451e-9c32-ab531b79df16
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: gfs1:/data/brick1/files
Brick2: gfs2:/data/brick1/files
Brick3: gfs3:/data/brick1/files
Brick4: gfs1:/data/brick2/files
Brick5: gfs2:/data/brick2/files
Brick6: gfs3:/data/brick2/files
Brick7: gfs1:/data/brick3/files
Brick8: gfs2:/data/brick3/files
Brick9: gfs3:/data/brick3/files
Options Reconfigured:
nfs.disable: off
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on

root@gfs1:~# gluster volume replace-brick volume1 gfs2:/data/brick1/files gfs4:/data/brick1/files commit force
volume replace-brick: failed: brick: gfs2:/data/brick1/files does not exist in volume: volume1

I worked around the problem by adding gfs2 to /etc/hosts on gfs1, gfs3, and gfs4. Once I did that I could replace the bricks.
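
The workaround above can be sketched as a pair of small shell helpers. This is an illustration under stated assumptions, not a tested procedure: brick_hosts and add_hosts_entry are hypothetical helper names, and 192.0.2.12 is a placeholder from the documentation address range, because the report does not give gfs2's real address.

```shell
# Print the unique hostnames of all bricks found in "gluster volume info"
# output read from stdin (lines of the form "BrickN: host:/path").
# NOTE: brick_hosts and add_hosts_entry are hypothetical helpers.
brick_hosts() {
    sed -n 's/^Brick[0-9][0-9]*: \([^:]*\):.*/\1/p' | sort -u
}

# Append "IP NAME" to a hosts file unless NAME already appears as a
# hostname on a non-comment line.
# Usage: add_hosts_entry IP NAME [FILE]   (FILE defaults to /etc/hosts)
add_hosts_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example (192.0.2.12 is a placeholder, NOT gfs2's real address):
#   gluster volume info volume1 | brick_hosts     # lists gfs1, gfs2, gfs3
#   add_hosts_entry 192.0.2.12 gfs2               # on gfs1, gfs3 and gfs4
#   gluster volume replace-brick volume1 gfs2:/data/brick1/files gfs4:/data/brick1/files commit force
```

The hosts entry only needs to make the name resolvable; in this report gfs2 itself was down when the bricks were replaced.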

--- Additional comment from Kaleb KEITHLEY on 2015-10-22 11:40:20 EDT ---

The pre-release version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

--- Additional comment from Earl Ruby on 2015-10-27 19:21:19 EDT ---

Kaleb: 

You might have the ability to re-open this bug or set the version number; I do not.

I did state that the problem is still present with GlusterFS 3.7 in my comment above.

Comment 1 Anuradha 2016-06-15 08:50:03 UTC
Atin,

I'm not sure why this is a replicate bug. How would glusterd take care of not being able to resolve hosts in other scenarios?

Thanks,
Anuradha.

Comment 2 Atin Mukherjee 2016-06-15 09:36:39 UTC
(In reply to Anuradha from comment #1)
> Atin,
> 
> I'm not sure why this is a replicate bug. How would glusterd take care of
> not being able to resolve hosts in other scenarios?
> 
> Thanks,
> Anuradha.

Since you recently worked on the replace-brick workflow I thought of moving it to you, but I do agree that I shouldn't have changed the component :-\

After going through the details of the bug, I think this has nothing to do with replace-brick itself, since glusterd is not able to resolve the hostname.

Comment 3 Atin Mukherjee 2016-06-15 09:38:54 UTC
Could you please attach the logs for this scenario?

Comment 4 Kaushal 2017-03-08 11:00:33 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.