Bug 1305268

Summary: Unable to add reconstructed hostname to cluster
Product: Red Hat Gluster Storage
Component: doc-Administration_Guide
Version: rhgs-3.0
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Hardware: x86_64
OS: Linux
Reporter: Peter Portante <pportant>
Assignee: storage-doc
QA Contact: SATHEESARAN <sasundar>
CC: amukherj, perfbz, pportant, sasundar, smohan, surs, vbellur
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-04-16 18:18:58 UTC

Description Peter Portante 2016-02-06 14:35:42 UTC
Following the steps as documented here: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Replacing_Hosts.html#Replacing_a_Host_Machine_with_the_Same_Hostname

Step 12 is failing in my environment:

# mount -t glusterfs gprfs012:/pbench /mnt/p2
Mount failed. Please check the log file for more details.
# cat /var/log/glusterfs/mnt-p2.log
[2016-02-06 14:31:55.202117] I [MSGID: 100030] [glusterfsd.c:2019:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.0.53 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/pbench /mnt/p2)
[2016-02-06 14:31:55.210756] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-02-06 14:31:55.210844] E [socket.c:2213:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2016-02-06 14:31:55.210869] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2016-02-06 14:31:55.210880] I [glusterfsd-mgmt.c:1817:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-02-06 14:31:55.211031] W [glusterfsd.c:1183:cleanup_and_exit] (--> 0-: received signum (1), shutting down
[2016-02-06 14:31:55.211060] I [fuse-bridge.c:5584:fini] 0-fuse: Unmounting '/mnt/p2'.


The instructions seem wrong: in restoring the host as described, there does not appear to be a step that restores the vol files for the volume.

Comment 2 SATHEESARAN 2016-02-08 11:37:36 UTC
(In reply to Peter Portante from comment #0)
> Step 12 is failing in my environment:
> 
> # mount -t glusterfs gprfs012:/pbench /mnt/p2
> Mount failed. Please check the log file for more details.
> [...]
> The instructions seem wrong.  In restoring the host as described, there does
> not appear to be a step where we get the vol files for the volume restored.

Hi Peter,

How many nodes are there in the gluster cluster?

As per step 12, you are trying to FUSE-mount the volume as:
# mount -t glusterfs gprfs012:/pbench /mnt/p2

Is the machine gprfs012 up? Is glusterd running on that machine?

I have tested the steps and they seem to work for me.

Comment 3 Peter Portante 2016-02-08 11:59:50 UTC
> How many nodes are there in the gluster cluster?

6

> Is the machine gprfs012 up?

Yes

> Is glusterd running on that machine?

No

Which is why I am concerned about these steps.

Here is my cluster state:

gprfs001 - Connected
gprfs002 - Connected
gprfs009 - Connected
gprfs010 - Connected
gprfs011 - Connected
gprfs012 - Rejected

For the documentation: where it uses "sys0", I am using "gprfs012"; where it uses "sys1", I used "gprfs011".

However, for this example the documentation should explicitly use at least a three-node cluster to differentiate it from the following sub-section, "Replacing a host with the same Hostname in a two-node Red Hat Storage Trusted Storage Pool", which describes the two-node setup.

To that end, I would suggest introducing a "sys2" node explicitly in these instructions. For example, in Step 10, where it says "Select any other host in the cluster other than the node (sys1.example.com) selected in step 4", it should simply say "sys2". Then, if you lay out the three-node scenario with the node names at the start of these steps, and tell the reader they can pick any three nodes of their N >= 3 cluster to match, it would really help.

If I perform this mount command on another node in the cluster, e.g. gprfs011, then the mount works.

So if that is what these instructions expect, then for step 12, where it says "server-name", we should replace that with "sys1 or sys2".

Comment 4 Atin Mukherjee 2016-02-08 13:09:40 UTC
Sas,

Could you check this once?

~Atin

Comment 5 Peter Portante 2016-02-09 00:56:25 UTC
I reviewed this with Vijay Bellur and he told me that step twelve could be performed on any other member of the cluster.  Once that was done, we were able to complete the steps, and self-heal is on its way to a full recovery.

He said he would update this bugzilla with what needs to be changed in the documentation.

Comment 6 SATHEESARAN 2016-02-09 02:17:40 UTC
(In reply to Peter Portante from comment #5)
> I reviewed this with Vjay Bellur and he told me that step twelve could be
> performed on any other member of the cluster. [...]

Thanks Peter.

Hence this bug should be a doc correction / enhancement.

Comment 7 Vijay Bellur 2016-02-11 14:42:12 UTC
After a round of discussions, we are considering automating the entire node replacement procedure using Ansible-based gdeploy. Here is the current thinking from Sachidananda on how the procedure would look:

"We can write a module in gdeploy to do this.
As far as the user is concerned, he/she has to write a simple config file.
The config file will look more or less like this:

[hosts]
<ip of any healthy machine>

[replace-host:<ip-of-new-machine>]
disk=sdb
mountpoint=/mnt/gluster/brick1

[volume]
<volname>


gdeploy -c replace-host.conf


This will take care of copying the files from the healthy machine, setting up the UUID, setting up the backend with thin-p/thick-p (whichever is needed), mounting, and starting self-heal."

Peter - please feel free to share your inputs on this procedure.
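To make the shape of the proposed file concrete, here is a minimal sketch of how such a config could be parsed with Python's stdlib configparser. This is purely illustrative, not gdeploy code; the IP addresses and volume name are made-up example values standing in for the placeholders above.

```python
import configparser

# Example config shaped like the proposal above; all values are made up.
PROPOSED = """\
[hosts]
10.70.37.101

[replace-host:10.70.37.102]
disk=sdb
mountpoint=/mnt/gluster/brick1

[volume]
pbench
"""

# allow_no_value lets bare lines (the host IP, the volume name) act as keys.
cp = configparser.ConfigParser(allow_no_value=True)
cp.optionxform = str  # keep the case of bare entries as written
cp.read_string(PROPOSED)

healthy_host = next(iter(cp["hosts"]))
replace_section = next(s for s in cp.sections() if s.startswith("replace-host:"))
new_host = replace_section.split(":", 1)[1]
disk = cp[replace_section]["disk"]
mountpoint = cp[replace_section]["mountpoint"]
volume = next(iter(cp["volume"]))

print(healthy_host, new_host, disk, mountpoint, volume)
# 10.70.37.101 10.70.37.102 sdb /mnt/gluster/brick1 pbench
```

Encoding the new host in the section name, as the proposal does, keeps the file flat while still letting a parser recover every field it needs.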

Comment 8 Peter Portante 2016-02-11 15:15:10 UTC
(In reply to Vijay Bellur from comment #7)
> "We can write a module in gdeploy to do this.

Great idea!

> As far as user is concerned he/she has to write a simple config file. 
> Config file will look more or less like this:
> 
> [hosts]
> <ip of any healthy machine>

Could we derive the healthy vs. unhealthy machine status from "gluster peer status"?  And further automate this procedure so that it is driven from a healthy node?

Ideally, I'd like to see a tool that does, "gluster recover failed host", which automates the entire process, no config file writing.

But this, as a first step toward that, would be welcome.

> 
> [replace-host:<ip-of-new-machine>]
> disk=sdb
> mountpoint=/mnt/gluster/brick1

In this case, all bricks on the host need to be healed, so I am assuming the bricks to be healed could be derived and then checked to confirm they are ready for incorporation back into their volumes?

> 
> [volume]
> <volname>
> 
> 
> gdeploy -c replace-host.conf

This is a great first start.  Looking forward to it.
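A rough sketch of the peer-health check Peter proposes above: parse the "State" lines of `gluster peer status` into healthy and unhealthy lists. This is a hypothetical helper, not gdeploy or gluster code, and the sample text only mimics the shape of `gluster peer status` output (as if run on gprfs001, mirroring the cluster state in comment #3; the UUIDs are fake).

```python
def classify_peers(peer_status_output):
    """Split peers into (healthy, unhealthy) based on their State lines.

    A peer counts as healthy only when its state is 'Peer in Cluster'
    and '(Connected)'; anything else (Rejected, Disconnected, ...) is
    treated as unhealthy.
    """
    healthy, unhealthy = [], []
    hostname = None
    for line in peer_status_output.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            state = line.split(":", 1)[1].strip()
            if "Peer in Cluster" in state and "(Connected)" in state:
                healthy.append(hostname)
            else:
                unhealthy.append(hostname)
            hostname = None
    return healthy, unhealthy

# Sample text shaped like `gluster peer status` output; UUIDs are fake.
SAMPLE = """\
Number of Peers: 5

Hostname: gprfs002
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: gprfs009
Uuid: 00000000-0000-0000-0000-000000000009
State: Peer in Cluster (Connected)

Hostname: gprfs010
Uuid: 00000000-0000-0000-0000-000000000010
State: Peer in Cluster (Connected)

Hostname: gprfs011
Uuid: 00000000-0000-0000-0000-000000000011
State: Peer in Cluster (Connected)

Hostname: gprfs012
Uuid: 00000000-0000-0000-0000-000000000012
State: Peer Rejected (Connected)
"""

healthy, unhealthy = classify_peers(SAMPLE)
print(healthy)    # ['gprfs002', 'gprfs009', 'gprfs010', 'gprfs011']
print(unhealthy)  # ['gprfs012']
```

A driver could then pick any entry from the healthy list as the node to run the remaining steps from.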

Comment 9 Sachidananda Urs 2016-02-12 07:47:45 UTC
(In reply to Peter Portante from comment #8)
> (In reply to Vijay Bellur from comment #7)
> > "We can write a module in gdeploy to do this.
> 
> Great idea!
> 
> > As far as user is concerned he/she has to write a simple config file. 
> > Config file will look more or less like this:
> > 
> > [hosts]
> > <ip of any healthy machine>
> 
> Could we derive the healthy vs. unhealthy machine status from "gluster peer
> status"?  And further automate this procedure so that it is driven from a
> healthy node?
> 

This tool is run from a node (perhaps your laptop/desktop) which need not be
part of the cluster, and the configuration file is written by the user, not
auto-generated. So this particular procedure cannot be automated.

> Ideally, I'd like to see a tool that does, "gluster recover failed host",
> which automates the entire process, no config file writing.
> 
> But this as a first step to that would be welcomed.
> 
> > 
> > [replace-host:<ip-of-new-machine>]
> > disk=sdb
> > mountpoint=/mnt/gluster/brick1
> 
> In this case, all bricks on the host need to be healed, so I am assuming the
> bricks to be healed could be derived and then checked that they are ready
> for incorporation back into their bricks?
> 

We can mention more than one disk in the configuration. That should solve this.


Comment 10 Peter Portante 2016-02-12 14:55:31 UTC
(In reply to Sachidananda Urs from comment #9)
> This tool is run from a node (maybe from your laptop/desktop) which need not
> be part of the cluster. And the configuration file is written by the user,
> and not auto-generated. So, this particular procedure cannot be automated.

For this first iteration, I can see a manual process, but I am unclear as to why it cannot be automated in the future.  Don't we have all the information from gluster peer status?

Comment 11 Sachidananda Urs 2016-02-13 13:25:15 UTC
(In reply to Peter Portante from comment #10)
> (In reply to Sachidananda Urs from comment #9)
> > This tool is run from a node (maybe from your laptop/desktop) which need not
> > be part of the cluster. And the configuration file is written by the user,
> > and not auto-generated. So, this particular procedure cannot be automated.
> 
> For this first iteration, I can see a manual process, but I am unclear as to
> why it cannot be automated in the future.  Don't we have all the information
> from gluster peer status?

This has more to do with how gdeploy (Ansible underneath) works.
The config file is static; at a minimum it should contain the hostname/IP of the nodes and the device (sdb, vda, vdb, ...).

With the given set of hosts we can detect which peer has to be replaced (by running "gluster peer status"); however, we also need to know which of the backend disks should be used, and that needs to be explicitly mentioned.

Comment 12 Peter Portante 2016-02-13 15:27:18 UTC
(In reply to Sachidananda Urs from comment #11)
> This has to do more with how gdeploy(Ansible underneath) works.
> The config file is static, the minimum requirements it should contain
> include the hostname/ip of the nodes, device (sdb, vda, vdb, ...).

Yes, the config file should describe the full layout of the gluster cluster and the device assignments in Ansible variables.

> With the given set of hosts we can detect which peer has to be replaced (by
> running gluster peer status), however it is required to know which of the
> backend disks should be used. Which needs to be explicitly mentioned.

Not sure I follow this.  If we have the full configuration described, then can't gdeploy detect the host that needs rebuilding and execute the necessary steps?

Comment 13 Sachidananda Urs 2016-02-13 15:51:25 UTC
(In reply to Peter Portante from comment #12)
> (In reply to Sachidananda Urs from comment #11)
> 
> > With the given set of hosts we can detect which peer has to be replaced (by
> > running gluster peer status), however it is required to know which of the
> > backend disks should be used. Which needs to be explicitly mentioned.
> 
> Not sure I follow this.  If we have the full configuration described, then
> can't gdeploy detect the host that needs rebuilding and execute the
> necessary steps?

The host can be detected, but we cannot assume which disk on the host is to be
used; we expect the user to mention that explicitly.

For example if the config file is:

Option 1

[hosts]
host1
host2
host3

[volume]
name=vol1
action=replace-node
disk=sdb

===========================

Or 

Option 2

[hosts]
host1
host2
host3

[volume]
name=vol1

[replace-node]
disk=sdb

============================

Option 1 is not intuitive (as of now) because the volume section does not have a replace-node sub-command yet.

With replace-node we can figure out which node to replace using peer status. However, what if multiple nodes are down and the user wants to replace only a subset of them?
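The open question above can be made concrete with a small sketch (a hypothetical helper, not gdeploy code): the node to replace can be inferred from peer status only when exactly one node is down; otherwise the config has to name the target explicitly.

```python
def pick_node_to_replace(down_nodes, requested=None):
    """Choose the node a replace-node action should target.

    down_nodes: hostnames that peer status reports as down/unhealthy.
    requested:  optional explicit choice from the config file.
    """
    if requested is not None:
        if requested not in down_nodes:
            raise ValueError(f"{requested} is not reported as down")
        return requested
    if len(down_nodes) == 1:
        # Unambiguous: only one candidate, safe to infer.
        return down_nodes[0]
    raise ValueError(
        f"cannot infer which of {down_nodes} to replace; "
        "name one explicitly in the config"
    )

print(pick_node_to_replace(["gprfs012"]))                           # gprfs012
print(pick_node_to_replace(["host2", "host3"], requested="host3"))  # host3
```

Under this rule, Option 2's [replace-node] section could stay minimal in the single-failure case and require an explicit node name only when more than one peer is down.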