Bug 1540910

Summary:	Lack of maintenance instructions around cluster restart
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	David Sundqvist <dsundqvi>
Component:	doc-Maintaining_RHHI	Assignee:	Laura Bailey <lbailey>
Status:	CLOSED CURRENTRELEASE	QA Contact:	SATHEESARAN <sasundar>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	rhhi-1.1	CC:	asriram, bkunal, inetkach, lbailey, rhs-bugs, sabose, sasundar
Target Milestone:	---
Target Release:	RHHI-V 1.5
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-12-17 09:32:06 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1534399

Description David Sundqvist 2018-02-01 09:59:58 UTC

Description of problem:
RHHI adds a level of complexity to shutting down and bringing up an entire cluster, and there is no documentation around the proper procedure to do it or checks to perform to ensure that startup will be free from issues.

Version-Release number of selected component (if applicable):
rhhi-1.1

How reproducible:
Always

Steps to Reproduce:
1. Need to do cluster maintenance
2. Look for proper procedure
3. Find none

Actual results:
No docs

Expected results:
A helpful list of steps and checks

Additional info:

Comment 2 SATHEESARAN 2018-02-01 15:43:29 UTC

Sahina/Laura,

Can we take this doc bug for RHHI 2.0? It looks necessary to have this procedure documented so that a pod can be shut down properly.

Comment 7 Sahina Bose 2018-02-20 09:16:56 UTC

I think the steps would be:

1. Enable Global HA maintenance
2. Shut down all VMs
3. Shut down the Hosted Engine VM
4. Shut down the hosts

To bring it back online:
1. Start the hosts
2. Start the glusterd process on all hosts (if not running). Ensure that gluster volume status shows all bricks as online
3. Start the HostedEngine VM
4. Start all VMs

Comment 14 SATHEESARAN 2018-07-17 09:46:18 UTC

(In reply to Sahina Bose from comment #7)
> I think the steps would be:
> 
> 1. Enable Global HA maintenance
> 2. Shutdown all VMs
> 3. Shutdown Hosted Engine VM
> 3. Shutdown the hosts.
> 
> To bring it back online
> 1. Start the hosts
> 2. Start the glusterd process on all hosts (if not running). Ensure that
> gluster volume status shows all bricks as online
> 3. Start HostedEngine VM
> 4. Start all VMs

Tested the same. Here are more refined steps.

Powering off the cluster
------------------------
1. Enable global maintenance for the hosted engine VM.
Go to the Cockpit UI ( http://<hostname>:9090 ) and select
Virtualization -> HostedEngine -> "Put this cluster in to Global maintenance"

The same can be achieved via the CLI on any host in the cluster that has hosted-engine deployed.
# hosted-engine --set-maintenance --mode=global

2. Shut down all the VMs from the RHV Manager UI

3. Shut down the HostedEngine VM from the host where it is running
# hosted-engine --vm-poweroff

4. Shut down the hosts
# shutdown -h now
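
The power-off order above could be scripted roughly as follows. This is a minimal sketch, not a tested tool: the function names, the run helper and the DRY_RUN variable are made up for illustration, and only the three commands themselves come from the steps above. Step 2 (shutting down the guest VMs from the RHV Manager UI) stays manual and has to happen between the two functions.

```shell
#!/bin/sh
# Sketch only. By default nothing is executed: run() just prints the
# command unless DRY_RUN=0 is set explicitly (hypothetical safety switch).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: put the cluster into global maintenance (run once, on any host
# that has hosted-engine deployed).
begin_cluster_poweroff() {
    run hosted-engine --set-maintenance --mode=global
}

# Steps 3 and 4: call with the argument "engine" on the host that runs the
# HostedEngine VM, and without arguments on the other hosts.
# Assumes the guest VMs (step 2) have already been shut down by hand.
finish_host_poweroff() {
    if [ "${1:-}" = "engine" ]; then
        run hosted-engine --vm-poweroff || return 1
    fi
    run shutdown -h now
}
```

With the default DRY_RUN=1 the functions only print what they would run. On a real cluster one would call begin_cluster_poweroff once, shut down the guest VMs, then call "finish_host_poweroff engine" on the host running the HostedEngine VM and "finish_host_poweroff" on the remaining hosts.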

Powering up the cluster
-----------------------
1. Power on all the hosts in the cluster
2. Perform a sanity check of the machines
       a. The glusterd service should be running
           # systemctl status glusterd
          If the glusterd service is not running, start it
           # systemctl start glusterd

       b. All the networks should have an IP address
          There is a known issue ( BZ 1590264 ): if a network has no IP address, edit /etc/sysconfig/network-scripts/ifcfg-<interface> and add
          BOOTPROTO=dhcp
       c. Gluster peers should be connected on all the hosts in the cluster. Check that using:
          # gluster peer status
       d. Check the gluster volume status; all bricks should be up.
          # gluster volume status
3. Start the Hosted Engine VM from one of the nodes.
    # hosted-engine --vm-start

4. Verify that the HostedEngine VM is up.
    # hosted-engine --vm-status

5. Remove the HostedEngine VM from 'Global Maintenance'
    Log in to the Cockpit UI of one of the hosts ( http://<hostname>:9090 ).
    Navigate to 'Virtualization' -> 'HostedEngine' -> 'Remove this host from maintenance'

    This can also be done from the command line on one of the hosts:
    # hosted-engine --set-maintenance --mode=none

6. Once the engine is up, the RHV Manager UI should be reachable. Make sure all the storage domains are up, and then start the VMs.
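
Some of the power-up checks could be automated along these lines. This is a rough sketch under stated assumptions: all helper names and the STATUS_CMD variable are hypothetical, and the parsers assume the usual plain-text layout of "gluster peer status" (one "State:" line per peer), "gluster volume status" (rows starting with "Brick", with the Online column second from the right) and "hosted-engine --vm-status" (an engine status line containing "health": "good"). Those layouts can differ between versions, so check them against real output first. The network check from step 2b is not covered.

```shell
#!/bin/sh
# Sketch of the power-up checks; helper names are illustrative only.

# 2a. Start glusterd if systemd does not report it as active.
ensure_glusterd() {
    systemctl is-active --quiet glusterd || systemctl start glusterd
}

# 2c. Reads `gluster peer status` output on stdin. Succeeds only if at
# least one peer is listed and every "State:" line contains "(Connected)".
peers_all_connected() {
    awk '/^State:/ { n++; if ($0 !~ /\(Connected\)/) bad++ }
         END { exit ((n > 0 && bad == 0) ? 0 : 1) }'
}

# 2d. Reads `gluster volume status` output on stdin and prints every brick
# whose Online column is not "Y". Long brick rows may be wrapped onto a
# second line, in which case the columns are read from that next line.
offline_bricks() {
    awk '/^Brick / {
        brick = $2
        if (NF < 6) getline
        if ($(NF - 1) != "Y") print brick
    }'
}

# 4. Reads `hosted-engine --vm-status` output on stdin and succeeds if any
# host reports the engine health as good (assumed output format).
engine_is_healthy() {
    grep -q '"health": "good"'
}

# Polls the status command until the engine is healthy.
# $1 = number of attempts (default 30), $2 = delay in seconds (default 10).
wait_for_engine() {
    tries="${1:-30}"
    delay="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        if ${STATUS_CMD:-hosted-engine --vm-status} 2>/dev/null | engine_is_healthy; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

On a host this might be used as "gluster peer status | peers_all_connected", "gluster volume status | offline_bricks" (no output means all bricks are online), and "wait_for_engine && hosted-engine --set-maintenance --mode=none" after the engine VM has been started.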

Comment 15 SATHEESARAN 2018-07-17 09:58:44 UTC

(In reply to SATHEESARAN from comment #14)
> Tested the same. Here are more refined steps.
> 
> Powering off the cluster
> ------------------------
> 1. Enable Global maintenance of hosted engine VM.
> Go to cockpit UI ( http://<hostname>:9090
> Virtualization -> HostedEngine -> "Put this cluster in to Global maintenance"
> 
> The same can be achieved via CLI on any of the host in the cluster with
> hosted-engine deployed.
> # hosted-engine --set-maintenance --mode=global
> 
> 2. Shut down all the VMs from RHV Manager UI
> 
> 3. Shutdown the HostedEngine VM from the host where its running
> # hosted-engine --vm-poweroff

We can safely opt for shutting down the VM rather than powering it off.
# hosted-engine --vm-shutdown
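
A graceful shutdown takes time, so the host should not be shut down until the engine VM is actually down. One possible way to wait for that is sketched below. The helper names and the SHUTDOWN_CMD and STATUS_CMD variables are hypothetical, and the check assumes that "hosted-engine --vm-status" prints "vm": "up" for the host that still runs the engine VM, which should be verified against real output.

```shell
#!/bin/sh
# Sketch only: request a graceful shutdown of the HostedEngine VM and wait
# until no host reports it as up.

# Reads `hosted-engine --vm-status` output on stdin. Succeeds if there is
# some output and no line reports "vm": "up" (assumed output format).
engine_vm_is_down() {
    awk '/"vm": "up"/ { up++ } END { exit ((NR > 0 && up == 0) ? 0 : 1) }'
}

# $1 = number of attempts (default 30), $2 = delay in seconds (default 10).
shutdown_engine_vm() {
    tries="${1:-30}"
    delay="${2:-10}"
    ${SHUTDOWN_CMD:-hosted-engine --vm-shutdown} || return 1
    while [ "$tries" -gt 0 ]; do
        if ${STATUS_CMD:-hosted-engine --vm-status} 2>/dev/null | engine_vm_is_down; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    # Leave the decision to force a power-off to the operator.
    echo "engine VM still up; consider hosted-engine --vm-poweroff" >&2
    return 1
}
```

Calling "shutdown_engine_vm && shutdown -h now" on the host running the engine VM would then only halt the host once the VM is reported down.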