1540910 – Lack of maintenance instructions around cluster restart

Summary: Lack of maintenance instructions around cluster restart

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	doc-Maintaining_RHHI
Sub Component:
Version:	rhhi-1.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	RHHI-V 1.5
Assignee:	Laura Bailey
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1534399

Reported:	2018-02-01 09:59 UTC by David Sundqvist
Modified:	2022-03-13 14:40 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-12-17 09:32:06 UTC
Embargoed:
Dependent Products:

Description David Sundqvist 2018-02-01 09:59:58 UTC

Description of problem:
RHHI adds a level of complexity to shutting down and bringing up an entire cluster, and there is no documentation around the proper procedure to do it or checks to perform to ensure that startup will be free from issues.

Version-Release number of selected component (if applicable):
rhhi-1.1

How reproducible:
Always

Steps to Reproduce:
1. Need to do cluster maintenance
2. Look for proper procedure
3. Find none

Actual results:
No docs

Expected results:
A helpful list of steps and checks

Additional info:

Comment 2 SATHEESARAN 2018-02-01 15:43:29 UTC

Sahina/Laura,

Can we take this doc bug for RHHI 2.0? It looks necessary to have these steps documented so that the pod can be shut down properly.

Comment 7 Sahina Bose 2018-02-20 09:16:56 UTC

I think the steps would be:

1. Enable Global HA maintenance
2. Shut down all VMs
3. Shut down the Hosted Engine VM
4. Shut down the hosts

To bring it back online:
1. Start the hosts
2. Start the glusterd process on all hosts (if not running). Ensure that gluster volume status shows all bricks as online
3. Start HostedEngine VM
4. Start all VMs

Comment 14 SATHEESARAN 2018-07-17 09:46:18 UTC

(In reply to Sahina Bose from comment #7)
> I think the steps would be:
> 
> 1. Enable Global HA maintenance
> 2. Shutdown all VMs
> 3. Shutdown Hosted Engine VM
> 3. Shutdown the hosts.
> 
> To bring it back online
> 1. Start the hosts
> 2. Start the glusterd process on all hosts (if not running). Ensure that
> gluster volume status shows all bricks as online
> 3. Start HostedEngine VM
> 4. Start all VMs

Tested the same. Here are more refined steps.

Powering off the cluster
------------------------
1. Enable global maintenance for the hosted engine VM.
Go to the cockpit UI ( http://<hostname>:9090 )
Virtualization -> HostedEngine -> "Put this cluster in to Global maintenance"

The same can be achieved via the CLI on any host in the cluster that has hosted-engine deployed.
# hosted-engine --set-maintenance --mode=global

2. Shut down all the VMs from the RHV Manager UI

3. Shut down the HostedEngine VM from the host where it is running
# hosted-engine --vm-poweroff

4. Shut down the hosts
# shutdown -h now
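
The CLI parts of the power-off order above could be sketched as a small script. This is only an illustration, not a tested procedure: `DRY_RUN`, `run`, and `poweroff_sequence` are hypothetical names introduced here, and step 2 (shutting down guest VMs from the RHV Manager UI) is deliberately left as a manual step. With `DRY_RUN` at its default of 1 the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the power-off order from the steps above.
# DRY_RUN is a hypothetical switch (not part of any tool); it defaults to 1
# so that nothing is executed unless it is explicitly set to 0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

poweroff_sequence() {
    # Step 1: enable global maintenance so the HA agents leave the engine VM alone.
    run hosted-engine --set-maintenance --mode=global
    # Step 2 (guest VMs) is done in the RHV Manager UI and is not scripted here.
    # Step 3: stop the HostedEngine VM; run this on the host where it is running.
    run hosted-engine --vm-poweroff
    # Step 4: halt this host; this has to be repeated on every host in the cluster.
    run shutdown -h now
}
```

Calling `poweroff_sequence` with `DRY_RUN` unset prints the three commands in order, which is a cheap way to review the sequence before running it for real.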

Powering up the cluster
-----------------------
1. Power on all the hosts in the cluster
2. Perform a sanity check of the machines:
    a. The glusterd service should be running.
       # systemctl status glusterd
       If the glusterd service is not running, start it:
       # systemctl start glusterd

    b. All the networks should have an IP address.
       There is a known issue ( BZ 1590264 ): if a network has no IP address, edit /etc/sysconfig/network-scripts/ifcfg-<interface> and add:
       BOOTPROTO=dhcp
    c. Gluster peers should be connected on all the hosts in the cluster. Check that using:
       # gluster peer status
    d. Check the gluster volume status; all bricks should be up.
       # gluster volume status
3. Start the Hosted Engine VM from one of the nodes.
    # hosted-engine --vm-start

4. Verify that the HostedEngine VM is up.
    # hosted-engine --vm-status

5. Remove the HostedEngine VM from 'Global Maintenance'.
    Log in to the cockpit UI of one of the hosts ( http://<hostname>:9090 ).
    Navigate to 'Virtualization' -> 'HostedEngine' -> 'Remove this host from maintenance'

    This can also be done from the command line on one of the hosts using the following command:
    # hosted-engine --set-maintenance --mode=none

6. Once the engine is up, the RHV Manager UI should be reachable. Make sure all the storage domains are up, and then start the VMs.
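
The Gluster checks in step 2 (peers connected, bricks up) could be automated roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, and the parsing assumes the usual text layout of `gluster peer status` (one `State:` line per peer) and `gluster volume status` (an `Online` column holding Y or N as the second-to-last field of each brick row). That layout may differ between Gluster versions, so check it against real output first.

```shell
#!/bin/sh
# Both helpers read command output on stdin so they can be tried against
# saved output. The function names are illustrative, not part of Gluster.

# Succeeds only if at least one peer is listed and every "State:" line
# reports "(Connected)".
all_peers_connected() {
    awk '/^State:/ { n++; if ($0 !~ /\(Connected\)/) bad++ }
         END { exit !(n > 0 && bad == 0) }'
}

# Succeeds only if at least one brick is listed and none has "N" in the
# Online column (second-to-last field). A brick whose columns wrap onto the
# following line is handled by remembering that a brick row is pending.
all_bricks_online() {
    awk '/^Brick / { pending = 1 }
         pending && NF >= 4 && $(NF-1) ~ /^[YN]$/ {
             n++; if ($(NF-1) == "N") bad++; pending = 0
         }
         END { exit !(n > 0 && bad == 0) }'
}
```

On a host, the helpers would be fed live output, for example `gluster peer status | all_peers_connected && gluster volume status | all_bricks_online && echo "storage looks healthy"`, before starting the Hosted Engine VM in step 3.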

Comment 15 SATHEESARAN 2018-07-17 09:58:44 UTC

(In reply to SATHEESARAN from comment #14)
> 3. Shutdown the HostedEngine VM from the host where its running
> # hosted-engine --vm-poweroff

We can safely opt for shutting down the VM rather than powering it off.
# hosted-engine --vm-shutdown
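
A graceful shutdown can take a while, so one possible approach is to request it and then poll until the engine VM is reported as down. The sketch below is an assumption-laden illustration: the helper names, the timeout values, and the fallback to `--vm-poweroff` are choices made here, not something confirmed in this bug. The parsing assumes that `hosted-engine --vm-status` prints an "Engine status" line per host containing a `"vm": "up"` or `"vm": "down"` field. Global maintenance is expected to be enabled already.

```shell
#!/bin/sh
# Reads `hosted-engine --vm-status` output on stdin.
# Assumption: each host section has an "Engine status" line with a
# "vm": "up" / "vm": "down" field. Succeeds when at least one such line
# exists and none of them reports "up".
engine_vm_is_down() {
    awk '/Engine status/ { n++; if ($0 ~ /"vm": *"up"/) up++ }
         END { exit !(n > 0 && up == 0) }'
}

# Request a clean shutdown, poll until the VM is down, and power it off
# only if the timeout expires. Both arguments are hypothetical tunables:
# $1 = timeout in seconds (default 300), $2 = poll interval (default 10).
shutdown_engine_vm() {
    timeout="${1:-300}"
    interval="${2:-10}"
    waited=0
    hosted-engine --vm-shutdown
    while [ "$waited" -lt "$timeout" ]; do
        if hosted-engine --vm-status | engine_vm_is_down; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    # Last resort, illustrative only: force the VM off after the timeout.
    hosted-engine --vm-poweroff
}
```

Running `shutdown_engine_vm 600 15` on the host where the engine VM runs would wait up to ten minutes for a clean shutdown before forcing it.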
