Description of problem:
RHHI adds a level of complexity to shutting down and bringing up an entire cluster, and there is no documentation around the proper procedure to do it or checks to perform to ensure that startup will be free from issues.

Version-Release number of selected component (if applicable):
rhhi-1.1

How reproducible:
Always

Steps to Reproduce:
1. Need to do cluster maintenance
2. Look for proper procedure
3. Find none

Actual results:
No docs

Expected results:
A helpful list of steps and checks

Additional info:
Sahina/Laura, can we take this doc bug for RHHI 2.0? It looks necessary to have this procedure documented so that a pod can be shut down properly.
I think the steps would be:

1. Enable Global HA maintenance
2. Shut down all VMs
3. Shut down the Hosted Engine VM
4. Shut down the hosts

To bring it back online:

1. Start the hosts
2. Start the glusterd process on all hosts (if not running). Ensure that gluster volume status shows all bricks as online
3. Start the HostedEngine VM
4. Start all VMs
(In reply to Sahina Bose from comment #7)
> I think the steps would be:
>
> 1. Enable Global HA maintenance
> 2. Shutdown all VMs
> 3. Shutdown Hosted Engine VM
> 3. Shutdown the hosts.
>
> To bring it back online
> 1. Start the hosts
> 2. Start the glusterd process on all hosts (if not running). Ensure that
> gluster volume status shows all bricks as online
> 3. Start HostedEngine VM
> 4. Start all VMs

Tested the same. Here are more refined steps.

Powering off the cluster
------------------------
1. Enable global maintenance for the hosted engine VM.
   In the Cockpit UI ( http://<hostname>:9090 ), go to
   Virtualization -> HostedEngine -> "Put this cluster in to Global maintenance"

   The same can be achieved via the CLI on any host in the cluster that has hosted-engine deployed:
   # hosted-engine --set-maintenance --mode=global

2. Shut down all the VMs from the RHV Manager UI.

3. Shut down the HostedEngine VM from the host where it is running:
   # hosted-engine --vm-poweroff

4. Shut down the hosts:
   # shutdown -h now

Powering up the cluster
-----------------------
1. Power on all the hosts in the cluster.

2. Perform a sanity check of the machines:
   a. The glusterd service should be running.
      # systemctl status glusterd
      If the glusterd service is not running, start it:
      # systemctl start glusterd
   b. All the networks should have an IP address.
      There is a known issue ( BZ 1590264 ): if a network has no IP, edit /etc/sysconfig/network-scripts/ifcfg-<interface> and add BOOTPROTO=dhcp
   c. Gluster peers should be connected on all the hosts in the cluster. Check using:
      # gluster peer status
   d. Check the gluster volume status; all bricks should be up.
      # gluster volume status

3. Start the Hosted Engine VM from one of the nodes:
   # hosted-engine --vm-start

4. Verify that the HostedEngine VM is up:
   # hosted-engine --vm-status

5. Remove the HostedEngine VM from 'Global Maintenance'.
   Log in to the Cockpit UI of one of the hosts ( http://<hostname>:9090 ) and navigate to
   'Virtualization' -> 'HostedEngine' -> 'Remove this host from maintenance'

   This can also be done from the command line on one of the hosts:
   # hosted-engine --set-maintenance --mode=none

6. Once the engine is up, the RHV Manager UI should be reachable. Make sure all the storage domains are up, and then start the VMs.
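
The gluster checks in the power-up sanity step could be scripted roughly as below. This is only a sketch: the function names bricks_online and peers_connected are made up for illustration, and the parsing assumes the default text output of "gluster volume status" (brick rows ending in the Online and Pid columns, with long brick paths wrapped onto a second line) and of "gluster peer status" (one "State:" line per peer). Verify against the output on your own hosts before relying on it.

```shell
#!/usr/bin/env bash
# Sketch of the gluster sanity checks; function names are hypothetical.

# Reads `gluster volume status` text on stdin.
# Succeeds only if at least one Brick row is seen and every Brick row
# has "Y" in the Online column (assumed to be the second-to-last field).
bricks_online() {
    awk '
        $1 == "Brick" {
            seen = 1
            # Assumption: a long brick path wraps, leaving the columns on the next line.
            if (NF < 6) { getline }
            if ($(NF-1) != "Y") bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Reads `gluster peer status` text on stdin.
# Succeeds only if every "State:" line reports "(Connected)".
peers_connected() {
    awk '
        /^State:/ { seen = 1; if ($0 !~ /\(Connected\)/) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Possible usage on each host, before starting the HostedEngine VM:
#   systemctl is-active --quiet glusterd || systemctl start glusterd
#   gluster peer status   | peers_connected || echo "some peers are not connected"
#   gluster volume status | bricks_online   || echo "some bricks are not online"
```

Reading the command output from stdin keeps the checks independent of how the commands are invoked, so they can be tried against saved output first.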
(In reply to SATHEESARAN from comment #14)
> (In reply to Sahina Bose from comment #7)
> > I think the steps would be:
> >
> > 1. Enable Global HA maintenance
> > 2. Shutdown all VMs
> > 3. Shutdown Hosted Engine VM
> > 3. Shutdown the hosts.
> >
> > To bring it back online
> > 1. Start the hosts
> > 2. Start the glusterd process on all hosts (if not running). Ensure that
> > gluster volume status shows all bricks as online
> > 3. Start HostedEngine VM
> > 4. Start all VMs
>
> Tested the same. Here are more refined steps.
>
> Powering off the cluster
> ------------------------
> 1. Enable Global maintenance of hosted engine VM.
> Go to cockpit UI ( http://<hostname>:9090
> Virtualization -> HostedEngine -> "Put this cluster in to Global maintenance"
>
> The same can be achieved via CLI on any of the host in the cluster with
> hosted-engine deployed.
> # hosted-engine --set-maintenance --mode=global
>
> 2. Shut down all the VMs from RHV Manager UI
>
> 3. Shutdown the HostedEngine VM from the host where its running
> # hosted-engine --vm-poweroff

We can safely opt for shutting down the VM rather than powering it off.
# hosted-engine --vm-shutdown
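
For illustration, the engine part of the power-off sequence with the graceful --vm-shutdown could be wrapped as below. This is a sketch, not a tested procedure: the run helper, the DRY_RUN variable and the power_off_engine function are hypothetical names, and by default the script only prints what it would do. It does not wait for the engine VM to go down, so the hosts should still be shut down manually once "hosted-engine --vm-status" shows that the VM has stopped.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run on the host where the HostedEngine VM is running, after all other
# VMs have been shut down from the RHV Manager UI.
power_off_engine() {
    run hosted-engine --set-maintenance --mode=global
    # Graceful shutdown instead of --vm-poweroff, as suggested above.
    run hosted-engine --vm-shutdown
}

# Possible usage:
#   power_off_engine              # dry run, prints the two commands
#   DRY_RUN=0 power_off_engine    # actually runs them
```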