Bug 1793398

Summary: RHHI-V deployment failed using the ansible playbook execution from command line
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: rhhi Assignee: Gobinda Das <godas>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhhiv-1.6 CC: asriram, godas, kvanbesi, pasik, rhs-bugs
Target Milestone: ---   
Target Release: RHHI-V 1.8   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: gluster-ansible-roles-1.0.5-7.el8rhgs Doc Type: Bug Fix
Doc Text:
Previously, running the deployment playbook from the command line interface failed because of incorrect values for variables (he_ansible_host_name and he_mem_size_MB). Variable values have been updated and the deployment playbook now runs correctly.
Story Points: ---
Clone Of: 1769382 Environment:
Last Closed: 2020-08-04 14:51:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1769382    
Bug Blocks: 1722076, 1779975    

Description SATHEESARAN 2020-01-21 09:22:07 UTC
Description of problem:
-----------------------
When using the ansible role to deploy the hosted engine, the deployment times out during the "Wait for the local VM" task.

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Follow the steps in the "Automating RHHI for Virtualization deployment" document

Actual results:
---------------

TASK [ovirt.hosted_engine_setup : Wait for the local VM] ******************************************************************************************************************************************************
fatal: [localhost -> zplk1028.adm.siverek.enedis.fr]: FAILED! => {"changed": false, "elapsed": 186, "msg": "timed out waiting for ping module test success: Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."}

Expected results:
-----------------
Deployment succeeds

Additional info:
----------------
This appears to be an issue where ssh-ing into the VM (which was correctly started) was impossible.
As a workaround, I logged in to the hosted engine VM from a different window to make sure the host key was in known_hosts. This made the script progress, but it landed in the next issue...

TASK [ovirt.hosted_engine_setup : Add an entry for this host on /etc/hosts on the local VM] *******************************************************************************************************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: u\"hostvars['zplk0023']\" is undefined\n\nThe error appears to be in '/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml': line 8, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n      timeout: 180\n  - name: Add an entry for this host on /etc/hosts on the local VM\n    ^ here\n"}
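
For the initial host-key failure, an alternative to logging in manually would be to pre-populate known_hosts on the deployment host before re-running the playbook; a minimal sketch, assuming the local VM's address (placeholder local-vm-fqdn below) resolves from the host:

[root@ ~]# ssh-keyscan -H local-vm-fqdn >> /root/.ssh/known_hosts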


A fix is required in gluster-ansible-roles to remove the parameter 'he_ansible_host_name' from /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json


[root@ ~]# cat /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json
{
  "he_appliance_password": "encrypt-password-using-ansible-vault",
  "he_admin_password": "UI-password-for-login",
  "he_domain_type": "glusterfs",
  "he_fqdn": "FQDN-for-Hosted-Engine",
  "he_vm_mac_addr": "Valid MAC address",
  "he_default_gateway": "Valid Gateway",
  "he_mgmt_network": "ovirtmgmt",
  "he_ansible_host_name": "host1",    <<------------------- This needs to be removed
  "he_storage_domain_name": "HostedEngine",
  "he_storage_domain_path": "/engine",
  "he_storage_domain_addr": "host1",
  "he_mount_options": "backup-volfile-servers=host2:host3",
  "he_bridge_if": "interface name for bridge creation",
  "he_enable_hc_gluster_service": true,
  "he_mem_size_MB": "4096",
  "he_cluster": "Default"
}
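
Until the packaged template is fixed, the offending key can be stripped from the shipped vars file in place; a minimal sketch (deleting the whole line keeps the JSON valid, because the key sits mid-object and its line ends with a comma):

[root@ ~]# sed -i '/he_ansible_host_name/d' /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json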

Comment 1 SATHEESARAN 2020-01-21 09:32:22 UTC
@Anjana,

This needs to be documented as a known issue for RHV 4.3.8-based RHHI-V 1.7

Problem: RHHI-V deployment fails when using the ansible playbook from the command line
Workaround: Remove the entry 'he_ansible_host_name: host1' from /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json
and proceed with the deployment

Comment 2 SATHEESARAN 2020-01-21 12:46:39 UTC
@Gobinda, also make sure that the HE VM is allocated 16GB of RAM, with this change in the vars file:
"he_mem_size_MB": "16384"

Comment 5 Gobinda Das 2020-02-19 09:22:43 UTC
Clearing needinfo as it's documented as a known issue with the correct message

Comment 6 SATHEESARAN 2020-03-20 13:18:28 UTC
Tested with gluster-ansible-roles-1.0.5-7.el8rhgs

Contents of the he_gluster_vars.json file are below:

[root@ ~]# cat /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json 
{
  "he_appliance_password": "encrypt-password-using-ansible-vault",
  "he_admin_password": "UI-password-for-login",
  "he_domain_type": "glusterfs",
  "he_fqdn": "FQDN-for-Hosted-Engine",
  "he_vm_mac_addr": "Valid MAC address",
  "he_default_gateway": "Valid Gateway",
  "he_mgmt_network": "ovirtmgmt",
  "he_storage_domain_name": "HostedEngine",
  "he_storage_domain_path": "/engine",
  "he_storage_domain_addr": "host1-backend-network-FQDN",
  "he_mount_options": "backup-volfile-servers=host2-backend-network-FQDN:host3-backend-network-FQDN",
  "he_bridge_if": "interface name for bridge creation",
  "he_enable_hc_gluster_service": true,
  "he_mem_size_MB": "16384",
  "he_cluster": "Default"
}

1. he_ansible_host_name is removed
2. he_mem_size_MB is updated to 16384

Ansible-based CLI deployment is successful
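
For reference, the command-line flow exercised here looks roughly like the sketch below. The playbook and inventory file names (hc_deployment.yml, gluster_inventory.yml) are assumptions based on the shipped hc-ansible-deployment directory and may differ in your environment; the vaulted string produced by ansible-vault is pasted into he_gluster_vars.json as the he_appliance_password value.

[root@ ~]# cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
[root@ ~]# ansible-vault encrypt_string 'engine-appliance-password' --name 'he_appliance_password'
[root@ ~]# ansible-playbook -i gluster_inventory.yml hc_deployment.yml \
      --extra-vars='@he_gluster_vars.json' --ask-vault-pass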

Comment 9 Gobinda Das 2020-07-15 06:25:12 UTC
Doc text looks good.

Comment 11 errata-xmlrpc 2020-08-04 14:51:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHHI for Virtualization 1.8 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3314