Bug 1769382

Summary:           RHHI-V deployment failed using the ansible playbook execution from command line
Product:           [Red Hat Storage] Red Hat Gluster Storage
Component:         gluster-ansible
Version:           rhhiv-1.6
Hardware:          x86_64
OS:                Linux
Status:            CLOSED ERRATA
Severity:          medium
Priority:          unspecified
Reporter:          Krist van Besien <kvanbesi>
Assignee:          Gobinda Das <godas>
QA Contact:        SATHEESARAN <sasundar>
CC:                bshetty, godas, lsurette, mtessun, rhs-bugs, sabose, sasundar, stirabos
Keywords:          ZStream
Target Release:    RHGS 3.5.z Batch Update 2
Fixed In Version:  gluster-ansible-roles-1.0.5-7.el8rhgs
Doc Type:          No Doc Update
Clones:            1793398 (view as bug list)
Environment:       rhhiv, rhel8
Type:              Bug
Bug Blocks:        1755481, 1793398
Last Closed:       2020-06-16 05:57:29 UTC

Description Krist van Besien 2019-11-06 14:01:14 UTC
Description of problem:

When using the Ansible role to deploy the hosted engine, the deployment times out during the "Wait for the local VM" task.

How reproducible:

Always

Steps to Reproduce:
1. Follow the steps in the "Automating RHHI for Virtualization deployment" document; a sketch of the invocation is shown below.
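
For reference, the command-line execution described in that document is roughly the following; the playbook and inventory file names here are assumptions that may differ between RHHI-V releases (only the directory is confirmed later in this bug):

  # Illustrative invocation; playbook/inventory names are assumptions.
  cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
  ansible-playbook -i gluster_inventory.yml hc_deployment.yml \
      --extra-vars='@he_gluster_vars.json'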

Actual results:

TASK [ovirt.hosted_engine_setup : Wait for the local VM] ******************************************************************************************************************************************************
fatal: [localhost -> zplk1028.adm.siverek.enedis.fr]: FAILED! => {"changed": false, "elapsed": 186, "msg": "timed out waiting for ping module test success: Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."}



Expected results:

Deployment succeeds

Additional info:

This appears to be an issue where SSH-ing into the VM (which had started correctly) was impossible.
As a workaround, I logged in to the hosted engine VM from a different window to make sure its host key was in known_hosts.
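
The same workaround can be scripted; a minimal sketch, assuming the local VM's address is known (the address below is hypothetical):

  # Hypothetical local VM address; substitute the one the installer reports.
  LOCAL_VM=hostedengine-local.example.com
  # Pre-seed the VM's host key so Ansible's password-based SSH can proceed.
  ssh-keyscan -H "$LOCAL_VM" >> ~/.ssh/known_hosts

This made the script progress, but it then ran into the next issue...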

TASK [ovirt.hosted_engine_setup : Add an entry for this host on /etc/hosts on the local VM] *******************************************************************************************************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: u\"hostvars['zplk0023']\" is undefined\n\nThe error appears to be in '/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml': line 8, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n      timeout: 180\n  - name: Add an entry for this host on /etc/hosts on the local VM\n    ^ here\n"}

Comment 1 Sandro Bonazzola 2019-11-06 15:34:08 UTC
Can you please provide the output of "rpm -qa | grep ovirt"? Trying to understand which version is affected.

Comment 2 Sandro Bonazzola 2019-11-20 08:20:08 UTC
Closing with resolution INSUFFICIENT_DATA. If you can provide the requested information, please reopen.

Comment 3 SATHEESARAN 2020-01-20 09:54:50 UTC
(In reply to Sandro Bonazzola from comment #1)
> Can you please provide the output of "rpm -qa | grep ovirt"? Trying to
> understand which version is affected.

I see the exact same problem.
Here are the details that were requested earlier:
[root@rhsqa-grafton7-nic2 defaults]# rpm -qa | grep ovirt
ovirt-ansible-repositories-1.1.5-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.5-1.el7ev.noarch
ovirt-vmconsole-1.0.7-3.el7ev.noarch
python-ovirt-engine-sdk4-4.3.2-1.el7ev.x86_64
ovirt-host-deploy-common-1.8.2-1.el7ev.noarch
ovirt-ansible-hosted-engine-setup-1.0.28-1.el7ev.noarch
cockpit-ovirt-dashboard-0.13.8-1.el7ev.noarch
ovirt-vmconsole-host-1.0.7-3.el7ev.noarch
cockpit-machines-ovirt-195-1.el7.noarch
python2-ovirt-node-ng-nodectl-4.3.6-0.20190820.0.el7ev.noarch
ovirt-provider-ovn-driver-1.2.22-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.12-1.el7ev.noarch
ovirt-node-ng-nodectl-4.3.6-0.20190820.0.el7ev.noarch
ovirt-imageio-daemon-1.5.2-0.el7ev.noarch
ovirt-host-4.3.4-1.el7ev.x86_64
python2-ovirt-setup-lib-1.2.0-1.el7ev.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7ev.noarch
ovirt-host-dependencies-4.3.4-1.el7ev.x86_64
ovirt-imageio-common-1.5.2-0.el7ev.x86_64
python2-ovirt-host-deploy-1.8.2-1.el7ev.noarch

Comment 4 Evgeny Slutsky 2020-01-20 13:20:16 UTC
Hi,
in the extra vars the following vars were configured:

{
  "he_appliance_password": "****",
  "he_admin_password": "****",
  "he_domain_type": "glusterfs",
  "he_fqdn": "hostedenginesm3.lab.eng.blr.redhat.com",
  "he_vm_mac_addr": "00:47:55:20:49:01",
  "he_default_gateway": "10.70.37.254",
  "he_mgmt_network": "ovirtmgmt",
  "he_ansible_host_name": "rhsqa-grafton7-nic2.lab.eng.blr.redhat.com",
  "he_storage_domain_name": "HostedEngine",
  "he_storage_domain_path": "/engine",
  "he_storage_domain_addr": "rhsqa-grafton7.lab.eng.blr.redhat.com",
  "he_mount_options": "backup-volfile-servers=rhsqa-grafton8.lab.eng.blr.redhat.com:rhsqa-grafton9.lab.eng.blr.redhat.com",
  "he_bridge_if": "enp129s0f0",
  "he_enable_hc_gluster_service": true,
  "he_mem_size_MB": "16384",
  "he_cluster": "Default"
}

It appears that `he_ansible_host_name` cannot be changed to anything other than the default `localhost`, because the role relies on Ansible facts, which are gathered for localhost.
When the line `"he_ansible_host_name": "rhsqa-grafton7-nic2.lab.eng.blr.redhat.com",` was removed, the deployment progressed.
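
A minimal sketch of that removal done non-interactively, assuming jq is installed (the file path matches the one quoted later in this bug):

  # Drop the problematic key from the extra-vars file (requires jq).
  VARS=/etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json
  jq 'del(.he_ansible_host_name)' "$VARS" > "$VARS.tmp" && mv "$VARS.tmp" "$VARS"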

@Simone, can you confirm, please?

Comment 5 SATHEESARAN 2020-01-21 07:08:40 UTC
(In reply to Sandro Bonazzola from comment #1)
> Can you please provide the output of "rpm -qa | grep ovirt"? Trying to
> understand which version is affected.

As I have provided the required information, I am removing the needinfo on Krist van Besien.

Comment 6 SATHEESARAN 2020-01-21 08:43:05 UTC
Removing 'he_ansible_host_name' from the extra vars led to a successful deployment.
The fix needs to be made in the gluster-ansible-roles package, which provides this extra vars file.

[root@ ~]# rpm -qf /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json 
gluster-ansible-roles-1.0.5-7.el7rhgs.noarch

So I will move this bug to the gluster-ansible component.

Comment 7 SATHEESARAN 2020-01-21 08:44:03 UTC
*** Bug 1755481 has been marked as a duplicate of this bug. ***

Comment 8 SATHEESARAN 2020-01-21 09:17:06 UTC
A fix is required in gluster-ansible-roles to remove the parameter 'he_ansible_host_name' from /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json


[root@ ~]# cat /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json
{
  "he_appliance_password": "encrypt-password-using-ansible-vault",
  "he_admin_password": "UI-password-for-login",
  "he_domain_type": "glusterfs",
  "he_fqdn": "FQDN-for-Hosted-Engine",
  "he_vm_mac_addr": "Valid MAC address",
  "he_default_gateway": "Valid Gateway",
  "he_mgmt_network": "ovirtmgmt",
  "he_ansible_host_name": "host1",    <<------------------- This needs to be removed
  "he_storage_domain_name": "HostedEngine",
  "he_storage_domain_path": "/engine",
  "he_storage_domain_addr": "host1",
  "he_mount_options": "backup-volfile-servers=host2:host3",
  "he_bridge_if": "interface name for bridge creation",
  "he_enable_hc_gluster_service": true,
  "he_mem_size_MB": "4096",
  "he_cluster": "Default"
}
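
Note the placeholder on the first key: the appliance password is expected to be protected with ansible-vault. A minimal sketch of one common approach, encrypting the whole vars file (an assumption, not a requirement stated in this template):

  # Encrypt the extra-vars file in place; later ansible-playbook runs
  # then need --ask-vault-pass (or --vault-password-file).
  ansible-vault encrypt he_gluster_vars.json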

Comment 9 SATHEESARAN 2020-01-21 12:46:57 UTC
@Gobinda, also make sure that the HE VM is allocated 16 GB of RAM, with this change in the vars file:
"he_mem_size_MB": "16384"

Comment 17 SATHEESARAN 2020-03-20 13:17:35 UTC
Tested with gluster-ansible-roles-1.0.5-7.el8rhgs

Contents of the he_gluster_vars.json file below:

[root@ ~]# cat /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json 
{
  "he_appliance_password": "encrypt-password-using-ansible-vault",
  "he_admin_password": "UI-password-for-login",
  "he_domain_type": "glusterfs",
  "he_fqdn": "FQDN-for-Hosted-Engine",
  "he_vm_mac_addr": "Valid MAC address",
  "he_default_gateway": "Valid Gateway",
  "he_mgmt_network": "ovirtmgmt",
  "he_storage_domain_name": "HostedEngine",
  "he_storage_domain_path": "/engine",
  "he_storage_domain_addr": "host1-backend-network-FQDN",
  "he_mount_options": "backup-volfile-servers=host2-backend-network-FQDN:host3-backend-network-FQDN",
  "he_bridge_if": "interface name for bridge creation",
  "he_enable_hc_gluster_service": true,
  "he_mem_size_MB": "16384",
  "he_cluster": "Default"
}

1. he_ansible_host_name is removed
2. he_mem_size_MB is updated to 16384

Ansible-based CLI deployment is successful.
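
A quick sanity check of both changes in the shipped template, as a sketch assuming jq is available:

  VARS=/etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/he_gluster_vars.json
  # Expect "false" (key removed) and "16384" (memory raised).
  jq 'has("he_ansible_host_name")' "$VARS"
  jq -r '.he_mem_size_MB' "$VARS"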

Comment 20 Simone Tiraboschi 2020-05-28 09:50:36 UTC
Removing old needinfo

Comment 22 errata-xmlrpc 2020-06-16 05:57:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2575