Bug 1797023

Summary: In RHV Metric Store, installer vm does not take the static ip mentioned in the configuration file
Product: Red Hat Enterprise Virtualization Manager
Reporter: amashah
Component: ovirt-engine-metrics
Assignee: Shirly Radco <sradco>
Status: CLOSED ERRATA
QA Contact: Guilherme Santos <gdeolive>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.3.7
CC: emarcus, gchakkar, pelauter, sradco
Target Milestone: ovirt-4.3.9-1
Keywords: ZStream
Target Release: 4.3.9-1
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-metrics-1.3.7-1.el7ev
Doc Type: Bug Fix
Doc Text:
When deploying Metrics Store, the deployment would hang while trying to obtain the FQDN from the virtual machine. In this release, the deployment does not hang, and completes successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-02 17:09:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description amashah 2020-01-31 18:43:26 UTC
Description of problem:
When deploying the metrics store, deployment hangs while trying to collect VM facts (fqdn).

Version-Release number of selected component (if applicable):
4.3.6, 4.3.7

How reproducible:
100%

Steps to Reproduce:
1. Try to deploy the metrics store on 4.3.7
2. Wait for the deployment to get stuck collecting facts

Actual results:
Deployment fails after exhausting the fact-collection retries; the metrics store cannot be deployed.

Expected results:
Metrics store successfully deploys.

Additional info:
This occurred on 4.3.6 and again after upgrading to 4.3.7.

With RHV-M 4.3.6, it failed with:
ovirt-engine-metrics-1.3.4.1-1.el7ev.noarch
ansible-2.8.5-2.el7ae.noarch

After updating to 4.3.7, the following package version was in use:
ansible 2.9.4


~~~
TASK [/usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt : Login to oVirt] *********************************
task path: /usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt/tasks/create_openshift_bastion_vm.yml:9
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840 `" && echo ansible-tmp-1580494688.11-260154395636840="` echo /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/ovirt/ovirt_auth.py
<localhost> PUT /root/.ansible/tmp/ansible-local-21049kad7jC/tmpaYZrsQ TO /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840/AnsiballZ_ovirt_auth.py
<localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840/ /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840/AnsiballZ_ovirt_auth.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840/AnsiballZ_ovirt_auth.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1580494688.11-260154395636840/ > /dev/null 2>&1 && sleep 0'
ok: [localhost] => {
    "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", 
    "changed": false
}

TASK [/usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt : Collect oVirt VM facts] *************************
task path: /usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt/tasks/create_openshift_bastion_vm.yml:20
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712 `" && echo ansible-tmp-1580494688.85-14579113196712="` echo /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/ovirt/_ovirt_vm_facts.py
<localhost> PUT /root/.ansible/tmp/ansible-local-21049kad7jC/tmpXip2zL TO /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712/AnsiballZ__ovirt_vm_facts.py
<localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712/ /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712/AnsiballZ__ovirt_vm_facts.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712/AnsiballZ__ovirt_vm_facts.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1580494688.85-14579113196712/ > /dev/null 2>&1 && sleep 0'
FAILED - RETRYING: Collect oVirt VM facts (5 retries left).Result was: {
    "attempts": 1, 
    "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", 
    "changed": false, 
    "retries": 6
}

...

fatal: [localhost]: FAILED! => {
    "attempts": 5, 

~~~


This is what it's trying to do:


~~~
     20   - name: Collect oVirt VM facts
     21     ovirt_vm_facts:
     22       auth: '{{ ovirt_auth }}'
     23       pattern: 'name={{ openshift_ovirt_bastion_machine_name }} and cluster={{ ovirt_cluster_name }}'
     24       fetch_nested: true
     25       nested_attributes: ips
     26     no_log: true
     27     until: ansible_facts.ovirt_vms[0].fqdn is defined
     28     retries: 5
     29     delay: 60
~~~


When no_log is disabled, you can see that the VM facts are listed (the metrics-store-installer VM is also there and working fine), but there is no fqdn attribute.
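For debugging, a hedged sketch of the same task with logging enabled and the module output registered and printed explicitly; this is an illustrative variant, not the shipped role code, and the registered variable name is made up here:

```yaml
# Hypothetical debugging variant of the role task above (not the shipped code).
# Disabling no_log exposes the module output; note it may include the auth token,
# so this is for troubleshooting only.
- name: Collect oVirt VM facts (debug variant)
  ovirt_vm_facts:
    auth: '{{ ovirt_auth }}'
    pattern: 'name={{ openshift_ovirt_bastion_machine_name }} and cluster={{ ovirt_cluster_name }}'
    fetch_nested: true
    nested_attributes: ips
  register: vm_facts_result
  until: vm_facts_result.ansible_facts.ovirt_vms[0].fqdn is defined
  retries: 5
  delay: 60

# Print what the guest agent actually reported, to check whether the
# fqdn attribute is present at all.
- name: Show collected VM facts
  debug:
    var: vm_facts_result
```

Registering the result makes it visible regardless of fact injection, which is useful here since the failure is that the fqdn attribute never appears in the returned facts.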

I found a thread on an oVirt mailing list:
https://www.mail-archive.com/users@ovirt.org/msg59000.html

It indicates this is an issue with Ansible 2.8, but we see the problem with both Ansible 2.8 and 2.9.

We cannot downgrade to Ansible 2.7 because ovirt-engine-metrics has a dependency on a newer Ansible.

I was unable to find an existing Bugzilla, so I am creating a new one to track the issue.

There is a reference to an upstream patch here: 
https://gerrit.ovirt.org/#/c/105813/


I tried that patch and it did not work; deployment still hangs on collecting facts.

Comment 7 Shirly Radco 2020-03-12 08:29:57 UTC
*** Bug 1810554 has been marked as a duplicate of this bug. ***

Comment 10 Guilherme Santos 2020-03-18 10:12:51 UTC
So, I tested this and it does work under the conditions I described in bz #1814568.
I would also add that, depending on the environment, some extra considerations apply.
In my case, my test environment has a DHCP server with pre-set IP addresses for the MAC addresses in the MAC range.
So, when setting a custom MAC for the VM, it did not matter which IP I chose; the VM always got the IP pre-set in the DHCP server for that MAC.

Overall, extra steps may need to be taken into consideration depending on how the network is configured in the target environment. (Another common consideration is DNS: the target static IP may also need to be added to the DNS server before the installation.)
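The MAC/DHCP interaction described above can be sketched with the ovirt_vm module. This is a hypothetical illustration: the VM name, cluster, NIC profile, and MAC address below are placeholders, not values from this bug.

```yaml
# Hypothetical sketch: pin a known MAC on the installer VM's NIC so that a
# DHCP reservation (and a matching DNS entry) prepared in advance will apply.
# All names and addresses here are illustrative.
- name: Create installer VM with a fixed MAC
  ovirt_vm:
    auth: '{{ ovirt_auth }}'
    name: metrics-store-installer
    cluster: Default
    state: running
    nics:
      - name: nic1
        profile_name: ovirtmgmt
        mac_address: 56:6f:00:00:00:10
```

With the MAC pinned, the address the VM receives is whatever the DHCP server has reserved for that MAC, which is why the IP chosen in the configuration file appeared to be ignored in this test environment.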

Not sure if I can consider this verified...

Comment 11 Guilherme Santos 2020-03-18 10:41:06 UTC
Verified on:
ovirt-engine-metrics-1.3.7-1.el7ev

Steps:
1. Followed the metrics deployment tutorial for the static IP scenario, taking the bz #1814568 and comment #10 considerations into account

Results:
Metrics store deployed (both VMs) with arbitrary pre-set IPs and MACs

Comment 14 errata-xmlrpc 2020-04-02 17:09:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1309