Bug 1679227

Summary: Creation of bastion sometimes fails because of missing FQDN
Product: [oVirt] ovirt-engine-metrics
Reporter: Jan Zmeskal <jzmeskal>
Component: Generic
Assignee: Shirly Radco <sradco>
Status: CLOSED CURRENTRELEASE
QA Contact: Ivana Saranova <isaranov>
Severity: high
Docs Contact:
Priority: unspecified
Version: 1.2.0.2
CC: bugs, eslutsky, isaranov, lleistne, sradco
Target Milestone: ovirt-4.3.5
Flags: sradco: ovirt-4.3?, lleistne: testing_ack+
Target Release: 1.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-metrics-1.3.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-07-30 14:08:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1715513
Bug Blocks: 1599589, 1710361

Description Jan Zmeskal 2019-02-20 16:13:14 UTC
Description of problem:
The failure happens during creation of the bastion VM. A task in oVirt.metrics-store-installation sets a fact containing the bastion FQDN. Specifically, this task: https://gerrit.ovirt.org/#/c/97643/48/roles/oVirt.metrics-store-installation/tasks/main.yml@69

However, the VM may not have received an FQDN yet when that task runs. In that case, the playbook fails with this error:
TASK [oVirt.metrics/roles/oVirt.metrics-store-installation : Set fact of the bastion machine FQDN] *******************
task path: /usr/share/ansible/roles/oVirt.metrics/roles/oVirt.metrics-store-installation/tasks/main.yml:69
fatal: [localhost]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'fqdn'\n\nThe error appears to have been in '/usr/share/ansible/roles/oVirt.metrics/roles/oVirt.metrics-store-installation/tasks/main.yml': line 69, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Set fact of the bastion machine FQDN\n  ^ here\n"
}

This needs to be fixed. A quick and dirty workaround exists: put the following after the "ovirt_auth" task:
- pause:
    minutes: 3

oVirt then has enough time to assign an FQDN to the VM.
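
A fixed pause only hides the race. A less timing-dependent approach would be to poll the VM until it actually reports an FQDN. The tasks below are a minimal, untested sketch of that idea and are not the change that was merged for this bug. They assume the ovirt_vm_facts module (named ovirt_vm_info in later Ansible releases) and an existing ovirt_auth fact. The names bastion_vm_name, bastion_vm_lookup and bastion_fqdn, as well as the retries and delay values, are hypothetical and chosen only for illustration.

```yaml
# Sketch only: variable names and timing values are assumptions,
# not taken from the oVirt.metrics-store-installation role.
- name: Wait until the bastion VM reports an FQDN
  ovirt_vm_facts:
    auth: "{{ ovirt_auth }}"
    pattern: "name={{ bastion_vm_name }}"  # hypothetical variable
  register: bastion_vm_lookup
  # Retry the lookup until the first matching VM has the 'fqdn' attribute.
  until: bastion_vm_lookup.ansible_facts.ovirt_vms[0].fqdn is defined
  retries: 30  # illustrative value
  delay: 10    # seconds between retries, illustrative value

- name: Set fact of the bastion machine FQDN
  set_fact:
    bastion_fqdn: "{{ bastion_vm_lookup.ansible_facts.ovirt_vms[0].fqdn }}"  # hypothetical fact name
```

With this shape, the play continues as soon as the FQDN appears and fails with a clear retry-exhausted message if it never does, instead of the "undefined variable" error above.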


Version-Release number of selected component (if applicable):
ovirt-engine-metrics-1.2.1-0.0.master.20190220121053.el7.noarch (patchset 48)

How reproducible:
50%

Steps to Reproduce:
Reproduction steps included in description


Additional info:
- Ansible log attached
- my config.yml: http://pastebin.test.redhat.com/718677

Comment 2 Jan Zmeskal 2019-02-25 13:33:25 UTC
Moving to POST since the patch this BZ depends on (https://gerrit.ovirt.org/#/c/97643/) has not been merged yet. Once it is merged, this can be moved to MODIFIED. Once it is part of a consumable package, this can be moved to ON_QA and verified by QA.

Comment 3 Ivana Saranova 2019-02-26 13:23:31 UTC
Steps:
1) Edit the config file according to README
2) Run the script configure_ovirt_machines_for_metrics.sh

Result:
TASK [oVirt.metrics/roles/oVirt.metrics-store-installation : Set fact of the bastion machine FQDN] ******
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'fqdn'\n\nThe error appears to have been in '/usr/share/ansible/roles/oVirt.metrics/roles/oVirt.metrics-store-installation/tasks/main.yml': line 76, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Set fact of the bastion machine FQDN\n ^ here\n"}

PLAY RECAP **********************************************************************************************
localhost : ok=44 changed=4 unreachable=0 failed=1

Tested in: ovirt-engine-metrics-1.2.1-0.0.master.20190225200554.el7.noarch (patchset 56)

Comment 4 Sandro Bonazzola 2019-03-12 12:54:24 UTC
4.3.1 has been released, please re-target this bug as soon as possible.

Comment 5 Evgeny Slutsky 2019-03-28 12:31:38 UTC
The issue reproduced again during the metrics deployment, even after applying the patch.
https://paste.fedoraproject.org/paste/kEbUoPKMh5BiahumQYPsQQ

Comment 6 Ivana Saranova 2019-05-30 14:31:41 UTC
Cannot verify due to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1715513

Comment 7 Ivana Saranova 2019-06-24 11:25:50 UTC
Steps:
1) Install metrics store according to the documentation
2) Check that bastion installation does not fail on missing FQDN

Result:
The playbook waits for the bastion's IP with 30 retries, which seems to give the FQDN enough time to show up. The playbook does not fail, and installation of the metrics store was successful.

Verified in:
ovirt-engine-4.3.4.3-0.1.el7.noarch
ovirt-engine-metrics-1.3.2-1.el7ev.noarch
ovirt-ansible-vm-infra-1.1.18-1.el7ev.noarch
ovirt-ansible-image-template-1.1.11-1.el7ev.noarch

Comment 8 Sandro Bonazzola 2019-07-30 14:08:44 UTC
This bugzilla is included in oVirt 4.3.5 release, published on July 30th 2019.

Since the problem described in this bug report should be resolved in oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.