Bug 1944600 - deploy using 'ovirt-ansible-collection' fails on RHVH
Summary: deploy using 'ovirt-ansible-collection' fails on RHVH
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine-sdk-python
Classification: oVirt
Component: General
Version: ---
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.5-1
Target Release: 4.4.10
Assignee: Ori Liel
QA Contact: Guilherme Santos
URL:
Whiteboard:
Depends On:
Blocks: 1857841
 
Reported: 2021-03-30 10:24 UTC by Roni
Modified: 2021-04-15 07:41 UTC
CC: 18 users

Fixed In Version: python3-ovirt-engine-sdk4-4.4.10-1.el8.x86_64
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-15 07:41:55 UTC
oVirt Team: Infra
Embargoed:
pm-rhel: ovirt-4.4+
pm-rhel: devel_ack+
pm-rhel: testing_ack+


Attachments

Description Roni 2021-03-30 10:24:17 UTC
Created attachment 1767605 [details]
logs

Description of problem:
deploy using 'ovirt-ansible-collection' fails on RHVH

Version-Release number of selected component (if applicable):
rhvh-4.4.4.2-0.20210307.0+1

How reproducible:
100%

Steps to Reproduce:
1. Provision a host with rhvh-4.4.4.2-0.20210307
2. Deploy the hosted engine using the ovirt-ansible-collection role 'hosted_engine_setup'

Actual results:
Deploy fails when it tries to deploy HE on the first host (not when adding new hosts).
It seems that the host was rebooted when it was not expected to be, because
the parameter 'reboot_after_installation' is set to 'false'.
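For reference, 'reboot_after_installation' is the option this report is about; a hedged sketch of how it appears in a playbook task (the task name, module fields, and variable names around it are illustrative assumptions, only the option itself is taken from this report):

```yaml
# Illustrative excerpt only: the surrounding fields are assumptions,
# 'reboot_after_installation: false' is the setting from this bug report.
- name: Add the host to the engine without a post-install reboot
  ovirt.ovirt.ovirt_host:
    auth: "{{ ovirt_auth }}"
    name: "{{ host_name }}"
    cluster: Default
    reboot_after_installation: false
```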

NOTE:
Martin Nacas found that the issue is fixed after upgrading the package
python3-ovirt-engine-sdk4-4.4.7-1.el8.x86_64
to
python3-ovirt-engine-sdk4-4.4.10-1.el8.x86_64.
That is, with 4.4.10 the host is not rebooted, as expected.

Expected results:
Deploy should pass and the host should not be rebooted.

Additional info:
Below are the Ansible console logs from the failure; see the attachment for the full oVirt logs.

18:55:07 TASK [ovirt.ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials] ***
18:55:13 ok: [lynx25.lab.eng.tlv2.redhat.com]
18:55:13 
18:55:13 TASK [ovirt.ovirt.hosted_engine_setup : Wait for the host to be up] ************
18:55:19 FAILED - RETRYING: Wait for the host to be up (120 retries left).
18:55:35 FAILED - RETRYING: Wait for the host to be up (119 retries left).
18:55:51 FAILED - RETRYING: Wait for the host to be up (118 retries left).
18:56:06 FAILED - RETRYING: Wait for the host to be up (117 retries left).
18:56:22 FAILED - RETRYING: Wait for the host to be up (116 retries left).
18:56:37 FAILED - RETRYING: Wait for the host to be up (115 retries left).
18:56:53 FAILED - RETRYING: Wait for the host to be up (114 retries left).
18:57:09 FAILED - RETRYING: Wait for the host to be up (113 retries left).
18:57:24 FAILED - RETRYING: Wait for the host to be up (112 retries left).
18:57:42 FAILED - RETRYING: Wait for the host to be up (111 retries left).
18:57:58 FAILED - RETRYING: Wait for the host to be up (110 retries left).
18:58:13 FAILED - RETRYING: Wait for the host to be up (109 retries left).
18:58:29 FAILED - RETRYING: Wait for the host to be up (108 retries left).
18:58:44 FAILED - RETRYING: Wait for the host to be up (107 retries left).
18:59:00 FAILED - RETRYING: Wait for the host to be up (106 retries left).
18:59:15 FAILED - RETRYING: Wait for the host to be up (105 retries left).
18:59:31 FAILED - RETRYING: Wait for the host to be up (104 retries left).
18:59:46 FAILED - RETRYING: Wait for the host to be up (103 retries left).
19:00:02 FAILED - RETRYING: Wait for the host to be up (102 retries left).
19:00:21 FAILED - RETRYING: Wait for the host to be up (101 retries left).
19:04:00 fatal: [lynx25.lab.eng.tlv2.redhat.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host lynx25.lab.eng.tlv2.redhat.com port 22: No route to host", "unreachable": true}
19:04:00 
19:04:00 RUNNING HANDLER [ci-map : yum-clean-all] ***************************************
19:08:43 fatal: [lynx25.lab.eng.tlv2.redhat.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Connection timed out during banner exchange", "unreachable": true}
19:08:43 
19:08:43 PLAY RECAP *********************************************************************
19:08:43 hosted-engine-07.lab.eng.tlv2.redhat.com : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
19:08:43 localhost                  : ok=5    changed=2    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
19:08:43 lynx25.lab.eng.tlv2.redhat.com : ok=315  changed=109  unreachable=2    failed=0    skipped=146  rescued=0    ignored=4   
19:08:43 lynx26.lab.eng.tlv2.redhat.com : ok=23   changed=11   unreachable=0    failed=0    skipped=12   rescued=0    ignored=1   
19:08:43 lynx27.lab.eng.tlv2.redhat.com : ok=23   changed=11   unreachable=0    failed=0
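The failing 'Wait for the host to be up' task above is a standard Ansible retries/delay poll (120 retries, roughly 15 s apart in the log). A minimal pure-Python sketch of that loop, where check_host_up is a hypothetical stand-in for the SDK host-status query, not the role's actual implementation:

```python
import time

def wait_for_host_up(check_host_up, retries=120, delay=15, sleep=time.sleep):
    """Poll until check_host_up() returns True, mirroring Ansible's
    retries/delay semantics for the 'Wait for the host to be up' task.
    Returns True if the host came up, False if retries were exhausted."""
    for attempt in range(retries):
        if check_host_up():
            return True
        # Ansible logs: FAILED - RETRYING: ... (N retries left).
        print("FAILED - RETRYING: Wait for the host to be up "
              "(%d retries left)." % (retries - attempt - 1))
        sleep(delay)
    return False
```

In the failure above, the host never comes back because it was unexpectedly rebooted, so the loop exhausts its retries and the play fails with UNREACHABLE.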

Comment 8 Wei Wang 2021-04-02 07:41:37 UTC
I tried to reproduce this bug with rhvh-4.4.4.2-0.20210307.0+1:

1. Clean install rhvh-4.4.4.2-0.20210307.0+1
2. yum install rhvm-appliance-4.4-20201117.0.el8ev.x86_64
3. cd /usr/share/ansible/collections/ansible_collections/redhat/rhv/roles/hosted_engine_setup/examples
4. Modify passwords.yml and nfs_deployment.json
5. ansible-playbook with hosted_engine_deploy_localhost.yml

Result:
Hosted engine deploy was successful.
[root@hp-xxxxxx-xx examples]# hosted-engine --vm-status


--== Host hp-xxxxxx-xx.lab.eng.pek2.redhat.com (id: 1) status ==--

Host ID                            : 1
Host timestamp                     : 7201
Score                              : 3400
Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Hostname                           : hp-xxxxxx-xx.lab.eng.pek2.redhat.com
Local maintenance                  : False
stopped                            : False
crc32                              : 37b4d8e7
conf_on_shared_storage             : True
local_conf_timestamp               : 7201
Status up-to-date                  : True
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=7201 (Fri Apr  2 12:45:06 2021)
	host-id=1
	score=3400
	vm_conf_refresh_time=7201 (Fri Apr  2 12:45:06 2021)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False

QE cannot reproduce this bug. I am not sure whether my steps are right.

@Roni,
Could you please check my steps, or give me the detailed steps? Thanks!

Comment 9 Roni 2021-04-03 06:41:31 UTC
Hi Wei

The question is not whether the whole process succeeded, because that depends on the boot time of your host;
you will need a host with a long boot process to see the issue.
Instead, please check whether the host was rebooted, which should not happen because the
key 'reboot_after_installation' is set to 'false'.

Note that this bug is already fixed in
python3-ovirt-engine-sdk4-4.4.10-1.el8.x86_64

Thx
Roni

Comment 10 Guilherme Santos 2021-04-07 19:04:48 UTC
I'm verifying this bug, as it has already been fixed in python3-ovirt-engine-sdk4-4.4.10-1.el8.x86_64, present in ovirt-engine-4.4.5.11.

