Bug 1784010

Summary: [rhv-4.4.0-9] Right after adding host to engine - Failed to execute Ansible host-deploy role: null with host unreachable
Product: [oVirt] ovirt-engine Reporter: Avihai <aefrat>
Component: ovirt-host-deploy-ansible Assignee: Dana <delfassy>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Matyáš <pmatyas>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.4.0 CC: bugs, mperina, omachace
Target Milestone: ovirt-4.4.0 Flags: pm-rhel: ovirt-4.4+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-20 20:04:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1766595    
Attachments:
Description Flags
engine vdsm and host deploy logs none
ansible_runner_error_log none

Description Avihai 2019-12-16 13:46:37 UTC
Created attachment 1645593
engine vdsm and host deploy logs

Description of problem:
The engine was upgraded from rhv-4.3.8.1 to rhv-4.4.0.9.
The host was reprovisioned to RHEL 8.1 and upgraded according to the 4.4.0.9 build mail.

Adding a host to the engine fails at the ovirt-host-deploy stage: the deployment fails right after host-deploy starts, with "No route to host" errors.
I checked, and connectivity between the engine and the host is good (both ping and ssh).

I tried with 2 hosts and saw the same issue.

On both hosts ping/ssh connectivity from engine to host is OK:

[root@storage-ge-08 ~]# ping storage-ge8-vdsm2.scl.lab.tlv.redhat.com
PING storage-ge8-vdsm2.scl.lab.tlv.redhat.com (10.35.82.80) 56(84) bytes of data.
64 bytes from storage-ge8-vdsm2.scl.lab.tlv.redhat.com (10.35.82.80): icmp_seq=1 ttl=63 time=1.07 ms
64 bytes from storage-ge8-vdsm2.scl.lab.tlv.redhat.com (10.35.82.80): icmp_seq=2 ttl=63 time=0.772 ms
^C
--- storage-ge8-vdsm2.scl.lab.tlv.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.772/0.925/1.079/0.156 ms
[root@storage-ge-08 ~]# ssh root.lab.tlv.redhat.com
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Mon Dec 16 13:48:54 2019 from 10.35.162.7
[root@storage-ge8-vdsm2 ~]# 
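
A successful ping only shows that ICMP gets through; a firewall that rejects a specific TCP port can still surface as "No route to host" on the connecting side. The following is a minimal sketch, not part of the original report, of a per-port TCP probe that could be run from the engine. The function name `probe_tcp` is hypothetical, and the ports to probe (for example 22 for SSH and 54321, the usual VDSM port) are assumptions about this setup.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (e.g. packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is what Java reports as NoRouteToHostException.
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return "error: {}".format(exc)
```

For example, `probe_tcp("storage-ge8-vdsm2.scl.lab.tlv.redhat.com", 22)` run on the engine would show whether the SSH port itself is reachable, independently of ping.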


Engine log:
2019-12-16 13:40:17,458+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3737) [34f4b46a] EVENT_ID: VDS_ANSIBLE_INSTALL_STARTED(560), Ansible host-deploy playbook execution has started on host host_mixed_2.
2019-12-16 13:40:19,046+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [5b933867] Connecting to storage-ge8-vdsm1.scl.lab.tlv.redhat.com/10.35.82.79
2019-12-16 13:40:19,048+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [] Unable to RefreshCapabilities: NoRouteToHostException: No route to host
2019-12-16 13:40:19,049+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = host_mixed_11, VdsIdAndVdsVDSCommandParametersBase:{hostId='9d3b15b0-0723-4671-b88a-80de4c666ec0', vds='Host[host_mixed_11,9d3b15b0-0723-4671-b88a-80de4c666ec0]'})' execution failed: java.net.NoRouteToHostException: No route to host
2019-12-16 13:40:22,056+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [5b933867] Connecting to storage-ge8-vdsm1.scl.lab.tlv.redhat.com/10.35.82.79
2019-12-16 13:40:22,058+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-80) [] Unable to RefreshCapabilities: NoRouteToHostException: No route to host
2019-12-16 13:40:22,059+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-80) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = host_mixed_11, VdsIdAndVdsVDSCommandParametersBase:{hostId='9d3b15b0-0723-4671-b88a-80de4c666ec0', vds='Host[host_mixed_11,9d3b15b0-0723-4671-b88a-80de4c666ec0]'})' execution failed: java.net.NoRouteToHostException: No route to host
2019-12-16 13:40:24,071+02 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-3737) [34f4b46a] Host installation failed for host '1e38ff69-3c73-4690-8eac-ea20300bdc12', 'host_mixed_2': Failed to execute Ansible host-deploy role: null. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20191216134017-storage-ge8-vdsm2.scl.lab.tlv.redhat.com-34f4b46a.log
2019-12-16 13:40:24,075+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-3737) [34f4b46a] START, SetVdsStatusVDSCommand(HostName = host_mixed_2, SetVdsStatusVDSCommandParameters:{hostId='1e38ff69-3c73-4690-8eac-ea20300bdc12', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 7ad559c0
2019-12-16 13:40:24,082+02 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-3737) [34f4b46a] FINISH, SetVdsStatusVDSCommand, return: , log id: 7ad559c0
2019-12-16 13:40:24,092+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3737) [34f4b46a] EVENT_ID: VDS_INSTALL_FAILED(505), Host host_mixed_2 installation failed. Failed to execute Ansible host-deploy role: null. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20191216134017-storage-ge8-vdsm2.scl.lab.tlv.redhat.com-34f4b46a.log.
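
The engine log above interleaves entries for two different hosts, so it can help to isolate the entries that belong to one installation by its correlation ID (here `34f4b46a`) and pull out the host-deploy log path they reference. The following is a rough sketch under stated assumptions: it expects each log entry on a single line (terminal wrapping undone), the entry layout is inferred only from the lines quoted in this report, and the names `entries_for` and `deploy_log_paths` are hypothetical helpers, not part of ovirt-engine.

```python
import re

# Assumed entry layout, inferred from the excerpt in this report:
# "<date> <time>,<ms>+TZ LEVEL [logger] (thread) [correlation-id] message"
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)

# Matches a host-deploy log path up to its first ".log" suffix, so a
# sentence-ending dot after the path is not swallowed.
DEPLOY_LOG_RE = re.compile(r"/var/log/ovirt-engine/host-deploy/\S+?\.log")


def entries_for(lines, correlation_id):
    """Yield parsed entries whose correlation ID equals correlation_id."""
    for line in lines:
        match = ENTRY_RE.match(line)
        if match and match.group("corr") == correlation_id:
            yield match.groupdict()


def deploy_log_paths(lines, correlation_id):
    """Return the distinct host-deploy log paths named in matching entries."""
    paths = []
    for entry in entries_for(lines, correlation_id):
        for path in DEPLOY_LOG_RE.findall(entry["msg"]):
            if path not in paths:
                paths.append(path)
    return paths
```

Calling `deploy_log_paths(open("/var/log/ovirt-engine/engine.log"), "34f4b46a")` would then list the file to inspect, assuming the engine log lives at that path.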

Host deploy log: 
cat /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20191216134017-storage-ge8-vdsm2.scl.lab.tlv.redhat.com-34f4b46a.log.
cat: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20191216134017-storage-ge8-vdsm2.scl.lab.tlv.redhat.com-34f4b46a.log.: No such file or directory
[root@storage-ge-08 ~]# cat /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20191216134017-storage-ge8-vdsm2.scl.lab.tlv.redhat.com-34f4b46a.log
2019-12-16 13:40:24 IST - TASK [Gathering Facts] *********************************************************
2019-12-16 13:40:24 IST - PLAY RECAP *********************************************************************
storage-ge8-vdsm2.scl.lab.tlv.redhat.com : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
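
In the recap above, `unreachable=1` with `ok=0` during "Gathering Facts" means Ansible could not connect to the host at all, as opposed to a task failing on it. As a small illustration (not taken from the bug itself), the sketch below parses a PLAY RECAP line into counters and lists the hosts marked unreachable. The helper names `parse_recap_line` and `unreachable_hosts` are hypothetical, and the line format is assumed to match the one shown in this log.

```python
import re

# "<host> : ok=0 changed=0 unreachable=1 ..." as shown in the recap above.
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap_line(line):
    """Return (host, {counter: value}) for a recap line, or None otherwise."""
    match = RECAP_RE.match(line.strip())
    if not match:
        return None
    counters = {}
    for pair in match.group("counters").split():
        key, value = pair.split("=")
        counters[key] = int(value)
    return match.group("host"), counters


def unreachable_hosts(lines):
    """Return the hosts whose recap line reports unreachable > 0."""
    hosts = []
    for line in lines:
        parsed = parse_recap_line(line)
        if parsed and parsed[1].get("unreachable", 0) > 0:
            hosts.append(parsed[0])
    return hosts
```

Timestamped lines such as the `TASK [Gathering Facts]` banner do not match the pattern and are skipped.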

Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-4.4.0-0.9.master.el7.noarch
ansible-2.9.2-1.el7ae.noarch

Host:
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
libvirt-client-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64
libvirt-lock-sanlock-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64
qemu-img-4.1.0-14.module+el8.1.0+4548+ed1300f4.x86_64


How reproducible:
100% 

Steps to Reproduce:
1. Upgrade the engine and hosts from rhv-4.3.8.1 to rhv-4.4.0.9.
2. Add a host to the engine via webadmin.


Actual results:
ovirt-host-deploy fails right after host-deploy starts, with "No route to host" errors.

The host goes to an unresponsive state.

Expected results:


Additional info:

Comment 1 Dana 2019-12-16 14:56:31 UTC
Hi Avihai,
Can you please copy the content of /etc/ansible-runner-service/config.yaml and the output of journalctl -u ansible-runner-service?
Thanks!

Comment 2 Avihai 2019-12-17 11:12:10 UTC
(In reply to Dana from comment #1)
> Hi Avihai,
> Can you please copy the content of /etc/ansible-runner-service/config.yaml
> and the output of journalctl -u ansible-runner-service?
> Thanks!
Sure, here goes:

[root@storage-ge-08 ~]# cat /etc/ansible-runner-service/config.yaml

version: 1
playbooks_root_dir: '/usr/share/ovirt-engine/ansible-runner-service-project'
ssh_private_key: '/etc/pki/ovirt-engine/keys/engine_id_rsa'
port: 50001
target_user: root
 
[root@storage-ge-08 ~]# journalctl -u ansible-runner-service
-- No entries --

Comment 3 Dana 2019-12-18 15:59:38 UTC
Can you please attach the log from /var/log/httpd/ansible_runner_service_error_log?

Comment 4 Avihai 2019-12-19 06:34:41 UTC
Created attachment 1646320
ansible_runner_error_log

Comment 5 RHV bug bot 2020-01-08 14:47:41 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'ovirt-engine-4.4.0' doesn't contain patch 'https://gerrit.ovirt.org/106063']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.4.0

For more info please contact: infra

Comment 6 RHV bug bot 2020-01-08 15:17:06 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'ovirt-engine-4.4.0' doesn't contain patch 'https://gerrit.ovirt.org/106063']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.4.0

For more info please contact: infra

Comment 7 Petr Matyáš 2020-01-20 15:53:52 UTC
Verified on ovirt-host-deploy-common-1.9.0-0.0.master.20191128124417.gitd2b9fa5.el7ev.noarch

Comment 8 Sandro Bonazzola 2020-05-20 20:04:03 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.