Description of problem:
Fetch host facts failed during Ansible deployment via the CLI.

From the CLI:
"""
[ INFO ] TASK [Remove host-deploy configuration file]
[ INFO ] changed: [localhost]
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:
Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v3
Please specify the full shared storage connection path to use (example: host:/path): 10.66.148.11:/home/jiawu/nfs3
If needed, specify additional mount options for the connection to the hosted-engine storagedomain []:
[ INFO ] Creating Storage Domain
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch host facts]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "ibm-x3650m5-06.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "disable", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=ibm-x3650m5-06.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/6703ac4c-210a-11e8-821c-5254005d2164", "id": "6703ac4c-210a-11e8-821c-5254005d2164"}, "comment": "", "cpu": {"name": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz", "speed": 2200.0, "topology": {"cores": 10, "sockets": 1, "threads": 2}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"family": "System X", "manufacturer": "LENOVO", "product_name": "System x3650 M5: -[8871AC1]-", "serial_number": "J33R5G8", "supported_rng_sources": ["hwrng", "random"], "uuid": "C7DD25D8-E277-11E7-B946-0894EF59EF94", "version": "13"}, "hooks": [], "href": "/ovirt-engine/api/hosts/9ade7ed6-a693-445c-854e-5e7878b5c8b8", "id": "9ade7ed6-a693-445c-854e-5e7878b5c8b8", "iscsi": {"initiator": "iqn.1994-05.com.redhat:6687044bc65d"}, "katello_errata": [], "kdump_status": "disabled", "ksm": {"enabled": false}, "libvirt_version": {"build": 0, "full_version": "libvirt-3.9.0-12.el7", "major": 3, "minor": 9, "revision": 0}, "max_scheduling_memory": 32567721984, "memory": 32972472320, "name": "ibm-x3650m5-06.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": "", "reported_kernel_cmdline": "BOOT_IMAGE=/rhvh-4.2.1.4-0.20180305.0+1/vmlinuz-3.10.0-845.el7.x86_64 root=/dev/rhvh_ibm-x3650m5-06/rhvh-4.2.1.4-0.20180305.0+1 ro crashkernel=auto rd.lvm.lv=rhvh_ibm-x3650m5-06/swap rd.lvm.lv=rhvh_ibm-x3650m5-06/rhvh-4.2.1.4-0.20180305.0+1 rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.2.1.4-0.20180305.0+1", "type": "RHEL", "version": {"full_version": "7.5 - 1.4.el7", "major": 7, "minor": 5}}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {"mode": "enforcing"}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:625KHS+TT2BvUNftxNpGuESOvZ7Ito31fBKTnmAiZBI", "port": 22}, "statistics": [], "status": "non_responsive", "storage_connection_extensions": [], "summary": {"active": 1, "migrating": 0, "total": 1}, "tags": [], "transparent_huge_pages": {"enabled": true}, "type": "ovirt_node", "unmanaged_networks": [], "update_available": false, "version": {"build": 20, "full_version": "vdsm-4.20.20-1.el7ev", "major": 4, "minor": 20, "revision": 0}}]}, "attempts": 50, "changed": false}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:
Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]:
Please specify the full shared storage connection path to use (example: host:/path): 10.66.148.11:/home/jiawu/nfs3
If needed, specify additional mount options for the connection to the hosted-engine storagedomain []:
"""

Version-Release number of selected component (if applicable):
cockpit-dashboard-160-3.el7.x86_64
cockpit-system-160-3.el7.noarch
cockpit-bridge-160-3.el7.x86_64
cockpit-ws-160-3.el7.x86_64
cockpit-storaged-160-3.el7.noarch
cockpit-ovirt-dashboard-0.11.14-0.1.el7ev.noarch
cockpit-160-3.el7.x86_64
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
ansible-2.4.3.0-1.el7ae.noarch
rhvh-4.2.1.4-0.20180305.0+1

How reproducible:
30%

Steps to Reproduce:
1. Clean install the latest RHVH 4.2 with ks (rhvh-4.2.1.4-0.20180305.0+1)
2. Deploy HE via the CLI-based Ansible deployment

Actual results:
Same as in the description.

Expected results:
The Ansible deployment succeeds.

Additional info:
On redeployment, Fetch host facts failed again. From the CLI:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 20, "changed": false}
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
Created attachment 1405093
/var/log/*
Deployment process: http://pastebin.test.redhat.com/562322
For logs, see attachment 1405093.
The issue is here:
"status": "non_responsive"

The host went up, then after some time the engine VM lost network communication with the host, which was then marked as non_responsive.
The host is still able to open a connection to the engine VM and check its status.

This is an exact duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1549642
That bug can lead to different results depending on when engine-host communication broke.

*** This bug has been marked as a duplicate of bug 1549642 ***
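For anyone triaging similar output, here is a minimal Python sketch that pulls the host status out of the JSON printed after "FAILED! =>" by the Fetch host facts task. The function name summarize_host_facts and the trimmed sample payloads are hypothetical; only the keys ansible_facts, ovirt_hosts, name and status are taken from the output quoted in this report. It only distinguishes the two results seen here (a host reported as non_responsive, and an empty host list) and does not reproduce the check the playbook itself performs.

```python
import json


def summarize_host_facts(result):
    """Summarize the dict printed after 'FAILED! =>' for 'Fetch host facts'."""
    hosts = result.get("ansible_facts", {}).get("ovirt_hosts", [])
    if not hosts:
        # Matches the redeployment case in this report: "ovirt_hosts": []
        return "empty host list"
    return ", ".join(
        f"{host.get('name', '?')}: {host.get('status', 'unknown')}" for host in hosts
    )


# Trimmed copies of the two failures quoted in this report (most keys omitted).
first_attempt = json.loads(
    '{"ansible_facts": {"ovirt_hosts": [{"name": '
    '"ibm-x3650m5-06.lab.eng.pek2.redhat.com", "status": "non_responsive"}]}, '
    '"attempts": 50, "changed": false}'
)
redeployment = json.loads(
    '{"ansible_facts": {"ovirt_hosts": []}, "attempts": 20, "changed": false}'
)

print(summarize_host_facts(first_attempt))
print(summarize_host_facts(redeployment))
```

Running it against the full JSON from the first failure should give the same summary, since the extra keys are ignored.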