Bug 1552027 - Fetch host facts failed with ansible deployment via CLI
Summary: Fetch host facts failed with ansible deployment via CLI
Keywords:
Status: CLOSED DUPLICATE of bug 1549642
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-06 11:09 UTC by Yihui Zhao
Modified: 2018-03-07 08:18 UTC (History)
CC: 12 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-03-07 08:18:27 UTC
oVirt Team: Node
Embargoed:


Attachments (Terms of Use)
/var/log/* (472.58 KB, application/x-bzip)
2018-03-07 02:48 UTC, Yihui Zhao

Description Yihui Zhao 2018-03-06 11:09:59 UTC
Description of problem: 
Fetching host facts fails during Ansible deployment via the CLI.

From the CLI:
"""
[ INFO  ] TASK [Remove host-deploy configuration file]
[ INFO  ] changed: [localhost]
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v3
          Please specify the full shared storage connection path to use (example: host:/path): 10.66.148.11:/home/jiawu/nfs3
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: 
[ INFO  ] Creating Storage Domain
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch host facts]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "ibm-x3650m5-06.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "disable", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=ibm-x3650m5-06.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/6703ac4c-210a-11e8-821c-5254005d2164", "id": "6703ac4c-210a-11e8-821c-5254005d2164"}, "comment": "", "cpu": {"name": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz", "speed": 2200.0, "topology": {"cores": 10, "sockets": 1, "threads": 2}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"family": "System X", "manufacturer": "LENOVO", "product_name": "System x3650 M5: -[8871AC1]-", "serial_number": "J33R5G8", "supported_rng_sources": ["hwrng", "random"], "uuid": "C7DD25D8-E277-11E7-B946-0894EF59EF94", "version": "13"}, "hooks": [], "href": "/ovirt-engine/api/hosts/9ade7ed6-a693-445c-854e-5e7878b5c8b8", "id": "9ade7ed6-a693-445c-854e-5e7878b5c8b8", "iscsi": {"initiator": "iqn.1994-05.com.redhat:6687044bc65d"}, "katello_errata": [], "kdump_status": "disabled", "ksm": {"enabled": false}, "libvirt_version": {"build": 0, "full_version": "libvirt-3.9.0-12.el7", "major": 3, "minor": 9, "revision": 0}, "max_scheduling_memory": 32567721984, "memory": 32972472320, "name": "ibm-x3650m5-06.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": "", "reported_kernel_cmdline": "BOOT_IMAGE=/rhvh-4.2.1.4-0.20180305.0+1/vmlinuz-3.10.0-845.el7.x86_64 root=/dev/rhvh_ibm-x3650m5-06/rhvh-4.2.1.4-0.20180305.0+1 ro crashkernel=auto rd.lvm.lv=rhvh_ibm-x3650m5-06/swap rd.lvm.lv=rhvh_ibm-x3650m5-06/rhvh-4.2.1.4-0.20180305.0+1 rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.2.1.4-0.20180305.0+1", "type": "RHEL", "version": {"full_version": "7.5 
- 1.4.el7", "major": 7, "minor": 5}}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {"mode": "enforcing"}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:625KHS+TT2BvUNftxNpGuESOvZ7Ito31fBKTnmAiZBI", "port": 22}, "statistics": [], "status": "non_responsive", "storage_connection_extensions": [], "summary": {"active": 1, "migrating": 0, "total": 1}, "tags": [], "transparent_huge_pages": {"enabled": true}, "type": "ovirt_node", "unmanaged_networks": [], "update_available": false, "version": {"build": 20, "full_version": "vdsm-4.20.20-1.el7ev", "major": 4, "minor": 20, "revision": 0}}]}, "attempts": 50, "changed": false}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: 
          Please specify the full shared storage connection path to use (example: host:/path): 10.66.148.11:/home/jiawu/nfs3
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain []: 
"""
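The `"attempts": 50` in the error above indicates the "Fetch host facts" task is a retry loop that keeps polling the engine until the host looks usable. A minimal sketch of such a loop follows; the module name, variable names, delay, and exact `until` condition are assumptions for illustration, not taken from the actual hosted-engine-setup playbook:

```yaml
# Hypothetical sketch of a "Fetch host facts" retry loop.
# Module name, variables, and timings are assumed, not from the real playbook.
- name: Fetch host facts
  ovirt_hosts_facts:
    pattern: "name={{ he_host_name }}"   # he_host_name is a placeholder
    auth: "{{ ovirt_auth }}"
  register: host_result
  retries: 50    # matches the "attempts": 50 seen in the error output
  delay: 10
  until: >-
    host_result.ansible_facts.ovirt_hosts
    | selectattr('status', 'equalto', 'up') | list | length >= 1
```

With a condition of this shape, a host stuck in `non_responsive` (as in the JSON above) never satisfies the `until` clause, so the task exhausts all retries and fails even though the host is present in `ovirt_hosts`.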



Version-Release number of selected component (if applicable): 
cockpit-dashboard-160-3.el7.x86_64
cockpit-system-160-3.el7.noarch
cockpit-bridge-160-3.el7.x86_64
cockpit-ws-160-3.el7.x86_64
cockpit-storaged-160-3.el7.noarch
cockpit-ovirt-dashboard-0.11.14-0.1.el7ev.noarch
cockpit-160-3.el7.x86_64
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
ansible-2.4.3.0-1.el7ae.noarch
rhvh-4.2.1.4-0.20180305.0+1

How reproducible: 
30%


Steps to Reproduce: 
1. Clean install the latest RHVH 4.2 with a kickstart file (rhvh-4.2.1.4-0.20180305.0+1)
2. Deploy the hosted engine via the CLI-based Ansible deployment

Actual results: 
The same as in the description.


Expected results: 
The Ansible deployment completes successfully.


Additional info:
On redeployment, fetching host facts also failed. From the CLI:

[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 20, "changed": false}
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}

Comment 1 Yihui Zhao 2018-03-07 02:48:00 UTC
Created attachment 1405093 [details]
/var/log/*

Comment 2 Yihui Zhao 2018-03-07 02:52:28 UTC
Deployment process:
http://pastebin.test.redhat.com/562322

For logs, see attachment 1405093 [details]

Comment 3 Simone Tiraboschi 2018-03-07 08:18:27 UTC
The issue is here:
"status": "non_responsive"

The host came up, but after some time the engine VM lost network communication with the host, which was then marked as non_responsive.

The host is still able to open connections to the engine VM and check its status.

This is exactly a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=1549642

which can lead to different outcomes depending on when engine-host communication broke.

*** This bug has been marked as a duplicate of bug 1549642 ***

