Bug 1539560 - Deploy HE failed via cockpit based ansible deployment.
Summary: Deploy HE failed via cockpit based ansible deployment.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: cockpit-ovirt
Classification: oVirt
Component: Hosted Engine
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Phillip Bailey
QA Contact: Yihui Zhao
URL:
Whiteboard:
Duplicates: 1540927
Depends On:
Blocks: 1525853 1540621
 
Reported: 2018-01-29 08:12 UTC by Yihui Zhao
Modified: 2018-03-29 11:00 UTC
CC List: 16 users

Fixed In Version: cockpit-ovirt-0.11.17-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-29 11:00:37 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+
cshao: testing_ack+


Attachments
setup_failed.png (81.73 KB, image/png)
2018-01-29 08:12 UTC, Yihui Zhao
no flags
static_config.png (62.70 KB, image/png)
2018-03-06 05:38 UTC, Yihui Zhao
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1549642 0 unspecified CLOSED Race condition between host up at engine eyes and SuperVdsm.ServerCallback::add_sourceroute on DHCP configured hosts 2021-02-22 00:41:40 UTC
oVirt gerrit 86579 0 'None' MERGED ansible: install engine appliance 2020-03-15 12:42:20 UTC
oVirt gerrit 88517 0 'None' MERGED wizard: Consolidate network and VM steps 2020-03-15 12:42:20 UTC
oVirt gerrit 88675 0 'None' MERGED wizard: Consolidate network and VM steps 2020-03-15 12:42:20 UTC

Internal Links: 1549642

Description Yihui Zhao 2018-01-29 08:12:40 UTC
Created attachment 1387599 [details]
setup_failed.png

Description of problem: 
Deploying HE via the cockpit-based ansible deployment fails.


Version-Release number of selected component (if applicable): 
cockpit-ws-157-1.el7.x86_64
cockpit-bridge-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.6-0.1.el7ev.noarch
cockpit-system-157-1.el7.noarch
ovirt-hosted-engine-setup-2.2.8-2.el7ev.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
rhvm-appliance-4.2-20180125.0.el7.noarch
rhvh-4.2.1.2-0.20180126.0+1


How reproducible: 
100% 


Steps to Reproduce: 
1. Clean install of the latest RHVH 4.2.1 with kickstart (rhvh-4.2.1.2-0.20180126.0+1)
2. Deploy HE via the cockpit-based ansible deployment


Actual results: 
After step 2, from the cockpit:

"""
TASK [Extract appliance to local vm dir]
TASK [Extract appliance to local vm dir]
fatal: [localhost]: FAILED! => {"changed": false, "msg": "src (or content) and dest are required"}
fatal: [localhost]: FAILED! => {"changed": false, "msg": "src (or content) and dest are required"}
"""


Expected results: 
HE deploys successfully via the cockpit-based ansible deployment.


Additional info: 
There is no setup log in the dir "/var/log/ovirt-hosted-engine-setup/"
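
(When no setup log is written at all, the journal on the host may be the only place the run left traces; this is just a guess at a useful filter, not a documented log location:)

# look for hosted-engine / ansible related messages from the current boot
journalctl -b | grep -iE 'ansible|ovirt-hosted-engine' | tail -n 50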

Comment 1 Red Hat Bugzilla Rules Engine 2018-01-31 03:27:13 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Simone Tiraboschi 2018-01-31 14:03:40 UTC
Fixed by https://gerrit.ovirt.org/#/c/86579/, which is not in ovirt-hosted-engine-setup-2.2.8.

Workaround: manually install the appliance RPM before trying to deploy from cockpit.
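
A minimal sketch of that workaround, assuming the appliance package names already referenced in this report (rhvm-appliance on RHV; ovirt-engine-appliance on plain oVirt):

# install the engine appliance on the host before starting the deployment
yum install -y rhvm-appliance      # on oVirt: yum install -y ovirt-engine-appliance
rpm -q rhvm-appliance              # confirm the package is present before retrying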

Comment 3 Yaniv Lavi 2018-02-01 13:05:28 UTC
(In reply to Simone Tiraboschi from comment #2)
> Fixed by https://gerrit.ovirt.org/#/c/86579/ which is not in
> ovirt-hosted-engine-setup-2.2.8
> 
> Workaround: manually install the appliance RPM before trying deploying from
> cockpit.

Is this in oVirt 4.2.1?

Comment 4 Simone Tiraboschi 2018-02-01 15:38:11 UTC
(In reply to Yaniv Lavi from comment #3)
> Is this in oVirt 4.2.1?

Yes, the fix will be in ovirt-hosted-engine-setup-2.2.9.

Comment 5 Yihui Zhao 2018-02-02 06:06:21 UTC
(In reply to Simone Tiraboschi from comment #4)
> (In reply to Yaniv Lavi from comment #3)
> > Is this in oVirt 4.2.1?
> 
> Yes the fix will be in ovirt-hosted-engine-setup-2.2.9

Simone, 
 I already tested with ovirt-hosted-engine-setup-2.2.9, but it also failed.

Test versions:
cockpit-ws-157-1.el7.x86_64
cockpit-bridge-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.9-0.1.el7ev.noarch
cockpit-system-157-1.el7.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
rhvh-4.2.1.2-0.20180201.0+1
rhvm-appliance-4.2-20180125.0.el7.noarch

Test result:
"""
TASK [Obtain SSO token using username/password credentials]
ok: [localhost]
TASK [Add host]
changed: [localhost]
TASK [Wait for the host to become non operational]
fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 150, "changed": false}
"""

Additional info:
The CLI shows the same issue:
 INFO  ] TASK [Wait for the host to become non operational]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 150, "changed": false}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180202010400.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180202005142-9d4z9f.log
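
(For reference, one quick way to pull the underlying error out of that log, using nothing more than grep; the timestamped file name differs per run:)

grep -E 'ERROR|FAILED!' /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log | tail -n 40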

Comment 6 Yihui Zhao 2018-02-07 08:52:42 UTC
Update :
Test version:
cockpit-system-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.11-0.1.el7ev.noarch
cockpit-bridge-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-157-1.el7.x86_64
cockpit-ws-157-1.el7.x86_64
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
rhvh-4.2.1.2-0.20180205.0+1

Test steps:
Deploy HE via the cockpit-based ansible deployment:


Test result:
The deployment failed.

""""
TASK [Add host]
changed: [localhost]
TASK [Wait for the host to become non operational]
ok: [localhost]
TASK [Get virbr0 routing configuration]
changed: [localhost]
TASK [Get ovirtmgmt route table id]
changed: [localhost]
TASK [Check network configuration]
changed: [localhost]
TASK [Clean network configuration]
skipping: [localhost]
TASK [Restore network configuration]
changed: [localhost]
TASK [Wait for the host to be up]
ok: [localhost]
TASK [Persist ovirtmngmt network configuration]
changed: [localhost]
TASK [Remove host-deploy configuration file]
changed: [localhost]
"""

From the engine log:

"""
2018-02-07 16:21:52,536+08 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [2fc07076] FINISH, HostSetupNetworksVDSCommand, log id: 4209d1a9
2018-02-07 16:22:11,000+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "a8c9e016-66bd-48ac-b9b6-99886f569b1b"
2018-02-07 16:22:11,000+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "bdebcca9-d79b-4b4b-b2ff-e231e82a7e69"
2018-02-07 16:22:11,042+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "96de56c2-1883-4692-9c71-3f74713f0a54"
2018-02-07 16:22:11,281+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "144db715-9687-47cc-b3c5-0a2eea20a150"
2018-02-07 16:22:11,625+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "bba75b5c-5fbb-45a6-9b25-97a779068884"
2018-02-07 16:22:11,627+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "0db6240f-bae5-45fd-9322-eb50978beb8d"
2018-02-07 16:22:12,628+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "1f9f2100-ff9b-453e-bb64-f4662a72c2c2"
2018-02-07 16:22:14,137+08 ERROR [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not able to update response for "05587fa5-2799-4376-a55d-f16f58e16de5"


2018-02-07 16:22:22,017+08 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [24b5d77c] Can not run fence action on host 'dell-per515-02.lab.eng.pek2.redhat.com', no suitable proxy host was found.


2018-02-07 16:34:53,750+08 ERROR [org.ovirt.engine.core.bll.host.HostUpgradeManager] (EE-ManagedThreadFactory-hostUpdatesChecker-Thread-1) [] Failed to run check-update of host 'dell-per515-02.lab.eng.pek2.redhat.com'.
2018-02-07 16:34:53,750+08 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker] (EE-ManagedThreadFactory-hostUpdatesChecker-Thread-1) [] Failed to check if updates are available for host 'dell-per515-02.lab.eng.pek2.redhat.com' with error message 'Failed to run check-update of host 'dell-per515-02.lab.eng.pek2.redhat.com'.'
2018-02-07 16:34:53,753+08 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-hostUpdatesChecker-Thread-1) [] EVENT_ID: HOST_AVAILABLE_UPDATES_FAILED(839), Failed to check for available updates on host dell-per515-02.lab.eng.pek2.redhat.com with message 'Failed to run check-update of host 'dell-per515-02.lab.eng.pek2.redhat.com'.'.
"""



Additional info:
[root@dell-per515-02 log]# ping rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com
PING rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com (192.168.124.70) 56(84) bytes of data.
64 bytes from rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com (192.168.124.70): icmp_seq=1 ttl=64 time=0.197 ms
64 bytes from rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com (192.168.124.70): icmp_seq=2 ttl=64 time=0.147 ms
^C
--- rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.147/0.172/0.197/0.025 ms


ping dell-per515-02.lab.eng.pek2.redhat.com
PING dell-per515-02.lab.eng.pek2.redhat.com (10.73.73.19) 56(84) bytes of data.
64 bytes from dell-per515-02.lab.eng.pek2.redhat.com (10.73.73.19): icmp_seq=1 ttl=64 time=0.122 ms
64 bytes from dell-per515-02.lab.eng.pek2.redhat.com (10.73.73.19): icmp_seq=2 ttl=64 time=0.136 ms
64 bytes from dell-per515-02.lab.eng.pek2.redhat.com (10.73.73.19): icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from dell-per515-02.lab.eng.pek2.redhat.com (10.73.73.19): icmp_seq=4 ttl=64 time=0.098 ms
^C
--- dell-per515-02.lab.eng.pek2.redhat.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.098/0.119/0.136/0.019 ms

Comment 7 Ryan Barry 2018-02-20 14:15:01 UTC
*** Bug 1540927 has been marked as a duplicate of this bug. ***

Comment 8 Simone Tiraboschi 2018-03-05 10:48:05 UTC
Yihui, was your host in your latest test configured with DHCP?
If so I think you hit https://bugzilla.redhat.com/show_bug.cgi?id=1549642

Can you please try with a statically configured host?
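
(A rough sketch of switching the host NIC to static addressing with nmcli before retrying; the connection name, address, gateway, and DNS below are placeholders taken from the ip output later in this bug, not values anyone confirmed:)

nmcli connection modify eno1 ipv4.method manual \
    ipv4.addresses 10.73.73.103/22 ipv4.gateway 10.73.75.254 ipv4.dns 10.73.75.254
nmcli connection up eno1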

Comment 9 Yihui Zhao 2018-03-06 01:53:25 UTC
(In reply to Simone Tiraboschi from comment #8)
> Yihui, was your host in your latest test configured with DHCP?
> If so I think you hit https://bugzilla.redhat.com/show_bug.cgi?id=1549642
> 
> Can you please try with a statically configured host?

This bug: https://bugzilla.redhat.com/show_bug.cgi?id=1548958 blocks the ansible deployment.

Comment 10 Ryan Barry 2018-03-06 02:00:00 UTC
I think you mean this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1551258

But Simone's question can be verified with an earlier version (0.11.11-0.1, for example)
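
(For instance, something along these lines would roll the dashboard back to that build, assuming the older package is still available in an enabled repository; the NVR is the one Yihui lists in comment 13:)

yum downgrade -y cockpit-ovirt-dashboard-0.11.11-0.1.el7ev
rpm -q cockpit-ovirt-dashboard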

Comment 12 Yihui Zhao 2018-03-06 05:38:44 UTC
Created attachment 1404636 [details]
static_config.png

Comment 13 Yihui Zhao 2018-03-06 06:24:18 UTC
Update:

Test version:
cockpit-dashboard-160-3.el7.x86_64
cockpit-system-160-3.el7.noarch
cockpit-bridge-160-3.el7.x86_64
cockpit-ws-160-3.el7.x86_64
cockpit-storaged-160-3.el7.noarch
cockpit-ovirt-dashboard-0.11.11-0.1.el7ev.noarch
cockpit-160-3.el7.x86_64
ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
rhvh-4.2.1.4-0.20180305.0+1
rhvm-appliance-4.2-20180202.0.el7.noarch

Test steps:
1. Install the latest RHVH 4.2 (rhvh-4.2.1.4-0.20180305.0+1)
2. Downgrade the cockpit-ovirt-dashboard package from 0.11.14 to 0.11.11
3. Ansible deployment 


Test failed from cockpit:

TASK [Detect VLAN ID]
TASK [Detect VLAN ID]
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'BRIDGE_IF' is undefined\n\nThe error appears to have been in '/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml': line 223, column 9, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n - debug: var=engine_status\n - name: Detect VLAN ID\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'BRIDGE_IF' is undefined"}
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'BRIDGE_IF' is undefined\n\nThe error appears to have been in '/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml': line 223, column 9, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n - debug: var=engine_status\n - name: Detect VLAN ID\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'BRIDGE_IF' is undefined"}
TASK [include_tasks]
TASK [include_tasks]
ok: [localhost]
ok: [localhost]
TASK [Remove local vm dir]
TASK [Remove local vm dir]
changed: [localhost]
changed: [localhost]
TASK [Notify the user about a failure]
TASK [Notify the user about a failure]
fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
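
(One way to see where the undefined BRIDGE_IF variable is consumed is simply to look at the shipped playbook around the line the error reports; just a diagnostic check, not a fix:)

grep -n 'BRIDGE_IF' /usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml
sed -n '215,230p' /usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml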

Comment 14 Yihui Zhao 2018-03-06 06:25:40 UTC
(In reply to Yihui Zhao from comment #13)
> Update:
> 
> Test version:
> cockpit-dashboard-160-3.el7.x86_64
> cockpit-system-160-3.el7.noarch
> cockpit-bridge-160-3.el7.x86_64
> cockpit-ws-160-3.el7.x86_64
> cockpit-storaged-160-3.el7.noarch
> cockpit-ovirt-dashboard-0.11.11-0.1.el7ev.noarch
> cockpit-160-3.el7.x86_64
> ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
> ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
> rhvh-4.2.1.4-0.20180305.0+1
> rhvm-appliance-4.2-20180202.0.el7.noarch
> 
> Test steps:
> 1. Install the latest rhvh4.2(rhvh-4.2.1.4-0.20180305.0+1)
> 2. Downgrade cockpit-ovirt-dashboard pkg from 11.14 to 11.11
> 3. Ansible deployment 
> 
> 
> Test failed from cockpit:
> 
> TASK [Detect VLAN ID]
> TASK [Detect VLAN ID]
> fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an
> undefined variable. The error was: 'BRIDGE_IF' is undefined\n\nThe error
> appears to have been in
> '/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml': line
> 223, column 9, but may\nbe elsewhere in the file depending on the exact
> syntax problem.\n\nThe offending line appears to be:\n\n - debug:
> var=engine_status\n - name: Detect VLAN ID\n ^ here\n\nexception type:
> <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'BRIDGE_IF' is
> undefined"}
> fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an
> undefined variable. The error was: 'BRIDGE_IF' is undefined\n\nThe error
> appears to have been in
> '/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml': line
> 223, column 9, but may\nbe elsewhere in the file depending on the exact
> syntax problem.\n\nThe offending line appears to be:\n\n - debug:
> var=engine_status\n - name: Detect VLAN ID\n ^ here\n\nexception type:
> <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'BRIDGE_IF' is
> undefined"}
> TASK [include_tasks]
> TASK [include_tasks]
> ok: [localhost]
> ok: [localhost]
> TASK [Remove local vm dir]
> TASK [Remove local vm dir]
> changed: [localhost]
> changed: [localhost]
> TASK [Notify the user about a failure]
> TASK [Notify the user about a failure]
> fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not
> be provisioned according to the playbook results: please check the logs for
> the issue, fix accordingly or re-deploy from scratch.\n"}
> fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not
> be provisioned according to the playbook results: please check the logs for
> the issue, fix accordingly or re-deploy from scratch.\n"}

Addendum:
With a statically configured host:

[root@ibm-x3650m5-05 ~]# ip a  s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 08:94:ef:59:db:f4 brd ff:ff:ff:ff:ff:ff
    inet 10.73.73.103/22 brd 10.73.75.255 scope global noprefixroute eno1
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:4948:a94:efff:fe59:dbf4/64 scope global noprefixroute dynamic 
       valid_lft 2591957sec preferred_lft 604757sec
    inet6 fe80::a94:efff:fe59:dbf4/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 08:94:ef:59:db:f5 brd ff:ff:ff:ff:ff:ff
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 08:94:ef:59:db:f6 brd ff:ff:ff:ff:ff:ff
5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 08:94:ef:59:db:f7 brd ff:ff:ff:ff:ff:ff
6: ens3f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 90:e2:ba:93:9b:9c brd ff:ff:ff:ff:ff:ff
7: enp0s20u1u5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 0a:94:ef:59:db:fb brd ff:ff:ff:ff:ff:ff
8: ens3f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 90:e2:ba:93:9b:9e brd ff:ff:ff:ff:ff:ff
9: ens1f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 90:e2:ba:93:98:80 brd ff:ff:ff:ff:ff:ff
10: ens1f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 90:e2:ba:93:98:82 brd ff:ff:ff:ff:ff:ff
26: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:c2:fb:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
27: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:c2:fb:4d brd ff:ff:ff:ff:ff:ff
30: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:5d:21:64 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe5d:2164/64 scope link 
       valid_lft forever preferred_lft forever
[root@ibm-x3650m5-05 ~]# ip route
default via 10.73.75.254 dev eno1 proto static metric 100 
10.73.72.0/22 dev eno1 proto kernel scope link src 10.73.73.103 metric 100 
192.168.124.0/24 dev virbr0 proto kernel scope link src 192.168.124.1

Comment 15 Yaniv Kaul 2018-03-06 13:19:03 UTC
Who's setting BRIDGE_IF in the case of Cockpit?

Comment 16 Phillip Bailey 2018-03-08 13:21:08 UTC
(In reply to Yaniv Kaul from comment #15)
> Who's setting BRIDGE_IF in case of Cockpit?

I took care of it in https://gerrit.ovirt.org/#/c/88517/.

Comment 17 Michael 2018-03-09 12:32:43 UTC
Hi guys,

I'm getting the same error during ansible deployment.

ovirt-hosted-engine-setup v2.2.9, fresh install of CentOS (to be fair, the machine had been added as a hosted-engine host in another engine, but never used).


[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for the host to become non operational]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 150, "changed": false}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180309104100.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.



From the log:

2018-03-09 10:40:58,750+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:180 ansible-playbook rc: 2
2018-03-09 10:40:58,750+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:187 ansible-playbook stdout:
2018-03-09 10:40:58,750+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:189  to retry, use: --limit @/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.retry

2018-03-09 10:40:58,750+0100 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:190 ansible-playbook stderr:
2018-03-09 10:40:58,751+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py", line 186, in _closeup
    r = ah.run()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/ansible_utils.py", line 194, in run
    raise RuntimeError(_('Failed executing ansible-playbook'))
RuntimeError: Failed executing ansible-playbook
2018-03-09 10:40:58,753+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed executing ansible-playbook
2018-03-09 10:40:58,753+0100 DEBUG otopi.context context.dumpEnvironment:859 ENVIRONMENT DUMP - BEGIN
2018-03-09 10:40:58,753+0100 DEBUG otopi.context context.dumpEnvironment:869 ENV BASE/error=bool:'True'
2018-03-09 10:40:58,754+0100 DEBUG otopi.context context.dumpEnvironment:869 ENV BASE/exceptionInfo=list:'[(<type 'exceptions.RuntimeError'>, RuntimeError('Failed executing ansible-playbook',), <traceback object at 0x28994d0>)]'
2018-03-09 10:40:58,754+0100 DEBUG otopi.context context.dumpEnvironment:873 ENVIRONMENT DUMP - END
2018-03-09 10:40:58,754+0100 INFO otopi.context context.runSequence:741 Stage: Clean up


Is there a known work-around? Yihui, do you know? Would I be able to deploy it successfully in a newer release (tag) of hosted engine, e.g. 2.2.12 or master?


Thanks,
Mike

Comment 18 Michael 2018-03-09 13:29:28 UTC
(In reply to Michael from comment #17)
> stuff

Re #17: I was able to run the setup on a different machine (different CPU type as well - Skylake vs. Haswell), so I don't think it's completely broken... It may be a case of "works on my PC". :)

Comment 19 Simone Tiraboschi 2018-03-09 15:39:14 UTC
(In reply to Michael from comment #18)
> (In reply to Michael from comment #17)
> > stuff
> 
> Re #17: I was able to run the setup on a different machine (different CPU
> type as well - Sky Lake vs. Haswell), so I don't think it's completely
> broken... It may be a case of "works on my pc". :)

It was probably due to https://bugzilla.redhat.com/show_bug.cgi?id=1549642
It was a race condition, so it was not 100% systematic.

It will be fixed in the next build.

Comment 20 Yihui Zhao 2018-03-12 02:11:55 UTC
(In reply to Michael from comment #17)
> Hi guys,
> 
> I'm getting the same error during ansible deployment.
> 
> ovirt-hosted-engine-setup v. 2.2.9, fresh install of CentOS (to be fair the
> machine has been added as a hosted engine host in another engine, but never
> used).
> 
> 
> [ INFO  ] changed: [localhost]
> [ INFO  ] TASK [Wait for the host to become non operational]
> [ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts":
> []}, "attempts": 150, "changed": false}
> [ ERROR ] Failed to execute stage 'Closing up': Failed executing
> ansible-playbook
> [ INFO  ] Stage: Clean up
> [ INFO  ] Cleaning temporary resources
> [ INFO  ] TASK [Gathering Facts]
> [ INFO  ] ok: [localhost]
> [ INFO  ] TASK [Remove local vm dir]
> [ INFO  ] ok: [localhost]
> [ INFO  ] Generating answer file
> '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180309104100.conf'
> [ INFO  ] Stage: Pre-termination
> [ INFO  ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed: please check the logs for the
> issue, fix accordingly or re-deploy from scratch.
> 
> 
> 
> From the log:
> 
> 2018-03-09 10:40:58,750+0100 DEBUG
> otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:180
> ansible-playbook rc: 2
> 2018-03-09 10:40:58,750+0100 DEBUG
> otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:187
> ansible-playbook stdout:
> 2018-03-09 10:40:58,750+0100 DEBUG
> otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:189  to
> retry, use: --limit
> @/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.retry
> 
> 2018-03-09 10:40:58,750+0100 DEBUG
> otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:190
> ansible-playbook stderr:
> 2018-03-09 10:40:58,751+0100 DEBUG otopi.context context._executeMethod:143
> method exception
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in
> _executeMethod
>     method['method']()
>   File
> "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/
> core/misc.py", line 186, in _closeup
>     r = ah.run()
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/ansible_utils.
> py", line 194, in run
>     raise RuntimeError(_('Failed executing ansible-playbook'))
> RuntimeError: Failed executing ansible-playbook
> 2018-03-09 10:40:58,753+0100 ERROR otopi.context context._executeMethod:152
> Failed to execute stage 'Closing up': Failed executing ansible-playbook
> 2018-03-09 10:40:58,753+0100 DEBUG otopi.context context.dumpEnvironment:859
> ENVIRONMENT DUMP - BEGIN
> 2018-03-09 10:40:58,753+0100 DEBUG otopi.context context.dumpEnvironment:869
> ENV BASE/error=bool:'True'
> 2018-03-09 10:40:58,754+0100 DEBUG otopi.context context.dumpEnvironment:869
> ENV BASE/exceptionInfo=list:'[(<type 'exceptions.RuntimeError'>,
> RuntimeError('Failed executing ansible-playbook',), <traceback object at
> 0x28994d0>)]'
> 2018-03-09 10:40:58,754+0100 DEBUG otopi.context context.dumpEnvironment:873
> ENVIRONMENT DUMP - END
> 2018-03-09 10:40:58,754+0100 INFO otopi.context context.runSequence:741
> Stage: Clean up
> 
> 
> Is there a known work-around? Yihui, do you know? Would I be able to deploy
> it successfully in a newer release (tag) of hosted engine, e.g. 2.2.12 or
> master?
> 
> 
> Thanks,
> Mike

From my side, it is just in the testing process, so I cannot provide a workaround for you. Per comment 19, it was probably due to https://bugzilla.redhat.com/show_bug.cgi?id=1549642, so let us look forward to the next build.

Comment 21 Yihui Zhao 2018-03-26 07:49:58 UTC
Tested with these versions:

rhvh-4.2.2.0-0.20180322.0+1
cockpit-ovirt-dashboard-0.11.19-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
rhvm-appliance-4.2-20180322.0.el7.noarch

Test result:
Not a blocker; with these package versions, I can deploy HE via cockpit with ansible successfully multiple times.

[root@ibm-x3650m5-06 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ibm-x3650m5-06.lab.eng.pek2.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : a8dcd9f1
local_conf_timestamp               : 156195
Host timestamp                     : 156195
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=156195 (Mon Mar 26 15:39:04 2018)
	host-id=1
	score=3400
	vm_conf_refresh_time=156195 (Mon Mar 26 15:39:05 2018)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False



Additional info:
If there are other failure reasons, for example HE deployment failing on some specific tasks, I will report separate bugs about those failures.



Here, I will change this bug's status to verified.

Comment 22 Sandro Bonazzola 2018-03-29 11:00:37 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

