Bug 1539563

Summary: Deploying HE via CLI-based Ansible deployment fails.
Product: [oVirt] ovirt-hosted-engine-setup
Reporter: Yihui Zhao <yzhao>
Component: General
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: ---
CC: bugs, cshao, dguo, huzhao, mavital, phbailey, qiyuan, rbarry, sbonazzo, weiwang, yaniwang, ycui, yisong, ylavi, yzhao
Target Milestone: ---
Flags: ycui: planning_ack?, ycui: devel_ack?, ycui: testing_ack?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-05 08:27:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: setup_log (flags: none)

Description Yihui Zhao 2018-01-29 08:16:26 UTC
Created attachment 1387601 [details]
setup_log

Description of problem: 
Deploying HE via the CLI-based Ansible deployment fails.


Version-Release number of selected component (if applicable): 
cockpit-ws-157-1.el7.x86_64
cockpit-bridge-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.6-0.1.el7ev.noarch
cockpit-system-157-1.el7.noarch
ovirt-hosted-engine-setup-2.2.8-2.el7ev.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
rhvm-appliance-4.2-20180125.0.el7.noarch
rhvh-4.2.1.2-0.20180126.0+1


How reproducible: 
100% 


Steps to Reproduce: 
1. Clean install the latest RHVH 4.2.1 with a kickstart (rhvh-4.2.1.2-0.20180126.0+1)
2. Deploy HE via the CLI with an NFS-based Ansible deployment (hosted-engine --deploy --ansible)
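
For reference, a minimal sketch of the invocation in step 2; the prompt answers shown are the ones used in this run, per the report:

    # CLI-based Ansible deployment of Hosted Engine (node zero flow).
    hosted-engine --deploy --ansible
    # Interactive answers used in this run:
    #   storage type: nfs
    #   management NIC: em1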


Actual results: 
After step 2, I noticed the following:

1) The ovirtmgmt bridge was not set up; I had selected just the em1 NIC for the bridge:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 256
    link/ether 00:c0:dd:20:86:e0 brd ff:ff:ff:ff:ff:ff
3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 08:9e:01:63:2c:6d brd ff:ff:ff:ff:ff:ff
    inet 10.73.73.19/22 brd 10.73.75.255 scope global dynamic em1
       valid_lft 35746sec preferred_lft 35746sec
    inet6 2620:52:0:4948:f85:9417:e4bd:55a9/64 scope global noprefixroute dynamic 
       valid_lft 2591985sec preferred_lft 604785sec
    inet6 fe80::fb42:751c:2356:9c87/64 scope link 
       valid_lft forever preferred_lft forever
4: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 08:9e:01:63:2c:6e brd ff:ff:ff:ff:ff:ff
5: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3c:b0 brd ff:ff:ff:ff:ff:ff
6: p3p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:1b:21:a6:3c:b1 brd ff:ff:ff:ff:ff:ff
7: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:04 brd ff:ff:ff:ff:ff:ff
8: p2p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:05 brd ff:ff:ff:ff:ff:ff
24: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 5a:f7:03:c0:1b:5d brd ff:ff:ff:ff:ff:ff
25: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 52:54:00:73:2d:e0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
26: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
    link/ether 52:54:00:73:2d:e0 brd ff:ff:ff:ff:ff:ff
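
As a side note, the bridge's absence can be confirmed with standard iproute2 commands (a sketch, not output from the original run):

    # After a successful deployment an ovirtmgmt bridge should exist with
    # em1 enslaved to it; on this host the device is missing entirely.
    ip link show ovirtmgmt    # prints: Device "ovirtmgmt" does not exist.
    bridge link show          # no port reports "master ovirtmgmt"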


2) The HE VM's IP changed to a different network segment; the HE VM's IP is 192.168.124.70.
/etc/hosts file: 

192.168.124.70 rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.73.73.19 dell-per515-02.lab.eng.pek2.redhat.com

3) From the CLI:
"""
[ INFO  ] TASK [Wait for the host to become non operational]
[ ERROR ] Error: Failed to read response.
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 150, "changed": false, "msg": "Failed to read response."}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180129141506.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180129140142-rwqwvl.log
""


Expected results: 
Deploying HE via the cockpit-based Ansible deployment succeeds.


Additional info:

Comment 1 Yihui Zhao 2018-01-29 08:18:22 UTC
(In reply to Yihui Zhao from comment #0)
> Expected results: 
> Deploying HE via the cockpit-based Ansible deployment succeeds.

Update to the Expected results: 
Deploying HE via the CLI-based Ansible deployment succeeds.

Comment 2 Simone Tiraboschi 2018-01-30 08:19:11 UTC
Yihui, was it the first attempt on that host?
If not, it's exactly a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1539040

Comment 3 Yihui Zhao 2018-01-30 08:27:35 UTC
(In reply to Simone Tiraboschi from comment #2)
> Yihui, was it the first attempt on that host?
> If not, it's exactly a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1539040

Yes, it was the first attempt, and it fails the same way every time.

I don't think this is a duplicate of bug 1539040, because the HE VM's IP changed to 192.168.124.70, which is on a different network segment. Also, the deployment is stuck at "Wait for the host to become non operational", and the ovirtmgmt bridge was not set up.

Comment 4 Simone Tiraboschi 2018-01-30 09:35:42 UTC
(In reply to Yihui Zhao from comment #3)

> I don't think this is a duplicate of bug 1539040, because the HE VM's IP
> changed to 192.168.124.70, which is on a different network segment.

This is fine:
the whole point of node zero is to use an engine running on a local VM to bootstrap the whole system.
The local VM runs on libvirt's default NATed network, hence that address.
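
A quick way to see which NATed network and address the local VM got (a sketch, assuming the stock 'default' libvirt network; the 192.168.124.0/24 range matches the virbr0 output above):

    # Inspect the NATed libvirt network backing the bootstrap local VM.
    virsh net-dumpxml default        # NAT subnet and DHCP range
    virsh net-dhcp-leases default    # the local HE VM's DHCP lease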

> Also, the deployment is stuck at "Wait for the host to become non
> operational", and the ovirtmgmt bridge was not set up.

Here hosted-engine-setup is polling the engine for the host status; in bug 1539040, libvirt-guests killed the engine VM and, with it, the engine. I'm not sure that's the case here.
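
For reference, the status being polled can also be read straight from the engine's REST API (a sketch; the password is a placeholder, and the FQDN is the HE VM name from this report):

    # Query host status during "Wait for the host to become non operational".
    curl -sk -u 'admin@internal:PASSWORD' -H 'Accept: application/xml' \
        'https://rhevh-hostedengine-vm-04.lab.eng.pek2.redhat.com/ovirt-engine/api/hosts'
    # Check the <status> element of the newly added host.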

Can you please attach:
 - /var/log/messages
 - /var/log/libvirt/qemu/HostedEngineLocal.log
 - the output of 'journalctl -u libvirt-guests.service' after a failure
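
A sketch for collecting those after a failed run (output file names are arbitrary):

    # Gather the requested logs for attachment.
    cp /var/log/messages messages.txt
    cp /var/log/libvirt/qemu/HostedEngineLocal.log HostedEngineLocal.log
    journalctl -u libvirt-guests.service > libvirt-guests-journal.txt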

Comment 7 Yaniv Lavi 2018-02-05 08:27:05 UTC

*** This bug has been marked as a duplicate of bug 1539040 ***