Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1698643

Summary: can't deploy hosted-engine when eth0 and the default libvirt network try to use the same subnet
Product: [oVirt] ovirt-ansible-collection
Reporter: Douglas Schilling Landgraf <dougsland>
Component: hosted-engine-setup
Assignee: Ido Rosenzwig <irosenzw>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact: Tahlia Richardson <trichard>
Priority: medium
Version: 1.0.16
CC: bugs, irosenzw, stirabos
Target Milestone: ovirt-4.3.6
Keywords: ZStream
Target Release: 1.0.21
Flags: sbonazzo: ovirt-4.3?
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: ovirt-ansible-hosted-engine-setup-1.0.21
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-26 19:42:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1710725
Attachments:
  ovirt-hosted-engine-setup-20190410203822-76iivc.log (flags: none)
  alllogs-enginesetup.tar.gz (flags: none)
  sosreport from alma03 (flags: none)

Description Douglas Schilling Landgraf 2019-04-10 20:45:00 UTC
https://resources.ovirt.org/pub/ovirt-4.3-pre/iso/ovirt-node-ng-installer/4.3.3-2019040909/el7/ovirt-node-ng-installer-4.3.3-2019040909.el7.iso

[ INFO  ] TASK [ovirt.hosted_engine_setup : Undefine leftover local engine VM]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Check for leftover defined Hosted Engine VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Undefine leftover engine VM]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Remove eventually entries for the local VM from known_hosts file]
[ INFO  ] ok: [localhost -> localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Start libvirt]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Activate default libvirt network]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "internal error: Network is already in use by interface eth0"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Set local_vm_disk_path]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Copy engine logs]
[ INFO  ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20190410204215.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20190410203822-76iivc.log

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:65:50:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.151/24 brd 192.168.122.255 scope global dynamic eth0
       valid_lft 3436sec preferred_lft 3436sec


# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
VARIANT="oVirt Node 4.3.3_rc3"
VARIANT_ID="ovirt-node"
PRETTY_NAME="oVirt Node 4.3.3_rc3"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.ovirt.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

# rpm -qa | grep ovirt -i
ovirt-host-deploy-common-1.8.0-1.el7.noarch
ovirt-release-host-node-4.3.3-0.3.rc3.el7.noarch
ovirt-node-ng-nodectl-4.3.0-0.20190110.0.el7.noarch
cockpit-machines-ovirt-176-4.el7.centos.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-provider-ovn-driver-1.2.20-1.el7.noarch
ovirt-release43-pre-4.3.3-0.3.rc3.el7.noarch
ovirt-imageio-common-1.5.1-0.el7.x86_64
python2-ovirt-setup-lib-1.2.0-1.el7.noarch
ovirt-vmconsole-1.0.7-2.el7.noarch
python2-ovirt-host-deploy-1.8.0-1.el7.noarch
ovirt-ansible-repositories-1.1.5-1.el7.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7.noarch
ovirt-hosted-engine-setup-2.3.7-1.el7.noarch
ovirt-ansible-hosted-engine-setup-1.0.16-1.el7.noarch
ovirt-host-dependencies-4.3.2-1.el7.x86_64
cockpit-ovirt-dashboard-0.12.7-1.el7.noarch
ovirt-imageio-daemon-1.5.1-0.el7.noarch
ovirt-host-4.3.2-1.el7.x86_64
python-ovirt-engine-sdk4-4.3.1-2.el7.x86_64
ovirt-node-ng-image-update-placeholder-4.3.3-0.3.rc3.el7.noarch
python2-ovirt-node-ng-nodectl-4.3.0-0.20190110.0.el7.noarch
ovirt-vmconsole-host-1.0.7-2.el7.noarch

Comment 1 Douglas Schilling Landgraf 2019-04-10 20:45:56 UTC
Created attachment 1554347 [details]
ovirt-hosted-engine-setup-20190410203822-76iivc.log

Comment 2 Ido Rosenzwig 2019-04-11 05:36:31 UTC
Can you please attach all the logs under /var/log/ovirt-hosted-engine-setup?

Comment 3 Sandro Bonazzola 2019-04-11 06:22:10 UTC
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "internal error: Network is already in use by interface eth0"}

I think this happened because Node was running within a VM, trying to deploy the hosted engine under nested virtualization, and there is a conflict between the nested libvirt network and the bare-metal host's libvirt network.
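
A quick way to confirm that diagnosis (a minimal sketch, assuming the guest's interface is eth0 and libvirt's stock network name "default") is to compare the subnet DHCP handed to eth0 with the subnet libvirt wants for its default network:

# Show the IPv4 address the nested guest received on eth0,
# e.g. "inet 192.168.122.151/24".
ip -4 -o addr show dev eth0
# Show the address libvirt intends to use for its default network,
# e.g. <ip address='192.168.122.1' netmask='255.255.255.0'>.
virsh net-dumpxml default | grep '<ip address'

If both commands report the same 192.168.122.0/24 range, libvirt refuses to activate the network with exactly this "already in use" error.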

Comment 4 Simone Tiraboschi 2019-04-11 07:24:26 UTC
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
> UP group default qlen 1000
>     link/ether 52:54:00:65:50:8e brd ff:ff:ff:ff:ff:ff
>     inet 192.168.122.151/24 brd 192.168.122.255 scope global dynamic eth0

Yes, I tend to agree: the issue most probably comes from here ^^^

Libvirt failed to bring up its default network (192.168.122.1/24) because eth0 is already on the same subnet (192.168.122.151/24).

We should absolutely detect this and, if needed, change the libvirt default network configuration on the fly (or simply emit a clear error message asking the user to edit default.xml).
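
As a stop-gap until the role handles this automatically, a manual workaround along those lines could look roughly like the following (a sketch only, assuming the network is libvirt's stock "default" on 192.168.122.0/24 and that 192.168.222.0/24 is free on the host):

# Stop the default network if it happens to be running.
virsh net-destroy default 2>/dev/null || true
# Dump its persistent definition, move it to another /24, and reload it.
virsh net-dumpxml --inactive default > /tmp/default-net.xml
sed -i 's/192\.168\.122\./192.168.222./g' /tmp/default-net.xml
virsh net-define /tmp/default-net.xml
virsh net-start default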

Comment 5 Ido Rosenzwig 2019-04-11 07:48:58 UTC
I'm in favor of changing the 'default' network on the fly; it's more elegant IMO.

Comment 6 Douglas Schilling Landgraf 2019-04-18 03:03:13 UTC
Created attachment 1556061 [details]
alllogs-enginesetup.tar.gz

Comment 7 Nikolai Sednev 2019-07-16 08:02:36 UTC
What was the fix? Will the libvirt subnet get changed to a different subnet in case eth0 is using the same one?

Comment 8 Ido Rosenzwig 2019-07-16 08:30:58 UTC
This bug fix simply changed the subnet used by the deployment from libvirt's default subnet to 192.168.222.x.
It fixed the problem when the machine was already using libvirt's default subnet,
but it introduced an issue with re-deployment, where the machine is almost certainly already using that subnet from the previous deployment.

That issue was documented in https://bugzilla.redhat.com/show_bug.cgi?id=1725033 and has been fixed.

Now we check whether the subnet is already in use.
If it is not, we use it.
If it is, we search for an unused subnet and use that instead.
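
A rough sketch of that kind of search, done by hand outside the role (the candidate list and the route check are illustrative assumptions, not the actual implementation):

# Walk candidate 192.168.N.0/24 prefixes and report the first one that
# no local route already covers; that prefix is safe to hand to libvirt.
for n in 222 223 224 225; do
    if ! ip -4 route show | grep -q "192\.168\.${n}\."; then
        echo "192.168.${n}.0/24 appears unused"
        break
    fi
done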

Comment 9 Nikolai Sednev 2019-07-16 16:07:12 UTC
Getting the error "[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "internal error: Network is already in use by interface enp5s0f0"}" while the deployment is being executed on enp5s0f1!
[ INFO  ] skipping: [localhost]
          Please indicate a nic to set ovirtmgmt bridge on: (enp5s0f1) [enp5s0f1]: 
          Please specify which way the network connectivity should be checked (ping, dns, tcp, none) [dns]: 

Tested on these components:
Engine Software Version:4.3.5.4-0.1.el7
ovirt-hosted-engine-ha-2.3.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.11-1.el7ev.noarch
Linux 3.10.0-1061.el7.x86_64 #1 SMP Thu Jul 11 21:02:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.7 (Maipo)


puma18 ~]# ifconfig
enp5s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.151  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 44:1e:a1:73:39:26  txqueuelen 1000  (Ethernet)
        RX packets 18061  bytes 1190174 (1.1 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 34  bytes 2486 (2.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfbe60000-fbe7ffff  

enp5s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 44:1e:a1:73:39:27  txqueuelen 1000  (Ethernet)
        RX packets 31706  bytes 9314656 (8.8 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9068  bytes 1755588 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfbee0000-fbefffff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 246  bytes 59454 (58.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 246  bytes 59454 (58.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.160.45  netmask 255.255.252.0  broadcast 10.35.163.255
        inet6 fe80::461e:a1ff:fe73:3927  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:23bd:461e:a1ff:fe73:3927  prefixlen 64  scopeid 0x0<global>
        ether 44:1e:a1:73:39:27  txqueuelen 1000  (Ethernet)
        RX packets 13178  bytes 740806 (723.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1010  bytes 146840 (143.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

puma18 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.000000000000       no
ovirtmgmt               8000.441ea1733927       no              enp5s0f1


Moving back to assigned.

Comment 10 Nikolai Sednev 2019-07-17 12:42:48 UTC
Failure on redeployment:
http://pastebin.test.redhat.com/780688

Sosreport in attachment.

Comment 11 Nikolai Sednev 2019-07-17 12:47:52 UTC
Created attachment 1591407 [details]
sosreport from alma03

Comment 12 Simone Tiraboschi 2019-07-18 12:42:57 UTC
Retargeting to 4.3.6

Comment 13 Sandro Bonazzola 2019-09-26 19:42:59 UTC
This bugzilla is included in the oVirt 4.3.6 release, published on September 26th 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.