Bug 1545931 - hosted-engine deploy fails when ovirtmgmt is defined on vlan subinterface (ansible version)
Summary: hosted-engine deploy fails when ovirtmgmt is defined on vlan subinterface (a...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Tools
Version: 2.2.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high (1 vote)
Target Milestone: ovirt-4.2.3
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1458709
 
Reported: 2018-02-15 21:15 UTC by Miguel Armas
Modified: 2018-05-10 06:32 UTC
CC: 8 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.15
Clone Of:
Environment:
Last Closed: 2018-05-10 06:32:02 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
ylavi: blocker+


Attachments
logs from puma19 (9.70 MB, application/x-xz)
2018-03-27 17:38 UTC, Nikolai Sednev
no flags Details
logs from the engine (9.24 MB, application/x-xz)
2018-03-27 17:39 UTC, Nikolai Sednev
no flags Details
sosreport from host (9.15 MB, application/x-xz)
2018-04-23 17:35 UTC, Nikolai Sednev
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 87783 0 'None' MERGED ansible: network: set VLAN ID at datacenter level 2021-02-18 08:57:57 UTC
oVirt gerrit 87794 0 'None' MERGED ansible: network: fix network only if needed 2021-02-18 08:57:57 UTC
oVirt gerrit 87954 0 'None' MERGED ansible: network: fix network only if needed 2021-02-18 08:57:57 UTC
oVirt gerrit 87959 0 'None' MERGED ansible: network: set VLAN ID at datacenter level 2021-02-18 08:57:57 UTC

Description Miguel Armas 2018-02-15 21:15:55 UTC
Description of problem:

The new Ansible-based deploy fails when the management network (ovirtmgmt) is defined on a VLAN subinterface.

The problem seems to be that the ovirtmgmt bridge is not created: the management network in the datacenter is not defined with the VLAN ID, so it is inconsistent with the interface configuration on the host.
This error can be seen in engine.log inside the provisioned hosted-engine VM:

2018-02-15 13:49:26,850Z INFO  [org.ovirt.engine.core.bll.host.HostConnectivityChecker] (EE-ManagedThreadFactory-engine-Thread-1) [15c7e33a] Engine managed to communicate with VDSM agent on host 'ovirt1' with address 'ovirt1' ('06651b32-4ef8-4b5d-ab2d-c38e84c2d790')
2018-02-15 13:49:30,302Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [15c7e33a] EVENT_ID: VLAN_ID_MISMATCH_FOR_MANAGEMENT_NETWORK_CONFIGURATION(1,119), Failed to configure management network on host ovirt1. Host ovirt1 has an interface bond0.1005 for the management network configuration with VLAN-ID (1005), which is different from data-center definition (none).
2018-02-15 13:49:30,302Z ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [15c7e33a] Exception: org.ovirt.engine.core.bll.network.NetworkConfigurator$NetworkConfiguratorException: Failed to configure management network
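The fix merged for this bug ("ansible: network: set VLAN ID at datacenter level") makes the setup propagate the host interface's VLAN ID to the datacenter's management network definition. A minimal sketch of that idea using the stock `ovirt_network` Ansible module follows; the `ovirt_auth` variable, datacenter name, and VLAN value are illustrative, not the actual playbook code:

```yaml
# Sketch only: align the datacenter definition of ovirtmgmt with the
# VLAN ID detected on the host interface (here 1005, as in the error above).
- name: Set VLAN ID at datacenter level
  ovirt_network:
    auth: "{{ ovirt_auth }}"   # assumed to come from a prior ovirt_auth login
    data_center: Default
    name: ovirtmgmt
    vlan_tag: 1005
    state: present
```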


Version-Release number of selected component (if applicable):

oVirt 4.2.1
ovirt-hosted-engine-setup-2.2.9-1

How reproducible:

Always

Steps to Reproduce:
1. Define a vlan subinterface in the host (bond0.1005)
2. Deploy the hosted-engine with the ansible version (hosted-engine --deploy)
3. Choose the VLAN interface as the management network interface (bond0.1005)

Actual results:

The deployment fails with the following error:

[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "ip rule list | grep ovirtmgmt | sed s/\\\\[.*\\\\]\\ //g | awk '{ print $9 }'", "delta": "0:00:00.006473", "end": "2018-02-15 13:57:11.132359", "rc": 0, "start": "2018-02-15  13:57:11.125886", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook

This error happens because the ovirtmgmt bridge is not created; the reason seems to be the error mentioned in the description.
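The playbook polls for the bridge with exactly that shell pipeline. The sketch below replays it against sample `ip rule` output to show what an empty result means; the rule text and addresses are illustrative:

```shell
# Illustrative 'ip rule list' line as it appears once the ovirtmgmt bridge
# and its source-routing rule exist (addresses are made up)
rule='101: from all to 10.35.147.64/28 iif ovirtmgmt [detached] lookup main'

# The playbook's poll: drop the "[detached]" marker, print field 9
# (the routing-table name); a non-empty result means the bridge is up
echo "$rule" | grep ovirtmgmt | sed 's/\[.*\] //g' | awk '{ print $9 }'   # prints: main

# Before the bridge exists there is no matching rule, the pipeline prints
# nothing, and after 50 attempts the task fails exactly as above
echo '0: from all lookup local' | grep ovirtmgmt | sed 's/\[.*\] //g' | awk '{ print $9 }'
```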

Expected results:
This is a supported configuration, so the deploy should finish with no errors

Additional info:
The deprecated (--noansible) method works with this configuration

Comment 1 Nikolai Sednev 2018-03-26 15:03:32 UTC
I was disconnected from the host on:
enp5s0f1.173: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.149.225  netmask 255.255.252.0  broadcast 10.35.151.255
        inet6 fe80::5e9:1f4c:43aa:370d  prefixlen 64  scopeid 0x20<link>
        ether 44:1e:a1:73:39:27  txqueuelen 1000  (Ethernet)
        RX packets 131  bytes 7406 (7.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11  bytes 1342 (1.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 722  bytes 1870207 (1.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 722  bytes 1870207 (1.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:b7:d8:5c  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@puma18 ~]# hosted-engine --deploy --ansible 
[ INFO  ] Stage: Initializing
[ INFO  ] Stage: Environment setup
          During customization use CTRL-D to abort.
          Continuing will configure this host for serving as hypervisor and create a local VM with a running engine.
          The locally running engine will be used to configure a storage domain and create a VM there.
          At the end the disk of the local VM will be moved to the shared storage.
          Are you sure you want to continue? (Yes, No)[Yes]: 
          It has been detected that this program is executed through an SSH connection without using screen.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway? (Yes, No)[No]: yes
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180326171952-rn8l3f.log
          Version: otopi-1.7.7 (otopi-1.7.7-1.el7ev)
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[ INFO  ] Stage: Environment customization
         
          --== STORAGE CONFIGURATION ==--
         
         
          --== HOST NETWORK CONFIGURATION ==--
         
          Please indicate a pingable gateway IP address [10.35.163.254]: 
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Detecting interface on existing management bridge]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Get all active network interfaces]
[ INFO  ] TASK [Filter bonds with bad naming]
[ INFO  ] TASK [Generate output list]
[ INFO  ] ok: [localhost]
          Please indicate a nic to set ovirtmgmt bridge on: (enp5s0f1, enp5s0f0, enp5s0f1.173) [enp5s0f1]: enp5s0f1.173
         
          --== VM CONFIGURATION ==--
         
          If you want to deploy with a custom engine appliance image,
          please specify the path to the OVA archive you would like to use
          (leave it empty to skip, the setup will use rhvm-appliance rpm installing it if missing): 
[ INFO  ] Detecting host timezone.
          Please provide the FQDN you would like to use for the engine appliance.
          Note: This will be the FQDN of the engine VM you are now going to launch,
          it should not point to the base host or to any other existing machine.
          Engine VM FQDN: (leave it empty to skip):  []: nsednev-he-7.scl.lab.tlv.redhat.com
          Please provide the domain name you would like to use for the engine appliance.
          Engine VM domain: [scl.lab.tlv.redhat.com]
          Enter root password that will be used for the engine appliance: 
          Confirm appliance root password: 
          Enter ssh public key for the root user that will be used for the engine appliance (leave it empty to skip): 
[WARNING] Skipping appliance root ssh public key
          Do you want to enable ssh access for the root user (yes, no, without-password) [yes]: 
          Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): [4]: 
          Please specify the memory size of the VM in MB (Defaults to appliance OVF value): [16384]: 
          You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:4d:be:2c]: 00:16:3e:EE:EE:E1
          How should the engine VM network be configured (DHCP, Static)[DHCP]? 
          Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
          Note: ensuring that this host could resolve the engine VM hostname is still up to you
          (Yes, No)[No] 
         
          --== HOSTED ENGINE CONFIGURATION ==--
         
          Please provide the name of the SMTP server through which we will send notifications [localhost]: 
          Please provide the TCP port number of the SMTP server [25]: 
          Please provide the email address from which notifications will be sent [root@localhost]: 
          Please provide a comma-separated list of email addresses which will get notifications [root@localhost]: 
          Enter engine admin password: 
          Confirm engine admin password: 
[ INFO  ] Stage: Setup validation
[ INFO  ] Stage: Transaction setup
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stage: Package installation
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stage: Transaction commit
[ INFO  ] Stage: Closing up
[ INFO  ] Cleaning previous attempts
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Stop libvirt service]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Drop vdsm config statements]
[ INFO  ] TASK [Restore initial abrt config files]
[ INFO  ] TASK [Restart abrtd service]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Drop libvirt sasl2 configuration by vdsm]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Stop and disable services]
[ INFO  ] TASK [Start libvirt]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Check for leftover local Hosted Engine VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Destroy leftover local Hosted Engine VM]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Check for leftover defined local Hosted Engine VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Undefine leftover local engine VM]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Remove eventually entries for the local VM from known_hosts file]
[ INFO  ] ok: [localhost]
[ INFO  ] Starting local VM
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Start libvirt]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Activate default libvirt network]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Get libvirt interfaces]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Get routing rules]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Save bridge name]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Wait for the bridge to appear on the host]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Refresh network facts]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Prepare CIDR for virbr0]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Add outbound route rules]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Add inbound route rules]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Register the engine FQDN as a host]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Create directory for local VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Set local vm dir path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fix local VM directory permission]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Install rhvm-appliance rpm]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Parse appliance configuration for path]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Parse appliance configuration for sha1sum]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Get OVA path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Compute sha1sum]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Compare sha1sum]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Register appliance PATH]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Extract appliance to local VM directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Find the appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Get appliance disk size]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Parse qemu-img output]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create cloud init user-data and meta-data files]
[ INFO  ] TASK [Create ISO disk]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Create local VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Get local VM IP]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Remove eventually entries for the local VM from /etc/hosts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create an entry in /etc/hosts for the local VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for SSH to restart on the local VM]
[ INFO  ] ok: [localhost -> localhost]
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Wait for the local VM]
[ INFO  ] ok: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Add an entry for this host on /etc/hosts on the local VM]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Set FQDN]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Force the local VM FQDN to resolve on 127.0.0.1]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Restore sshd reverse DNS lookups]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Generate an answer file for engine-setup]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Include before engine-setup custom tasks files for the engine VM]
[ INFO  ] TASK [Execute engine-setup]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Include after engine-setup custom tasks files for the engine VM]
[ INFO  ] TASK [Configure LibgfApi support]
[ INFO  ] skipping: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Restart ovirt-engine service for LibgfApi support]
[ INFO  ] skipping: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Mask cloud-init services to speed up future boot]
[ INFO  ] TASK [Clean up bootstrap answer file]
[ INFO  ] changed: [nsednev-he-7.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Wait for ovirt-engine service to start]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Detect VLAN ID]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Set Engine public key as authorized key without validating the TLS/SSL certificates]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Enable GlusterFS at cluster level]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Set VLAN ID at datacenter level]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Force host-deploy in offline mode]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Add host]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for the host to be up]
[ INFO  ] ok: [localhost]

Comment 2 Nikolai Sednev 2018-03-26 15:29:42 UTC
The disconnection occurred because the default gateway changed from the initial native VLAN to the tagged VLAN, and I had no route to it from my workstation.

I'll have to change my environment accordingly.

Tested on these components on host:

ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 3 Nikolai Sednev 2018-03-27 17:29:11 UTC
Node 0 deployment over FC failed while the mgmt network was set on a VLAN subinterface.

Components on host:
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)


[ INFO  ] TASK [Check for the local bootstrap VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Make the engine aware that the external VM is stopped]
[ INFO  ] TASK [Wait for the local bootstrap VM to be down at engine eyes]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_vms": [{"affinity_labels": [], "applications": [], "bios": {"boot_menu": {"enabled": false}}, "cdroms": [], "cluster": {"href": "/ovirt-engine/api/clusters/53cd11c6-31e1-11e8-963a-00163eeeeee1", "id": "53cd11c6-31e1-11e8-963a-00163eeeeee1"}, "cpu": {"architecture": "x86_64", "topology": {"cores": 1, "sockets": 4, "threads": 1}}, "cpu_profile": {"href": "/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-000000000250", "id": "58ca604e-01a7-003f-01de-000000000250"}, "cpu_shares": 0, "creation_time": "2018-03-27 20:11:18.559000+03:00", "delete_protected": false, "disk_attachments": [], "display": {"address": "127.0.0.1", "allow_override": false, "copy_paste_enabled": true, "disconnect_action": "LOCK_SCREEN", "file_transfer_enabled": true, "monitors": 1, "port": 5900, "single_qxl_pci": false, "smartcard_enabled": false, "type": "vnc"}, "graphics_consoles": [], "high_availability": {"enabled": false, "priority": 0}, "host": {"href": "/ovirt-engine/api/hosts/c67c6723-aeba-46c4-855b-39f250c8232d", "id": "c67c6723-aeba-46c4-855b-39f250c8232d"}, "host_devices": [], "href": "/ovirt-engine/api/vms/9fdc893f-2ca4-4601-b141-4e256d8ea29d", "id": "9fdc893f-2ca4-4601-b141-4e256d8ea29d", "io": {"threads": 0}, "katello_errata": [], "large_icon": {"href": "/ovirt-engine/api/icons/7095c66e-0b3c-3fef-7b39-374c69ffea91", "id": "7095c66e-0b3c-3fef-7b39-374c69ffea91"}, "memory": 17179869184, "memory_policy": {"guaranteed": 17179869184, "max": 17179869184}, "migration": {"auto_converge": "inherit", "compressed": "inherit"}, "migration_downtime": -1, "name": "external-HostedEngineLocal", "next_run_configuration_exists": false, "nics": [], "numa_nodes": [], "numa_tune_mode": "interleave", "origin": "external", "original_template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "os": {"boot": {"devices": ["hd"]}, "type": "other"}, "permissions": [], 
"placement_policy": {"affinity": "migratable"}, "quota": {"id": "65eeaffe-31e1-11e8-85ce-00163eeeeee1"}, "reported_devices": [], "run_once": false, "sessions": [], "small_icon": {"href": "/ovirt-engine/api/icons/198a3117-13e4-8248-3156-df1dfd66431e", "id": "198a3117-13e4-8248-3156-df1dfd66431e"}, "snapshots": [], "sso": {"methods": [{"id": "guest_agent"}]}, "start_paused": false, "stateless": false, "statistics": [], "status": "unknown", "storage_error_resume_behaviour": "auto_resume", "tags": [], "template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "time_zone": {"name": "Etc/GMT"}, "type": "desktop", "usb": {"enabled": false}, "watchdogs": []}]}, "attempts": 24, "changed": false}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180327202700.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180327195924-a2z17j.log

Comment 4 Nikolai Sednev 2018-03-27 17:31:51 UTC
Simone, please provide your input.

Comment 5 Nikolai Sednev 2018-03-27 17:38:27 UTC
Created attachment 1413856 [details]
logs from puma19

Comment 6 Nikolai Sednev 2018-03-27 17:39:08 UTC
Created attachment 1413857 [details]
logs from the engine

Comment 7 Nikolai Sednev 2018-03-27 17:45:01 UTC
I think it's a different error now, as I see different errors in the engine's log:
2018-03-27 20:41:12,307+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] Command 'GetCapabilitiesVDSCommand(HostName = puma19.scl.lab.tlv.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{hostId='c67c6723-aeba-46c4-855b-39f250c8232d', vds='Host[puma19.scl.lab.tlv.redhat.com,c67c6723-aeba-46c4-855b-39f250c8232d]'})' execution failed: java.rmi.ConnectException: Connection timeout
2018-03-27 20:41:12,307+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] Failure to refresh host 'puma19.scl.lab.tlv.redhat.com' runtime info: java.rmi.ConnectException: Connection timeout
2018-03-27 20:41:15,316+03 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to puma19.scl.lab.tlv.redhat.com/10.35.160.47
2018-03-27 20:41:26,225+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] Command 'GetAllVmStatsVDSCommand(HostName = puma19.scl.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{hostId='c67c6723-aeba-46c4-855b-39f250c8232d'})' execution failed: VDSGenericException: VDSNetworkException: Connection issue Connection timeout
2018-03-27 20:41:26,226+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] Failed to fetch vms info for host 'puma19.scl.lab.tlv.redhat.com' - skipping VMs monitoring.
2018-03-27 20:41:35,316+03 WARN  [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) [] Retry failed
2018-03-27 20:41:35,316+03 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Exception during connection
2018-03-27 20:41:35,317+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Command 'GetCapabilitiesVDSCommand(HostName = puma19.scl.lab.tlv.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{hostId='c67c6723-aeba-46c4-855b-39f250c8232d', vds='Host[puma19.scl.lab.tlv.redhat.com,c67c6723-aeba-46c4-855b-39f250c8232d]'})' execution failed: java.rmi.ConnectException: Connection timeout
2018-03-27 20:41:35,317+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failure to refresh host 'puma19.scl.lab.tlv.redhat.com' runtime info: java.rmi.ConnectException: Connection timeout

Comment 8 Simone Tiraboschi 2018-03-28 07:58:45 UTC
The root cause is here:

2018-03-27 20:24:06,029+03 INFO  [org.ovirt.engine.core.bll.storage.pool.SetStoragePoolStatusCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [ceaf0e] Running command: SetStoragePoolStatusCommand internal: true. Entities affected :  ID: 53cad834-31e1-11e8-8e51-00163eeeeee1 Type: StoragePool
2018-03-27 20:24:06,038+03 INFO  [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [ceaf0e] Storage Pool '53cad834-31e1-11e8-8e51-00163eeeeee1' - Updating Storage Domain 'a677edf1-d15b-4e25-bb5d-ad126e02e465' status from 'Active' to 'Unknown', reason: null
2018-03-27 20:24:06,119+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [ceaf0e] EVENT_ID: SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC(980), Invalid status on Data Center Default. Setting status to Non Responsive.

Due to a network issue, the engine failed to communicate with the host, so the host was set to non-responsive and the datacenter moved from Active to Unknown; everything after that is bound to fail.

Comment 9 Nikolai Sednev 2018-03-28 12:16:15 UTC
Moving back to assigned.

Comment 10 Red Hat Bugzilla Rules Engine 2018-03-28 12:16:22 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 11 Nikolai Sednev 2018-03-28 12:18:28 UTC
For reproduction I did not even use a bond; simply using a trunk (802.1Q) with the mgmt network placed on a tagged VLAN is sufficient.
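For reference, a tagged subinterface for such a reproduction can be created with plain iproute2; the NIC name and VLAN ID below are illustrative, and the link commands require root:

```shell
# Create and bring up a tagged VLAN subinterface on a trunk port (no bond needed)
if [ "$(id -u)" -eq 0 ]; then
    ip link add link enp5s0f1 name enp5s0f1.1005 type vlan id 1005
    ip link set enp5s0f1.1005 up
    # then give it an address and select it during 'hosted-engine --deploy'
fi

# The VLAN ID the setup must carry to the datacenter is just the name's suffix:
iface=enp5s0f1.1005
echo "${iface##*.}"   # prints 1005
```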

Comment 12 Red Hat Bugzilla Rules Engine 2018-03-28 12:18:34 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 13 Nikolai Sednev 2018-03-28 13:15:34 UTC
Verification in comment #7 was done with these components:
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch

Moving back to assigned.

Comment 14 Nikolai Sednev 2018-03-29 14:54:54 UTC
Further to my conversation with Simone, these two bugs seem to be related to the failed SHE deployment over a tagged VLAN interface on 4.2.
https://bugzilla.redhat.com/show_bug.cgi?id=1561483
https://bugzilla.redhat.com/show_bug.cgi?id=1560684

Martin, I had a chat with Danken, he suggested an async.
Please provide your input.

Comment 16 Yaniv Lavi 2018-04-16 07:46:11 UTC
(In reply to Nikolai Sednev from comment #14)
> Forth to our conversation with Simone, these two seems to be related to SHE
> failed deployment over tagged VLAN interface on 4.2.
> https://bugzilla.redhat.com/show_bug.cgi?id=1561483
> https://bugzilla.redhat.com/show_bug.cgi?id=1560684
> 
> Martin, I had a chat with Danken, he suggested an async.
> Please provide your input.

These are fixed, please retest.

Comment 17 Nikolai Sednev 2018-04-23 17:34:54 UTC
I still see an issue with the deployment of SHE over a tagged VLAN:
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The resolved address doesn't resolve on the selected interface\n"}

The same deployment over 4.1.11 is working just fine.

I've tried to deploy using interface:
enp5s0f1.404: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.147.75  netmask 255.255.255.240  broadcast 10.35.147.79
        inet6 fe80::216:3eff:fe7b:b864  prefixlen 64  scopeid 0x20<link>
        ether 00:16:3e:7b:b8:64  txqueuelen 1000  (Ethernet)
        RX packets 6  bytes 717 (717.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20  bytes 2028 (1.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

This is my routing table:
puma19 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.35.163.254   0.0.0.0         UG    100    0        0 enp5s0f1
0.0.0.0         10.35.163.254   0.0.0.0         UG    101    0        0 enp5s0f0
10.35.147.64    0.0.0.0         255.255.255.240 U     0      0        0 enp5s0f1.404
10.35.160.0     0.0.0.0         255.255.252.0   U     100    0        0 enp5s0f1
10.35.160.0     0.0.0.0         255.255.252.0   U     101    0        0 enp5s0f0
169.254.0.0     0.0.0.0         255.255.0.0     U     1019   0        0 enp5s0f1.404
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0

The external NFS share is reachable from the 404 subinterface.

Logs from host attached.

Tested on these components:
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 18 Nikolai Sednev 2018-04-23 17:35:45 UTC
Created attachment 1425705 [details]
sosreport from host

Comment 19 Simone Tiraboschi 2018-04-24 09:16:30 UTC
(In reply to Nikolai Sednev from comment #17)
> I still see issue with deployment of SHE over tagged VLAN:
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
> resolved address doesn't resolve on the selected interface\n"}

From the logs I see that you tried to add the host to the engine as puma19.scl.lab.tlv.redhat.com:

 2018-04-23 20:38:51,880+0300 DEBUG otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:90 queryEnvKey called for key OVEHOSTED_NETWORK/host_name
 2018-04-23 20:38:51,881+0300 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:261 puma19.scl.lab.tlv.redhat.com resolves to: set(['10.35.160.47'])

and puma19.scl.lab.tlv.redhat.com got resolved as 10.35.160.47

and on your system you have:

3: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 44:1e:a1:73:39:61 brd ff:ff:ff:ff:ff:ff
    inet 10.35.160.47/22 brd 10.35.163.255 scope global noprefixroute dynamic enp5s0f1
       valid_lft 42119sec preferred_lft 42119sec
    inet6 fe80::461e:a1ff:fe73:3961/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
19: enp5s0f1.404@enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:7b:b8:64 brd ff:ff:ff:ff:ff:ff
    inet 10.35.147.75/28 brd 10.35.147.79 scope global dynamic enp5s0f1.404
       valid_lft 42127sec preferred_lft 42127sec
    inet6 fe80::216:3eff:fe7b:b864/64 scope link
       valid_lft forever preferred_lft forever

So puma19.scl.lab.tlv.redhat.com matches the address configured on enp5s0f1, but not the address configured on enp5s0f1.404@enp5s0f1 (10.35.147.75/28).

So "[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The resolved address doesn't resolve on the selected interface\n"}" looks correct and coherent.

Can you please retry ensuring that the host address you are going to use really resolves on the selected VLAN interface?
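The validation that fired here can be mimicked by comparing the resolved address against the addresses configured on the chosen NIC. The sketch below parses a sample line of `ip -o -4 addr` output mirroring comment 19; the addresses are taken from that comment and are illustrative:

```shell
# One line of 'ip -o -4 addr show dev enp5s0f1.404', as in comment 19
sample='19: enp5s0f1.404  inet 10.35.147.75/28 brd 10.35.147.79 scope global dynamic enp5s0f1.404'

# Extract the interface address (field 4, without the prefix length)
nic_addr=$(echo "$sample" | awk '{split($4, a, "/"); print a[1]}')
echo "$nic_addr"   # prints 10.35.147.75

# puma19 resolved to 10.35.160.47, which does not match, so the
# "resolved address doesn't resolve on the selected interface" error is correct
[ "$nic_addr" = "10.35.160.47" ] || echo "FQDN does not resolve on this interface"
```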

Comment 20 Nikolai Sednev 2018-04-24 12:39:55 UTC
As Simone explained in the previous comment, it was a DNS resolution issue in my environment.
Deployment was successful over NFS share using these components:
ovirt-engine-setup-4.2.3.2-0.1.el7.noarch
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

I used the VLAN 404 interface during deployment and, after fixing the environment issue, the deployment was successful.
Moving to verified.

Comment 21 Sandro Bonazzola 2018-05-10 06:32:02 UTC
This bug fix is included in the oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

