Bug 1910340

Summary: Adding a new host doesn't add it to existing OVN config
Product: [oVirt] ovirt-provider-ovn
Reporter: Gianluca Cecchi <gianluca.cecchi>
Component: provider
Assignee: eraviv
Status: CLOSED NOTABUG
QA Contact: Michael Burman <mburman>
Severity: unspecified
Priority: unspecified
Version: 1.2.33
CC: bugs, danken, dholler, mperina
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-01-12 10:51:06 UTC
Type: Bug
oVirt Team: Network
Attachments:
debug tool (flags: none)
debug tool (flags: none)

Description Gianluca Cecchi 2020-12-23 14:08:06 UTC
Description of problem:

When I add a new host, the existing OVN configuration is not applied to it, so, for example, I cannot migrate VMs that use vNICs on OVN networks to it.

Version-Release number of selected component (if applicable):

oVirt is 4.4.4, and on the host I have:

ovn2.11-2.11.1-56.el8.x86_64
ovirt-openvswitch-ovn-host-2.11-0.2020061801.el8.noarch
ovirt-provider-ovn-driver-1.2.33-1.el8.noarch
ovn2.11-host-2.11.1-56.el8.x86_64
ovirt-openvswitch-ovn-2.11-0.2020061801.el8.noarch
ovirt-openvswitch-ovn-common-2.11-0.2020061801.el8.noarch

On engine (external) I have:

ovirt-openvswitch-ovn-2.11-0.2020061801.el8.noarch
ovirt-openvswitch-ovn-central-2.11-0.2020061801.el8.noarch
ovirt-openvswitch-ovn-common-2.11-0.2020061801.el8.noarch
ovirt-provider-ovn-1.2.33-1.el8.noarch
novnc-1.1.0-6.el8.noarch
ovn2.11-central-2.11.1-56.el8.x86_64
ovn2.11-2.11.1-56.el8.x86_64



How reproducible:
Always

Steps to Reproduce:
1. Add a new CentOS 8.3 host to the 4.4.4 infrastructure.
2. Configure networks through Network Interfaces --> Setup Host Networks.
3. Activate the host.

Actual results:

The OVN configuration for the new host is not in place: the output of "ovn-sbctl show" on the engine doesn't contain any Chassis information for the new host.
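To check this symptom from a script rather than by eye, the output of `ovn-sbctl show` can be scanned for a chassis whose hostname matches the new host. The following is a minimal Java sketch, assuming the usual output layout in which each chassis block starts with a `Chassis "<uuid>"` line followed by an indented `hostname:` line. The class and method names, the UUIDs, and the second host (ov201 and its IP) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: assumes the usual `ovn-sbctl show` layout, where each chassis
// block starts with a `Chassis "<uuid>"` line followed by an indented
// `hostname: ...` line (the hostname may or may not be quoted).
public class ChassisCheck {

    /** Returns the hostname of every chassis block found in the output. */
    public static List<String> chassisHostnames(String sbctlShowOutput) {
        List<String> hostnames = new ArrayList<>();
        boolean inChassis = false;
        for (String rawLine : sbctlShowOutput.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("Chassis ")) {
                inChassis = true; // a new chassis block begins
            } else if (inChassis && line.startsWith("hostname:")) {
                String value = line.substring("hostname:".length()).trim();
                hostnames.add(value.replace("\"", "")); // drop optional quotes
                inChassis = false; // one hostname per chassis block
            }
        }
        return hostnames;
    }

    /** True if a chassis with exactly this hostname is registered. */
    public static boolean hasChassis(String sbctlShowOutput, String hostname) {
        return chassisHostnames(sbctlShowOutput).contains(hostname);
    }

    public static void main(String[] args) {
        // Made-up sample output; UUIDs, ov201 and its IP are placeholders.
        String sample = String.join("\n",
                "Chassis \"00000000-0000-0000-0000-000000000001\"",
                "    hostname: ov200.mydomain",
                "    Encap geneve",
                "        ip: \"10.4.192.32\"",
                "        options: {csum=\"true\"}",
                "Chassis \"00000000-0000-0000-0000-000000000002\"",
                "    hostname: \"ov201.mydomain\"",
                "    Encap geneve",
                "        ip: \"10.4.192.33\"",
                "        options: {csum=\"true\"}");
        System.out.println(chassisHostnames(sample));             // [ov200.mydomain, ov201.mydomain]
        System.out.println(hasChassis(sample, "ov301.mydomain")); // false
    }
}
```

Fed with the real `ovn-sbctl show` output from the engine and the new host's name, a `false` result corresponds to the missing Chassis entry described here.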

Expected results:

The OVN configuration is applied automatically.

Additional info:

If I then manually run this on the host:

vdsm-tool ovn-config engine_ip host_on_ovirtmgmt_ip

The configuration completes, and I can use OVN-based VMs on the new host.

Comment 1 Michael Burman 2020-12-23 15:37:00 UTC
Hi, thank you for the report.

QE can't reproduce the issue; it works for us as expected.
The new host was configured properly with the OVN config.

Using these versions:
rhvm-4.4.4.5-0.10.el8ev.noarch
ovn2.11-2.11.1-56.el8fdp.x86_64
ovirt-provider-ovn-1.2.33-1.el8ev.noarch
rhv-openvswitch-ovn-common-2.11-7.el8ev.noarch
rhv-openvswitch-ovn-central-2.11-7.el8ev.noarch
rhv-openvswitch-2.11-7.el8ev.noarch
openvswitch2.11-2.11.3-74.el8fdp.x86_64

So the only difference is that you are using upstream (u/s) OS and openvswitch versions.
It might be an issue in the upstream openvswitch packages.

I hope someone from development will be able to look at this report soon.

Comment 2 RHEL Program Management 2020-12-23 15:37:07 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Michael Burman 2020-12-23 15:40:41 UTC
I noticed that the openvswitch packages are quite old (about 6 months). Can you try upgrading these packages and retrying?

Comment 4 Gianluca Cecchi 2020-12-23 15:46:47 UTC
I think they are brought in from the oVirt-related repos... I don't know if it is safe to customize them.
For example, the package ovirt-openvswitch-ovn-2.11-0.2020061801.el8.noarch comes from the ovirt-4.4-centos-ovirt44 repo.

Comment 5 Gianluca Cecchi 2020-12-23 15:47:28 UTC
Is there a particular log file where the OVN-related actions of adding a new host are written, so that we can compare the apparently working RHV setup with the apparently non-working upstream oVirt one?

Comment 6 Michael Burman 2020-12-23 15:59:53 UTC
You can look at the /var/log/ovirt-engine/ansible/ansible.... log and check whether TASK [ovirt-provider-ovn-driver : Configure OVN for oVirt] runs as expected.
You can find the exact ansible log for the host through the events shown in the engine admin UI.

I assume that your cluster is configured with the OVN network provider, right?
And that ovirt-provider-ovn.service is running?

Comment 7 Dominik Holler 2021-01-04 07:46:22 UTC
In addition to the ansible log files, it would be helpful to see the whole line of ansible-runner-service.log that contains the term 'host_deploy_ovn_central'; it lists the parameters used to trigger ovirt-host-deploy.yml.
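Because that line is very long, it can help to pull out just the relevant parameter. A minimal Java sketch, assuming the parameters are printed as a Python-style dict of `'name': 'value'` pairs (as in the log line quoted later in this thread) and that the value is a single-quoted string without embedded quotes. `DeployParam` and `extract` are hypothetical names; the sample line is a shortened excerpt.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull one 'name': 'value' pair out of the Python-dict style
// parameters printed in ansible-runner-service.log. Only handles values
// wrapped in single quotes that contain no single quote themselves.
public class DeployParam {

    public static Optional<String> extract(String logLine, String name) {
        // Leading quote ensures we match the full key, not a suffix of another key
        Pattern pattern = Pattern.compile("'" + Pattern.quote(name) + "':\\s*'([^']*)'");
        Matcher matcher = pattern.matcher(logLine);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened excerpt built from values that appear in this report.
        String line = "Playbook run request for ovirt-host-deploy.yml, from 127.0.0.1, parameters: "
                + "{'host_deploy_cluster_name': 'Z2Z3-OV-10G', 'host_deploy_ovn_central': 'null', "
                + "'host_deploy_ovn_tunneling_interface': '10.4.192.32'}";
        System.out.println(extract(line, "host_deploy_ovn_central").orElse("<absent>"));             // null
        System.out.println(extract(line, "host_deploy_ovn_tunneling_interface").orElse("<absent>")); // 10.4.192.32
    }
}
```

Note that the first result is the literal string `null`, which is how the parameter appears in the log when no OVN central address was passed.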

Comment 8 Gianluca Cecchi 2021-01-07 14:55:22 UTC
I'm replacing my oVirt hosts one by one, moving from M610 to M620.
On 22/12/2020 I first removed my pre-existing ov200, and I see this in ansible-20201222161323-ovirt-host-remove_yml-811f607b-c2bd-4a39-a745-0333346f10af.log-20210101.gz

2020-12-22 16:13:29 CET - TASK [ovirt-provider-ovn-driver : Install ovs] *********************************
2020-12-22 16:13:29 CET - TASK [ovirt-provider-ovn-driver : Ensure Open vSwitch is started] **************
2020-12-22 16:13:29 CET - TASK [Install ovirt-provider-ovn-driver] ***************************************
2020-12-22 16:13:29 CET - TASK [ovirt-provider-ovn-driver : Ensure ovn-controller is started] ************
2020-12-22 16:13:29 CET - TASK [ovirt-provider-ovn-driver : Configure OVN for oVirt] *********************
2020-12-22 16:13:29 CET - TASK [Check if ovirt-provider-ovn-driver is installed] *************************
2020-12-22 16:13:29 CET - ok: [ov200.mydomain]
. . .
2020-12-22 16:13:29 CET - TASK [ovirt-provider-ovn-driver : Unconfigure the OVN chassis] *****************
2020-12-22 16:13:32 CET - changed: [ov200.mydomain]
. . .

And this is my most recent file:

[root@ovmgr1 ansible]# pwd
/var/log/ovirt-engine/ansible
[root@ovmgr1 ansible]# ll -t | head -2
total 320
-rw-r--r--. 1 ovirt ovirt  9265 Dec 22 16:13 ansible-20201222161323-ovirt-host-remove_yml-811f607b-c2bd-4a39-a745-0333346f10af.log-20210101.gz
[root@ovmgr1 ansible]# 

Meanwhile, in the engine web admin UI under Hosts --> ov200 --> Events, I see this for the newly added host (which has the same name as the previous one):

Dec 22, 2020, 11:41:12 PM Host ov200 was added by ...

In ansible-runner-service.log (actually ansible-runner-service.log-20201227.gz) if I search for the string "host_deploy_ovn_central", I see:


2020-12-22 23:41:13,225 - runner_service.controllers.playbooks - INFO - Playbook run request for ovirt-host-deploy.yml, from 127.0.0.1, parameters: {'host_deploy_cluster_name': 'Z2Z3-OV-10G', 'host_deploy_kdump_integration': 'false', 'host_deploy_kdump_destination_port': '7410', 'host_deploy_iptables_rules': "\n# oVirt default firewall configuration. Automatically generated by vdsm bootstrap script.\n*filter\n:INPUT ACCEPT [0:0]\n:FORWARD ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT\n-A INPUT -p icmp -j ACCEPT\n-A INPUT -i lo -j ACCEPT\n# vdsm\n-A INPUT -p tcp --dport 22 -j ACCEPT\n# ovirt-imageio-daemon\n-A INPUT -p tcp --dport 54322 -j ACCEPT\n# rpc.statd\n-A INPUT -p tcp --dport 111 -j ACCEPT\n-A INPUT -p udp --dport 111 -j ACCEPT\n# SSH\n-A INPUT -p tcp --dport 54321 -j ACCEPT\n# snmp\n-A INPUT -p udp --dport 161 -j ACCEPT\n# Cockpit\n-A INPUT -p tcp --dport 9090 -j ACCEPT\n\n\n# libvirt tls\n-A INPUT -p tcp --dport 16514 -j ACCEPT\n\n# serial consoles\n-A INPUT -p tcp -m multiport --dports 2223 -j ACCEPT\n\n# guest consoles\n-A INPUT -p tcp -m multiport --dports 5900:6923 -j ACCEPT\n\n# migration\n-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT\n\n# OVN host tunnels\n-A INPUT -p udp --dport 6081 -j ACCEPT\n-A OUTPUT -p udp --dport 6081 -j ACCEPT\n-A INPUT -p tcp --dport 5666 -s 10.4.5.99/32 -m comment --comment 'Nagios NRPE daemon' -j ACCEPT\n\n# Reject any other input traffic\n-A INPUT -j REJECT --reject-with icmp-host-prohibited\n-A FORWARD -m physdev ! 
--physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited\nCOMMIT\n", 'host_deploy_vdsm_ssl_ciphers': 'HIGH:!aNULL', 'hosted_engine_host_id': '3', 'host_deploy_cluster_switch_type': 'legacy', 'ovirt_pki_dir': '/etc/pki/ovirt-engine', 'ovirt_ca_key': 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCeubHyjT2iejf2KAdg1Qp154X/yPIpD/1jH/ytbPAhsGleIsp56iQrYoXm9lCGNlaa11/XX/BFSzbLsfvtydOZoprBne7L61QjEzn8QpbsdL4QSNRNLSajSkcThCMUSR2ob6nWcVbgBFj1Vat65TRpYSuiTXnoRuNAq0jvz7klGABfaAI48KRB3x7fNrwO4SRvsMgfpuFwoMxkY8Q+7yEu2vGZ1ZUuLhGowPSX5NHJ8XrBWs1r6/iNbSGm6i1KFbRexBodOcY7zZ95rAu4SejIwidk9AcW2/DYeqgKur8yS8t9ezvE4g73ZQP2S2CNz4B8Q/oVI2zE0jkupqP/xGkL', 'host_deploy_ovn_central': 'null', 'host_deploy_vnc_tls': 'false', 'ovirt_organizationname': 'mydomain', 'ovirt_vds_certificate_validity_in_days': '398', 'ovirt_vds_hostname': 'ov200.mydomain', 'ovirt_signcerttimeoutinseconds': '30', 'hosted_engine_tmp_cfg_file': '/tmp/temp-he-config14852668461596250012', 'host_deploy_ovn_tunneling_interface': '10.4.192.32', 'host_deploy_virt_enabled': 'true', 'host_deploy_kernel_cmdline_old': 'null', 'host_deploy_vdsm_port': '54321', 'ansible_port': '22', 'host_deploy_kdump_message_interval': '5', 'host_deploy_gluster_supported': 'false', 'hosted_engine_deploy_action': 'none', 'host_deploy_cluster_version': '4.5', 'host_deploy_vdsm_encrypt_host_communication': 'true', 'ovirt_qemu_ca_cert': '-----BEGIN CERTIFICATE----- . . . -----END CERTIFICATE-----\n', 'host_deploy_post_tasks': '/etc/ovirt-engine/ansible/ovirt-host-deploy-post-tasks.yml', 'host_deploy_tuned_profile': 'null', 'ovirt_ca_cert': '-----BEGIN CERTIFICATE----- . . . 
-----END CERTIFICATE-----\n', 'host_deploy_firewall_type': 'FIREWALLD', 'host_deploy_kdump_destination_address': 'ovmgr1.mydomain', 'ovirt_san': 'DNS:ov200.mydomain', 'ovirt_engine_usr': '/usr/share/ovirt-engine', 'host_deploy_gluster_enabled': 'false', 'host_deploy_kernel_cmdline_new': '', 'host_deploy_vdsm_min_version': '4.9', 'host_deploy_origin_type': 'OVIRT', 'host_deploy_override_firewall': 'true'}


So it seems I have:

 'host_deploy_ovn_central': 'null'

Comment 9 Dominik Holler 2021-01-11 09:04:29 UTC
(In reply to Gianluca Cecchi from comment #8)
> 
> So it seems I have:
> 
>  'host_deploy_ovn_central': 'null'

Thanks, this is the problem. oVirt Engine is not able to get its own IP address [1] to use as ovn-central.
Is the issue solved if you ensure that the FQDN in the URL of the ovirt-provider-ovn resolves to an IP address?
An entry in /etc/hosts on the Engine's host should be sufficient.

[1]  https://github.com/oVirt/ovirt-engine/blob/8fbc440618fb4f0ede3305fe97db4bcffb30c314/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/hostdeploy/InstallVdsInternalCommand.java#L530

Comment 10 Gianluca Cecchi 2021-01-11 12:53:16 UTC
In which sense? oVirt Engine IS able to get its own IP address, at least at DNS level:

[root@ovmgr1 ~]# nslookup ovmgr1
Server:		10.4.169.1
Address:	10.4.169.1#53

Name:	ovmgr1.mydomain
Address: 10.4.192.43

[root@ovmgr1 ~]# nslookup ovmgr1.mydomain
Server:		10.4.169.1
Address:	10.4.169.1#53

Name:	ovmgr1.mydomain
Address: 10.4.192.43

[root@ovmgr1 ~]# nslookup 10.4.192.43
43.192.4.10.in-addr.arpa	name = ovmgr1.mydomain.

[root@ovmgr1 ~]# 

I get the same results if I force the use of the secondary DNS server (10.4.167.1).
The DNS configuration is also consistent across the hosts, which are on the same management network and use the same DNS servers.

Or are you referring to another way of "getting its own IP"?

Comment 11 Dominik Holler 2021-01-11 14:16:35 UTC
(In reply to Gianluca Cecchi from comment #10)
> In which sense? oVirt Engine IS able to get its own IP address, at least at
> DNS level:
> 

Thanks for checking this.

> 
> Or are you referring to another way of "getting its own IP"?

oVirt Engine uses Java's InetAddress.getAllByName to resolve the hostname in the URL of the ovirt-provider-ovn configured in oVirt Engine to an IP address.
Do the following commands return IP addresses?

host HOSTNAME_OF_OVIRT_PROVIDER_OVN
getent hosts HOSTNAME_OF_OVIRT_PROVIDER_OVN
getent ahosts HOSTNAME_OF_OVIRT_PROVIDER_OVN

Thanks for trying again.
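The lookup described above can be approximated in a few lines: take the host part of the provider URL with `java.net.URI` and pass it to `InetAddress.getAllByName`. This is a sketch under that assumption only, not the engine's code and not the attached debug tool (whose source is not shown here); `ProviderResolve`, `providerHost` and `resolveIps` are hypothetical names.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the lookup: host part of the provider URL, resolved with
// InetAddress.getAllByName. Names are made up for illustration.
public class ProviderResolve {

    /** Host part of the provider URL, e.g. "ovmgr1.mydomain". */
    public static String providerHost(String providerUrl) {
        return URI.create(providerUrl).getHost();
    }

    /** All addresses the host resolves to; empty if it cannot be resolved. */
    public static List<String> resolveIps(String providerUrl) {
        List<String> ips = new ArrayList<>();
        String host = providerHost(providerUrl);
        if (host == null) {
            return ips; // getAllByName(null) would silently return the loopback address
        }
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                ips.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // resolution failed: no address usable as the OVN central
        }
        return ips;
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "https://ovmgr1.mydomain:9696";
        System.out.println("uri.getHost: " + providerHost(url));
        System.out.println("ips: " + resolveIps(url));
    }
}
```

After compiling with `javac`, it can be run on the engine with the provider URL as its argument. An empty `ips` list is the situation in which the engine would have no address to use as ovn-central.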

Comment 12 Gianluca Cecchi 2021-01-11 14:31:57 UTC
In my web admin GUI I see the OVN external provider configured this way:

Name: ovirt-provider-ovn
Type: External Network Provider
Description: oVirt network provider for OVN
Provider URL: https://ovmgr1.mydomain:9696

So I used these commands to cross-check what you asked:

[root@ovmgr1 ~]# host ovmgr1.mydomain
ovmgr1.mydomain has address 10.4.192.43

[root@ovmgr1 ~]# getent hosts ovmgr1.mydomain
fe80::9ec0:953a:2a4c:9ec7 ovmgr1.mydomain

[root@ovmgr1 ~]# getent ahosts ovmgr1.mydomain
10.4.192.43     STREAM ovmgr1.mydomain
10.4.192.43     DGRAM  
10.4.192.43     RAW    
[root@ovmgr1 ~]# 

I don't know if the second one is what is expected, or if I should get the IPv4 address instead of the IPv6 link-local one...

Comment 13 Dominik Holler 2021-01-11 16:21:49 UTC
Created attachment 1746321 [details]
debug tool

Indeed, the output looks like it should.
Would you share the output of the attached tool? It mimics oVirt Engine's behavior in this regard.

Comment 14 Gianluca Cecchi 2021-01-11 17:15:10 UTC
I get this exception:

[root@ovmgr1 resolve]# ./resolve.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: Resolve has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:601)
[root@ovmgr1 resolve]#

Right now I have installed on my engine:

[root@ovmgr1 resolve]# java -version
openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-b10)
OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)
[root@ovmgr1 resolve]#

Comment 15 Dominik Holler 2021-01-11 17:24:24 UTC
Created attachment 1746365 [details]
debug tool

(In reply to Gianluca Cecchi from comment #14)
> I get this exception:
> 
> [root@ovmgr1 resolve]# ./resolve.sh
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.UnsupportedClassVersionError: Resolve
> has been compiled by a more recent version of the Java Runtime (class file
> version 55.0), this version of the Java Runtime only recognizes class file
> versions up to 52.0
[..]
> [root@ovmgr1 resolve]#
> 
> Right now I have installed on my engine:
> 
> [root@ovmgr1 resolve]# java -version
> openjdk version "1.8.0_272"
> OpenJDK Runtime Environment (build 1.8.0_272-b10)
> OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)
> [root@ovmgr1 resolve]#

Thanks, I recompiled it for Java 1.8.

Comment 16 Gianluca Cecchi 2021-01-11 17:32:48 UTC
[root@ovmgr1 resolve]# ./resolve.sh
url: https://ovmgr1.mydomain:9696
uri.getHost: ovmgr1.mydomain
InetAddress: ovmgr1.mydomain/10.4.192.43
ip: 10.4.192.43
10.4.192.43
[root@ovmgr1 resolve]#

Comment 17 Dominik Holler 2021-01-11 19:59:43 UTC
Gianluca, thanks for your patience, so the hostname to IP address mapping is fine.

Can you please check if the ovirt-provider-ovn is configured as the default network provider for the cluster to which the host was added?

Comment 18 Gianluca Cecchi 2021-01-11 22:49:34 UTC
I'm not sure what you mean by "default" network provider.
I only have one DC and one cluster in this environment, and if I go to the web admin GUI under Administration --> Providers, I only have "ovirt-image-repository" of type "OpenStack Image" and "ovirt-provider-ovn" of type "External Network Provider".
Otherwise, let me know what to check, even at the database level if that helps.

Comment 19 Dominik Holler 2021-01-12 06:56:58 UTC
Clusters have the attribute "Default Network Provider", which has to be set to "ovirt-provider-ovn" to trigger the automatic deployment of OVN.
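Taken together with comments 8 and 9, the behavior can be summarized as a small decision: without a cluster Default Network Provider, host_deploy_ovn_central ends up as the string 'null' and OVN is not configured on the host; with ovirt-provider-ovn as the default, the provider's resolved address is used, which would correspond to the manual `vdsm-tool ovn-config engine_ip host_on_ovirtmgmt_ip` workaround from the description. The Java below is a hypothetical model of that reasoning only. The rule ordering, the choice of the first resolved address, and the mapping of host_deploy_ovn_tunneling_interface to the second `ovn-config` argument are assumptions, and `OvnDeployModel` is a made-up name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the decision discussed in this thread.
// NOT the engine's code: "no default provider -> 'null'" and the choice of
// the first resolved address are assumptions.
public class OvnDeployModel {

    /** Value passed as host_deploy_ovn_central in this simplified model. */
    public static String ovnCentralParam(String defaultProviderUrl, List<String> resolvedIps) {
        if (defaultProviderUrl == null || resolvedIps.isEmpty()) {
            return "null"; // nothing to configure OVN against
        }
        return resolvedIps.get(0);
    }

    /** Equivalent manual command, or null when OVN would be skipped. */
    public static String ovnConfigCommand(String ovnCentral, String tunnelingIp) {
        if ("null".equals(ovnCentral)) {
            return null;
        }
        return "vdsm-tool ovn-config " + ovnCentral + " " + tunnelingIp;
    }

    public static void main(String[] args) {
        List<String> ips = Arrays.asList("10.4.192.43");
        // Cluster with "No Default Provider": the parameter stays 'null'.
        System.out.println(ovnCentralParam(null, ips)); // null
        // Cluster whose Default Network Provider is ovirt-provider-ovn.
        String central = ovnCentralParam("https://ovmgr1.mydomain:9696", ips);
        System.out.println(central); // 10.4.192.43
        System.out.println(ovnConfigCommand(central, "10.4.192.32"));
        // prints: vdsm-tool ovn-config 10.4.192.43 10.4.192.32
    }
}
```

The addresses are the ones that appear in this report (engine 10.4.192.43, ov200 tunneling interface 10.4.192.32).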

Comment 20 Gianluca Cecchi 2021-01-12 09:24:52 UTC
Bingo! I missed this important information, which is described in the Administration Guide:
https://www.ovirt.org/documentation/administration_guide/#Adding_OVN_as_an_External_Network_Provider

My cluster (which is not the Default one) had "Default Network Provider" set to "No Default Provider".
After changing it to "ovirt-provider-ovn", the existing hosts were marked as needing reinstallation, and indeed a reinstall removed the chassis entry and then re-added it.
Also, when I install a new host now, OVN is configured on it automatically and properly.
In host install events I see:

Installing Host ov301. Ensure Open vSwitch is started.
Installing Host ov301. Install ovirt-provider-ovn-driver.
Installing Host ov301. Ensure ovn-controller is started.
Installing Host ov301. Configure OVN for oVirt.

and, in the ansible-runner-service.log entry for the host deployment, I now see

'host_deploy_ovn_central': '10.4.192.43'

Problem solved. Thanks