Bug 1593804

Summary: ovn-controller: report when was the most recent successful communication with central
Product: Red Hat Enterprise Linux 7
Reporter: Dan Kenigsberg <danken>
Component: openvswitch
Assignee: lorenzo bianconi <lorenzo.bianconi>
Status: CLOSED ERRATA
QA Contact: haidong li <haili>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0
CC: atelang, atragler, haili, kfida, lmanasko, lorenzo.bianconi, mmichels, mmirecki, tredaelli
Target Milestone: rc
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openvswitch-2.9.0-58.el7fdn
Doc Type: Enhancement
Doc Text:
With this update, the connection-status command has been added to ovn-controller and can be invoked through the ovs-appctl utility. The command reports the status of the connection between the hypervisor (HV) and the southbound database (SBDB). Layered products can now check whether ovn-controller is properly connected to a central node.
Last Closed: 2018-11-05 14:59:03 UTC
Type: Bug
Bug Blocks: 1216991    

Description Dan Kenigsberg 2018-06-21 15:31:54 UTC
RHV would like to know whether an ovn chassis is properly connected to ovn-central.

One way to achieve this is for ovn-controller to touch a file after each successful communication with central. RHV would consider it an error condition if the file is older than X minutes.

Other implementations are welcome, too.
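A minimal sketch of the check RHV describes, assuming that ovn-controller (or some helper) touched a heartbeat file on every successful exchange with central. ovn-controller does not create such a file; `HEARTBEAT_PATH` and `heartbeat_is_stale` are hypothetical names used only for illustration, and "X minutes" becomes the `max_age_s` parameter.

```python
import os
import time
from typing import Optional

# Hypothetical heartbeat file: ovn-controller does NOT write this file.
# The path is an assumption made purely to illustrate the proposal.
HEARTBEAT_PATH = "/var/run/ovn/ovn-controller.heartbeat"


def heartbeat_is_stale(path: str, max_age_s: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file is missing or older than max_age_s seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        # No file at all: the controller has never reported a successful exchange.
        return True
    # Stale if the last touch happened more than max_age_s seconds ago.
    return (now - mtime) > max_age_s
```

For example, `heartbeat_is_stale(HEARTBEAT_PATH, max_age_s=5 * 60)` would flag an error once the file is more than five minutes old. Treating a missing file as stale also covers a controller that never connected at all.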

Comment 1 lorenzo bianconi 2018-07-04 15:16:05 UTC
If a CMS wants to know whether a configuration change has been applied, or whether a controller is reachable, it can use the nb_cfg/sb_cfg/hv_cfg columns in the NB_Global table of the NB db.
According to the ovn-architecture man page, when the CMS updates the configuration in the northbound database, it can increment the value of the nb_cfg column in the NB_Global table as part of the same transaction. When ovn-northd updates the southbound database, it copies nb_cfg from the NB db into the SB_Global table of the SB db as part of that same SB transaction; once the SB commit is confirmed, it records the pushed value in the sb_cfg column of NB_Global. Each ovn-controller then updates the ovs-vswitchd configuration and, once that is done, copies nb_cfg into the nb_cfg column of its own record in the Chassis table of the SB db. ovn-northd monitors the nb_cfg column in all of the Chassis records in the southbound database, keeps track of the minimum value among them, and copies it into the hv_cfg column of the northbound NB_Global table. The CMS or another observer can therefore determine when all of the hypervisors have caught up to the northbound configuration; a chassis that never catches up (for example, because it is unreachable) keeps hv_cfg below nb_cfg.
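The hv_cfg bookkeeping above can be modeled in a few lines. This is an illustrative sketch of the arithmetic only, not ovn-northd code: the dictionary stands in for the nb_cfg column of the SB Chassis records, and seeding the minimum with the current nb_cfg (so that an empty chassis list counts as caught up) is an assumption of this sketch. `compute_hv_cfg` and `lagging_chassis` are hypothetical helper names.

```python
from typing import Dict, List


def compute_hv_cfg(nb_cfg: int, chassis_nb_cfg: Dict[str, int]) -> int:
    """hv_cfg is the minimum nb_cfg reported by any chassis.

    Seeding with nb_cfg itself is an assumption of this sketch, so that
    an empty chassis set is treated as fully caught up.
    """
    return min([nb_cfg, *chassis_nb_cfg.values()])


def lagging_chassis(nb_cfg: int, chassis_nb_cfg: Dict[str, int]) -> List[str]:
    """Names of chassis that have not yet applied configuration nb_cfg."""
    return sorted(name for name, cfg in chassis_nb_cfg.items() if cfg < nb_cfg)


if __name__ == "__main__":
    nb_cfg = 7                                # CMS bumped NB_Global.nb_cfg to 7
    chassis = {"hv1": 7, "hv2": 7, "hv3": 5}  # hv3 has not caught up
    hv_cfg = compute_hv_cfg(nb_cfg, chassis)
    print(hv_cfg, hv_cfg >= nb_cfg, lagging_chassis(nb_cfg, chassis))
    # prints: 5 False ['hv3']
```

hv_cfg alone only says that some chassis lags; finding out which one requires reading the Chassis table of the SB db.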

Comment 2 Marcin Mirecki 2018-07-11 13:12:17 UTC
Connecting from the host where ovn-controller is located to SB to check if the installation/configuration succeeded would be somewhat problematic. 

Is it possible to validate whether ovn-controller connected and registered itself correctly with OVN central without accessing the SB, only by looking at ovn-controller resources available on the host?

>  Each ovn-controller updates ovs-vswitchd configuration
What is updated? Could we make use of this?

Is external_ids:ovn-chassis-id generated by ovn-controller, or is this value generated on SB, and returned back to ovn-controller? If on SB, we could use that as an indicator.

Comment 3 Marcin Mirecki 2018-07-11 13:49:49 UTC
After a talk with Lorenzo, let me narrow down the question:

We need to know from the ovn-controller, if the ovn-controller ever successfully connected to SB.

By saying "from ovn-controller", I mean from the host where ovn-controller is installed, either by querying the local ovs db or ovn-controller directly.

Comment 4 Dan Kenigsberg 2018-07-12 15:27:52 UTC
What we'd like to have is basically option 2 of https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047025.html

Comment 5 Mark Michelson 2018-08-13 12:24:12 UTC
Dan asked me to comment on the change here. Lorenzo added a new command:

`ovs-appctl -t ovn-controller connection-status`

If ovn-controller is connected to the southbound database, the command prints "connected"; otherwise, it prints "not connected".
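A small parser for this output, as a sketch. It assumes the command prints exactly one of the two strings above, possibly followed by a newline; `parse_connection_status` is a hypothetical helper name. The text itself could be obtained with `subprocess.run(["ovs-appctl", "-t", "ovn-controller", "connection-status"], capture_output=True, text=True).stdout`.

```python
from typing import Optional


def parse_connection_status(output: str) -> Optional[bool]:
    """Map connection-status output to True/False, or None if unrecognized."""
    status = output.strip()
    # Compare whole strings: "not connected" also contains "connected",
    # so a substring test would misreport a disconnected controller.
    if status == "connected":
        return True
    if status == "not connected":
        return False
    # Anything else (empty output, an error message) is left undecided.
    return None
```

The exact-match comparison is the main design point: a naive `"connected" in output` check returns True for both states.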

Comment 7 haidong li 2018-09-30 09:16:43 UTC
I have tested this command on the latest ovs version, and it works well when ovn-controller is connected to the SB. Then I changed the address in "external-ids:ovn-remote" to a nonexistent address, so ovn-controller could not connect to the SB, and the "ovs-appctl -t" command displayed "not connected" as expected. But after I then restarted ovn-controller, the "ovs-appctl -t" command hung and didn't print anything. Is this expected?

[root@hp-dl580g7-01 ~]# uname -a
Linux hp-dl580g7-01.rhts.eng.pek2.redhat.com 3.10.0-954.el7.x86_64.debug #1 SMP Mon Sep 24 16:24:23 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl580g7-01 ~]# rpm -qa | grep openvswitch
openvswitch-ovn-central-2.9.0-70.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-70.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-70.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-3.el7fdp.noarch
openvswitch-2.9.0-70.el7fdp.x86_64
[root@hp-dl580g7-01 ~]#
[root@hp-dl580g7-01 ~]# ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:20.0.0.25:6642
[root@hp-dl580g7-01 ~]# systemctl restart ovn-controller
[root@hp-dl580g7-01 ~]# ovs-appctl -t ovn-controller connection-status
connected
[root@hp-dl580g7-01 ~]# ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:20.0.0.28:6642
[root@hp-dl580g7-01 ~]# ovs-appctl -t ovn-controller connection-status
not connected
[root@hp-dl580g7-01 ~]# systemctl restart ovn-controller
[root@hp-dl580g7-01 ~]# ovs-appctl -t ovn-controller connection-status
^C2018-09-30T09:10:11Z|00001|fatal_signal|WARN|terminating with signal 2 (Interrupt)                                              <----hang there

[root@hp-dl580g7-01 ~]#

Comment 8 lorenzo bianconi 2018-10-04 15:07:06 UTC
Hi,

Yes, this is the expected behavior: at bootstrap, ovn-controller tries to get an initial snapshot of the SB db and blocks until it gets one (forever, if ovn-remote is invalid). Ben Pfaff has proposed the following series to address this limitation, but it has not been merged yet:
- https://patchwork.ozlabs.org/patch/931134/
- https://patchwork.ozlabs.org/patch/931135/

I double-checked that this series fixes the reported issue.
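Because the query can block indefinitely while ovn-controller waits for its initial SB snapshot, a monitoring agent probably should not call it without a deadline. A minimal sketch in Python, assuming a timeout is best reported as its own "unresponsive" state; `check_connection`, its return values, and the 5-second default are illustrative choices, not part of ovs-appctl. (ovs-appctl also accepts a `--timeout` option that limits its own runtime, which may be simpler in a shell script.)

```python
import subprocess
from typing import Sequence

# Command quoted in this report; any argv can be passed instead for testing.
DEFAULT_CMD = ("ovs-appctl", "-t", "ovn-controller", "connection-status")


def check_connection(cmd: Sequence[str] = DEFAULT_CMD, timeout_s: float = 5.0) -> str:
    """Return "connected", "not connected", "unresponsive", or "error"."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # No answer in time, e.g. ovn-controller still blocked at bootstrap.
        return "unresponsive"
    except OSError:
        # The executable could not be started (for example, not installed).
        return "error"
    if result.returncode != 0:
        # For example, ovn-controller is not running and cannot be reached.
        return "error"
    status = result.stdout.strip()
    # Exact match, since "not connected" contains "connected".
    if status in ("connected", "not connected"):
        return status
    return "error"
```

Keeping "unresponsive" separate from "not connected" lets a layered product tell a controller that is running but disconnected apart from one that is stuck waiting for its first snapshot.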

Comment 9 haidong li 2018-10-08 06:36:13 UTC
Changing the status to VERIFIED based on comment 7 and comment 8.

Comment 15 errata-xmlrpc 2018-11-05 14:59:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3500