Bug 1593804
Summary: | ovn-controller: report when was the most recent successful communication with central | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Kenigsberg <danken> |
Component: | openvswitch | Assignee: | lorenzo bianconi <lorenzo.bianconi> |
Status: | CLOSED ERRATA | QA Contact: | haidong li <haili> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.0 | CC: | atelang, atragler, haili, kfida, lmanasko, lorenzo.bianconi, mmichels, mmirecki, tredaelli |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openvswitch-2.9.0-58.el7fdn | Doc Type: | Enhancement |
Doc Text: |
With this update, the connection-status command has been introduced for use with the ovs-appctl utility. The command enables monitoring of the hypervisor (HV) southbound database (SBDB) connection status. Layered products can now check whether the ovn-controller is properly connected to a central node.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-11-05 14:59:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1216991 |
Description
Dan Kenigsberg
2018-06-21 15:31:54 UTC
If a CMS wants to know whether a configuration change has been applied, or whether a controller is reachable, it is possible to use the nb_cfg/sb_cfg/hv_cfg columns in the NB_Global table of the NB db. According to the ovn-architecture man page, when the CMS updates the configuration in the northbound database, it can, as part of the same transaction, increment the value of the nb_cfg column in the NB_Global table. ovn-northd copies nb_cfg from NB db to the SB_Global table of SB db as part of the same transaction. Each ovn-controller updates ovs-vswitchd configuration, and the nb_cfg value is propagated to the corresponding column in the Chassis table of the SB db. Moreover, ovn-northd monitors the nb_cfg column in all of the Chassis records in the southbound database. It keeps track of the minimum value among all the records and copies it into the hv_cfg column in the northbound NB_Global table. The CMS or another observer can thus determine when all of the hypervisors have caught up to the northbound configuration and whether they are reachable.

Connecting from the host where ovn-controller is located to SB to check whether the installation/configuration succeeded would be somewhat problematic.
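The nb_cfg/hv_cfg comparison described above can be sketched roughly as follows. This is only an illustration, not part of the fix: it assumes the script runs somewhere that can reach the NB db with ovn-nbctl, and the function names `cfg_caught_up` and `nb_hv_in_sync` are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether all hypervisors have caught up with the northbound
# configuration by comparing nb_cfg and hv_cfg from the NB_Global table.

# cfg_caught_up NB_CFG HV_CFG
# Succeeds (exit 0) when hv_cfg has reached nb_cfg. hv_cfg is the minimum
# nb_cfg reported by all chassis, so this means every chassis caught up.
cfg_caught_up() {
    [ "$2" -ge "$1" ]
}

# Hypothetical wrapper: reads both columns with ovn-nbctl.
# Assumes the NB db is reachable from the host running this script.
# Exit status: 0 in sync, 1 not yet in sync, 2 could not read the columns.
nb_hv_in_sync() {
    nb_cfg=$(ovn-nbctl get NB_Global . nb_cfg) || return 2
    hv_cfg=$(ovn-nbctl get NB_Global . hv_cfg) || return 2
    cfg_caught_up "$nb_cfg" "$hv_cfg"
}
```

Note that this check needs access to the NB db, which is exactly what is inconvenient to do from the hypervisor host.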
Is it possible to validate whether the ovn-controller connected and registered itself correctly with OVN central, without accessing SB, but only by looking at ovn-controller resources available on the host?
> Each ovn-controller updates ovs-vswitchd configuration
What is updated? Could we make use of this?
Is external_ids:ovn-chassis-id generated by ovn-controller, or is this value generated on SB, and returned back to ovn-controller? If on SB, we could use that as an indicator.
After a talk with Lorenzo, let me narrow down the question: we need to know, from the ovn-controller, whether the ovn-controller ever successfully connected to SB. By saying "from ovn-controller", I mean from the host where ovn-controller is installed, either by querying the local ovs db or by querying ovn-controller directly.

What we'd like to have is basically option 2 of https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047025.html

Dan asked me to comment about the change here. Lorenzo added a new command:
`ovs-appctl -t ovn-controller connection-status`
If ovn-controller is connected to the southbound database, then the command returns "connected". Otherwise, it returns "not connected".

I have tested this command on the latest ovs version. The command works well if the ovn-controller is connected to SB. Then I changed the address in "external-ids:ovn-remote" to a nonexistent address, so the ovn-controller could not connect to SB, and the "ovs-appctl -t" command displayed "not connected" as expected. But after that I restarted the ovn-controller, and the "ovs-appctl -t" command hung and did not print anything. Is this expected?

[root@hp-dl580g7-01 ~]# uname -a
Linux hp-dl580g7-01.rhts.eng.pek2.redhat.com 3.10.0-954.el7.x86_64.debug #1 SMP Mon Sep 24 16:24:23 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl580g7-01 ~]# rpm -qa | grep openvswitch
openvswitch-ovn-central-2.9.0-70.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-70.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-70.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-3.el7fdp.noarch
openvswitch-2.9.0-70.el7fdp.x86_64
[root@hp-dl580g7-01 ~]#
[root@hp-dl580g7-01 ~]# ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:20.0.0.25:6642
[root@hp-dl580g7-01 ~]# systemctl restart ovn-controller
[root@hp-dl580g7-01 ~]# ovs-appctl -t ovn-controller connection-status
connected
[root@hp-dl580g7-01 ~]# ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:20.0.0.28:6642
[root@hp-dl580g7-01 ~]# ovs-appctl -t ovn-controller connection-status
not connected
[root@hp-dl580g7-01 ~]# systemctl restart ovn-controller
[root@hp-dl580g7-01 ~]# ovs-appctl -t ovn-controller connection-status
^C2018-09-30T09:10:11Z|00001|fatal_signal|WARN|terminating with signal 2 (Interrupt) <----hang there
[root@hp-dl580g7-01 ~]#

Hi,
Yes, that is the expected behavior, since at bootstrap ovn-controller tries to get an initial snapshot of sbdb and blocks until it gets it (forever if the ovn remote is invalid). The following series has been proposed by Ben Pfaff to address this limitation, but it has not been merged yet:
- https://patchwork.ozlabs.org/patch/931134/
- https://patchwork.ozlabs.org/patch/931135/
I double-checked that this series fixes the reported issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:3500
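Because the connection-status query can block while ovn-controller waits for its initial sbdb snapshot, a caller on the host may want to bound the wait. The following is a minimal sketch of that idea, not something taken from the fix itself: it assumes timeout(1) from GNU coreutils is available, and the names `STATUS_CMD` and `check_sb_connection` and the "unknown" result are hypothetical.

```shell
#!/bin/sh
# Sketch: query ovn-controller's connection status without blocking forever.

# Command to run; overridable (for example in tests). The default is the
# command introduced by this bug.
: "${STATUS_CMD:=ovs-appctl -t ovn-controller connection-status}"

# check_sb_connection [TIMEOUT_SECONDS]
# Prints "connected", "not connected", or "unknown" (timeout or error).
# Exit status: 0 connected, 1 not connected, 2 unknown.
check_sb_connection() {
    limit=${1:-5}
    # timeout(1) kills the query if no reply arrives within $limit seconds.
    # STATUS_CMD is intentionally unquoted so it splits into command words.
    status=$(timeout "$limit" $STATUS_CMD 2>/dev/null) || status=unknown
    case $status in
        connected)       echo "connected"; return 0 ;;
        "not connected") echo "not connected"; return 1 ;;
        *)               echo "unknown"; return 2 ;;
    esac
}
```

A layered product could call `check_sb_connection 5` and treat "unknown" as "ovn-controller did not answer in time", which covers the restart-with-invalid-remote case shown in the transcript.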