Bug 1924751
| Summary: | [RFE] health-checking for ovn-controller | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Casey Callendrello <cdc> |
| Component: | OVN | Assignee: | Dumitru Ceara <dceara> |
| Status: | CLOSED ERRATA | QA Contact: | Ehsan Elahi <eelahi> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | RHEL 8.0 | CC: | ctrautma, dceara, kfida |
| Target Milestone: | --- | ||
| Target Release: | FDP 21.I | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovn21.09-21.09.0-9.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-09 15:37:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Casey Callendrello
2021-02-03 14:46:50 UTC
DCBW pointed out that we've discussed something similar: namely, an end-to-end signal that flows are being propagated correctly. We should increment a timestamp in `nb_cfg` and check that it reaches the SB DB and ovn-controller, then publish its value as a metric for both the ovn-kube master and all the ovn-controllers. This is a good idea, and I would like to see it as part of this RFE as well.

After talking to the OVN team, we realized the `nb_cfg` propagation mechanism already provides almost everything we need. Updating `nb_cfg` in the NB DB, e.g. `ovn-nbctl set NB_Global . nb_cfg=$(ts)`, causes the value to be propagated down to `SB_Global` and then, via ovn-controller, to the `Chassis_Private` table. That table already includes `nb_cfg_timestamp`, the timestamp at which the latest `nb_cfg` value was set. Furthermore, `nb_cfg` is also available in the node's local OVSDB, in the `Bridge` table (as `ovn-nb-cfg` in the `br-int` external_ids). It would be useful, however, if `nb_cfg_timestamp` were also available in the local OVSDB.

So, this RFE is for two small(?) additions to the node's local OVSDB:

1. `nb_cfg_timestamp`
2. the ovn-controller process start timestamp

The latter makes it possible to alert on high end-to-end latency; otherwise it is hard to tell the difference between an ovn-controller restart and high steady-state latency.

Patch sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=249177&state=*

v2 posted for review: http://patchwork.ozlabs.org/project/ovn/list/?series=249334&state=*

Verified on:
```
# rpm -qa | grep -E 'ovn|openvswitch'
ovn-2021-host-21.09.0-12.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.15-2.15.0-26.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64
ovn-2021-central-21.09.0-12.el8fdp.x86_64
```

```
ovn-nbctl --wait=hv sync
ovs-vsctl get Bridge br-int external_ids
{ct-zone-3d194f6b-c65f-4e4f-8867-b67f02208efa_dnat="4", ct-zone-3d194f6b-c65f-4e4f-8867-b67f02208efa_snat="5", ct-zone-495cb9cb-95a3-4c01-81d9-83977429b71d_dnat="8", ct-zone-495cb9cb-95a3-4c01-81d9-83977429b71d_snat="6", ct-zone-8f7b933f-d227-422d-a427-ac3a293ac19e_dnat="1", ct-zone-8f7b933f-d227-422d-a427-ac3a293ac19e_snat="2", ct-zone-ac15f654-9847-4331-9db2-1a920be4da7e_dnat="9", ct-zone-ac15f654-9847-4331-9db2-1a920be4da7e_snat="7", ct-zone-vm1="3", ct-zone-vm2="10", ct-zone-vm3="11", ovn-nb-cfg="1", ovn-nb-cfg-ts="1634816184014", ovn-startup-ts="1634814832761"}
```
The new `ovn-nb-cfg-ts` and `ovn-startup-ts` keys are present in the `br-int` external_ids.
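The restart-versus-latency distinction that motivated the second timestamp can be illustrated with a small shell sketch. It is not part of OVN or OVS: the helper names `ext_id_get` and `classify_sync`, the recorded bump time, and the threshold are all hypothetical. It assumes the caller bumped `nb_cfg` to a known value and noted when it did so in milliseconds since the epoch, that `ovn-nb-cfg-ts` and `ovn-startup-ts` use the same unit (the values in the verification output appear to be epoch milliseconds), and that the central and chassis clocks are reasonably synchronized.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OVN/OVS) for checking nb_cfg propagation
# on one chassis. All timestamps are assumed to be milliseconds since the epoch.

# ext_id_get MAP KEY
# MAP is the text printed by `ovs-vsctl get Bridge br-int external_ids`,
# e.g. {ovn-nb-cfg="1", ovn-nb-cfg-ts="1634816184014"}.
# Prints KEY's value without quotes, or nothing if KEY is absent.
# Assumes no value contains ", " and KEY has no sed regex metacharacters.
ext_id_get() {
    local map=$1 key=$2 nl=$'\n'
    # One key=value entry per line, braces stripped, then match KEY exactly.
    printf '%s\n' "${map//, /$nl}" | tr -d '{}' \
        | sed -n "s/^${key}=\"\{0,1\}\([^\"]*\)\"\{0,1\}\$/\1/p"
}

# classify_sync MAP WANT_CFG BUMP_TS THRESHOLD_MS
# WANT_CFG is the nb_cfg value that was set in NB_Global at BUMP_TS.
# Assumes nb_cfg was not bumped again after WANT_CFG.
# Prints one of:
#   pending - the chassis has not yet reported WANT_CFG
#   ok      - WANT_CFG was processed within THRESHOLD_MS of the bump
#   restart - processed late, but ovn-controller (re)started after the bump,
#             so the delay likely reflects the restart, not running latency
#   slow    - processed late while ovn-controller kept running
classify_sync() {
    local map=$1 want_cfg=$2 bump_ts=$3 threshold_ms=$4
    local cfg cfg_ts start_ts
    cfg=$(ext_id_get "$map" ovn-nb-cfg)
    cfg_ts=$(ext_id_get "$map" ovn-nb-cfg-ts)
    start_ts=$(ext_id_get "$map" ovn-startup-ts)

    if [ -z "$cfg" ] || [ -z "$cfg_ts" ] || [ "$cfg" -lt "$want_cfg" ]; then
        echo pending
    elif [ $((cfg_ts - bump_ts)) -le "$threshold_ms" ]; then
        echo ok
    elif [ -n "$start_ts" ] && [ "$start_ts" -gt "$bump_ts" ]; then
        echo restart
    else
        echo slow
    fi
}
```

On a chassis, the MAP argument would be the output of `ovs-vsctl get Bridge br-int external_ids`, and with GNU coreutils the bump time could be captured with `date +%s%3N` right before updating `nb_cfg`. The checks are ordered so that a chassis that caught up within the threshold is reported as `ok` even if ovn-controller restarted in the meantime; `restart` is only used to explain an otherwise slow result.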
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5059