Bug 1809747 - [ovn-kubernetes] When a node gets deleted, the Chassis record for that node is not deleted from the sbdb.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Aniket Bhat
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 1809738
Reported: 2020-03-03 19:33 UTC by Aniket Bhat
Modified: 2020-07-13 17:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: On node deletion, the chassis record for the node was not removed from the southbound database. Consequence: Stale chassis records left behind many stale logical flows for that chassis, which were never removed. Fix: A node sync mechanism was added to ovnkube-master to purge the chassis records of deleted nodes. Result: Stale chassis records, and the stale logical flows corresponding to deleted nodes, no longer remain in the southbound database.
Clone Of: 1809738
Environment:
Last Closed: 2020-07-13 17:17:45 UTC
Target Upstream Version:



Links
System | ID | Priority | Status | Summary | Last Updated
Github | ovn-org ovn-kubernetes pull 1113 | None | closed | sync: Delete the chassis records during deleteNode from sbdb | 2020-08-18 17:24:38 UTC
Red Hat Product Errata | RHBA-2020:2409 | None | None | None | 2020-07-13 17:18:07 UTC

Description Aniket Bhat 2020-03-03 19:33:49 UTC
+++ This bug was initially created as a clone of Bug #1809738 +++

Description of problem:

When a node is deprovisioned/deleted from the cluster, the southbound db's chassis record for this node doesn't get deleted. This results in stale geneve tunnels and vswitchd flows on the other nodes in the cluster. At scale, this can mean thousands of tunnels and unused stale flows.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
Always

Steps to Reproduce:
1. Create an OVN cluster.
2. Add a few nodes.
3. Delete one node.
4. Note that the tunnels corresponding to the deleted node, and the OVS flows for its remote IP endpoint, remain in OVS on the other nodes.

Actual results:

Flows and tunnels corresponding to the node being deleted stay as stale entries.

Expected results:

All flows and the corresponding tunnels for the node being deleted are cleaned up when the node goes away.

Additional info:

Upstream issue: https://github.com/ovn-org/ovn-kubernetes/issues/1105

Comment from Russell Bryant:

Just some more detail ... ovn-controller will delete its associated Chassis record if it shuts down gracefully. I'm not sure that's ever the case, though. The fallback is that something else needs to do the cleanup. ovn-kubernetes is already watching Nodes, so it can add this as another thing it does when syncing Nodes or when it sees a Node get deleted.

This will require knowing which Chassis record in the ovn southbound database corresponds to a Node. ovn-kubernetes already ensures that the hostname field of the Chassis is equal to the Node name.

Comment 4 Ross Brattain 2020-05-07 02:15:58 UTC
Deleted a node; the sbdb Chassis record was removed after ~5 minutes.

Verified on 4.5.0-0.nightly-2020-05-05-205255

Comment 6 errata-xmlrpc 2020-07-13 17:17:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

