Bug 1985838 - [OVN] CNO exportNetworkFlows does not clear collectors when deleted
Summary: [OVN] CNO exportNetworkFlows does not clear collectors when deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Andrew Stoycos
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-26 02:45 UTC by Ross Brattain
Modified: 2022-03-12 04:36 UTC
CC: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-12 04:36:01 UTC
Target Upstream Version:
Embargoed:




Links:
- Github openshift/ovn-kubernetes pull 834 (Merged): [DownstreamMerge] Revert revert, last updated 2021-11-24 16:51:51 UTC
- Red Hat Product Errata RHSA-2022:0056, last updated 2022-03-12 04:36:22 UTC

Description Ross Brattain 2021-07-26 02:45:04 UTC
Description of problem:


When `spec.exportNetworkFlows` is removed from the `network.operator` object, the existing collector targets are not cleared from OVS. That is,

oc patch network.operator cluster --type='json' \
    -p='[{"op":"remove", "path":"/spec/exportNetworkFlows"}]'

does not remove the collector targets in OVS.
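
One way to confirm the field was actually removed from the operator config (a sketch; the jsonpath output should be empty once the patch has applied):

# expected to print nothing after the patch above
oc get network.operator cluster -o jsonpath='{.spec.exportNetworkFlows}'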

Version-Release number of selected component (if applicable):

4.9.0-0.nightly-2021-07-21-081948

How reproducible:

Always

Steps to Reproduce:
1. Patch the `network.operator` object to add a collector target (a sample patch command is sketched after these steps):

spec:
  exportNetworkFlows:
    netFlow:
      collectors:
        - 10.129.0.7:2056

2. Wait for the ovnkube-node pods to restart.
3. Delete `spec.exportNetworkFlows`:

oc patch network.operator cluster --type='json' \
    -p='[{"op":"remove", "path":"/spec/exportNetworkFlows"}]'

4. Verify that the collector target is still configured in OVS:

for f in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
    -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}') ; do
  oc -n openshift-ovn-kubernetes exec -c ovnkube-node $f -- \
    bash -c 'for f in ipfix sflow netflow ; do ovs-vsctl find $f ; done'
done
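
For step 1, one way to apply the snippet above is a merge patch against the same object (a sketch; the collector address is just the example value from this report):

oc patch network.operator cluster --type=merge \
    -p='{"spec":{"exportNetworkFlows":{"netFlow":{"collectors":["10.129.0.7:2056"]}}}}'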


Actual results:

The ovnkube-node pods are immediately terminated and no OVS command is run to clear the collector targets.


Expected results:

OVS netflow collector targets are cleared when CNO `spec.exportNetworkFlows` is deleted.


Additional info:

A workaround is to clear the flow configuration manually by running `ovs-vsctl -- clear Bridge br-int <FLOW>` for each flow type:

for f in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
    -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}') ; do
  oc -n openshift-ovn-kubernetes exec -c ovnkube-node $f -- \
    bash -c 'for f in ipfix sflow netflow ; do ovs-vsctl -- clear Bridge br-int $f ; done'
done


If the administrator changes the collectors, the old targets are overwritten, so another workaround is to point the collector at a non-routable address and accept whatever overhead OVS incurs sending flow records to an unreachable destination.
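
For example, pointing the collector at an address from the TEST-NET documentation range would look roughly like this (hypothetical address; any unreachable destination behaves the same way):

oc patch network.operator cluster --type=merge \
    -p='{"spec":{"exportNetworkFlows":{"netFlow":{"collectors":["192.0.2.1:2056"]}}}}'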

`spec.exportNetworkFlows.netFlow.collectors` has `minItems: 1` in the API, so the collectors cannot be cleared with a null or empty list.
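
Attempting to empty the list accordingly fails API validation (a sketch; the exact error message depends on the server):

# rejected because of the minItems: 1 constraint on collectors
oc patch network.operator cluster --type=merge \
    -p='{"spec":{"exportNetworkFlows":{"netFlow":{"collectors":[]}}}}'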

Comment 1 Andrew Stoycos 2021-08-25 21:46:47 UTC
I reproduced this upstream easily enough:

1. Run the "e2e br-int NetFlow export validation" upstream e2e test on a local kind cluster. 

2. Manually remove the NetFlow targets: kubectl -n ovn-kubernetes set env daemonset/ovnkube-node -c ovnkube-node OVN_NETFLOW_TARGETS=""

Allow the ovnkube-node pods to restart.

3. Check whether the targets are still in OVS:

[astoycos@nfvsdn-02-oot ovn-kubernetes]$ for f in $(kubectl get pods -n ovn-kubernetes -l app=ovnkube-node -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}' ) ; do kubectl  -n ovn-kubernetes exec -c ovnkube-node  $f -- bash -c 'for f in ipfix sflow netflow ; do  ovs-vsctl find $f ; done'   ; done
_uuid               : 5fff6c60-4e62-4d4d-9a85-8ef04adaa03a
active_timeout      : 60
add_id_to_interface : false
engine_id           : []
engine_type         : []
external_ids        : {}
targets             : ["172.18.0.5:2056"]
_uuid               : ed886d14-538d-4035-8d0c-c880572ae42a
active_timeout      : 60
add_id_to_interface : false
engine_id           : []
engine_type         : []
external_ids        : {}
targets             : ["172.18.0.5:2056"]
_uuid               : ae82ae95-aa42-4c14-b355-550ada8b8cb9
active_timeout      : 60
add_id_to_interface : false
engine_id           : []
engine_type         : []
external_ids        : {}
targets             : ["172.18.0.5:2056"]

I will post an upstream fix shortly and will extend the CI coverage to ensure we don't hit this again. 

Thanks, 
Andrew

Comment 2 Andrew Stoycos 2021-09-01 15:38:18 UTC
The upstream fix can be seen here: https://github.com/ovn-org/ovn-kubernetes/pull/2462
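
Conceptually (not the literal code in that PR), the fix has ovnkube-node clear the corresponding Bridge columns when a flow-export target is unset, i.e. the equivalent of:

# run against the node's OVS when the matching OVN_*_TARGETS value is empty
ovs-vsctl -- clear Bridge br-int netflow
ovs-vsctl -- clear Bridge br-int sflow
ovs-vsctl -- clear Bridge br-int ipfix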

Comment 5 Ross Brattain 2021-12-02 04:28:01 UTC
Verified on 4.10.0-0.nightly-2021-11-29-191648

Comment 9 errata-xmlrpc 2022-03-12 04:36:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

