Bug 1950590
| Summary: | CNO: Too many OVN netFlows collectors causes ovnkube pods CrashLoopBackOff | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ross Brattain <rbrattai> |
| Component: | Networking | Assignee: | Aniket Bhat <anbhat> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aconstan, memodi, ricarril |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 23:01:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 1772708: bad-cno-8.yaml
CNO YAML that fails
We need to check maxItems in the API schema, in CNO renderOVNKubernetes, and in ovn-kubernetes ParseFlowCollectors, MonitoringFlags, and setupOVNNode. We could exceed the ovnkube `--netflow-targets=` argument limit as well as the ovs-vsctl `targets=[%s]` command-line limit.

https://github.com/openshift/cluster-network-operator/pull/1068 for bumping openshift/api

Verified on 4.8.0-0.nightly-2021-04-25-231500
Schema error is triggered.
# networks.operator.openshift.io "cluster" was not valid:
# * spec.exportNetworkFlows.netFlow.collectors: Invalid value: 47: spec.exportNetworkFlows.netFlow.collectors in body should have at most 10 items
The `oc explain` output is updated:
$ oc explain --api-version=operator.openshift.io/v1 networks.spec.exportNetworkFlows.netFlow
KIND:     Network
VERSION:  operator.openshift.io/v1
RESOURCE: netFlow <Object>
DESCRIPTION:
     netFlow defines the NetFlow configuration.
FIELDS:
   collectors <[]string>
     netFlow defines the NetFlow collectors that will consume the flow data
     exported from OVS. It is a list of strings formatted as ip:port with a
     maximum of ten items
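
The constraint described above (entries formatted as ip:port, at most ten items) can be illustrated with a small client-side check. This is only a sketch: `validate_collectors` and `MAX_COLLECTORS` are hypothetical names, not part of CNO or openshift/api, and the real enforcement happens in the CRD schema.

```python
import ipaddress

# Assumption: mirrors the "at most 10 items" limit reported by the schema error.
MAX_COLLECTORS = 10


def validate_collectors(collectors):
    """Return a list of error strings; an empty list means the input passes."""
    errors = []
    if len(collectors) > MAX_COLLECTORS:
        errors.append(
            f"collectors: should have at most {MAX_COLLECTORS} items, "
            f"got {len(collectors)}"
        )
    for entry in collectors:
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        try:
            ipaddress.ip_address(host)  # raises ValueError on a bad address
            ok = sep == ":" and port.isdigit() and 1 <= int(port) <= 65535
        except ValueError:
            ok = False
        if not ok:
            errors.append(f"collectors: {entry!r} is not a valid ip:port")
    return errors


if __name__ == "__main__":
    too_many = [f"10.1.1.{i}:2056" for i in range(1, 48)]  # 47 entries
    for message in validate_collectors(too_many):
        print(message)
```

Running such a check before `oc edit` would only give earlier feedback; the API server still rejects oversized lists on its own.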
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
Description of problem:
The exportNetworkFlows collector API spec does not specify maxItems. Too many collectors cause ovnkube to enter CrashLoopBackOff. The container dies with:

    standard_init_linux.go:219: exec user process caused: argument list too long

This is E2BIG ("Argument list too long", POSIX.1-2001), presumably because the ovnkube command line approaches `getconf ARG_MAX` bytes. The limit seems to be less than 2 MB. More than 2 MB in the YAML causes etcd to reject the change with the error:

    networks.operator.openshift.io "cluster" could not be patched: etcdserver: request is too large

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-16-032542

How reproducible:
Always

Steps to Reproduce:
1. Generate ~51200 IP address and port pairs as a YAML list:
    - 10.1.1.1:2056
    - 10.1.1.2:2056
    - 10.1.1.3:2056
2. oc edit network.operator
3. Paste the list into the collector list:
    exportNetworkFlows:
      netFlow:
        collectors:

Actual results:
ovnkube-node-hmg9w 3/4 CrashLoopBackOff 8
standard_init_linux.go:219: exec user process caused: argument list too long

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ovnkube-node 6 6 5 1 5 beta.kubernetes.io/os=linux 13h

Expected results:
oc edit schema validation fails using the maxItems limit.

Additional info:
5697 IP port pairs seemed to work. I did not try to measure the performance impact.
All user inputs must have a specified maximum.
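
A rough way to generate the reproducer list and estimate how large the resulting command-line argument becomes is sketched below. The helper names are hypothetical, and the assumption that the collectors end up comma-joined in a single `--netflow-targets=` argument is mine, not confirmed by this report; the sketch only compares sizes against the value that `getconf ARG_MAX` reports.

```python
import ipaddress
import os


def make_collectors(count, start="10.1.1.1", port=2056):
    """Return `count` consecutive ip:port strings beginning at `start`."""
    base = ipaddress.IPv4Address(start)
    return [f"{str(base + i)}:{port}" for i in range(count)]


def joined_arg_bytes(collectors, prefix="--netflow-targets="):
    """Size in bytes of one comma-joined argument, including its NUL terminator.

    Assumption: the list is passed as a single comma-separated argument.
    """
    return len((prefix + ",".join(collectors)).encode()) + 1


if __name__ == "__main__":
    # Counts taken from the report: the new schema limit, the count that
    # seemed to work, and the count that triggered the crash loop.
    for n in (10, 5697, 51200):
        print(f"{n:>6} collectors -> {joined_arg_bytes(make_collectors(n))} bytes")
    # SC_ARG_MAX is the value that `getconf ARG_MAX` prints on this host.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
```

To paste the generated list into `oc edit network.operator`, each entry can be printed with a leading `- ` to form the YAML list shown in the reproduction steps.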