Bug 2104957
| Summary: | cluster becomes unstable after enabling IPFix exports with sampling of 1:1 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mehul Modi <memodi> |
| Component: | Networking | Assignee: | ffernand <ffernand> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | davegord, ffernandez, jtakvori, mifiedle, nweinber, rravaiol |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-05 15:51:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2080477, 2104943 | | |
| Bug Blocks: | | | |
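The summary above refers to enabling IPFIX exports with a 1:1 sampling rate. The exact configuration used is not included in this report; the command below is only a minimal sketch, assuming the export was turned on through the Cluster Network Operator's exportNetworkFlows API. The collector address is a placeholder, and how the 1:1 sampling rate was applied is not shown in this bug.

```
# Sketch only: enable IPFIX flow export via the Cluster Network Operator.
# The collector address 192.0.2.10:2055 is a placeholder, not taken from this bug.
$ oc patch network.operator cluster --type merge \
    -p '{"spec":{"exportNetworkFlows":{"ipfix":{"collectors":["192.0.2.10:2055"]}}}}'
```

With a 1:1 sampling rate every packet is sampled for export rather than a subset, which is the load pattern the summary associates with the instability.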
Description
Mehul Modi
2022-07-07 14:52:51 UTC
Adding more info:

```
ip-10-0-141-43.us-east-2.compute.internal    Ready      worker   49m   v1.24.0+2dd8bb1
ip-10-0-148-46.us-east-2.compute.internal    NotReady   master   58m   v1.24.0+2dd8bb1
ip-10-0-154-197.us-east-2.compute.internal   Ready      worker   52m   v1.24.0+2dd8bb1
ip-10-0-182-158.us-east-2.compute.internal   Ready      worker   49m   v1.24.0+2dd8bb1
ip-10-0-182-170.us-east-2.compute.internal   Ready      master   59m   v1.24.0+2dd8bb1
ip-10-0-190-24.us-east-2.compute.internal    Ready      worker   49m   v1.24.0+2dd8bb1
ip-10-0-203-222.us-east-2.compute.internal   Ready      master   58m   v1.24.0+2dd8bb1
ip-10-0-203-62.us-east-2.compute.internal    Ready      worker   49m   v1.24.0+2dd8bb1
ip-10-0-207-237.us-east-2.compute.internal   Ready      worker   53m   v1.24.0+2dd8bb1
```

COs could transition to the Progressing state and eventually recover:

```
$ oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-rc.1   True        False         False      4s
baremetal                                  4.11.0-rc.1   True        False         False      97m
cloud-controller-manager                   4.11.0-rc.1   True        False         False      101m
cloud-credential                           4.11.0-rc.1   True        False         False      101m
cluster-autoscaler                         4.11.0-rc.1   True        False         False      97m
config-operator                            4.11.0-rc.1   True        False         False      98m
console                                    4.11.0-rc.1   True        False         False      86m
csi-snapshot-controller                    4.11.0-rc.1   True        True          False      98m     CSISnapshotControllerProgressing: Waiting for Deployment to deploy csi-snapshot-controller pods
dns                                        4.11.0-rc.1   True        True          False      98m     DNS "default" reports Progressing=True: "Have 8 available DNS pods, want 9.\nHave 8 available node-resolver pods, want 9."
etcd                                       4.11.0-rc.1   True        False         False      96m
image-registry                             4.11.0-rc.1   True        False         False      92m
ingress                                    4.11.0-rc.1   True        False         False      92m
insights                                   4.11.0-rc.1   True        False         False      92m
kube-apiserver                             4.11.0-rc.1   True        False         False      93m
kube-controller-manager                    4.11.0-rc.1   True        False         False      94m
kube-scheduler                             4.11.0-rc.1   True        False         False      96m
kube-storage-version-migrator              4.11.0-rc.1   False       True          False      74s     KubeStorageVersionMigratorAvailable: Waiting for Deployment
machine-api                                4.11.0-rc.1   True        False         False      94m
machine-approver                           4.11.0-rc.1   True        False         False      98m
machine-config                             4.11.0-rc.1   True        False         False      97m
marketplace                                4.11.0-rc.1   True        False         False      98m
monitoring                                 4.11.0-rc.1   True        False         False      91m
network                                    4.11.0-rc.1   True        True          False      100m    DaemonSet "/openshift-multus/multus-admission-controller" is not available (awaiting 1 nodes)...
node-tuning                                4.11.0-rc.1   True        False         False      41m
openshift-apiserver                        4.11.0-rc.1   True        False         False      74s
openshift-controller-manager               4.11.0-rc.1   True        False         False      94m
openshift-samples                          4.11.0-rc.1   True        False         False      91m
operator-lifecycle-manager                 4.11.0-rc.1   True        False         False      97m
operator-lifecycle-manager-catalog         4.11.0-rc.1   True        False         False      98m
operator-lifecycle-manager-packageserver   4.11.0-rc.1   True        False         False      40m
service-ca                                 4.11.0-rc.1   True        False         False      98m
storage                                    4.11.0-rc.1   True        True          False      98m     AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverControllerServiceControllerProgressing: Waiting for Deployment to deploy pods...
```

In some cases the cluster eventually recovers, as it did above. Note that in the above case I had AWS m5.2xlarge machines with 32 GB of memory and 8 vCPUs.

Apologies for setting flags which I should not have. Removing the blocker+ flag and the target release to let the engineering teams decide based on their triage.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
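Since the reported failure depends on the effective sampling rate, one way to check what Open vSwitch is actually configured to export is to read the IPFIX rows from the OVS database on an affected node. This is a sketch only, assuming OVS runs on the host as it does on OVN-Kubernetes clusters; the node name is just an example taken from the listing above, and the relevant columns may differ by OVS version.

```
# Sketch: inspect the IPFIX export settings OVS is using on one node.
# The node name is only an example from the node listing above.
$ oc debug node/ip-10-0-141-43.us-east-2.compute.internal -- chroot /host \
    ovs-vsctl --columns=targets,sampling,cache_active_timeout list IPFIX
```

A sampling value of 1 would correspond to the 1:1 rate described in the summary, i.e. every packet being sampled for export.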