Bug 1866868
| Summary: | Flake: error waiting for deployment e2e-aws-fips | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Matthew Heon <mheon> |
| Component: | kube-controller-manager | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | unspecified | CC: | aos-bugs, danili, fromani, jokerman, knarra, mfojtik |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | non-multi-arch | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:26:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
e2e-fips is having a myriad of issues right now. Sometimes the cluster doesn't even initialize, and other times it has a ton of flakes/failures. I want to try to narrow this down before suggesting changes or passing it along. https://search.ci.openshift.org/?search=k8s.io%2Fkubernetes%2Ftest%2Fe2e%2Fapps%2Fdeployment.go%3A904

One of the three deployment test pods is not being scheduled:

Aug 12 18:33:01.831: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for test-rolling-update-with-lb-b9c9c6bcc-4phtw: { } Scheduled: Successfully assigned e2e-deployment-5371/test-rolling-update-with-lb-b9c9c6bcc-4phtw to ip-10-0-146-124.us-east-2.compute.internal
Aug 12 18:33:01.831: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for test-rolling-update-with-lb-b9c9c6bcc-hlsnz: { } FailedScheduling: 0/5 nodes are available: 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Aug 12 18:33:01.831: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for test-rolling-update-with-lb-b9c9c6bcc-vx4j5: { } Scheduled: Successfully assigned e2e-deployment-5371/test-rolling-update-with-lb-b9c9c6bcc-vx4j5 to ip-10-0-228-200.us-east-2.compute.internal

I'm adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

The fix actually landed in https://github.com/openshift/origin/pull/25010. I will wait a few more days to check the flake and then move the bug to the verified state.

Moving the bug to the verified state, as I see that the fix landed about 6 days ago and no failures have been seen since then when checked here over about 7 days: https://search.ci.openshift.org/?search=k8s.io%2Fkubernetes%2Ftest%2Fe2e%2Fapps%2Fdeployment.go%3A904

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
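For context, the FailedScheduling event above is the classic signature of a hard pod anti-affinity rule colliding with a cluster that only has two schedulable workers (the three masters carry the node-role.kubernetes.io/master taint). Below is a minimal Go sketch of the kind of rule that produces this symptom; the label key/value and function name are illustrative placeholders, not the selector the e2e test actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// antiAffinityOnePodPerNode builds a hard pod anti-affinity rule that forbids
// two pods carrying the given label from sharing a node. The label key/value
// are placeholders, not copied from the e2e test.
func antiAffinityOnePodPerNode(key, value string) *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{key: value},
				},
				// One pod per hostname: a 3-replica Deployment needs at
				// least 3 schedulable nodes under this rule.
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}

func main() {
	out, _ := json.MarshalIndent(antiAffinityOnePodPerNode("name", "sample-app"), "", "  ")
	fmt.Println(string(out))
}
```

With a rule like this in place, replicas must land on distinct nodes, so a 3-replica Deployment can never fully schedule on a cluster with only two tolerable workers, which matches the 2-of-3 pods scheduled in the events above.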
[BUILD-WATCHER] Seeing frequent deployment errors in the pull-ci-openshift-kubernetes-master-e2e-aws-fips job:

Aug 6 14:01:30.921: INFO: Running AfterSuite actions on all nodes
Aug 6 14:01:30.921: INFO: Running AfterSuite actions on node 1
fail [@/k8s.io/kubernetes/test/e2e/apps/deployment.go:904]: Unexpected error:
    <*errors.errorString | 0xc001b13e90>: {
        s: "error waiting for deployment \"test-rolling-update-with-lb\" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:3, UpdatedReplicas:3, ReadyReplicas:2, AvailableReplicas:2, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:\"Available\", Status:\"False\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63732318989, loc:(*time.Location)(0x9e74040)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63732318989, loc:(*time.Location)(0x9e74040)}}, Reason:\"MinimumReplicasUnavailable\", Message:\"Deployment does not have minimum availability.\"}, v1.DeploymentCondition{Type:\"Progressing\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63732319009, loc:(*time.Location)(0x9e74040)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63732318989, loc:(*time.Location)(0x9e74040)}}, Reason:\"ReplicaSetUpdated\", Message:\"ReplicaSet \\\"test-rolling-update-with-lb-b9c9c6bcc\\\" is progressing.\"}}, CollisionCount:(*int32)(nil)}",
    }
    error waiting for deployment "test-rolling-update-with-lb" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:3, UpdatedReplicas:3, ReadyReplicas:2, AvailableReplicas:2, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63732318989, loc:(*time.Location)(0x9e74040)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63732318989, loc:(*time.Location)(0x9e74040)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63732319009, loc:(*time.Location)(0x9e74040)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63732318989, loc:(*time.Location)(0x9e74040)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"test-rolling-update-with-lb-b9c9c6bcc\" is progressing."}}, CollisionCount:(*int32)(nil)}
    occurred

Sample failing job: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-remote-libvirt-s390x-4.5/1291373614746046464

Search of all failing jobs: https://search.ci.openshift.org/?search=error+waiting+for+deployment+&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1861095
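The assertion at k8s.io/kubernetes/test/e2e/apps/deployment.go:904 fails while waiting for the Deployment's status to match its expectation; in the failing runs it is stuck at AvailableReplicas:2 of 3 with the Available condition False. The following client-go sketch shows the same kind of wait, assuming an already-configured kubernetes.Interface; it is an illustration of the check being performed, not the e2e framework's actual helper.

```go
package deploymentwait

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForDeploymentAvailable polls until the Deployment reports the
// Available=True condition (minimum availability reached) or the timeout
// expires. In the flaking runs above this kind of wait times out because
// one replica never schedules.
func waitForDeploymentAvailable(c kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		d, err := c.AppsV1().Deployments(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		// The controller has not observed the latest spec yet; keep waiting.
		if d.Status.ObservedGeneration < d.Generation {
			return false, nil
		}
		for _, cond := range d.Status.Conditions {
			if cond.Type == appsv1.DeploymentAvailable && cond.Status == corev1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil
	})
}
```

Called with something like waitForDeploymentAvailable(client, "e2e-deployment-5371", "test-rolling-update-with-lb", 5*time.Minute), this reproduces the shape of the wait that times out in the sample job linked above.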