Bug 1990937
Summary: | Azure CI is failing with ingress failures | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> |
Component: | Networking | Assignee: | Luigi Mario Zuccarelli <luzuccar> |
Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | urgent | | |
Priority: | urgent | CC: | aos-bugs, aravindh, jsafrane, mmasters, sippy |
Version: | 4.9 | | |
Target Milestone: | --- | | |
Target Release: | 4.9.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-08-11 15:14:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Stephen Benjamin
2021-08-06 15:46:05 UTC
Comparing to https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1422744810065760256/artifacts/e2e-azure/gather-must-gather/artifacts/event-filter.html (the last green one), I see an abundance of these events:

MountVolume.SetUp failed for volume "machine-api-controllers-tls" : failed to sync secret cache: timed out waiting for the condition
MountVolume.SetUp failed for volume "var-lib-tuned-profiles-data" : failed to sync configmap cache: timed out waiting for the condition

66 events total (about other secrets/config maps too), while the green one has only 11 of them. The event is emitted by a node when it can't get a Secret/ConfigMap from the API server. There is nothing storage related in these errors. I don't know if the events are the root cause of the failures.

Thanks for looking. The failures all seem to be reporting SyncLoadBalancerFailed errors, and I thought storage was the culprit because of those FailedMount messages plus the bad timing of the CSI driver changes. For Ingress, it's reporting this:

./registry-build01-ci-openshift-org-ci-op-24lhkg93-stable-sha256-b3482870e4d8e70ff91da69d12fe1898afbcbadbaf90acd19f9a5047067bda84/namespaces/openshift-ingress-operator/pods/ingress-operator-6b4ccff786-ckz9s/ingress-operator/ingress-operator/logs/current.log:2021-08-06T13:07:36.330587391Z 2021-08-06T13:07:36.330Z ERROR operator.ingress_controller controller/controller.go:298 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: EnsureBackendPoolDeleted: failed to parse the VMAS ID : getAvailabilitySetNameByID: failed to parse the VMAS ID \nThe kube-controller-manager logs may contain more details.)"}
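Note the blank between "the VMAS ID" and the colon in that error: the Azure cloud provider appears to have been handed an empty availability-set (VMAS) resource ID it could not parse. The sketch below is purely illustrative (a hypothetical parser, not the actual cloud-provider code) and only shows how an empty ID would produce an error string of that shape:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical helper, loosely modeled on the error text in the ingress-operator
// log above. A VM availability set (VMAS) resource ID normally looks like:
//   /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/availabilitySets/<name>
// and the name is the last path segment.
var vmasIDRE = regexp.MustCompile(`(?i)/availabilitySets/([^/]+)$`)

// getAvailabilitySetNameByID returns the availability-set name embedded in the
// resource ID, or an error when the ID does not match the expected shape.
func getAvailabilitySetNameByID(vmasID string) (string, error) {
	matches := vmasIDRE.FindStringSubmatch(vmasID)
	if len(matches) != 2 {
		return "", fmt.Errorf("getAvailabilitySetNameByID: failed to parse the VMAS ID %s", vmasID)
	}
	return matches[1], nil
}

func main() {
	// An empty ID renders with a blank where the ID should be, matching the CI
	// error: "failed to parse the VMAS ID : getAvailabilitySetNameByID: ..."
	if _, err := getAvailabilitySetNameByID(""); err != nil {
		fmt.Printf("EnsureBackendPoolDeleted: failed to parse the VMAS ID %s: %v\n", "", err)
	}
}
```

If that reading is right, the error points at whichever VM record carried an empty availability-set ID rather than at ingress or storage, which would be consistent with the leftover bootstrap machine noted below.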
From a quick look at the other changes around the time of the failures, I don't see any code changes on our side that might explain this.

Looks like Trevor fixed this under BZ1990916; the bootstrap was being left around during installation.

*** This bug has been marked as a duplicate of bug 1990916 ***