Bug 1990916 - Azure CI: bootstrap preserved: failed to ensure load balancer: failed to parse the VMAS ID
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Framework
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: W. Trevor King
QA Contact: liyao
URL:
Whiteboard: tag-ci
Duplicates: 1990937
Depends On:
Blocks:
Reported: 2021-08-06 15:12 UTC by Evan Cordell
Modified: 2021-08-11 15:14 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-07 18:09:29 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift release pull 20978 | 0 | None | None | None | 2021-08-07 17:53:24 UTC

Description Evan Cordell 2021-08-06 15:12:09 UTC
https://search.ci.openshift.org/chart?search=OAuthServerRouteEndpointAccessibleController_SyncError&maxAge=48h&context=1&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

This is persistently affecting ~5% of CI test suites across the board. At a glance I can't tell if this is an issue with oauth-apiserver or if it reflects an underlying networking issue.

Comment 2 W. Trevor King 2021-08-07 17:24:07 UTC
It's not entirely Azure-specific, but it is certainly hammering the Azure jobs:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=OAuthServerRouteEndpointAccessibleController_SyncError&maxAge=24h&type=build-log' | grep 'failures match' | grep -v 'pull-ci-\|rehearse-' | sort
periodic-ci-kata-containers-kata-containers-main-e2e-tests (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-kubevirt-kubevirt-main-4.8-e2e (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-odo-main-v4.9-integration-e2e-periodic (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-cilium (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-compact (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-compact-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-compact-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-ovn (all) - 7 runs, 100% failed, 86% of failures match = 86% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-azure-csi (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-openshift-ipi-azure-arcconformance (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-azure (all) - 7 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-azure-csi (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-azure-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-azure-fips-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-vsphere-csi (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-vsphere-serial (all) - 8 runs, 100% failed, 13% of failures match = 13% impact
periodic-ci-openshift-release-master-nightly-4.9-openshift-ipi-azure-arcconformance (all) - 9 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-windows-machine-config-operator-master-vsphere-e2e-periodic (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-windows-machine-config-operator-release-4.10-vsphere-e2e-periodic (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-windows-machine-config-operator-release-4.9-vsphere-e2e-periodic (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
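
To back up the "hammering the Azure jobs" reading, the listing above can be tallied by platform. The following is a minimal sketch, not part of the original investigation: the function name `summarize_impact` and the three buckets (azure, vsphere, other) are illustrative assumptions, and the platform is guessed purely from the job name.

```shell
# Tally "failures match" lines from stdin by platform.
# Assumption: the platform can be inferred from a substring of the job name.
summarize_impact() {
  awk '
    /failures match/ {
      platform = "other"
      if ($1 ~ /azure/) platform = "azure"
      else if ($1 ~ /vsphere/) platform = "vsphere"
      count[platform]++
    }
    END {
      printf "azure=%d vsphere=%d other=%d\n", count["azure"], count["vsphere"], count["other"]
    }
  '
}

# Example with three lines taken from the listing above.
summarize_impact <<'EOF'
periodic-ci-kubevirt-kubevirt-main-4.8-e2e (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-azure-ovn (all) - 7 runs, 100% failed, 86% of failures match = 86% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-vsphere-csi (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
EOF
```

In practice the function would be appended to the pipeline in place of, or after, `sort`. It counts jobs rather than runs, so a job with 9 runs weighs the same as a job with 1.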

Getting some example jobs:

$ curl -s 'https://search.ci.openshift.org/search?search=OAuthServerRouteEndpointAccessibleController_SyncError&maxAge=24h&type=build-log' | jq -r 'keys[]' | grep 'periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/'
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423691131337576448
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423714493896069120
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423766555199541248
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423841979489325056
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423892820149669888
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423941421911511040
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1424015809470009344

Per TestGrid [1], the breakage was between [2] and [3].  From [3]:

  level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: EnsureBackendPoolDeleted: failed to parse the VMAS ID : getAvailabilitySetNameByID: failed to parse the VMAS ID
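
In the message above nothing appears between "VMAS ID" and the following colon, which would be consistent with an empty availability set ID reaching the parser, although the log alone does not confirm that. The sketch below illustrates why an empty ID cannot be parsed. It is not the cloud provider's actual `getAvailabilitySetNameByID` implementation: the function name `vmas_name_from_id`, the case-sensitive match, and the placeholder subscription, resource group, and set names are all assumptions made for illustration, based on the usual shape of an Azure resource ID.

```shell
# Extract the availability set name from an Azure resource ID of the form
#   /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/availabilitySets/<name>
# Prints the name and returns 0 on success; prints an error to stderr and returns 1 otherwise.
# Assumption: matching is case-sensitive here, which the real parser may not be.
vmas_name_from_id() {
  id=$1
  case $id in
    /subscriptions/*/resourceGroups/*/providers/Microsoft.Compute/availabilitySets/?*)
      # Strip everything up to and including the last "/availabilitySets/".
      printf '%s\n' "${id##*/availabilitySets/}"
      ;;
    *)
      printf 'failed to parse the VMAS ID %s\n' "$id" >&2
      return 1
      ;;
  esac
}

# Hypothetical, well-formed ID: prints "example-as".
vmas_name_from_id '/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/availabilitySets/example-as'

# Empty ID: rejected.
vmas_name_from_id '' 2>/dev/null || echo 'empty ID rejected'
```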

Diffing [2,3], and focusing on the changes that seem like they might possibly relate to Azure ingress:

  $ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423023995355140096/artifacts/release/artifacts/release-images-latest
  $ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423392049742221312/artifacts/release/artifacts/release-images-latest
  $ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
  $ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}")
  ...
  @@ -10,2 +10,2 @@
  -azure-machine-controllers https://github.com/openshift/cluster-api-provider-azure/commit/5d94c794092f4f19c17e85dfadc8c3e19fc7eff4
  -baremetal-installer https://github.com/openshift/installer/commit/4f3d8ba657cb9447f065a4e48b078be6376593e1
  +azure-machine-controllers https://github.com/openshift/cluster-api-provider-azure/commit/1040454af736beeb89c33b1779f017898776c14f
  +baremetal-installer https://github.com/openshift/installer/commit/09d231228f974e00a8c18946add3a3ff32fbd2b6
  @@ -30 +30 @@
  -cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator/commit/c62288e1490f98a93d7496452a4861b7e4dbfa50
  +cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator/commit/61d60ee6a90388fc89c1e4c1c52fdb6590833d5c
  ...
  @@ -74,2 +74,2 @@
  -installer https://github.com/openshift/installer/commit/4f3d8ba657cb9447f065a4e48b078be6376593e1
  -installer-artifacts https://github.com/openshift/installer/commit/4f3d8ba657cb9447f065a4e48b078be6376593e1
  +installer https://github.com/openshift/installer/commit/09d231228f974e00a8c18946add3a3ff32fbd2b6
  +installer-artifacts https://github.com/openshift/installer/commit/09d231228f974e00a8c18946add3a3ff32fbd2b6
  @@ -88 +88 @@
  -kube-proxy https://github.com/openshift/sdn/commit/a7a08ff18baa4ce5b310ee03640eb95d588f481c
  +kube-proxy https://github.com/openshift/sdn/commit/e54a2925fb868dcbfb904ce9733f9954e560b56a
  @@ -96 +96 @@
  -machine-api-operator https://github.com/openshift/machine-api-operator/commit/edaf826469d651580ddab305962b6a7b5bd6d49a
  +machine-api-operator https://github.com/openshift/machine-api-operator/commit/df4ed38125e97739c5cfae3573676f6a981909cc
  ...
  @@ -132 +132 @@
  -sdn https://github.com/openshift/sdn/commit/a7a08ff18baa4ce5b310ee03640eb95d588f481c
  +sdn https://github.com/openshift/sdn/commit/e54a2925fb868dcbfb904ce9733f9954e560b56a
  ...

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-e2e-azure
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423023995355140096
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-azure/1423392049742221312
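
When the `diff -U0` output is long, it can help to reduce it to the names of the components whose commit changed before reading the individual commits. A minimal sketch, assuming unindented diff output in which every changed line has the form `+<name> <commit URL>`, as produced by the jq filter above; the helper name `changed_components` is hypothetical.

```shell
# List component names whose commit changed, given `diff -U0` output on stdin.
# Lines starting with a single "+" are new entries; "+++" file headers are skipped.
changed_components() {
  awk '/^\+[^+]/ { print substr($1, 2) }' | sort -u
}

# Example with two hunks copied from the diff above: prints "kube-proxy" and "sdn".
changed_components <<'EOF'
@@ -88 +88 @@
-kube-proxy https://github.com/openshift/sdn/commit/a7a08ff18baa4ce5b310ee03640eb95d588f481c
+kube-proxy https://github.com/openshift/sdn/commit/e54a2925fb868dcbfb904ce9733f9954e560b56a
@@ -132 +132 @@
-sdn https://github.com/openshift/sdn/commit/a7a08ff18baa4ce5b310ee03640eb95d588f481c
+sdn https://github.com/openshift/sdn/commit/e54a2925fb868dcbfb904ce9733f9954e560b56a
EOF
```

This only narrows the list of candidates. Deciding which changes plausibly relate to Azure ingress still requires reading the commits.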

Comment 3 W. Trevor King 2021-08-07 18:09:29 UTC
No need for ART/QE on this.

Comment 4 Stephen Benjamin 2021-08-11 15:14:58 UTC
*** Bug 1990937 has been marked as a duplicate of this bug. ***

