Seen in three different update CI jobs from 4.7.11 to 4.8.0-fc.4 [1,2,3]:

disruption_tests: [sig-network-edge] Application behind service load balancer with PDB is not disrupted [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] 1h13m22s

fail [github.com/openshift/origin/test/e2e/upgrade/service/service.go:115]: Unexpected error:
    <*errors.errorString | 0xc000bf70a0>: {
        s: "failed to create PDB \"service-test\" the server could not find the requested resource",
    }
    failed to create PDB "service-test" the server could not find the requested resource
occurred

Extremely common in 4.7 -> 4.8 update CI:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=failed+to+create+PDB.*the+server+could+not+find+the+requested+resource' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-from-stable-4.7-from-stable-4.6-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 20 runs, 100% failed, 85% of failures match = 85% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 94% of failures match = 94% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 6 runs, 100% failed, 67% of failures match = 67% impact
pull-ci-openshift-ovn-kubernetes-master-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 12 runs, 100% failed, 83% of failures match = 83% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
release-openshift-origin-installer-launch-azure (all) - 12 runs, 83% failed, 10% of failures match = 8% impact
release-openshift-origin-installer-launch-gcp (all) - 63 runs, 41% failed, 8% of failures match = 3% impact

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1393250382481723392
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1393269399615442944
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1393269438639247360
Possibly the issue is that the 4.8 test suite assumes something that is not present on 4.7?
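If the missing resource is the PDB API itself, one quick way to check that theory (illustrative; assumes access to clusters at both versions) is to compare the served versions of the policy API group:

$ oc api-versions | grep '^policy/'
# a 4.7 (Kubernetes 1.20) cluster should list only policy/v1beta1;
# a 4.8 (Kubernetes 1.21) cluster should list policy/v1 as well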
TestGrid [1] shows this transitioning to perma-fail between [2,3]. Comparing the target versions:

$ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391245505060671488/artifacts/release/artifacts/release-images-latest
$ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391607897154129920/artifacts/release/artifacts/release-images-latest
$ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
$ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}")
--- /dev/fd/63	2021-05-14 13:38:59.849002076 -0700
+++ /dev/fd/62	2021-05-14 13:38:59.850002076 -0700
@@ -26 +26 @@
-cluster-etcd-operator https://github.com/openshift/cluster-etcd-operator/commit/b6530d132942cd84bec9e2a76a7386d4141cca78
+cluster-etcd-operator https://github.com/openshift/cluster-etcd-operator/commit/b54aaf90c1f0468730270163e8423ca23b27056c
@@ -35 +35 @@
-cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/91af127c2d693adbc357ab14cf0318de44409a14
+cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/103304d59bb26fdaadb0170f76c012775d4a979f
@@ -45 +45 @@
-console https://github.com/openshift/console/commit/f0b1fe1d368e50ccbf423de6165b9f870c0f06d5
+console https://github.com/openshift/console/commit/44c4fe0ea64befc9a9ebb54894bddba9a70b57a6
@@ -122 +122 @@
-ovirt-machine-controllers https://github.com/openshift/cluster-api-provider-ovirt/commit/2ac685fd451c03564072c873ca087b06a0934aab
+ovirt-machine-controllers https://github.com/openshift/cluster-api-provider-ovirt/commit/d6f563502a708f84489d629cf0b05212cb345c55
@@ -134 +134 @@
-tests https://github.com/openshift/origin/commit/2a813e180f73b3876b42bf04f27a5f66814560c2
+tests https://github.com/openshift/origin/commit/265b6ef959b8c8183ecda5aba10f7d437b87a9a9

Checking in origin:

$ git --no-pager log --oneline --first-parent 2a813e180f..265b6ef959b
265b6ef959 Merge pull request #26054 from soltysh/k8s-1.21

Ah, so yeah, lots of changes came in there, and we broke this test-case (or maybe the new test-case logic is more robust and is turning up implementation breakage we previously missed).

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391245505060671488
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391607897154129920
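To narrow down which change inside that merge touched the failing fixture, something like this against an origin checkout should work (illustrative):

$ git log --oneline 2a813e180f..265b6ef959b -- test/e2e/upgrade/service/service.go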
Clayton suspects the PDB-creating fixture moved to v1 PDBs in 4.8, while 4.7 only serves v1beta1 PDBs (PodDisruptionBudget graduated to policy/v1 in Kubernetes 1.21). The suggested fix is to try creating a v1 PDB and fall back to v1beta1 when the server does not serve v1. Or maybe the other way around.
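For reference, a minimal sketch of that fallback using client-go (a hypothetical helper, not the actual origin change; the name createPDBWithFallback and its signature are made up for illustration):

package service

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createPDBWithFallback tries the GA policy/v1 API first and, when the
// server responds NotFound ("the server could not find the requested
// resource", i.e. the group/version is not served, as on a 4.7 control
// plane), retries with the older policy/v1beta1 API.
func createPDBWithFallback(ctx context.Context, c kubernetes.Interface, ns, name string, minAvailable intstr.IntOrString, selector *metav1.LabelSelector) error {
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector:     selector,
		},
	}
	_, err := c.PolicyV1().PodDisruptionBudgets(ns).Create(ctx, pdb, metav1.CreateOptions{})
	if err == nil || !apierrors.IsNotFound(err) {
		return err // created, or failed for a reason other than a missing API
	}
	// policy/v1 is not served; fall back to the v1beta1 API that 4.7 supports.
	oldPDB := &policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector:     selector,
		},
	}
	_, err = c.PolicyV1beta1().PodDisruptionBudgets(ns).Create(ctx, oldPDB, metav1.CreateOptions{})
	return err
}

Trying v1 first keeps the test on the GA API once the control plane has updated, and the NotFound fallback covers the window where the kube-apiserver is still on 4.7 and only serves policy/v1beta1.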
We really want this fix in so that 4.7 -> 4.8 update CI gives us a green signal. But if for some reason it doesn't land in time, we can look more closely at those CI jobs to decide whether this is the only failure mode we're seeing; if it is, I don't see a problem with going GA without the fix.
Still need the openshift/origin bump to vendor the new upstream test jig.
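For anyone following along, the bump itself is roughly the usual Go vendoring dance (illustrative only; origin's actual flow pins the k8s.io/* staging repos via replace directives, so the exact commands differ):

$ go get k8s.io/kubernetes@<version-with-the-jig-fix>
$ go mod tidy
$ go mod vendor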
I didn't see the failure in recent 4.7 -> 4.8 upgrade CI jobs (see https://search.ci.openshift.org/?search=failed+to+create+PDB.*the+server+could+not+find+the+requested+resource&maxAge=336h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job ), so moving to VERIFIED.
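That is the same search used above, widened to maxAge=336h (14 days); the command-line form would be:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=336h&type=junit&search=failed+to+create+PDB.*the+server+could+not+find+the+requested+resource' | grep 'failures match'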
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438