Bug 1960780 - CI: failed to create PDB "service-test" the server could not find the requested resource
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: W. Trevor King
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-14 20:32 UTC by W. Trevor King
Modified: 2021-07-27 23:08 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:08:44 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes pull 752 | 0 | None | closed | Bug 1960780: UPSTREAM: <carry>: Use policyv1beta1 | 2021-05-19 05:21:23 UTC
Github | openshift origin pull 26168 | 0 | None | closed | Bug 1960780: Bump k8s to pick up v1beta1 PDB jig | 2021-06-14 18:45:47 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:08:57 UTC

Description W. Trevor King 2021-05-14 20:32:45 UTC
Seen in three different update CI jobs from 4.7.11 to 4.8.0-fc.4 [1,2,3]:

disruption_tests: [sig-network-edge] Application behind service load balancer with PDB is not disrupted 
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]	1h13m22s
fail [github.com/openshift/origin/test/e2e/upgrade/service/service.go:115]: Unexpected error:
    <*errors.errorString | 0xc000bf70a0>: {
        s: "failed to create PDB \"service-test\" the server could not find the requested resource",
    }
    failed to create PDB "service-test" the server could not find the requested resource
occurred

Extremely common in 4.7 -> 4.8 update CI:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=failed+to+create+PDB.*the+server+could+not+find+the+requested+resource' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-from-stable-4.7-from-stable-4.6-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 20 runs, 100% failed, 85% of failures match = 85% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 94% of failures match = 94% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 6 runs, 100% failed, 67% of failures match = 67% impact
pull-ci-openshift-ovn-kubernetes-master-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 12 runs, 100% failed, 83% of failures match = 83% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
release-openshift-origin-installer-launch-azure (all) - 12 runs, 83% failed, 10% of failures match = 8% impact
release-openshift-origin-installer-launch-gcp (all) - 63 runs, 41% failed, 8% of failures match = 3% impact

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1393250382481723392
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1393269399615442944
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1393269438639247360
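
The "impact" figures in the search output above appear to be the failure rate multiplied by the match rate, e.g. 83% failed x 10% of failures matching is roughly 8% impact. A minimal Python sketch of that arithmetic, assuming the line format shown above. The regular expression and the function names are illustrative and are not part of the search.ci tooling, which presumably works from raw run counts, so a recomputed value could differ from the reported one by a rounding step:

```python
import re

# Pattern for one "failures match" line as printed above (an assumption
# based on the lines in this report, not a documented format).
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match = "
    r"(?P<impact>\d+)% impact$"
)


def impact_percent(failed_percent, match_percent):
    """Share of all runs that failed AND matched the search, in percent."""
    return round(failed_percent * match_percent / 100)


def check_line(line):
    """Return (job, recomputed impact, reported impact) for one output line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized line: " + line)
    computed = impact_percent(int(m["failed"]), int(m["match"]))
    return m["job"], computed, int(m["impact"])


# Two lines copied from the output above.
samples = [
    "release-openshift-origin-installer-launch-azure (all) - 12 runs, "
    "83% failed, 10% of failures match = 8% impact",
    "release-openshift-origin-installer-launch-gcp (all) - 63 runs, "
    "41% failed, 8% of failures match = 3% impact",
]

for sample in samples:
    job, computed, reported = check_line(sample)
    print(job, "computed:", computed, "reported:", reported)
```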

Comment 1 W. Trevor King 2021-05-14 20:33:40 UTC
Possibly the 4.8 test suite assumes something is present that 4.7 does not provide?

Comment 2 W. Trevor King 2021-05-14 20:44:14 UTC
TestGrid [1] shows this transitioning to perma-fail between [2,3].  Comparing the target versions:

$ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391245505060671488/artifacts/release/artifacts/release-images-latest
$ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391607897154129920/artifacts/release/artifacts/release-images-latest
$ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
$ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}")
--- /dev/fd/63  2021-05-14 13:38:59.849002076 -0700
+++ /dev/fd/62  2021-05-14 13:38:59.850002076 -0700
@@ -26 +26 @@
-cluster-etcd-operator https://github.com/openshift/cluster-etcd-operator/commit/b6530d132942cd84bec9e2a76a7386d4141cca78
+cluster-etcd-operator https://github.com/openshift/cluster-etcd-operator/commit/b54aaf90c1f0468730270163e8423ca23b27056c
@@ -35 +35 @@
-cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/91af127c2d693adbc357ab14cf0318de44409a14
+cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/103304d59bb26fdaadb0170f76c012775d4a979f
@@ -45 +45 @@
-console https://github.com/openshift/console/commit/f0b1fe1d368e50ccbf423de6165b9f870c0f06d5
+console https://github.com/openshift/console/commit/44c4fe0ea64befc9a9ebb54894bddba9a70b57a6
@@ -122 +122 @@
-ovirt-machine-controllers https://github.com/openshift/cluster-api-provider-ovirt/commit/2ac685fd451c03564072c873ca087b06a0934aab
+ovirt-machine-controllers https://github.com/openshift/cluster-api-provider-ovirt/commit/d6f563502a708f84489d629cf0b05212cb345c55
@@ -134 +134 @@
-tests https://github.com/openshift/origin/commit/2a813e180f73b3876b42bf04f27a5f66814560c2
+tests https://github.com/openshift/origin/commit/265b6ef959b8c8183ecda5aba10f7d437b87a9a9
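
The jq filter above flattens each release image tag into a "name source-location/commit/commit-id" line, so that diff can show which components changed between the two runs. A rough Python equivalent, assuming the same JSON layout that the filter reads (spec.tags[] entries with name and annotations). The helper names are hypothetical, and the sample data is cut down to the tests tag from the diff above plus one made-up unchanged tag:

```python
SOURCE = "io.openshift.build.source-location"
COMMIT = "io.openshift.build.commit.id"


def tag_lines(image_stream):
    """One sorted 'name repo/commit/sha' line per tag, like the jq filter."""
    lines = []
    for tag in image_stream["spec"]["tags"]:
        annotations = tag.get("annotations") or {}
        lines.append(
            "{} {}/commit/{}".format(
                tag["name"], annotations.get(SOURCE, ""), annotations.get(COMMIT, "")
            )
        )
    return sorted(lines)


def changed_components(ref_a, ref_b):
    """Names of tags whose line in ref_b does not appear in ref_a."""
    before = set(tag_lines(ref_a))
    return sorted(line.split(" ", 1)[0] for line in tag_lines(ref_b) if line not in before)


def stream(tests_commit):
    """Build a tiny image-stream-shaped dict (sample data for this sketch only)."""
    return {
        "spec": {
            "tags": [
                {
                    "name": "tests",
                    "annotations": {
                        SOURCE: "https://github.com/openshift/origin",
                        COMMIT: tests_commit,
                    },
                },
                {
                    # Hypothetical tag that stays the same in both releases.
                    "name": "example-unchanged",
                    "annotations": {SOURCE: "https://example.com/repo", COMMIT: "0000000"},
                },
            ]
        }
    }


ref_a = stream("2a813e180f73b3876b42bf04f27a5f66814560c2")
ref_b = stream("265b6ef959b8c8183ecda5aba10f7d437b87a9a9")
print(changed_components(ref_a, ref_b))
```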

Checking in origin:

$ git --no-pager log --oneline --first-parent 2a813e180f..265b6ef959b
265b6ef959 Merge pull request #26054 from soltysh/k8s-1.21

Ah, so yeah, lots of changes that came in there, and we broke this test-case (or maybe the new test-case logic is more robust and turning up implementation breakage we previously missed).

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391245505060671488
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391607897154129920

Comment 3 W. Trevor King 2021-05-14 21:22:46 UTC
Clayton suspects the PDB-creating fixture moved to v1 PDBs in 4.8, but 4.7 only supports v1beta1 PDBs.  Suggested fix is to try to create v1beta1 PDBs, falling back to v1 PDBs.  Or maybe the other way around.
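
The fallback suggested above can be sketched as follows. This is a minimal illustration, not the code from the linked pull requests: NotFoundError, make_fake_server and create_pdb_with_fallback are hypothetical stand-ins for a real API client, and the fake server only models which policy API versions are served.

```python
class NotFoundError(Exception):
    """Stand-in for 'the server could not find the requested resource'."""


def create_pdb_with_fallback(creators, name):
    """Try each (api_version, create_fn) pair in order.

    Returns the API version that accepted the PDB. Only a not-found error
    triggers the fallback; any other exception propagates unchanged.
    """
    last_error = None
    for api_version, create in creators:
        try:
            create(name)
            return api_version
        except NotFoundError as err:
            last_error = err
    raise RuntimeError("failed to create PDB {!r}: {}".format(name, last_error))


def make_fake_server(served_versions):
    """Return a factory of create functions for a server serving the given versions."""
    def creator_for(api_version):
        def create(name):
            if api_version not in served_versions:
                raise NotFoundError("the server could not find the requested resource")
        return create
    return creator_for


# Preference order from the comment: v1beta1 first, then v1.
ORDER = ["policy/v1beta1", "policy/v1"]

# A 4.7-like server that, per the suspicion above, serves only v1beta1.
old_server = make_fake_server({"policy/v1beta1"})
chosen = create_pdb_with_fallback([(v, old_server(v)) for v in ORDER], "service-test")
print(chosen)
```

Reversing ORDER gives the "other way around" variant. Judging by their titles, the linked pull requests pinned the test fixture to v1beta1 rather than adding a runtime fallback.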

Comment 5 W. Trevor King 2021-05-17 22:02:08 UTC
We really want this fix so that 4.7 -> 4.8 update CI gives us a green signal.  If for some reason it doesn't land in time, we can look more closely at those CI jobs to decide whether this is the only failure mode we're seeing; if so, I don't see a problem going GA without this fix.

Comment 7 W. Trevor King 2021-05-18 22:15:56 UTC
Still need the origin bump to vendor the new jig.

Comment 12 errata-xmlrpc 2021-07-27 23:08:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

