1960780 – CI: failed to create PDB "service-test" the server could not find the requested resource

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	4.8.0
Assignee:	W. Trevor King
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-05-14 20:32 UTC by W. Trevor King
Modified:	2022-08-04 22:32 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 23:08:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift kubernetes pull 752	None	closed	Bug 1960780: UPSTREAM: <carry>: Use policyv1beta1	2021-05-19 05:21:23 UTC
Github	openshift origin pull 26168	None	closed	Bug 1960780: Bump k8s to pick up v1beta1 PDB jig	2021-06-14 18:45:47 UTC
Red Hat Product Errata	RHSA-2021:2438	None	None	None	2021-07-27 23:08:57 UTC

Description W. Trevor King 2021-05-14 20:32:45 UTC

Seen in three different update CI jobs from 4.7.11 to 4.8.0-fc.4 [1,2,3]:

disruption_tests: [sig-network-edge] Application behind service load balancer with PDB is not disrupted 
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]	1h13m22s
fail [github.com/openshift/origin/test/e2e/upgrade/service/service.go:115]: Unexpected error:
    <*errors.errorString | 0xc000bf70a0>: {
        s: "failed to create PDB \"service-test\" the server could not find the requested resource",
    }
    failed to create PDB "service-test" the server could not find the requested resource
occurred

Extremely common in 4.7 -> 4.8 update CI:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=failed+to+create+PDB.*the+server+could+not+find+the+requested+resource' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-from-stable-4.7-from-stable-4.6-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 20 runs, 100% failed, 85% of failures match = 85% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 94% of failures match = 94% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 6 runs, 100% failed, 67% of failures match = 67% impact
pull-ci-openshift-ovn-kubernetes-master-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 12 runs, 100% failed, 83% of failures match = 83% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-18540-periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
release-openshift-origin-installer-launch-azure (all) - 12 runs, 83% failed, 10% of failures match = 8% impact
release-openshift-origin-installer-launch-gcp (all) - 63 runs, 41% failed, 8% of failures match = 3% impact

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1393250382481723392
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1393269399615442944
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1393269438639247360

Comment 1 W. Trevor King 2021-05-14 20:33:40 UTC

Possibly the 4.8 test suite assumes an API or resource that exists on 4.8 but not on 4.7?

Comment 2 W. Trevor King 2021-05-14 20:44:14 UTC

TestGrid [1] shows this transitioning to perma-fail between [2,3].  Comparing the target versions:

$ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391245505060671488/artifacts/release/artifacts/release-images-latest
$ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391607897154129920/artifacts/release/artifacts/release-images-latest
$ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
$ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}")
--- /dev/fd/63  2021-05-14 13:38:59.849002076 -0700
+++ /dev/fd/62  2021-05-14 13:38:59.850002076 -0700
@@ -26 +26 @@
-cluster-etcd-operator https://github.com/openshift/cluster-etcd-operator/commit/b6530d132942cd84bec9e2a76a7386d4141cca78
+cluster-etcd-operator https://github.com/openshift/cluster-etcd-operator/commit/b54aaf90c1f0468730270163e8423ca23b27056c
@@ -35 +35 @@
-cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/91af127c2d693adbc357ab14cf0318de44409a14
+cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/103304d59bb26fdaadb0170f76c012775d4a979f
@@ -45 +45 @@
-console https://github.com/openshift/console/commit/f0b1fe1d368e50ccbf423de6165b9f870c0f06d5
+console https://github.com/openshift/console/commit/44c4fe0ea64befc9a9ebb54894bddba9a70b57a6
@@ -122 +122 @@
-ovirt-machine-controllers https://github.com/openshift/cluster-api-provider-ovirt/commit/2ac685fd451c03564072c873ca087b06a0934aab
+ovirt-machine-controllers https://github.com/openshift/cluster-api-provider-ovirt/commit/d6f563502a708f84489d629cf0b05212cb345c55
@@ -134 +134 @@
-tests https://github.com/openshift/origin/commit/2a813e180f73b3876b42bf04f27a5f66814560c2
+tests https://github.com/openshift/origin/commit/265b6ef959b8c8183ecda5aba10f7d437b87a9a9

Checking in origin:

$ git --no-pager log --oneline --first-parent 2a813e180f..265b6ef959b
265b6ef959 Merge pull request #26054 from soltysh/k8s-1.21

Ah, so yeah, lots of changes came in with that merge, and one of them broke this test-case (or maybe the new test-case logic is more robust and is turning up implementation breakage we previously missed).

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391245505060671488
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1391607897154129920

Comment 3 W. Trevor King 2021-05-14 21:22:46 UTC

Clayton suspects the PDB-creating fixture moved to v1 PDBs in 4.8, but 4.7 only supports v1beta1 PDBs.  Suggested fix is to try to create v1beta1 PDBs, falling back to v1 PDBs.  Or maybe the other way around.
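
One way to check this hypothesis against a cluster is to look at which policy API versions the server actually serves. The following is a minimal sketch, not the fix that landed: pick_pdb_api_version is a hypothetical helper (it is not part of origin or the upstream jig) that reads group/version lines on stdin, in the one-per-line format printed by `oc api-versions`, and prefers policy/v1beta1 with a fallback to policy/v1, mirroring the first ordering suggested above.

```shell
# Hypothetical helper for illustration only.
# Reads group/version lines on stdin (one per line, as printed by
# `oc api-versions`) and prints the PDB API version to use:
# policy/v1beta1 if served, otherwise policy/v1, otherwise an error.
pick_pdb_api_version() {
    versions=$(cat)
    for candidate in policy/v1beta1 policy/v1; do
        # -x matches whole lines only, so policy/v1 cannot match policy/v1beta1.
        if printf '%s\n' "$versions" | grep -qxF "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no policy API version served" >&2
    return 1
}

# Usage against a live cluster (requires a logged-in oc client):
#   oc api-versions | pick_pdb_api_version
```

If the hypothesis holds, a 4.7 cluster would report policy/v1beta1 only, so a fixture that unconditionally requests policy/v1 would get "the server could not find the requested resource" until the control plane has been updated.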

Comment 5 W. Trevor King 2021-05-17 22:02:08 UTC

We really want this to give us a green signal for 4.7 -> 4.8 update CI.  But if for some reason it doesn't land in time, we can look more closely at those CI jobs to decide if this is the only failure mode we're seeing, and if so, I don't see a problem going GA without this fix.

Comment 7 W. Trevor King 2021-05-18 22:15:56 UTC

Still need the origin bump to vendor the new jig.

Comment 9 Hongan Li 2021-05-31 04:49:43 UTC

Didn't see the failure in recent 4.7->4.8 upgrade CI jobs; see https://search.ci.openshift.org/?search=failed+to+create+PDB.*the+server+could+not+find+the+requested+resource&maxAge=336h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

So moving to verified.

Comment 12 errata-xmlrpc 2021-07-27 23:08:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
