Bug 1861189
Summary: | [sig-apps] DisruptionController should block an eviction until the PDB is updated to allow it | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | kube-apiserver | Assignee: | Stefan Schimanski <sttts> |
Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.6 | CC: | amcdermo, aos-bugs, danili, jokerman, kewang, mfojtik, sjenning, wking, xxia |
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | non-multi-arch | ||
Last Closed: | 2020-10-27 16:17:21 UTC | Type: | Bug |
Description
W. Trevor King
2020-07-28 02:57:29 UTC
History in [1] suggests the test became flaky between runs [2] and [3]. The failure rate is not 100%, so it's possible that [2] squeaked by with the broken code (or whatever the trigger is), but if that really is the bracket, the suspects are:

```
$ diff -U0 <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1286818801559539712/artifacts/release-images-latest/release-images-latest | jq -r '[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]') <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1286844539994116096/artifacts/release-images-latest/release-images-latest | jq -r '[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]')
--- /dev/fd/63 2020-07-27 20:42:39.654779459 -0700
+++ /dev/fd/62 2020-07-27 20:42:39.654779459 -0700
@@ -4 +4 @@
-baremetal-installer https://github.com/openshift/installer/commit/d795e477966872804f20f89ca4a8477b1e23b596
+baremetal-installer https://github.com/openshift/installer/commit/7287d88d35e985b72747ed64b44907a827fb3cad
@@ -27 +27 @@
-cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/c9aefce9eb7510f80f26347ccd49c91301fc75b4
+cluster-network-operator https://github.com/openshift/cluster-network-operator/commit/7303c6858c6065a4ca4c3b6d8b6ed996af7d31ee
@@ -36 +36 @@
-cluster-version-operator https://github.com/openshift/cluster-version-operator/commit/b658b4258dbecc74eb3b997806e95bb65181b274
+cluster-version-operator https://github.com/openshift/cluster-version-operator/commit/a49fef5c66c6b0707c54fd93f84d2f51d3d28aca
@@ -38 +38 @@
-console https://github.com/openshift/console/commit/f7034541b5f435371d7f8174599e6e330b1bb1ff
+console https://github.com/openshift/console/commit/60f367edb6a71ed2242784186665e7c957568c5e
@@ -50 +50 @@
-hyperkube https://github.com/openshift/kubernetes/commit/9ba0c166caed682a678f0cf56be1f7aeeb339fc1
+hyperkube https://github.com/openshift/kubernetes/commit/53f1b9d6f8de259644c05a25aa7ac8c6a67258e2
@@ -52,2 +52,2 @@
-installer https://github.com/openshift/installer/commit/d795e477966872804f20f89ca4a8477b1e23b596
-installer-artifacts https://github.com/openshift/installer/commit/d795e477966872804f20f89ca4a8477b1e23b596
+installer https://github.com/openshift/installer/commit/7287d88d35e985b72747ed64b44907a827fb3cad
+installer-artifacts https://github.com/openshift/installer/commit/7287d88d35e985b72747ed64b44907a827fb3cad
@@ -76 +76 @@
-machine-config-operator https://github.com/openshift/machine-config-operator/commit/be70bfe842d7d2a996eeef3bd4c55e12a02b5a86
+machine-config-operator https://github.com/openshift/machine-config-operator/commit/ab2673986646c62bc6599931d22f050ffd5871db
```

I suspect hyperkube:

```
$ git --no-pager log --first-parent --oneline 9ba0c166cae..53f1b9d6f8d
53f1b9d6f8d (openshift/release-4.7, openshift/release-4.6, openshift/master) Merge pull request #166 from marun/rebase-1.19
```

It's not clear to me whether the problem is on the test-suite side, the kubelet side, or somewhere else.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-blocking#release-openshift-ocp-installer-e2e-aws-4.6
[2]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1286818801559539712
[3]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1286844539994116096

Dan Li, it does not have multi-arch implications. ppc64le was just one of many test suites that were flaking.

I don't imagine this is a Node team bug, but I'll try to figure out where it should go. Current theory is this flake came in on the rebase, right Trevor?

> Current theory is this flake came in on the rebase, right Trevor?
Yup. I just assigned to the node team because the kubelet (I think?) comes out of openshift/kubernetes now, and "PDB" seemed like a node-touching thing. Could also be on the API-server/controller side of openshift/kubernetes output.
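The `diff`/`jq` pipeline in the description compares two release payloads by mapping each image tag to its source repository and commit. A rough Python equivalent is sketched below, assuming (as the `jq` filter implies) that `release-images-latest` is JSON with a `spec.tags` list whose entries carry a `name` plus the `io.openshift.build.source-location` and `io.openshift.build.commit.id` annotations. The helper names `load_release`, `component_commits`, and `changed_components` are hypothetical, and the 2020 CI artifact URLs may no longer be served.

```python
from __future__ import annotations

import json
import urllib.request

# Annotation keys taken from the jq filter in the description.
SOURCE_KEY = "io.openshift.build.source-location"
COMMIT_KEY = "io.openshift.build.commit.id"


def load_release(url: str) -> dict:
    """Fetch a release-images-latest document (assumed to be plain JSON)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def component_commits(release: dict) -> dict[str, str]:
    """Map each tag name to '<source-location>/commit/<commit-id>', like the jq filter."""
    commits = {}
    for tag in release.get("spec", {}).get("tags", []):
        annotations = tag.get("annotations") or {}
        source = annotations.get(SOURCE_KEY)
        commit = annotations.get(COMMIT_KEY)
        if source and commit:  # skip tags without build provenance
            commits[tag["name"]] = f"{source}/commit/{commit}"
    return commits


def changed_components(old: dict, new: dict) -> list[tuple[str, str | None, str | None]]:
    """Return (name, old_commit_url, new_commit_url) for every tag whose commit differs."""
    before, after = component_commits(old), component_commits(new)
    return [
        (name, before.get(name), after.get(name))
        for name in sorted(before.keys() | after.keys())
        if before.get(name) != after.get(name)
    ]
```

Passing the documents behind the artifact URLs for runs [2] and [3] through `changed_components` would be expected to report the same changed tags as the `diff` output, provided the artifacts are still available. Unlike the `jq` filter, the sketch skips tags that lack either annotation instead of emitting a malformed line.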
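A PodDisruptionBudget blocks an eviction through its status rather than its spec: the eviction subresource consults how many disruptions the PDB currently allows, and when none are allowed the real API server rejects the request with HTTP 429 Too Many Requests. The toy model below is for illustration only. It assumes a `minAvailable`-style budget for which the disruption controller sets allowed disruptions to healthy pods minus `minAvailable` (never negative); the class `PDB` and the functions `reconcile` and `evict` are hypothetical stand-ins, not the Kubernetes API. It walks through the sequence the test name describes: the eviction is refused, the budget is loosened, and only once the controller has reconciled the status does the eviction go through.

```python
from dataclasses import dataclass


@dataclass
class PDB:
    """Toy PodDisruptionBudget: spec.minAvailable plus the status the eviction check reads."""
    min_available: int            # spec: pods that must stay available
    current_healthy: int          # status: healthy pods the controller last observed
    disruptions_allowed: int = 0  # status: what the eviction check actually consults

    def reconcile(self) -> None:
        """Simplified disruption-controller pass: recompute status from spec."""
        self.disruptions_allowed = max(self.current_healthy - self.min_available, 0)


def evict(pdb: PDB) -> bool:
    """Simplified eviction check: True if admitted, False if the PDB blocks it.

    A real API server answers a blocked eviction with HTTP 429 Too Many Requests.
    """
    if pdb.disruptions_allowed <= 0:
        return False
    pdb.disruptions_allowed -= 1  # consume one allowed disruption before the pod is deleted
    return True


# Scenario from the test name; the pod counts are illustrative, not the test's values.
pdb = PDB(min_available=3, current_healthy=3)
pdb.reconcile()
print(evict(pdb))  # False: 3 healthy - 3 required = 0 allowed
pdb.min_available = 2
print(evict(pdb))  # False: spec changed, but status has not been reconciled yet
pdb.reconcile()
print(evict(pdb))  # True: 3 healthy - 2 required = 1 allowed
```

Because eviction reads status, a spec update alone does not unblock it; in this model the eviction only succeeds after a controller pass, so a test of this shape has to wait for the status to catch up.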
Looks like there might be some skew between openshift/kubernetes and origin's vendored kube at the moment; https://github.com/openshift/origin/pull/25314, which syncs them, has not merged yet. If the e2e tests run out of the vendored kube, note that there have been upstream changes to PDBs in both the code and the e2e test:

https://github.com/kubernetes/kubernetes/pull/91342 (code change)
https://github.com/kubernetes/kubernetes/pull/92991 (e2e fix)

Sending to apiserver, since it handles the eviction API and PDB enforcement.

Recovering bug state after the PR got green-buttoned [1].

[1]: https://github.com/openshift/origin/pull/25335#event-3596734606

Running the following command to search test runs from the past 7 days shows the failure match rate trending markedly downward; no job has a 100% match rate:

```
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=DisruptionController%20should%20block%20an%20eviction%20until%20the%20PDB%20is%20updated%20to%20allow%20it&maxAge=168h' | grep 'failures match' | sort
endurance-e2e-aws-4.4 - 5 runs, 100% failed, 20% of failures match
osde2e-stage-aws-conformance-default - 7 runs, 43% failed, 67% of failures match
promote-release-openshift-machine-os-content-e2e-aws-4.6 - 520 runs, 5% failed, 4% of failures match
promote-release-openshift-okd-machine-os-content-e2e-aws-4.6 - 70 runs, 100% failed, 1% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-single - 26 runs, 69% failed, 6% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn - 95 runs, 98% failed, 1% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-ovn-hybrid-step-registry - 99 runs, 99% failed, 2% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-ovn-step-registry - 91 runs, 99% failed, 1% of failures match
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-aws - 4 runs, 75% failed, 33% of failures match
pull-ci-openshift-installer-master-e2e-aws - 82 runs, 55% failed, 2% of failures match
pull-ci-openshift-installer-master-e2e-aws-fips - 107 runs, 90% failed, 1% of failures match
pull-ci-openshift-installer-release-4.5-e2e-vsphere - 3 runs, 100% failed, 33% of failures match
pull-ci-openshift-kni-cnf-features-deploy-master-e2e-gcp-origin - 9 runs, 100% failed, 11% of failures match
pull-ci-openshift-kubernetes-master-e2e-aws-fips - 67 runs, 94% failed, 2% of failures match
pull-ci-openshift-machine-config-operator-master-e2e-ovn-step-registry - 102 runs, 85% failed, 3% of failures match
pull-ci-openshift-origin-master-e2e-aws-fips - 248 runs, 90% failed, 3% of failures match
pull-ci-openshift-origin-master-e2e-gcp - 89 runs, 57% failed, 10% of failures match
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn - 18 runs, 100% failed, 6% of failures match
pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn - 20 runs, 100% failed, 5% of failures match
pull-ci-openshift-router-master-e2e - 8 runs, 63% failed, 20% of failures match
release-openshift-ocp-installer-e2e-aws-4.6 - 97 runs, 60% failed, 3% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 50 runs, 58% failed, 3% of failures match
release-openshift-ocp-installer-e2e-azure-ovn-4.6 - 50 runs, 84% failed, 2% of failures match
release-openshift-ocp-installer-e2e-gcp-ovn-4.6 - 49 runs, 96% failed, 2% of failures match
release-openshift-ocp-installer-e2e-openstack-4.4 - 22 runs, 100% failed, 5% of failures match
```

The fix works as expected; moving the bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
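Each line of the CI-search summary used in the verification comment packs three numbers: how many runs of the job fell in the window, what percentage of those runs failed, and what percentage of the failed runs matched the search. Multiplying them gives a rough count of runs that actually hit this test failure; the sketch below does that in Python. The regular expression is an assumption fitted to the lines quoted in this bug, not a documented search.ci output format, and because the tool rounds its percentages the estimate is only approximate. The names `LINE_RE`, `JobStats`, and `parse_line` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed line shape, fitted to the quoted output:
#   "<job> - <N> runs, <F>% failed, <M>% of failures match"
LINE_RE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match$"
)


class JobStats(NamedTuple):
    job: str
    runs: int
    failed_pct: int
    match_pct: int

    def estimated_matching_runs(self) -> float:
        """Approximate number of runs whose failure matched the search (inputs are rounded)."""
        return self.runs * (self.failed_pct / 100) * (self.match_pct / 100)


def parse_line(line: str) -> JobStats:
    """Parse one summary line; raise ValueError if it does not fit the assumed shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized search.ci line: {line!r}")
    return JobStats(m["job"], int(m["runs"]), int(m["failed"]), int(m["match"]))


if __name__ == "__main__":
    sample = "release-openshift-ocp-installer-e2e-aws-4.6 - 97 runs, 60% failed, 3% of failures match"
    stats = parse_line(sample)
    print(stats.job, round(stats.estimated_matching_runs(), 2))
```

For `release-openshift-ocp-installer-e2e-aws-4.6`, 97 runs × 60% failed × 3% matching comes to about 1.7, i.e. roughly two runs in the seven-day window.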