Bug 1822286
Summary: | [sig-network] IngressClass [Feature:Ingress] should prevent Ingress creation if more than 1 IngressClass marked as default [Suite:openshift/conformance/parallel] [Suite:k8s] | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Dan Williams <dcbw> |
Component: | Networking | Assignee: | Daneyon Hansen <dhansen> |
Networking sub component: | router | QA Contact: | Arvind iyengar <aiyengar> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aiyengar, aos-bugs, bbennett, dhansen |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-13 17:26:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Dan Williams
2020-04-08 16:45:25 UTC
Ran this test 500 times against a 4.5 cluster with no flaking.

Another data point: the admission controller plugin that enforces the constraint verified in the test is compiled into the apiserver, and I'm not sure it's configurable. It's not yet clear under what conditions the plugin could be disabled or misbehave.

Upon closer inspection of the admission code[1], it does seem at least _possible_ that an informer cache consistency issue could result in the admission plugin skipping defaulting of the Ingress if the IngressClass resources were persisted and the list call didn't return them (or returned just one of them).

[1] https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/plugin/pkg/admission/defaultingressclass/admission.go#L150

More digging is needed, but I'm suspicious of the design of that admission plugin. Stale cache data can cause it to make the wrong admission decision, and there's no controller that can correct for mistakes. It seems inherently racy with IngressClass persistence. Maybe the "only one default IngressClass" invariant itself needs to be enforced during admission so that this plugin doesn't need to make the racy check itself.

Test [1] should be run [Serial], because when [3] runs in parallel, the default IngressClass created by [1] gets applied to other Ingresses that do not specify an IngressClass [2].

[1] https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/network/ingressclass.go#L69-L94
[2] https://github.com/openshift/origin/blob/master/test/extended/testdata/router/ingress.yaml
[3] https://github.com/openshift/origin/blob/master/test/extended/router/router.go#L70-L100

Checked the latest CI test with the serialized patch and didn't see the same flakiness anymore:
------
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-ocp-installer-e2e-gcp-serial-4.5
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-ocp-installer-e2e-openstack-serial-4.5
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-ocp-installer-e2e-azure-serial-4.5
------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
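For anyone revisiting the stale-cache theory from the comments above, here is a minimal sketch of how such a defaulting decision could go wrong. It is a simplified model based only on the behavior described in this report, not the plugin's actual code: the `IngressClass` struct, `pickDefaultClass`, and `errMultipleDefaults` are hypothetical names, and the "cache" is just a slice standing in for whatever the informer lister returns.

```go
package main

import (
	"errors"
	"fmt"
)

// IngressClass is a hypothetical, simplified stand-in for the real API type.
// IsDefault models the "marked as default" state checked by the test.
type IngressClass struct {
	Name      string
	IsDefault bool
}

// errMultipleDefaults is a hypothetical error for the rejection case.
var errMultipleDefaults = errors.New("more than one IngressClass marked as default")

// pickDefaultClass models the admission decision as described in this bug:
// it only sees the classes the (possibly stale) cache returns.
//   - no default seen:  return "" (defaulting is skipped, Ingress admitted)
//   - one default seen: return its name (Ingress gets that class)
//   - several seen:     return an error (Ingress creation is rejected)
func pickDefaultClass(cached []IngressClass) (string, error) {
	var defaults []string
	for _, c := range cached {
		if c.IsDefault {
			defaults = append(defaults, c.Name)
		}
	}
	switch len(defaults) {
	case 0:
		return "", nil
	case 1:
		return defaults[0], nil
	default:
		return "", errMultipleDefaults
	}
}

func main() {
	// Two default classes have been persisted, as in the e2e test.
	persisted := []IngressClass{
		{Name: "class-a", IsDefault: true},
		{Name: "class-b", IsDefault: true},
	}

	// With a fresh view, creation is rejected, which is what the test expects.
	name, err := pickDefaultClass(persisted)
	fmt.Printf("fresh cache: class=%q err=%v\n", name, err)

	// With a stale view that has seen only one class, the Ingress is admitted
	// and defaulted, so the test would observe an unexpected success.
	name, err = pickDefaultClass(persisted[:1])
	fmt.Printf("stale cache: class=%q err=%v\n", name, err)

	// With an empty view, defaulting is skipped entirely.
	name, err = pickDefaultClass(nil)
	fmt.Printf("empty cache: class=%q err=%v\n", name, err)
}
```

The same model also shows why the parallel run interferes with other tests: any Ingress created without a class while exactly one default is visible falls into the second case and picks up that default.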