Bug 1822286

Summary: [sig-network] IngressClass [Feature:Ingress] should prevent Ingress creation if more than 1 IngressClass marked as default [Suite:openshift/conformance/parallel] [Suite:k8s]
Product: OpenShift Container Platform Reporter: Dan Williams <dcbw>
Component: Networking Assignee: Daneyon Hansen <dhansen>
Networking sub component: router QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aiyengar, aos-bugs, bbennett, dhansen
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-07-13 17:26:16 UTC Type: Bug

Description Dan Williams 2020-04-08 16:45:25 UTC
The PR only adds a test skip for ovn-kubernetes, which is not even used in this e2e-gcp job, so it is unrelated to the failure.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24833/pull-ci-openshift-origin-master-e2e-gcp/7122

Comment 1 Dan Mace 2020-04-16 18:18:27 UTC
Ran this test 500 times against a 4.5 cluster with no flaking. Another data point: the admission controller plugin that enforces the constraint verified in the test is compiled into the apiserver, and I'm not sure it's configurable. It's not yet clear under what conditions the plugin could be disabled or could misbehave.

Comment 2 Dan Mace 2020-04-16 18:25:59 UTC
Upon closer inspection of the admission code[1], it does seem at least _possible_ that an informer cache consistency issue could result in the admission plugin skipping defaulting of the Ingress if the IngressClass resources were persisted and the list call didn't return them (or returned just one of them).

[1] https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/plugin/pkg/admission/defaultingressclass/admission.go#L150
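
To make the suspected failure mode concrete, here is a minimal sketch of the decision the plugin appears to make. It is not the vendored code: the `IngressClass` struct and the `defaultClass` and `admitIngress` functions are simplified, hypothetical stand-ins, and the `cached` slice stands in for whatever the informer-backed lister returns at admission time. The annotation key is the one Kubernetes uses to mark a default IngressClass.

```go
package main

import "fmt"

// isDefaultAnnotation marks an IngressClass as the cluster default.
const isDefaultAnnotation = "ingressclass.kubernetes.io/is-default-class"

// IngressClass is a simplified, hypothetical stand-in for the real API type.
type IngressClass struct {
	Name        string
	Annotations map[string]string
}

// defaultClass inspects only what the cache returned. It cannot see
// IngressClasses that are persisted but not yet reflected in the cache.
func defaultClass(cached []IngressClass) (*IngressClass, error) {
	var defaults []*IngressClass
	for i := range cached {
		if cached[i].Annotations[isDefaultAnnotation] == "true" {
			defaults = append(defaults, &cached[i])
		}
	}
	switch len(defaults) {
	case 0:
		return nil, nil // no default visible: the Ingress is left unchanged
	case 1:
		return defaults[0], nil
	default:
		return nil, fmt.Errorf("%d default IngressClasses were found, only 1 allowed", len(defaults))
	}
}

// admitIngress returns the class the Ingress ends up with, or an error
// if creation is rejected. An explicitly requested class is never touched.
func admitIngress(requestedClass string, cached []IngressClass) (string, error) {
	if requestedClass != "" {
		return requestedClass, nil
	}
	def, err := defaultClass(cached)
	if err != nil {
		return "", err
	}
	if def == nil {
		return "", nil
	}
	return def.Name, nil
}
```

In this sketch, if two default IngressClasses are persisted but `cached` holds only one of them (or neither), `admitIngress` admits the Ingress instead of rejecting it, which is exactly the outcome the e2e test treats as a failure.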

Comment 3 Dan Mace 2020-04-17 14:11:36 UTC
More digging is needed, but I'm suspicious of the design of that admission plugin. Stale cache data can cause it to make the wrong admission decision, and there's no controller which can correct for mistakes. It seems inherently racy with IngressClass persistence. Maybe the "only one default IngressClass" invariant itself needs to be enforced during admission so that this plugin doesn't need to make the racy check itself.

Comment 4 Daneyon Hansen 2020-04-28 20:08:42 UTC
Test [1] should be run [Serial]. It creates default ingressclasses, so when [3] runs in parallel, other ingresses that do not specify an ingressclass, such as [2], end up using the default ingressclass created by [1].

[1] https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/network/ingressclass.go#L69-L94
[2] https://github.com/openshift/origin/blob/master/test/extended/testdata/router/ingress.yaml
[3] https://github.com/openshift/origin/blob/master/test/extended/router/router.go#L70-L100
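
For context, a rough sketch of how a bracketed tag in a test name can keep a test out of the parallel bucket. This assumes suite membership is decided by matching tags in the test name, as the `[Suite:openshift/conformance/parallel]` tag in the summary suggests. The function `splitBySerialTag` is hypothetical and is not the actual openshift/origin selection logic.

```go
package main

import "strings"

// splitBySerialTag is a hypothetical helper: test names containing the
// "[Serial]" tag go into the serial bucket, everything else is treated
// as safe to run in parallel.
func splitBySerialTag(names []string) (serial, parallel []string) {
	for _, n := range names {
		if strings.Contains(n, "[Serial]") {
			serial = append(serial, n)
		} else {
			parallel = append(parallel, n)
		}
	}
	return serial, parallel
}
```

Under that assumption, adding [Serial] to the name of [1] would stop it from running alongside [3], so the default ingressclasses it creates could not leak into ingresses created by other tests.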

Comment 8 errata-xmlrpc 2020-07-13 17:26:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409