1903821 – Unexpected behavior on IngressController "namespaceOwnership" configuration

Summary: Unexpected behavior on IngressController "namespaceOwnership" configuration

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Grant Spence
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-02 21:39 UTC by Will Gordon
Modified:	2022-12-19 15:47 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-12-19 15:47:38 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift router pull 233	0	None	closed	Bug 1903821: Pass stop channel to RouterControllers	2021-02-23 18:05:39 UTC
Github	openshift router pull 240	0	None	closed	Bug 1903821: contention: Contend over route ingress admission condition	2021-02-25 19:23:58 UTC

Description Will Gordon 2020-12-02 21:39:47 UTC

Description of problem:

When specifying a configuration of "namespaceOwnership: InterNamespaceAllowed", I am correctly able to create routes in multiple namespaces that have the same hostname, but separate paths.

However, when changing the configuration back to "namespaceOwnership: Strict", previously Admitted routes now return a 503 error despite the route continuing to report itself as Admitted.
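Before switching back to "Strict", it can help to know which hostnames are currently claimed from more than one namespace, since those are the routes affected by the change. The following is a minimal sketch, not part of the original report: the function names shared_hosts and list_shared_hosts are hypothetical, and it assumes the caller may list routes in all namespaces.

```shell
# Sketch: report hosts that are claimed by routes in more than one namespace.
# Reads "host<TAB>namespace" pairs on stdin, one pair per line.
shared_hosts() {
  awk -F'\t' '
    # Count each (host, namespace) pair only once.
    !seen[$1 FS $2]++ { count[$1]++ }
    # Print hosts that appear in at least two distinct namespaces.
    END { for (h in count) if (count[h] > 1) print h }
  ' | sort
}

# Hypothetical wrapper: feed every route in the cluster into shared_hosts.
list_shared_hosts() {
  oc get routes --all-namespaces \
    -o jsonpath='{range .items[*]}{.spec.host}{"\t"}{.metadata.namespace}{"\n"}{end}' \
    | shared_hosts
}
```

With the two test routes from the reproducer in place, list_shared_hosts would be expected to print hello.openshift.com.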

-----------------------------

Version-Release number of selected component (if applicable):

I've encountered this bug on a 4.6.4 OSD/ROSA cluster

-----------------------------

How reproducible: 

Always

-----------------------------

Steps to Reproduce:

# Configure the IngressController to InterNamespaceAllowed
$ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
# Create an initial testing project
$ oc new-project route-admission-test1
# Deploy a sample application
$ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
# Expose the pod as a service
$ oc expose pod/hello-openshift
# Expose the service as a route with a well defined hostname and path
$ oc expose svc/hello-openshift --hostname hello.openshift.com --path /test1
# Test the path
$ curl http://test.apps.aic-rosa-nov18.nnws.p1.openshiftapps.com/test1 -H 'Host:hello.openshift.com'
> Hello OpenShift!

# Create a second testing project
$ oc new-project route-admission-test2
# Deploy the same sample application
$ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
# Expose the pod as a service
$ oc expose pod/hello-openshift
# Expose the service as a route with the same hostname and a different path
$ oc expose svc/hello-openshift --hostname hello.openshift.com --path /test2
# Test the path
$ curl http://test.apps.aic-rosa-nov18.nnws.p1.openshiftapps.com/test2 -H 'Host:hello.openshift.com'
> Hello OpenShift!

# Revert the IngressController configuration to enforce Strict
$ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"Strict"}}}' --type=merge

# Retest routes (it takes ~1 minute before the failure occurs)
$ curl http://test.apps.aic-rosa-nov18.nnws.p1.openshiftapps.com/test1 -H 'Host:hello.openshift.com'
> Hello OpenShift!
$ curl http://test.apps.aic-rosa-nov18.nnws.p1.openshiftapps.com/test2 -H 'Host:hello.openshift.com'
> 503 ERROR!!
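
The two patch commands in the steps above differ only in the "namespaceOwnership" value, so the toggle can be wrapped in a small helper. This is a sketch under assumptions: the function names ownership_patch and set_ownership are made up for illustration, and only the two values used in this report are accepted.

```shell
# Sketch: build the merge patch for a given namespaceOwnership value.
# Only the two values that appear in this report are accepted.
ownership_patch() {
  case "$1" in
    Strict|InterNamespaceAllowed)
      printf '{"spec":{"routeAdmission":{"namespaceOwnership":"%s"}}}\n' "$1"
      ;;
    *)
      echo "unsupported namespaceOwnership value: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical wrapper: apply the patch to the default IngressController,
# using the same oc invocation as in the steps above.
set_ownership() {
  ownership_patch "$1" >/dev/null || return 1
  oc -n openshift-ingress-operator patch ingresscontroller/default \
    --type=merge --patch "$(ownership_patch "$1")"
}
```

For example, "set_ownership InterNamespaceAllowed" followed later by "set_ownership Strict" mirrors the first and last patch steps of the reproducer.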

-----------------------------

Actual results:

The second exposed route now returns a 503 error.
The second exposed route still reports as "Admitted"

spec:
  host: hello.openshift.com
  path: /test2
  port:
    targetPort: 8080
  to:
    kind: Service
    name: hello-openshift
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: "2020-12-02T21:35:18Z"
      status: "True"
      type: Admitted
    host: hello.openshift.com
    routerCanonicalHostname: apps.aic-rosa-nov18.nnws.p1.openshiftapps.com
    routerName: default
    wildcardPolicy: None

-----------------------------

Expected results:

*My* expectation of what should happen is that the route would continue to be admitted and its traffic would not be blocked. However, it is unclear what the expected outcome should be. If the route continues to be Admitted, then the traffic should be accepted. If the traffic is getting blocked, the route status should be updated to reflect this.
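
The inconsistency described above (status says Admitted, router answers 503) can be checked mechanically. The sketch below is illustrative only: classify, admitted_status, and http_code are hypothetical helper names, and the JSONPath assumes the route has a single entry under status.ingress, as in the output shown earlier.

```shell
# Sketch: compare the reported admission status with the observed HTTP code.
# Arguments: the Admitted condition status ("True"/"False") and an HTTP code.
classify() {
  if [ "$1" = "True" ] && [ "$2" = "503" ]; then
    echo "mismatch"    # route claims to be admitted, but traffic is rejected
  else
    echo "consistent"
  fi
}

# Hypothetical helper: read the Admitted condition status of a route.
# Arguments: namespace, route name.
admitted_status() {
  oc -n "$1" get route "$2" \
    -o jsonpath='{.status.ingress[0].conditions[?(@.type=="Admitted")].status}'
}

# Hypothetical helper: fetch only the HTTP status code for a URL,
# sending the given Host header. Arguments: URL, host.
http_code() {
  curl -s -o /dev/null -w '%{http_code}' -H "Host: $2" "$1"
}
```

A possible invocation, using the names from the reproducer: classify "$(admitted_status route-admission-test2 hello-openshift)" "$(http_code http://test.apps.aic-rosa-nov18.nnws.p1.openshiftapps.com/test2 hello.openshift.com)".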

Comment 1 Stephen Greene 2020-12-03 21:03:57 UTC

I was able to reproduce this on 4.7. It would probably make the most sense for the admitted route to continue serving traffic until it is updated (since the route would then not be re-admitted). Currently investigating a fix. Moving to assigned and adding upcoming sprint since this bug won't be resolved by tomorrow.

Comment 8 Hongan Li 2020-12-14 09:29:49 UTC

The PR was merged in 4.7.0-0.nightly-2020-12-11-233938. I tested with 4.7.0-0.nightly-2020-12-14-035110, but verification failed.

After switching back to "namespaceOwnership":"Strict", the second route is still shown as Admitted.

# oc -n openshift-ingress-operator get ingresscontroller/default -oyaml
<---snip--->
spec:
  replicas: 2
  routeAdmission:
    namespaceOwnership: Strict

# curl http://test.apps.ci-ln-tfbmifk-d5d6b.origin-ci-int-aws.dev.rhcloud.com/test2 -H 'Host: hello.openshift.com' -I
HTTP/1.0 503 Service Unavailable

# oc get route -oyaml
<---snip--->
  spec:
    host: hello.openshift.com
    path: /test2
    port:
      targetPort: 8080
    to:
      kind: Service
      name: hello-openshift
      weight: 100
    wildcardPolicy: None
  status:
    ingress:
    - conditions:
      - lastTransitionTime: "2020-12-14T09:05:04Z"
        status: "True"
        type: Admitted
      host: hello.openshift.com
      routerCanonicalHostname: apps.ci-ln-tfbmifk-d5d6b.origin-ci-int-aws.dev.rhcloud.com
      routerName: default
      wildcardPolicy: None

# oc get route
NAME              HOST/PORT             PATH     SERVICES          PORT   TERMINATION   WILDCARD
hello-openshift   hello.openshift.com   /test2   hello-openshift   8080                 None

Note: the route should report "HostAlreadyClaimed", as it does when the "namespaceOwnership" setting is never switched.

Comment 13 Hongan Li 2021-03-04 09:14:32 UTC

Tested with 4.8.0-0.nightly-2021-03-04-014703, but verification still failed.

After switching back to "namespaceOwnership":"Strict", the second route is still shown as Admitted.

$ oc version
Client Version: 4.8.0-0.nightly-2021-03-04-014703
Server Version: 4.8.0-0.nightly-2021-03-04-014703
Kubernetes Version: v1.20.0+2ce2be0

# oc -n openshift-ingress-operator get ingresscontroller/default -oyaml
<---snip--->
spec:
  replicas: 2
  routeAdmission:
    namespaceOwnership: Strict

$ curl http://test.apps.ci-ln-p7hsnt2-f76d1.origin-ci-int-gce.dev.openshift.com/test2 -H 'Host:hello.openshift.com' -I
HTTP/1.0 503 Service Unavailable

$ oc get route -n test1
NAME              HOST/PORT             PATH     SERVICES          PORT   TERMINATION   WILDCARD
hello-openshift   hello.openshift.com   /test1   hello-openshift   8080                 None

$ oc get route -n test2
NAME              HOST/PORT             PATH     SERVICES          PORT   TERMINATION   WILDCARD
hello-openshift   hello.openshift.com   /test2   hello-openshift   8080                 None


$ oc -n test2 get route hello-openshift -oyaml
<---snip--->
spec:
  host: hello.openshift.com
  path: /test2
  port:
    targetPort: 8080
  to:
    kind: Service
    name: hello-openshift
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: "2021-03-04T08:53:16Z"
      status: "True"
      type: Admitted
    host: hello.openshift.com
    routerCanonicalHostname: apps.ci-ln-p7hsnt2-f76d1.origin-ci-int-gce.dev.openshift.com
    routerName: default
    wildcardPolicy: None

Comment 14 Stephen Greene 2021-03-04 15:52:33 UTC

Hongan, could you try verifying this BZ again? I think you may need to wait a few seconds before checking the route admission status in namespace test2, since the admission condition may not be updated immediately. This is expected, as the 2 running router replicas will contend over the route's admission status; once the stale router replicas are gone, the admission status should not change again.

The following works for me:


[sgreene@snowplow:~]$ oc version
Client Version: 4.8.0-0.ci-2021-03-02-212242
Server Version: 4.8.0-0.ci-2021-03-04-005023

<initial reproducer setup>
<Set ingresscontroller to strict namespaceownership>
<brief pause>

[sgreene@snowplow:~]$ curl test.apps.ci-ln-h0mvpkt-f76d1.origin-ci-int-gce.dev.openshift.com/test2 -H "host:hello.openshift.com" -I
HTTP/1.0 503 Service Unavailable


[sgreene@snowplow:~]$ oc get route -n route-admission-test2 
NAME              HOST/PORT            PATH     SERVICES          PORT   TERMINATION   WILDCARD
hello-openshift   HostAlreadyClaimed   /test2   hello-openshift   8080                 None


If you still see that the route in namespace test2 is admitted, could you check again after a brief moment? The route could be toggling between admitted and not admitted until the old replica is completely spun down. This is unavoidable.

Moving back to modified since I have been able to verify these changes. Let me know if there are any questions or issues. Thanks!
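
Since the admission status may toggle for a while, a single check can be misleading. One way to wait it out is to poll until the same value has been seen several times in a row. The following is a rough sketch; wait_until_stable is a hypothetical helper, and the thresholds in the usage comment are arbitrary assumptions rather than recommended values.

```shell
# Sketch: run a command repeatedly until its output is identical for
# NEEDED consecutive runs, then print that output.
# Arguments: NEEDED INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
# Returns 1 if the output did not settle within MAX_TRIES runs.
wait_until_stable() {
  local needed="$1" interval="$2" tries="$3"
  shift 3
  local last="" current="" streak=0
  while [ "$tries" -gt 0 ]; do
    current=$("$@")
    if [ "$streak" -gt 0 ] && [ "$current" = "$last" ]; then
      streak=$((streak + 1))
    else
      last=$current
      streak=1
    fi
    if [ "$streak" -ge "$needed" ]; then
      printf '%s\n' "$current"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}

# Possible usage (values are assumptions): require 3 identical readings,
# 5 seconds apart, and give up after 24 attempts.
#   wait_until_stable 3 5 24 oc -n route-admission-test2 get route hello-openshift
```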

Comment 16 Hongan Li 2021-03-16 00:57:38 UTC

I can reproduce it when using 1 replica for the default ingress controller, so moving back to assigned.
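
To try the single-replica case, the replica count can be set with the same kind of merge patch used for "namespaceOwnership" (the "replicas" field appears under "spec" in the IngressController output above). This is a sketch only; replicas_patch and set_replicas are hypothetical names, and the exact command used in the comment above is not stated.

```shell
# Sketch: build a merge patch that sets spec.replicas on an IngressController.
# Rejects anything that is not a non-negative integer.
replicas_patch() {
  case "$1" in
    ''|*[!0-9]*)
      echo "replica count must be a non-negative integer: $1" >&2
      return 1
      ;;
    *)
      printf '{"spec":{"replicas":%s}}\n' "$1"
      ;;
  esac
}

# Hypothetical wrapper: apply the patch to the default IngressController.
set_replicas() {
  replicas_patch "$1" >/dev/null || return 1
  oc -n openshift-ingress-operator patch ingresscontroller/default \
    --type=merge --patch "$(replicas_patch "$1")"
}
```

For example, "set_replicas 1" before repeating the reproducer, and "set_replicas 2" afterwards to restore the replica count shown in the earlier output.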

Comment 17 Miciah Dashiel Butler Masters 2021-03-16 18:02:54 UTC

As there is a simple workaround for the issue, we're lowering the priority of this BZ for now and will focus on it after 4.8 feature work winds down.

Comment 19 mfisher 2022-12-19 15:47:38 UTC

This issue is stale and has been closed because it has been open 90 days or more with no noted activity/comments in the last 60 days.  If this issue is crucial and still needs resolution, please open a new jira issue and the engineering team will triage and prioritize accordingly.
