Created attachment 1676479 [details]
All the steps to reproduce the issue with two scenarios

Description of problem:
After creating a second route with the same hostname and path as an existing route, the new route is rejected with "HostAlreadyClaimed". Even after the older route is deleted, the rejected route still reports "HostAlreadyClaimed" with the stale message.

Version-Release number of selected component (if applicable):
v3.11.188
v3.11.153
v3.11.117

How reproducible:
Every time; exact steps below.

Steps to Reproduce:
1. Create a new project with a sample ruby application.

# oc new-project path-based-routes
# oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git
# oc get svc
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
ruby-ex   ClusterIP   172.30.216.93   <none>        8080/TCP   1m

2. Create one route without any path and one route with a path.

# oc create route edge --service=ruby-ex --hostname=base.example.com
# oc get routes
NAME      HOST/PORT          PATH   SERVICES   PORT       TERMINATION   WILDCARD
ruby-ex   base.example.com          ruby-ex    8080-tcp   edge          None

# oc create route edge path1 --service=ruby-ex --hostname=base.example.com --path=/path
route.route.openshift.io/path1 created
# oc get routes
NAME      HOST/PORT          PATH    SERVICES   PORT       TERMINATION   WILDCARD
path1     base.example.com   /path   ruby-ex    8080-tcp   edge          None
ruby-ex   base.example.com           ruby-ex    8080-tcp   edge          None

3. Now create a third route with the same configuration as route path1.

# oc create route edge path2 --service=ruby-ex --hostname=base.example.com --path=/path
route.route.openshift.io/path2 created
# oc get routes
NAME      HOST/PORT            PATH    SERVICES   PORT       TERMINATION   WILDCARD
path1     base.example.com     /path   ruby-ex    8080-tcp   edge          None
path2     HostAlreadyClaimed   /path   ruby-ex    8080-tcp   edge          None
ruby-ex   base.example.com             ruby-ex    8080-tcp   edge          None

4.
The description of route path2 wrongly reports that route ruby-ex, rather than route path1, already exposes base.example.com and is older.

# oc describe route path2
Name:                   path2
Namespace:              path-based-routes
Created:                About a minute ago
Labels:                 app=ruby-ex
Annotations:            <none>
Requested Host:         base.example.com
                          rejected by router router: HostAlreadyClaimed (about a minute ago)
                            route ruby-ex already exposes base.example.com and is older
Path:                   /path
TLS Termination:        edge
Insecure Policy:        <none>
Endpoint Port:          8080-tcp

Service:        ruby-ex
Weight:         100 (100%)
Endpoints:      10.130.0.7:8080

5. Even after route path1 is deleted, route path2 still reports HostAlreadyClaimed with the same description.

# oc delete route path1
route.route.openshift.io "path1" deleted
# oc get routes
NAME      HOST/PORT            PATH    SERVICES   PORT       TERMINATION   WILDCARD
path2     HostAlreadyClaimed   /path   ruby-ex    8080-tcp   edge          None
ruby-ex   base.example.com             ruby-ex    8080-tcp   edge          None

# oc describe route path2
Name:                   path2
Namespace:              path-based-routes
Created:                3 minutes ago
Labels:                 app=ruby-ex
Annotations:            <none>
Requested Host:         base.example.com
                          rejected by router router: HostAlreadyClaimed (3 minutes ago)
                            route ruby-ex already exposes base.example.com and is older
Path:                   /path
TLS Termination:        edge
Insecure Policy:        <none>
Endpoint Port:          8080-tcp

Service:        ruby-ex
Weight:         100 (100%)
Endpoints:      10.130.0.7:8080

Actual results:
After one of two routes sharing the same hostname and path is deleted, the remaining route still shows "HostAlreadyClaimed".

Expected results:
The remaining route should be re-admitted and no longer show "HostAlreadyClaimed".

Additional info:
The issue is resolved by deleting the router pod so that a new pod comes up. I found bug 1748489, opened for the same issue, but it was closed as not a bug because the wrong router image was being used. I am able to reproduce the issue on a freshly installed OCP v3.11.188.

https://bugzilla.redhat.com/show_bug.cgi?id=1748489
Routes that are rejected won't be automatically re-processed for validity as you're wanting. In 3.11, we provide a script which clears route status[1]; running it causes the routes to be re-processed, which should achieve the desired effect. Although I do believe the router should handle this automatically, we won't be fixing it in 3.x.

[1] https://github.com/openshift/origin/blob/release-3.11/images/router/clear-route-status.sh
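For context on what the script clears: the rejection lives under the route's .status, recorded by the router as a failed Admitted condition. Reconstructed from the describe output in this report, it looks roughly like the stanza below; the field layout is illustrative, not copied from a live object.

```yaml
# Approximate shape of the status stanza that the clear-route-status script
# removes; values reconstructed from the describe output in this report.
status:
  ingress:
  - host: base.example.com
    routerName: router
    conditions:
    - type: Admitted
      status: "False"
      reason: HostAlreadyClaimed
      message: route ruby-ex already exposes base.example.com and is older
```

With this stanza cleared, the router no longer sees the route as rejected and considers it for admission on its next sync.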
Setting the target release so that it gets attention. Once the issue has been identified, we can work out where it gets backported to.
Just a quick update. Here's a self-contained minimal reproducer script, tested on a 4.5 cluster:

===== BEGIN test.sh =====
#!/bin/bash

oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-a
spec:
  host: a.example.com
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF

sleep 2

oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-b
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF

sleep 2

oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-c
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF

sleep 2

oc get --namespace default routes/bz1821095-c -o yaml

oc delete --namespace default routes/bz1821095-b

sleep 5

oc get --namespace default routes/bz1821095-c -o yaml
===== END test.sh =====

Observe that status isn't automatically recomputed.

I tried:

* Using `oc patch` to directly set status to `null` or `{}`. No effect. Seems suspiciously similar to an upstream bug around status subresource patch handling[1].
* Fixing the old v3 script[2], which issues a direct PUT to the apiserver. This works.
* Restarting router pods. This causes the route status to be recomputed.

Two things stand out to me:

1. The patch behavior is concerning. Note that the route resource itself is not CRD-based[3]. It's possible that create works but patch/update is broken somehow.
2. The router controller should be re-admitting automatically in response to the delete event. There could be a bug there, or the controller could be updating and silently failing for reasons related to (1).
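The patch attempts mentioned above can be sketched as follows; the route and namespace names match the reproducer script, and the exact flags are an assumption about how the attempts were issued, not a transcript.

```shell
#!/bin/bash
# Sketch of the status-clearing patch attempts described above. Per this
# comment, neither form has any observable effect on the route's status.
patch_null='{"status": null}'
patch_empty='{"status": {}}'

# Sanity-check that both payloads are well-formed JSON before sending them.
echo "${patch_null}"  | python3 -m json.tool > /dev/null && echo "null patch ok"
echo "${patch_empty}" | python3 -m json.tool > /dev/null && echo "empty patch ok"

# The attempted commands (require a logged-in oc session; no effect observed):
# oc patch route bz1821095-c --namespace default --type merge --patch "${patch_null}"
# oc patch route bz1821095-c --namespace default --type merge --patch "${patch_empty}"
```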
[1] https://github.com/kubernetes/kubernetes/issues/67455
[2] https://raw.githubusercontent.com/openshift/origin/release-3.11/images/router/clear-route-status.sh - the problem with the old script is its invalid/outdated API paths; use /apis/route.openshift.io/v1/namespaces/${namespace}/routes/${route_name} throughout
[3] https://github.com/openshift/openshift-apiserver/blob/master/pkg/api/legacy/route.go
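The corrected path from [2] and the direct-PUT workaround can be sketched as below. This is a minimal illustration, not the attached script: `build_route_url` is a hypothetical helper name, the server URL and route names are example values from the reproducer, and the curl invocation assumes a logged-in `oc` session with permission to update routes.

```shell
#!/bin/bash
# build_route_url assembles the corrected route.openshift.io/v1 API path
# noted in footnote [2]. (Helper name is ours, for illustration only.)
build_route_url() {
  local server="$1" namespace="$2" route_name="$3"
  printf '%s/apis/route.openshift.io/v1/namespaces/%s/routes/%s' \
    "${server}" "${namespace}" "${route_name}"
}

# Example usage against a live cluster (requires a logged-in oc session):
# server="$(oc whoami --show-server)"
# token="$(oc whoami --show-token)"
# url="$(build_route_url "${server}" default bz1821095-c)"
# oc get route bz1821095-c --namespace default -o json > /tmp/route.json
# ...clear .status in /tmp/route.json, then PUT the object back directly:
# curl -k -X PUT -H "Authorization: Bearer ${token}" \
#      -H "Content-Type: application/json" --data @/tmp/route.json "${url}"

build_route_url "https://api.example.com:6443" default bz1821095-c
echo
```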
Created attachment 1679155 [details]
clear-route-status.sh updated for 4.x

Here's a version of clear-route-status.sh updated for 4.x.
I believe I've identified a very long-standing bug in the router that, when fixed, should cause previously rejected routes to be considered for admission in response to delete events. I'm prototyping a fix in https://github.com/openshift/router/pull/122.
Separately, I still have reason to suspect a related but separate problem whereby route status updates (and possibly other updates) aren't causing rejected routes to be reconsidered for promotion. The delete handling fix in https://github.com/openshift/router/pull/122 is about getting the behavior correct at the time deletes occur, which should solve the root problem going forward. The possible separate problem with update handling would still be worth running down, since it could let existing routes be fixed up without restarting the router. Although I was able to repair routes using a direct PUT (https://bugzilla.redhat.com/show_bug.cgi?id=1821095#c10), I'd like to understand why the regular patch API is insufficient.
(In reply to Dan Mace from comment #20)
> Separately, I still have reason to suspect a related but separate problem
> whereby route status (and possibly other) updates aren't causing rejected
> routes to be reconsidered for promotion. The delete handling fix in
> https://github.com/openshift/router/pull/122 is about getting the behavior
> correct at the time deletes occur, which should solve the root problem going
> forward. The possible separate problem with update handling might be useful
> to run down to help get existing routes fixed up without having to restart
> the router. Although I was able to repair routes using direct PUT
> (https://bugzilla.redhat.com/show_bug.cgi?id=1821095#c10), I'd like to
> understand why the regular patch API is insufficient.

And one more note regarding updates. Given that:

1. Restarting the routers during the upgrade to get the delete fix will repair existing route status, and
2. The direct PUT workaround still works post-restart,

the status updating bug is probably less urgent and should be split off into a new bug. Once I do a little more testing to more thoroughly describe the problem, I can do that, and we can hopefully call https://github.com/openshift/router/pull/122 a fix for this bug.
Fix is posted and awaiting review.
Moving out to 4.6.
(In reply to Andrew McDermott from comment #23)
> Moving out to 4.6.

Will it be backported to 3.11?
The fix has merged and will be in 4.5. We'll need to have a discussion about backports.
The PR merge made it into the "4.5.0-0.nightly-2020-05-20-175547" release. With this version, we see that the routes are getting promoted without any additional intervention:

* Creation of conflicting routes:
-----
$ oc create route edge --service=service-unsecure --hostname=unsecure.example.com
route.route.openshift.io/service-unsecure created
$ oc create route edge path1 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path1 created
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
$ oc create route edge path2 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path2 created
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
path2              HostAlreadyClaimed                /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
-----

* After path1 is deleted, the second "path2" route gets promoted and admitted:
-----
$ oc delete route path1
route.route.openshift.io "path1" deleted
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path2              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
-----
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
A backport of the fix was verified in v3.11.492 per bug 1884421, comment 10.