Bug 1821095 - Wrong message error displayed when creating a route with path based [NEEDINFO]
Summary: Wrong message error displayed when creating a route with path based
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-06 02:01 UTC by aygarg
Modified: 2020-08-05 11:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The router's enforcement of the constraint that each Route have a unique host had an error in the logic for handling the deletion of a Route. Consequence: The router was not promoting inactive Routes when a conflicting Route was deleted. Fix: When a Route is deleted, the router now reprocesses all inactive Routes. Result: Deleting a Route now causes inactive routes that no longer have a conflict to be activated.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:25:33 UTC
Target Upstream Version:
aygarg: needinfo? (amcdermo)
aygarg: needinfo? (mmasters)
acomabon: needinfo? (mmasters)
aygarg: needinfo? (mmasters)
aygarg: needinfo? (mmasters)


Attachments
All the steps to reproduce the issue with two scenarios (10.19 KB, text/plain)
2020-04-06 02:01 UTC, aygarg
clear-route-status.sh updated for 4.x (4.58 KB, application/x-shellscript)
2020-04-15 18:44 UTC, Dan Mace


Links
System ID Priority Status Summary Last Updated
Github openshift router pull 126 None closed Bug 1821095: Try to promote inactive routes following route deletion 2020-08-10 04:58:41 UTC
Red Hat Knowledge Base (Solution) 4964001 None None None 2020-04-09 03:25:15 UTC
Red Hat Product Errata RHBA-2020:2409 None None None 2020-07-13 17:25:56 UTC

Description aygarg 2020-04-06 02:01:25 UTC
Created attachment 1676479 [details]
All the steps to reproduce the issue with two scenarios

Description of problem:
After creating a route and then another route with the same hostname and path, the second route shows "HostAlreadyClaimed". Even after the older route is deleted, the second route still shows "HostAlreadyClaimed", and the accompanying message names the wrong route.

Version-Release number of selected component (if applicable):
v3.11.188
v3.11.153
v3.11.117

How reproducible:
Every time, I am providing the exact steps.

Steps to Reproduce:
1. Create a new project with a sample ruby application.

# oc new-project path-based-routes

# oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git

# oc get svc
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
ruby-ex   ClusterIP   172.30.216.93   <none>        8080/TCP   1m

2. Create one route without any path and one route with a path.

# oc create route edge --service=ruby-ex --hostname=base.example.com

# oc get routes
NAME      HOST/PORT          PATH      SERVICES   PORT       TERMINATION   WILDCARD
ruby-ex   base.example.com             ruby-ex    8080-tcp   edge          None

# oc create route edge path1 --service=ruby-ex --hostname=base.example.com --path=/path
route.route.openshift.io/path1 created

# oc get routes
NAME      HOST/PORT          PATH      SERVICES   PORT       TERMINATION   WILDCARD
path1     base.example.com   /path     ruby-ex    8080-tcp   edge          None
ruby-ex   base.example.com             ruby-ex    8080-tcp   edge          None

3. Now, create a route with the same configuration as route path1.

# oc create route edge path2 --service=ruby-ex --hostname=base.example.com --path=/path
route.route.openshift.io/path2 created

# oc get routes
NAME      HOST/PORT            PATH      SERVICES   PORT       TERMINATION   WILDCARD
path1     base.example.com     /path     ruby-ex    8080-tcp   edge          None
path2     HostAlreadyClaimed   /path     ruby-ex    8080-tcp   edge          None
ruby-ex   base.example.com               ruby-ex    8080-tcp   edge          None

4. The description of route path2 reports that route ruby-ex already exposes base.example.com and is older, when it should name route path1 instead.

# oc describe route path2
Name:           path2
Namespace:      path-based-routes
Created:        About a minute ago
Labels:         app=ruby-ex
Annotations:        <none>
Requested Host:     base.example.com
              rejected by router router: HostAlreadyClaimed (about a minute ago)
                route ruby-ex already exposes base.example.com and is older
Path:           /path
TLS Termination:    edge
Insecure Policy:    <none>
Endpoint Port:      8080-tcp

Service:    ruby-ex
Weight:     100 (100%)
Endpoints:  10.130.0.7:8080

5. Even after route path1 is deleted, route path2 still reports HostAlreadyClaimed with the same description.

# oc delete route path1
route.route.openshift.io "path1" deleted

# oc get routes
NAME      HOST/PORT            PATH      SERVICES   PORT       TERMINATION   WILDCARD
path2     HostAlreadyClaimed   /path     ruby-ex    8080-tcp   edge          None
ruby-ex   base.example.com               ruby-ex    8080-tcp   edge          None

# oc describe route path2
Name:           path2
Namespace:      path-based-routes
Created:        3 minutes ago
Labels:         app=ruby-ex
Annotations:        <none>
Requested Host:     base.example.com
              rejected by router router: HostAlreadyClaimed (3 minutes ago)
                route ruby-ex already exposes base.example.com and is older
Path:           /path
TLS Termination:    edge
Insecure Policy:    <none>
Endpoint Port:      8080-tcp

Service:    ruby-ex
Weight:     100 (100%)
Endpoints:  10.130.0.7:8080

Actual results:
After deleting one of the two routes that share the same hostname and path, the remaining route still shows "HostAlreadyClaimed".

Expected results:
The route shouldn't show "HostAlreadyClaimed".

Additional info:
Deleting the router pod, so that a new pod comes up, fixes the issue.

I found the bug below, which was opened for the same issue. It was closed as not a bug because the wrong router image was being used, but I am able to reproduce the issue on a freshly installed OCP v3.11.188.

https://bugzilla.redhat.com/show_bug.cgi?id=1748489

Comment 1 Dan Mace 2020-04-06 12:12:31 UTC
Routes that are rejected won't be automatically re-processed for validity as you're wanting. In 3.11, we provide a script which clears route status[1] which can be used to cause the routes to be re-processed and which should achieve the desired effect.

Although I do believe the router should handle this automatically, we won't be fixing it in 3.x.

[1] https://github.com/openshift/origin/blob/release-3.11/images/router/clear-route-status.sh

Comment 3 Ben Bennett 2020-04-14 14:50:45 UTC
Setting the target release so that it gets attention.  Once the issue has been identified, we can work out where it gets backported to.

Comment 10 Dan Mace 2020-04-15 18:43:04 UTC
Just a quick update. Here's a self contained minimal reproducer script tested on a 4.5 cluster:

===== BEGIN test.sh =====

#!/bin/bash

oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-a
spec:
  host: a.example.com
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF

sleep 2

oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-b
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF

sleep 2

oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-c
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF

sleep 2

oc get --namespace default routes/bz1821095-c -o yaml

oc delete --namespace default routes/bz1821095-b

sleep 5

oc get --namespace default routes/bz1821095-c -o yaml

===== END test.sh =====

Observe that status isn't automatically recomputed.
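
The fixed `sleep 5` in the script can be swapped for a poll of the route's Admitted condition, which makes it easier to see whether the status was recomputed. This is a minimal sketch, not part of the original reproducer: the helper names `route_admitted` and `wait_for_admitted` are hypothetical, and it assumes the admission result is exposed under `.status.ingress[].conditions` with type `Admitted`.

```shell
#!/bin/bash
# Hypothetical helpers; assumes a logged-in `oc` client.

# route_admitted NAMESPACE ROUTE
# Prints the status ("True"/"False") of each Admitted condition,
# one value per router that has reported on the route.
route_admitted() {
  oc get --namespace "$1" "routes/$2" \
    -o jsonpath='{.status.ingress[*].conditions[?(@.type=="Admitted")].status}'
}

# wait_for_admitted NAMESPACE ROUTE [TIMEOUT_SECONDS]
# Returns 0 as soon as at least one router reports Admitted=True,
# or 1 if that does not happen within the timeout (default 30s).
wait_for_admitted() {
  local deadline=$(( SECONDS + ${3:-30} ))
  while (( SECONDS < deadline )); do
    if [[ "$(route_admitted "$1" "$2")" == *True* ]]; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (after deleting bz1821095-b):
#   wait_for_admitted default bz1821095-c 10 || echo "still rejected"
```

On a router affected by this bug, the example call would be expected to time out, matching the observation above.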

I tried:

  * Using `oc patch` to directly set status to `null` or `{}`. No effect. Seems suspiciously similar to an upstream bug around status subresource patch handling[1].
  * Fixing the old v3 script[2] which issues a direct PUT to the apiserver. This works.
  * Restarting router pods. This causes the route status to be recomputed.

Two things stand out to me:

1. The patch behavior is concerning. Note the route resource itself is not CRD-based[3]. It's possible that create works but patch/update is somehow broken.
2. The router controller should be re-admitting automatically in response to the delete event. There could be a bug there, or the controller could be updating and silently failing for reasons related to (1).

[1] https://github.com/kubernetes/kubernetes/issues/67455
[2] https://raw.githubusercontent.com/openshift/origin/release-3.11/images/router/clear-route-status.sh - the problem with the old script is its invalid/outdated API paths; use /apis/route.openshift.io/v1/namespaces/${namespace}/routes/${route_name} throughout
[3] https://github.com/openshift/openshift-apiserver/blob/master/pkg/api/legacy/route.go
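
The direct-PUT workaround with the corrected API path might look roughly like the sketch below. It is not the attached script: the function names are hypothetical, and it assumes `oc proxy` is serving the API on localhost:8001, that `jq` is installed, and that clearing `.status.ingress` through the route's `/status` subresource is what prompts the router to re-evaluate the route.

```shell
#!/bin/bash
# Assumption: `oc proxy --port=8001` is running in another terminal.
API="${API:-http://localhost:8001}"

# route_api_path NAMESPACE ROUTE
# Builds the 4.x path quoted in [2] for a single route.
route_api_path() {
  printf '/apis/route.openshift.io/v1/namespaces/%s/routes/%s' "$1" "$2"
}

# clear_route_status NAMESPACE ROUTE  (hypothetical helper)
# Fetches the route, drops .status.ingress, and PUTs the result back
# to the /status subresource (assumed endpoint, see lead-in).
clear_route_status() {
  local path body
  path="$(route_api_path "$1" "$2")"
  body="$(curl -s "${API}${path}" | jq -c 'del(.status.ingress)')" || return 1
  curl -s -X PUT -H 'Content-Type: application/json' \
    --data-binary "$body" "${API}${path}/status" >/dev/null
}

# Example:
#   clear_route_status default bz1821095-c
```

Keeping the path construction in one function avoids the kind of scattered, outdated paths that broke the old script.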

Comment 11 Dan Mace 2020-04-15 18:44:21 UTC
Created attachment 1679155 [details]
clear-route-status.sh updated for 4.x

Here's a version of clear-route-status.sh updated for 4.x

Comment 19 Dan Mace 2020-04-23 18:26:29 UTC
I believe I've identified a very, very long-standing bug in the router that, when fixed, should cause previously rejected routes to be considered for admission in response to delete events. I'm prototyping a fix in https://github.com/openshift/router/pull/122.
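
The intended behavior can be illustrated with a toy model in bash (4 or later, for associative arrays). This is only a sketch of the idea of reprocessing inactive routes after a delete, not the router's actual Go code: the functions `admit` and `delete_route` are hypothetical, and a claim is assumed to be keyed by host plus path, which is a simplification of the real admission rules.

```shell
#!/bin/bash
# Toy model only; claims are keyed by "host path" (an assumption).
declare -A active=()   # "host path" -> name of the route holding the claim
inactive=()            # rejected routes, oldest first, as "name|host|path"

# admit NAME HOST PATH
# Takes the claim if it is free, otherwise records the route as inactive.
admit() {
  local name=$1 key="$2 $3"
  if [[ -z ${active[$key]:-} ]]; then
    active[$key]=$name
    echo "$name: admitted"
  else
    inactive+=("$name|$2|$3")
    echo "$name: HostAlreadyClaimed (held by ${active[$key]})"
  fi
}

# delete_route NAME
# Releases any claim held by NAME, then reprocesses every inactive
# route in age order so that one without a conflict gets promoted.
delete_route() {
  local name=$1 key entry n h p
  for key in "${!active[@]}"; do
    if [[ ${active[$key]} == "$name" ]]; then unset "active[$key]"; fi
  done
  local pending=("${inactive[@]}")
  inactive=()
  for entry in "${pending[@]}"; do
    IFS='|' read -r n h p <<<"$entry"
    if [[ $n != "$name" ]]; then admit "$n" "$h" "$p"; fi
  done
}

# Same sequence as test.sh in comment 10.
admit bz1821095-a a.example.com ""
admit bz1821095-b a.example.com /foo
admit bz1821095-c a.example.com /foo   # rejected: b already holds the claim
delete_route bz1821095-b               # c is reprocessed and promoted
```

Without the reprocessing loop in `delete_route`, the last route would stay in the inactive list indefinitely, which is the symptom reported in this bug.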

Comment 20 Dan Mace 2020-04-23 18:31:31 UTC
Separately, I still have reason to suspect a related but separate problem whereby route status (and possibly other) updates aren't causing rejected routes to be reconsidered for promotion. The delete handling fix in https://github.com/openshift/router/pull/122 is about getting the behavior correct at the time deletes occur, which should solve the root problem going forward. The possible separate problem with update handling might be useful to run down to help get existing routes fixed up without having to restart the router. Although I was able to repair routes using direct PUT (https://bugzilla.redhat.com/show_bug.cgi?id=1821095#c10), I'd like to understand why the regular patch API is insufficient.

Comment 21 Dan Mace 2020-04-23 18:38:45 UTC
(In reply to Dan Mace from comment #20)
> Separately, I still have reason to suspect a related but separate problem
> whereby route status (and possibly other) updates aren't causing rejected
> routes to be reconsidered for promotion. The delete handling fix in
> https://github.com/openshift/router/pull/122 is about getting the behavior
> correct at the time deletes occur, which should solve the root problem going
> forward. The possible separate problem with update handling might be useful
> to run down to help get existing routes fixed up without having to restart
> the router. Although I was able to repair routes using direct PUT
> (https://bugzilla.redhat.com/show_bug.cgi?id=1821095#c10), I'd like to
> understand why the regular patch API is insufficient.

And one more note regarding updates. Given:

1. Restarting the routers during the upgrade to get the delete fix will repair existing route status
2. The direct PUT workaround still works post-restart

The status updating bug is probably less urgent and should be split off into a new bug. Once I do a little more testing to more thoroughly describe the problem I can do that and we can hopefully call https://github.com/openshift/router/pull/122 a fix for this bug.

Comment 22 Miciah Dashiel Butler Masters 2020-05-08 19:25:31 UTC
Fix is posted and awaiting review.

Comment 23 Andrew McDermott 2020-05-19 14:53:52 UTC
Moving out to 4.6.

Comment 24 aygarg 2020-05-20 02:28:32 UTC
(In reply to Andrew McDermott from comment #23)
> Moving out to 4.6.

Will it be getting backported to 3.11?

Comment 25 Miciah Dashiel Butler Masters 2020-05-20 15:13:55 UTC
The fix has merged and will be in 4.5.

We'll need to have a discussion about backports.

Comment 26 Arvind iyengar 2020-05-26 08:00:26 UTC
The PR merge made it into the "4.5.0-0.nightly-2020-05-20-175547" release. With this version, routes are promoted without any additional intervention:

* Creation of conflicting routes: 
-----
$ oc create route edge --service=service-unsecure --hostname=unsecure.example.com
route.route.openshift.io/service-unsecure created

$ oc create route edge path1 --service=service-unsecure --hostname=unsecure.example.com --path=/path 
route.route.openshift.io/path1 created

$ oc get route 
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
    
$ oc create route edge path2 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path2 created

$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
path2              HostAlreadyClaimed                /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
-----

* The second "path2" route gets promoted and admitted:
------
$ oc delete  route path1 
route.route.openshift.io "path1" deleted

$ oc get route 
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path2              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
------

Comment 30 errata-xmlrpc 2020-07-13 17:25:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

