Bug 1821095
Summary: Wrong error message displayed when creating a path-based route
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Reporter: aygarg
Assignee: Miciah Dashiel Butler Masters <mmasters>
QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: acomabon, aiyengar, amcdermo, aos-bugs, bbennett, dmace, mfisher, mmasters, skuznets, sychen
Version: 3.11.0
Keywords: Reopened
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The router's enforcement of the constraint that each Route have a unique host had an error in the logic for handling the deletion of a Route.
Consequence: The router was not promoting inactive Routes when a conflicting Route was deleted.
Fix: When a Route is deleted, the router now reprocesses all inactive Routes.
Result: Deleting a Route now causes inactive Routes that no longer have a conflict to be activated.
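The Cause/Fix described above can be illustrated with a small model of the router's unique-host admission logic. This is a hypothetical sketch, not the actual openshift/router code: the `HostClaims` class and its method names are invented for illustration. It shows the fixed behavior, where deleting the active claimant reprocesses inactive routes so a formerly conflicting route is promoted.

```python
class HostClaims:
    """Toy model of unique-host route admission (illustrative only)."""

    def __init__(self):
        self.active = {}    # (host, path) -> name of the admitted route
        self.inactive = []  # routes rejected with a host/path conflict

    def admit(self, name, host, path=""):
        key = (host, path)
        if self.active.get(key, name) != name:
            # Another route already claims this host+path: reject
            # (analogous to status reason HostAlreadyClaimed).
            self.inactive.append((name, host, path))
            return False
        self.active[key] = name
        return True

    def delete(self, name, host, path=""):
        key = (host, path)
        if self.active.get(key) == name:
            del self.active[key]
        # The fix: on delete, reprocess every inactive route instead
        # of leaving previously rejected routes rejected forever.
        pending, self.inactive = self.inactive, []
        for n, h, p in pending:
            self.admit(n, h, p)


claims = HostClaims()
print(claims.admit("bz1821095-b", "a.example.com", "/foo"))  # True
print(claims.admit("bz1821095-c", "a.example.com", "/foo"))  # False
claims.delete("bz1821095-b", "a.example.com", "/foo")
print(claims.active[("a.example.com", "/foo")])  # bz1821095-c promoted
```

Before the fix, the model's `delete` would only release the claim without the reprocessing loop, leaving `bz1821095-c` stuck in `inactive` until a router restart.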
Cloned to: 1884421 (view as bug list)
Bug Blocks: 1884421
Last Closed: 2020-07-13 17:25:33 UTC
Type: Bug
Attachments: clear-route-status.sh updated for 4.x (attachment 1679155)
Description
aygarg
2020-04-06 02:01:25 UTC
Routes that are rejected won't be automatically re-processed for validity as you want. In 3.11, we provide a script which clears route status[1]; it can be used to cause the routes to be re-processed and should achieve the desired effect. Although I do believe the router should handle this automatically, we won't be fixing it in 3.x.

[1] https://github.com/openshift/origin/blob/release-3.11/images/router/clear-route-status.sh

Setting the target release so that it gets attention. Once the issue has been identified, we can work out where it gets backported to.

Just a quick update. Here's a self-contained minimal reproducer script, tested on a 4.5 cluster:

===== BEGIN test.sh =====
#!/bin/bash
oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-a
spec:
  host: a.example.com
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF
sleep 2
oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-b
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF
sleep 2
oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-c
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF
sleep 2
oc get --namespace default routes/bz1821095-c -o yaml
oc delete --namespace default routes/bz1821095-b
sleep 5
oc get --namespace default routes/bz1821095-c -o yaml
===== END test.sh =====

Observe that status isn't automatically recomputed. I tried:

* Using `oc patch` to directly set status to `null` or `{}`. No effect. Seems suspiciously similar to an upstream bug around status subresource patch handling[1].
* Fixing the old v3 script[2], which issues a direct PUT to the apiserver. This works.
* Restarting router pods. This causes the route status to be recomputed.

Two things stand out to me:

1. The patch behavior is concerning. Note that the route resource itself is not CRD-based[3]. It's possible that create works but patch/update is broken somehow.
2. The router controller should be re-admitting automatically in response to the delete event. There could be a bug there, or the controller could be updating and silently failing for reasons related to (1).

[1] https://github.com/kubernetes/kubernetes/issues/67455
[2] https://raw.githubusercontent.com/openshift/origin/release-3.11/images/router/clear-route-status.sh (the problem with the old script is invalid/outdated API paths; use /apis/route.openshift.io/v1/namespaces/${namespace}/routes/${route_name} throughout)
[3] https://github.com/openshift/openshift-apiserver/blob/master/pkg/api/legacy/route.go

Created attachment 1679155 [details]
clear-route-status.sh updated for 4.x
Here's a version of clear-route-status.sh updated for 4.x
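For reference, the essential pieces of a 4.x status-clearing approach are the corrected resource path (given in the footnotes of the earlier comment) and a PUT body with status emptied. The sketch below is hypothetical and is not the attached script: `route_status_path` and `cleared_status_body` are invented names, and actually issuing the authenticated PUT against the apiserver (normally handled by `oc`) is not shown.

```python
import json


def route_status_path(namespace, route_name):
    # Corrected 4.x API path; the v3 script used outdated paths.
    return (f"/apis/route.openshift.io/v1/"
            f"namespaces/{namespace}/routes/{route_name}")


def cleared_status_body(route_obj):
    # PUT body: the route object with its status emptied, so the
    # route is re-processed, per the 3.11 script's approach.
    body = dict(route_obj)
    body["status"] = {}
    return json.dumps(body)


print(route_status_path("default", "bz1821095-c"))
# /apis/route.openshift.io/v1/namespaces/default/routes/bz1821095-c
```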
I believe I've identified a very long-standing bug in the router that, when fixed, should cause previously rejected routes to be considered for admission in response to delete events. I'm prototyping a fix in https://github.com/openshift/router/pull/122.

Separately, I still have reason to suspect a related but separate problem whereby route status (and possibly other) updates aren't causing rejected routes to be reconsidered for promotion. The delete handling fix in https://github.com/openshift/router/pull/122 is about getting the behavior correct at the time deletes occur, which should solve the root problem going forward. The possible separate problem with update handling might be worth running down to help get existing routes fixed up without having to restart the router. Although I was able to repair routes using a direct PUT (https://bugzilla.redhat.com/show_bug.cgi?id=1821095#c10), I'd like to understand why the regular patch API is insufficient.

(In reply to Dan Mace from comment #20)
> Separately, I still have reason to suspect a related but separate problem
> whereby route status (and possibly other) updates aren't causing rejected
> routes to be reconsidered for promotion. [...]

And one more note regarding updates. Given that:

1. restarting the routers during the upgrade to get the delete fix will repair existing route status, and
2. the direct PUT workaround still works post-restart,

the status-updating bug is probably less urgent and should be split off into a new bug. Once I do a little more testing to describe the problem more thoroughly, I can do that, and we can hopefully call https://github.com/openshift/router/pull/122 a fix for this bug.

Fix is posted and awaiting review.

Moving out to 4.6.

(In reply to Andrew McDermott from comment #23)
> Moving out to 4.6.

Will it be getting backported to 3.11?

The fix has merged and will be in 4.5. We'll need to have a discussion about backports.

The PR merge made it into the "4.5.0-0.nightly-2020-05-20-175547" release. With this version, we see that routes get promoted without any additional intervention:

* Creation of conflicting routes:
-----
$ oc create route edge --service=service-unsecure --hostname=unsecure.example.com
route.route.openshift.io/service-unsecure created
$ oc create route edge path1 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path1 created
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
$ oc create route edge path2 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path2 created
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
path2              HostAlreadyClaimed                /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
-----

* After "path1" is deleted, the second "path2" route gets promoted and admitted:
------
$ oc delete route path1
route.route.openshift.io "path1" deleted
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path2              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

A backport of the fix was verified in v3.11.492 per bug 1884421, comment 10.
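As an aside on verification: rather than scanning `oc get route` output, admission can be checked from a route's `status.ingress[].conditions`, where the router records an `Admitted` condition whose reason (e.g. HostAlreadyClaimed) explains a rejection. A minimal sketch; the helper name and the sample object are illustrative, shaped like `oc get route -o json` output:

```python
def admission_status(route):
    """Yield (router_name, admitted, reason) for each ingress entry."""
    for ingress in route.get("status", {}).get("ingress", []):
        for cond in ingress.get("conditions", []):
            if cond.get("type") == "Admitted":
                yield (ingress.get("routerName"),
                       cond.get("status") == "True",
                       cond.get("reason"))


# Illustrative object shaped like the rejected "path2" route above.
rejected = {
    "status": {"ingress": [{
        "routerName": "default",
        "conditions": [{"type": "Admitted", "status": "False",
                        "reason": "HostAlreadyClaimed"}],
    }]}
}
print(list(admission_status(rejected)))
# [('default', False, 'HostAlreadyClaimed')]
```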