Bug 1821095
Summary: Wrong error message displayed when creating a path-based route
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Reporter: aygarg
Assignee: Miciah Dashiel Butler Masters <mmasters>
QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: acomabon, aiyengar, amcdermo, aos-bugs, bbennett, dmace, mfisher, mmasters, skuznets, sychen
Version: 3.11.0
Keywords: Reopened
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The router's enforcement of the constraint that each Route have a unique host had an error in the logic for handling the deletion of a Route.
Consequence: The router was not promoting inactive Routes when a conflicting Route was deleted.
Fix: When a Route is deleted, the router now reprocesses all inactive Routes.
Result: Deleting a Route now causes inactive Routes that no longer have a conflict to be activated.
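The Cause/Fix described above can be illustrated with a small model of the router's unique-host admission logic. This is a hypothetical sketch, not the actual openshift/router code: the `HostClaims` class and its method names are invented for illustration. It shows the fixed behavior, where deleting the active claimant reprocesses inactive routes so a formerly conflicting route is promoted.

```python
class HostClaims:
    """Toy model of unique-host route admission (illustrative only)."""

    def __init__(self):
        self.active = {}    # (host, path) -> name of the admitted route
        self.inactive = []  # routes rejected with a host/path conflict

    def admit(self, name, host, path=""):
        key = (host, path)
        if self.active.get(key, name) != name:
            # Another route already claims this host+path: reject
            # (analogous to status reason HostAlreadyClaimed).
            self.inactive.append((name, host, path))
            return False
        self.active[key] = name
        return True

    def delete(self, name, host, path=""):
        key = (host, path)
        if self.active.get(key) == name:
            del self.active[key]
        # The fix: on delete, reprocess every inactive route instead
        # of leaving previously rejected routes rejected forever.
        pending, self.inactive = self.inactive, []
        for n, h, p in pending:
            self.admit(n, h, p)


claims = HostClaims()
print(claims.admit("bz1821095-b", "a.example.com", "/foo"))  # True
print(claims.admit("bz1821095-c", "a.example.com", "/foo"))  # False
claims.delete("bz1821095-b", "a.example.com", "/foo")
print(claims.active[("a.example.com", "/foo")])  # bz1821095-c promoted
```

Before the fix, the model's `delete` would only release the claim without the reprocessing loop, leaving `bz1821095-c` stuck in `inactive` until a router restart.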
Cloned to: 1884421 (view as bug list)
Bug Blocks: 1884421
Last Closed: 2020-07-13 17:25:33 UTC
Type: Bug
Attachments: clear-route-status.sh updated for 4.x (attachment 1679155)
Description
aygarg
2020-04-06 02:01:25 UTC
Routes that are rejected won't be automatically re-processed for validity as you want. In 3.11, we provide a script which clears route status[1]; it can be used to cause the routes to be re-processed and should achieve the desired effect. Although I do believe the router should handle this automatically, we won't be fixing it in 3.x.

[1] https://github.com/openshift/origin/blob/release-3.11/images/router/clear-route-status.sh

Setting the target release so that it gets attention. Once the issue has been identified, we can work out where it gets backported to.

Just a quick update. Here's a self-contained minimal reproducer script, tested on a 4.5 cluster:

===== BEGIN test.sh =====
#!/bin/bash
oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-a
spec:
  host: a.example.com
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF
sleep 2
oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-b
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF
sleep 2
oc apply --filename - <<EOF
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  namespace: default
  name: bz1821095-c
spec:
  host: a.example.com
  path: /foo
  to:
    kind: Service
    name: kubernetes
  port:
    targetPort: https
EOF
sleep 2
oc get --namespace default routes/bz1821095-c -o yaml
oc delete --namespace default routes/bz1821095-b
sleep 5
oc get --namespace default routes/bz1821095-c -o yaml
===== END test.sh =====

Observe that status isn't automatically recomputed. I tried:

* Using `oc patch` to directly set status to `null` or `{}`. No effect. Seems suspiciously similar to an upstream bug around status subresource patch handling[1].
* Fixing the old v3 script[2], which issues a direct PUT to the apiserver. This works.
* Restarting router pods. This causes the route status to be recomputed.

Two things stand out to me:

1. The patch behavior is concerning. Note that the route resource itself is not CRD-based[3]. It's possible that create works but patch/update is broken somehow.
2. The router controller should be re-admitting automatically in response to the delete event. There could be a bug there, or the controller could be updating and silently failing for reasons related to (1).

[1] https://github.com/kubernetes/kubernetes/issues/67455
[2] https://raw.githubusercontent.com/openshift/origin/release-3.11/images/router/clear-route-status.sh (the problem with the old script is invalid/outdated API paths; use /apis/route.openshift.io/v1/namespaces/${namespace}/routes/${route_name} throughout)
[3] https://github.com/openshift/openshift-apiserver/blob/master/pkg/api/legacy/route.go

Created attachment 1679155 [details]
clear-route-status.sh updated for 4.x
Here's a version of clear-route-status.sh updated for 4.x
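For reference, the essential pieces of a 4.x status-clearing approach are the corrected resource path (given in the footnotes of the earlier comment) and a PUT body with status emptied. The sketch below is hypothetical and is not the attached script: `route_status_path` and `cleared_status_body` are invented names, and actually issuing the authenticated PUT against the apiserver (normally handled by `oc`) is not shown.

```python
import json


def route_status_path(namespace, route_name):
    # Corrected 4.x API path; the v3 script used outdated paths.
    return (f"/apis/route.openshift.io/v1/"
            f"namespaces/{namespace}/routes/{route_name}")


def cleared_status_body(route_obj):
    # PUT body: the route object with its status emptied, so the
    # route is re-processed, per the 3.11 script's approach.
    body = dict(route_obj)
    body["status"] = {}
    return json.dumps(body)


print(route_status_path("default", "bz1821095-c"))
# /apis/route.openshift.io/v1/namespaces/default/routes/bz1821095-c
```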
I believe I've identified a very long-standing bug in the router that, when fixed, should cause previously rejected routes to be considered for admission in response to delete events. I'm prototyping a fix in https://github.com/openshift/router/pull/122.

Separately, I still have reason to suspect a related but separate problem whereby route status (and possibly other) updates aren't causing rejected routes to be reconsidered for promotion. The delete handling fix in https://github.com/openshift/router/pull/122 is about getting the behavior correct at the time deletes occur, which should solve the root problem going forward. The possible separate problem with update handling might be worth running down to help get existing routes fixed up without having to restart the router. Although I was able to repair routes using a direct PUT (https://bugzilla.redhat.com/show_bug.cgi?id=1821095#c10), I'd like to understand why the regular patch API is insufficient.

(In reply to Dan Mace from comment #20)
> Separately, I still have reason to suspect a related but separate problem
> whereby route status (and possibly other) updates aren't causing rejected
> routes to be reconsidered for promotion. [...]

And one more note regarding updates. Given that:

1. restarting the routers during the upgrade to get the delete fix will repair existing route status, and
2. the direct PUT workaround still works post-restart,

the status-updating bug is probably less urgent and should be split off into a new bug. Once I do a little more testing to describe the problem more thoroughly, I can do that, and we can hopefully call https://github.com/openshift/router/pull/122 a fix for this bug.

Fix is posted and awaiting review.

Moving out to 4.6.

(In reply to Andrew McDermott from comment #23)
> Moving out to 4.6.

Will it be getting backported to 3.11?

The fix has merged and will be in 4.5. We'll need to have a discussion about backports.

The PR merge made it into the "4.5.0-0.nightly-2020-05-20-175547" release. With this version, we see that routes get promoted without any additional intervention:

* Creation of conflicting routes:
-----
$ oc create route edge --service=service-unsecure --hostname=unsecure.example.com
route.route.openshift.io/service-unsecure created
$ oc create route edge path1 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path1 created
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
$ oc create route edge path2 --service=service-unsecure --hostname=unsecure.example.com --path=/path
route.route.openshift.io/path2 created
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path1              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
path2              HostAlreadyClaimed                /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
-----

* After "path1" is deleted, the second "path2" route gets promoted and admitted:
------
$ oc delete route path1
route.route.openshift.io "path1" deleted
$ oc get route
NAME               HOST/PORT                         PATH    SERVICES           PORT   TERMINATION   WILDCARD
path2              unsecure.example.com ... 1 more   /path   service-unsecure   http   edge          None
service-unsecure   unsecure.example.com ... 1 more           service-unsecure   http   edge          None
------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

A backport of the fix was verified in v3.11.492 per bug 1884421, comment 10.
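As an aside on verification: rather than scanning `oc get route` output, admission can be checked from a route's `status.ingress[].conditions`, where the router records an `Admitted` condition whose reason (e.g. HostAlreadyClaimed) explains a rejection. A minimal sketch; the helper name and the sample object are illustrative, shaped like `oc get route -o json` output:

```python
def admission_status(route):
    """Yield (router_name, admitted, reason) for each ingress entry."""
    for ingress in route.get("status", {}).get("ingress", []):
        for cond in ingress.get("conditions", []):
            if cond.get("type") == "Admitted":
                yield (ingress.get("routerName"),
                       cond.get("status") == "True",
                       cond.get("reason"))


# Illustrative object shaped like the rejected "path2" route above.
rejected = {
    "status": {"ingress": [{
        "routerName": "default",
        "conditions": [{"type": "Admitted", "status": "False",
                        "reason": "HostAlreadyClaimed"}],
    }]}
}
print(list(admission_status(rejected)))
# [('default', False, 'HostAlreadyClaimed')]
```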