A customer created a second IngressController for a different domain but forgot to apply a route or namespace selector, so the routes in all projects were admitted by the second router. They have since deleted this controller, but the routes were never updated and the customer is not able to edit the status field themselves. This can easily be reproduced in a lab:

~~~
cat <<'EOF' | oc apply -f -
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: sharded
  namespace: openshift-ingress-operator
spec:
  domain: shard.ipi-cluster.example.com
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
EOF
oc scale ingresscontroller -n openshift-ingress-operator --replicas 1 sharded
oc scale ingresscontroller -n openshift-ingress-operator --replicas 1 default
~~~

Wait until both ingress routers come up, then delete the new sharded router:

~~~
oc delete -n openshift-ingress-operator ingresscontroller sharded
~~~

All routes still show two entries in their status field (and only there), visible as "... 1 more" in the HOST/PORT column:

~~~
[root@openshift-jumpserver-0 ~]# oc get routes -A
NAMESPACE                  NAME                HOST/PORT                                                                         PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.ipi-cluster.example.com ... 1 more                                  oauth-openshift     6443    passthrough/Redirect   None
openshift-console          console             console-openshift-console.apps.ipi-cluster.example.com ... 1 more                        console             https   reencrypt/Redirect     None
openshift-console          downloads           downloads-openshift-console.apps.ipi-cluster.example.com ... 1 more                      downloads           http    edge/Redirect          None
openshift-ingress-canary   canary              canary-openshift-ingress-canary.apps.ipi-cluster.example.com ... 1 more                  ingress-canary      8080    edge/Redirect          None
openshift-monitoring       alertmanager-main   alertmanager-main-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more           alertmanager-main   web     reencrypt/Redirect     None
openshift-monitoring       grafana             grafana-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more                     grafana             https   reencrypt/Redirect     None
openshift-monitoring       prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more              prometheus-k8s      web     reencrypt/Redirect     None
openshift-monitoring       thanos-querier      thanos-querier-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more              thanos-querier      web     reencrypt/Redirect     None
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2021-03-17T12:48:44Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.ipi-cluster.example.com",
      "routerCanonicalHostname": "apps.ipi-cluster.example.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    },
    {
      "conditions": [
        {
          "lastTransitionTime": "2021-03-30T18:25:55Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.ipi-cluster.example.com",
      "routerCanonicalHostname": "shard.ipi-cluster.example.com",
      "routerName": "sharded",
      "wildcardPolicy": "None"
    }
  ]
}
~~~

We have an old bug for this that was reported back in the OCP 3.x days: https://bugzilla.redhat.com/show_bug.cgi?id=1356819

The solution at the time was to manually clean up the routes (https://docs.openshift.com/container-platform/3.11/architecture/networking/routes.html#route-status-field) with:
https://github.com/openshift/origin/blob/release-3.11/images/router/clear-route-status.sh

That script, however, does not work in OCP 4.x:

~~~
[root@openshift-jumpserver-0 ~]# oc get routes -A
NAMESPACE                  NAME                HOST/PORT                                                                         PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.ipi-cluster.example.com ... 1 more                                  oauth-openshift     6443    passthrough/Redirect   None
openshift-console          console             console-openshift-console.apps.ipi-cluster.example.com ... 1 more                        console             https   reencrypt/Redirect     None
openshift-console          downloads           downloads-openshift-console.apps.ipi-cluster.example.com ... 1 more                      downloads           http    edge/Redirect          None
openshift-ingress-canary   canary              canary-openshift-ingress-canary.apps.ipi-cluster.example.com ... 1 more                  ingress-canary      8080    edge/Redirect          None
openshift-monitoring       alertmanager-main   alertmanager-main-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more           alertmanager-main   web     reencrypt/Redirect     None
openshift-monitoring       grafana             grafana-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more                     grafana             https   reencrypt/Redirect     None
openshift-monitoring       prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more              prometheus-k8s      web     reencrypt/Redirect     None
openshift-monitoring       thanos-querier      thanos-querier-openshift-monitoring.apps.ipi-cluster.example.com ... 1 more              thanos-querier      web     reencrypt/Redirect     None
[root@openshift-jumpserver-0 ~]#
[root@openshift-jumpserver-0 ~]# bash ./clear-route-status.sh openshift-authentication ALL
Error from server (NotFound): the server could not find the requested resource
~~~
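For context, the 3.11 script fails because the legacy /oapi endpoint it writes to was removed in OCP 4; routes are now served through the route.openshift.io API group. The following is only an illustrative check against a 4.x cluster, not part of the original report:

~~~
# The legacy API root used by the 3.11 script no longer exists on OCP 4,
# so any request against /oapi/v1/... returns NotFound:
oc get --raw /oapi/v1 || echo "legacy /oapi endpoint is gone"

# The grouped API that a fixed script has to target is available instead:
oc get --raw /apis/route.openshift.io/v1 | jq '.resources[].name'
~~~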
I adjusted the cleanup script for OCP 4 [1]; only a minor change was needed [0].

Save the script from [1] into a file named clear-route-status-ocp4.sh, then run:

~~~
namespaces=$(oc get routes -A | tail -n+2 | awk '{print $1}' | uniq)
for n in $namespaces ; do bash ./clear-route-status-ocp4.sh $n ALL ; done
~~~

The output will be:

~~~
[root@openshift-jumpserver-0 ~]# namespaces=$(oc get routes -A | tail -n+2 | awk '{print $1}' | uniq); for n in $namespaces ; do bash ./clear-route-status-ocp4.sh $n ALL ; done
route status for route oauth-openshift in namespace openshift-authentication cleared
route status for route console in namespace openshift-console cleared
route status for route downloads in namespace openshift-console cleared
route status for route canary in namespace openshift-ingress-canary cleared
route status for route alertmanager-main in namespace openshift-monitoring cleared
route status for route grafana in namespace openshift-monitoring cleared
route status for route prometheus-k8s in namespace openshift-monitoring cleared
route status for route thanos-querier in namespace openshift-monitoring cleared
[root@openshift-jumpserver-0 ~]# oc get routes -A
NAMESPACE                  NAME                HOST/PORT                                                              PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.ipi-cluster.example.com                                  oauth-openshift     6443    passthrough/Redirect   None
openshift-console          console             console-openshift-console.apps.ipi-cluster.example.com                       console             https   reencrypt/Redirect     None
openshift-console          downloads           downloads-openshift-console.apps.ipi-cluster.example.com                     downloads           http    edge/Redirect          None
openshift-ingress-canary   canary              canary-openshift-ingress-canary.apps.ipi-cluster.example.com                 ingress-canary      8080    edge/Redirect          None
openshift-monitoring       alertmanager-main   alertmanager-main-openshift-monitoring.apps.ipi-cluster.example.com          alertmanager-main   web     reencrypt/Redirect     None
openshift-monitoring       grafana             grafana-openshift-monitoring.apps.ipi-cluster.example.com                    grafana             https   reencrypt/Redirect     None
openshift-monitoring       prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.ipi-cluster.example.com             prometheus-k8s      web     reencrypt/Redirect     None
openshift-monitoring       thanos-querier      thanos-querier-openshift-monitoring.apps.ipi-cluster.example.com             thanos-querier      web     reencrypt/Redirect     None
~~~

---------------------------------------------------------------------------

[0] There's only a minor difference to the original script:

~~~
[root@openshift-jumpserver-0 ~]# diff -u clear-route-status*
--- clear-route-status-ocp4.sh	2021-03-30 19:22:33.944568703 +0000
+++ clear-route-status.sh	2021-03-30 18:46:49.407568703 +0000
@@ -11,14 +11,10 @@
 function clear_status() {
   local namespace="${1}"
   local route_name="${2}"
-  local my_json_blob; my_json_blob=$(oc get route -n ${namespace} ${route_name} -o json)
-  local modified_json; modified_json=$(echo "${my_json_blob}" | jq -c 'del(.status.ingress)')
-  curl -s -X PUT http://localhost:8001/apis/route.openshift.io/v1/namespaces/${namespace}/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
-  if [ "$?" == "0" ] ; then
-    echo "route status for route ${route_name} in namespace ${namespace} cleared"
-  else
-    echo "error modifying route ${route_name} in namespace ${namespace}"
-  fi
+  local my_json_blob; my_json_blob=$(oc get --raw http://localhost:8001/oapi/v1/namespaces/${namespace}/routes/${route_name}/)
+  local modified_json; modified_json=$(echo "${my_json_blob}" | jq 'del(.status.ingress)')
+  curl -s -X PUT http://localhost:8001/oapi/v1/namespaces/"${namespace}"/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
+  echo "route status for route "${route_name}" in namespace "${namespace}" cleared"
 }

 #sets up clearing a status set by a specific router
~~~

[1] Full script for OCP 4:

~~~
#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

# This allows for the clearing of route statuses, routers don't clear the route status so some may be stale.
# Upon deletion of the route status active routers will immediately update with a valid status

#clears status of all routers
function clear_status() {
  local namespace="${1}"
  local route_name="${2}"
  local my_json_blob; my_json_blob=$(oc get route -n ${namespace} ${route_name} -o json)
  local modified_json; modified_json=$(echo "${my_json_blob}" | jq -c 'del(.status.ingress)')
  curl -s -X PUT http://localhost:8001/apis/route.openshift.io/v1/namespaces/${namespace}/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
  if [ "$?" == "0" ] ; then
    echo "route status for route ${route_name} in namespace ${namespace} cleared"
  else
    echo "error modifying route ${route_name} in namespace ${namespace}"
  fi
}

#sets up clearing a status set by a specific router
function clear_status_set_by() {
  local router_name="${1}"
  for namespace in $( oc get namespaces -o 'jsonpath={.items[*].metadata.name}' ); do
    local routes; routes=($(oc get routes -o jsonpath='{.items[*].metadata.name}' --namespace="${namespace}" 2>/dev/null))
    if [[ "${#routes[@]}" -ne 0 ]]; then
      for route in "${routes[@]}"; do
        clear_routers_status "${namespace}" "${route}" "${router_name}"
      done
    else
      echo "No routes found for namespace "${namespace}""
    fi
  done
}

# clears the status field of a specific router name
function clear_routers_status() {
  local namespace="${1}"
  local route_name="${2}"
  local router_name="${3}"
  local my_json_blob; my_json_blob=$(oc get --raw http://localhost:8001/oapi/v1/namespaces/"${namespace}"/routes/"${route_name}"/)
  local modified_json; modified_json=$(echo "${my_json_blob}" | jq '."status"."ingress"|=map(select(.routerName != "'${router_name}'"))')
  if [[ "${modified_json}" != "$(echo "${my_json_blob}" | jq '.')" ]]; then
    curl -s -X PUT http://localhost:8001/oapi/v1/namespaces/"${namespace}"/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
    echo "route status for route "${route_name}" set by router "${router_name}" cleared"
  else
    echo "route "${route_name}" has no status set by "${router_name}""
  fi
}

function cleanup() {
  if [[ -n "${PROXY_PID:+unset_check}" ]]; then
    kill "${PROXY_PID}"
  fi
}
trap cleanup EXIT

USAGE="Usage:
To clear only the status set by a specific router on all routes in all namespaces
./clear-router-status.sh -r [router_name]
router_name is the name in the deployment config, not the name of the pod.
If the router is running it will immediately update any cleared status.

To clear the status field of a route or all routes in a given namespace
./clear-route-status.sh [namespace] [route-name | ALL]

Example Usage
--------------
To clear the status of all routes in all namespaces:
oc get namespaces | awk '{if (NR!=1) print \$1}' | xargs -n 1 -I %% ./clear-route-status.sh %% ALL

To clear the status of all routes in namespace default:
./clear-route-status.sh default ALL

To clear the status of route example in namespace default:
./clear-route-status.sh default example

NOTE: if a router that admits a route is running it will immediately update the cleared route status
"

if [[ ${#} -ne 2 || "${@}" == *" help "* ]]; then
  printf "%s" "${USAGE}"
  exit
fi

if ! command -v jq >/dev/null 2>&1; then
  printf "%s\n%s\n" "Command line JSON processor 'jq' not found." "Please install 'jq' version greater than 1.4 to use this script."
  exit 1
fi

if ! echo | jq '."status"."ingress"|=map(select(.routerName != "test"))' >/dev/null 2>&1; then
  printf "%s\n%s\n" "Command line JSON processor 'jq' version is incorrect." "Please install 'jq' version greater than 1.4 to use this script."
  exit 1
fi

oc proxy > /dev/null &
PROXY_PID="${!}"

## attempt to access the proxy until it is online
until curl -s -X GET http://localhost:8001/oapi/v1/ >/dev/null; do
  sleep 1
done

if [[ "${1}" == "-r" ]]; then
  clear_status_set_by "${2}"
  exit
fi

namespace="${1}"
route_name="${2}"

if [[ "${route_name}" == "ALL" ]]; then
  routes=($(oc get routes -o jsonpath='{.items[*].metadata.name}' --namespace="${namespace}" 2>/dev/null))
  if [[ "${#routes[@]}" -ne 0 ]]; then
    for route in "${routes[@]}"; do
      clear_status "${namespace}" "${route}"
    done
  else
    echo "No routes found for namespace "${namespace}""
  fi
else
  clear_status "${namespace}" "${route_name}"
fi
~~~
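As a side note, the oc proxy / curl combination is not strictly required on OCP 4, because the status subresource can also be written through the client directly. This is only a sketch, assuming your oc version supports `oc replace --raw`; the namespace and route name below are examples, not fixed values:

~~~
# Sketch: clear the status of a single route by PUTting the modified object
# to the status subresource through oc itself (no oc proxy needed).
namespace=openshift-authentication
route_name=oauth-openshift

oc get route -n "${namespace}" "${route_name}" -o json \
  | jq 'del(.status.ingress)' > /tmp/route-without-status.json

oc replace --raw "/apis/route.openshift.io/v1/namespaces/${namespace}/routes/${route_name}/status" \
  -f /tmp/route-without-status.json
~~~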
I really do not agree, though, with the conclusion from OCP 3: https://bugzilla.redhat.com/show_bug.cgi?id=1356819#c3

The fact that a customer hits the same issue years later shows, in my opinion, that the cleanup script should be replaced with automation in the controller. Back in the day, we argued that:

~~~
The route status is really meant to serve as a debugging indicator for why a route was allowed/not allowed in by a specific router. As routers go away, we don't really clear the status - look it as a more like an events/logs thing.
(...)
~~~

Now that we have the ingress operator, it should take care of updating a route's status whenever an owning IngressController is deleted. That should be achievable by iterating over the status.ingress entries of all routes and deleting every entry whose routerName matches the name of the deleted IngressController (see the sketch below). At the very least, we should add something that indicates the router was deleted, and we should not show "1 more" in the overview.

Thanks,
Andreas
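To make the proposed logic concrete, here is a minimal, hypothetical sketch of the filtering step only (this is not the operator's actual implementation): drop the status entries whose routerName matches the deleted IngressController, assumed here to be named "sharded", and keep everything else:

~~~
# Print what the route status would look like with the stale "sharded"
# entries filtered out; jq only shows the result, it does not apply it.
oc get route oauth-openshift -n openshift-authentication -o json \
  | jq '.status.ingress |= map(select(.routerName != "sharded")) | .status'
~~~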
Thanks for the script. We will review it, and we will also raise an RFE so that we can consider automating this in the operator.
*** Bug 1958088 has been marked as a duplicate of this bug. ***
We have the same issue, and I strongly suggest that this gets automated in the operator. I understand that it might not be strictly necessary to clear the status. The problem is that the UI frontend uses (incorrectly, in my view) the latest entry of the status entries array, so if the router with the stale status happens to be the latest of all router statuses, the UI shows stale route information. In my bug https://bugzilla.redhat.com/show_bug.cgi?id=1958088, I suggested tracking this as two issues: one for the actual status clearing and one for the UI changes. Not sure if that changes anything, but this UI bug makes the issue an urgent priority for us internally, since we cannot be sure that users of the UI see the current and correct route information.
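For anyone hitting the UI symptom, one quick way to spot the stale entries is to compare each route's status.ingress routerName values against the IngressControllers that still exist. This is only an ad-hoc helper sketch, not something shipped with OpenShift:

~~~
# List route status entries whose routerName no longer matches any existing
# IngressController, i.e. the stale entries the console may end up showing.
live=$(oc get ingresscontroller -n openshift-ingress-operator -o jsonpath='{.items[*].metadata.name}')
oc get routes -A -o json \
  | jq -r --arg live "$live" '
      ($live | split(" ")) as $controllers
      | .items[]
      | .metadata.namespace as $ns
      | .metadata.name as $name
      | (.status.ingress // [])[]
      | select(.routerName as $r | $controllers | index($r) | not)
      | "\($ns)/\($name): stale status entry from router \(.routerName)"'
~~~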
Verified with 4.11.0-0.nightly-2022-06-22-190830:

1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-22-190830   True        False         89m     Cluster version is 4.11.0-0.nightly-2022-06-22-190830
%

2.
% oc create -f ddds
ingresscontroller.operator.openshift.io/sharded created
%
% cat ddds
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: sharded
  namespace: openshift-ingress-operator
spec:
  domain: shard.shudi-411djp90.qe.gcp.devcluster.openshift.com
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
%

3.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-58bfd965c6-fk2w6   1/1     Running   0          5h25m
router-default-58bfd965c6-t7tpx   1/1     Running   0          5h25m
router-sharded-ccb7565bc-fkggh    1/1     Running   0          27s
router-sharded-ccb7565bc-skc78    1/1     Running   0          27s

% oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    },
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T06:47:12Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-sharded.shard.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "sharded",
      "wildcardPolicy": "None"
    }
  ]
}
%

4.
% oc scale ingresscontroller -n openshift-ingress-operator --replicas 1 sharded
ingresscontroller.operator.openshift.io/sharded scaled
% oc scale ingresscontroller -n openshift-ingress-operator --replicas 1 default
ingresscontroller.operator.openshift.io/default scaled
%

5.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-58bfd965c6-fk2w6   1/1     Running   0          5h27m
router-sharded-ccb7565bc-skc78    1/1     Running   0          2m37s

6.
% oc delete -n openshift-ingress-operator ingresscontroller sharded
ingresscontroller.operator.openshift.io "sharded" deleted
%

7. The "router-sharded" entry is removed from the oauth-openshift route:
% oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%

8. The "router-sharded" entry is removed from the other routes, too:
% oc get route -o json -n openshift-console console | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "console-openshift-console.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%
% oc get route -o json -n openshift-ingress-canary canary | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "canary-openshift-ingress-canary.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%
There is a follow-up fix in bug 2101878.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069