Bug 1944851
Summary: List of ingress routes not cleaned up when routers no longer exist - take 2
Product: OpenShift Container Platform
Reporter: Andreas Karis <akaris>
Component: Networking
Assignee: Grant Spence <gspence>
Networking sub component: router
QA Contact: Shudi Li <shudili>
Status: CLOSED ERRATA
Docs Contact:
Severity: low
Priority: low
CC: alexander, amcdermo, aos-bugs, gspence, hongli, mjoseph, mmasters
Version: 4.7
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
  Cause: If a previously admitted route's ingress controller is deleted, or sharding configuration is added, the route status still indicates that the route is admitted, which is incorrect.
  Consequence: The route status misleads users into thinking the route is still admitted when it is not.
  Fix: The ingress operator now clears the status of the route when the route is unadmitted.
  Result: When an ingress controller is deleted, or is updated to shard a route, the route status is cleared.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:36:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
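
Not part of the original report, but a quick way to observe the behavior described in the Doc Text above is to list which routers currently appear in a route's status. The route and namespace below are only examples taken from the outputs later in this bug:

~~~
# Print the routerName of each status.ingress entry on a route. After the
# fix, entries for a deleted or no-longer-matching ingress controller
# should no longer appear here.
oc get route console -n openshift-console \
  -o jsonpath='{range .status.ingress[*]}{.routerName}{"\n"}{end}'
~~~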
Description
Andreas Karis
2021-03-30 19:38:47 UTC
I adjusted the cleanup script for OCP 4 [1]. There was only a minor change to make [0].

Create the script in [1], put it into the file clear-route-status-ocp4.sh, and then run:

~~~
namespaces=$(oc get routes -A | tail -n+2 | awk '{print $1}' | uniq); for n in $namespaces ; do bash ./clear-route-status-ocp4.sh $n ALL ; done
~~~

The output will be:

~~~
[root@openshift-jumpserver-0 ~]# namespaces=$(oc get routes -A | tail -n+2 | awk '{print $1}' | uniq); for n in $namespaces ; do bash ./clear-route-status-ocp4.sh $n ALL ; done
route status for route oauth-openshift in namespace openshift-authentication cleared
route status for route console in namespace openshift-console cleared
route status for route downloads in namespace openshift-console cleared
route status for route canary in namespace openshift-ingress-canary cleared
route status for route alertmanager-main in namespace openshift-monitoring cleared
route status for route grafana in namespace openshift-monitoring cleared
route status for route prometheus-k8s in namespace openshift-monitoring cleared
route status for route thanos-querier in namespace openshift-monitoring cleared
[root@openshift-jumpserver-0 ~]# oc get routes -A
NAMESPACE                  NAME                HOST/PORT                                                              PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.ipi-cluster.example.com                                  oauth-openshift     6443    passthrough/Redirect   None
openshift-console          console             console-openshift-console.apps.ipi-cluster.example.com                       console             https   reencrypt/Redirect     None
openshift-console          downloads           downloads-openshift-console.apps.ipi-cluster.example.com                     downloads           http    edge/Redirect          None
openshift-ingress-canary   canary              canary-openshift-ingress-canary.apps.ipi-cluster.example.com                 ingress-canary      8080    edge/Redirect          None
openshift-monitoring       alertmanager-main   alertmanager-main-openshift-monitoring.apps.ipi-cluster.example.com          alertmanager-main   web     reencrypt/Redirect     None
openshift-monitoring       grafana             grafana-openshift-monitoring.apps.ipi-cluster.example.com                    grafana             https   reencrypt/Redirect     None
openshift-monitoring       prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.ipi-cluster.example.com             prometheus-k8s      web     reencrypt/Redirect     None
openshift-monitoring       thanos-querier      thanos-querier-openshift-monitoring.apps.ipi-cluster.example.com             thanos-querier      web     reencrypt/Redirect     None
~~~

---------------------------------------------------------------------------

[0] There's only a minor difference to the original script:

~~~
[root@openshift-jumpserver-0 ~]# diff -u clear-route-status*
--- clear-route-status-ocp4.sh  2021-03-30 19:22:33.944568703 +0000
+++ clear-route-status.sh       2021-03-30 18:46:49.407568703 +0000
@@ -11,14 +11,10 @@
 function clear_status() {
     local namespace="${1}"
     local route_name="${2}"
-    local my_json_blob; my_json_blob=$(oc get route -n ${namespace} ${route_name} -o json)
-    local modified_json; modified_json=$(echo "${my_json_blob}" | jq -c 'del(.status.ingress)')
-    curl -s -X PUT http://localhost:8001/apis/route.openshift.io/v1/namespaces/${namespace}/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
-    if [ "$?" == "0" ] ; then
-        echo "route status for route ${route_name} in namespace ${namespace} cleared"
-    else
-        echo "error modifying route ${route_name} in namespace ${namespace}"
-    fi
+    local my_json_blob; my_json_blob=$(oc get --raw http://localhost:8001/oapi/v1/namespaces/${namespace}/routes/${route_name}/)
+    local modified_json; modified_json=$(echo "${my_json_blob}" | jq 'del(.status.ingress)')
+    curl -s -X PUT http://localhost:8001/oapi/v1/namespaces/"${namespace}"/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
+    echo "route status for route "${route_name}" in namespace "${namespace}" cleared"
 }

 #sets up clearing a status set by a specific router
~~~

[1] Full script for OCP 4:

~~~
#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset

# This allows for the clearing of route statuses; routers don't clear the route status, so some may be stale.
# Upon deletion of the route status, active routers will immediately update it with a valid status.

# clears status of all routers
function clear_status() {
    local namespace="${1}"
    local route_name="${2}"
    local my_json_blob; my_json_blob=$(oc get route -n ${namespace} ${route_name} -o json)
    local modified_json; modified_json=$(echo "${my_json_blob}" | jq -c 'del(.status.ingress)')
    curl -s -X PUT http://localhost:8001/apis/route.openshift.io/v1/namespaces/${namespace}/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
    if [ "$?" == "0" ] ; then
        echo "route status for route ${route_name} in namespace ${namespace} cleared"
    else
        echo "error modifying route ${route_name} in namespace ${namespace}"
    fi
}

# sets up clearing a status set by a specific router
function clear_status_set_by() {
    local router_name="${1}"
    for namespace in $( oc get namespaces -o 'jsonpath={.items[*].metadata.name}' ); do
        local routes; routes=($(oc get routes -o jsonpath='{.items[*].metadata.name}' --namespace="${namespace}" 2>/dev/null))
        if [[ "${#routes[@]}" -ne 0 ]]; then
            for route in "${routes[@]}"; do
                clear_routers_status "${namespace}" "${route}" "${router_name}"
            done
        else
            echo "No routes found for namespace "${namespace}""
        fi
    done
}

# clears the status field of a specific router name
function clear_routers_status() {
    local namespace="${1}"
    local route_name="${2}"
    local router_name="${3}"
    local my_json_blob; my_json_blob=$(oc get --raw http://localhost:8001/oapi/v1/namespaces/"${namespace}"/routes/"${route_name}"/)
    local modified_json; modified_json=$(echo "${my_json_blob}" | jq '."status"."ingress"|=map(select(.routerName != "'${router_name}'"))')
    if [[ "${modified_json}" != "$(echo "${my_json_blob}" | jq '.')" ]]; then
        curl -s -X PUT http://localhost:8001/oapi/v1/namespaces/"${namespace}"/routes/"${route_name}"/status --data-binary "${modified_json}" -H "Content-Type: application/json" > /dev/null
        echo "route status for route "${route_name}" set by router "${router_name}" cleared"
    else
        echo "route "${route_name}" has no status set by "${router_name}""
    fi
}

function cleanup() {
    if [[ -n "${PROXY_PID:+unset_check}" ]]; then
        kill "${PROXY_PID}"
    fi
}
trap cleanup EXIT

USAGE="Usage:
To clear only the status set by a specific router on all routes in all namespaces
    ./clear-router-status.sh -r [router_name]
    router_name is the name in the deployment config, not the name of the pod.
    If the router is running it will immediately update any cleared status.

To clear the status field of a route or all routes in a given namespace
    ./clear-route-status.sh [namespace] [route-name | ALL]

Example Usage
--------------
To clear the status of all routes in all namespaces:
    oc get namespaces | awk '{if (NR!=1) print \$1}' | xargs -n 1 -I %% ./clear-route-status.sh %% ALL

To clear the status of all routes in namespace default:
    ./clear-route-status.sh default ALL

To clear the status of route example in namespace default:
    ./clear-route-status.sh default example

NOTE: if a router that admits a route is running it will immediately update the cleared route status
"

if [[ ${#} -ne 2 || "${@}" == *" help "* ]]; then
    printf "%s" "${USAGE}"
    exit
fi

if ! command -v jq >/dev/null 2>&1; then
    printf "%s\n%s\n" "Command line JSON processor 'jq' not found." "Please install 'jq' version greater than 1.4 to use this script."
    exit 1
fi

if ! echo | jq '."status"."ingress"|=map(select(.routerName != "test"))' >/dev/null 2>&1; then
    printf "%s\n%s\n" "Command line JSON processor 'jq' version is incorrect." "Please install 'jq' version greater than 1.4 to use this script."
    exit 1
fi

oc proxy > /dev/null &
PROXY_PID="${!}"
## attempt to access the proxy until it is online
until curl -s -X GET http://localhost:8001/oapi/v1/ >/dev/null; do
    sleep 1
done

if [[ "${1}" == "-r" ]]; then
    clear_status_set_by "${2}"
    exit
fi

namespace="${1}"
route_name="${2}"
if [[ "${route_name}" == "ALL" ]]; then
    routes=($(oc get routes -o jsonpath='{.items[*].metadata.name}' --namespace="${namespace}" 2>/dev/null))
    if [[ "${#routes[@]}" -ne 0 ]]; then
        for route in "${routes[@]}"; do
            clear_status "${namespace}" "${route}"
        done
    else
        echo "No routes found for namespace "${namespace}""
    fi
else
    clear_status "${namespace}" "${route_name}"
fi
~~~

I really do not agree, though, with the conclusion from OCP 3: https://bugzilla.redhat.com/show_bug.cgi?id=1356819#c3

The fact that a customer hits the same issue years later IMO shows that the cleanup script should be replaced with automation through the controller. Back in the day, we argued that:

~~~
The route status is really meant to serve as a debugging indicator for why a route was allowed/not allowed in by a specific router. As routers go away, we don't really clear the status - look it as a more like an events/logs thing. (...)
~~~

Given that we now have the ingress operator, it should take care of updating a route's status whenever an owning ingresscontroller is deleted. That should be achievable by parsing the status/ingress/routerName field of all ingress entries: if the deleted ingresscontroller name matches an entry's routerName, that entry should be deleted. At the very least, we should add something that indicates that the controller was deleted, and we should not show "1 more" in the overview.

Thanks,
Andreas
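
As a rough illustration of the suggested cleanup (this is only a CLI sketch, not how the ingress operator implements it), the stale entries could be stripped with essentially the same jq filter that the script's clear_routers_status function uses. Here "sharded" is just an example controller name:

~~~
# Illustration only: show what a route's status would look like with the
# entries of a deleted ingress controller ("sharded") removed. The script
# above applies the same kind of filter and PUTs the result back via the
# route's /status endpoint.
oc get route oauth-openshift -n openshift-authentication -o json \
  | jq '.status.ingress |= map(select(.routerName != "sharded")) | .status'
~~~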
Thanks for the script. We will review it, but we will also raise an RFE so that we can consider automating this in the operator.

*** Bug 1958088 has been marked as a duplicate of this bug. ***

We have the same issue, and I strongly suggest that this be automated in the operator. I understand that it might not be necessary to clear the status. The problem is that the UI front end uses (incorrectly, in my view) the latest entry of the status entries array. So if the router with the stale status happens to be the latest of all router statuses, the UI shows stale route information. In my bug https://bugzilla.redhat.com/show_bug.cgi?id=1958088, I suggested tracking this as two issues: one for the actual status clearing, and one for the UI changes.

Not sure if that changes anything, but this UI bug makes this issue urgent for us internally, since we cannot be sure that users of the UI see the current and correct route information!

Verified it with 4.11.0-0.nightly-2022-06-22-190830.

1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-22-190830   True        False         89m     Cluster version is 4.11.0-0.nightly-2022-06-22-190830
%

2.
% oc create -f ddds
ingresscontroller.operator.openshift.io/sharded created
%
% cat ddds
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: sharded
  namespace: openshift-ingress-operator
spec:
  domain: shard.shudi-411djp90.qe.gcp.devcluster.openshift.com
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
%

3.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-58bfd965c6-fk2w6   1/1     Running   0          5h25m
router-default-58bfd965c6-t7tpx   1/1     Running   0          5h25m
router-sharded-ccb7565bc-fkggh    1/1     Running   0          27s
router-sharded-ccb7565bc-skc78    1/1     Running   0          27s
%
% oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    },
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T06:47:12Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-sharded.shard.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "sharded",
      "wildcardPolicy": "None"
    }
  ]
}
%

4.
% oc scale ingresscontroller -n openshift-ingress-operator --replicas 1 sharded
ingresscontroller.operator.openshift.io/sharded scaled
% oc scale ingresscontroller -n openshift-ingress-operator --replicas 1 default
ingresscontroller.operator.openshift.io/default scaled
%

5.
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-58bfd965c6-fk2w6   1/1     Running   0          5h27m
router-sharded-ccb7565bc-skc78    1/1     Running   0          2m37s
%

6.
% oc delete -n openshift-ingress-operator ingresscontroller sharded
ingresscontroller.operator.openshift.io "sharded" deleted
%

7. The "router-sharded" entry is removed from the oauth-openshift route:
% oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%

8. The "router-sharded" entry is removed from the other routes, too:
% oc get route -o json -n openshift-console console | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "console-openshift-console.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%
% oc get route -o json -n openshift-ingress-canary canary | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-23T01:27:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "canary-openshift-ingress-canary.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%
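
As an additional cross-check (not part of the original verification), one could scan every route in the cluster for leftover status entries from the deleted controller. A minimal sketch, assuming jq is available and reusing the "sharded" name from the steps above:

~~~
# List any route that still carries a status entry from the deleted
# "sharded" ingress controller; no output means nothing stale is left.
oc get routes -A -o json \
  | jq -r '.items[]
           | select(any(.status.ingress[]?; .routerName == "sharded"))
           | "\(.metadata.namespace)/\(.metadata.name)"'
~~~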
the "router-sharded" is removed from other routes, too % oc get route -o json -n openshift-console console | jq '.status' { "ingress": [ { "conditions": [ { "lastTransitionTime": "2022-06-23T01:27:40Z", "status": "True", "type": "Admitted" } ], "host": "console-openshift-console.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com", "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com", "routerName": "default", "wildcardPolicy": "None" } ] } % % oc get route -o json -n openshift-ingress-canary canary | jq '.status' { "ingress": [ { "conditions": [ { "lastTransitionTime": "2022-06-23T01:27:40Z", "status": "True", "type": "Admitted" } ], "host": "canary-openshift-ingress-canary.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com", "routerCanonicalHostname": "router-default.apps.shudi-411djp90.qe.gcp.devcluster.openshift.com", "routerName": "default", "wildcardPolicy": "None" } ] } % There is a follow-up fix in bug 2101878. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069 |