Created attachment 1878629
access prometheus route now is 404 after upgrade to 4.11

Description of problem:
Upgraded a SNO cluster from 4.10.13 to 4.11.0-0.nightly-2022-05-10-045003.

# oc image info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-10-045003
Name:   registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-10-045003
Digest: sha256:0f0789cbecc2598d71fc8ff6e17a0e61c4e2067414388e82cbb3aaab5e6b535f
...

# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:0f0789cbecc2598d71fc8ff6e17a0e61c4e2067414388e82cbb3aaab5e6b535f --force

# oc get clusterversion -oyaml
...
    desired:
      image: registry.ci.openshift.org/ocp/release@sha256:0f0789cbecc2598d71fc8ff6e17a0e61c4e2067414388e82cbb3aaab5e6b535f
      version: 4.11.0-0.nightly-2022-05-10-045003
    history:
    - completionTime: "2022-05-11T08:38:18Z"
      image: registry.ci.openshift.org/ocp/release@sha256:0f0789cbecc2598d71fc8ff6e17a0e61c4e2067414388e82cbb3aaab5e6b535f
      startedTime: "2022-05-11T07:29:25Z"
      state: Completed
      verified: true
      version: 4.11.0-0.nightly-2022-05-10-045003
    - completionTime: "2022-05-11T04:31:56Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:4f516616baed3cf84585e753359f7ef2153ae139c2e80e0191902fbd073c4143
      startedTime: "2022-05-11T03:53:47Z"
      state: Completed
      verified: false
      version: 4.10.13
    observedGeneration: 10

Since 4.11, the prometheus-k8s route is updated with "path: /api", but after the upgrade it stays the same as in 4.10. Accessing the prometheus route navigates to https://${prometheus-k8s-route}/${console-route}/monitoring/graph and the error is 404 page not found. In 4.11, the prometheus route should respond with "Application is not available".

# oc -n openshift-monitoring get route | grep prometheus-k8s
prometheus-k8s            prometheus-k8s-openshift-monitoring.apps.qe-upg511.qe.devcluster.openshift.com                        prometheus-k8s   web   reencrypt/Redirect   None
prometheus-k8s-federate   prometheus-k8s-federate-openshift-monitoring.apps.qe-upg511.qe.devcluster.openshift.com   /federate   prometheus-k8s   web   reencrypt/Redirect   None

# oc -n openshift-monitoring get route prometheus-k8s -oyaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    openshift.io/host.generated: "true"
  creationTimestamp: "2022-05-11T04:30:30Z"
  name: prometheus-k8s
  namespace: openshift-monitoring
  resourceVersion: "23931"
  uid: aa65b977-35e9-4e8f-9078-ab34397b0d89
spec:
  host: prometheus-k8s-openshift-monitoring.apps.qe-upg511.qe.devcluster.openshift.com
  port:
    targetPort: web
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: reencrypt
  to:
    kind: Service
    name: prometheus-k8s
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: "2022-05-11T04:30:30Z"
      status: "True"
      type: Admitted
    host: prometheus-k8s-openshift-monitoring.apps.qe-upg511.qe.devcluster.openshift.com
    routerCanonicalHostname: router-default.apps.qe-upg511.qe.devcluster.openshift.com
    routerName: default

This is the only issue; we can still access the API through the prometheus route, for example:

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.qe-upg511.qe.devcluster.openshift.com/api/v1/query?query=cluster_infrastructure_provider' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster_infrastructure_provider",
          "container": "kube-apiserver-operator",
          "endpoint": "https",
          "instance": "10.128.1.6:8443",
          "job": "metrics",
          "namespace": "openshift-kube-apiserver-operator",
          "pod": "kube-apiserver-operator-698f86d976-wxx44",
          "service": "metrics",
          "type": "None"
        },
        "value": [
          1652265840.76,
          "0"
        ]
      }
    ]
  }
}

Version-Release number of selected component (if applicable):
upgrade cluster from 4.10.13 to 4.11.0-0.nightly-2022-05-10-045003

How reproducible:
always

Steps to Reproduce:
1. see the description

Actual results:
The prometheus route is not updated to "path: /api" after upgrade from 4.10 to 4.11.

Expected results:
The route should be updated.

Additional info:
Other monitoring routes don't have this issue.
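One possible way to check for this symptom on an upgraded cluster is to read spec.path from the route and compare it with the expected value. This is only a sketch: check_route_path and check_prometheus_route are hypothetical helper names, the expected value /api is taken from the description above, and the second helper assumes a logged-in oc session.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of oc): compare an observed route path
# with the expected one and report the result.
check_route_path() {
  local actual="$1" expected="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK: path is '${actual}'"
  else
    echo "MISMATCH: path is '${actual}', expected '${expected}'"
    return 1
  fi
}

# Hypothetical wrapper: read spec.path of the prometheus-k8s route.
# jsonpath prints an empty string when the field is not set.
check_prometheus_route() {
  local actual
  actual=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.path}')
  check_route_path "$actual" "/api"
}
```

Calling check_prometheus_route against the cluster above would be expected to report a mismatch, since the route shown has no path set.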
In another IPI vSphere cluster (3 masters/2 workers), upgraded from 4.8.29 -> 4.9.0-0.nightly-2022-05-11-100812 -> 4.10.0-0.nightly-2022-05-11-183751 -> 4.11.0-0.nightly-2022-05-11-054135, the alertmanager-main/thanos-querier/thanos-ruler routes have the same issue.

# oc get node
NAME                           STATUS   ROLES    AGE   VERSION
juzhao-48-8msj7-master-0       Ready    master   9h    v1.23.3+69213f8
juzhao-48-8msj7-master-1       Ready    master   9h    v1.23.3+69213f8
juzhao-48-8msj7-master-2       Ready    master   9h    v1.23.3+69213f8
juzhao-48-8msj7-worker-66zr5   Ready    worker   9h    v1.23.3+69213f8
juzhao-48-8msj7-worker-mnqjb   Ready    worker   9h    v1.23.3+69213f8

# oc -n openshift-monitoring get route
NAME                      HOST/PORT                                                                                 PATH        SERVICES            PORT   TERMINATION          WILDCARD
alertmanager-main         alertmanager-main-openshift-monitoring.apps.juzhao-48.qe.devcluster.openshift.com                     alertmanager-main   web    reencrypt/Redirect   None
prometheus-k8s            prometheus-k8s-openshift-monitoring.apps.juzhao-48.qe.devcluster.openshift.com                        prometheus-k8s      web    reencrypt/Redirect   None
prometheus-k8s-federate   prometheus-k8s-federate-openshift-monitoring.apps.juzhao-48.qe.devcluster.openshift.com   /federate   prometheus-k8s      web    reencrypt/Redirect   None
thanos-querier            thanos-querier-openshift-monitoring.apps.juzhao-48.qe.devcluster.openshift.com                        thanos-querier      web    reencrypt/Redirect   None

# oc -n openshift-user-workload-monitoring get route
NAME           HOST/PORT                                                                                    PATH        SERVICES                   PORT       TERMINATION          WILDCARD
federate       federate-openshift-user-workload-monitoring.apps.juzhao-48.qe.devcluster.openshift.com       /federate   prometheus-user-workload   federate   reencrypt/Redirect   None
thanos-ruler   thanos-ruler-openshift-user-workload-monitoring.apps.juzhao-48.qe.devcluster.openshift.com               thanos-ruler               web        reencrypt/Redirect   None
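To see at a glance which routes in a namespace have no path, the path can be printed with custom-columns and filtered. A minimal sketch: routes_without_path is a hypothetical helper name, it relies on oc printing <none> for an unset field, and the usage line in the comment assumes a logged-in oc session.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read "NAME PATH" rows on stdin (header on the first
# line) and print the names of routes whose PATH column is <none>.
routes_without_path() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Usage (needs cluster access, so it is left commented out here):
#   oc -n openshift-monitoring get route \
#     -o custom-columns=NAME:.metadata.name,PATH:.spec.path | routes_without_path
```

The same pipeline can be pointed at openshift-user-workload-monitoring by changing the namespace.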
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069