Description of problem:
While checking the behaviour of the recently modified alert "KubeDeploymentReplicasMismatch" in CI, it appears the alert is firing because thanos-query probes are timing out:
https://search.ci.openshift.org/chart?search=alert+KubeDeploymentReplicasMismatch+fired&maxAge=24h&type=junit

An example job can be investigated here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1148/pull-ci-openshift-cluster-network-operator-master-e2e-azure-ovn/1413425417573896192/artifacts/e2e-azure-ovn/gather-must-gather/artifacts/event-filter.html

Version-Release number of selected component (if applicable):

How reproducible:
This alert is currently firing in about 1% of CI failures. The majority of these are Thanos related, and on investigation each of them turned out to be the probe issue.

Expected results:
Thanos probes do not time out, and hence neither this alert nor others fire.

Additional info:
Probes were removed for some other components, and there is some discussion here:
https://github.com/prometheus-operator/prometheus-operator/pull/3502
*** Bug 1982757 has been marked as a duplicate of this bug. ***
Searched the CI results:
https://search.ci.openshift.org/?search=KubeDeploymentGenerationMismatch&maxAge=336h&context=1&type=all&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Did not see the KubeDeploymentReplicasMismatch alert for thanos-querier.

# oc -n openshift-monitoring get deploy thanos-querier -oyaml
...
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9091
            scheme: HTTPS
          initialDelaySeconds: 5
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: oauth-proxy
        ports:
        - containerPort: 9091
          name: web
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9091
            scheme: HTTPS
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
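
For anyone reading the probe settings above, a rough way to see how much slack they give: the kubelet only acts after failureThreshold consecutive failed probes, one probe every periodSeconds. The sketch below is illustrative only and is not part of the fix. The Probe class and failure_window_seconds helper are hypothetical names, and the formula is a simplified approximation that ignores initialDelaySeconds and the per-probe timeout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical container for the probe fields shown in the oc output."""
    failure_threshold: int
    period_seconds: int
    timeout_seconds: int
    initial_delay_seconds: int = 0


def failure_window_seconds(probe: Probe) -> int:
    """Approximate time of continuous probe failure before the kubelet acts.

    Simplifying assumption: failureThreshold consecutive failures,
    spaced periodSeconds apart, with delay and timeout ignored.
    """
    return probe.failure_threshold * probe.period_seconds


# Values copied from the thanos-querier deployment output above.
liveness = Probe(failure_threshold=4, period_seconds=30,
                 timeout_seconds=1, initial_delay_seconds=5)
readiness = Probe(failure_threshold=20, period_seconds=5,
                  timeout_seconds=1, initial_delay_seconds=5)

if __name__ == "__main__":
    print("liveness window (s):", failure_window_seconds(liveness))
    print("readiness window (s):", failure_window_seconds(readiness))
```

Under that approximation the container is restarted only after roughly two minutes of failing liveness probes, and marked unready only after roughly 100 seconds of failing readiness probes. Note that timeoutSeconds is still 1 for both, so each individual probe must answer within one second.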
*** Bug 1976940 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759