Description of problem:
The test failure in #9255 showed that there is already a race in WAL replay, even without snapshots. The race can also be detected on the release-2.29 branch. The upstream PR adds a unit test that reproduces it.

Version-Release number of selected component (if applicable):
2.29.1

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Updating to 2.29.2 carries the fix.
*** Bug 1999580 has been marked as a duplicate of this bug. ***
checked with 4.9.0-0.nightly-2021-08-31-123131, prometheus version is 2.29.2

# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
level=info ts=2021-09-01T01:16:01.399Z caller=main.go:445 msg="Starting Prometheus" version="(version=2.29.2, branch=rhaos-4.9-rhel-8, revision=99e16e81fcaee8ef609985f306aced9a465304ab)"
level=info ts=2021-09-01T01:16:01.399Z caller=main.go:450 build_context="(go=go1.16.6, user=root@e5c9e3ac803a, date=20210831-09:46:00)"
...

but the related resources' version is still 2.29.1, example:
# oc -n openshift-monitoring get prometheus k8s -oyaml
...
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.29.1
    prometheus: k8s
  name: k8s
(In reply to Junqi Zhao from comment #6)
> checked with 4.9.0-0.nightly-2021-08-31-123131, prometheus version is 2.29.2
> # oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
> level=info ts=2021-09-01T01:16:01.399Z caller=main.go:445 msg="Starting
> Prometheus" version="(version=2.29.2, branch=rhaos-4.9-rhel-8,
> revision=99e16e81fcaee8ef609985f306aced9a465304ab)"
> level=info ts=2021-09-01T01:16:01.399Z caller=main.go:450
> build_context="(go=go1.16.6, user=root@e5c9e3ac803a, date=20210831-09:46:00)"
> ...
>
> but the related resources' version is still 2.29.1, example:
> # oc -n openshift-monitoring get prometheus k8s -oyaml
> ...
>   labels:
>     app.kubernetes.io/component: prometheus
>     app.kubernetes.io/name: prometheus
>     app.kubernetes.io/part-of: openshift-monitoring
>     app.kubernetes.io/version: 2.29.1
>     prometheus: k8s
>   name: k8s

Yes, that is expected. The CMO asset sync is in PR https://github.com/openshift/cluster-monitoring-operator/pull/1353, which should fix this. It is now also linked to this bug.
Test with payload 4.9.0-0.nightly-2021-09-05-192114

$ oc -n openshift-monitoring get prometheus k8s --show-labels
NAME   VERSION   REPLICAS   AGE   LABELS
k8s    2.29.2    2          96m   app.kubernetes.io/component=prometheus,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.29.2,prometheus=k8s

$ oc -n openshift-monitoring get pod --show-labels | grep app=prometheus
prometheus-k8s-0   7/7   Running   0   84m   ......app.kubernetes.io/version=2.29.2,app=prometheus,......prometheus=k8s,statefulset.kubernetes.io/pod-name=prometheus-k8s-0
prometheus-k8s-1   7/7   Running   0   84m   ......app.kubernetes.io/version=2.29.2,app=prometheus,......prometheus=k8s,statefulset.kubernetes.io/pod-name=prometheus-k8s-1

$ oc get clusterrolebinding prometheus-k8s -n openshift-monitoring --show-labels
NAME             ROLE                         AGE   LABELS
prometheus-k8s   ClusterRole/prometheus-k8s   89m   app.kubernetes.io/component=prometheus,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.29.2

$ oc get clusterrole prometheus-k8s -n openshift-monitoring --show-labels
NAME             CREATED AT             LABELS
prometheus-k8s   2021-09-05T23:58:08Z   app.kubernetes.io/component=prometheus,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.29.2

$ oc -n openshift-user-workload-monitoring get prometheus user-workload --show-labels
NAME            VERSION   REPLICAS   AGE     LABELS
user-workload   2.29.2    2          5m29s   app.kubernetes.io/component=prometheus,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.29.2,prometheus=user-workload

$ oc -n openshift-user-workload-monitoring logs prometheus-user-workload-0
level=info ts=2021-09-06T01:34:00.770Z caller=main.go:445 msg="Starting Prometheus" version="(version=2.29.2, branch=rhaos-4.9-rhel-8, revision=99e16e81fcaee8ef609985f306aced9a465304ab)"
Completed the Prometheus regression tests; no issues found.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759