Description of problem:
There is no error in the prometheus-operator pod when giving retentionSize: 10 (no unit) for platform Prometheus.

Version-Release number of selected component (if applicable):
4.11.0-0.ci.test-2022-04-11-075638-ci-ln-snr6hjb-latest
PR: https://github.com/openshift/cluster-monitoring-operator/pull/1579

How reproducible:
always

Steps to Reproduce:
1. Set retentionSize to a value without a unit:

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retentionSize: 10
EOF
configmap/cluster-monitoring-config configured

2. Check the prometheus-operator pod and its logs:

% oc -n openshift-monitoring get pod | grep prometheus-operator
prometheus-operator-59647ffb6-bs62l   2/2   Running   0   50m

% oc -n openshift-monitoring logs prometheus-operator-8bf74c8f4-fdd6t
------
level=info ts=2022-04-11T08:54:32.788126321Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:54:46.535507251Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:54:46.535532906Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:56:28.845792055Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:56:28.845787915Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:59:42.490011178Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:59:42.497123595Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"

Actual results:
There is no error in the prometheus-operator pod when giving retentionSize: 10.

Expected results:
There is an error in the prometheus-operator pod when giving retentionSize: 10.

Additional info:
When the invalid value 10 (no unit) is given for retention, an error does appear in the prometheus-operator log:

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 10
EOF

% oc -n openshift-monitoring logs prometheus-operator-8bf74c8f4-fdd6t
------
level=error ts=2022-04-11T08:43:15.144698192Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:43:25.387317899Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:43:25.660735179Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:43:46.141826774Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:43:46.389486077Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:44:27.350255074Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:44:27.561131773Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:45:49.481959007Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:45:49.716985292Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:46:10.46710091Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
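The retention error above comes from Prometheus rejecting "10" as a duration string. The check can be pictured with a simplified regex; note this is an illustrative sketch of the duration syntax (ms, s, m, h, d, w, y suffixes), not the operator's actual validation code, and `check_retention` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Simplified approximation of Prometheus's duration syntax.
# Illustrative only; the real parser also accepts combined units.
dur_re='^[0-9]+(ms|s|m|h|d|w|y)$'

check_retention() {
  if [[ "$1" =~ $dur_re ]]; then
    echo "valid duration: $1"
  else
    echo "not a valid duration string: \"$1\""
  fi
}

check_retention 15d   # valid duration: 15d
check_retention 10    # not a valid duration string: "10"
```

A bare number fails because no unit suffix matches, which is exactly what the "not a valid duration string: \"10\"" log line reports.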
Retention time and retention size should be validated in the same place. Currently, retention time is validated by the prometheus-operator and retention size is validated by CMO.
We had added OpenAPI validation at the CRD level for retentionSize in prometheus-operator: https://github.com/prometheus-operator/prometheus-operator/pull/4661. That change was synced to CMO as part of the kube-prometheus update: https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. So retentionSize is now validated by the API server. If the value is invalid, CMO throws an error when applying the Prometheus spec; since the spec is rejected, prometheus-operator never even sees the value, hence no logging is done there. Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged, you will see the same effect for retention as well, with the error logged in the CMO log.
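The CRD-level check amounts to the API server matching retentionSize against a byte-size pattern before the object ever reaches prometheus-operator. A minimal sketch, assuming a pattern in the spirit of the one added in the PR above (the authoritative pattern lives in the Prometheus CRD itself, and `check_retention_size` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Illustrative byte-size pattern: a number followed by an optional
# binary/decimal prefix and a mandatory B/b unit, or bare 0.
# This mimics, but is not copied from, the CRD's OpenAPI validation.
size_re='^(0|([0-9]*[.])?[0-9]+((K|M|G|T|P|E)i?)?(B|b))$'

check_retention_size() {
  if [[ "$1" =~ $size_re ]]; then
    echo "accepted by API server: $1"
  else
    echo "rejected by API server: $1"
  fi
}

check_retention_size 10GiB   # accepted by API server: 10GiB
check_retention_size 10      # rejected by API server: 10
```

Because "10" carries no unit, the API server rejects the spec at apply time, which is why no error ever shows up in the prometheus-operator log.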
Validation of retention time will be updated to be consistent with retention size, so I changed the target release.
Tested with OCP version 4.11.0-0.nightly-2022-04-12-072444, Prometheus version 2.34.0, Prometheus Operator version 0.55.1.
https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged. Once the jsonnet dependencies are synced in https://github.com/openshift/cluster-monitoring-operator/pull/1643, validation errors for the time interval fields in Prometheus will be logged in the CMO logs themselves.
https://github.com/openshift/cluster-monitoring-operator/pull/1643 is merged, which contains the retention time changes as well.
Tested with payload 4.11.0-0.nightly-2022-04-23-153426. When given an invalid retention or an invalid retentionSize, both error messages are logged in the CMO logs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069