Bug 2073972 - Invalid retention time and invalid retention size should be validated at one place and have error log in one place for platform monitoring
Summary: Invalid retention time and invalid retention size should be validated at one place and have error log in one place for platform monitoring
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jayapriya Pai
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-11 09:04 UTC by hongyan li
Modified: 2022-08-10 11:05 UTC
CC: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:05:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID                              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata RHSA-2022:5069  0        None      None    None     2022-08-10 11:05:58 UTC

Description hongyan li 2022-04-11 09:04:11 UTC
Description of problem:
No error is logged in the prometheus-operator pod when retentionSize: 10 is given for the platform Prometheus.

Version-Release number of selected component (if applicable):
4.11.0-0.ci.test-2022-04-11-075638-ci-ln-snr6hjb-latest
PR: https://github.com/openshift/cluster-monitoring-operator/pull/1579

How reproducible:
always

Steps to Reproduce:
% oc apply -f - <<EOF                                                     
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retentionSize: 10     
EOF
configmap/cluster-monitoring-config configured

% oc -n openshift-monitoring get pod|grep prometheus-operator         
prometheus-operator-59647ffb6-bs62l            2/2     Running   0          50m
% oc -n openshift-monitoring logs prometheus-operator-8bf74c8f4-fdd6t
------
level=info ts=2022-04-11T08:54:32.788126321Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:54:46.535507251Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:54:46.535532906Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:56:28.845792055Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:56:28.845787915Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:59:42.490011178Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:59:42.497123595Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"

Actual results:
No error is logged in the prometheus-operator pod when retentionSize: 10 is given.


Expected results:
An error should be logged in the prometheus-operator pod when retentionSize: 10 is given.

Additional info:
When the invalid value 10 is given for retention time, an error appears in the prometheus-operator log.

When retention has no unit, an error is logged in the prometheus-operator log:
oc apply -f - <<EOF                                                     
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 10
EOF

% oc -n openshift-monitoring logs prometheus-operator-8bf74c8f4-fdd6t
------
level=error ts=2022-04-11T08:43:15.144698192Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:43:25.387317899Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:43:25.660735179Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:43:46.141826774Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:43:46.389486077Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:44:27.350255074Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:44:27.561131773Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:45:49.481959007Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:45:49.716985292Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:46:10.46710091Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"

Comment 1 hongyan li 2022-04-12 02:07:23 UTC
Retention time and retention size should be validated in the same place.

Currently, retention time is validated by prometheus-operator and retention size is validated by CMO.

Comment 2 Jayapriya Pai 2022-04-12 08:18:59 UTC
We added OpenAPI validation at the CRD level for retentionSize in prometheus-operator: https://github.com/prometheus-operator/prometheus-operator/pull/4661

That change was synced to CMO as well as part of the kube-prometheus update: https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. So retentionSize is now validated by the API server. If the value is invalid, CMO throws an error when applying the Prometheus spec; since the spec is rejected, prometheus-operator never even sees the value, hence no logging is done there.
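
For reference, the CRD-level check surfaces as a pattern constraint in the generated Prometheus CRD. A sketch of how to inspect it on a cluster (output abridged and illustrative; the exact regex depends on the prometheus-operator version):

% oc get crd prometheuses.monitoring.coreos.com -o yaml | grep -A 5 'retentionSize:'
        retentionSize:
          pattern: (^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?(B|b)$)
          type: string

A value such as retentionSize: 10GiB matches the pattern, while a bare 10 does not, so the API server rejects the generated Prometheus object before prometheus-operator ever sees it.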

Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged, you will see the same effect for retention as well, with the error logged in the CMO log.
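
The retention field uses prometheus-operator's Duration type, and that PR adds an analogous pattern constraint to it. An illustrative sketch (the regex shown is approximate; check the CRD on your cluster for the exact form):

% oc get crd prometheuses.monitoring.coreos.com -o yaml | grep -A 5 ' retention:'
        retention:
          pattern: ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$
          type: string

Durations such as 15d or 1h30m match; a unitless 10 does not.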

Comment 3 hongyan li 2022-04-12 08:41:11 UTC
Validation of retention time will be updated to be consistent with retention size, so I changed the target release.

Comment 4 hongyan li 2022-04-13 04:52:01 UTC
Tested with OCP version 4.11.0-0.nightly-2022-04-12-072444,
Prometheus version 2.34.0,
prometheus-operator version 0.55.1.

Comment 5 Jayapriya Pai 2022-04-18 04:45:28 UTC
https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged.
Once the jsonnet dependencies are synced via https://github.com/openshift/cluster-monitoring-operator/pull/1643, invalid time-interval fields in Prometheus will be logged in the CMO logs themselves.

Comment 6 Jayapriya Pai 2022-04-22 02:41:59 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/1643 has been merged; it contains the retention time changes as well.

Comment 8 hongyan li 2022-04-24 02:46:36 UTC
Tested with payload 4.11.0-0.nightly-2022-04-23-153426.
When an invalid retention or an invalid retentionSize is given, the error message is logged in the CMO logs in both cases.
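
For anyone re-verifying, a sketch of the check in the same style as the original reproduction steps (the CMO container name is an assumption here; exact log wording may differ by version):

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 10
      retentionSize: 10
EOF
% oc -n openshift-monitoring logs deploy/cluster-monitoring-operator -c cluster-monitoring-operator | grep -i retention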

Comment 12 errata-xmlrpc 2022-08-10 11:05:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

