Bug 2073972 - Invalid retention time and invalid retention size should be validated at one place and have error log in one place for platform monitoring
Summary: Invalid retention time and invalid retention size should be validated at one place and have error log in one place for platform monitoring
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jayapriya Pai
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-11 09:04 UTC by hongyan li
Modified: 2022-08-10 11:05 UTC
CC: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:05:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID                              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata RHSA-2022:5069  0        None      None    None     2022-08-10 11:05:58 UTC

Description hongyan li 2022-04-11 09:04:11 UTC
Description of problem:
No error is logged in the prometheus-operator pod when retentionSize: 10 is given for the platform Prometheus.

Version-Release number of selected component (if applicable):
4.11.0-0.ci.test-2022-04-11-075638-ci-ln-snr6hjb-latest
PR: https://github.com/openshift/cluster-monitoring-operator/pull/1579

How reproducible:
always

Steps to Reproduce:
% oc apply -f - <<EOF                                                     
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retentionSize: 10     
EOF
configmap/cluster-monitoring-config configured

% oc -n openshift-monitoring get pod|grep prometheus-operator         
prometheus-operator-59647ffb6-bs62l            2/2     Running   0          50m
% oc -n openshift-monitoring logs prometheus-operator-8bf74c8f4-fdd6t
------
level=info ts=2022-04-11T08:54:32.788126321Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:54:46.535507251Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:54:46.535532906Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:56:28.845792055Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-04-11T08:56:28.845787915Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:59:42.490011178Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-04-11T08:59:42.497123595Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"

Actual results:
No error is logged in the prometheus-operator pod when retentionSize: 10 is given.


Expected results:
An error should be logged in the prometheus-operator pod when retentionSize: 10 is given.

Additional info:
When the invalid value 10 is given for retention time, an error appears in the prometheus-operator log.

When retention has no unit, an error is logged in the prometheus-operator log:
oc apply -f - <<EOF                                                     
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 10
EOF

% oc -n openshift-monitoring logs prometheus-operator-8bf74c8f4-fdd6t
------
level=error ts=2022-04-11T08:43:15.144698192Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:43:25.387317899Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:43:25.660735179Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:43:46.141826774Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:43:46.389486077Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:44:27.350255074Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:44:27.561131773Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:45:49.481959007Z caller=operator.go:1228 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=error ts=2022-04-11T08:45:49.716985292Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/k8s\" failed: creating config failed: generating config failed: invalid retention value specified: not a valid duration string: \"10\""
level=info ts=2022-04-11T08:46:10.46710091Z caller=operator.go:741 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"

Comment 1 hongyan li 2022-04-12 02:07:23 UTC
Retention time and retention size should be validated in the same place.

Currently, retention time is validated by prometheus-operator and retention size is validated by CMO.

Comment 2 Jayapriya Pai 2022-04-12 08:18:59 UTC
We added OpenAPI validation at the CRD level for retentionSize in prometheus-operator: https://github.com/prometheus-operator/prometheus-operator/pull/4661

That change was synced to CMO as well as part of the kube-prometheus update: https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. So retentionSize is now validated by the API server. If the value is invalid, CMO throws an error when applying the Prometheus spec; since the spec is rejected, prometheus-operator never even sees the value, hence no logging is done there.
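
For reference, the CRD-level check surfaces as a pattern constraint in the generated Prometheus CRD. A sketch of how to inspect it on a cluster (output abridged and illustrative; the exact regex depends on the prometheus-operator version):

% oc get crd prometheuses.monitoring.coreos.com -o yaml | grep -A 5 'retentionSize:'
        retentionSize:
          pattern: (^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?(B|b)$)
          type: string

A value such as retentionSize: 10GiB matches the pattern, while a bare 10 does not, so the API server rejects the generated Prometheus object before prometheus-operator ever sees it.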

Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged, you will see the same effect for retention as well, with the error logged in the CMO log.
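
The retention field uses prometheus-operator's Duration type, and that PR adds an analogous pattern constraint to it. An illustrative sketch (the regex shown is approximate; check the CRD on your cluster for the exact form):

% oc get crd prometheuses.monitoring.coreos.com -o yaml | grep -A 5 ' retention:'
        retention:
          pattern: ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$
          type: string

Durations such as 15d or 1h30m match; a unitless 10 does not.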

Comment 3 hongyan li 2022-04-12 08:41:11 UTC
Validation of retention time will be updated to be consistent with retention size, so I changed the target release.

Comment 4 hongyan li 2022-04-13 04:52:01 UTC
Tested with OCP version 4.11.0-0.nightly-2022-04-12-072444,
Prometheus version 2.34.0,
prometheus-operator version 0.55.1.

Comment 5 Jayapriya Pai 2022-04-18 04:45:28 UTC
https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged.
Once the jsonnet dependencies are synced via https://github.com/openshift/cluster-monitoring-operator/pull/1643, invalid time-interval fields in Prometheus will be logged in the CMO logs themselves.

Comment 6 Jayapriya Pai 2022-04-22 02:41:59 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/1643 has been merged; it contains the retention time changes as well.

Comment 8 hongyan li 2022-04-24 02:46:36 UTC
Tested with payload 4.11.0-0.nightly-2022-04-23-153426.
When an invalid retention or an invalid retentionSize is given, the error message is logged in the CMO logs in both cases.
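
For anyone re-verifying, a sketch of the check in the same style as the original reproduction steps (the CMO container name is an assumption here; exact log wording may differ by version):

% oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 10
      retentionSize: 10
EOF
% oc -n openshift-monitoring logs deploy/cluster-monitoring-operator -c cluster-monitoring-operator | grep -i retention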

Comment 12 errata-xmlrpc 2022-08-10 11:05:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

