Bug 2073937
| Summary: | Invalid retention time and invalid retention size should be validated at one place and have error log in one place for UMW | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | hongyan li <hongyli> |
| Component: | Monitoring | Assignee: | Jayapriya Pai <janantha> |
| Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.11 | CC: | amuller, anpicker, aos-bugs, spasquie |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:05:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
hongyan li
2022-04-11 08:04:01 UTC
On the latest OCP 4.11 without your PR, an invalid retention time value produces the following error in the log:

W0411 09:29:18.951211 1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$'

I see the error in the cluster-monitoring-operator logs. Retention time and retention size should be verified in the same place.

We added OpenAPI validation at the CRD level for retentionSize in prometheus-operator: https://github.com/prometheus-operator/prometheus-operator/pull/4661. That change was also synced to CMO as part of the kube-prometheus update: https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. As a result, retentionSize is now validated by the API server. If the value is invalid, CMO throws an error when applying the Prometheus spec. Because the spec is rejected, prometheus-operator never sees the value, so nothing is logged there. Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged, you will see the same effect for retention as well, and the error will be logged to the CMO log.

Validation of retention time will be updated to be consistent with retention size, so I changed the target release.
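To make the rejected values concrete, here is a minimal sketch of the pattern check. It only reproduces the regular expressions quoted in the CMO log messages in this report; the real validation is done by the API server against the CRD schema, whose regex engine may differ from Python's `re` in edge cases. The helper name `check_spec` is hypothetical, and `re.search` is used on the assumption that the schema pattern is matched unanchored.

```python
import re

# Patterns copied verbatim from the CMO log messages quoted in this report.
RETENTION_PATTERN = (
    r"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?"
    r"(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
)
RETENTION_SIZE_PATTERN = r"(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$"


def check_spec(retention, retention_size):
    """Hypothetical helper: return the field names whose values fail their pattern."""
    invalid = []
    if re.search(RETENTION_PATTERN, retention) is None:
        invalid.append("spec.retention")
    if re.search(RETENTION_SIZE_PATTERN, retention_size) is None:
        invalid.append("spec.retentionSize")
    return invalid


# "10" has no unit, so it fails both patterns, as in the logs above.
print(check_spec("10", "10"))      # ['spec.retention', 'spec.retentionSize']
# Values with a unit pass both patterns.
print(check_spec("15d", "10GiB"))  # []
```

A bare number such as "10" is rejected for both fields because each pattern requires a unit suffix (for example `d` or `h` for retention, `B`-terminated units for retentionSize) unless the value is exactly `0`.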
Tested with OCP version 4.11.0-0.nightly-2022-04-12-072444, Prometheus version 2.34.0, Prometheus operator version 0.55.1.

https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged. Once the jsonnet dependencies are synced (https://github.com/openshift/cluster-monitoring-operator/pull/1643), validation errors for time interval fields in Prometheus will be logged in the CMO logs themselves.

https://github.com/openshift/cluster-monitoring-operator/pull/1643 is merged; it contains the retention time changes as well.

Tested with payload 4.11.0-0.nightly-2022-04-23-153426. When both an invalid retention time and an invalid retention size are given, the errors for both are logged in CMO:

% oc -n openshift-monitoring logs cluster-monitoring-operator-bdb67f748-fwznk cluster-monitoring-operator
....
W0424 02:55:21.677523 1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: [spec.retention: Invalid value: "10": spec.retention in body should match '^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$', spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$']
I0424 02:55:21.693870 1 tasks.go:74] ran task 10 of 14: Updating prometheus-adapter

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069