Bug 1918938 - ocs-operator has Error logs with "unable to deploy Prometheus rules"
Summary: ocs-operator has Error logs with "unable to deploy Prometheus rules"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: umanga
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-21 18:10 UTC by Neha Berry
Modified: 2021-05-19 09:18 UTC (History)

Fixed In Version: ocs-registry:4.7.0-241.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:18:16 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:18:43 UTC

Description Neha Berry 2021-01-21 18:10:15 UTC
Description of problem:
=============================
Installed OCS 4.7. For the last few builds, the following messages have been appearing in the ocs-operator logs.

2021-01-20T10:29:06.409838019Z {"level":"error","ts":1611138546.409633,"logger":"controllers.StorageCluster","msg":"prometheus rules file not found","error":"'/ocs-prometheus-rules/prometheus-ocs-rules.yaml' not found"

2021-01-20T10:29:06.409838019Z {"level":"error","ts":1611138546.4096823,"logger":"controllers.StorageCluster","msg":"unable to deploy Prometheus rules","error":"failed while creating PrometheusRule: expected pointer, but got nil"



Version-Release number of selected component (if applicable):
==============================================================
OCS 4.7.0-231.ci, 4.7.0-235.ci, etc.

OCP  = 4.7.0-0.nightly-2021-01-19-095812

How reproducible:
==================
Seen in all recent builds but not sure which feature/dashboard is impacted

Steps to Reproduce:
======================
1. Install OCS 4.7 
2. Check the ocs-operator log

Actual results:
=====================
The ocs-operator log is continuously logging the error messages pasted above

Expected results:
===================
There should not be any error message


Additional info:
=======================

Comment 4 Martin Bukatovic 2021-01-22 10:43:21 UTC
After a quick discussion with Umanga, it seems that this breaks all OCS alerts, so none of them can be raised.

Comment 5 Jose A. Rivera 2021-01-25 15:18:25 UTC
This is a valid problem that should be considered a blocker. Acking for OCS 4.7.

Comment 6 Martin Bukatovic 2021-01-25 16:59:00 UTC
QE will check that there are no Prometheus errors in the operator logs and, via regression testing, that alerting works.
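
A minimal sketch of what such a log check could look like, in Python. It assumes the operator log has been saved as text in which each line is an optional timestamp followed by a JSON object, as in the description. The function name `prometheus_rule_errors`, the keyword match on "prometheus", and the sample lines are illustrative assumptions, not part of the operator or of any QE tooling.

```python
import json


def prometheus_rule_errors(log_lines):
    """Return error-level log entries whose message mentions Prometheus.

    Each line is expected to be '<timestamp> <json object>' or a bare JSON
    object. Lines that do not parse (for example, truncated ones) are skipped.
    """
    hits = []
    for line in log_lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or malformed entry
        if not isinstance(entry, dict):
            continue
        message = str(entry.get("msg", "")).lower()
        if entry.get("level") == "error" and "prometheus" in message:
            hits.append(entry)
    return hits


# Hypothetical sample lines modelled on the messages in the description.
sample = [
    "2021-01-20T10:29:06Z "
    + json.dumps({"level": "error", "msg": "prometheus rules file not found"}),
    "2021-01-20T10:29:06Z "
    + json.dumps({"level": "error", "msg": "unable to deploy Prometheus rules"}),
    "2021-01-20T10:29:07Z "
    + json.dumps({"level": "info", "msg": "Reconciling StorageCluster"}),
    '2021-01-20T10:29:08Z {"level":"error","msg":"truncated',
]

if __name__ == "__main__":
    for entry in prometheus_rule_errors(sample):
        print(entry["msg"])
```

An empty result from such a check would correspond to the first half of the verification; whether alerting works still has to be covered by the regression tests.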

Comment 9 Boris Ranto 2021-01-27 11:52:57 UTC
We might need to update the downstream Dockerfile to copy the prometheus rules.

Do you regularly update the files in

./metrics/deploy/prometheus-ocs-rules-external.yaml
./metrics/deploy/prometheus-ocs-rules.yaml

or do we need to generate them manually downstream? In other words, can we use these files directly downstream?

If they need to be generated, we will have to make some big changes to the way we build ocs-operator downstream. If that is the case, please provide the steps you use to generate these files.

Comment 10 umanga 2021-01-27 15:52:22 UTC
(In reply to Boris Ranto from comment #9)
> We might need to update the downstream Dockerfile to copy the prometheus
> rules.
> 
> Do you regularly update the files in
> 
> ./metrics/deploy/prometheus-ocs-rules-external.yaml
> ./metrics/deploy/prometheus-ocs-rules.yaml
> 

Yes, these will be updated as required for each release. No need to generate anything.

Comment 11 Boris Ranto 2021-01-27 16:16:16 UTC
In that case, this should be fixed by

http://pkgs.devel.redhat.com/cgit/containers/ocs-operator/commit/Dockerfile?h=ocs-4.7-rhel-8&id=d0099ff5a24df54e011ccba415a0c925b12e1e76

If I understand this correctly, we didn't have to do this in OCS 4.6 because these YAML files were somehow built into the source code/binary, right?
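
A rough sketch of how the presence of the rule files in a built image could be sanity-checked, in Python. The directory comes from the error message in the description. That the external rules file is copied to the same directory under its upstream name is an assumption, and `missing_rule_files` is a hypothetical helper, not something provided by the operator or the build.

```python
import tempfile
from pathlib import Path

# Directory taken from the error message in the description.
RULES_DIR = "/ocs-prometheus-rules"
# The first name appears in the log. The second is ASSUMED to be copied to
# the same directory under its upstream file name.
RULE_FILES = ("prometheus-ocs-rules.yaml", "prometheus-ocs-rules-external.yaml")


def missing_rule_files(rules_dir=RULES_DIR, names=RULE_FILES):
    """Return the expected rule files that are absent or empty in rules_dir."""
    base = Path(rules_dir)
    missing = []
    for name in names:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Demonstrate against a temporary directory instead of a real image.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / RULE_FILES[0]).write_text("groups: []\n")
        print(missing_rule_files(tmp))
```

Run inside the operator container with the default arguments, a non-empty result would point to the same condition that produced the "prometheus rules file not found" message.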

Comment 12 Boris Ranto 2021-01-28 02:22:49 UTC
This should be fixed in the latest build:

ocs-registry:4.7.0-241.ci

Comment 17 Neha Berry 2021-02-02 08:13:50 UTC
Moving the BZ to verified based on Comment#15

Also verified that the OCS, NooBaa, and Ceph alerting rules exist under UI->Monitoring->Alerting->Alerting Rules.
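
The same check could in principle be scripted rather than done in the console. The Python sketch below assumes the JSON body of the Prometheus HTTP API endpoint `/api/v1/rules` has already been fetched and parsed; fetching, authentication, and the cluster route are left out. The helper `alerting_rule_names` and the group and alert names in the sample payload are placeholders for illustration, not the actual OCS rule names.

```python
def alerting_rule_names(rules_response):
    """Map each rule group name to the alerting rules it contains.

    `rules_response` is the parsed JSON body of GET /api/v1/rules.
    Recording rules are ignored; groups without alerting rules are omitted.
    """
    if rules_response.get("status") != "success":
        raise ValueError("rules query did not succeed")
    result = {}
    for group in rules_response.get("data", {}).get("groups", []):
        names = [
            rule["name"]
            for rule in group.get("rules", [])
            if rule.get("type") == "alerting"
        ]
        if names:
            result[group["name"]] = names
    return result


# Placeholder payload in the shape returned by /api/v1/rules.
sample_response = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "example-ceph-alerts",
                "rules": [
                    {"name": "ExampleCephAlert", "type": "alerting"},
                    {"name": "example:recorded:metric", "type": "recording"},
                ],
            },
            {
                "name": "example-recording-only",
                "rules": [{"name": "example:other:metric", "type": "recording"}],
            },
        ]
    },
}

if __name__ == "__main__":
    for group, alerts in alerting_rule_names(sample_response).items():
        print(group, alerts)
```

An empty mapping for the storage-related groups would suggest the rules were not deployed, which is the situation this bug describes.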

Comment 20 errata-xmlrpc 2021-05-19 09:18:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

