Description of problem:
=============================
Installed OCS 4.7 and since last few builds, seeing this message in the ocs-operator logs.

2021-01-20T10:29:06.409838019Z {"level":"error","ts":1611138546.409633,"logger":"controllers.StorageCluster","msg":"prometheus rules file not found","error":"'/ocs-prometheus-rules/prometheus-ocs-rules.yaml' not found"
2021-01-20T10:29:06.409838019Z {"level":"error","ts":1611138546.4096823,"logger":"controllers.StorageCluster","msg":"unable to deploy Prometheus rules","error":"failed while creating PrometheusRule: expected pointer, but got nil"

Version-Release number of selected component (if applicable):
==============================================================
OCS 4.7.0-231.ci, 4.7.0-235.ci, etc.
OCP = 4.7.0-0.nightly-2021-01-19-095812

How reproducible:
==================
Seen in all recent builds, but not sure which feature/dashboard is impacted.

Steps to Reproduce:
======================
1. Install OCS 4.7
2. Check the ocs-operator log

Actual results:
=====================
The ocs-operator log is continuously logging the error messages pasted above.

Expected results:
===================
There should not be any error message.

Additional info:
=======================
After a quick discussion with Umanga, it seems that this breaks all OCS alerts, so no such alert can be raised.
This is a valid problem that should be considered a blocker. Acking for OCS 4.7.
QE will check that there are no Prometheus errors in the operator logs and, via regression testing, that alerting works.
We might need to update the downstream Dockerfile to copy the prometheus rules.

Do you regularly update the files in

./metrics/deploy/prometheus-ocs-rules-external.yaml
./metrics/deploy/prometheus-ocs-rules.yaml

or do we need to generate them manually downstream? That is, can we directly use these files downstream?

If they need to be generated, we will have to make some big changes to the way we build ocs-operator downstream. If that is the case, please provide the steps you use to generate these files.
(In reply to Boris Ranto from comment #9)
> We might need to update the downstream Dockerfile to copy the prometheus
> rules.
>
> Do you regularly update the files in
>
> ./metrics/deploy/prometheus-ocs-rules-external.yaml
> ./metrics/deploy/prometheus-ocs-rules.yaml
>

Yes, these will be updated as required for each release. No need to generate anything.
In that case, this should be fixed by

http://pkgs.devel.redhat.com/cgit/containers/ocs-operator/commit/Dockerfile?h=ocs-4.7-rhel-8&id=d0099ff5a24df54e011ccba415a0c925b12e1e76

If I understand this correctly, we didn't have to do this in OCS 4.6 because these yamls were somehow built into the source code/binary, right?
This should be fixed in the latest build: ocs-registry:4.7.0-241.ci
Moving the BZ to verified based on comment #15.

Also verified that OCS, noobaa and ceph alerting rules exist in UI->Monitoring->Alerting->Alerting Rules.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041