Description of problem (please be as detailed as possible and provide log snippets):
-----------------------------------------------------
After a period of deployment failures, the latest stable build of OCS that passed deployment is 4.7.0-228.ci.

OCS 4.7.0-228.ci was installed on a VMware dynamic cluster, and the Object Service dashboard is empty: no metrics can be viewed.

Also, since almost the beginning of OCS 4.7, we have been seeing the following error message in the ocs-operator logs, but because deployments were failing we could never check the dashboard before:

2021-01-07T14:50:39.427852375Z {"level":"error","ts":1610031039.427809,"logger":"controllers.StorageCluster","msg":"failed to reconcile metrics exporter","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"failed to retrieve metrics exporter servicemonitor openshift-storage/ocs-metrics-exporter. no kind is registered for the type v1.ServiceMonitor in scheme \"pkg/runtime/scheme.go:101\"

Note: Due to clock skew in the mons, the cluster has been in HEALTH_WARN state since the beginning, but that should not be the cause of this issue.

Version of all relevant components (if applicable):
=====================================================
OCP = 4.7.0-0.nightly-2021-01-07-034013
OCS = ocs-operator.v4.7.0-228.ci, and ocs-operator.v4.7.0-229.ci too

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
====================================================================
Yes, no MCG or RGW metrics are available in the dashboard.

Is there any workaround available to the best of your knowledge?
============================================================
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
============================

Is this issue reproducible?
=============================
Yes, observed on 2 clusters:
1. VMware dynamic - OCS build 4.7.0-228.ci
2. VMware LSO with arbiter enabled - OCS build 4.7.0-229.ci

Can this issue reproduce from the UI?
=====================================
OCS was installed via the UI.

If this is a regression, please provide more details to justify this:
=====================================================
Yes

Steps to Reproduce:
=======================
1. Install the latest OCP 4.7 nightly build.
2. For VMware dynamic, install the OCS operator, here 4.7.0-228.ci. The operator pods are created (though the pods were respun a couple of times; see bug 1909268).
3. Install the StorageCluster.
   a) The namespace openshift-storage gets labelled with [openshift.io/cluster-monitoring: "true"] once StorageCluster creation starts.
   b) All pods are up, but due to an NTP issue, Ceph was in HEALTH_WARN with clock skew. See next comment.
4. Log in to the UI and check the dashboards.

Actual results:
===================
The Object Service dashboard does not display any metrics.

Expected results:
=====================
The Object Service dashboard should display status and information for both MCG and RGW.
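The "no kind is registered for the type v1.ServiceMonitor in scheme" error quoted in the description is the kind of message a type registry returns when it is asked about a Go type it was never told about. The sketch below is a toy model of that mechanism using only the Go standard library. It is not the real Kubernetes apimachinery code, and this report does not state what the actual fix in ocs-operator was. All names in it (Scheme, AddKnownType, KindFor, the stand-in ServiceMonitor and Deployment structs, "toy-scheme") are hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// ServiceMonitor is a hypothetical stand-in for the monitoring ServiceMonitor type.
type ServiceMonitor struct{ Name string }

// Deployment is a hypothetical stand-in for a type that is registered up front.
type Deployment struct{ Name string }

// Scheme is a toy registry that maps Go types to kind names.
type Scheme struct {
	name  string
	kinds map[reflect.Type]string
}

// NewScheme returns an empty registry identified by name.
func NewScheme(name string) *Scheme {
	return &Scheme{name: name, kinds: map[reflect.Type]string{}}
}

// AddKnownType registers the Go type of obj under the given kind.
func (s *Scheme) AddKnownType(kind string, obj interface{}) {
	s.kinds[reflect.TypeOf(obj)] = kind
}

// KindFor returns the kind registered for obj's type, or an error
// when the type was never registered.
func (s *Scheme) KindFor(obj interface{}) (string, error) {
	t := reflect.TypeOf(obj)
	kind, ok := s.kinds[t]
	if !ok {
		return "", fmt.Errorf("no kind is registered for the type %v in scheme %q", t, s.name)
	}
	return kind, nil
}

func main() {
	scheme := NewScheme("toy-scheme")
	scheme.AddKnownType("Deployment", Deployment{})

	// Lookup fails: ServiceMonitor has not been registered yet.
	if _, err := scheme.KindFor(ServiceMonitor{}); err != nil {
		fmt.Println("before registration:", err)
	}

	// After registration the same lookup succeeds.
	scheme.AddKnownType("ServiceMonitor", ServiceMonitor{})
	kind, err := scheme.KindFor(ServiceMonitor{})
	fmt.Println("after registration:", kind, err)
}
```

In this model, any reconcile step that needs to look up a ServiceMonitor fails until the type is added to the registry, which is consistent with the "failed to reconcile metrics exporter" message in the log.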
I reproduced the issue with ocs-operator.v4.7.0-228.ci, where I see 2 TargetDown alerts (100% of the noobaa-mgmt/noobaa-mgmt targets in openshift-storage namespace are down, 100% of the s3/s3 targets in openshift-storage namespace are down), and no NooBaa metrics (I checked NooBaa_bucket_status) can be queried via OCP Prometheus. The Object dashboard is empty.

With 4.7.0-249.ci, there are no such TargetDown alerts, and the NooBaa_bucket_status metric is present in OCP Prometheus. There is no delay compared to other Ceph metrics. The Object dashboard reports data.

Verified
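The check described above (is NooBaa_bucket_status present in Prometheus or not) could be scripted roughly as follows. This is a minimal sketch, assuming the response body comes from the Prometheus HTTP API instant-query endpoint (/api/v1/query), which returns a JSON object with "status" and "data.result" fields. The function name metricPresent and both sample payloads are hypothetical; the payloads are not output captured from the clusters in this report, and fetching the body (route, token) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse models only the fields of a Prometheus instant-query
// response that this check needs.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string            `json:"resultType"`
		Result     []json.RawMessage `json:"result"`
	} `json:"data"`
}

// metricPresent reports whether the query response contains at least one series.
func metricPresent(body []byte) (bool, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, fmt.Errorf("decoding query response: %w", err)
	}
	if resp.Status != "success" {
		return false, fmt.Errorf("query returned status %q", resp.Status)
	}
	return len(resp.Data.Result) > 0, nil
}

func main() {
	// Hypothetical payloads: an empty result (metric missing) and one series (metric present).
	samples := []struct {
		label string
		body  string
	}{
		{"empty result", `{"status":"success","data":{"resultType":"vector","result":[]}}`},
		{"one series", `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"NooBaa_bucket_status"},"value":[1610031039,"1"]}]}}`},
	}
	for _, s := range samples {
		present, err := metricPresent([]byte(s.body))
		fmt.Println(s.label, "present:", present, "err:", err)
	}
}
```

An empty result corresponds to the behavior seen with 4.7.0-228.ci, and a non-empty one to the behavior seen with 4.7.0-249.ci.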
*** Bug 1919385 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041