Bug 1924185

Summary: Object Service Dashboard shows alerts related to "system-internal-storage-pool" in OCS 4.7
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: Multi-Cloud Object Gateway
Assignee: Jacky Albo <jalbo>
Status: CLOSED ERRATA
QA Contact: Neha Berry <nberry>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.7
CC: ebenahar, etamir, mbukatov, muagarwa, nbecker, ocs-bugs
Target Milestone: ---
Target Release: OCS 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.7.0-277.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-19 09:18:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 4 Martin Bukatovic 2021-02-05 00:31:39 UTC
I see this on a cloud platform (GCP) as well.

I also noticed suspicious events in the openshift-storage namespace, some of which could be related (I'm listing them here so that we can locate this bug via a Bugzilla search):

```
$ oc get events -n openshift-storage | grep Warning | grep -i noobaa
146m        Warning   ProvisioningFailed             persistentvolumeclaim/db-noobaa-db-pg-0                                           failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster
146m        Warning   FailedScheduling               pod/noobaa-db-pg-0                                                                0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
146m        Warning   FailedScheduling               pod/noobaa-db-pg-0                                                                0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
46s         Warning   BackingStorePhaseRejected      backingstore/noobaa-default-backing-store                                         Backing store mode: ALL_NODES_OFFLINE
28s         Warning   RejectedBackingStore           bucketclass/noobaa-default-bucket-class                                           NooBaa BackingStore "noobaa-default-backing-store" is in rejected phase
145m        Warning   Unhealthy                      pod/noobaa-endpoint-58656b7c8d-rz7sg                                              Readiness probe failed: dial tcp 10.129.2.33:6001: connect: connection refused
144m        Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
144m        Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
141m        Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           failed to get cpu utilization: did not receive metrics for any ready pods
142m        Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
```
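
For anyone triaging a similar report, the reason behind the BackingStorePhaseRejected event is usually visible in the backing store's status conditions. A minimal sketch, using the resource name from the events above:

```
$ oc describe backingstore noobaa-default-backing-store -n openshift-storage
# The Status section should show the current phase (e.g. Rejected) and a
# condition message such as "Backing store mode: ALL_NODES_OFFLINE".
```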

And I also see that, after a while, the backing stores keep switching between the Ready and Rejected phases:

```
$ noobaa -n openshift-storage backingstore list
NAME                           TYPE                   TARGET-BUCKET           PHASE      AGE        
bz1874367backingstore          google-cloud-storage   noobaabz1874367bucket   Rejected   1h2m27s    
noobaa-default-backing-store   google-cloud-storage   noobaabucketrxnty       Rejected   2h29m54s   
$ noobaa -n openshift-storage backingstore list
NAME                           TYPE                   TARGET-BUCKET           PHASE   AGE        
bz1874367backingstore          google-cloud-storage   noobaabz1874367bucket   Ready   1h2m44s    
noobaa-default-backing-store   google-cloud-storage   noobaabucketrxnty       Ready   2h30m11s   
```
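
To capture how often the phase flips between Ready and Rejected, a simple polling loop along these lines can be left running (a sketch only; the 30-second interval is an arbitrary choice):

```
$ while true; do
    date -u                                      # timestamp each sample
    noobaa -n openshift-storage backingstore list
    sleep 30
  done
```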

Comment 5 Martin Bukatovic 2021-02-05 00:42:51 UTC
To avoid confusion: the "failed to get cpu utilization" and "invalid metrics" events from comment 4 are not related to this bug; they are the known BZ 1885524.
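
When grepping the events as in comment 4, those known-unrelated HPA events can be filtered out, for example (reason names taken from the listing above):

```
$ oc get events -n openshift-storage | grep Warning | grep -i noobaa \
    | grep -vE 'FailedGetResourceMetric|FailedComputeMetricsReplicas'
```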

Comment 11 errata-xmlrpc 2021-05-19 09:18:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041
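
As a quick sanity check before reopening, the installed operator version can be compared against the fix release (Fixed In Version: v4.7.0-277.ci, Target Release: OCS 4.7.0). A minimal sketch, assuming the default openshift-storage namespace:

```
$ oc get csv -n openshift-storage \
    -o custom-columns=NAME:.metadata.name,VERSION:.spec.version
# The ocs-operator CSV should report version 4.7.0 or later to include this fix.
```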