Bug 1924185 - Object Service Dashboard shows alerts related to "system-internal-storage-pool" in OCS 4.7
Summary: Object Service Dashboard shows alerts related to "system-internal-storage-pool" in OCS 4.7
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Jacky Albo
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-02 18:24 UTC by Neha Berry
Modified: 2021-05-19 09:19 UTC
CC List: 6 users

Fixed In Version: v4.7.0-277.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:18:58 UTC
Embargoed:


Links
Github noobaa/noobaa-core pull 6388 (open): Backport to 5.7: Mongo pool to be removed from cloud_pool_stats (last updated 2021-02-24 07:20:49 UTC)
Red Hat Product Errata RHSA-2021:2041 (last updated 2021-05-19 09:19:51 UTC)

Comment 4 Martin Bukatovic 2021-02-05 00:31:39 UTC
I see this on a cloud platform (GCP) as well.

I also noticed suspicious events in the openshift-storage namespace, some of which could be related (I'm listing them here so that this bug can be located via Bugzilla search):

```
$ oc get events -n openshift-storage | grep Warning | grep -i noobaa
146m        Warning   ProvisioningFailed             persistentvolumeclaim/db-noobaa-db-pg-0                                           failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster
146m        Warning   FailedScheduling               pod/noobaa-db-pg-0                                                                0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
146m        Warning   FailedScheduling               pod/noobaa-db-pg-0                                                                0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
46s         Warning   BackingStorePhaseRejected      backingstore/noobaa-default-backing-store                                         Backing store mode: ALL_NODES_OFFLINE
28s         Warning   RejectedBackingStore           bucketclass/noobaa-default-bucket-class                                           NooBaa BackingStore "noobaa-default-backing-store" is in rejected phase
145m        Warning   Unhealthy                      pod/noobaa-endpoint-58656b7c8d-rz7sg                                              Readiness probe failed: dial tcp 10.129.2.33:6001: connect: connection refused
144m        Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
144m        Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
141m        Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           failed to get cpu utilization: did not receive metrics for any ready pods
142m        Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
```
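
A quick way to check whether those early provisioning failures were transient is to look at the PVC and pool directly. A minimal sketch, assuming the default openshift-storage namespace and the resource names shown in the events above:

```
# Check that the NooBaa DB PVC eventually bound
$ oc get pvc db-noobaa-db-pg-0 -n openshift-storage

# Check that the Ceph block pool referenced by the StorageClass now exists
$ oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage
```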

I also see that, after a while, the backing stores switch back and forth between the Ready and Rejected phases:

```
$ noobaa -n openshift-storage backingstore list
NAME                           TYPE                   TARGET-BUCKET           PHASE      AGE        
bz1874367backingstore          google-cloud-storage   noobaabz1874367bucket   Rejected   1h2m27s    
noobaa-default-backing-store   google-cloud-storage   noobaabucketrxnty       Rejected   2h29m54s   
$ noobaa -n openshift-storage backingstore list
NAME                           TYPE                   TARGET-BUCKET           PHASE   AGE        
bz1874367backingstore          google-cloud-storage   noobaabz1874367bucket   Ready   1h2m44s    
noobaa-default-backing-store   google-cloud-storage   noobaabucketrxnty       Ready   2h30m11s   
```
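
To catch the flapping without re-running the CLI by hand, the phase can be watched directly on the BackingStore CRs. A rough sketch, assuming the phase is surfaced at .status.phase (which matches the PHASE column above):

```
# Watch BackingStore phase transitions as they happen
$ oc get backingstore -n openshift-storage -w \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase

# Or poll with timestamps, to correlate flaps with the warning events
$ while true; do
    date -u +%FT%TZ
    oc get backingstore -n openshift-storage \
      -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
    sleep 30
  done
```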

Comment 5 Martin Bukatovic 2021-02-05 00:42:51 UTC
To avoid confusion: the "failed to get cpu utilization" and "invalid metrics" events from comment 4 are not related; that is known BZ 1885524.

Comment 11 errata-xmlrpc 2021-05-19 09:18:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

