Bug 1876633 - Attempt to remove pv-pool based noobaa-default-backing-store fails and makes this pool stuck in Rejected state - CLI Flow only
Summary: Attempt to remove pv-pool based noobaa-default-backing-store fails and makes ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Nimrod Becker
QA Contact: Ben Eli
URL:
Whiteboard:
Depends On: 1966231
Blocks:
 
Reported: 2020-09-07 18:27 UTC by Martin Bukatovic
Modified: 2021-08-03 18:19 UTC
CC: 7 users

Fixed In Version: v4.8.0-404.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 18:19:42 UTC
Embargoed:


Attachments
screenshot #1: list of backing stores, with noobaa-default-backing-store in Rejected state (199.19 KB, image/png)
2020-09-07 18:27 UTC, Martin Bukatovic


Links
System ID Private Priority Status Summary Last Updated
Github noobaa noobaa-operator pull 179 0 None closed Fixing BZ issues - 1781287 and 1779732 2021-06-01 07:20:43 UTC
Red Hat Product Errata RHBA-2021:3002 0 None None None 2021-08-03 18:19:44 UTC

Description Martin Bukatovic 2020-09-07 18:27:50 UTC
Created attachment 1713994 [details]
screenshot #1: list of backing stores, with noobaa-default-backing-store in Rejected state

Description of problem
======================

When one tries to remove a BackingStore hosted on a pv-pool, the attempt fails
with:

```
DeletePoolAPI cannot complete because pool "noobaa-default-backing-store" has buckets attached
```

The backing store is left stuck in the Rejected state.

Version-Release number of selected component
============================================

OCP 4.5.0-0.ci-2020-09-04-161649
OCS 4.5.0-546.ci

Full version report
-------------------

cluster channel: stable-4.5
cluster version: 4.5.0-0.ci-2020-09-04-161649
cluster image: registry.svc.ci.openshift.org/ocp/release@sha256:2664dc11a62754b04abb8d9bb67113af87084c31dba7495cd65f18e6a3a9a507

storage namespace openshift-cluster-storage-operator
image registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:4dbc8466b0f384078286dc9462bf60e4185ac30fe434d04a6988e3bfbb0a605b
 * registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:4dbc8466b0f384078286dc9462bf60e4185ac30fe434d04a6988e3bfbb0a605b
image registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:e69949554a71f58b02984e14eecbad4a79812c7295cac1dda1382751b5bf818f
 * registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:e69949554a71f58b02984e14eecbad4a79812c7295cac1dda1382751b5bf818f
image registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:29f8813cfe44206c2d66311e33075afb210f923fb6bf2c9f8f49e88c0b6e6da6
 * registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:29f8813cfe44206c2d66311e33075afb210f923fb6bf2c9f8f49e88c0b6e6da6

storage namespace openshift-kube-storage-version-migrator
image registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:ead86cb945a675f34fef0fc468a1b76fe2c26302c8fd9f830c00ead7852596c9
 * registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:ead86cb945a675f34fef0fc468a1b76fe2c26302c8fd9f830c00ead7852596c9

storage namespace openshift-kube-storage-version-migrator-operator
image registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:52a9da86b9b533da8355e6ef2183e83f12413596abfe1984ff903360eb01dfbe
 * registry.svc.ci.openshift.org/ocp/4.5-2020-09-04-161649@sha256:52a9da86b9b533da8355e6ef2183e83f12413596abfe1984ff903360eb01dfbe

storage namespace openshift-storage
image quay.io/rhceph-dev/cephcsi@sha256:fa003ab56d59653b4143cd78997677854d227f38492e4f3ddbcd0a6262381494
 * quay.io/rhceph-dev/cephcsi@sha256:a2e07bceff940cac650abca0f5eff13933fd8d147c6585d80bcb901f038614f3
image registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:424566e4a110a9fc5964f87e22ee6f87a00db2b00636780461bc5e234ca4e6e7
 * registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:424566e4a110a9fc5964f87e22ee6f87a00db2b00636780461bc5e234ca4e6e7
image registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:2d89d6d1b9bc0f8a3603bb68859617cbc9a04584933f5534362ca68ab590e8ce
 * registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:27945464c6dd60bde78052294d456d0a1d6978ec095dfc5d14660b6fb2c0b532
image registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:5d1a62e07f844d6c55a4983aeb1326c474622b0a5c886e897d1441b3ba74daa6
 * registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:2a0b6ed5bc6ee19f05b4d312a9017c831b280933aa8d2db5f9216139512f44ae
image registry.redhat.io/openshift4/ose-csi-external-resizer-rhel7@sha256:279a12fb2095c7c7f7429135317c53a3f821d6a5a7b89b2f49fc4f84d5cfba42
 * registry.redhat.io/openshift4/ose-csi-external-resizer-rhel7@sha256:279a12fb2095c7c7f7429135317c53a3f821d6a5a7b89b2f49fc4f84d5cfba42
image quay.io/rhceph-dev/mcg-core@sha256:fa9ab8465d698823e8eaa41f12384fb420819aa6c0849cbadda5e14d46495d27
 * quay.io/rhceph-dev/mcg-core@sha256:e421dbc06483936da690d775b7eeef157d52336559179ace828a8e31a91d9115
image registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:ba74027bb4b244df0b0823ee29aa927d729da33edaa20ebdf51a2430cc6b4e95
 * registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:ba74027bb4b244df0b0823ee29aa927d729da33edaa20ebdf51a2430cc6b4e95
image quay.io/rhceph-dev/mcg-operator@sha256:b38f1b44077e2f39ccc32486d517cbf0ccbeda93d3895a693f853756b6e0d6c0
 * quay.io/rhceph-dev/mcg-operator@sha256:2cd43f974f68654a7d7380f802f63ebd5ea1963d8b097e3f728da57bdcc04861
image quay.io/rhceph-dev/ocs-operator@sha256:587772f3d8fa2712c89d80ba2af62b2eaa743eb0048368ad8c4ba1c9fb2f30b1
 * quay.io/rhceph-dev/ocs-operator@sha256:587772f3d8fa2712c89d80ba2af62b2eaa743eb0048368ad8c4ba1c9fb2f30b1
image quay.io/rhceph-dev/rhceph@sha256:eafd1acb0ada5d7cf93699056118aca19ed7a22e4938411d307ef94048746cc8
 * quay.io/rhceph-dev/rhceph@sha256:3def885ad9e8440c5bd6d5c830dafdd59edf9c9e8cce0042b0f44a5396b5b0f6
image quay.io/rhceph-dev/rook-ceph@sha256:a31d1ad207868fd9e4aa98133fd6c3b8addf03b5aefa837143c758abd3b033b6
 * quay.io/rhceph-dev/rook-ceph@sha256:071e020ab4b628a86275869dd5e45ddc15e7ca531eedb318d7da9e450cdf6ed7

How reproducible
================

1/1

Steps to Reproduce
==================

1. Install an OCP/OCS cluster on GCP (this is the simplest way to get a pv-pool
   backing store out of the box)
2. Go to the OCP Console and navigate to the list of backing stores
3. Locate noobaa-default-backing-store and use the 3-dots menu to delete it
   (the equivalent CLI command is sketched below)
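
For the CLI flow this bug now tracks, a minimal equivalent of the same delete
attempt, assuming the store and namespace names from this report (the command
itself is the one later used for verification in comment 29):

```
# attempt to delete the default backing store via the noobaa CLI;
# on the affected 4.5 build this left the CR in the Rejected phase
$ noobaa backingstore delete noobaa-default-backing-store -n openshift-storage
```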

Actual results
==============

The noobaa-default-backing-store is not deleted because of the following
error:

```
DeletePoolAPI cannot complete because pool "noobaa-default-backing-store" has buckets attached
```

The backing store then remains in the Rejected state.

The same state is visible via the noobaa CLI (so it is not just a reporting
problem in the OCP Console):

```
$ noobaa backingstore list -n openshift-storage
NAME                           TYPE                   TARGET-BUCKET                                PHASE      AGE
gcp-backing-store              google-cloud-storage   mbukatov-2020-09-07-noobaa-backing-store-1   Ready      46m23s   
noobaa-default-backing-store   pv-pool                                                             Rejected   59m43s
```

Expected results
================

The noobaa-default-backing-store is not deleted, and the following error is
reported:

```
DeletePoolAPI cannot complete because pool "noobaa-default-backing-store" has buckets attached
```

The removal is never started, so the backing store remains in the Ready
state.

Additional info
===============

This is problematic because the removal gets stuck partway through, leaving
the customer to figure out what to do with such a rejected backing store.

The check that disallows the removal should either pass, in which case the
backing store is deleted, or fail, in which case the removal is not started at
all.
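
A possible way for a user to check up front whether the pool still has buckets
attached before attempting the delete: a sketch using the status command already
shown in this report plus standard oc; `obc` as the short name for
ObjectBucketClaim is an assumption about the CRDs installed by OCS.

```
# inspect the backing store phase and spec
$ noobaa backingstore status noobaa-default-backing-store -n openshift-storage

# list ObjectBucketClaims that may still place data on this pool
$ oc get obc --all-namespaces
```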

Comment 2 Martin Bukatovic 2020-09-07 18:41:54 UTC
Additional info
===============

Output of `noobaa backingstore status` command for the Rejected backing store:

```
$ noobaa backingstore status noobaa-default-backing-store -n openshift-storage
INFO[0000] ✅ Exists: BackingStore "noobaa-default-backing-store" 
ERRO[0000] ❌ BackingStore "noobaa-default-backing-store" Phase is "Rejected": ResourceInUse DeletePoolAPI cannot complete because pool "noobaa-default-backing-store" has buckets attached 

# BackingStore spec:
pvPool:
  numVolumes: 1
  resources:
    requests:
      storage: 50Gi
type: pv-pool

```
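
For anyone debugging without the noobaa CLI, the same phase should also be
readable directly from the BackingStore CR; a sketch, assuming the
`.status.phase` field path matches what the operator reports above:

```
$ oc get backingstore noobaa-default-backing-store -n openshift-storage \
    -o jsonpath='{.status.phase}{"\n"}'
Rejected
```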

Comment 4 Nimrod Becker 2020-09-08 05:53:02 UTC
As the error states, you cannot delete the pool because there are buckets attached to it and no other resources in those buckets' policies.

Would you rather the pool were deleted and all the data lost?
Also, we don't allow deletion of the last backing store; do you have more backing stores in the system?

Sounds like it works as designed and is not a bug (don't allow deleting the last backing store, and don't allow deleting a backing store and losing data when there is nowhere to move it).

Comment 5 Martin Bukatovic 2020-09-08 09:18:36 UTC
(In reply to Nimrod Becker from comment #4)
> As the error states, you cannot delete the pool because there are buckets
> attached to it and no other resources in those buckets policies.
> 
> Would you rather the pool was deleted and all the data was lost ?
> Also, we don't allow the deletion of the last BS, do you have more BS in the
> system?

I'm not saying that the backing store should have been deleted. But I'm not OK
with the fact that when the check preventing data loss fails, the deletion goes
ahead anyway and fails in an intermediate state, leaving the backing store in
the Rejected state.

Yes, I had multiple backing stores in the system, as can be seen in the screenshot.

By the way, both pieces of information were already provided in the bug report.

> Sounds like works as designed and not a bug (don't allow to delete the last
> BS and don't allow to delete a BS and get data lost if there's nowhere to
> move it to).

I disagree, and I already explained why in the original bug report.

Comment 6 Nimrod Becker 2020-09-08 09:25:17 UTC
What would you expect to happen when the user deletes via the oc command?

You cannot prevent them from deleting anything.
The CR can be either Rejected or Bound.
Leaving it Bound would seem like an error, since "nothing happened" from the user's perspective after they tried to delete it.
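
For illustration, the oc-based deletion referred to here would look roughly as
follows; a sketch, assuming the operator keeps the CR around via a finalizer
(whose exact name is not spelled out here):

```
# request deletion of the CR; --wait=false returns immediately
$ oc delete backingstore noobaa-default-backing-store -n openshift-storage --wait=false

# the CR still exists, now carrying a deletionTimestamp and a Rejected phase
$ oc get backingstore noobaa-default-backing-store -n openshift-storage \
    -o jsonpath='{.metadata.deletionTimestamp}{"  "}{.status.phase}{"\n"}'
```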

Comment 9 Nimrod Becker 2020-09-30 12:18:31 UTC
Since there was no response to the mail, probably because there is no solution,
I am closing this issue.

Comment 10 Martin Bukatovic 2020-10-02 16:27:40 UTC
(In reply to Nimrod Becker from comment #9)
> Since there was not reponse on the mail, probably since there is no
> solution. 

I don't understand what you mean. What I actually see is that:

- this is not a reasonable design choice, as explained in the bug report
- there are 2 responses from the OCP team to my email sent to the OCP list, but I don't see MCG developers participating in the discussion

Comment 28 Martin Bukatovic 2021-06-15 12:16:07 UTC
I cloned this bug as BZ 1972190, because the meaning of this bug changed; it now represents a CLI-only hack, which I don't consider a proper fix for the original problem.

As I noted in comment 22, this is not a good approach, and I'm not sure having a CLI-only fix (as represented by this bug now) was worth the effort.

The better approach would be to:

- evaluate this bug properly (this hasn't happened; instead we came up with a few incorrect ideas about the nature of the problem and a fix)
- propose https://bugzilla.redhat.com/show_bug.cgi?id=1966231#c8 as the fix for this bug
- target and implement that fix

Comment 29 Ben Eli 2021-06-16 06:38:44 UTC
$ ./usr/share/mcg/linux/noobaa backingstore delete pv-backingstore-482417423768410b83e93dce
INFO[0002] ✅ Exists: NooBaa "noobaa"
INFO[0002] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Secret "noobaa-operator"
INFO[0002] ✅ Exists: Secret "noobaa-admin"
INFO[0003] ✈️  RPC: pool.read_pool() Request: {Name:pv-backingstore-482417423768410b83e93dce}
WARN[0003] RPC: GetConnection creating connection to wss://localhost:43639/rpc/ 0xc0002858b0
INFO[0003] RPC: Connecting websocket (0xc0002858b0) &{RPC:0xc0002b2f00 Address:wss://localhost:43639/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0004] RPC: Connected websocket (0xc0002858b0) &{RPC:0xc0002b2f00 Address:wss://localhost:43639/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0004] ✅ RPC: pool.read_pool() Response OK: took 6.2ms
FATA[0004] ❌ Could not delete BackingStore "pv-backingstore-482417423768410b83e93dce" in namespace "openshift-storage" as it is being used by one or more buckets

Verified.

Comment 33 errata-xmlrpc 2021-08-03 18:19:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 RPMs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3002

