Bug 2104844 - [RDR] Cleanup of primary cluster is stuck and never completes when relocate operation is performed
Summary: [RDR] Cleanup of primary cluster is stuck and never completes when relocate operation is performed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Ilya Dryomov
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On: 2105454
Blocks:
 
Reported: 2022-07-07 09:41 UTC by Aman Agrawal
Modified: 2023-12-08 04:29 UTC
CC List: 13 users

Fixed In Version: 4.11.0-137
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2105454 2116493
Environment:
Last Closed: 2023-01-31 00:19:40 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2023:0551 0 None None None 2023-01-31 00:19:59 UTC

Comment 6 Benamar Mekhissi 2022-07-07 12:59:57 UTC
@mrajanna @idryomov I see the following. Any ideas???

rbd mirror image status on C2 is reporting an error:
```
ceph-tools-55b98f657d-h647k -- rbd mirror image status csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 --pool ocs-storagecluster-cephblockpool
csi-vol-89d68c37-fc45-11ec-824c-0a580a850227:
  global_id:   5334c11b-2d9c-47af-9c5c-f56fc31f4407
  state:       up+error
  description: incomplete local non-primary snapshot
  service:     a on dhcp161-177.lab.eng.blr.redhat.com
  last_update: 2022-07-07 12:55:18
  peer_sites:
    name: c93bfe26-f907-4492-9bb8-f6d93fdbe5a8
    state: up+stopped
    description: local image is primary
    last_update: 2022-07-07 12:55:31
```
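
The "incomplete local non-primary snapshot" state means the non-primary copy on C2 never received a complete mirror snapshot from the primary. A minimal sketch of how such an image can be inspected and resynced from the toolbox (assuming a rook-ceph-tools deployment in openshift-storage; names here are illustrative, not taken from this cluster):
```
# Detailed mirroring state for the affected image (run via the toolbox).
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd mirror image status \
  ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227

# Request a full resync of the non-primary copy: rbd-mirror deletes the
# local image and re-replicates it from the primary. Only run this on the
# non-primary (C2) side.
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd mirror image resync \
  ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227
```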

The VR on C2 is reporting a failure to disable volume replication:
```
{"level":"error","timestamp":"2022-07-07T12:25:57.113Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:198","msg":"failed to disable volume replication","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","error":"rpc error: code = InvalidArgument desc = secondary image status is up=true and state=error"}
```
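
The disable is driven by the VolumeReplication CR named in the log. A minimal sketch for inspecting that CR and its conditions, with names taken from the log lines above (assumes the csi-addons VolumeReplication CRD, as used by ODF DR):
```
# Dump the VolumeReplication CR the controller is reconciling.
oc get volumereplication busybox-pvc-85 -n busybox-workloads-5 -o yaml

# status.conditions (Completed/Degraded/Resyncing) and events show why
# the transition out of the secondary state is blocked.
oc describe volumereplication busybox-pvc-85 -n busybox-workloads-5
```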

On C1, I didn't see that we force-promoted; the promote attempts failed:
```
{"level":"info","timestamp":"2022-07-05T09:36:39.589Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:191","msg":"adding finalizer to PersistentVolumeClaim object","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","Finalizer":"replication.storage.openshift.io/pvc-protection"}
{"level":"error","timestamp":"2022-07-05T09:36:40.618Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:248","msg":"failed to promote volume","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","error":"rpc error: code = Internal desc = ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 mirrored image is not healthy. State is up=false, state=\"unknown\""}
{"level":"error","timestamp":"2022-07-05T09:36:40.618Z","logger":"controllers.VolumeReplication","caller":"controller/controller.go:298","msg":"failed to Replicate","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","ReplicationState":"primary","error":"rpc error: code = Internal desc = ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 mirrored image is not healthy. State is up=false, state=\"unknown\""}
{"level":"error","timestamp":"2022-07-05T09:36:40.624Z","logger":"controller-runtime.manager.controller.volumereplication","caller":"controller/controller.go:253","msg":"Reconciler error","reconciler group":"replication.storage.openshift.io","reconciler kind":"VolumeReplication","name":"busybox-pvc-85","namespace":"busybox-workloads-5","error":"rpc error: code = Internal desc = ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 mirrored image is not healthy. State is up=false, state=\"unknown\""}
```
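
up=false with state "unknown" generally means no rbd-mirror daemon was reporting status for the image on C1 at that time. A minimal sketch for checking daemon and pool-level mirroring health there (toolbox deployment name and pod label are assumptions based on a standard Rook/ODF install):
```
# Pool-level mirroring summary, including daemon health and per-image states.
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd mirror pool status ocs-storagecluster-cephblockpool --verbose

# Confirm the rbd-mirror daemon pod is actually running on this cluster.
oc get pods -n openshift-storage -l app=rook-ceph-rbd-mirror
```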

Comment 33 Mudit Agarwal 2022-08-11 04:58:04 UTC
Please provide doc text.

Comment 38 Mudit Agarwal 2022-08-17 13:17:31 UTC
Karthick, this means that we need to move it back to 4.11.0 and mark it ON_QA. Please confirm.

Comment 39 krishnaram Karthick 2022-08-18 14:37:28 UTC
(In reply to Mudit Agarwal from comment #38)
> Karthick, this means that we need to move it back to 4.11.0 and mark it
> ON_QA. Please confirm.

Yes, you are correct. Could you please move it to ON_QA?

Comment 60 errata-xmlrpc 2023-01-31 00:19:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551

Comment 61 Red Hat Bugzilla 2023-12-08 04:29:30 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.

