Bug 2104844 - [RDR] Cleanup of primary cluster is stuck and never completes when relocate operation is performed
Summary: [RDR] Cleanup of primary cluster is stuck and never completes when relocate operation is performed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Ilya Dryomov
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On: 2105454
Blocks:
 
Reported: 2022-07-07 09:41 UTC by Aman Agrawal
Modified: 2023-12-08 04:29 UTC
CC List: 13 users

Fixed In Version: 4.11.0-137
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2105454 2116493
Environment:
Last Closed: 2023-01-31 00:19:40 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2023:0551 0 None None None 2023-01-31 00:19:59 UTC

Comment 6 Benamar Mekhissi 2022-07-07 12:59:57 UTC
@mrajanna @idryomov I see the following. Any ideas???

rbd mirror image status on C2 is reporting an error:
```
ceph-tools-55b98f657d-h647k -- rbd mirror image status csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 --pool ocs-storagecluster-cephblockpool
csi-vol-89d68c37-fc45-11ec-824c-0a580a850227:
  global_id:   5334c11b-2d9c-47af-9c5c-f56fc31f4407
  state:       up+error
  description: incomplete local non-primary snapshot
  service:     a on dhcp161-177.lab.eng.blr.redhat.com
  last_update: 2022-07-07 12:55:18
  peer_sites:
    name: c93bfe26-f907-4492-9bb8-f6d93fdbe5a8
    state: up+stopped
    description: local image is primary
    last_update: 2022-07-07 12:55:31
```
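
The "incomplete local non-primary snapshot" state means the non-primary copy on C2 never received a complete mirror snapshot from the primary. A minimal sketch of how such an image can be inspected and resynced from the toolbox (assuming a rook-ceph-tools deployment in openshift-storage; names here are illustrative, not taken from this cluster):
```
# Detailed mirroring state for the affected image (run via the toolbox).
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd mirror image status \
  ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227

# Request a full resync of the non-primary copy: rbd-mirror deletes the
# local image and re-replicates it from the primary. Only run this on the
# non-primary (C2) side.
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd mirror image resync \
  ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227
```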

The VR on C2 is reporting a failure to disable volume replication:
```
{"level":"error","timestamp":"2022-07-07T12:25:57.113Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:198","msg":"failed to disable volume replication","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","error":"rpc error: code = InvalidArgument desc = secondary image status is up=true and state=error"}
```
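
The disable is driven by the VolumeReplication CR named in the log. A minimal sketch for inspecting that CR and its conditions, with names taken from the log lines above (assumes the csi-addons VolumeReplication CRD, as used by ODF DR):
```
# Dump the VolumeReplication CR the controller is reconciling.
oc get volumereplication busybox-pvc-85 -n busybox-workloads-5 -o yaml

# status.conditions (Completed/Degraded/Resyncing) and events show why
# the transition out of the secondary state is blocked.
oc describe volumereplication busybox-pvc-85 -n busybox-workloads-5
```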

On C1, I didn't see that we force-promoted; the promote attempts failed:
```
{"level":"info","timestamp":"2022-07-05T09:36:39.589Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:191","msg":"adding finalizer to PersistentVolumeClaim object","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","Finalizer":"replication.storage.openshift.io/pvc-protection"}
{"level":"error","timestamp":"2022-07-05T09:36:40.618Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:248","msg":"failed to promote volume","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","error":"rpc error: code = Internal desc = ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 mirrored image is not healthy. State is up=false, state=\"unknown\""}
{"level":"error","timestamp":"2022-07-05T09:36:40.618Z","logger":"controllers.VolumeReplication","caller":"controller/controller.go:298","msg":"failed to Replicate","Request.Name":"busybox-pvc-85","Request.Namespace":"busybox-workloads-5","ReplicationState":"primary","error":"rpc error: code = Internal desc = ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 mirrored image is not healthy. State is up=false, state=\"unknown\""}
{"level":"error","timestamp":"2022-07-05T09:36:40.624Z","logger":"controller-runtime.manager.controller.volumereplication","caller":"controller/controller.go:253","msg":"Reconciler error","reconciler group":"replication.storage.openshift.io","reconciler kind":"VolumeReplication","name":"busybox-pvc-85","namespace":"busybox-workloads-5","error":"rpc error: code = Internal desc = ocs-storagecluster-cephblockpool/csi-vol-89d68c37-fc45-11ec-824c-0a580a850227 mirrored image is not healthy. State is up=false, state=\"unknown\""}
```
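
up=false with state "unknown" generally means no rbd-mirror daemon was reporting status for the image on C1 at that time. A minimal sketch for checking daemon and pool-level mirroring health there (toolbox deployment name and pod label are assumptions based on a standard Rook/ODF install):
```
# Pool-level mirroring summary, including daemon health and per-image states.
oc exec -n openshift-storage deploy/rook-ceph-tools -- \
  rbd mirror pool status ocs-storagecluster-cephblockpool --verbose

# Confirm the rbd-mirror daemon pod is actually running on this cluster.
oc get pods -n openshift-storage -l app=rook-ceph-rbd-mirror
```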

Comment 33 Mudit Agarwal 2022-08-11 04:58:04 UTC
Please provide doc text.

Comment 38 Mudit Agarwal 2022-08-17 13:17:31 UTC
Karthick, this means that we need to move it back to 4.11.0 and mark it ON_QA. Please confirm.

Comment 39 krishnaram Karthick 2022-08-18 14:37:28 UTC
(In reply to Mudit Agarwal from comment #38)
> Karthick, this means that we need to move it back to 4.11.0 and mark it
> ON_QA. Please confirm.

Yes, you are correct. Could you please move it to ON_QA?

Comment 60 errata-xmlrpc 2023-01-31 00:19:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551

Comment 61 Red Hat Bugzilla 2023-12-08 04:29:30 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.

