Bug 2019931 - [DR] When Relocate action is performed PVCs are in Terminating state and workload is not moved to preferred cluster
Summary: [DR] When Relocate action is performed PVCs are in Terminating state and workload is not moved to preferred cluster
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Deepika Upadhyay
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 2011326 2030744
 
Reported: 2021-11-03 16:28 UTC by Pratik Surve
Modified: 2023-12-08 04:26 UTC (History)
CC List: 19 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Relocate action results in PVCs in Terminating state and workload is not moved to a preferred cluster
While relocating a disaster recovery (DR) protected workload, the workload might not stop on the current primary cluster and its PVCs might remain in the Terminating state. This prevents the pods and PVCs from being relocated to the preferred cluster. To recover, perform a Failover action to move the workload to the preferred cluster. The workload is recovered on the preferred cluster, but data loss is possible because the action is a failover.
Clone Of:
: 2030744 (view as bug list)
Environment:
Last Closed: 2022-03-30 12:57:48 UTC
Embargoed:


Attachments
rbd-mirror.log (from primary) (6.38 KB, text/plain) - 2021-11-04 16:13 UTC, Benamar Mekhissi
VR log from Primary (61.92 KB, text/plain) - 2021-11-04 16:14 UTC, Benamar Mekhissi
VR log from secondary (173.33 KB, text/plain) - 2021-11-04 16:14 UTC, Benamar Mekhissi
rbdplugin log from primary (2.41 MB, text/plain) - 2021-11-04 16:16 UTC, Benamar Mekhissi
csi-rbdplugin from secondary (2.30 MB, text/plain) - 2021-11-04 16:16 UTC, Benamar Mekhissi

Description Pratik Surve 2021-11-03 16:28:01 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[DR] When Relocate action is performed PVCs are in Terminating state and workload is not moved to a preferred cluster

Version of all relevant components (if applicable):

ODF version:- 4.9.0-212.ci
OCP version:- 4.9.0-0.nightly-2021-10-28-052726


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3

Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy DR cluster
2. Deploy workloads
3. Perform failover of workloads
4. Relocate the workloads (see the sketch after this list)
5. Check PVC status and workload state
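
A minimal sketch of how steps 4-5 can be driven from the CLI, assuming the workload is DR-protected by a Ramen DRPlacementControl resource; the resource name (busybox-drpc) and its namespace below are illustrative assumptions, not taken from this bug:
```
# Illustrative only: DRPC name/namespace are assumed, not from this bug report.
# Request the relocate by setting the DRPlacementControl action (step 4):
oc patch drplacementcontrol busybox-drpc -n busybox-workloads --type merge \
  -p '{"spec":{"action":"Relocate"}}'

# Then check PVC and workload state on both managed clusters (step 5):
oc get pvc,pods -n busybox-workloads
```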


Actual results:
#oc get pvc -n busybox-workloads                                          
NAME             STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
busybox-pvc      Terminating   pvc-73e64578-88e5-4b82-bde8-4831a9187e75   1Gi        RWO            ocs-storagecluster-ceph-rbd   28h
busybox-pvc-1    Terminating   pvc-0d39ff58-2bb1-4c44-92ab-6058753a6f45   5Gi        RWO            ocs-storagecluster-ceph-rbd   28h
busybox-pvc-2    Terminating   pvc-ad7fa6ce-9dd6-4cf7-809b-59810314a4fa   8Gi        RWO            ocs-storagecluster-ceph-rbd   28h
busybox-pvc-3    Terminating   pvc-05ae2d0a-77c1-4806-a969-d876bae1a6e9   10Gi       RWO            ocs-storagecluster-ceph-rbd   28h
busybox-pvc-4    Terminating   pvc-5b31ab80-cf45-47be-9958-bc971d8dd0b7   3Gi        RWO            ocs-storagecluster-ceph-rbd   28h
busybox-pvc-5    Terminating   pvc-209e19ac-6f2e-4b4c-864b-f150a3d61d27   8Gi        RWO            ocs-storagecluster-ceph-rbd   28h
mysql-pv-claim   Terminating   pvc-76502a33-701c-423c-a097-1d1078a29fd6   24Gi       RWO            ocs-storagecluster-ceph-rbd   28h
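
A hedged way to check what is holding these PVCs in Terminating: as the VR log later in this bug (comment 2) shows, the VolumeReplication controller adds a replication.storage.openshift.io/pvc-protection finalizer to protected PVCs, and a PVC stays in Terminating until its finalizers are cleared:
```
# Inspect the deletion timestamp and finalizers of one of the stuck PVCs:
oc get pvc mysql-pv-claim -n busybox-workloads \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
```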


Expected results:
Workloads should be relocated to the preferred cluster with no issues

Additional info:

Comment 2 Benamar Mekhissi 2021-11-04 16:10:45 UTC
The user requested the workload to be relocated to the preferred cluster. In summary, during the relocation the VRG requests the VR to move the VR state to secondary. During that process, we demote the image, resync if necessary, and then report success.
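
For context, a hedged sketch of how the VRG/VR state can be inspected during the relocation; the resource names assume the csi-addons VolumeReplication and Ramen VolumeReplicationGroup CRDs are installed, and the VR being named after its PVC is an assumption:
```
# Requested vs. observed replication state in the application namespace:
oc get volumereplicationgroup -n busybox-workloads
oc get volumereplication -n busybox-workloads
# Details for the affected PVC's VR (VR name assumed to match the PVC name):
oc get volumereplication mysql-pv-claim -n busybox-workloads -o yaml
```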

Our investigation revealed that the primary image was deleted due to a resync operation. The VR requested a resync due to a split-brain error, as seen from the VR log snippet on the Primary. Out of the 7 PVCs, the image for the mysql-pv-claim PVC failed with a split-brain error.
```
"2021-11-02T11:39:10.473Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:164","msg":"adding finalizer to PersistentVolumeClaim object","Request.Name":"mysql-pv-claim","Request.Namespace":"busybox-workloads","Finalizer":"replication.storage.openshift.io/pvc-protection"}
{"level":"info","timestamp":"2021-11-02T11:39:11.490Z","logger":"controllers.VolumeReplication","caller":"controller/controller.go:298","msg":"volume is marked primary","Reques
t.Name":"mysql-pv-claim","Request.Namespace":"busybox-workloads"}

"2021-11-02T13:56:21.002Z","logger":"controllers.VolumeReplication","caller":"controller/controller.go:298","msg":"volume handle","Request.Name":"my
sql-pv-claim","Request.Namespace":"busybox-workloads","VolumeHandleName":"0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241"}
{"level":"info","timestamp":"2021-11-02T13:56:22.384Z","logger":"controllers.VolumeReplication","caller":"controller/controller.go:298","msg":"volume is not ready to use","Requ
est.Name":"mysql-pv-claim","Request.Namespace":"busybox-workloads"}

"2021-11-02T13:56:37.875Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:236","msg":"failed to resync volume","Request.Name":"mysql-pv-claim","Request.Namespace":"busybox-workloads","error":"rpc error: code = Unavailable desc = awaiting initial resync due to split brain"}
```

We believe that the resync ended up deleting the local image, and the VR ended up in a "kind of" deadlock, waiting for the image to reappear with no further progress as there is no primary.
```
"2021-11-02T13:57:13.199Z","logger":"controllers.VolumeReplication","caller":"controllers/volumereplication_controller.go:198","msg":"failed to enable volume replication","Request.Name":"mysql-pv-claim","Request.Namespace":"busybox-workloads","error":"rpc error: code = NotFound desc = volume 0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241 not found"}

```

The rbd-mirror log shows the following:
```
debug 2021-11-02T13:56:35.104+0000 7fa98c2ac700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x564bbfe2b400 scan_remote_mirror_snapshots: split-brain detected: failed to find matching non-primary snapshot in remote image: local_snap_id_start=16491, local_snap_ns=[mirror state=primary (demoted), complete=1, mirror_peer_uuids=d3aaa712-4bc9-4aef-967e-56dcf3ae3585, primary_mirror_uuid=, primary_snap_id=head, last_copied_object_number=0, snap_seqs={}]
```

The error is "failed to find matching non-primary snapshot in remote image".
```
rbd snap ls csi-vol-f038968b-389c-11ec-bca9-0a580a800241 --pool=ocs-storagecluster-cephblockpool
```
shows no snapshots on either the remote cluster or the local cluster.
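
For reference, a hedged sketch of how the mirroring state and the snapshot listing can be checked from the Ceph toolbox; the pool and image names are taken from the logs above, and note that mirror snapshots live in a snapshot namespace, so listing them may require --all:
```
# Mirroring state/description as rbd reports it (pool/image from the logs above):
rbd mirror image status \
  ocs-storagecluster-cephblockpool/csi-vol-f038968b-389c-11ec-bca9-0a580a800241

# Include snapshots from all namespaces (mirror snapshots are namespaced):
rbd snap ls --all csi-vol-f038968b-389c-11ec-bca9-0a580a800241 \
  --pool=ocs-storagecluster-cephblockpool
```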

Next, we will be investigating RBD to see if we can determine the reason behind the split-brain error.

One last thing to note: on the primary cluster, the VR requested a resync due to a split-brain error (which ended up deleting the primary image). We believe that's wrong. A resync should be requested from the secondary cluster and not the primary.
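
To illustrate the direction argued for above, a hedged sketch of a resync issued against the demoted (secondary) copy from the Ceph toolbox on the secondary cluster; the image name is the one from the logs above:
```
# Sketch only: request a resync of the demoted/secondary copy so it re-syncs
# from the remaining primary (run against the secondary cluster's pool):
rbd mirror image resync \
  ocs-storagecluster-cephblockpool/csi-vol-f038968b-389c-11ec-bca9-0a580a800241
```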

-- Log files attached ---

Comment 3 Benamar Mekhissi 2021-11-04 16:13:24 UTC
Created attachment 1839887 [details]
rbd-mirror.log (from primary)

Comment 4 Benamar Mekhissi 2021-11-04 16:14:07 UTC
Created attachment 1839888 [details]
VR log from Primary

Comment 5 Benamar Mekhissi 2021-11-04 16:14:48 UTC
Created attachment 1839889 [details]
VR log from secondary

Comment 6 Benamar Mekhissi 2021-11-04 16:16:12 UTC
Created attachment 1839890 [details]
rbdplugin log from primary

Comment 7 Benamar Mekhissi 2021-11-04 16:16:58 UTC
Created attachment 1839891 [details]
csi-rbdplugin from secondary

Comment 8 Shyamsundar 2021-11-05 12:06:03 UTC
Adding one more observation from the csi-rbdplugin logs from the current primary (comment #6):

The following request is the one that detects a split-brain report from RBD, and issues a resync that ends up deleting the image with no known primary to sync from.

I1102 13:56:37.529982       1 utils.go:177] ID: 6096 Req-ID: 0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241 GRPC call: /replication.Controller/ResyncVolume

I1102 13:56:37.530223       1 utils.go:179] ID: 6096 Req-ID: 0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241 GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"5m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241"}

I1102 13:56:37.539866       1 omap.go:87] ID: 6096 Req-ID: 0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241 got omap values: (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volume.f038968b-389c-11ec-bca9-0a580a800241"): map[csi.imageid:63cb7d9a035e csi.imagename:csi-vol-f038968b-389c-11ec-bca9-0a580a800241 csi.volname:pvc-76502a33-701c-423c-a097-1d1078a29fd6 csi.volume.owner:busybox-workloads]

I1102 13:56:37.611589       1 replicationcontrollerserver.go:612] ID: 6096 Req-ID: 0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241 local status: daemon up=true, image mirroring state="error", description="split-brain" and lastUpdate=2021-11-02 13:56:35 +0000 UTC
===> NOTE: The response from RBD on image status reads "up+error" with a description of split-brain, causing the resync to be issued.

E1102 13:56:37.874855       1 utils.go:186] ID: 6096 Req-ID: 0001-0011-openshift-storage-0000000000000001-f038968b-389c-11ec-bca9-0a580a800241 GRPC error: rpc error: code = Unavailable desc = awaiting initial resync due to split brain

Further notes:
- As the image was demoted and later promoted on a peer cluster, the subsequent demote on the peer cluster should not result in a split-brain report from RBD. The status should ideally have been up+stopped (post syncing the last snapshot). See the sketch after these notes for the expected demote/promote sequence.

- As Benamar points out in comment #2, the RBD mirror daemon decides the image is in split-brain due to missing snapshots, which seems to be the issue causing the problem here.
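
A hedged sketch of the demote/promote sequence described in the first note, expressed as plain rbd commands from the Ceph toolbox; the image name is from the logs above, and the per-cluster labels are illustrative:
```
# On the current primary cluster: demote the image (a demotion snapshot is taken):
rbd mirror image demote \
  ocs-storagecluster-cephblockpool/csi-vol-f038968b-389c-11ec-bca9-0a580a800241

# After the demotion snapshot has replicated, on the preferred cluster: promote it:
rbd mirror image promote \
  ocs-storagecluster-cephblockpool/csi-vol-f038968b-389c-11ec-bca9-0a580a800241
```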

Comment 19 Mudit Agarwal 2021-12-09 05:16:42 UTC
For doc text confirmation.

Comment 24 Scott Ostapovicz 2022-02-03 14:28:12 UTC
Pratik, please confirm that this works. It can be tested with 4.10.

Comment 28 Scott Ostapovicz 2022-03-15 13:40:55 UTC
So this issue is no longer reproducible?  Removing the test blocker flag.

If this issue is no longer reproducible, please close it as such and reopen it if it occurs again.

Comment 30 Red Hat Bugzilla 2023-12-08 04:26:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

