Back to bug 2116605
| Who | When | What | Removed | Added |
|---|---|---|---|---|
| Shyamsundar | 2022-08-09 02:45:41 UTC | Status | NEW | ASSIGNED |
| Depends On | 2097511 | |||
| Ilya Dryomov | 2022-08-09 13:51:55 UTC | CC | idryomov | |
| Mudit Agarwal | 2022-08-20 02:59:13 UTC | Blocks | 2119932 | |
| Mudit Agarwal | 2022-08-20 03:00:23 UTC | Status | ASSIGNED | POST |
| Link ID | GitHub RamenDR/ramen/pull/525 | |||
| Shyamsundar | 2022-08-23 13:59:25 UTC | Doc Text | Cause: Due to a bug in the DR reconciler, during deletion of the internal VolumeReplicationGroup resource on a managed cluster from which a workload was failed over or relocated, a PVC is attempted to be protected. The resulting cleanup operation does not complete and reports the PeerReady condition on the DRPlacementControl for the application as false. Consequence: An application that was failed over or relocated cannot be relocated or failed over again due to the DRPlacementControl resource reporting its PeerReady condition as false. Workaround (if any): Before applying the workaround, determine that the cause is due to protecting a PVC during VolumeReplicationGroup deletion, as follows: - Ensure the VolumeReplicationGroup resource in the workload namespace on the managed cluster from which it was relocated or failed over has the following values: - VRG metadata.deletionTimestamp is non-zero - VRG spec.replicationState is "Secondary" - List VolumeReplication resources in the same workload namespace, and ensure each resource has the following values: - metadata.generation is 1 - spec.replicationState is "Secondary" - The VolumeReplication resource reports no status - For each VolumeReplication resource in the above state, their corresponding PVC resource (as seen in the VRG spec.dataSource field) should have the following values: - metadata.deletionTimestamp is non-zero To recover, - Remove the finalizer "volumereplicationgroups.ramendr.openshift.io/vrg-protection" from the VRG resource - Remove the finalizer "volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection" from the respective PVC resources Result: DRPlacementControl at the hub cluster reports the PeerReady condition as "true" and enables further workload relocation or failover actions (illustrative Python sketches of these checks and the finalizer removal follow the table below) | |
| Mudit Agarwal | 2022-08-23 14:20:20 UTC | Blocks | 2094357 | |
| Olive Lakra | 2022-08-23 14:55:43 UTC | Doc Text | Cause: Due to a bug in the DR reconciler, during deletion of the internal VolumeReplicationGroup resource on a managed cluster from which a workload was failed over or relocated, a PVC is attempted to be protected. The resulting cleanup operation does not complete and reports the PeerReady condition on the DRPlacementControl for the application as false. Consequence: An application that was failed over or relocated cannot be relocated or failed over again due to the DRPlacementControl resource reporting its PeerReady condition as false. Workaround (if any): Before applying the workaround, determine that the cause is due to protecting a PVC during VolumeReplicationGroup deletion, as follows: - Ensure the VolumeReplicationGroup resource in the workload namespace on the managed cluster from which it was relocated or failed over has the following values: - VRG metadata.deletionTimestamp is non-zero - VRG spec.replicationState is "Secondary" - List VolumeReplication resources in the same workload namespace, and ensure each resource has the following values: - metadata.generation is 1 - spec.replicationState is "Secondary" - The VolumeReplication resource reports no status - For each VolumeReplication resource in the above state, their corresponding PVC resource (as seen in the VRG spec.dataSource field) should have the following values: - metadata.deletionTimestamp is non-zero To recover, - Remove the finalizer "volumereplicationgroups.ramendr.openshift.io/vrg-protection" from the VRG resource - Remove the finalizer "volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection" from the respective PVC resources Result: DRPlacementControl at the hub cluster reports the PeerReady condition as "true" and enables further workload relocation or failover actions | . Volume replication group deletion is stuck on a fresh volume replication created during deletion, which is stuck as the persistent volume claim cannot be updated with a finalizer Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal `VolumeReplicationGroup` resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) is attempted to be protected. The resulting cleanup operation does not complete and reports the `PeerReady` condition on the `DRPlacementControl` for the application. As a result, the application that was failed over or relocated cannot be relocated or failed over again because the `DRPlacementControl` resource reports its `PeerReady` condition as `false`. Workaround: Before applying the workaround, determine if the cause is due to protecting a PVC during `VolumeReplicationGroup` deletion as follows: . Ensure the VolumeReplicationGroup resource in the workload namespace on the managed cluster from which it was relocated or failed over has the following values: - VRG `metadata.deletionTimestamp` is `non-zero` - VRG `spec.replicationState` is `Secondary` . List the `VolumeReplication` resources in the same workload namespace, and ensure the resources have the following values: - `metadata.generation` is `1` - `spec.replicationState` is `Secondary` - The VolumeReplication resource reports no status . For each VolumeReplication resource in the above state, their corresponding PVC resource (as seen in the VRG `spec.dataSource` field) should have the value `metadata.deletionTimestamp` as `non-zero`. To recover, remove the finalizers: - `volumereplicationgroups.ramendr.openshift.io/vrg-protection` from the VRG resource - `volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection` from the respective PVC resources Result: `DRPlacementControl` at the hub cluster reports the `PeerReady` condition as `true` and enables further workload relocation or failover actions. |
| CC | olakra | |||
| Olive Lakra | 2022-08-24 13:52:28 UTC | Doc Text | . Volume replication group deletion is stuck on a fresh volume replication created during deletion, which is stuck as the persistent volume claim cannot be updated with a finalizer Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal `VolumeReplicationGroup` resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) is attempted to be protected. The resulting cleanup operation does not complete and reports the `PeerReady` condition on the `DRPlacementControl` for the application. As a result, the application that was failed over or relocated cannot be relocated or failed over again because the `DRPlacementControl` resource reports its `PeerReady` condition as `false`. Workaround: Before applying the workaround, determine if the cause is due to protecting a PVC during `VolumeReplicationGroup` deletion as follows: . Ensure the VolumeReplicationGroup resource in the workload namespace on the managed cluster from which it was relocated or failed over has the following values: - VRG `metadata.deletionTimestamp` is `non-zero` - VRG `spec.replicationState` is `Secondary` . List the `VolumeReplication` resources in the same workload namespace, and ensure the resources have the following values: - `metadata.generation` is `1` - `spec.replicationState` is `Secondary` - The VolumeReplication resource reports no status . For each VolumeReplication resource in the above state, their corresponding PVC resource (as seen in the VRG `spec.dataSource` field) should have the value `metadata.deletionTimestamp` as `non-zero`. To recover, remove the finalizers: - `volumereplicationgroups.ramendr.openshift.io/vrg-protection` from the VRG resource - `volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection` from the respective PVC resources Result: `DRPlacementControl` at the hub cluster reports the `PeerReady` condition as `true` and enables further workload relocation or failover actions. | .Volume replication group deletion is stuck on a fresh volume replication created during deletion, which is stuck as the persistent volume claim cannot be updated with a finalizer Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal `VolumeReplicationGroup` resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) is attempted to be protected. The resulting cleanup operation does not complete and reports the `PeerReady` condition on the `DRPlacementControl` for the application. As a result, the application that was failed over or relocated cannot be relocated or failed over again because the `DRPlacementControl` resource reports its `PeerReady` condition as `false`. Workaround: Before applying the workaround, determine if the cause is due to protecting a PVC during `VolumeReplicationGroup` deletion as follows: . Ensure the VolumeReplicationGroup resource in the workload namespace on the managed cluster from which it was relocated or failed over has the following values: - VRG `metadata.deletionTimestamp` is `non-zero` - VRG `spec.replicationState` is `Secondary` . List the `VolumeReplication` resources in the same workload namespace, and ensure the resources have the following values: - `metadata.generation` is `1` - `spec.replicationState` is `Secondary` - The VolumeReplication resource reports no status . For each VolumeReplication resource in the above state, their corresponding PVC resource (as seen in the VR `spec.dataSource` field) should have the value `metadata.deletionTimestamp` as `non-zero`. To recover, remove the finalizers: - `volumereplicationgroups.ramendr.openshift.io/vrg-protection` from the VRG resource - `volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection` from the respective PVC resources Result: `DRPlacementControl` at the hub cluster reports the `PeerReady` condition as `true` and enables further workload relocation or failover actions. |
| Mudit Agarwal | 2022-09-26 23:22:38 UTC | Doc Type | If docs needed, set a value | Bug Fix |
| Doc Text | .Volume replication group deletion is stuck on a fresh volume replication created during deletion, which is stuck as the persistent volume claim cannot be updated with a finalizer Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal `VolumeReplicationGroup` resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) is attempted to be protected. The resulting cleanup operation does not complete and reports the `PeerReady` condition on the `DRPlacementControl` for the application. As a result, the application that was failed over or relocated cannot be relocated or failed over again because the `DRPlacementControl` resource reports its `PeerReady` condition as `false`. Workaround: Before applying the workaround, determine if the cause is due to protecting a PVC during `VolumeReplicationGroup` deletion as follows: . Ensure the VolumeReplicationGroup resource in the workload namespace on the managed cluster from which it was relocated or failed over has the following values: - VRG `metadata.deletionTimestamp` is `non-zero` - VRG `spec.replicationState` is `Secondary` . List the `VolumeReplication` resources in the same workload namespace, and ensure the resources have the following values: - `metadata.generation` is `1` - `spec.replicationState` is `Secondary` - The VolumeReplication resource reports no status . For each VolumeReplication resource in the above state, their corresponding PVC resource (as seen in the VR `spec.dataSource` field) should have the value `metadata.deletionTimestamp` as `non-zero`. To recover, remove the finalizers: - `volumereplicationgroups.ramendr.openshift.io/vrg-protection` from the VRG resource - `volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection` from the respective PVC resources Result: `DRPlacementControl` at the hub cluster reports the `PeerReady` condition as `true` and enables further workload relocation or failover actions. | |||
| Flags | needinfo?(kramdoss) | |||
| CC | kramdoss | |||
| Status | POST | ON_QA | ||
| krishnaram Karthick | 2022-10-12 07:53:02 UTC | QA Contact | kramdoss | prsurve |
| RHEL Program Management | 2022-10-12 07:53:12 UTC | Target Release | --- | ODF 4.12.0 |
| Sidhant Agrawal | 2022-11-08 12:57:55 UTC | Blocks | 2115507 | |
| Sunil Kumar Acharya | 2022-12-08 12:55:57 UTC | Flags | needinfo?(srangana) | |
| Shyamsundar | 2022-12-08 16:20:54 UTC | Flags | needinfo?(srangana) | needinfo?(olakra) |
| Pratik Surve | 2022-12-12 06:02:53 UTC | QA Contact | prsurve | sagrawal |
| Sidhant Agrawal | 2022-12-26 12:09:08 UTC | Status | ON_QA | VERIFIED |
| Red Hat Bugzilla | 2022-12-31 19:21:12 UTC | QA Contact | sagrawal | kramdoss |
| Red Hat Bugzilla | 2022-12-31 19:32:31 UTC | CC | pdhiran | |
| Red Hat Bugzilla | 2022-12-31 19:59:56 UTC | CC | sseshasa | |
| Red Hat Bugzilla | 2022-12-31 20:00:24 UTC | CC | olakra | |
| Red Hat Bugzilla | 2022-12-31 20:04:21 UTC | CC | amagrawa | |
| Red Hat Bugzilla | 2022-12-31 22:37:11 UTC | CC | ebenahar | |
| Red Hat Bugzilla | 2023-01-01 05:47:47 UTC | CC | srangana | |
| Assignee | srangana | nobody | ||
| Red Hat Bugzilla | 2023-01-01 06:02:19 UTC | CC | bniver | |
| Red Hat Bugzilla | 2023-01-01 08:30:05 UTC | CC | bmekhiss | |
| Red Hat Bugzilla | 2023-01-01 08:31:58 UTC | CC | kramdoss | |
| QA Contact | kramdoss | |||
| Red Hat Bugzilla | 2023-01-01 08:38:26 UTC | CC | nojha | |
| Red Hat Bugzilla | 2023-01-01 08:49:48 UTC | CC | vumrao | |
| Alasdair Kergon | 2023-01-04 04:42:51 UTC | CC | amagrawa | |
| Alasdair Kergon | 2023-01-04 04:47:42 UTC | QA Contact | sagrawal | |
| Alasdair Kergon | 2023-01-04 04:48:40 UTC | CC | bmekhiss | |
| Alasdair Kergon | 2023-01-04 04:52:56 UTC | Assignee | nobody | srangana |
| Alasdair Kergon | 2023-01-04 05:07:00 UTC | CC | kramdoss | |
| Alasdair Kergon | 2023-01-04 05:21:38 UTC | CC | nojha | |
| Alasdair Kergon | 2023-01-04 05:25:54 UTC | CC | olakra | |
| Alasdair Kergon | 2023-01-04 05:30:13 UTC | CC | pdhiran | |
| Alasdair Kergon | 2023-01-04 05:46:39 UTC | CC | srangana | |
| Alasdair Kergon | 2023-01-04 05:59:30 UTC | CC | vumrao | |
| Alasdair Kergon | 2023-01-04 06:11:25 UTC | CC | bniver | |
| Alasdair Kergon | 2023-01-04 06:41:59 UTC | CC | ebenahar | |
| Alasdair Kergon | 2023-01-04 06:56:31 UTC | CC | sseshasa | |
| Erin Donnelly | 2023-01-06 18:49:21 UTC | Blocks | 2107226 | |
| CC | edonnell | |||
| Flags | needinfo?(srangana) | |||
| Shyamsundar | 2023-01-10 18:14:16 UTC | Doc Text | Cause: Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal VolumeReplicationGroup resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) is attempted to be protected. The resulting cleanup operation does not complete and reports the PeerReady condition on the DRPlacementControl for the application as False. Consequence: The application that was failed over or relocated cannot be relocated or failed over again due to the DRPlacementControl resource reporting its PeerReady condition as false. Fix: With this update, during deletion of the internal VolumeReplicationGroup resource a PVC is not attempted to be protected again, thereby avoiding the issue of a stalled cleanup. Result: DRPlacementControl reports PeerReady as True after the cleanup completes automatically | |
| Flags | needinfo?(kramdoss) needinfo?(olakra) needinfo?(srangana) | |||
| Erin Donnelly | 2023-01-12 17:20:29 UTC | Doc Text | Cause: Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal VolumeReplicationGroup resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) is attempted to be protected. The resulting cleanup operation does not complete and reports the PeerReady condition on the DRPlacementControl for the application as False. Consequence: The application that was failed over or relocated cannot be relocated or failed over again due to the DRPlacementControl resource reporting its PeerReady condition as false. Fix: With this update, during deletion of the internal VolumeReplicationGroup resource a PVC is not attempted to be protected again, thereby avoiding the issue of a stalled cleanup. Result: DRPlacementControl reports PeerReady as True after the cleanup completes automatically | .Deleting the internal `VolumeReplicationGroup` resource from which a workload was failed over or relocated no longer causes errors Due to a bug in the disaster recovery (DR) reconciler, during deletion of the internal `VolumeReplicationGroup` resource on a managed cluster from which a workload was failed over or relocated, a persistent volume claim (PVC) was attempted to be protected. The resulting cleanup operation did not complete and would report the `PeerReady` condition on the `DRPlacementControl` for the application to be `False`. This meant the application that was failed over or relocated could not be relocated or failed over again because the `DRPlacementControl` resource was reporting its `PeerReady` condition as `False`. With this update, during deletion of the internal `VolumeReplicationGroup` resource, a PVC is not attempted to be protected again, thereby avoiding the issue of a stalled cleanup. As a result, `DRPlacementControl` reports `PeerReady` as `True` after the cleanup completes automatically. |
| Red Hat Bugzilla | 2023-01-31 23:38:23 UTC | CC | madam | |
| Rejy M Cyriac | 2023-02-08 14:06:28 UTC | Resolution | --- | CURRENTRELEASE |
| Status | VERIFIED | CLOSED | ||
| Last Closed | 2023-02-08 14:06:28 UTC | |||
| Elad | 2023-08-09 17:00:43 UTC | CC | odf-bz-bot |
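
The workaround recorded in the Doc Text entries above has two parts: first confirm that the `VolumeReplicationGroup` (VRG) deletion is stuck in the described state, then remove the Ramen protection finalizers. The following is a minimal diagnostic sketch using the Kubernetes Python client, not an official tool; the API groups and versions (`ramendr.openshift.io/v1alpha1` for the VRG, `replication.storage.openshift.io/v1alpha1` for `VolumeReplication`) and the function name are assumptions rather than details taken from the bug.

```python
# Illustrative only -- a sketch of the "determine the cause" checks from the
# workaround above, using the Kubernetes Python client (pip install kubernetes).
# Assumed API details (not stated in the bug report):
#   VRG:               ramendr.openshift.io/v1alpha1, plural "volumereplicationgroups"
#   VolumeReplication: replication.storage.openshift.io/v1alpha1, plural "volumereplications"
from kubernetes import client, config


def vrg_deletion_is_stuck(namespace: str, vrg_name: str) -> bool:
    """Return True when the VRG shows the stuck state described in the workaround:
    a deleting Secondary VRG plus fresh, status-less Secondary VolumeReplications."""
    config.load_kube_config()  # run against the managed cluster's kubeconfig
    api = client.CustomObjectsApi()

    vrg = api.get_namespaced_custom_object(
        "ramendr.openshift.io", "v1alpha1", namespace,
        "volumereplicationgroups", vrg_name)

    # VRG must be marked for deletion (non-zero deletionTimestamp) and be Secondary.
    if not vrg["metadata"].get("deletionTimestamp"):
        return False
    if vrg.get("spec", {}).get("replicationState") != "Secondary":
        return False

    # Look for VolumeReplication resources created fresh during deletion:
    # generation 1, Secondary, and no status reported yet.
    vrs = api.list_namespaced_custom_object(
        "replication.storage.openshift.io", "v1alpha1", namespace,
        "volumereplications")["items"]
    return any(
        vr["metadata"].get("generation") == 1
        and vr.get("spec", {}).get("replicationState") == "Secondary"
        and not vr.get("status")
        for vr in vrs
    )
```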
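
If the check above matches, the recovery step in the workaround is to strip the two finalizers it names from the VRG and the affected PVCs. Below is a sketch under the same assumptions, with placeholder namespace and resource names; recent versions of the Python client send a dict body as a JSON merge patch, which replaces `metadata.finalizers` with the filtered list.

```python
# Illustrative only -- removes the two protection finalizers named in the
# workaround so the stuck VRG deletion can complete. The API group/version and
# the resource names in __main__ are placeholders/assumptions, not values from
# the bug report.
from kubernetes import client, config

VRG_FINALIZER = "volumereplicationgroups.ramendr.openshift.io/vrg-protection"
PVC_FINALIZER = "volumereplicationgroups.ramendr.openshift.io/pvc-vr-protection"


def remove_vrg_finalizer(namespace: str, vrg_name: str) -> None:
    api = client.CustomObjectsApi()
    vrg = api.get_namespaced_custom_object(
        "ramendr.openshift.io", "v1alpha1", namespace,
        "volumereplicationgroups", vrg_name)
    keep = [f for f in vrg["metadata"].get("finalizers", []) if f != VRG_FINALIZER]
    # Dict bodies are sent as a JSON merge patch, replacing metadata.finalizers.
    api.patch_namespaced_custom_object(
        "ramendr.openshift.io", "v1alpha1", namespace,
        "volumereplicationgroups", vrg_name,
        {"metadata": {"finalizers": keep}})


def remove_pvc_finalizer(namespace: str, pvc_name: str) -> None:
    core = client.CoreV1Api()
    pvc = core.read_namespaced_persistent_volume_claim(pvc_name, namespace)
    keep = [f for f in (pvc.metadata.finalizers or []) if f != PVC_FINALIZER]
    core.patch_namespaced_persistent_volume_claim(
        pvc_name, namespace, {"metadata": {"finalizers": keep}})


if __name__ == "__main__":
    # Placeholder names -- substitute the real workload namespace, VRG, and PVCs.
    config.load_kube_config()
    remove_vrg_finalizer("busybox-sample", "busybox-drpc")
    remove_pvc_finalizer("busybox-sample", "busybox-pvc")
```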