Bug 2308801 - [RDR] [Hub recovery] [Neutral] When the backed-up state is not latest, deployment/pod is lost for the apps in relocated state [NEEDINFO]
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.18.0
Assignee: Benamar Mekhissi
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-08-31 09:44 UTC by Aman Agrawal
Modified: 2024-10-28 13:44 UTC (History)

Fixed In Version: 4.17.0-103
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
sheggodu: needinfo? (bmekhiss)



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | RamenDR ramen pull 1546 | 0 | None | Merged | Fix PlacementDecision Exclusion from Hub Backup Due to Missing Label | 2024-09-11 12:51:26 UTC
Github | red-hat-storage ramen pull 354 | 0 | None | open | Bug 2308801: Fix PlacementDecision Exclusion from Hub Backup Due to Missing Label | 2024-09-13 13:33:06 UTC
Red Hat Issue Tracker | OCSBZM-8920 | 0 | None | None | None | 2024-08-31 09:44:37 UTC

Description Aman Agrawal 2024-08-31 09:44:15 UTC
Description of problem (please be as detailed as possible and provide log
snippets):


Version of all relevant components (if applicable):
OCP 4.16.0-0.nightly-2024-08-29-060830
ODF 4.16.1-8
ACM 2.11.2 GA'ed
OADP 1.4.0
MCE 2.6.2
RH Gitops 1.13.1
Submariner 0.18.0
VolSync 0.10.0
ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. On an RDR setup with multiple workloads, rbd-appset(pull)/sub and cephfs-appset(pull)/sub, all in Deployed, FailedOver and Relocated state running on any one of the managed clusters, and rbd-appset(pull)/sub and cephfs-appset(pull)/sub, 1 each in Deployed state, on the other managed cluster, configure hub recovery but do not start taking new backups.
2. Before backups are taken, ensure the above state is achieved.
3. Now start taking backups and, once there are 1 or 2 successful backups, either stop the backups or increase the backup interval so that no new backup is taken while the following actions are performed.
Collect outputs and note down other observations.
4. Now failover/relocate the workloads that are already in FailedOver or Relocated state to the other cluster.
That is, move the workloads that are primary on C1 to C2 or the other way round. Leave the workloads in Deployed state as they are on both managed clusters.
5. Make sure that the latest state of the workloads and DRPCs is **NOT** backed up, as mentioned in Step 3 above. We do not want any new backups to be taken.

Collect outputs and note down other observations. 

After all the operations complete, let IOs run for some time and then
perform hub recovery by bringing active hub cluster down.

6. After moving to the new hub, ensure the DRPolicy is validated and the DRPCs are restored.
7. Check the DRPC status; it should match the last backed-up state of the DRPCs from Step 3 above, before we stopped taking backups.
8. The workloads will now try to move to a different managed cluster as per the restored DRPC state. Apps in Relocated state will go to WaitForUser and wait for user action.
9. Relocate all such apps via the ACM UI.
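
The comparison in Step 7 can be scripted. Below is a minimal sketch, assuming the DRPC table has been saved as plain text both before the hub went down and after recovery (for example from `oc get drpc -A -o wide`; the exact command behind the `drpc` alias used in this report is not shown). `parse_table` and `diff_states` are hypothetical helper names, not part of Ramen or ACM tooling. Columns are sliced by the header offsets because FAILOVERCLUSTER and DESIREDSTATE are empty for Deployed workloads, so a plain whitespace split would shift the fields.

```python
import re


def parse_table(text):
    """Parse a column-aligned table into a list of dicts keyed by header name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    # Header names may contain a single space ("START TIME", "PEER READY");
    # columns are separated by two or more spaces.
    cols = [(m.group(0), m.start()) for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for ln in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else len(ln)
            row[name] = ln[start:end].strip()  # empty cells become ""
        rows.append(row)
    return rows


def diff_states(before, after,
                fields=("PREFERREDCLUSTER", "FAILOVERCLUSTER",
                        "DESIREDSTATE", "CURRENTSTATE")):
    """Return {(namespace, name): {field: (old, new)}} for DRPCs that differ."""
    old = {(r["NAMESPACE"], r["NAME"]): r for r in before}
    changes = {}
    for r in after:
        key = (r["NAMESPACE"], r["NAME"])
        prev = old.get(key)
        if prev is None:
            continue  # DRPC not present in the earlier snapshot
        delta = {f: (prev[f], r[f]) for f in fields if prev[f] != r[f]}
        if delta:
            changes[key] = delta
    return changes
```

Usage would look like `diff_states(parse_table(open("drpc_at_backup.txt").read()), parse_table(open("drpc_after_recovery.txt").read()))`, where both file names are placeholders. An empty result means the restored DRPCs match the backed-up state for the compared fields.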


Actual results:

================================================================================================================================================================
DRPC state when backup was taken:



At date -u
Fri Aug 30 10:26:25 UTC 2024


Backups on active hub

backup
NAMESPACE                        NAME                                            AGE
open-cluster-management-backup   acm-credentials-schedule-20240830101055         14m
open-cluster-management-backup   acm-credentials-schedule-20240830101554         9m34s
open-cluster-management-backup   acm-credentials-schedule-20240830102054         4m34s
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101055    14m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101554    9m34s
open-cluster-management-backup   acm-managed-clusters-schedule-20240830102054    4m34s
open-cluster-management-backup   acm-resources-generic-schedule-20240830101055   14m
open-cluster-management-backup   acm-resources-generic-schedule-20240830101554   9m34s
open-cluster-management-backup   acm-resources-generic-schedule-20240830102054   4m34s
open-cluster-management-backup   acm-resources-schedule-20240830101055           14m
open-cluster-management-backup   acm-resources-schedule-20240830101554           9m34s
open-cluster-management-backup   acm-resources-schedule-20240830102054           4m34s
open-cluster-management-backup   acm-validation-policy-schedule-20240830101554   9m34s
open-cluster-management-backup   acm-validation-policy-schedule-20240830102054   4m34s
////////////
NAME           PHASE     MESSAGE
schedule-acm   Enabled   Velero schedules are enabled
////////////
NAME                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
backup-restore-enabled   inform               Compliant          2d12h
////////////
NAME      PHASE       LAST VALIDATED   AGE     DEFAULT
default   Available   43s              2d12h   true
////////////
NAME                             STATUS    SCHEDULE       LASTBACKUP   AGE   PAUSED
acm-credentials-schedule         Enabled   0 */99 * * *   4m38s        14m
acm-managed-clusters-schedule    Enabled   0 */99 * * *   4m38s        14m
acm-resources-generic-schedule   Enabled   0 */99 * * *   4m38s        14m
acm-resources-schedule           Enabled   0 */99 * * *   4m38s        14m
acm-validation-policy-schedule   Enabled   0 */99 * * *   4m38s        14m






Backups on passive hub

backup
NAMESPACE                        NAME                                            AGE
open-cluster-management-backup   acm-credentials-schedule-20240830101055         15m
open-cluster-management-backup   acm-credentials-schedule-20240830101554         10m
open-cluster-management-backup   acm-credentials-schedule-20240830102054         5m6s
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101055    15m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101554    10m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830102054    5m5s
open-cluster-management-backup   acm-resources-generic-schedule-20240830101055   13m
open-cluster-management-backup   acm-resources-generic-schedule-20240830101554   8m6s
open-cluster-management-backup   acm-resources-generic-schedule-20240830102054   3m6s
open-cluster-management-backup   acm-resources-schedule-20240830101055           13m
open-cluster-management-backup   acm-resources-schedule-20240830101554           8m5s
open-cluster-management-backup   acm-resources-schedule-20240830102054           3m5s
open-cluster-management-backup   acm-validation-policy-schedule-20240830101554   8m5s
open-cluster-management-backup   acm-validation-policy-schedule-20240830102054   3m5s
////////////
No resources found in open-cluster-management-backup namespace.
////////////
NAME                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
backup-restore-enabled   inform               Compliant          16h
////////////
NAME      PHASE       LAST VALIDATED   AGE     DEFAULT
default   Available   10s              3h12m   true
////////////
No resources found in open-cluster-management-backup namespace.



DRPC at active hub

drpc
////////////////////////////////
Fri Aug 30 10:26:49 UTC 2024
*******
NAMESPACE              NAME                                    AGE   PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc      38h   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       Relocated      Completed     2024-08-29T11:43:31Z   4m7.14536939s     True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc      38h   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T19:35:56Z   16.037619869s     True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc      38h   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T19:36:52Z   22.031782646s     True
busybox-workloads-13   cephfs-sub-busybox13-placement-1-drpc   38h   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-29T11:45:04Z   2m39.777750435s   True
busybox-workloads-14   cephfs-sub-busybox14-placement-1-drpc   38h   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       Relocated      Completed     2024-08-29T11:45:13Z   3m27.543350385s   True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc   38h   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T19:49:28Z   37.092845577s     True
busybox-workloads-16   cephfs-sub-busybox16-placement-1-drpc   38h   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T19:50:18Z   43.102846191s     True
busybox-workloads-9    rbd-sub-busybox9-placement-1-drpc       38h   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-29T11:43:17Z   4m40.034887668s   True
openshift-gitops       cephfs-appset-busybox5-placement-drpc   42h   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-29T11:44:45Z   2m48.484996609s   True
openshift-gitops       cephfs-appset-busybox6-placement-drpc   42h   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       Relocated      Completed     2024-08-29T11:44:50Z   5m20.161782009s   True
openshift-gitops       cephfs-appset-busybox7-placement-drpc   42h   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T16:15:35Z   45.117386476s     True
openshift-gitops       cephfs-appset-busybox8-placement-drpc   42h   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T16:16:28Z   48.103711373s     True
openshift-gitops       rbd-appset-busybox1-placement-drpc      42h   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-29T11:42:58Z   5m21.790526304s   True
openshift-gitops       rbd-appset-busybox2-placement-drpc      42h   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       Relocated      Completed     2024-08-29T11:43:05Z   6m34.107132984s   True
openshift-gitops       rbd-appset-busybox3-placement-drpc      42h   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T16:11:39Z   5.045949912s      True
openshift-gitops       rbd-appset-busybox4-placement-drpc      42h   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T16:12:36Z   1.042272661s      True




================================================================================================================================================================
DRPC state after backup was stopped:

From active hub-

drpc
////////////////////////////////
Fri Aug 30 17:12:07 UTC 2024
*******
NAMESPACE              NAME                                    AGE    PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc      45h    amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       Relocated      Completed     2024-08-30T10:33:03Z   4m31.136839895s   True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc      45h    amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T19:35:56Z   16.037619869s     True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc      45h    amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T19:36:52Z   22.031782646s     True
busybox-workloads-13   cephfs-sub-busybox13-placement-1-drpc   45h    amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       Relocated      Completed     2024-08-30T10:31:41Z   3m34.276720598s   True
busybox-workloads-14   cephfs-sub-busybox14-placement-1-drpc   45h    amagrawa-c2-28aug   amagrawa-c1-28aug   Failover       FailedOver     Completed     2024-08-30T10:31:49Z   3m11.99710934s    True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc   45h    amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T19:49:28Z   37.092845577s     True
busybox-workloads-16   cephfs-sub-busybox16-placement-1-drpc   45h    amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T19:50:18Z   43.102846191s     True
busybox-workloads-9    rbd-sub-busybox9-placement-1-drpc       45h    amagrawa-c2-28aug   amagrawa-c1-28aug   Failover       FailedOver     Completed     2024-08-30T10:32:55Z   4m25.040487678s   True
openshift-gitops       cephfs-appset-busybox5-placement-drpc   2d     amagrawa-c2-28aug   amagrawa-c1-28aug   Failover       FailedOver     Completed     2024-08-30T10:32:11Z   3m9.894131846s    True
openshift-gitops       cephfs-appset-busybox6-placement-drpc   2d     amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       Relocated      Completed     2024-08-30T10:32:18Z   3m17.106363595s   True
openshift-gitops       cephfs-appset-busybox7-placement-drpc   2d     amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T16:15:35Z   45.117386476s     True
openshift-gitops       cephfs-appset-busybox8-placement-drpc   2d     amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T16:16:28Z   48.103711373s     True
openshift-gitops       rbd-appset-busybox1-placement-drpc      2d1h   amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       Relocated      Completed     2024-08-30T10:33:24Z   5m26.300788158s   True
openshift-gitops       rbd-appset-busybox2-placement-drpc      2d1h   amagrawa-c2-28aug   amagrawa-c1-28aug   Failover       FailedOver     Completed     2024-08-30T10:33:32Z   4m48.836673355s   True
openshift-gitops       rbd-appset-busybox3-placement-drpc      2d1h   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-28T16:11:39Z   5.045949912s      True
openshift-gitops       rbd-appset-busybox4-placement-drpc      2d1h   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-28T16:12:36Z   1.042272661s      True
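
The START TIME values of the moved workloads in this table (2024-08-30T10:31:41Z to 2024-08-30T10:33:32Z) are later than the newest backup, whose name ends in 20240830102054. That ordering can be checked with a short script. This is only a sketch, assuming the 14-digit suffix of a backup name is a UTC timestamp in YYYYMMDDHHMMSS form; `backup_time` and `started_after_last_backup` are hypothetical helper names.

```python
from datetime import datetime, timezone


def backup_time(name):
    """Return the UTC time encoded in the 14-digit suffix of a backup name."""
    stamp = name.rsplit("-", 1)[1]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def started_after_last_backup(backup_names, start_time):
    """True if a DRPC START TIME (RFC 3339, 'Z' suffix) is newer than every backup."""
    last = max(backup_time(n) for n in backup_names)
    started = datetime.strptime(start_time, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc)
    return started > last


backups = [
    "acm-resources-schedule-20240830101055",
    "acm-resources-schedule-20240830101554",
    "acm-resources-schedule-20240830102054",
]
# START TIME of rbd-sub-busybox10-placement-1-drpc after the second relocate
print(started_after_last_backup(backups, "2024-08-30T10:33:03Z"))
```

A True result for every moved workload means none of those actions is captured in the backups that the passive hub later restores.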



group
******************************
Fri Aug 30 17:12:12 UTC 2024
*******
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-10
    namespace: busybox-workloads-10
      namespace: busybox-workloads-10
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-10
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-11
    namespace: busybox-workloads-11
      namespace: busybox-workloads-11
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-11
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-12
    namespace: busybox-workloads-12
      namespace: busybox-workloads-12
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-12
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-13
    namespace: busybox-workloads-13
      namespace: busybox-workloads-13
    lastGroupSyncTime: "2024-08-30T17:10:45Z"
        namespace: busybox-workloads-13
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-14
    namespace: busybox-workloads-14
      namespace: busybox-workloads-14
    lastGroupSyncTime: "2024-08-30T17:10:45Z"
        namespace: busybox-workloads-14
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-15
    namespace: busybox-workloads-15
      namespace: busybox-workloads-15
    lastGroupSyncTime: "2024-08-30T17:10:57Z"
        namespace: busybox-workloads-15
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-16
    namespace: busybox-workloads-16
      namespace: busybox-workloads-16
    lastGroupSyncTime: "2024-08-30T17:11:05Z"
        namespace: busybox-workloads-16
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-9
    namespace: busybox-workloads-9
      namespace: busybox-workloads-9
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-9
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-5
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:10:46Z"
        namespace: busybox-workloads-5
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-6
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:10:48Z"
        namespace: busybox-workloads-6
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-7
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:11:02Z"
        namespace: busybox-workloads-7
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-8
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:11:13Z"
        namespace: busybox-workloads-8
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-1
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-1
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-2
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-2
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-3
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-3
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-4
    namespace: openshift-gitops
      namespace: openshift-gitops
    lastGroupSyncTime: "2024-08-30T17:10:00Z"
        namespace: busybox-workloads-4


date -u
Fri Aug 30 17:12:15 UTC 2024


backup
NAMESPACE                        NAME                                            AGE
open-cluster-management-backup   acm-credentials-schedule-20240830101055         7h1m
open-cluster-management-backup   acm-credentials-schedule-20240830101554         6h56m
open-cluster-management-backup   acm-credentials-schedule-20240830102054         6h51m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101055    7h1m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101554    6h56m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830102054    6h51m
open-cluster-management-backup   acm-resources-generic-schedule-20240830101055   7h1m
open-cluster-management-backup   acm-resources-generic-schedule-20240830101554   6h56m
open-cluster-management-backup   acm-resources-generic-schedule-20240830102054   6h51m
open-cluster-management-backup   acm-resources-schedule-20240830101055           7h1m
open-cluster-management-backup   acm-resources-schedule-20240830101554           6h56m
open-cluster-management-backup   acm-resources-schedule-20240830102054           6h51m
////////////
NAME           PHASE     MESSAGE
schedule-acm   Enabled   Velero schedules are enabled
////////////
NAME                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
backup-restore-enabled   inform               NonCompliant       2d19h
////////////
NAME      PHASE       LAST VALIDATED   AGE     DEFAULT
default   Available   37s              2d19h   true
////////////
NAME                             STATUS    SCHEDULE       LASTBACKUP   AGE    PAUSED
acm-credentials-schedule         Enabled   0 */99 * * *   6h51m        7h1m
acm-managed-clusters-schedule    Enabled   0 */99 * * *   6h51m        7h1m
acm-resources-generic-schedule   Enabled   0 */99 * * *   6h51m        7h1m
acm-resources-schedule           Enabled   0 */99 * * *   6h51m        7h1m
acm-validation-policy-schedule   Enabled   0 */99 * * *   6h51m        7h1m





From passive hub-


backup
NAMESPACE                        NAME                                            AGE
open-cluster-management-backup   acm-credentials-schedule-20240830101055         7h1m
open-cluster-management-backup   acm-credentials-schedule-20240830101554         6h56m
open-cluster-management-backup   acm-credentials-schedule-20240830102054         6h51m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101055    7h1m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830101554    6h56m
open-cluster-management-backup   acm-managed-clusters-schedule-20240830102054    6h51m
open-cluster-management-backup   acm-resources-generic-schedule-20240830101055   6h59m
open-cluster-management-backup   acm-resources-generic-schedule-20240830101554   6h54m
open-cluster-management-backup   acm-resources-generic-schedule-20240830102054   6h49m
open-cluster-management-backup   acm-resources-schedule-20240830101055           6h59m
open-cluster-management-backup   acm-resources-schedule-20240830101554           6h54m
open-cluster-management-backup   acm-resources-schedule-20240830102054           6h49m
////////////
No resources found in open-cluster-management-backup namespace.
////////////
NAME                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
backup-restore-enabled   inform               NonCompliant       23h
////////////
NAME      PHASE       LAST VALIDATED   AGE   DEFAULT
default   Available   22s              9h    true
////////////
No resources found in open-cluster-management-backup namespace.



================================================================================
Active hub was brought down around Fri Aug 30 17:14:12 UTC 2024
================================================================================

================================================================================
Restored backups on passive hub at around date -u
Fri Aug 30 17:23:02 UTC 2024
================================================================================

================================================================================
DRpolicy got validated at about date -u
Fri Aug 30 17:24:28 UTC 2024
================================================================================

DRPC state after hub recovery:

From new active hub-


drpc
////////////////////////////////
Fri Aug 30 21:51:31 UTC 2024
*******
NAMESPACE              NAME                                    AGE     PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION            PEER READY
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc      4h30m   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       WaitForUser    Paused                                                   True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc      4h30m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:36Z   898.25644ms         True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc      4h30m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   997.389937ms        True
busybox-workloads-13   cephfs-sub-busybox13-placement-1-drpc   4h30m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:24:59Z   2m37.256839681s     True
busybox-workloads-14   cephfs-sub-busybox14-placement-1-drpc   4h30m   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       WaitForUser    Paused                                                   True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc   4h30m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   1.797831304s        True
busybox-workloads-16   cephfs-sub-busybox16-placement-1-drpc   4h30m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   1.297596905s        True
busybox-workloads-9    rbd-sub-busybox9-placement-1-drpc       4h30m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:24:37Z   1h4m2.552357112s    True
openshift-gitops       cephfs-appset-busybox5-placement-drpc   4h30m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:25:00Z   2m36.525322643s     True
openshift-gitops       cephfs-appset-busybox6-placement-drpc   4h30m   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       WaitForUser    Paused                                                   True
openshift-gitops       cephfs-appset-busybox7-placement-drpc   4h30m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   996.947009ms        True
openshift-gitops       cephfs-appset-busybox8-placement-drpc   4h30m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:25:02Z   597.880944ms        True
openshift-gitops       rbd-appset-busybox1-placement-drpc      4h30m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:24:36Z   1h3m32.768428942s   True
openshift-gitops       rbd-appset-busybox2-placement-drpc      4h30m   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       WaitForUser    Paused                                                   True
openshift-gitops       rbd-appset-busybox3-placement-drpc      4h30m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:36Z   297.132956ms        True
openshift-gitops       rbd-appset-busybox4-placement-drpc      4h30m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   1.498228814s        True

================================================================================

Then I relocated the apps below via the ACM UI at around Fri Aug 30 21:53:30 UTC 2024

(This was after the eviction timeout period of 1 hr had passed)

rbd-sub-busybox10-placement-1-drpc
cephfs-sub-busybox14-placement-1-drpc
cephfs-appset-busybox6-placement-drpc
rbd-appset-busybox2-placement-drpc


DRPC state after relocate-



drpc
////////////////////////////////
Fri Aug 30 22:14:38 UTC 2024
*******
NAMESPACE              NAME                                    AGE     PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION            PEER READY
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc      4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       Relocated      Completed                                                True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc      4h53m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:36Z   898.25644ms         True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc      4h53m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   997.389937ms        True
busybox-workloads-13   cephfs-sub-busybox13-placement-1-drpc   4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:24:59Z   2m37.256839681s     True
busybox-workloads-14   cephfs-sub-busybox14-placement-1-drpc   4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       WaitForUser    Paused                                                   True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc   4h53m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   1.797831304s        True
busybox-workloads-16   cephfs-sub-busybox16-placement-1-drpc   4h53m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   1.297596905s        True
busybox-workloads-9    rbd-sub-busybox9-placement-1-drpc       4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:24:37Z   1h4m2.552357112s    True
openshift-gitops       cephfs-appset-busybox5-placement-drpc   4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:25:00Z   2m36.525322643s     True
openshift-gitops       cephfs-appset-busybox6-placement-drpc   4h53m   amagrawa-c2-28aug   amagrawa-c1-28aug   Relocate       WaitForUser    Paused                                                   True
openshift-gitops       cephfs-appset-busybox7-placement-drpc   4h53m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   996.947009ms        True
openshift-gitops       cephfs-appset-busybox8-placement-drpc   4h53m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:25:02Z   597.880944ms        True
openshift-gitops       rbd-appset-busybox1-placement-drpc      4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Failover       FailedOver     Completed     2024-08-30T17:24:36Z   1h3m32.768428942s   True
openshift-gitops       rbd-appset-busybox2-placement-drpc      4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       WaitForUser    Paused                                                   True
openshift-gitops       rbd-appset-busybox3-placement-drpc      4h53m   amagrawa-c1-28aug                                      Deployed       Completed     2024-08-30T17:24:36Z   297.132956ms        True
openshift-gitops       rbd-appset-busybox4-placement-drpc      4h53m   amagrawa-c2-28aug                                      Deployed       Completed     2024-08-30T17:24:59Z   1.498228814s        True


================================================================================================================================================================



There was no change in the DRPC state for the following workloads:
cephfs-sub-busybox14-placement-1-drpc
cephfs-appset-busybox6-placement-drpc
rbd-appset-busybox2-placement-drpc


For rbd-sub-busybox10-placement-1-drpc, Relocate was marked as Completed but the workload still had issues.


NAMESPACE              NAME                                    AGE     PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION            PEER READY
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc      4h53m   amagrawa-c1-28aug   amagrawa-c2-28aug   Relocate       Relocated      Completed                                                True



C1-

busybox-10
Already on project "busybox-workloads-10" on server "https://api.amagrawa-c1-28aug.qe.rh-ocs.com:6443".
NAME                                   STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE   VOLUMEMODE
persistentvolumeclaim/busybox-pvc-41   Terminating   pvc-6c950553-71ad-4eb6-ac17-7ada3c5a7c2c   42Gi       RWO            ocs-storagecluster-ceph-rbd   <unset>                 22h   Filesystem

NAME                                                                AGE   VOLUMEREPLICATIONCLASS                  PVCNAME          DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc-41   22h   rbd-volumereplicationclass-1625360775   busybox-pvc-41   primary        Primary

NAME                                                                             DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/rbd-sub-busybox10-placement-1-drpc   primary        Primary

NAME                              READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
pod/busybox-41-5c55b45d49-6vb7h   0/1     Pending   0          11h   <none>   <none>   <none>           <none>


Here the PVC is in Terminating state and the pod is Pending


oc get deploy
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
busybox-41   0/1     1            0           11h


From C2- NA



Other outputs:

C1-

busybox-14
Now using project "busybox-workloads-14" on server "https://api.amagrawa-c1-28aug.qe.rh-ocs.com:6443".
NAME                                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                    VOLUMEATTRIBUTESCLASS   AGE     VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1               Bound    pvc-3bd6202c-5aba-478b-86f4-571d9adb9334   94Gi       RWX            ocs-storagecluster-cephfs       <unset>                 2d13h   Filesystem
persistentvolumeclaim/volsync-busybox-pvc-1-src   Bound    pvc-b79186e4-a159-4896-8571-964fd041e5cc   94Gi       ROX            ocs-storagecluster-cephfs-vrg   <unset>                 28s     Filesystem

NAME                                                                                DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/cephfs-sub-busybox14-placement-1-drpc   primary        Primary

NAME                                            READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod/volsync-rsync-tls-src-busybox-pvc-1-cttmh   1/1     Running   0          29s   10.129.3.110   compute-0   <none>           <none>


Here the deployment/pod is lost

oc get deploy
No resources found in busybox-workloads-14 namespace.


C2-

busybox-14
Now using project "busybox-workloads-14" on server "https://api.amagrawa-c2-28aug.qe.rh-ocs.com:6443".
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE     VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1   Bound    pvc-9e4f5561-4deb-43e2-9c13-eb98895fe550   94Gi       RWX            ocs-storagecluster-cephfs   <unset>                 2d13h   Filesystem

NAME                                                                                DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/cephfs-sub-busybox14-placement-1-drpc   secondary      Secondary

NAME                                            READY   STATUS    RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
pod/volsync-rsync-tls-dst-busybox-pvc-1-bnbd7   1/1     Running   0          4m59s   10.128.2.204   compute-1   <none>           <none>




C1-


busybox-6
Now using project "busybox-workloads-6" on server "https://api.amagrawa-c1-28aug.qe.rh-ocs.com:6443".
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE     VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1   Bound    pvc-0e84e495-fe6e-45b1-8c27-9f4c92055b29   94Gi       RWX            ocs-storagecluster-cephfs   <unset>                 2d17h   Filesystem

NAME                                                                                DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/cephfs-appset-busybox6-placement-drpc   primary        Primary


Here the deployment/pod is lost


oc get deploy
No resources found in busybox-workloads-6 namespace.

C2-

busybox-6
Now using project "busybox-workloads-6" on server "https://api.amagrawa-c2-28aug.qe.rh-ocs.com:6443".
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE     VOLUMEMODE
persistentvolumeclaim/busybox-pvc-1   Bound    pvc-867dedb7-d424-43ab-bf0e-ec213fa6b295   94Gi       RWX            ocs-storagecluster-cephfs   <unset>                 2d17h   Filesystem

NAME                                                                                DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/cephfs-appset-busybox6-placement-drpc   secondary      Secondary

NAME                                            READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
pod/volsync-rsync-tls-dst-busybox-pvc-1-bfx9r   1/1     Running   0          56s   10.128.2.213   compute-1   <none>           <none>



C1-

busybox-2
Now using project "busybox-workloads-2" on server "https://api.amagrawa-c1-28aug.qe.rh-ocs.com:6443".
NAME                                   STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE   VOLUMEMODE
persistentvolumeclaim/busybox-pvc-41   Terminating   pvc-fe3f096a-323f-4592-93a8-376249d948f2   42Gi       RWO            ocs-storagecluster-ceph-rbd   <unset>                 22h   Filesystem

NAME                                                                AGE   VOLUMEREPLICATIONCLASS                  PVCNAME          DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc-41   22h   rbd-volumereplicationclass-1625360775   busybox-pvc-41   primary        Primary

NAME                                                                             DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/rbd-appset-busybox2-placement-drpc   primary        Primary


Here the PVC is in Terminating state and the deployment/pod is lost


oc get deploy
No resources found in busybox-workloads-2 namespace.



C2- NA


Expected results: For the apps mentioned above, the deployment/pod should not be lost, and the apps should be relocated successfully when a relocate operation is triggered after hub recovery.
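
The expectation above can be written as a per-namespace check. This is only a sketch under stated assumptions: NamespaceSnapshot and diagnose are hypothetical names, and the fields are assumed to be filled in by hand (or by a script) from the `oc get` outputs of the cluster being inspected. It encodes the two symptoms seen in this report: a Primary VRG with no deployment, and a PVC in Terminating state.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceSnapshot:
    """Hypothetical summary of one workload namespace on one managed cluster."""
    namespace: str
    vrg_state: str          # CURRENTSTATE of the VRG, e.g. "Primary" or "Secondary"
    deployments: int        # number of deployments listed by `oc get deploy`
    pvc_statuses: list = field(default_factory=list)  # STATUS column of each PVC


def diagnose(snap):
    """Return the list of symptoms from this report that the snapshot exhibits."""
    problems = []
    # A cluster holding the Primary VRG is expected to run the workload.
    if snap.vrg_state == "Primary" and snap.deployments == 0:
        problems.append("deployment/pod lost")
    if any(status == "Terminating" for status in snap.pvc_statuses):
        problems.append("PVC in Terminating state")
    return problems
```

For example, busybox-workloads-2 on C1 (Primary VRG, no deployment, PVC Terminating) would report both symptoms, while busybox-workloads-14 on C2 (Secondary VRG, PVC Bound) would report none.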

Additional info:

Logs collected after triggering the relocate operation post hub recovery:

http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/31aug24/

Comment 9 Sunil Kumar Acharya 2024-09-18 12:06:54 UTC
Please update the RDT flag/text appropriately.

