Bug 2203182
| Summary: | [MDR] After hub recovery ApplicationSet apps are not present | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | avdhoot <asagare> |
| Component: | odf-dr | Assignee: | Benamar Mekhissi <bmekhiss> |
| odf-dr sub component: | unclassified | QA Contact: | krishnaram Karthick <kramdoss> |
| Status: | ASSIGNED --- | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | akrai, amagrawa, bmekhiss, fxiang, hnallurv, jpacker, kseeger, muagarwa, odf-bz-bot, pbyregow, thnguyen, vbirsan |
| Version: | 4.13 | Flags: | akrai: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
avdhoot
2023-05-11 12:57:13 UTC
vbirsan (comment #13):

1. Can you attach the yaml file for the acm-resources-schedule backup restored on the new hub?
   The name is shown under the Restore status, in the veleroResourcesRestoreName property. The name is prefixed by your Restore resource name, so strip that prefix when computing the backup name (in the attached image, the backup name you are looking for is acm-resources-schedule-202203.. etc).
   I want to verify that the spec.includedResources section includes argoproj.io.applicationset. This would confirm that the backup is looking for this type of resource and backing it up.

2. If the spec in 1. above shows this CRD is backed up, then check when your ApplicationSet was created on the initial hub and when the backup was taken (is the ApplicationSet creation timestamp before the backup creation time?).

3. Can you let us know what ACM version you are using?
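For reference, a minimal sketch of reading that status field with oc and deriving the backup name (the Restore name restore-acm and the namespace are taken from later in this thread; adjust as needed):

oc get restore.cluster.open-cluster-management.io restore-acm \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.veleroResourcesRestoreName}{"\n"}'
# prints e.g. restore-acm-acm-resources-schedule-20230511120040; stripping the
# leading Restore name "restore-acm-" leaves the Velero backup name,
# acm-resources-schedule-20230511120040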
(In reply to vbirsan from comment #13)
> 1. Can you attach the yaml file for the acm-resources-schedule backup
> restored on the new hub
> The name is shown under the Restore status, on the
> veleroResourcesRestoreName property - the name is prefixed by your Restore
> resource name, so take that out when computing the name ( in the attached
> image, the backup name you are looking for is
> acm-resources-schedule-202203.. etc )
>
> I want to verify that the spec.includedResources section include the
> argoproj.io.applicationset
> This would confirm that the backup is looking for this type of resource and
> backing it up
>
> 2. If the spec in 1. above shows this CRD is backed up, then check when your
> ApplicationSet was created on the initial hub and when backup was taken ( is
> the ApplicationSet create timestamp before the backup create time ? )
>
> 3. Can you let us know what ACM version are you using

Hello, I checked Avdhoot's cluster.

1. Restore yaml:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  creationTimestamp: '2023-05-11T12:15:45Z'
  generation: 1
  managedFields:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      'f:spec':
        .: {}
        'f:cleanupBeforeRestore': {}
        'f:veleroCredentialsBackupName': {}
        'f:veleroManagedClustersBackupName': {}
        'f:veleroResourcesBackupName': {}
    manager: kubectl-create
    operation: Update
    time: '2023-05-11T12:15:45Z'
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      'f:status':
        .: {}
        'f:lastMessage': {}
        'f:phase': {}
        'f:veleroCredentialsRestoreName': {}
        'f:veleroManagedClustersRestoreName': {}
        'f:veleroResourcesRestoreName': {}
    manager: manager
    operation: Update
    subresource: status
    time: '2023-05-11T12:16:35Z'
  name: restore-acm
  namespace: open-cluster-management-backup
  resourceVersion: '8420332'
  uid: 3bff736d-33d3-4cde-963f-0f7641cd119d
spec:
  cleanupBeforeRestore: CleanupRestored
  veleroCredentialsBackupName: latest
  veleroManagedClustersBackupName: latest
  veleroResourcesBackupName: latest
status:
  lastMessage: All Velero restores have run successfully
  phase: Finished
  veleroCredentialsRestoreName: restore-acm-acm-credentials-schedule-20230511120040
  veleroManagedClustersRestoreName: restore-acm-acm-managed-clusters-schedule-20230511120040
  veleroResourcesRestoreName: restore-acm-acm-resources-schedule-20230511120040

======>>> schedule is acm-resources-schedule-20230511120040

Its corresponding backup yaml, i.e. acm-resources-schedule-20230511120040:

includeClusterResources: true
includedResources:
- managedproxyserviceresolver.proxy.open-cluster-management.io
- managedproxyconfiguration.proxy.open-cluster-management.io
- clusterstatus.proxy.open-cluster-management.io
- subscription.apps.open-cluster-management.io
- deployable.apps.open-cluster-management.io
- channel.apps.open-cluster-management.io
- helmrelease.apps.open-cluster-management.io
- placementrule.apps.open-cluster-management.io
- gitopscluster.apps.open-cluster-management.io
- subscriptionstatus.apps.open-cluster-management.io
- subscriptionreport.apps.open-cluster-management.io
- managedclusterset.cluster.open-cluster-management.io
- managedclustersetbinding.cluster.open-cluster-management.io
- placementdecision.cluster.open-cluster-management.io
- placement.cluster.open-cluster-management.io
- addonplacementscore.cluster.open-cluster-management.io
- userpreference.console.open-cluster-management.io
- discoveryconfig.discovery.open-cluster-management.io
- policy.policy.open-cluster-management.io
- placementbinding.policy.open-cluster-management.io
- configurationpolicy.policy.open-cluster-management.io
- iampolicy.policy.open-cluster-management.io
- certificatepolicy.policy.open-cluster-management.io
- policyset.policy.open-cluster-management.io
- policyautomation.policy.open-cluster-management.io
- addondeploymentconfig.addon.open-cluster-management.io
- applicationset.argoproj.io
  ================> applicationset.argoproj.io is backed up. You mentioned it should be 'argoproj.io.applicationset' in comment#13, guessing it is a typo?
- appproject.argoproj.io
- argocd.argoproj.io
- application.argoproj.io
- observatorium.core.observatorium.io
- managedclusterimageregistry.imageregistry.open-cluster-management.io
- submarinerdiagnoseconfig.submarineraddon.open-cluster-management.io
- submarinerconfig.submarineraddon.open-cluster-management.io
- managedclusteraction.action.open-cluster-management.io
- infraenv.agent-install.openshift.io
- hypershiftagentserviceconfig.agent-install.openshift.io
- agentserviceconfig.agent-install.openshift.io
- agentclassification.agent-install.openshift.io
- nmstateconfig.agent-install.openshift.io
- agent.agent-install.openshift.io
- application.app.k8s.io
- multiclusterobservability.observability.open-cluster-management.io
- managedclusterview.view.open-cluster-management.io
- managedclusterset.clusterview.open-cluster-management.io
labelSelector:
  matchExpressions:
  - key: policy.open-cluster-management.io/root-policy
    operator: DoesNotExist
  - key: cluster.open-cluster-management.io/backup
    operator: DoesNotExist
metadata: {}
storageLocation: dpa-1
ttl: 2h0m0s
status:
  completionTimestamp: "2023-05-11T12:04:02Z"
  expiration: "2023-05-11T14:02:58Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 65
    totalItems: 65
  startTimestamp: "2023-05-11T12:02:58Z"
  version: 1

2. The ApplicationSet app creation time is '11 May 2023, 11:06 UTC', so the app creation time is before the time the backup was taken, which is "2023-05-11T12:04:02Z".

One observation: the DRPCs of the created appset apps (shown below) are restored, but the apps themselves are not present in ACM on the new hub.

NAMESPACE          NAME                                AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
openshift-gitops   busybox-rbd-appset-placement-drpc   25m   asagare-clut1
openshift-gitops   job-appset-placement-drpc           25m   asagare-clut1

3. ACM version is 2.7.3 (GA).

You can find the complete yaml files of the backup/restore at:

backup: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2203182/acm/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-5c79bf93599b792c20c76d9a0a35532ad79a45f3765dccfbafab9a273e338e52/namespaces/open-cluster-management-backup/velero.io/backups/
restore: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2203182/acm/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-5c79bf93599b792c20c76d9a0a35532ad79a45f3765dccfbafab9a273e338e52/namespaces/open-cluster-management-backup/velero.io/restores/
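For reference, a sketch of how the same timestamps could be cross-checked from the CLI (the openshift-gitops namespace and the backup name are taken from this thread; run where those resources exist):

oc get applicationsets.argoproj.io -n openshift-gitops \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
oc get backup.velero.io acm-resources-schedule-20230511120040 \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.startTimestamp}{"\n"}'
# the ApplicationSet creationTimestamp should be earlier than the backup startTimestamp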
Thank you for the extra details.

On your question: "applicationset.argoproj.io is backed up. You mentioned it should be 'argoproj.io.applicationset' in comment#13, guessing it is a typo?" Yes, I meant applicationset.argoproj.io.

So it seems that the applicationset.argoproj.io resources are supposed to be backed up, and the appset was created on the hub before the backup run.

From the attached backup yaml I see that there were 64 items backed up and 63 restored.

From the backup:

name: acm-resources-schedule-20230511120040
namespace: open-cluster-management-backup
...
progress:
  itemsBackedUp: 65
  totalItems: 65

From the restore:

name: restore-acm-acm-resources-schedule-20230511120040
namespace: open-cluster-management-backup
...
status:
  completionTimestamp: "2023-05-11T12:15:50Z"
  phase: Completed
  progress:
    itemsRestored: 63
    totalItems: 63

I want to check whether the applicationsets are indeed backed up, so I want to check the backup log; and if they are restored, the restore log. Can you please attach the 2 logs here?

To get the backup log, on either hub, create the resource below; you should get a url in the status, this is the log download url:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: downloadrequest-backup
  namespace: open-cluster-management-backup
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230511120040

To get the restore log, from the restore hub, create the resource below; you should get a url in the status, this is the log download url:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: downloadrequest-restore
  namespace: open-cluster-management-backup
spec:
  target:
    kind: RestoreLog
    name: restore-acm-acm-resources-schedule-20230511120040

These DownloadRequest resources have a ttl of a few minutes, so they are deleted soon after creation.
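For reference, a minimal sketch of creating the request and reading the resulting URL (assuming the BackupLog YAML above is saved as downloadrequest-backup.yaml, a hypothetical filename; the status.downloadURL field is the one shown later in this thread):

oc create -f downloadrequest-backup.yaml
oc get downloadrequest downloadrequest-backup \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.downloadURL}{"\n"}'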
@vbirsan
I have created the two resources as mentioned above, but in the status I am not able to get the url for download.
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup -o jsonpath='{.status.url}'
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup -o jsonpath='{.status.url}'
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup
NAME                     AGE
downloadrequest-backup   11m
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup
NAME                      AGE
downloadrequest-restore   97s
avd@Avdhoots-MBP hub-passive %
@vbirsan Please check the above comment.

Can you open the yaml? The prop name is downloadURL:
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-17T15:13:29Z"
  generation: 2
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:target:
          .: {}
          f:kind: {}
          f:name: {}
    manager: Mozilla
    operation: Update
    time: "2023-05-17T15:13:29Z"
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:downloadURL: {}
        f:expiration: {}
        f:phase: {}
    manager: velero-server
    operation: Update
    time: "2023-05-17T15:13:29Z"
  name: downloadrequest-backup-1
  namespace: open-cluster-management-backup
  resourceVersion: "9889126"
  uid: 14a5d1e1-a8a0-4d5f-a6f4-8beb81fc60a9
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230516201845
status:
  downloadURL: https://vb-velero-backup.s3.amazonaws.com/acm-hub-1/backups/acm-resources-schedule-2023051
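For reference, once status.downloadURL is populated the log can be pulled directly, e.g. (a sketch; Velero normally serves these log objects gzip-compressed, so the result may need gunzip):

curl -fL -o backup-log.gz \
  "$(oc get downloadrequest downloadrequest-backup-1 \
       -n open-cluster-management-backup \
       -o jsonpath='{.status.downloadURL}')"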
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup -o yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-18T09:33:32Z"
  generation: 5
  name: downloadrequest-backup
  namespace: open-cluster-management-backup
  resourceVersion: "21872380"
  uid: d9470aaf-de31-40b3-8baf-b537269fd56e
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230511120040
status:
  expiration: "2023-05-18T09:43:37Z"
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup -o yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-18T09:35:35Z"
  generation: 5
  name: downloadrequest-restore
  namespace: open-cluster-management-backup
  resourceVersion: "21875318"
  uid: 038ae330-c4fd-4c63-98c8-6e2ba5070651
spec:
  target:
    kind: RestoreLog
    name: restore-acm-acm-resources-schedule-20230511120040
status:
  expiration: "2023-05-18T09:45:44Z"
Please correct me if I am wrong: the backup was automatically deleted from the s3 storage (2 hrs ttl), so maybe that is why it is not able to generate the url.
I will try to reproduce the issue again and will update the bz.
That is correct, the log can be retrieved only if the resource (backup, restore) is still available. (If you can't see the backup or restore CR using the oc get command line, then they no longer exist.) The log is actually located in the s3 storage, in the same folder as the actual backup zip, so you can look for it there too; I tried to give you the instructions to get the logs without the need to log in to s3.

Hi, I tried to reproduce the issue again, but it could not be reproduced. That's why I am reducing the severity for now to high. I will close the BZ if it is not reproduced after a few more tries.

Removing the blocker flag, which was set because of the urgent severity.

Not a 4.13 blocker.
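For reference, a quick sketch of confirming whether the Velero backup and restore CRs still exist (per the note above) before requesting logs, using names from this thread:

oc get backups.velero.io,restores.velero.io -n open-cluster-management-backup
oc get backup.velero.io acm-resources-schedule-20230511120040 \
  -n open-cluster-management-backup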