Description of problem (please be as detailed as possible and provide log snippets):

Brought down the active zone (i.e. active hub, c1, ceph nodes). After passive hub recovery, subscription-based apps are present but ApplicationSet apps are not shown in the new active hub.

avd@Avdhoots-MBP hub-passive % oc get drpc -A
NAMESPACE          NAME                                AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-rbd        busybox-cephrbd-placement-1-drpc    25m   asagare-clut1                                       Deployed
job-cephfs         job-cephfs-placement-1-drpc         25m   asagare-clut1                                       Deployed
openshift-gitops   busybox-rbd-appset-placement-drpc   25m   asagare-clut1
openshift-gitops   demo-app-placement-drpc             25m   asagare-clust2     asagare-clut1     Relocate
openshift-gitops   job-appset-placement-drpc           25m   asagare-clut1

In the attached screenshots only the subscription-based apps are present.

Version of all relevant components (if applicable):
OCP - 4.13-nightly
ODF - 4.13-170
ACM - 2.7.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
YES, since appset apps are not present after hub recovery.

Is there any workaround available to the best of your knowledge?
NO

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
5

Can this issue be reproduced?
1/1

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Configure MDR with c1, c2, an active hub and a passive hub.
2. Configure the DR policy.
3. Create subscription and ApplicationSet based apps (I created 2 subscription-based and 2 ApplicationSet-based apps on the c1 cluster).
4. Apply the DR policy to all apps.
5. Configure backup for hub recovery (a sample schedule is sketched under Additional info below).
6. Bring down the zone belonging to the active hub.
7. Restore the backup on the passive hub.

Actual results:
After hub recovery, ApplicationSet apps are not shown in the new active hub.

Expected results:
After hub recovery, ApplicationSet apps should be shown in the new active hub.

Additional info:
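For context on step 5, hub backups in this setup come from the ACM cluster-backup operator; a minimal BackupSchedule sketch is shown below. The resource name and the schedule/ttl values are illustrative rather than the exact ones used here (the 2h ttl is only chosen to match the ttl later seen on the acm-resources-schedule backup).

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: BackupSchedule
metadata:
  name: schedule-acm                 # illustrative name
  namespace: open-cluster-management-backup
spec:
  veleroSchedule: "0 */1 * * *"      # cron schedule for the velero backups (assumed)
  veleroTtl: 2h                      # backup retention; consistent with the 2h0m0s ttl in the backup yaml below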
1. Can you attach the yaml file for the acm-resources-schedule backup restored on the new hub?
   The name is shown under the Restore status, on the veleroResourcesRestoreName property - the name is prefixed by your Restore resource name, so take that out when computing the name (in the attached image, the backup name you are looking for is acm-resources-schedule-202203.. etc). See the command sketch below.

   I want to verify that the spec.includedResources section includes the argoproj.io.applicationset resource. This would confirm that the backup is looking for this type of resource and backing it up.

2. If the spec in 1. above shows this CRD is backed up, then check when your ApplicationSet was created on the initial hub and when the backup was taken (is the ApplicationSet creation timestamp before the backup creation time?).

3. Can you let us know what ACM version you are using?
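For example, something along the following lines should retrieve both pieces of information (the Restore resource name restore-acm and the backup timestamp placeholder are assumptions; adjust them to your environment):

# read the Velero restore name from the ACM Restore status
oc get restore.cluster.open-cluster-management.io restore-acm \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.veleroResourcesRestoreName}'
# strip the leading "restore-acm-" prefix to obtain the Velero backup name, then dump its spec
oc get backup.velero.io acm-resources-schedule-<timestamp> \
  -n open-cluster-management-backup -o yaml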
(In reply to vbirsan from comment #13)
> 1. Can you attach the yaml file for the acm-resources-schedule backup
> restored on the new hub
> The name is shown under the Restore status, on the
> veleroResourcesRestoreName property - the name is prefixed by your Restore
> resource name, so take that out when computing the name ( in the attached
> image, the backup name you are looking for is
> acm-resources-schedule-202203.. etc )
>
> I want to verify that the spec.includedResources section include the
> argoproj.io.applicationset
> This would confirm that the backup is looking for this type of resource and
> backing it up
>
> 2. If the spec in 1. above shows this CRD is backed up, then check when your
> ApplicationSet was created on the initial hub and when backup was taken ( is
> the ApplicationSet create timestamp before the backup create time ? )
>
> 3. Can you let us know what ACM version are you using

Hello, I checked Avdhoot's cluster.

1. Restore yaml:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  creationTimestamp: '2023-05-11T12:15:45Z'
  generation: 1
  managedFields:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      'f:spec':
        .: {}
        'f:cleanupBeforeRestore': {}
        'f:veleroCredentialsBackupName': {}
        'f:veleroManagedClustersBackupName': {}
        'f:veleroResourcesBackupName': {}
    manager: kubectl-create
    operation: Update
    time: '2023-05-11T12:15:45Z'
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      'f:status':
        .: {}
        'f:lastMessage': {}
        'f:phase': {}
        'f:veleroCredentialsRestoreName': {}
        'f:veleroManagedClustersRestoreName': {}
        'f:veleroResourcesRestoreName': {}
    manager: manager
    operation: Update
    subresource: status
    time: '2023-05-11T12:16:35Z'
  name: restore-acm
  namespace: open-cluster-management-backup
  resourceVersion: '8420332'
  uid: 3bff736d-33d3-4cde-963f-0f7641cd119d
spec:
  cleanupBeforeRestore: CleanupRestored
  veleroCredentialsBackupName: latest
  veleroManagedClustersBackupName: latest
  veleroResourcesBackupName: latest
status:
  lastMessage: All Velero restores have run successfully
  phase: Finished
  veleroCredentialsRestoreName: restore-acm-acm-credentials-schedule-20230511120040
  veleroManagedClustersRestoreName: restore-acm-acm-managed-clusters-schedule-20230511120040
  veleroResourcesRestoreName: restore-acm-acm-resources-schedule-20230511120040
======>>> The schedule is acm-resources-schedule-20230511120040.

Its corresponding backup yaml, i.e. acm-resources-schedule-20230511120040:

includeClusterResources: true
includedResources:
- managedproxyserviceresolver.proxy.open-cluster-management.io
- managedproxyconfiguration.proxy.open-cluster-management.io
- clusterstatus.proxy.open-cluster-management.io
- subscription.apps.open-cluster-management.io
- deployable.apps.open-cluster-management.io
- channel.apps.open-cluster-management.io
- helmrelease.apps.open-cluster-management.io
- placementrule.apps.open-cluster-management.io
- gitopscluster.apps.open-cluster-management.io
- subscriptionstatus.apps.open-cluster-management.io
- subscriptionreport.apps.open-cluster-management.io
- managedclusterset.cluster.open-cluster-management.io
- managedclustersetbinding.cluster.open-cluster-management.io
- placementdecision.cluster.open-cluster-management.io
- placement.cluster.open-cluster-management.io
- addonplacementscore.cluster.open-cluster-management.io
- userpreference.console.open-cluster-management.io
- discoveryconfig.discovery.open-cluster-management.io
- policy.policy.open-cluster-management.io
- placementbinding.policy.open-cluster-management.io
- configurationpolicy.policy.open-cluster-management.io
- iampolicy.policy.open-cluster-management.io
- certificatepolicy.policy.open-cluster-management.io
- policyset.policy.open-cluster-management.io
- policyautomation.policy.open-cluster-management.io
- addondeploymentconfig.addon.open-cluster-management.io
- applicationset.argoproj.io   ================> applicationset.argoproj.io is backed up. You mentioned it should be 'argoproj.io.applicationset' in comment#13, guessing it is a typo?
- appproject.argoproj.io
- argocd.argoproj.io
- application.argoproj.io
- observatorium.core.observatorium.io
- managedclusterimageregistry.imageregistry.open-cluster-management.io
- submarinerdiagnoseconfig.submarineraddon.open-cluster-management.io
- submarinerconfig.submarineraddon.open-cluster-management.io
- managedclusteraction.action.open-cluster-management.io
- infraenv.agent-install.openshift.io
- hypershiftagentserviceconfig.agent-install.openshift.io
- agentserviceconfig.agent-install.openshift.io
- agentclassification.agent-install.openshift.io
- nmstateconfig.agent-install.openshift.io
- agent.agent-install.openshift.io
- application.app.k8s.io
- multiclusterobservability.observability.open-cluster-management.io
- managedclusterview.view.open-cluster-management.io
- managedclusterset.clusterview.open-cluster-management.io
labelSelector:
  matchExpressions:
  - key: policy.open-cluster-management.io/root-policy
    operator: DoesNotExist
  - key: cluster.open-cluster-management.io/backup
    operator: DoesNotExist
metadata: {}
storageLocation: dpa-1
ttl: 2h0m0s
status:
  completionTimestamp: "2023-05-11T12:04:02Z"
  expiration: "2023-05-11T14:02:58Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 65
    totalItems: 65
  startTimestamp: "2023-05-11T12:02:58Z"
  version: 1

2. The ApplicationSet app creation time is '11 May 2023, 11:06 UTC', so the app creation time is before the time the backup was taken, which is "2023-05-11T12:04:02Z" (a CLI sketch for comparing these timestamps follows at the end of this comment).

One observation: the DRPCs of the created appset apps, shown below, are restored, but the apps themselves are not shown in ACM on the new hub.

NAMESPACE          NAME                                AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
openshift-gitops   busybox-rbd-appset-placement-drpc   25m   asagare-clut1
openshift-gitops   job-appset-placement-drpc           25m   asagare-clut1

3.
ACM version is 2.7.3 (GA).

You can find the complete yaml files of the backup/restore at:

backup:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2203182/acm/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-5c79bf93599b792c20c76d9a0a35532ad79a45f3765dccfbafab9a273e338e52/namespaces/open-cluster-management-backup/velero.io/backups/

restore:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2203182/acm/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-5c79bf93599b792c20c76d9a0a35532ad79a45f3765dccfbafab9a273e338e52/namespaces/open-cluster-management-backup/velero.io/restores/
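Regarding point 2 above, a rough way to compare the two timestamps from the CLI (assuming the ApplicationSets live in the openshift-gitops namespace on the old active hub; the jsonpath expressions are only illustrative):

# creation timestamps of the ApplicationSets on the original hub
oc get applicationset.argoproj.io -n openshift-gitops \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.creationTimestamp}{"\n"}{end}'
# start/completion time of the resources backup
oc get backup.velero.io acm-resources-schedule-20230511120040 \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.startTimestamp}{"\n"}{.status.completionTimestamp}{"\n"}'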
Thank you for the extra details.

On your question: "applicationset.argoproj.io is backed up. You mentioned it should be 'argoproj.io.applicationset' in comment#13, guessing it is a typo?" - Yes, I meant applicationset.argoproj.io.

So it seems that the applicationset.argoproj.io resources are supposed to be backed up, and the appset was created on the hub before the backup run.

From the attached backup and restore yamls I see that there were 65 items backed up and 63 restored.

From the backup:

  name: acm-resources-schedule-20230511120040
  namespace: open-cluster-management-backup
  ...
  progress:
    itemsBackedUp: 65
    totalItems: 65

From the restore:

  name: restore-acm-acm-resources-schedule-20230511120040
  namespace: open-cluster-management-backup
  ...
  status:
    completionTimestamp: "2023-05-11T12:15:50Z"
    phase: Completed
    progress:
      itemsRestored: 63
      totalItems: 63

I want to check if the applicationsets are indeed backed up, by checking the backup log, and if they are restored, by checking the restore log. Can you please attach the 2 logs here?

To get the backup log, on either hub, create the resource below; you should get a url in the status, this is the log download url:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: downloadrequest-backup
  namespace: open-cluster-management-backup
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230511120040

To get the restore log, from the restore hub, create the resource below; you should get a url in the status, this is the log download url:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: downloadrequest-restore
  namespace: open-cluster-management-backup
spec:
  target:
    kind: RestoreLog
    name: restore-acm-acm-resources-schedule-20230511120040

These DownloadRequests have a ttl of a few minutes, so they are deleted soon after creation.
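For convenience, one possible way to apply such a DownloadRequest and pull the log once the url is populated (the yaml file name and output file name are assumptions; velero normally serves the log gzip-compressed):

# create the DownloadRequest from a file containing the yaml above
oc apply -f downloadrequest-backup.yaml
# once processed, the signed url appears under .status.downloadURL
oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup \
  -o jsonpath='{.status.downloadURL}'
# fetch the log before the request expires
curl -kL -o backup-log.gz "<downloadURL from the previous command>"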
@vbirsan I have created the two resources as mentioned above, but in the status I am not able to get the url for download.

avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup -o jsonpath='{.status.url}'
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup -o jsonpath='{.status.url}'
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup
NAME                     AGE
downloadrequest-backup   11m
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup
NAME                      AGE
downloadrequest-restore   97s
avd@Avdhoots-MBP hub-passive %
@vbirsan Please check above comment.
Can you open the yaml? The prop name is downloadURL:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-17T15:13:29Z"
  generation: 2
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:target:
          .: {}
          f:kind: {}
          f:name: {}
    manager: Mozilla
    operation: Update
    time: "2023-05-17T15:13:29Z"
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:downloadURL: {}
        f:expiration: {}
        f:phase: {}
    manager: velero-server
    operation: Update
    time: "2023-05-17T15:13:29Z"
  name: downloadrequest-backup-1
  namespace: open-cluster-management-backup
  resourceVersion: "9889126"
  uid: 14a5d1e1-a8a0-4d5f-a6f4-8beb81fc60a9
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230516201845
status:
  downloadURL: https://vb-velero-backup.s3.amazonaws.com/acm-hub-1/backups/acm-resources-schedule-2023051
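Put differently, the jsonpath used in the earlier attempt should target .status.downloadURL rather than .status.url, i.e. something like:

oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup \
  -o jsonpath='{.status.downloadURL}'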
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup -o yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-18T09:33:32Z"
  generation: 5
  name: downloadrequest-backup
  namespace: open-cluster-management-backup
  resourceVersion: "21872380"
  uid: d9470aaf-de31-40b3-8baf-b537269fd56e
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230511120040
status:
  expiration: "2023-05-18T09:43:37Z"

avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup -o yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-18T09:35:35Z"
  generation: 5
  name: downloadrequest-restore
  namespace: open-cluster-management-backup
  resourceVersion: "21875318"
  uid: 038ae330-c4fd-4c63-98c8-6e2ba5070651
spec:
  target:
    kind: RestoreLog
    name: restore-acm-acm-resources-schedule-20230511120040
status:
  expiration: "2023-05-18T09:45:44Z"

Please correct me if I am wrong: the backup was automatically deleted from the s3 storage (2 hrs ttl), so it may not be able to generate the url. I will try to reproduce the issue again and will update the BZ.
That is correct, the log can be retrieved only if the resource (backup, restore) is still available. (If you can't see the backup or restore CR using the oc get .. command line, then they no longer exist.) The log is actually located in the s3 storage, in the same folder as the actual backup zip, so you can look for it there too; I tried to give you the instructions to get the logs without the need to log in to s3.
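If it helps, a quick way to double-check both (a sketch only; the bucket name and prefix below are placeholders, and the -logs.gz naming follows velero's usual object layout):

# does the Velero backup CR still exist on the hub?
oc get backup.velero.io acm-resources-schedule-20230511120040 -n open-cluster-management-backup
# if not, look for the log object directly in the storage location, e.g. with an AWS S3 bucket:
aws s3 ls s3://<bucket>/<prefix>/backups/acm-resources-schedule-20230511120040/
# velero normally stores the log next to the backup tarball as <backup-name>-logs.gz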
Hi, I tried to reproduce the issue again, but it could not be reproduced; that's why I am reducing the severity to high for now. I will close the BZ if it is not reproduced after a few more tries.
Removing the blocker flag which was set because of the urgent severity.
Not a 4.13 blocker