Bug 2203182
| Summary: | [MDR] After hub recovery ApplicationSet apps are not present | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | avdhoot <asagare> |
| Component: | odf-dr | Assignee: | Benamar Mekhissi <bmekhiss> |
| odf-dr sub component: | unclassified | QA Contact: | krishnaram Karthick <kramdoss> |
| Status: | ASSIGNED --- | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | akrai, amagrawa, bmekhiss, fxiang, hnallurv, jpacker, kseeger, muagarwa, odf-bz-bot, pbyregow, thnguyen, vbirsan |
| Version: | 4.13 | Flags: | akrai: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
avdhoot
2023-05-11 12:57:13 UTC
vbirsan (comment #13):

1. Can you attach the yaml file for the acm-resources-schedule backup restored on the new hub?
   The name is shown under the Restore status, in the veleroResourcesRestoreName property. The name is prefixed by your Restore resource name, so strip that prefix when computing the backup name (in the attached image, the backup name you are looking for is acm-resources-schedule-202203.. etc).
   I want to verify that the spec.includedResources section includes argoproj.io.applicationset. This would confirm that the backup is looking for this type of resource and backing it up.

2. If the spec in 1. above shows this CRD is backed up, then check when your ApplicationSet was created on the initial hub and when the backup was taken (is the ApplicationSet creation timestamp before the backup creation time?).

3. Can you let us know what ACM version you are using?
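For reference, a minimal sketch of reading that status field with oc and deriving the backup name (the Restore name restore-acm and the namespace are taken from later in this thread; adjust as needed):

oc get restore.cluster.open-cluster-management.io restore-acm \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.veleroResourcesRestoreName}{"\n"}'
# prints e.g. restore-acm-acm-resources-schedule-20230511120040; stripping the
# leading Restore name "restore-acm-" leaves the Velero backup name,
# acm-resources-schedule-20230511120040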
(In reply to vbirsan from comment #13)
> 1. Can you attach the yaml file for the acm-resources-schedule backup
> restored on the new hub
> The name is shown under the Restore status, on the
> veleroResourcesRestoreName property - the name is prefixed by your Restore
> resource name, so take that out when computing the name ( in the attached
> image, the backup name you are looking for is
> acm-resources-schedule-202203.. etc )
>
> I want to verify that the spec.includedResources section include the
> argoproj.io.applicationset
> This would confirm that the backup is looking for this type of resource and
> backing it up
>
> 2. If the spec in 1. above shows this CRD is backed up, then check when your
> ApplicationSet was created on the initial hub and when backup was taken ( is
> the ApplicationSet create timestamp before the backup create time ? )
>
> 3. Can you let us know what ACM version are you using

Hello, I checked Avdhoot's cluster.

1. Restore yaml:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  creationTimestamp: '2023-05-11T12:15:45Z'
  generation: 1
  managedFields:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      'f:spec':
        .: {}
        'f:cleanupBeforeRestore': {}
        'f:veleroCredentialsBackupName': {}
        'f:veleroManagedClustersBackupName': {}
        'f:veleroResourcesBackupName': {}
    manager: kubectl-create
    operation: Update
    time: '2023-05-11T12:15:45Z'
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      'f:status':
        .: {}
        'f:lastMessage': {}
        'f:phase': {}
        'f:veleroCredentialsRestoreName': {}
        'f:veleroManagedClustersRestoreName': {}
        'f:veleroResourcesRestoreName': {}
    manager: manager
    operation: Update
    subresource: status
    time: '2023-05-11T12:16:35Z'
  name: restore-acm
  namespace: open-cluster-management-backup
  resourceVersion: '8420332'
  uid: 3bff736d-33d3-4cde-963f-0f7641cd119d
spec:
  cleanupBeforeRestore: CleanupRestored
  veleroCredentialsBackupName: latest
  veleroManagedClustersBackupName: latest
  veleroResourcesBackupName: latest
status:
  lastMessage: All Velero restores have run successfully
  phase: Finished
  veleroCredentialsRestoreName: restore-acm-acm-credentials-schedule-20230511120040
  veleroManagedClustersRestoreName: restore-acm-acm-managed-clusters-schedule-20230511120040
  veleroResourcesRestoreName: restore-acm-acm-resources-schedule-20230511120040

======>>> schedule is acm-resources-schedule-20230511120040

Its corresponding backup yaml, i.e. acm-resources-schedule-20230511120040:

includeClusterResources: true
includedResources:
- managedproxyserviceresolver.proxy.open-cluster-management.io
- managedproxyconfiguration.proxy.open-cluster-management.io
- clusterstatus.proxy.open-cluster-management.io
- subscription.apps.open-cluster-management.io
- deployable.apps.open-cluster-management.io
- channel.apps.open-cluster-management.io
- helmrelease.apps.open-cluster-management.io
- placementrule.apps.open-cluster-management.io
- gitopscluster.apps.open-cluster-management.io
- subscriptionstatus.apps.open-cluster-management.io
- subscriptionreport.apps.open-cluster-management.io
- managedclusterset.cluster.open-cluster-management.io
- managedclustersetbinding.cluster.open-cluster-management.io
- placementdecision.cluster.open-cluster-management.io
- placement.cluster.open-cluster-management.io
- addonplacementscore.cluster.open-cluster-management.io
- userpreference.console.open-cluster-management.io
- discoveryconfig.discovery.open-cluster-management.io
- policy.policy.open-cluster-management.io
- placementbinding.policy.open-cluster-management.io
- configurationpolicy.policy.open-cluster-management.io
- iampolicy.policy.open-cluster-management.io
- certificatepolicy.policy.open-cluster-management.io
- policyset.policy.open-cluster-management.io
- policyautomation.policy.open-cluster-management.io
- addondeploymentconfig.addon.open-cluster-management.io
- applicationset.argoproj.io
  ================> applicationset.argoproj.io is backed up. You mentioned it should be 'argoproj.io.applicationset' in comment#13, guessing it is a typo?
- appproject.argoproj.io
- argocd.argoproj.io
- application.argoproj.io
- observatorium.core.observatorium.io
- managedclusterimageregistry.imageregistry.open-cluster-management.io
- submarinerdiagnoseconfig.submarineraddon.open-cluster-management.io
- submarinerconfig.submarineraddon.open-cluster-management.io
- managedclusteraction.action.open-cluster-management.io
- infraenv.agent-install.openshift.io
- hypershiftagentserviceconfig.agent-install.openshift.io
- agentserviceconfig.agent-install.openshift.io
- agentclassification.agent-install.openshift.io
- nmstateconfig.agent-install.openshift.io
- agent.agent-install.openshift.io
- application.app.k8s.io
- multiclusterobservability.observability.open-cluster-management.io
- managedclusterview.view.open-cluster-management.io
- managedclusterset.clusterview.open-cluster-management.io
labelSelector:
  matchExpressions:
  - key: policy.open-cluster-management.io/root-policy
    operator: DoesNotExist
  - key: cluster.open-cluster-management.io/backup
    operator: DoesNotExist
metadata: {}
storageLocation: dpa-1
ttl: 2h0m0s
status:
  completionTimestamp: "2023-05-11T12:04:02Z"
  expiration: "2023-05-11T14:02:58Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 65
    totalItems: 65
  startTimestamp: "2023-05-11T12:02:58Z"
  version: 1

2. The ApplicationSet app creation time is '11 May 2023, 11:06 UTC', so the app creation time is before the time the backup was taken, which is "2023-05-11T12:04:02Z".

One observation: the DRPCs of the created appset apps (shown below) are restored, but the apps themselves are not present in ACM on the new hub.

NAMESPACE          NAME                                AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
openshift-gitops   busybox-rbd-appset-placement-drpc   25m   asagare-clut1
openshift-gitops   job-appset-placement-drpc           25m   asagare-clut1

3. ACM version is 2.7.3 (GA).

You can find the complete yaml files of the backup/restore at:

backup: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2203182/acm/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-5c79bf93599b792c20c76d9a0a35532ad79a45f3765dccfbafab9a273e338e52/namespaces/open-cluster-management-backup/velero.io/backups/
restore: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-2203182/acm/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-5c79bf93599b792c20c76d9a0a35532ad79a45f3765dccfbafab9a273e338e52/namespaces/open-cluster-management-backup/velero.io/restores/
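For reference, a sketch of how the same timestamps could be cross-checked from the CLI (the openshift-gitops namespace and the backup name are taken from this thread; run where those resources exist):

oc get applicationsets.argoproj.io -n openshift-gitops \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
oc get backup.velero.io acm-resources-schedule-20230511120040 \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.startTimestamp}{"\n"}'
# the ApplicationSet creationTimestamp should be earlier than the backup startTimestamp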
Thank you for the extra details.

On your question: "applicationset.argoproj.io is backed up. You mentioned it should be 'argoproj.io.applicationset' in comment#13, guessing it is a typo?" Yes, I meant applicationset.argoproj.io.

So it seems that the applicationset.argoproj.io resources are supposed to be backed up, and the appset was created on the hub before the backup run.

From the attached backup yaml I see that there were 64 items backed up and 63 restored.

From the backup:

name: acm-resources-schedule-20230511120040
namespace: open-cluster-management-backup
...
progress:
  itemsBackedUp: 65
  totalItems: 65

From the restore:

name: restore-acm-acm-resources-schedule-20230511120040
namespace: open-cluster-management-backup
...
status:
  completionTimestamp: "2023-05-11T12:15:50Z"
  phase: Completed
  progress:
    itemsRestored: 63
    totalItems: 63

I want to check whether the applicationsets are indeed backed up, so I want to check the backup log; and if they are restored, the restore log. Can you please attach the 2 logs here?

To get the backup log, on either hub, create the resource below; you should get a url in the status, this is the log download url:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: downloadrequest-backup
  namespace: open-cluster-management-backup
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230511120040

To get the restore log, from the restore hub, create the resource below; you should get a url in the status, this is the log download url:

apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  name: downloadrequest-restore
  namespace: open-cluster-management-backup
spec:
  target:
    kind: RestoreLog
    name: restore-acm-acm-resources-schedule-20230511120040

These DownloadRequest resources have a ttl of a few minutes, so they are deleted soon after creation.
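For reference, a minimal sketch of creating the request and reading the resulting URL (assuming the BackupLog YAML above is saved as downloadrequest-backup.yaml, a hypothetical filename; the status.downloadURL field is the one shown later in this thread):

oc create -f downloadrequest-backup.yaml
oc get downloadrequest downloadrequest-backup \
  -n open-cluster-management-backup \
  -o jsonpath='{.status.downloadURL}{"\n"}'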
@vbirsan
I have created the two resources as mentioned above, but in the status I am not able to get the url for download.
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup -o jsonpath='{.status.url}'
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup -o jsonpath='{.status.url}'
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup
NAME                     AGE
downloadrequest-backup   11m
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup
NAME                      AGE
downloadrequest-restore   97s
avd@Avdhoots-MBP hub-passive %
@vbirsan Please check the above comment.

Can you open the yaml? The prop name is downloadURL:
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-17T15:13:29Z"
  generation: 2
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:target:
          .: {}
          f:kind: {}
          f:name: {}
    manager: Mozilla
    operation: Update
    time: "2023-05-17T15:13:29Z"
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:downloadURL: {}
        f:expiration: {}
        f:phase: {}
    manager: velero-server
    operation: Update
    time: "2023-05-17T15:13:29Z"
  name: downloadrequest-backup-1
  namespace: open-cluster-management-backup
  resourceVersion: "9889126"
  uid: 14a5d1e1-a8a0-4d5f-a6f4-8beb81fc60a9
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230516201845
status:
  downloadURL: https://vb-velero-backup.s3.amazonaws.com/acm-hub-1/backups/acm-resources-schedule-2023051
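For reference, once status.downloadURL is populated the log can be pulled directly, e.g. (a sketch; Velero normally serves these log objects gzip-compressed, so the result may need gunzip):

curl -fL -o backup-log.gz \
  "$(oc get downloadrequest downloadrequest-backup-1 \
       -n open-cluster-management-backup \
       -o jsonpath='{.status.downloadURL}')"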
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-backup -n open-cluster-management-backup -o yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-18T09:33:32Z"
  generation: 5
  name: downloadrequest-backup
  namespace: open-cluster-management-backup
  resourceVersion: "21872380"
  uid: d9470aaf-de31-40b3-8baf-b537269fd56e
spec:
  target:
    kind: BackupLog
    name: acm-resources-schedule-20230511120040
status:
  expiration: "2023-05-18T09:43:37Z"
avd@Avdhoots-MBP hub-passive % oc get downloadrequest downloadrequest-restore -n open-cluster-management-backup -o yaml
apiVersion: velero.io/v1
kind: DownloadRequest
metadata:
  creationTimestamp: "2023-05-18T09:35:35Z"
  generation: 5
  name: downloadrequest-restore
  namespace: open-cluster-management-backup
  resourceVersion: "21875318"
  uid: 038ae330-c4fd-4c63-98c8-6e2ba5070651
spec:
  target:
    kind: RestoreLog
    name: restore-acm-acm-resources-schedule-20230511120040
status:
  expiration: "2023-05-18T09:45:44Z"
Please correct me if I am wrong: the backup was automatically deleted from the s3 storage (2 hrs ttl), so maybe that is why it is not able to generate the url.
I will try to reproduce the issue again and will update the bz.
That is correct, the log can be retrieved only if the resource (backup, restore) is still available. (If you can't see the backup or restore CR using the oc get command line, then they no longer exist.) The log is actually located in the s3 storage, in the same folder as the actual backup zip, so you can look for it there too; I tried to give you the instructions to get the logs without the need to log in to s3.

Hi, I tried to reproduce the issue again, but it could not be reproduced. That's why I am reducing the severity for now to high. I will close the BZ if it is not reproduced after a few more tries.

Removing the blocker flag, which was set because of the urgent severity.

Not a 4.13 blocker.
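For reference, a quick sketch of confirming whether the Velero backup and restore CRs still exist (per the note above) before requesting logs, using names from this thread:

oc get backups.velero.io,restores.velero.io -n open-cluster-management-backup
oc get backup.velero.io acm-resources-schedule-20230511120040 \
  -n open-cluster-management-backup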