Created attachment 1669732 [details]
velero logs

Description of problem:
Migration fails due to a restic timeout when migrating 50 projects.

Version-Release number of selected component (if applicable):

# oc describe pod/velero-658c4d8945-2mppg | grep Image
    Image:          quay.io/konveyor/migration-plugin:latest
    Image ID:       quay.io/konveyor/migration-plugin@sha256:fd6617aa9f86e4760cc076c25f973152ce9c85f83ac1de2cdaafdab860f69d5c
    Image:          quay.io/konveyor/velero-plugin-for-aws:latest
    Image ID:       quay.io/konveyor/velero-plugin-for-aws@sha256:b9867c14816ce3c6797c676988192df771fa54503596931b138aafad91af36a5
    Image:          quay.io/konveyor/velero-plugin-for-gcp:latest
    Image ID:       quay.io/konveyor/velero-plugin-for-gcp@sha256:a641d610403dbbd3a83f2bcb1f46d91fe9b79563c27bea744b3c60147be93cd5
    Image:          quay.io/konveyor/velero-plugin-for-microsoft-azure:latest
    Image ID:       quay.io/konveyor/velero-plugin-for-microsoft-azure@sha256:a57c97de744d967d591023e1847507689b66a312ccac709e6c5e5468d865c3d3
    Image:          quay.io/konveyor/velero:latest
    Image ID:       quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e
    Image:          quay.io/konveyor/velero:latest
    Image ID:       quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e

[root@rpattath ~]# oc describe pod/restic-7j88n | grep Image
    Image:          quay.io/konveyor/velero:latest
    Image ID:       quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e
    Image:          quay.io/konveyor/velero:latest
    Image ID:       quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e

# oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-03-09-033343   True        False         3d3h    Cluster version is 4.4.0-0.nightly-2020-03-09-033343

How reproducible:
always

Steps to Reproduce:
1. Do the following to load the source cluster with the workload

# cat svt/openshift_scalability/mig-test-project-scale.yaml
projects:
  - num: 50
    basename: migtest-
    templates:
      - num: 5
        file: ./content/build-template.json
      - num: 1
        file: ./content/quickstarts/django/django-postgresql-pv.json
      - num: 1
        file: ./content/deployment-config-2rep-template.json
        parameters:
          - ENV_VALUE: "asodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij12"
      - num: 20
        file: ./content/ssh-secret-template.json
      - num: 2
        file: ./content/configmap-template.json
# rcs and services are implemented in deployments.
quotas:
  - name: default

Run the above yaml using cluster-loader.yaml from https://github.com/openshift/svt/tree/master/openshift_scalability

2. This is the migration controller I am using

# oc get migrationcontroller migration-controller -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  creationTimestamp: "2020-03-12T15:06:13Z"
  generation: 2
  name: migration-controller
  namespace: openshift-migration
  resourceVersion: "1689757"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migrationcontrollers/migration-controller
  uid: 450de38c-0b04-4359-92d5-bb255f1ac69e
spec:
  azure_resource_group: ""
  cluster_name: host
  mig_controller_image: quay.io/jortel/mig-controller
  mig_controller_version: ocp4.4-compat
  mig_namespace_limit: "60"
  mig_pod_limit: "500"
  mig_pv_limit: "500"
  migration_controller: true
  migration_ui: true
  migration_velero: true
  olm_managed: true
  restic_timeout: 10h
  version: 1.0 (OLM)
status:
  phase: Reconciled

3. Migrate 50 projects

Actual results:
Migration fails during backup restore.

Expected results:
Migration should be successful.

Additional info:
Attaching the complete velero debug log from the destination cluster.
I had previously filed https://bugzilla.redhat.com/show_bug.cgi?id=1749831; I am not sure whether the root cause of this issue is the same.
This likely has nothing to do with migrating 50 namespaces. More likely, the restic restores were never acted upon properly. It would be helpful to get the output of `oc get podvolumerestores -n openshift-migration -o yaml`, as this will tell us exactly what happened to the restic restores that were run. The timeout generally means something went wrong with restic and Velero couldn't recover. Please paste this output if you still have this environment available.
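To make a long podvolumerestores listing easier to read, the same data can be summarized by phase. The following is only a rough sketch, not part of the tooling discussed in this bug: it assumes the list was fetched with `-o json` instead of `-o yaml`, and that each item carries `status.phase` and `status.message` as Velero's PodVolumeRestore resource does. The function name `summarize_restores` and the `SAMPLE` data (including the restore names) are made up for illustration.

```python
from collections import Counter

# Hypothetical sample shaped like `oc get podvolumerestores -o json` output.
# The names and messages are invented; they are not taken from this bug.
SAMPLE = {
    "items": [
        {"metadata": {"name": "restore-a"}, "status": {"phase": "Completed"}},
        {"metadata": {"name": "restore-b"},
         "status": {"phase": "Failed", "message": "example error"}},
        # A restore that no restic pod ever picked up has no phase at all.
        {"metadata": {"name": "restore-c"}, "status": {}},
    ]
}


def summarize_restores(doc):
    """Count restores per phase and list every restore that is not Completed.

    Returns (counts, problems) where counts is a Counter keyed by phase and
    problems is a list of (name, phase, message) tuples.
    """
    counts = Counter()
    problems = []
    for item in doc.get("items", []):
        status = item.get("status") or {}
        # An empty or missing phase suggests the restore was never acted upon.
        phase = status.get("phase") or "<none>"
        counts[phase] += 1
        if phase != "Completed":
            name = item.get("metadata", {}).get("name", "<unknown>")
            problems.append((name, phase, status.get("message", "")))
    return counts, problems


counts, problems = summarize_restores(SAMPLE)
for phase, n in sorted(counts.items()):
    print(f"{phase}: {n}")
for name, phase, message in problems:
    print(f"{name}\t{phase}\t{message}")
```

Against a real cluster, the dictionary would instead come from loading the JSON output of `oc get podvolumerestores -n openshift-migration -o json` with `json.load`. Restores that show no phase at all would be the ones to look at first, since they would match the "never acted upon" case described above.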
Created attachment 1673583 [details]
output of podvolumerestore

Attaching the output of
# oc get podvolumerestores -n openshift-migration -o yaml
*** Bug 1831605 has been marked as a duplicate of this bug. ***