Bug 1813025 - Migration fails due to restic timeout when migrating 50 projects.
Summary: Migration fails due to restic timeout when migrating 50 projects.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 1.4.z
Assignee: Dylan Murray
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Duplicates: 1831605
Depends On:
Blocks:
Reported: 2020-03-12 18:25 UTC by Roshni
Modified: 2021-01-14 21:51 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1831605
Environment:
Last Closed: 2021-01-14 21:51:26 UTC
Target Upstream Version:
Embargoed:


Attachments
velero logs (849.24 KB, text/plain), 2020-03-12 18:25 UTC, Roshni
output of podvolumerestore (44.57 KB, text/plain), 2020-03-25 17:41 UTC, Roshni

Description Roshni 2020-03-12 18:25:23 UTC
Created attachment 1669732 (velero logs)

Description of problem:
Migration fails due to restic timeout when migrating 50 projects.

Version-Release number of selected component (if applicable):
# oc describe pod/velero-658c4d8945-2mppg | grep Image
    Image:          quay.io/konveyor/migration-plugin:latest
    Image ID:       quay.io/konveyor/migration-plugin@sha256:fd6617aa9f86e4760cc076c25f973152ce9c85f83ac1de2cdaafdab860f69d5c
    Image:          quay.io/konveyor/velero-plugin-for-aws:latest
    Image ID:       quay.io/konveyor/velero-plugin-for-aws@sha256:b9867c14816ce3c6797c676988192df771fa54503596931b138aafad91af36a5
    Image:          quay.io/konveyor/velero-plugin-for-gcp:latest
    Image ID:       quay.io/konveyor/velero-plugin-for-gcp@sha256:a641d610403dbbd3a83f2bcb1f46d91fe9b79563c27bea744b3c60147be93cd5
    Image:          quay.io/konveyor/velero-plugin-for-microsoft-azure:latest
    Image ID:       quay.io/konveyor/velero-plugin-for-microsoft-azure@sha256:a57c97de744d967d591023e1847507689b66a312ccac709e6c5e5468d865c3d3
    Image:         quay.io/konveyor/velero:latest
    Image ID:      quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e
    Image:         quay.io/konveyor/velero:latest
    Image ID:      quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e
[root@rpattath ~]# oc describe pod/restic-7j88n | grep Image
    Image:         quay.io/konveyor/velero:latest
    Image ID:      quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e
    Image:         quay.io/konveyor/velero:latest
    Image ID:      quay.io/konveyor/velero@sha256:e96d4ba17adfbe4032bd850f3c2b268d87d91422e4f03c97ed816148970b3e9e

# oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-03-09-033343   True        False         3d3h    Cluster version is 4.4.0-0.nightly-2020-03-09-033343


How reproducible:
always

Steps to Reproduce:
1. Load the source cluster with the following workload:
# cat svt/openshift_scalability/mig-test-project-scale.yaml 
projects:
  - num: 50
    basename: migtest-
    templates:
      -
        num: 5
        file: ./content/build-template.json
      -
        num: 1
        file: ./content/quickstarts/django/django-postgresql-pv.json
      -
        num: 1
        file: ./content/deployment-config-2rep-template.json
        parameters:
          -
            ENV_VALUE: "asodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij12"
      -
        num: 20
        file: ./content/ssh-secret-template.json
      -
        num: 2
        file: ./content/configmap-template.json
      # rcs and services are implemented in deployments.
quotas:
  - name: default

Run the YAML above using cluster-loader from https://github.com/openshift/svt/tree/master/openshift_scalability

2. Use the following MigrationController configuration:
# oc get migrationcontroller migration-controller -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  creationTimestamp: "2020-03-12T15:06:13Z"
  generation: 2
  name: migration-controller
  namespace: openshift-migration
  resourceVersion: "1689757"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migrationcontrollers/migration-controller
  uid: 450de38c-0b04-4359-92d5-bb255f1ac69e
spec:
  azure_resource_group: ""
  cluster_name: host
  mig_controller_image: quay.io/jortel/mig-controller
  mig_controller_version: ocp4.4-compat
  mig_namespace_limit: "60"
  mig_pod_limit: "500"
  mig_pv_limit: "500"
  migration_controller: true
  migration_ui: true
  migration_velero: true
  olm_managed: true
  restic_timeout: 10h
  version: 1.0 (OLM)
status:
  phase: Reconciled
3. Migrate the 50 projects.

Actual results:
Migration fails while restoring the backup.

Expected results:
Migration should be successful.

Additional info:
Attaching the complete Velero debug log from the destination cluster.
I previously filed https://bugzilla.redhat.com/show_bug.cgi?id=1749831; I am not sure whether the cause of this issue is the same.

Comment 1 Dylan Murray 2020-03-16 20:24:13 UTC
This is likely unrelated to the fact that 50 namespaces are being migrated; more specifically, the restic restores were never acted upon properly. It would be helpful to get the output of `oc get podvolumerestores -n openshift-migration -o yaml`, because it shows exactly what happened to each restic restore that was run.

The timeout generally means something went wrong with restic and Velero couldn't recover.

Please paste this output if you still have this environment available.
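With 50 projects, the full `oc get podvolumerestores -o yaml` output requested above is long, so a count of PodVolumeRestore objects per phase can be a quicker first look before reading the YAML. The following is a minimal sketch, not part of MTC or Velero: `summarize_pvr_phases` is a hypothetical helper, and it assumes its input is one `NAME PHASE` pair per line, as produced by the `oc get ... -o jsonpath` command in its comments. Velero's PodVolumeRestore phases include `InProgress`, `Completed`, and `Failed`; objects with no phase at all are reported as `<none>`, and those, along with any stuck in `InProgress`, are the ones worth inspecting in the full YAML.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MTC or Velero): count PodVolumeRestores by phase.
# Assumed input: one "NAME PHASE" pair per line on stdin, for example from:
#   oc get podvolumerestores -n openshift-migration \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'
summarize_pvr_phases() {
  # Skip blank lines; objects with an empty status.phase are counted as "<none>".
  # Output is "PHASE COUNT" per line, sorted bytewise so the order is locale-independent.
  awk 'NF == 0 { next }
       { phase = ($2 == "" ? "<none>" : $2); count[phase]++ }
       END { for (p in count) print p, count[p] }' | LC_ALL=C sort
}
```

Pipe the `oc get` command shown in the comment into `summarize_pvr_phases` on a cluster where the migration namespace is `openshift-migration`.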

Comment 2 Roshni 2020-03-25 17:41:30 UTC
Created attachment 1673583 (output of podvolumerestore)

Attaching the output of 

# oc get podvolumerestores -n openshift-migration -o yaml

Comment 4 John Matthews 2020-06-16 13:04:42 UTC
*** Bug 1831605 has been marked as a duplicate of this bug. ***