Bug 2091965 - [MTC] Migrations gets stuck at StageBackup stage for indirect runs [OADP-BL]
Summary: [MTC] Migrations gets stuck at StageBackup stage for indirect runs [OADP-BL]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: Controller
Version: 1.7.2
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: 1.7.2
Assignee: Jason Montleon
QA Contact: Prasad Joshi
Docs Contact: Richard Hoch
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-31 12:28 UTC by Prasad Joshi
Modified: 2022-08-02 07:45 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-02 07:45:50 UTC
Target Upstream Version:
Embargoed:



Description Prasad Joshi 2022-05-31 12:28:19 UTC
Description of problem: Migrations get stuck at the StageBackup stage when triggered in indirect mode. Direct mode works fine.


Version-Release number of selected component (if applicable):
Source GCP 4.6   MTC 1.7.2 + OADP 1.0.3
Target GCP 4.10   MTC 1.7.2 + OADP 1.0.3


How reproducible:
Always


Steps to Reproduce:
1. Deploy an application in source cluster
2. Trigger migration with indirect mode
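For reference, step 2 amounts to enabling indirect migration on the plan and creating a MigMigration against it. A minimal sketch (the plan name, application namespace, and MigMigration name here are illustrative; the fields mirror the resources shown later in this report):

```yaml
# Hypothetical sketch of an indirect run; names are illustrative, not from this bug.
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  name: test-indirect
  namespace: openshift-migration
spec:
  indirectImageMigration: true    # route images through the migration registries
  indirectVolumeMigration: true   # stage PV data through the replication repository
  # ... cluster refs, storage ref, namespaces as in the MigPlan shown in comment 2
---
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: migration-example
  namespace: openshift-migration
spec:
  migPlanRef:
    name: test-indirect
    namespace: openshift-migration
  quiescePods: true
  stage: false
```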

Actual results: Migrations get stuck at the StageBackup stage.

$ oc get migmigration  migration-52827 -o yaml

spec:
  migPlanRef:
    name: test4
    namespace: openshift-migration
  quiescePods: true
  stage: false
status:
  conditions:
  - category: Advisory
    lastTransitionTime: "2022-05-31T10:15:05Z"
    message: 'Step: 30/49'
    reason: StageBackupCreated
    status: "True"
    type: Running
  - category: Required
    lastTransitionTime: "2022-05-31T10:13:31Z"
    message: The migration is ready.
    status: "True"
    type: Ready
  - category: Required
    durable: true
    lastTransitionTime: "2022-05-31T10:14:05Z"
    message: The migration registries are healthy.
    status: "True"
    type: RegistriesHealthy
  - category: Advisory
    durable: true
    lastTransitionTime: "2022-05-31T10:14:37Z"
    message: '[1] Stage pods created.'
    status: "True"
    type: StagePodsCreated
  itinerary: Final
  observedDigest: 6a51be85e3b968769b1713084a928b5114ec8e9b3c26662cf534ade8ed78b794
  phase: StageBackupCreated
  pipeline:
  - completed: "2022-05-31T10:14:06Z"
    message: Completed
    name: Prepare
    started: "2022-05-31T10:13:31Z"
  - completed: "2022-05-31T10:14:26Z"
    message: Completed
    name: Backup
    progress:
    - 'Backup openshift-migration/migration-52827-initial-nrqvg: 41 out of estimated total of 41 objects backed up (17s)'
    started: "2022-05-31T10:14:06Z"
  - message: Waiting for stage backup to complete.
    name: StageBackup
    phase: StageBackupCreated
    progress:
    - 'Backup openshift-migration/migration-52827-stage-z8w4d: 0 out of estimated total of 5 objects backed up (52m56s)'
    - 'PodVolumeBackup openshift-migration/migration-52827-stage-z8w4d-f76h9: 0 bytes out of 0 bytes backed up (52m40s)'
    started: "2022-05-31T10:14:26Z"
  - message: Not started
    name: StageRestore
  - message: Not started
    name: Restore
  - message: Not started
    name: Cleanup
  startTimestamp: "2022-05-31T10:13:31Z"

$ oc logs migration-log-reader-5d6d95499b-72bvn -c color

openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Found 1 backups in the backup location that do not exist in the cluster and need to be synced" backupLocation=automatic-c6mbt controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:204"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Attempting to sync backup into cluster" backup=migration-58d98-initial-rsrdt backupLocation=automatic-c6mbt controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:212"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=error msg="Error getting backup metadata from backup store" backup=migration-58d98-initial-rsrdt backupLocation=automatic-c6mbt controller=backup-sync error="rpc error: code = Unknown desc = storage: object doesn't exist" error.file="/remote-source/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:289" error.function="github.com/vmware-tanzu/velero/pkg/persistence.(*objectBackupStore).GetBackupMetadata" logSource="pkg/controller/backup_sync_controller.go:216"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Validating backup storage location" backup-storage-location=automatic-c6mbt controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Found 1 backups in the backup location that do not exist in the cluster and need to be synced" backupLocation=automatic-gt8v9 controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:204"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Attempting to sync backup into cluster" backup=migration-58d98-initial-rsrdt backupLocation=automatic-gt8v9 controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:212"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Backup storage location valid, marking as available" backup-storage-location=automatic-c6mbt controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:121"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Validating backup storage location" backup-storage-location=automatic-gt8v9 controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=error msg="Error getting backup metadata from backup store" backup=migration-58d98-initial-rsrdt backupLocation=automatic-gt8v9 controller=backup-sync error="rpc error: code = Unknown desc = storage: object doesn't exist" error.file="/remote-source/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:289" error.function="github.com/vmware-tanzu/velero/pkg/persistence.(*objectBackupStore).GetBackupMetadata" logSource="pkg/controller/backup_sync_controller.go:216"
openshift-migration velero-57c48b4bb-n9s4x velero time="2022-05-31T11:08:24Z" level=info msg="Backup storage location valid, marking as available" backup-storage-location=automatic-gt8v9 controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:121"
openshift-migration migration-controller-56d764884-7fxkd mtc {"level":"info","ts":1653995305.3079288,"logger":"migration","msg":"Checking registry health","migMigration":"migration-52827"}
openshift-migration migration-controller-56d764884-7fxkd mtc {"level":"info","ts":1653995305.389897,"logger":"migration","msg":"Found 2/2 registries in healthy condition.","migMigration":"migration-52827","message":""}
openshift-migration migration-controller-56d764884-7fxkd mtc {"level":"info","ts":1653995305.390091,"logger":"migration","msg":"[RUN] (Step 30/49) Waiting for stage backup to complete.","migMigration":"migration-52827","phase":"StageBackupCreated"}
openshift-migration migration-controller-56d764884-7fxkd mtc {"level":"info","ts":1653995305.8250961,"logger":"migration","msg":"Velero Backup progress report","migMigration":"migration-52827","phase":"StageBackupCreated","backup":"openshift-migration/migration-52827-stage-z8w4d","backupProgress":["Backup openshift-migration/migration-52827-stage-z8w4d: 0 out of estimated total of 5 objects backed up (53m21s)","PodVolumeBackup openshift-migration/migration-52827-stage-z8w4d-f76h9: 0 bytes out of 0 bytes backed up (53m5s)"]}
openshift-migration migration-controller-56d764884-7fxkd mtc {"level":"info","ts":1653995305.8251326,"logger":"migration","msg":"Stage Backup on source cluster is incomplete. Waiting.","migMigration":"migration-52827","phase":"StageBackupCreated","backup":"openshift-migration/migration-52827-stage-z8w4d","backupPhase":"InProgress","backupProgress":"0/5","backupWarnings":0,"backupErrors":0}
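The controller log above loops on this wait: the stage Backup is InProgress but its progress counter never leaves 0/5. As a rough triage aid, that stall signature can be grepped out of captured controller logs; a hypothetical helper (the sample line is abbreviated from the log output above, and the script only demonstrates the pattern match):

```shell
#!/bin/sh
# Hypothetical triage helper: flag a stage backup that is InProgress but has
# backed up 0 objects. The sample line is abbreviated from the controller log above.
line='{"level":"info","msg":"Stage Backup on source cluster is incomplete. Waiting.","backupPhase":"InProgress","backupProgress":"0/5"}'

# Match both markers of the stall: phase InProgress and zero objects backed up.
if printf '%s\n' "$line" | grep -q '"backupPhase":"InProgress"' &&
   printf '%s\n' "$line" | grep -q '"backupProgress":"0/'; then
  echo "stage backup appears stalled"
fi
```

In practice the same check would be run against `oc logs` output from the migration-controller pod rather than a hard-coded line.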


Expected results: The migration should complete successfully.


Additional info:

Comment 1 Erik Nelson 2022-06-03 15:16:59 UTC
We're expecting a fix for this to land in OADP 1.0.3 on 6/14.

Comment 2 Prasad Joshi 2022-06-06 13:22:39 UTC
Verified with MTC 1.7.2 + OADP 1.0.3 (Stage)

$ oc get csv -n openshift-migration
NAME                   DISPLAY                                     VERSION   REPLACES   PHASE
mtc-operator.v1.7.2    Migration Toolkit for Containers Operator   1.7.2                Succeeded
oadp-operator.v1.0.3   OADP Operator                               1.0.3                Succeeded
 

$ oc get migplan -n openshift-migration test-indirect -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  annotations:
    migration.openshift.io/selected-migplan-type: full
  name: test-indirect
  namespace: openshift-migration
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  indirectImageMigration: true
  indirectVolumeMigration: true
  migStorageRef:
    name: automatic
    namespace: openshift-migration
  namespaces:
  - ocp-django
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-ecd2c872-946b-4710-b733-21f347bcb7ea
    proposedCapacity: 1Gi
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: postgresql
      namespace: ocp-django
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: standard
    storageClass: standard
    supported:
      actions:
      - skip
      - copy
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: source-cluster
    namespace: openshift-migration

$ oc get migmigration -n openshift-migration migration-f3c5c -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  labels:
    migration.openshift.io/migplan-name: test-indirect
    migration.openshift.io/migration-uid: 3ede7239-d48e-43db-8ed6-99af1eff761e
  name: migration-f3c5c
  namespace: openshift-migration
spec:
  migPlanRef:
    name: test-indirect
    namespace: openshift-migration
  quiescePods: true
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2022-06-06T06:46:40Z"
    message: The migration has completed successfully.
    reason: Completed
    status: "True"
    type: Succeeded
  itinerary: Final
  observedDigest: b0320c2fc4ba12a915d7133d0b2bc798024ce836ae4c87417098721261076177
  phase: Completed
  pipeline:
  - completed: "2022-06-06T06:43:53Z"
    message: Completed
    name: Prepare
    started: "2022-06-06T06:43:20Z"
  - completed: "2022-06-06T06:44:19Z"
    message: Completed
    name: Backup
    progress:
    - 'Backup openshift-migration/migration-f3c5c-initial-gkll5: 41 out of estimated total of 41 objects backed up (17s)'
    started: "2022-06-06T06:43:53Z"
  - completed: "2022-06-06T06:45:36Z"
    message: Completed
    name: StageBackup
    progress:
    - 'Backup openshift-migration/migration-f3c5c-stage-xp444: 6 out of estimated total of 6 objects backed up (21s)'
    - 'PodVolumeBackup openshift-migration/migration-f3c5c-stage-xp444-mbcwp: 46.74 MB out of 46.74 MB backed up (5s)'
    started: "2022-06-06T06:44:19Z"
  - completed: "2022-06-06T06:46:31Z"
    message: Completed
    name: StageRestore
    progress:
    - 'Restore openshift-migration/migration-f3c5c-stage-trbw2: 6 out of estimated total of 6 objects restored (19s)'
    - 'PodVolumeRestore openshift-migration/migration-f3c5c-stage-trbw2-jnnx2: 46.74 MB out of 46.74 MB restored (6s)'
    - 'Pod ocp-django/stage-postgresql-6k959: Container sleep-0 '
    started: "2022-06-06T06:45:36Z"
  - completed: "2022-06-06T06:46:40Z"
    message: Completed
    name: Restore
    progress:
    - 'Restore openshift-migration/migration-f3c5c-final-zmgbp: 37 out of estimated total of 37 objects restored (4s)'
    - All the stage pods are restored, waiting for restore to Complete
    started: "2022-06-06T06:46:31Z"
  - completed: "2022-06-06T06:46:40Z"
    message: Completed
    name: Cleanup
    started: "2022-06-06T06:46:40Z"
  startTimestamp: "2022-06-06T06:43:20Z"

Indirect migrations are working fine; moving this to VERIFIED.

Comment 8 errata-xmlrpc 2022-08-02 07:45:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.7.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5840

