Bug 1837463 - restic backup couldn’t find the associated resource when migrating the canceled migration
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 1.4.z
Assignee: John Matthews
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-19 13:56 UTC by Xin jiang
Modified: 2021-04-08 02:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-08 02:27:37 UTC
Target Upstream Version:
Embargoed:



Description Xin jiang 2020-05-19 13:56:24 UTC
Description of problem:
If a migration is canceled and the same migration plan is then migrated again, the second migration fails with the StageBackupFailed status.

Version-Release number of selected component (if applicable):
CAM 1.2

How reproducible:
Always

Steps to Reproduce:
1. Create a namespace
$ oc new-project mongodbtest

2. Deploy an application 
$ oc new-app mongodb-persistent

3. Create a migplan on CAM console

4. Migrate the migplan created in step 3, then cancel it before the migration finishes

5. After the cancel operation has finished, migrate the migplan again
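
Steps 4 and 5 could probably also be driven from the command line instead of the console. The following is a minimal sketch, not a confirmed procedure. It assumes that creating a MigMigration that references the plan has the same effect as clicking Migrate, and that setting spec.canceled to true is how a cancel is requested; both are inferred from the resource dumps in this report. The helper name migmigration_manifest and the resource name "first-attempt" are hypothetical.

```python
import json


def migmigration_manifest(name, plan, namespace="openshift-migration", canceled=False):
    """Build a MigMigration manifest that references an existing MigPlan.

    Only fields that appear in the resource dumps in this report are used.
    """
    spec = {
        "migPlanRef": {"name": plan, "namespace": namespace},
        "stage": False,  # same value as in both dumps in this report
    }
    if canceled:
        # Assumption: this flag is what requests the cancel.
        spec["canceled"] = True
    return {
        "apiVersion": "migration.openshift.io/v1alpha1",
        "kind": "MigMigration",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # "first-attempt" is a hypothetical name; "mongodbtest1" is the plan from this report.
    print(json.dumps(migmigration_manifest("first-attempt", "mongodbtest1"), indent=2))
```

The JSON output can be piped to `oc apply -f -`. Building the same manifest with canceled=True and applying it again would, under the assumption above, request the cancel before the second MigMigration is created.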

Actual results:
In the UI, the migration fails with the 'StageBackupFailed' status

From the backend:
1. Canceled migmigration
$ oc get migmigration -n openshift-migration 3136d380-99d2-11ea-8b5d-d7b9450dbfc0 -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  annotations:
    openshift.io/touch: aafefe45-99d2-11ea-99fd-0a580a810010
  clusterName: ""
  creationTimestamp: 2020-05-19T13:11:03Z
  generation: 0
  name: 3136d380-99d2-11ea-8b5d-d7b9450dbfc0
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    kind: MigPlan
    name: mongodbtest1
    uid: 13b68cac-99d2-11ea-a806-0e4a2a109e83
  resourceVersion: "67641"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/3136d380-99d2-11ea-8b5d-d7b9450dbfc0
  uid: 315a145d-99d2-11ea-a806-0e4a2a109e83
spec:
  canceled: true
  migPlanRef:
    name: mongodbtest1
    namespace: openshift-migration
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: 2020-05-19T13:11:35Z
    message: '[1] Stage pods created.'
    status: "True"
    type: StagePodsCreated
  - category: Advisory
    durable: true
    lastTransitionTime: 2020-05-19T13:14:27Z
    message: The migration has been canceled.
    reason: Cancel
    status: "True"
    type: Canceled
  - category: Advisory
    durable: true
    lastTransitionTime: 2020-05-19T13:14:27Z
    message: The migration has completed successfully.
    reason: Completed
    status: "True"
    type: Succeeded
  itenerary: Cancel
  observedDigest: 537ab32d5a913d0fac3533aab22fa34d866a10b787c7eef2ac195e96d631fb03
  phase: Completed
  startTimestamp: 2020-05-19T13:11:03Z

2. Second migration
$ oc get migmigration -n openshift-migration c4c88940-99d2-11ea-8b5d-d7b9450dbfc0 -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  annotations:
    openshift.io/touch: 23d0ca33-99d3-11ea-99fd-0a580a810010
  clusterName: ""
  creationTimestamp: 2020-05-19T13:15:11Z
  generation: 0
  name: c4c88940-99d2-11ea-8b5d-d7b9450dbfc0
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    kind: MigPlan
    name: mongodbtest1
    uid: 13b68cac-99d2-11ea-a806-0e4a2a109e83
  resourceVersion: "68610"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/c4c88940-99d2-11ea-8b5d-d7b9450dbfc0
  uid: c51e177c-99d2-11ea-a806-0e4a2a109e83
spec:
  migPlanRef:
    name: mongodbtest1
    namespace: openshift-migration
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: 2020-05-19T13:15:44Z
    message: '[1] Stage pods created.'
    status: "True"
    type: StagePodsCreated
  - category: Advisory
    durable: true
    lastTransitionTime: 2020-05-19T13:16:51Z
    message: 'The migration has failed.  See: Errors.'
    reason: StageBackupFailed
    status: "True"
    type: Failed
  errors:
  - 'Backup: openshift-migration/c4c88940-99d2-11ea-8b5d-d7b9450dbfc0-d6q9l partially
    failed.'
  itenerary: Failed
  observedDigest: b436f24c7bfcc882b29015b1a47a90294873802be2403bbd15e30720c5c98d1d
  phase: Completed
  startTimestamp: 2020-05-19T13:15:11Z
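
Both resources above report phase: Completed, so the outcome has to be read from the conditions. The canceled run even carries a Succeeded condition next to Canceled. The sketch below shows one way to summarize a MigMigration that has been loaded as a dict, for example from `oc get migmigration ... -o json`. The function name, the precedence order, and the "InProgress" fallback label are assumptions made for illustration and are not part of the controller.

```python
# Assumed precedence: a Failed or Canceled condition outranks Succeeded.
PRECEDENCE = ("Failed", "Canceled", "Succeeded")


def summarize_migration(resource):
    """Return (outcome, reason, errors) for a MigMigration given as a dict."""
    status = resource.get("status", {})
    # Keep only conditions that are currently true, keyed by their type.
    active = {
        cond["type"]: cond.get("reason")
        for cond in status.get("conditions", [])
        if cond.get("status") == "True"
    }
    errors = status.get("errors", [])
    for outcome in PRECEDENCE:
        if outcome in active:
            return outcome, active[outcome], errors
    return "InProgress", None, errors


if __name__ == "__main__":
    # Trimmed copy of the second migration's status from this report.
    second = {
        "status": {
            "conditions": [
                {"type": "StagePodsCreated", "status": "True"},
                {"type": "Failed", "status": "True", "reason": "StageBackupFailed"},
            ],
            "errors": [
                "Backup: openshift-migration/"
                "c4c88940-99d2-11ea-8b5d-d7b9450dbfc0-d6q9l partially failed."
            ],
        }
    }
    print(summarize_migration(second))
```

For the canceled run the same function returns the Canceled outcome with reason Cancel, which keeps it from being mistaken for a successful migration.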


Expected results:
The second migration should succeed.

Additional info:

Comment 1 Dylan Murray 2020-05-19 16:34:22 UTC
Error from restic pod:

time="2020-05-19T12:08:46Z" level=info msg="Found most recent completed pod volume backup for PVC" backup=openshift-migration/447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:357" name=447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm-d97sk namespace=openshift-migration parentPodVolumeBackup=bfaff470-99c8-11ea-8b5d-d7b9450dbfc0-fvbcz-4s99m parentSnapshotID=9ac2d6ed pvcUID=e21bce46-99c7-11ea-a806-0e4a2a109e83
time="2020-05-19T12:08:46Z" level=info msg="Setting --parent flag for this backup" backup=openshift-migration/447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:268" name=447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm-d97sk namespace=openshift-migration parentSnapshotID=9ac2d6ed
time="2020-05-19T12:08:47Z" level=error msg="Error running command=restic backup --repo=s3:s3-us-east-2.amazonaws.com/cam0519/velero/restic/test1 --password-file=/tmp/velero-restic-credentials-test1611307284 --cache-dir=/scratch/.cache/restic . --tag=pod-uid=57c2e87a-99c9-11ea-a806-0e4a2a109e83 --tag=pvc-uid=e21bce46-99c7-11ea-a806-0e4a2a109e83 --tag=volume=mongodb-data --tag=backup=447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm --tag=backup-uid=75bfbbc7-99c9-11ea-a806-0e4a2a109e83 --tag=ns=test1 --tag=pod=mongodb-1-cfckh-stage --host=velero --json --parent=9ac2d6ed, stdout=, stderr=Fatal: invalid id \"9ac2d6ed\": no matching ID found\n" backup=openshift-migration/447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm controller=pod-volume-backup error="unable to find summary in restic backup command output" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:175" error.function=github.com/vmware-tanzu/velero/pkg/restic.getSummaryLine logSource="pkg/controller/pod_volume_backup_controller.go:280" name=447d9ae0-99c9-11ea-8b5d-d7b9450dbfc0-dqhvm-d97sk namespace=openshift-migration
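
The three log lines show the sequence: the pod-volume-backup controller finds an earlier PodVolumeBackup for the same PVC, passes its snapshot ID to restic as --parent, and restic rejects that ID as not present in the repository. A minimal sketch for spotting this pattern in restic pod logs follows. The function name is hypothetical, and the regular expressions are written only against the log format shown above.

```python
import re

# Snapshot ID handed to restic on the command line.
PARENT_FLAG = re.compile(r"--parent=([0-9a-f]+)")
# restic's complaint; the quotes may or may not be backslash-escaped in the log.
MISSING_ID = re.compile(r'invalid id \\?"([0-9a-f]+)\\?": no matching ID found')


def missing_parent_snapshot(log_line):
    """Return the parent snapshot ID if restic rejected it as unknown, else None."""
    parent = PARENT_FLAG.search(log_line)
    missing = MISSING_ID.search(log_line)
    if parent and missing and parent.group(1) == missing.group(1):
        return parent.group(1)
    return None


if __name__ == "__main__":
    # Shortened copy of the error line above.
    line = (
        r'level=error msg="Error running command=restic backup --json '
        r'--parent=9ac2d6ed, stdout=, stderr=Fatal: invalid id \"9ac2d6ed\": '
        r'no matching ID found\n"'
    )
    print(missing_parent_snapshot(line))
```

A non-None result only says that the parent ID recorded on the earlier PodVolumeBackup is unknown to the repository; it does not explain why the snapshot is missing.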

Comment 2 Xin jiang 2020-05-22 10:47:30 UTC
The above failure happened while migrating from AWS 3.7 (controller) to 4.4.

Comment 3 Xin jiang 2020-05-22 10:48:34 UTC
The above failure happened while migrating an application from AWS 3.7 (controller) to 4.4.

Comment 4 Xin jiang 2020-05-22 11:12:19 UTC
We verified that it cannot be reproduced on 3.11 -> 4.4 or on 4.2 (controller) -> 4.4.

Comment 5 Erik Nelson 2021-04-08 02:27:37 UTC
Closing as stale, please re-open if the issue persists.

