Description of problem:
When several migrations are executed in parallel, some of them can report a failure because the initial backup cannot be created.

Version-Release number of selected component (if applicable):
MTC 1.4.5
TARGET CLUSTER: 4.8 AWS
SOURCE CLUSTER: 3.11 AWS
REPLICATION REPOSITORY: AWS S3

We have seen this error in 1.4.5, but it is probably present in previous versions too.

How reproducible:
Intermittent.

Steps to Reproduce:
1. Execute 3 or more migrations in parallel (an illustrative reproduction sketch is included under Additional info below).

Actual results:
Eventually, one of the migrations reports an error creating the InitialBackup, like this:

$ oc get migmigration -o yaml
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2021-06-09T13:31:46Z"
    message: 'The migration has failed. See: Errors.'
    reason: InitialBackupCreated
    status: "True"
    type: Failed
  errors:
  - Backup not found
  itinerary: Failed
  observedDigest: b922c7ac32f32e776c564ebcfdbee4ebf2a7319cab64d6583c5b6744f9a5e9a9
  phase: Completed
  pipeline:
  - completed: "2021-06-09T13:31:24Z"
    message: Completed
    name: Prepare
    started: "2021-06-09T13:30:33Z"
  - completed: "2021-06-09T13:31:48Z"
    failed: true
    message: Failed
    name: Backup
    progress:
    - 'Backup openshift-migration/ocp-25000-sets-mig-1623245405-zqcsr: 64 out of estimated total of 90 objects backed up (14s)'
    started: "2021-06-09T13:31:24Z"
  - message: Skipped
    name: DirectImage
    skipped: true
  - message: Skipped
    name: Restore
    skipped: true
  - completed: "2021-06-09T13:31:48Z"
    message: Completed
    name: Cleanup
    started: "2021-06-09T13:31:48Z"
  - completed: "2021-06-09T13:31:48Z"
    message: Completed
    name: CleanupHelpers
    started: "2021-06-09T13:31:48Z"
  startTimestamp: "2021-06-09T13:30:28Z"

In the source cluster we can find this error message too:

$ oc logs $(oc get pods -l logreader -o name) -c plain
openshift-migration velero-7979c7d6b4-jk6t6 velero time="2021-06-09T11:54:48Z" level=warning msg="Got error trying to update backup's status.progress" backup=openshift-migration/ocp-25000-sets-mig-1623239611-znzld error="backups.velero.io \"ocp-25000-sets-mig-1623239611-znzld\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/backup/backup.go:361" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*kubernetesBackupper).Backup.func1" logSource="pkg/backup/backup.go:361"

And we can see this error in the migration controller pod:

{"level":"error","ts":1623239690076,"logger":"migration|8ff9p","msg":"","migMigration":"ocp-25000-sets-mig-1623239611","error":"Backup not found","errorVerbose":"Backup not found\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*Task).Run\n\t/remote-source/app/pkg/controller/migmigration/task.go:537\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*ReconcileMigMigration).migrate\n\t/remote-source/app/pkg/controller/migmigration/migrate.go:71\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*ReconcileMigMigration).Reconcile\n\t/remote-source/app/pkg/controller/migmigration/migmigration_controller.go:241\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/asm_amd64.s:1371","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/konveyor/controller/pkg/logging.Logger.Error\n\t/remote-source/app/vendor/github.com/konveyor/controller/pkg/logging/logger.go:92\ngithub.com/konveyor/controller/pkg/logging.Logger.Trace\n\t/remote-source/app/vendor/github.com/konveyor/controller/pkg/logging/logger.go:98\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*ReconcileMigMigration).migrate\n\t/remote-source/app/pkg/controller/migmigration/migrate.go:81\ngithub.com/konveyor/mig-controller/pkg/controller/migmigration.(*ReconcileMigMigration).Reconcile\n\t/remote-source/app/pkg/controller/migmigration/migmigration_controller.go:241\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Expected results:
The migration should not fail.

Additional info:
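A minimal reproduction sketch (illustrative, not the exact commands used for this report): it assumes three MigPlan resources named mig-plan-1, mig-plan-2 and mig-plan-3 already exist in the openshift-migration namespace and are Ready. Creating one MigMigration per plan back to back makes their initial Backups run concurrently on the source cluster:

# NOTE: plan names below are placeholders; substitute the names of existing, Ready MigPlans.
$ for plan in mig-plan-1 mig-plan-2 mig-plan-3; do
cat <<EOF | oc create -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  generateName: ${plan}-
  namespace: openshift-migration
spec:
  migPlanRef:
    name: ${plan}
    namespace: openshift-migration
  stage: false
  quiescePods: true
EOF
done

# Watch the runs; with 3 or more in flight, one of them intermittently ends up
# Failed at the Backup step with the "Backup not found" error shown above.
$ oc get migmigration -n openshift-migration -w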
I can confirm I am able to reproduce this. I have not determined a root cause, but it is fairly easy to reproduce.
Fixed in https://github.com/konveyor/mig-controller/pull/1191
Cherry-picked in https://github.com/konveyor/mig-controller/pull/1192
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694