Description of problem:
If there is a problem pulling the restic restore helper image, the migration gets stuck forever instead of failing.

Version-Release number of selected component (if applicable):
CAM 1.2.5

How reproducible:
Always

Steps to Reproduce:
1. In the source cluster, create a namespace:
   oc new-project bztest
2. In this namespace, deploy an application:
   oc new-app cakephp-mysql-persistent
3. In the target cluster, configure a wrong value for velero_restic_restore_helper_version:
   oc edit migrationcontroller ....
   restic_timeout: 1h
   velero_restic_restore_helper_version: THISISAFAKEVALUETHATCANNOTBEPULLED
4. Create a migration plan and migrate the namespace created in step 1.

Actual results:
The migration is stuck forever in the StageRestoreCreated state. In the target cluster we can see that the stage pod cannot be created:

$ oc get pods
NAME                        READY   STATUS                  RESTARTS   AGE
stage-mysql-1-dmgvm-2flgs   0/1     Init:ImagePullBackOff   0          15m

Expected results:
When CAM can see that the stage pod cannot be created, the migration should fail instead of remaining stuck.

Additional info:
If we use this configuration:

migration_stage_image: mybadregistry.com/bad
migration_stage_repo: mybadrepo
migration_stage_version: badversion

the problem happens too, but the migration is stuck in the StagePodsCreated status instead.
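To illustrate the expected behaviour, here is a minimal Python sketch of how a stage pod stuck on an image pull could be detected from its status. It is not CAM's actual implementation. The function name `image_pull_failures` and the sample pod dict are hypothetical, and the container name `restic-wait` is used only for illustration. The sketch assumes the pod is available as a dict shaped like the output of `oc get pod <name> -o json`.

```python
# Waiting reasons the kubelet reports when an image cannot be pulled.
IMAGE_PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull", "InvalidImageName"}


def image_pull_failures(pod):
    """Return (container name, reason) pairs for containers stuck on an image pull.

    `pod` is assumed to be a dict shaped like `oc get pod <name> -o json`.
    """
    status = pod.get("status") or {}
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    failures = []
    for container_status in statuses:
        # `state.waiting` is only present while the container is waiting.
        waiting = (container_status.get("state") or {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in IMAGE_PULL_FAILURE_REASONS:
            failures.append((container_status["name"], reason))
    return failures


# Hand-written, trimmed example that mirrors the stage pod in this report.
stage_pod = {
    "metadata": {"name": "stage-mysql-1-dmgvm-2flgs"},
    "status": {
        "initContainerStatuses": [
            {"name": "restic-wait", "state": {"waiting": {"reason": "ImagePullBackOff"}}}
        ]
    },
}

print(image_pull_failures(stage_pod))
```

A controller that gets a non-empty result for a stage pod could mark the migration as failed, or at least surface a warning, instead of waiting indefinitely.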
Alay, this is probably related to the registry health check work. Think the expectation here is a failure, which the dependency checks should satisfy.
Verified using MTC 1.4.0.

In 1.4.0 the error is visible in the UI, like this:

Container restic-wait Failed to apply default image tag "registry.stage.redhat.io/rhmtc/openshift-migration-velero-restic-restore-helper-rhel8@sha256:THISISAFAKEVALUETHATCANNOTBEPULLED": couldn't parse image reference "registry.stage.redhat.io/rhmtc/openshift-migration-velero-restic-restore-helper-rhel8@sha256:THISISAFAKEVALUETHATCANNOTBEPULLED": invalid reference format

The migration will be aborted and a warning will be reported once the restic timeout is reached. This happened before 1.4.0 too, but the cause of the timeout was hidden.

Given that the error is now reported to the user, and that the restic timeout keeps the migration from waiting forever, we can consider this BZ verified.
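The "invalid reference format" error arises because the text after `@sha256:` is not a well-formed digest: a sha256 digest consists of 64 lowercase hexadecimal characters. As a rough sketch only, a pre-flight check along these lines could reject such a value before any pod is created. The function name `is_valid_sha256_digest` is hypothetical, and nothing here is taken from the MTC code base.

```python
import re

# A sha256 digest is the algorithm name followed by 64 lowercase hex characters.
SHA256_DIGEST = re.compile(r"sha256:[a-f0-9]{64}")


def is_valid_sha256_digest(value):
    """Return True if `value` looks like a well-formed sha256 image digest."""
    return SHA256_DIGEST.fullmatch(value) is not None


# The value from this report is rejected; a syntactically valid digest is accepted.
print(is_valid_sha256_digest("sha256:THISISAFAKEVALUETHATCANNOTBEPULLED"))  # False
print(is_valid_sha256_digest("sha256:" + "0" * 64))  # True
```

A check like this only validates the syntax. A well-formed digest that does not exist in the registry would still fail at pull time.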
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5329