Description of problem:
When we add a hook to a migration plan and the hook fails, cancelling and retriggering the migration does not create a new job for the hook. Since the previous failed job has already reached its max backoff limit, the playbook is never run.

Version-Release number of selected component (if applicable):
KONVEYOR 1.2

How reproducible:
Always.

Steps to Reproduce:
1. Create a migration plan with a hook that will always fail (any hook with syntax errors, for instance)
2. Migrate the plan
3. The migration will fail (there is a bug that does not fail the migration and leaves it stuck forever; if that happens, cancel the migration)
4. Run the migration again

Actual results:
The new run does not create a new job for the hook. Since the old job has already reached its max backoff limit, the hook is never executed.

Expected results:
Since we are running the migration again, a new job should be created for the hook and the hook should be run again.

Additional info:
I added a label, based on the migmigration UID, that we can look for when checking for an existing hook job. Since each run creates a new migmigration, the label stays unique per run and allows a new job to be created.
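The lookup described above can be illustrated with a minimal sketch. This is not the actual controller code: the `Job` class, the label key `example.io/migmigration-uid`, and the function names are all hypothetical stand-ins, and the job list is a plain in-memory list rather than a cluster query. It only shows why keying the lookup on the migmigration UID lets a rerun get a fresh job.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical label key; the real key used by the fix is not stated in this report.
MIGRATION_UID_LABEL = "example.io/migmigration-uid"


@dataclass
class Job:
    """Simplified stand-in for a hook job; not a real Kubernetes object."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    failed: bool = False


def find_hook_job(jobs: List[Job], hook_name: str, migration_uid: str) -> Optional[Job]:
    """Return the job for this hook that belongs to this migmigration, if any."""
    for job in jobs:
        same_hook = job.name.startswith(hook_name + "-")
        same_run = job.labels.get(MIGRATION_UID_LABEL) == migration_uid
        if same_hook and same_run:
            return job
    return None


def ensure_hook_job(jobs: List[Job], hook_name: str, migration_uid: str, suffix: str) -> Job:
    """Reuse the job for the current run, or create one if this run has none."""
    existing = find_hook_job(jobs, hook_name, migration_uid)
    if existing is not None:
        return existing
    job = Job(name=f"{hook_name}-{suffix}", labels={MIGRATION_UID_LABEL: migration_uid})
    jobs.append(job)
    return job
```

Under these assumptions, calling `ensure_hook_job` twice with the same UID returns the same job, while a rerun with a new migmigration UID creates a second job even though the first one has failed.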
Verified using CAM 1.2 stage.

A failed hook is retriggered using a new job for the hook:

$ oc get jobs
NAME                         COMPLETIONS   DURATION   AGE
hookfail-postrestore-h4877   0/1           13m        13m
hookfail-postrestore-lt6fg   0/1           29m        29m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2326