Bug 1832156

Summary: Migration stuck when a hook using a nonexistent service account is configured
Product: OpenShift Container Platform Reporter: Sergio <sregidor>
Component: Migration Tooling Assignee: Jason Montleon <jmontleo>
Status: CLOSED ERRATA QA Contact: Xin jiang <xjiang>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4 CC: chezhang, dymurray, ernelson, jmatthew, pvauter, rjohnson, rpattath, whu, xjiang
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1832155 Environment:
Last Closed: 2020-05-28 11:10:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1832155    
Bug Blocks:    

Description Sergio 2020-05-06 08:37:50 UTC
+++ This bug was initially created as a clone of Bug #1832155 +++

Description of problem:
When a hook added to a migration plan is configured to use a service account that does not exist, the migration hangs indefinitely instead of failing.

Version-Release number of selected component (if applicable):
KONVEYOR 1.2

How reproducible:
Always

Steps to Reproduce:
1. Create a migration plan and add a hook (any playbook will do) that uses a service account that does not exist.
2. Run a migration.

Actual results:
The migration will remain stuck forever.

The job created by the hook cannot create the pods that execute the hook, because the pods would use the configured service account, which does not exist. Describing the job shows this:

$ oc describe job noservicaacc-prebackup-lb44r
  Type     Reason        Age                            From            Message
  ----     ------        ----                           ----            -------
  Warning  FailedCreate  <invalid> (x5 over <invalid>)  job-controller  Error creating: pods "noservicaacc-prebackup-lb44r-" is forbidden: error looking up service account robot-source/fakename: serviceaccount "fakename" not found

And the pod is never created.
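
For anyone triaging a similar hang, the FailedCreate message above carries both the namespace and the name of the missing service account. The following is only a rough sketch, assuming the message keeps the exact wording quoted in the job description; the function name `missing_service_account` and the regular expression are illustrative, not part of CAM or mig-controller.

```python
import re
from typing import Optional, Tuple

# Pattern derived only from the single event message quoted in this report;
# other cluster versions may word the message differently (assumption).
_MISSING_SA = re.compile(
    r'error looking up service account '
    r'(?P<namespace>[^/\s]+)/(?P<name>[^:\s]+): '
    r'serviceaccount "(?P=name)" not found'
)


def missing_service_account(message: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, name) if the message reports a missing service account."""
    match = _MISSING_SA.search(message)
    if match is None:
        return None
    return match.group("namespace"), match.group("name")


# The event message from the job description in this report.
event_message = (
    'Error creating: pods "noservicaacc-prebackup-lb44r-" is forbidden: '
    'error looking up service account robot-source/fakename: '
    'serviceaccount "fakename" not found'
)
print(missing_service_account(event_message))
```

Messages that do not match the pattern yield `None`, so unrelated FailedCreate causes are not misreported as a missing service account.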


Expected results:
The service account should have been rejected in the hook's UI, or the migration should fail with a message stating that the configured service account cannot be used because it does not exist.
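
The second expectation amounts to a fail-fast check before the hook job is created. The sketch below illustrates that idea only and is not the mig-controller implementation: `check_hook_service_account`, the `exists` callback, and the error wording are all hypothetical, and the in-memory set stands in for a real lookup against the cluster where the hook runs.

```python
from typing import Callable, Optional, Set, Tuple


def check_hook_service_account(
    namespace: str,
    name: str,
    exists: Callable[[str, str], bool],
) -> Optional[str]:
    """Return an error message if the service account is missing, else None.

    `exists` is a caller-supplied lookup (hypothetical); it must query the
    cluster in which the hook job will actually run.
    """
    if not exists(namespace, name):
        return f'service account "{name}" not found in namespace "{namespace}"'
    return None


# Stand-in for a cluster lookup: a fixed set of known accounts (made-up data).
known_accounts: Set[Tuple[str, str]] = {("robot-source", "default")}


def account_exists(namespace: str, name: str) -> bool:
    return (namespace, name) in known_accounts


# A missing account yields an error, so the migration can fail instead of hanging.
print(check_hook_service_account("robot-source", "fakename", account_exists))
# An existing account yields None, so the hook job may be created.
print(check_hook_service_account("robot-source", "default", account_exists))
```

Passing the lookup in as a callback keeps the check independent of which cluster is queried, so the same function can be pointed at either the source or the destination cluster.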

Additional info:

Comment 1 Jason Montleon 2020-05-07 18:34:37 UTC
This should be fixed with some other issues in https://github.com/konveyor/mig-controller/pull/518.

Comment 2 Sergio 2020-05-08 13:29:51 UTC
Using CAM 1.2 stage.

It seems that when the hook is configured to use a service account in the source cluster (the one where the controller is NOT located), CAM always reports a failure saying the service account does not exist, even when it does.

We move the BZ to ASSIGNED status.

Comment 3 Jason Montleon 2020-05-08 13:33:41 UTC
https://github.com/konveyor/mig-controller/pull/527

Comment 6 Xin jiang 2020-05-15 08:47:02 UTC
Verified. The migration now fails and reports that the service account was not found:

  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2020-05-15T08:41:54Z"
      message: 'The migration has failed.  See: Errors.'
      reason: PreBackupHooksFailed
      status: "True"
      type: Failed
    errors:
    - serviceaccounts "asdfasfd" not found
    itenerary: Failed
    observedDigest: 9ee5917595b5f434ad848932d54e52d8d489b94b3761290a60b4580b57e317b5
    phase: Completed
    startTimestamp: "2020-05-15T08:41:45Z"
kind: List
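
A check like the one used for this verification could be scripted. This is a minimal sketch, assuming the status block above has already been parsed into a Python dict (for example by a YAML parser, not shown); `migration_failure` is a hypothetical helper, and only the field names visible in the output above are used.

```python
from typing import Any, Dict, List, Optional, Tuple


def migration_failure(status: Dict[str, Any]) -> Optional[Tuple[str, List[str]]]:
    """Return (reason, errors) if a Failed condition is true, else None."""
    for condition in status.get("conditions", []):
        if condition.get("type") == "Failed" and condition.get("status") == "True":
            return condition.get("reason", ""), list(status.get("errors", []))
    return None


# The relevant fields from the status shown above, written out as a dict.
status = {
    "conditions": [
        {
            "category": "Advisory",
            "durable": True,
            "message": "The migration has failed.  See: Errors.",
            "reason": "PreBackupHooksFailed",
            "status": "True",
            "type": "Failed",
        }
    ],
    "errors": ['serviceaccounts "asdfasfd" not found'],
    "phase": "Completed",
}
print(migration_failure(status))
```

Note that `phase: Completed` alone does not distinguish success from failure in the output above, which is why the sketch inspects the `Failed` condition rather than the phase.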

Comment 8 errata-xmlrpc 2020-05-28 11:10:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2326