Created attachment 1672741 [details]
all logs

Description of problem:
When the controller is installed in a 3.9 cluster, all migrations fail.

Version-Release number of selected component (if applicable):
1.1.2 CAM stage

How reproducible:
Always

Steps to Reproduce:
1. Prepare two clusters to perform migrations. One of them must be OCP 3.9 and the other OCP 4.3.
2. Install the controller in the 3.9 cluster.
3. Run any migration.

Actual results:
The migration fails with this error:

$ oc get migmigration -o yaml
....
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: 2020-03-23T15:22:45Z
      message: '[1] Stage pods created.'
      status: "True"
      type: StagePodsCreated
    - category: Critical
      lastTransitionTime: 2020-03-23T15:23:04Z
      message: 'Reconcile failed: [pods "restic-26mz2" is forbidden: User
        "system:serviceaccount:openshift-migration:migration-controller" cannot delete pods
        in the namespace "openshift-migration": User
        "system:serviceaccount:openshift-migration:migration-controller" cannot delete pods
        in project "openshift-migration"]. See controller logs for details.'
      status: "True"
      type: ReconcileFailed
    phase: RestartRestic

Expected results:
The migration should run without failures.

Additional info:
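As a quick way to confirm the missing permission on the source (3.9) cluster, an impersonation check like the one below should report "no" while the bug is present. This is only a sketch: it assumes your oc/kubectl client supports "auth can-i" and that your user is allowed to impersonate the service account.

$ oc auth can-i delete pods \
    -n openshift-migration \
    --as=system:serviceaccount:openshift-migration:migration-controller
no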
*** Bug 1816235 has been marked as a duplicate of this bug. ***
https://github.com/konveyor/mig-operator/pull/259

This PR moves the overall deployment to use the same SA (migration-controller) regardless of where things are deployed, which ensures consistent permissioning.
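For illustration only, the kind of namespace-scoped grant that was effectively missing looks roughly like the commands below. The role and binding names here are made up and are not taken from the PR; the PR fixes this through the operator's own manifests rather than a manual grant.

$ oc create role stage-pod-cleaner --verb=delete --resource=pods \
    -n openshift-migration
$ oc create rolebinding stage-pod-cleaner --role=stage-pod-cleaner \
    --serviceaccount=openshift-migration:migration-controller \
    -n openshift-migration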
Using CAM 1.2 stage, we found this problem when deploying the controller on 3.9:

TASK [Gathering Facts] *********************************************************
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: KeyError: 'getpwuid(): uid not found: 1000130000'
fatal: [localhost]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

Moving the BZ back to ASSIGNED.
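For context, a minimal reproduction of the underlying failure from inside a pod running with an OpenShift-assigned random UID (assuming that UID has no passwd entry in the image) looks like this:

$ id -u
1000130000
$ getent passwd "$(id -u)"
(no output, so anything calling getpwuid() -- such as Ansible fact gathering -- raises the KeyError above)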
This is the upstream PR that fixed the last error: https://github.com/konveyor/mig-operator/pull/290

I have enabled the downstream content source to get nss_wrapper and updated the downstream Dockerfile to use our entrypoint with the fix. This is the fix the ansible-operator team eventually implemented after the original solution was removed because a CVE was filed against it.
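For reference, the nss_wrapper technique works roughly like the sketch below. This is only an illustration of the approach, not the actual entrypoint shipped in the operator image.

#!/bin/sh
# If the current (arbitrary) UID has no passwd entry, fabricate one and
# preload libnss_wrapper so getpwuid() lookups succeed.
if ! getent passwd "$(id -u)" > /dev/null 2>&1; then
  export NSS_WRAPPER_PASSWD=/tmp/passwd
  export NSS_WRAPPER_GROUP=/etc/group
  echo "ansible:x:$(id -u):$(id -g):ansible operator:${HOME:-/tmp}:/sbin/nologin" > "${NSS_WRAPPER_PASSWD}"
  export LD_PRELOAD=libnss_wrapper.so
fi
exec "$@"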
Sorry, I just read back and it looks like the original issue was different. The last comment probably warranted a separate BZ (and we should split it the next time it goes ON_QA if it's not fixed by this).
Verified using CAM 1.2 stage, 3.9 (controller) -> 4.3:

openshift-migration-rhel7-operator@sha256:6afd508558cdbfdfa05b46d0d02c46af59404a1f2bfd09c3272bbcf41900996d

Migrations could be executed without errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2326