Created attachment 1672741 [details]
all logs

Description of problem:
When the controller is installed in a 3.9 cluster, all migrations fail.

Version-Release number of selected component (if applicable):
1.1.2 CAM stage

How reproducible:
Always

Steps to Reproduce:
1. Prepare two clusters to perform migrations. One of them must be OCP 3.9 and the other OCP 4.3.
2. Install the controller in the 3.9 cluster.
3. Run any migration.

Actual results:
The migration fails with this error:

$ oc get migmigration -o yaml
....
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: 2020-03-23T15:22:45Z
      message: '[1] Stage pods created.'
      status: "True"
      type: StagePodsCreated
    - category: Critical
      lastTransitionTime: 2020-03-23T15:23:04Z
      message: 'Reconcile failed: [pods "restic-26mz2" is forbidden: User
        "system:serviceaccount:openshift-migration:migration-controller" cannot delete pods
        in the namespace "openshift-migration": User
        "system:serviceaccount:openshift-migration:migration-controller" cannot delete pods
        in project "openshift-migration"]. See controller logs for details.'
      status: "True"
      type: ReconcileFailed
    phase: RestartRestic

Expected results:
The migration should run without failures.

Additional info:
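As a quick way to confirm the missing permission on the source (3.9) cluster, an impersonation check like the one below should report "no" while the bug is present. This is only a sketch: it assumes your oc/kubectl client supports "auth can-i" and that your user is allowed to impersonate the service account.

$ oc auth can-i delete pods \
    -n openshift-migration \
    --as=system:serviceaccount:openshift-migration:migration-controller
no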
*** Bug 1816235 has been marked as a duplicate of this bug. ***
https://github.com/konveyor/mig-operator/pull/259

This PR moves the overall deployment to use the same SA (migration-controller) regardless of where things are deployed, which ensures consistent permissioning.
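For illustration only, the kind of namespace-scoped grant that was effectively missing looks roughly like the commands below. The role and binding names here are made up and are not taken from the PR; the PR fixes this through the operator's own manifests rather than a manual grant.

$ oc create role stage-pod-cleaner --verb=delete --resource=pods \
    -n openshift-migration
$ oc create rolebinding stage-pod-cleaner --role=stage-pod-cleaner \
    --serviceaccount=openshift-migration:migration-controller \
    -n openshift-migration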
Using CAM 1.2 stage, we found this problem when deploying the controller on 3.9:

TASK [Gathering Facts] *********************************************************
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: KeyError: 'getpwuid(): uid not found: 1000130000'
fatal: [localhost]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

Moving the BZ back to ASSIGNED.
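For context, a minimal reproduction of the underlying failure from inside a pod running with an OpenShift-assigned random UID (assuming that UID has no passwd entry in the image) looks like this:

$ id -u
1000130000
$ getent passwd "$(id -u)"
(no output, so anything calling getpwuid() -- such as Ansible fact gathering -- raises the KeyError above)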
This is the upstream PR that fixed the last error: https://github.com/konveyor/mig-operator/pull/290

I have enabled the downstream content source to get nss_wrapper and updated the downstream Dockerfile to use our entrypoint with the fix. This is the fix the ansible-operator team eventually implemented after the original solution was removed because a CVE was filed against it.
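For reference, the nss_wrapper technique works roughly like the sketch below. This is only an illustration of the approach, not the actual entrypoint shipped in the operator image.

#!/bin/sh
# If the current (arbitrary) UID has no passwd entry, fabricate one and
# preload libnss_wrapper so getpwuid() lookups succeed.
if ! getent passwd "$(id -u)" > /dev/null 2>&1; then
  export NSS_WRAPPER_PASSWD=/tmp/passwd
  export NSS_WRAPPER_GROUP=/etc/group
  echo "ansible:x:$(id -u):$(id -g):ansible operator:${HOME:-/tmp}:/sbin/nologin" > "${NSS_WRAPPER_PASSWD}"
  export LD_PRELOAD=libnss_wrapper.so
fi
exec "$@"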
Sorry, I just read back and it looks like the original issue was different. The last comment probably warranted a separate BZ (and we should split it the next time it goes ON_QA if it's not fixed by this).
Verified using CAM 1.2 stage, 3.9 (controller) -> 4.3:

openshift-migration-rhel7-operator@sha256:6afd508558cdbfdfa05b46d0d02c46af59404a1f2bfd09c3272bbcf41900996d

Migrations could be executed without errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2326