Description of problem:
When we execute a migration that has been previously run as "stage", the migration fails.

Version-Release number of selected component (if applicable):
CAM 1.2.2 stage
SOURCE: OCP 3.11 AWS
TARGET: OCP 4.4 AWS
AWS S3 BUCKET

How reproducible:
Always

Steps to Reproduce:
1. Create a namespace with a nginx application

    oc process -p NAMESPACE=bztest -f https://gitlab.cee.redhat.com/app-mig/cam-helper/raw/master/ocp-26160/nginx_with_pv_defaultsc_template.yml | oc create -f -

2. Feed the data

    oc -n bztest rsh $(oc get pods -n bztest -o jsonpath='{.items[0].metadata.name}') sh -c 'echo "<h1>HELLO WORLD</h1>" > /usr/share/nginx/html/index.html'

3. Execute a "stage" migration with this namespace
4. Execute a migration with this namespace

Actual results:
The stage migration will run OK, but the actual migration will fail, and this error will be displayed in the MigMigration resource

    status:
      conditions:
      - category: Advisory
        durable: true
        lastTransitionTime: "2020-06-08T12:59:10Z"
        message: '[1] Stage pods created.'
        status: "True"
        type: StagePodsCreated
      - category: Advisory
        durable: true
        lastTransitionTime: "2020-06-08T13:00:12Z"
        message: 'The migration has failed. See: Errors.'
        reason: QuiesceApplications
        status: "True"
        type: Failed
      errors:
      - 'Operation cannot be fulfilled on replicasets.extensions "nginx-deployment-557dd97bf8": the object has been modified; please apply your changes to the latest version and try again'
      itenerary: Failed

Expected results:
The migration should be executed without problems.

Additional info:
It doesn't seem to be necessary to "stage" the migration first. I could reproduce the failure consistently by staging migrations, but all of a sudden it stopped happening. Now it happens occasionally in quiesced migrations, and I cannot reproduce it consistently.
PR is here: https://github.com/konveyor/mig-controller/pull/565
ReplicaSets can exist standalone or as part of a Deployment. Standalone ReplicaSets should be quiesced, but ReplicaSets that belong to a Deployment should not: quiescing the Deployment already handles them. Unless we skip them, we run into a race condition, with two things modifying the same ReplicaSet at once. That race matches the "the object has been modified" error reported above.
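For illustration only, the skip described above might look roughly like the following Go sketch. It is not the code from the PR. The `OwnerRef` and `ReplicaSet` types are simplified stand-ins for the Kubernetes objects, and `ownedByDeployment`, `quiesceStandalone`, and the name `standalone-rs` are hypothetical. The sketch assumes that a ReplicaSet created by a Deployment carries an owner reference of kind `Deployment`.

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for one entry of metadata.ownerReferences.
type OwnerRef struct {
	Kind string
	Name string
}

// ReplicaSet is a simplified stand-in for a Kubernetes ReplicaSet (hypothetical type).
type ReplicaSet struct {
	Name     string
	Replicas int
	Owners   []OwnerRef
}

// ownedByDeployment reports whether any owner reference points at a Deployment.
func ownedByDeployment(rs ReplicaSet) bool {
	for _, o := range rs.Owners {
		if o.Kind == "Deployment" {
			return true
		}
	}
	return false
}

// quiesceStandalone scales standalone ReplicaSets to zero and returns their names.
// ReplicaSets owned by a Deployment are skipped, because quiescing the
// Deployment scales them down; touching them here would be a second writer.
func quiesceStandalone(sets []ReplicaSet) []string {
	var quiesced []string
	for i := range sets {
		if ownedByDeployment(sets[i]) {
			continue
		}
		sets[i].Replicas = 0
		quiesced = append(quiesced, sets[i].Name)
	}
	return quiesced
}

func main() {
	sets := []ReplicaSet{
		{
			Name:     "nginx-deployment-557dd97bf8",
			Replicas: 1,
			Owners:   []OwnerRef{{Kind: "Deployment", Name: "nginx-deployment"}},
		},
		{Name: "standalone-rs", Replicas: 2}, // hypothetical standalone ReplicaSet
	}
	fmt.Println(quiesceStandalone(sets)) // prints [standalone-rs]
}
```

With this filter, only one writer (the Deployment quiesce) changes the Deployment's ReplicaSet, so the update conflict should no longer be triggered from this code path.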
Verified with the new CAM 1.2.2 stage images. The fix works.

CAM 1.2.2 information:
openshift-migration-rhel7-operator@sha256:ab124c3917a2ea22e03618f287c629e727bbcdf7ec76db5e7d0f8654064b7a52
openshift-migration-controller-rhel8@sha256:ca9ab7ecf0d939afa1aae2540bb3daf5d7ce651ad58b94c6987484d12af1d211
openshift-migration-ui-rhel8@sha256:6abfaea8ac04e3b5bbf9648a3479b420b4baec35201033471020c9cae1fe1e11
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2571