-----------------------
Description of problem:
-----------------------

When attempting a stateless migration from 4.3 -> 4.4, I have repeatedly observed a "Partial Failure" on the Velero "FinalRestore" that is part of the CAM migration flow.

-------------------------------------------------------------
Version-Release number of selected component (if applicable):
-------------------------------------------------------------

SOURCE CLUSTER: 4.3
TARGET CLUSTER: 4.4
KONVEYOR: 1.2

-----------------
How reproducible:
-----------------

In my experience, always.

-------------------
Steps to Reproduce:
-------------------

1. Create the 'hello-openshift' namespace

$ oc create namespace hello-openshift

2. Create the pod to be migrated

apiVersion: v1
kind: Pod
metadata:
  annotations:
  generateName: hello-openshift-68876989dc-
  name: hello-openshift-68876989dc-bwhrq
  namespace: hello-openshift
spec:
  containers:
  - image: openshift/hello-openshift:latest
    imagePullPolicy: Always
    name: hello-openshift
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000570000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File

$ oc create -f pod.yaml

3. Create a plan to migrate the above namespace, then run a Final Migration

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: migration-1
  namespace: openshift-migration
spec:
  migPlanRef:
    name: gvkdiff
    namespace: openshift-migration
  quiescePods: true
  stage: false

---------------
Actual results:
---------------

---------------------
mig-controller output
---------------------
{"level":"info","ts":1588877311.3817651,"logger":"migration|gmmkj","msg":"[RUN]","migration":"migration-3","stage":false,"phase":"FinalRestoreFailed"}
{"level":"info","ts":1588877315.816103,"logger":"migration|k7k6l","msg":"[RUN]","migration":"migration-3","stage":false,"phase":"DeleteMigrated"}

--------------
migplan status
--------------
- 'Restore: openshift-migration/migration-3-grf4x partially failed.'
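For anyone reproducing this, a quick way to dig past the MigPlan condition and see why the restore partially failed (a sketch using standard oc commands; the restore name is the one from this report and will differ per run):

# List the Velero restores created by the migration (MTC runs Velero in
# the openshift-migration namespace by default).
$ oc get restores.velero.io -n openshift-migration

# Dump the failing restore; the PartiallyFailed phase and error counts
# show up under .status.
$ oc get restore migration-3-grf4x -n openshift-migration -o yaml

# The Velero pod logs carry the per-resource restore errors quoted below.
$ oc logs deployment/velero -n openshift-migration | grep 'error restoring'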
--------------------------
velero logs recorded error
--------------------------

time="2020-05-07T18:48:14Z" level=info msg="error restoring operatorgroups.operators.coreos.com: CustomResourceDefinition.apiextensions.k8s.io \"operatorgroups.operators.coreos.com\" is invalid: [spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:\"v1\", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a08), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea770), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}, apiextensions.CustomResourceDefinitionVersion{Name:\"v1alpha2\", Served:true, Storage:false, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a10), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea9c0), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version schemas may not all be set to identical values (top-level validation should be used instead), spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:\"v1\", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a08), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea770), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}, apiextensions.CustomResourceDefinitionVersion{Name:\"v1alpha2\", Served:true, Storage:false, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a10), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea9c0), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version subresources may not all be set to identical values (top-level subresources should be used instead)]" logSource="pkg/restore/restore.go:1199" restore=openshift-migration/migration-3-grf4x

-----------------
Expected results:
-----------------

Migration success.

----------------
Additional info:
----------------

Dylan Murray and Scott Seago mentioned the links below may be pertinent:

https://github.com/kubernetes/apiextensions-apiserver/blob/master/pkg/apis/apiextensions/validation/validation.go#L249
https://github.com/vmware-tanzu/velero/issues/2383
https://github.com/vmware-tanzu/velero/issues/2383#issuecomment-616751291
https://github.com/vmware-tanzu/velero/issues/2383#issuecomment-617349350
https://github.com/vmware-tanzu/velero/pull/2478
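To make the validator's complaint in the log above concrete: the apiextensions validation linked under Additional info rejects a CRD submitted via the v1beta1 API whose per-version schemas (and/or subresources) are all deep-equal, insisting that the top-level validation/subresources fields be used instead. A trimmed, hypothetical sketch of the shape that trips it (not the actual operatorgroups CRD contents):

# Hypothetical, trimmed illustration -- not the real operatorgroups CRD.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: operatorgroups.operators.coreos.com
spec:
  group: operators.coreos.com
  names:
    kind: OperatorGroup
    plural: operatorgroups
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:                # per-version schema ...
      openAPIV3Schema:
        type: object
    subresources:          # ... and per-version subresources
      status: {}
  - name: v1alpha2
    served: true
    storage: false
    schema:                # deep-equal to v1's schema -> rejected with
      openAPIV3Schema:     # "per-version schemas may not all be set to
        type: object       #  identical values"; the identical
    subresources:          #  subresources blocks trip the parallel rule.
      status: {}

Since Velero at the time backed up CRDs through the v1beta1 endpoint, a CRD that originated as apiextensions.k8s.io/v1 on the cluster can round-trip into exactly this rejected shape, which appears to be the failure mode tracked in the linked velero issue #2383.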
======
UPDATE
======

This issue only occurs when the namespace being migrated contains an OperatorGroup. The issue would probably also be triggered when migrating other core OpenShift 4.x CRs. In this case, I had created the OperatorGroup below (and forgotten about it) in my namespace.

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-migration-
  namespace: hello-openshift
spec:
  targetNamespaces:
  - openshift-migration
status:
  namespaces:
  - openshift-migration
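A quick pre-flight check for this trap (standard oc usage, not part of the original reproduction steps) is to look for OperatorGroups in the namespaces on the plan before migrating:

# List OperatorGroups in the namespace slated for migration; any hits
# would be swept into the Velero backup unless explicitly excluded.
$ oc get operatorgroups.operators.coreos.com -n hello-openshift

# Or scan every namespace at once.
$ oc get operatorgroups.operators.coreos.com --all-namespaces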
Should we add operatorgroups to the default list of excluded resources?
Fix posted: https://github.com/konveyor/mig-operator/pull/411
Verified using CAM 1.3 stage.

OperatorGroups are now added to the excluded resources list by default. They are ignored.

excludedResources:
- imagetags
- templateinstances
- clusterserviceversions
- packagemanifests
- subscriptions
- servicebrokers
- servicebindings
- serviceclasses
- serviceinstances
- serviceplans
- operatorgroups

Moved to VERIFIED.
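One way to spot-check the new default on a live migration (a sketch; it assumes the exclusion list is carried on the Velero Restore objects that mig-controller creates, as suggested by the restore shown in this report):

# Print each restore's excluded resources; 'operatorgroups' should now
# appear in the list.
$ oc get restores.velero.io -n openshift-migration \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.excludedResources}{"\n"}{end}'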
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) Tool image release advisory 1.3.0), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4148