Created attachment 1714801 [details]
restore logs

Description of problem:

When we migrate a CRD and a CR in a migration that also contains a deployment, the CRD is created but the CR is skipped and not migrated. The migration reports no failures.

Version-Release number of selected component (if applicable):
MTC 1.3
SOURCE CLUSTER: Azure 4.2
TARGET CLUSTER: Azure 4.5
Replication repository: Azure storage

How reproducible:
Always

Steps to Reproduce:

1. Create a namespace

oc new-project bztest

2. Create the CRD

cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF

3. Create the application. This application watches DeployCustom resources and creates a deployment for every resource it finds.

cat <<EOF | oc create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: 'bztest'
  name: 'bztest'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bztest
  template:
    metadata:
      labels:
        app: 'bztest'
    spec:
      containers:
      - image: quay.io/sregidor/foo-controller
        name: 'bztest'
EOF

4. Create permissions for the app

cat <<EOF | oc create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: foo-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: foo-controller
subjects:
- kind: ServiceAccount
  name: default
  namespace: bztest
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: foo-controller
rules:
- apiGroups:
  - samplecontroller.k8s.io
  resources:
  - deploycustoms
  - deploycustoms/finalizers
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - events
  - pods
  verbs:
  - "*"
EOF

5. Create the CR

cat <<EOF | oc create -f -
apiVersion: samplecontroller.k8s.io/v1alpha1
kind: DeployCustom
metadata:
  name: mytest
spec:
  deploymentName: 'ondedployment'
  replicas: 1
  image: quay.io/sregidor/openshift-nginx:latest
EOF

6. Check the pods in the namespace

$ oc get pods -n bztest
NAME                            READY   STATUS    RESTARTS   AGE
bztest-6b944f996-cgq95          1/1     Running   0          25s
ondedployment-d8855b5bf-mn8lt   1/1     Running   0          14s

7. Migrate the application with quiescing disabled (QUIESCED = FALSE). This is important: since bztest is not a normal deployment, the bztest controller and MTC conflict, and a quiesced migration never finishes because the app keeps overriding MTC's quiescing. (See the MigMigration sketch right after this comment.)

Actual results:

In the target cluster we can find the CRD, but not the CR.

$ oc get crds | grep deploy
deploycustoms.samplecontroller.k8s.io            2020-09-14T13:48:22Z

$ oc get deploycustoms.samplecontroller.k8s.io -n bztest
No resources found in bztest namespace.

Expected results:

Both the CRD and the CR should be present in the target cluster once the migration ends.

Additional info:

We can see this message in the restore logs:

time="2020-09-14T13:48:24Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:383" resource=deploycustoms.samplecontroller.k8s.io restore=openshift-migration/d0865710-f690-11ea-9a47-e12d786f434f-5w6wc
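For step 7, this is a minimal sketch of triggering the migration with quiescing disabled through a MigMigration CR (the migration can equally be started from the MTC UI with the quiesce option unchecked). The MigPlan name bztest-migplan is hypothetical; the field names assume the migration.openshift.io/v1alpha1 API used by MTC 1.3.

cat <<EOF | oc create -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: bztest-migration
  namespace: openshift-migration
spec:
  # Reference to an existing MigPlan that includes the bztest namespace (hypothetical name)
  migPlanRef:
    name: bztest-migplan
    namespace: openshift-migration
  # Do not quiesce: the bztest controller would keep scaling its deployment back up
  quiescePods: false
  # Run a final (non-stage) migration
  stage: false
EOF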
To reproduce it without custom apps or templates:

1. Deploy a normal nginx deployment

cat <<EOF | oc create -f -
apiVersion: v1
items:
- apiVersion: v1
  kind: Namespace
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: mycrdtests
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: nginx-logs
    namespace: mycrdtests
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Mi
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: nginx-html
    namespace: mycrdtests
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Mi
- apiVersion: apps/v1beta1
  kind: Deployment
  metadata:
    labels:
      template: nginx-persistent-template
    name: nginx-deployment
    namespace: mycrdtests
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - image: quay.io/sregidor/openshift-nginx
          name: nginx
          ports:
          - containerPort: 8081
          resources:
            limits:
              cpu: "1"
              memory: 128Mi
          volumeMounts:
          - mountPath: /var/log/nginx
            name: nginx-logs
            readOnly: false
          - mountPath: /usr/share/nginx/html
            name: nginx-html
            readOnly: false
        volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
        - name: nginx-html
          persistentVolumeClaim:
            claimName: nginx-html
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: my-nginx
    namespace: mycrdtests
  spec:
    ports:
    - port: 8081
      targetPort: 8081
    selector:
      app: nginx
    type: ClusterIP
- apiVersion: route.openshift.io/v1
  kind: Route
  metadata:
    labels:
      app: nginx
      service: my-nginx
      template: nginx-persistent-template
    name: my-nginx
    namespace: mycrdtests
  spec:
    port:
      targetPort: 8081
    to:
      kind: Service
      name: my-nginx
kind: List
metadata: {}
EOF

2. Create the CRD

cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF

3. Create the CR

cat <<EOF | oc create -f -
apiVersion: samplecontroller.k8s.io/v1alpha1
kind: DeployCustom
metadata:
  name: mytest
spec:
  deploymentName: 'ondedployment'
  replicas: 1
  image: quay.io/sregidor/openshift-nginx:latest
EOF

4. Migrate the application. After the migration you will see that the CRD was created in the target cluster, but not the CR.
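After step 4, the same checks from the original description can be repeated against the target cluster (assuming the CR was created while the mycrdtests project was selected; adjust the namespace otherwise):

$ oc get crds | grep deploy
$ oc get deploycustoms.samplecontroller.k8s.io -n mycrdtests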
I've identified the upstream commit that caused this regression. I'm currently talking to upstream about whether the commit can simply be reverted, or whether it was a necessary fix that needs refactoring so that it handles both the original issue it addressed and the regression it introduced.
Upstream issue filed: https://github.com/vmware-tanzu/velero/issues/2948 I've discussed the problem briefly with upstream developers. If they don't fix it in the next day or two I'll submit a PR upstream. Either way, once an upstream fix is in place, we can cherry-pick it into our velero fork.
Upstream PR submitted. Once it's merged, I'll cherry-pick it into our konveyor fork: https://github.com/vmware-tanzu/velero/pull/2949
PR to pull this into our 1.3.0 release branch for the velero fork: https://github.com/konveyor/velero/pull/74
Verified using MTC 1.3

openshift-migration-rhel7-operator@sha256:71156aa47b56dd673268f7f073c76c9595e6d856b4f94f61b28e599bffe12899

- name: MIG_CONTROLLER_REPO
  value: openshift-migration-controller-rhel8@sha256
- name: MIG_CONTROLLER_TAG
  value: 0e805a6901d3b5c257c877af7f714cfa3e088b0bf0ef0e9ce743994f656a2fa8
- name: MIG_UI_REPO
  value: openshift-migration-ui-rhel8@sha256
- name: MIG_UI_TAG
  value: d5d2a58977d533d2bd773d6e0403eea9f072a2e09d19efa219fccb3df9b96457
- name: MIGRATION_REGISTRY_REPO
  value: openshift-migration-registry-rhel8@sha256
- name: MIGRATION_REGISTRY_TAG
  value: 3b4a26983053bccc548bc106bdfc0f651075301b90572a03d9d31d62a6c3d769
- name: VELERO_REPO
  value: openshift-migration-velero-rhel8@sha256
- name: VELERO_TAG
  value: ef57b63792391edaf5b699102a7dd490748f8aa879daf9cc77e9c226d74b8522
- name: VELERO_PLUGIN_REPO
  value: openshift-velero-plugin-rhel8@sha256
- name: VELERO_PLUGIN_TAG
  value: 18377c92939bcd447a35b44aa872656954a9c834350300394e1753a8cbf7830a

Verified by running test case "OCP-34703 - Migrate Custom Resource Definition". The test passed.

Moved to VERIFIED status.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Migration Toolkit for Containers (MTC) Tool image release advisory 1.3.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4148