Bug 1878791 - CR is not migrated when there are deployments in the migration too.
Summary: CR is not migrated when there are deployments in the migration too.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Migration Tooling
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Scott Seago
QA Contact: Xin jiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-14 14:17 UTC by Sergio
Modified: 2020-09-30 18:43 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 18:43:07 UTC
Target Upstream Version:
Embargoed:


Attachments
restore logs (76.87 KB, text/plain)
2020-09-14 14:17 UTC, Sergio


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4148 0 None None None 2020-09-30 18:43:19 UTC

Description Sergio 2020-09-14 14:17:44 UTC
Created attachment 1714801 [details]
restore logs

Description of problem:
When we migrate a CRD and a CR in a migration that also contains a deployment, the CRD is created, but the CR is skipped and not migrated. The migration reports no failures.


Version-Release number of selected component (if applicable):
MTC 1.3
SOURCE CLUSTER: azure 4.2
TARGET CLUSTER: azure 4.5
Replication repository: Azure storage

How reproducible:
Always

Steps to Reproduce:
1. Create a namespace

  oc new-project bztest

2. Create the CRD

cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF


3. Create the application. This application watches DeployCustom custom resources and creates a deployment for every resource found (a sketch of such a generated deployment is shown after step 6).

cat <<EOF | oc create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: 'bztest'
  name: 'bztest'

spec:
  replicas: 1
  selector:
    matchLabels:
      app: bztest
  template:
    metadata:
      labels:
        app: 'bztest'
    spec:
      containers:
      - image: quay.io/sregidor/foo-controller
        name: 'bztest'
EOF


4. Create permissions for the app.
cat <<EOF | oc create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: foo-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: foo-controller
subjects:
- kind: ServiceAccount
  name: default
  namespace: bztest

---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: foo-controller
rules:
- apiGroups:
  - samplecontroller.k8s.io
  resources:
  - deploycustoms
  - deploycustoms/finalizers
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - events
  - pods
  verbs:
  - "*"

EOF

5. Create the CR

cat <<EOF | oc create -f -
apiVersion: samplecontroller.k8s.io/v1alpha1
kind: DeployCustom
metadata:
  name: mytest
spec:
  deploymentName: 'ondedployment'
  replicas: 1
  image: quay.io/sregidor/openshift-nginx:latest
EOF

6. Check pods in namespace

$ oc get pods -n bztest
NAME                            READY   STATUS    RESTARTS   AGE
bztest-6b944f996-cgq95          1/1     Running   0          25s
ondedployment-d8855b5bf-mn8lt   1/1     Running   0          14s
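
For context, the ondedployment pod above comes from a Deployment that the bztest controller generates from the CR created in step 5. A rough sketch of that generated Deployment is below; the name, replicas, and image come from the CR spec, while the label/selector scheme and container name are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ondedployment                 # spec.deploymentName from the CR
  namespace: bztest
spec:
  replicas: 1                         # spec.replicas from the CR
  selector:
    matchLabels:
      app: ondedployment              # assumed label scheme
  template:
    metadata:
      labels:
        app: ondedployment
    spec:
      containers:
      - name: nginx                   # assumed container name
        image: quay.io/sregidor/openshift-nginx:latest   # spec.image from the CR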

7. Migrate the application with quiescing disabled (QUIESCED = FALSE). This is important: because the ondedployment deployment is not a normal deployment but is managed by the bztest application, the application and MTC will conflict, and a quiesced migration would last forever because the app overrides MTC's quiescing.
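
For reference, this is a minimal sketch of what the MigMigration resource might look like if the migration is triggered from the CLI with quiescing disabled. The MigPlan name (bztest-plan) is hypothetical and assumes a plan covering the bztest namespace already exists:

cat <<EOF | oc create -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: bztest-migration
  namespace: openshift-migration
spec:
  migPlanRef:
    name: bztest-plan              # hypothetical MigPlan name
    namespace: openshift-migration
  quiescePods: false               # QUIESCED = FALSE, do not scale workloads down
  stage: false                     # final (non-stage) migration
EOF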


Actual results:
In the target cluster we can find the CRD, but not the CR.
$ oc get crds | grep deploy
deploycustoms.samplecontroller.k8s.io                       2020-09-14T13:48:22Z

$ oc get deploycustoms.samplecontroller.k8s.io -n bztest
No resources found in bztest namespace.


Expected results:
Both the CRD and the CR should be present in the target cluster once the migration ends.


Additional info:
We can see this message in the restore logs:

time="2020-09-14T13:48:24Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:383" resource=deploycustoms.samplecontroller.k8s.io restore=openshift-migration/d0865710-f690-11ea-9a47-e12d786f434f-5w6wc

Comment 2 Sergio 2020-09-14 16:14:36 UTC
To reproduce it without custom apps or templates:

1. Deploy a normal nginx deployment

cat <<EOF | oc create -f -
apiVersion: v1
items:
- apiVersion: v1
  kind: Namespace
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: mycrdtests
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: nginx-logs
    namespace: mycrdtests
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Mi
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: nginx-html
    namespace: mycrdtests
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Mi
- apiVersion: apps/v1beta1
  kind: Deployment
  metadata:
    labels:
      template: nginx-persistent-template
    name: nginx-deployment
    namespace: mycrdtests
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - image: quay.io/sregidor/openshift-nginx
          name: nginx
          ports:
          - containerPort: 8081
          resources:
            limits:
              cpu: "1"
              memory: 128Mi
          volumeMounts:
          - mountPath: /var/log/nginx
            name: nginx-logs
            readOnly: false
          - mountPath: /usr/share/nginx/html
            name: nginx-html
            readOnly: false
        volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
        - name: nginx-html
          persistentVolumeClaim:
            claimName: nginx-html
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: nginx
      template: nginx-persistent-template
    name: my-nginx
    namespace: mycrdtests
  spec:
    ports:
    - port: 8081
      targetPort: 8081
    selector:
      app: nginx
    type: ClusterIP
- apiVersion: route.openshift.io/v1
  kind: Route
  metadata:
    labels:
      app: nginx
      service: my-nginx
      template: nginx-persistent-template
    name: my-nginx
    namespace: mycrdtests
  spec:
    port:
      targetPort: 8081
    to:
      kind: Service
      name: my-nginx
kind: List
metadata: {}
EOF



2. Create the CRD

cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF



3.  Create the CR

cat <<EOF | oc create -f -
apiVersion: samplecontroller.k8s.io/v1alpha1
kind: DeployCustom
metadata:
  name: mytest
spec:
  deploymentName: 'ondedployment'
  replicas: 1
  image: quay.io/sregidor/openshift-nginx:latest
EOF



4.  Migrate



After the migration you will see that the CRD was created in the target cluster, but the CR was not.
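
The target cluster can be checked the same way as in the first reproducer, for example:

$ oc get crds | grep deploycustoms
$ oc get deploycustoms.samplecontroller.k8s.io --all-namespaces

The CRD should be listed, while the DeployCustom CR created in step 3 is missing.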

Comment 3 Scott Seago 2020-09-17 15:26:51 UTC
I've identified the upstream commit that caused this bug to start happening. I'm currently discussing with upstream whether this commit can simply be reverted, or whether it was a necessary fix that needs refactoring to handle both the original issue and the regression it introduced.

Comment 4 Scott Seago 2020-09-17 21:23:50 UTC
Upstream issue filed: https://github.com/vmware-tanzu/velero/issues/2948
I've discussed the problem briefly with upstream developers. If they don't fix it in the next day or two I'll submit a PR upstream. Either way, once an upstream fix is in place, we can cherry-pick it into our velero fork.

Comment 5 Scott Seago 2020-09-18 15:20:32 UTC
Upstream PR submitted. Once it's merged, I'll cherry-pick it into our konveyor fork: https://github.com/vmware-tanzu/velero/pull/2949
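
For context, once the upstream fix merges, carrying it into the fork is a cherry-pick of the merged commit onto the fork's release branch. A rough sketch, where the remote setup, branch names, and commit SHA are placeholders:

# Assumes the konveyor fork is "origin" and its 1.3.0 release branch is named release-1.3.0.
git remote add upstream https://github.com/vmware-tanzu/velero.git
git fetch upstream
git checkout -b cherry-pick-2949 origin/release-1.3.0
git cherry-pick <upstream-fix-commit-sha>
git push origin cherry-pick-2949    # then open a PR against the fork's release branch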

Comment 6 Scott Seago 2020-09-23 16:34:34 UTC
PR to pull this into our 1.3.0 release branch for the velero fork: https://github.com/konveyor/velero/pull/74

Comment 10 Sergio 2020-09-24 11:56:32 UTC
Verified using MTC 1.3

openshift-migration-rhel7-operator@sha256:71156aa47b56dd673268f7f073c76c9595e6d856b4f94f61b28e599bffe12899
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 0e805a6901d3b5c257c877af7f714cfa3e088b0bf0ef0e9ce743994f656a2fa8
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG
      value: d5d2a58977d533d2bd773d6e0403eea9f072a2e09d19efa219fccb3df9b96457
    - name: MIGRATION_REGISTRY_REPO
      value: openshift-migration-registry-rhel8@sha256
    - name: MIGRATION_REGISTRY_TAG
      value: 3b4a26983053bccc548bc106bdfc0f651075301b90572a03d9d31d62a6c3d769
    - name: VELERO_REPO
      value: openshift-migration-velero-rhel8@sha256
    - name: VELERO_TAG
      value: ef57b63792391edaf5b699102a7dd490748f8aa879daf9cc77e9c226d74b8522
    - name: VELERO_PLUGIN_REPO
      value: openshift-velero-plugin-rhel8@sha256
    - name: VELERO_PLUGIN_TAG
      value: 18377c92939bcd447a35b44aa872656954a9c834350300394e1753a8cbf7830a


Verified by running test case "OCP-34703 - Migrate Custom Resource Definition". The test passed.

Moved to VERIFIED status.

Comment 14 errata-xmlrpc 2020-09-30 18:43:07 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) Tool image release advisory 1.3.0), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4148

