Bug 1997173 - Migration of custom resource definitions to OpenShift Container Platform 4.9 fails because of API version incompatibility
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 1.6.0
Assignee: Scott Seago
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-24 14:36 UTC by Sergio
Modified: 2021-09-29 14:35 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 14:35:20 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github konveyor mig-controller pull 1197 0 None None None 2021-09-07 17:16:21 UTC
Github konveyor mig-controller pull 1201 0 None None None 2021-09-10 14:20:00 UTC
Red Hat Product Errata RHSA-2021:3694 0 None None None 2021-09-29 14:35:31 UTC

Description Sergio 2021-08-24 14:36:12 UTC
Description of problem:
When a CRD in the source cluster was created with "apiVersion: apiextensions.k8s.io/v1beta1" and we try to migrate this custom resource definition to an OCP 4.9 cluster, the migration's restore fails.

Version-Release number of selected component (if applicable):
SOURCE CLUSTER: AWS OCP 3.11 (MTC 1.5.1)
TARGET CLUSTER: AWS OCP 4.9 (MTC 1.6.0)
REPLICATION REPOSITORY: AWS

How reproducible:
Always

Steps to Reproduce:
1. In source cluster, create a namespace

oc new-project bz-test

2. In source cluster, create the custom resource definition

$ cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
  namespace: bz-test
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF

3. In the namespace created in step 1 create a custom resource using the CRD defined in step 2

$ cat <<EOF | oc create -f -
apiVersion: samplecontroller.k8s.io/v1alpha1
kind: DeployCustom
metadata:
  name: my-cr-name
  namespace: bz-test
spec:
  deploymentName: deploy-name
  replicas: 1
  image: quay.io/openshifttest/openshift-nginx:latest
EOF

4. Migrate the namespace to the 4.9 target cluster.


Actual results:
The migration's restore fails with this error:

time="2021-08-24T13:33:53Z" level=error msg="error restoring deploycustoms.samplecontroller.k8s.io: the server could not find the requested resource" logSource="pkg/restore/restore.go:1333" restore=openshift-migration/migration-d76bd-final-5lz8j


Expected results:
The CRD and the CR should be migrated without problems.

Additional info:
In 4.9, apiVersion: apiextensions.k8s.io/v1beta1 is no longer served for CRDs (the v1beta1 CRD API was removed in Kubernetes 1.22, on which OCP 4.9 is based). That is most likely the root cause of the failure.
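For reference, the CRD from step 2 expressed against the apiextensions.k8s.io/v1 API (the only CRD API version served on OCP 4.9) would look roughly like the sketch below. This is illustrative and untested against the clusters above: v1 requires a structural schema, replaces the single `version` field with a `versions` list, and CRDs are cluster-scoped, so the `namespace` field is dropped.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        # Permissive placeholder schema; a real CRD should describe its spec fields.
        x-kubernetes-preserve-unknown-fields: true
```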

Comment 1 Erik Nelson 2021-08-25 15:09:37 UTC
I think this might be something related to Velero, but I'm unsure. Scott, can you evaluate and understand whether this is expected / what's going on here?

Comment 2 Scott Seago 2021-08-25 15:25:27 UTC
The issue is that the CRD in the backup is v1beta1 (since that's the only CRD GVR available on 3.x clusters), and that GVR isn't available in 4.9 clusters. It's roughly equivalent to a Deployment migration from 3.7 to 4.6, where apps/v1beta1 exists in 3.7 but not 4.6, and apps/v1 exists in 4.6 but not 3.7. The difference is that our GVR compatibility warning validation doesn't catch CRDs, but it does catch Deployments.

I think all we can do is make sure we fix the compat warning code to catch this.

The main challenge is that we don't necessarily know whether there are CRDs that will be migrated. That logic is built into Velero.

The simple fix would be to always warn that users are responsible for pre-creating their own CRDs for 3.x/4.2- to 4.9+ migrations, but that would be triggered for basically every 3->4 migration for users moving to 4.9+ clusters.

The more complicated fix would be to identify whether there are any CRDs being migrated. That would entail looking at all of the existing GRs that have resources to migrate (I think we're already doing this to catch basic GVR incompatibility), adding a step where we also check whether that resource's GR corresponds to a CRD, and, if it does (and if src->dest is a pair with incompatible CRD GVR availability), warning that the user will be responsible for pre-creating that particular CRD on the destination cluster.
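A minimal offline sketch of the kind of check described above (file paths and the sample manifest are illustrative, not the mig-controller implementation): flag an exported CRD manifest whose serialized apiVersion is apiextensions.k8s.io/v1beta1, which a 4.9 (Kubernetes 1.22) destination no longer serves.

```shell
# Sample exported CRD manifest; in practice this would come from the backup
# produced on the 3.11 source cluster.
cat > /tmp/crd.yaml <<'EOF'
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
EOF

# Warn if the CRD was serialized with the v1beta1 API, which is not served by
# Kubernetes 1.22+ (OCP 4.9): the user must pre-create it as v1 on the target.
if grep -q '^apiVersion: apiextensions.k8s.io/v1beta1' /tmp/crd.yaml; then
  echo "WARN: CRD must be pre-created as apiextensions.k8s.io/v1 on the destination"
fi
```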



Comment 4 Erik Nelson 2021-09-01 13:42:54 UTC
The plan is to warn the user about CRDs that don't exist on the destination, when their namespaces contain CRs of those CRDs and the CRD apiVersion is incompatible between src and dest. Incoming.

Comment 5 Erik Nelson 2021-09-03 20:21:03 UTC
Kicking up the urgency to blocker here, we want to include this with the 1.6.0 release.

Comment 10 Xin jiang 2021-09-16 02:36:55 UTC
Verified with MTC 1.6.0.


When CRDs don't exist on the destination, the MigPlan reports a GVKsIncompatible warning:
  - category: Warn
    lastTransitionTime: "2021-09-16T02:34:04Z"
    message: 'Some namespaces contain GVKs incompatible with destination cluster. See: `incompatibleNamespaces` for details.'
    status: "True"
    type: GVKsIncompatible

Comment 12 errata-xmlrpc 2021-09-29 14:35:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694

