Bug 1997173

Summary: Migration of custom resource definitions to OpenShift Container Platform 4.9 fails because of API version incompatibility
Product: Migration Toolkit for Containers
Reporter: Sergio <sregidor>
Component: General
Assignee: Scott Seago <sseago>
Status: CLOSED ERRATA
QA Contact: Xin jiang <xjiang>
Severity: urgent
Docs Contact: Avital Pinnick <apinnick>
Priority: urgent
Version: 1.6.0
CC: ernelson, prajoshi, rjohnson, ssingla, whu, xjiang
Target Milestone: ---
Target Release: 1.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-09-29 14:35:20 UTC
Type: Bug

Description Sergio 2021-08-24 14:36:12 UTC
Description of problem:
When a CRD in the source cluster was created with "apiVersion: apiextensions.k8s.io/v1beta1" and we try to migrate this custom resource definition to an OCP 4.9 cluster, the migration's restore phase fails.

Version-Release number of selected component (if applicable):
SOURCE CLUSTER: AWS OCP 3.11 (MTC 1.5.1)
TARGET CLUSTER: AWS OCP 4.9 (MTC 1.6.0)
REPLICATION REPOSITORY: AWS

How reproducible:
Always

Steps to Reproduce:
1. In the source cluster, create a namespace:

oc new-project bz-test

2. In the source cluster, create the custom resource definition:

$ cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
  namespace: bz-test
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF

3. In the namespace created in step 1, create a custom resource using the CRD defined in step 2:

$ cat <<EOF | oc create -f -
apiVersion: samplecontroller.k8s.io/v1alpha1
kind: DeployCustom
metadata:
  name: my-cr-name
  namespace: bz-test
spec:
  deploymentName: deploy-name
  replicas: 1
  image: quay.io/openshifttest/openshift-nginx:latest
EOF

4. Migrate the namespace to the 4.9 target cluster.


Actual results:
The migration's restore fails with this error:

time="2021-08-24T13:33:53Z" level=error msg="error restoring deploycustoms.samplecontroller.k8s.io: the server could not find the requested resource" logSource="pkg/restore/restore.go:1333" restore=openshift-migration/migration-d76bd-final-5lz8j


Expected results:
The CRD and the CR should be migrated without problems.

Additional info:
In OCP 4.9, "apiVersion: apiextensions.k8s.io/v1beta1" is no longer served for CRDs (it was removed in Kubernetes 1.22). That could be the root cause of the failure.
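For reference, an apiextensions.k8s.io/v1 equivalent of the CRD from step 2, which a user could pre-create on the 4.9 destination cluster, might look like the following sketch. Note that metadata.namespace is dropped (CRDs are cluster-scoped, so that field is ignored anyway), and v1 requires an explicit schema; x-kubernetes-preserve-unknown-fields mimics the schemaless v1beta1 behavior:

```shell
$ cat <<EOF | oc create -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: deploycustoms.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  versions:
    - name: v1alpha1      # same version the v1beta1 CRD declared
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # accept arbitrary fields, as the schemaless v1beta1 CRD did
          x-kubernetes-preserve-unknown-fields: true
  names:
    kind: DeployCustom
    plural: deploycustoms
  scope: Namespaced
EOF
```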

Comment 1 Erik Nelson 2021-08-25 15:09:37 UTC
I think this might be related to Velero, but I'm unsure. Scott, can you evaluate and determine whether this is expected / what's going on here?

Comment 2 Scott Seago 2021-08-25 15:25:27 UTC
The issue is that the CRD in the backup is v1beta1 (since that's the only GVR available on 3.x clusters), and that GVR isn't available in 4.9 clusters. It's vaguely equivalent to a Deployment migration from 3.7 to 4.6, where apps/v1beta1 exists in 3.7 but not 4.6, and apps/v1 exists in 4.6 but not 3.7. The difference is that our GVR compatibility warning validation doesn't catch CRDs, but it does catch Deployments.

I think all we can do is make sure we fix the compat warning code to catch this.

The main challenge is that we don't necessarily know whether there are CRDs that will be migrated. That logic is built into Velero.

The simple fix would be to always warn that users are responsible for pre-creating their own CRDs for 3.x/4.2- to 4.9+ migrations, but that warning would be triggered for basically every 3->4 migration targeting a 4.9+ cluster.

The more complicated fix would be to identify whether any CRDs are being migrated. That would entail looking at all of the existing GRs that have resources to migrate (I think we're already doing this to catch basic GVR incompatibility) and adding a step where we also check whether that resource's GR corresponds to a CRD. If it does (and if src->dest is a pair with incompatible CRD GVR availability), we warn that the user will be responsible for pre-creating that particular CRD on the destination cluster.
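Until that validation lands, the check Scott describes can be approximated by hand. A hypothetical manual version (the kubeconfig context names "source-cluster" and "target-cluster" are assumptions) might be:

```shell
# Is the migrated resource's group/resource backed by a CRD on the source?
$ oc --context source-cluster get crd deploycustoms.samplecontroller.k8s.io

# Does the 4.9 destination still serve apiextensions.k8s.io/v1beta1?
# On 4.9 (Kubernetes 1.22) this lists only "apiextensions.k8s.io/v1",
# so a v1beta1-only CRD from the backup cannot be restored as-is.
$ oc --context target-cluster api-versions | grep '^apiextensions.k8s.io/'
```

If the second command shows no v1beta1 entry, the CRD must be pre-created on the destination in v1 form before running the migration.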

Comment 3 Scott Seago 2021-08-25 15:29:31 UTC
(Ignore the typo of "availabi;oty_" for "availability" above. Apparently BZ doesn't actually let me change it.)


Comment 4 Erik Nelson 2021-09-01 13:42:54 UTC
The plan is to warn the user about CRDs that don't exist on the destination, if there are CRs in their namespaces and the CRD apiVersion is incompatible between source and destination. Incoming.

Comment 5 Erik Nelson 2021-09-03 20:21:03 UTC
Kicking up the urgency to blocker here, we want to include this with the 1.6.0 release.

Comment 10 Xin jiang 2021-09-16 02:36:55 UTC
Verified with MTC 1.6.0.


When CRDs don't exist on the destination, the migplan reports a GVKsIncompatible warning:
  - category: Warn
    lastTransitionTime: "2021-09-16T02:34:04Z"
    message: 'Some namespaces contain GVKs incompatible with destination cluster. See: `incompatibleNamespaces` for details.'
    status: "True"
    type: GVKsIncompatible
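The condition above can also be pulled directly from the MigPlan status with a JSONPath filter; a sketch, where the plan name "my-migplan" is an example:

```shell
$ oc -n openshift-migration get migplan my-migplan \
    -o jsonpath='{.status.conditions[?(@.type=="GVKsIncompatible")].message}'
```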

Comment 12 errata-xmlrpc 2021-09-29 14:35:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694