Bug 1833101 - CAM Migrations from 4.3 to 4.4 fail on FinalRestore due to OperatorGroups CRD invalid error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Migration Tooling
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Derek Whatley
QA Contact: Xin jiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-07 19:35 UTC by Derek Whatley
Modified: 2020-09-30 18:42 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 18:42:29 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4148 0 None None None 2020-09-30 18:42:53 UTC

Description Derek Whatley 2020-05-07 19:35:47 UTC
-----------------------
Description of problem:
-----------------------
When attempting a stateless migration from 4.3->4.4, I have repeatedly observed "Partial Failure" on the Velero "FinalRestore" that is part of the CAM migration flow.

-------------------------------------------------------------
Version-Release number of selected component (if applicable):
-------------------------------------------------------------
SOURCE CLUSTER: 4.3
TARGET CLUSTER: 4.4
KONVEYOR: 1.2


-----------------
How reproducible:
-----------------
In my experience, always.


-------------------
Steps to Reproduce:
-------------------
1. Create 'hello-openshift' namespace

$ oc create namespace hello-openshift

2. Create pod to be migrated

apiVersion: v1
kind: Pod
metadata:
  generateName: hello-openshift-68876989dc-
  name: hello-openshift-68876989dc-bwhrq
  namespace: hello-openshift
spec:
  containers:
  - image: openshift/hello-openshift:latest
    imagePullPolicy: Always
    name: hello-openshift
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000570000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File

$ oc create -f pod.yaml
 


3. Create a migration plan for the namespace above; run the final migration

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: migration-1
  namespace: openshift-migration
spec:
  migPlanRef:
    name: gvkdiff
    namespace: openshift-migration
  quiescePods: true
  stage: false
-------------------
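Once the MigMigration above is created, its progress can be followed from the CLI. A minimal sketch, assuming an authenticated oc session on the cluster running the migration controller:

```shell
# Sketch: print the current phase reported on the MigMigration status.
# Assumes `oc` is installed and logged in; exits quietly otherwise.
command -v oc >/dev/null 2>&1 || { echo "oc not found; skipping"; exit 0; }
oc get migmigration migration-1 -n openshift-migration \
  -o jsonpath='{.status.phase}{"\n"}'
```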

---------------
Actual results:
---------------

---------------------
mig-controller output
---------------------
{"level":"info","ts":1588877311.3817651,"logger":"migration|gmmkj","msg":"[RUN]","migration":"migration-3","stage":false,"phase":"FinalRestoreFailed"}
{"level":"info","ts":1588877315.816103,"logger":"migration|k7k6l","msg":"[RUN]","migration":"migration-3","stage":false,"phase":"DeleteMigrated"}


--------------
migplan status
--------------
- 'Restore: openshift-migration/migration-3-grf4x partially failed.'


--------------------------
velero logs recorded error
--------------------------
time="2020-05-07T18:48:14Z" level=info msg="error restoring operatorgroups.operators.coreos.com: CustomResourceDefinition.apiextensions.k8s.io \"operatorgroups.operators.coreos.com\" is invalid: [spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:\"v1\", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a08), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea770), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}, apiextensions.CustomResourceDefinitionVersion{Name:\"v1alpha2\", Served:true, Storage:false, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a10), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea9c0), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version schemas may not all be set to identical values (top-level validation should be used instead), spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:\"v1\", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a08), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea770), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}, apiextensions.CustomResourceDefinitionVersion{Name:\"v1alpha2\", Served:true, Storage:false, Schema:(*apiextensions.CustomResourceValidation)(0xc025456a10), Subresources:(*apiextensions.CustomResourceSubresources)(0xc012fea9c0), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version subresources may not all be set to identical values (top-level subresources should be used instead)]" logSource="pkg/restore/restore.go:1199" restore=openshift-migration/migration-3-grf4x
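For context, the apiextensions validation rule being tripped here rejects CRDs whose per-version schemas (and subresources) are identical across every entry in spec.versions; such CRDs are required to declare a single top-level schema instead. A minimal sketch of the two shapes, using a hypothetical Widget CRD (v1beta1 API, the shape of the restored object; for illustration only):

```yaml
# REJECTED: the same schema repeated under each spec.versions entry.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names: {plural: widgets, singular: widget, kind: Widget, listKind: WidgetList}
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:                          # identical to v1alpha2 below -> invalid
      openAPIV3Schema: {type: object}
  - name: v1alpha2
    served: true
    storage: false
    schema:
      openAPIV3Schema: {type: object}
---
# ACCEPTED: one top-level schema under spec.validation.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names: {plural: widgets, singular: widget, kind: Widget, listKind: WidgetList}
  scope: Namespaced
  validation:
    openAPIV3Schema: {type: object}
  versions:
  - {name: v1, served: true, storage: true}
  - {name: v1alpha2, served: true, storage: false}
```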


-----------------
Expected results:
-----------------

Migration success.


----------------
Additional info:
----------------
Dylan Murray and Scott Seago mentioned the below links may be pertinent.

https://github.com/kubernetes/apiextensions-apiserver/blob/master/pkg/apis/apiextensions/validation/validation.go#L249

https://github.com/vmware-tanzu/velero/issues/2383

https://github.com/vmware-tanzu/velero/issues/2383#issuecomment-616751291

https://github.com/vmware-tanzu/velero/issues/2383#issuecomment-617349350

https://github.com/vmware-tanzu/velero/pull/2478

Comment 1 Derek Whatley 2020-05-07 20:25:28 UTC
======
UPDATE
======

This issue only occurs when the namespace being migrated contains an OperatorGroup. The issue would probably also be triggered when migrating other core OpenShift 4.x CRs.

In this case, I had created the OperatorGroup below (and forgot about it) in my namespace.

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-migration-
  namespace: hello-openshift
spec:
  targetNamespaces:
  - openshift-migration
status:
  namespaces:
  - openshift-migration
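Given that, a quick pre-flight check is to scan the source cluster for namespaces that contain OperatorGroups before planning a migration. A minimal sketch, assuming an authenticated oc session against the source cluster:

```shell
# Sketch: list namespaces on the source cluster that contain OperatorGroups,
# since (before the fix) restoring such a namespace partially fails as above.
# Assumes `oc` is installed and logged in; exits quietly otherwise.
command -v oc >/dev/null 2>&1 || { echo "oc not found; skipping check"; exit 0; }
oc get operatorgroups.operators.coreos.com --all-namespaces --no-headers 2>/dev/null \
  | awk '{print $1}' | sort -u
```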

Comment 3 Jason Montleon 2020-08-21 02:35:21 UTC
Should we add operatorgroups to the default list of excluded resources?

Comment 4 Derek Whatley 2020-08-21 17:11:45 UTC
Fix posted: https://github.com/konveyor/mig-operator/pull/411

Comment 8 Sergio 2020-09-17 09:36:35 UTC
Verified using CAM 1.3 stage

OperatorGroups are now added to the excluded resources list by default. They are ignored.

  excludedResources:
  - imagetags
  - templateinstances
  - clusterserviceversions
  - packagemanifests
  - subscriptions
  - servicebrokers
  - servicebindings
  - serviceclasses
  - serviceinstances
  - serviceplans
  - operatorgroups

Moved to verified.
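The exclusion can also be confirmed on the Velero Restore created for a migration, since the Restore spec carries the excluded resources. A minimal sketch, assuming an authenticated oc session and a completed migration; the restore name below is taken from this report and is illustrative:

```shell
# Sketch: print the excluded resources recorded on a migration's Restore CR.
# `migration-3-grf4x` is the restore name from this report; substitute your own.
# Assumes `oc` is installed and logged in; exits quietly otherwise.
command -v oc >/dev/null 2>&1 || { echo "oc not found; skipping"; exit 0; }
oc get restore migration-3-grf4x -n openshift-migration \
  -o jsonpath='{.spec.excludedResources}{"\n"}'
```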

Comment 12 errata-xmlrpc 2020-09-30 18:42:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) Tool image release advisory 1.3.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4148

