Bug 1748957 - CRs are not being migrated
Summary: CRs are not being migrated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Migration Tooling
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Scott Seago
QA Contact: Sergio
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-04 14:29 UTC by Sergio
Modified: 2020-05-28 11:10 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-28 11:09:55 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:2326 0 None None None 2020-05-28 11:10:20 UTC

Description Sergio 2019-09-04 14:29:52 UTC
Description of problem:
Custom resources in the namespace are not being migrated


Version-Release number of selected component (if applicable):

OCP4
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0     True        False         5h32m   Cluster version is 4.1.0

OCP3
$ oc version
oc v3.11.126
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https:XXXXXXX
openshift v3.11.104
kubernetes v1.11.0+d4cacc0

Controller:
    image: quay.io/ocpmigrate/mig-controller:stable
    imageID: quay.io/ocpmigrate/mig-controller@sha256:7ec48a557240f1d2fa6ee6cd62234b0e75f178eca2a0cc5b95124e01bcd2c114

Velero:
    image: quay.io/ocpmigrate/velero:stable
    imageID: quay.io/ocpmigrate/velero@sha256:957725dec5f0fb6a46dee78bd49de9ec4ab66903eabb4561b62ad8f4ad9e6f05
    image: quay.io/ocpmigrate/migration-plugin:stable
    imageID: quay.io/ocpmigrate/migration-plugin@sha256:b4493d826260eb1e3e02ba935aaedfd5310fefefb461ca7dcd9a5d55d4aa8f35


How reproducible:
Always


Steps to Reproduce:
1. oc new-project crdstest
2. oc create -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/crd.yaml
3. oc create -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/example-foo.yaml
4. Migrate the crdstest namespace


Actual results:
  In the target OCP4 cluster there is no CR created in the crdstest namespace:
  $ oc get foo -n crdstest
error: the server doesn't have a resource type "foo"


Expected results:
  The CR should be migrated to the crdstest namespace in the target cluster:
$ oc get foo -n crdstest
NAME          AGE
example-foo   21m


Additional info:

The content of the backup was this:
{"authorization.openshift.io/v1/RoleBinding":["crdstest/admin","crdstest/system:deployers","crdstest/system:image-builders","crdstest/system:image-pullers"],"rbac.authorization.k8s.io/v1/RoleBinding":["crdstest/admin","crdstest/system:deployers","crdstest/system:image-builders","crdstest/system:image-pullers"],"samplecontroller.k8s.io/v1alpha1/Foo":["crdstest/example-foo"],"v1/LimitRange":["crdstest/crdstest-core-resource-limits"],"v1/Namespace":["crdstest"],"v1/Secret":["crdstest/builder-dockercfg-jvbbq","crdstest/builder-token-spnc9","crdstest/builder-token-zmhdz","crdstest/default-dockercfg-vwj5m","crdstest/default-token-qbfbz","crdstest/default-token-r9jwf","crdstest/deployer-dockercfg-zvmkd","crdstest/deployer-token-nxlrr","crdstest/deployer-token-p5k88"],"v1/ServiceAccount":["crdstest/builder","crdstest/default","crdstest/deployer"]}
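The backup manifest above is plain JSON keyed by group/version/kind. A quick way to list the captured resource types and confirm the Foo CR made it into the backup is to grep a locally saved copy; a minimal sketch (the filename and the abridged JSON below are illustrative — the real manifest is the full blob above):

```shell
# Save an abridged copy of the backup manifest locally (illustrative).
cat > backup-content.json <<'EOF'
{"samplecontroller.k8s.io/v1alpha1/Foo":["crdstest/example-foo"],"v1/Namespace":["crdstest"]}
EOF

# List the resource types captured in the backup (keys followed by a colon).
grep -oE '"[^"]+/[^"]+"\s*:' backup-content.json

# Confirm the Foo custom resource made it into the backup.
grep -q 'samplecontroller.k8s.io/v1alpha1/Foo' backup-content.json \
  && echo "Foo CR present in backup"
```

This shows the CR itself was backed up, so the failure is on the restore side: the CRD is missing in the target, not the CR in the backup.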


If we manually create the CRD in the target cluster and then run the migration, the CR is migrated properly.
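That workaround can be scripted before kicking off the migration; a minimal sketch, assuming a kubeconfig context named `target` pointing at the destination cluster (the context name is a placeholder):

```shell
# Pre-create the CRD in the target cluster from the same manifest used in
# the reproducer, so the restored CRs have a resource type to land on.
oc --context=target create -f \
  https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/crd.yaml

# Wait until the API server reports the CRD as established before migrating.
oc --context=target wait --for=condition=established \
  crd/foos.samplecontroller.k8s.io --timeout=60s
```

The `oc wait` step matters because a freshly created CRD is not immediately served; restoring CRs against it before it is established fails the same way.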

Comment 1 Scott Seago 2019-10-28 12:39:17 UTC
This is an upstream issue. I have already submitted a PR to include the relevant CRDs in the backup/restore which has been merged. There's a related upstream race condition -- if the newly-loaded CRD isn't yet ready, CR restore will still fail. There's an in-progress upstream PR for that. Once these are both merged, we should be able to include the fix, either when we update to the next Velero release, or if necessary, by cherry-picking the fixes into our internal velero build.

The upstream fix that's already merged: https://github.com/vmware-tanzu/velero/pull/1831
The upstream fix that's still in progress: https://github.com/vmware-tanzu/velero/pull/1937

Comment 2 Scott Seago 2019-12-16 13:54:20 UTC
It looks like the upstream in-progress PR is being actively worked on again. Once it's merged (and our Velero is upgraded to 1.2) I can cherry-pick the upstream fix into our build. The already-merged fix is in Velero 1.2.

Comment 4 Scott Seago 2019-12-16 22:42:43 UTC
Oops. I updated the wrong PR. Disregard the above comment.

Comment 5 Scott Seago 2020-01-10 19:44:54 UTC
The upstream commits from the (open) upstream PR have been cherry-picked into https://github.com/fusor/velero/pull/48 -- once that's tested and reviewed it can be merged. Once we upgrade to Velero 1.3, we will no longer need to carry this cherry-pick.

Comment 6 John Matthews 2020-01-14 20:46:30 UTC
We ran into some issues with further testing and believe more work is required to investigate a potential upstream problem.
Moving this to the next release, as we missed the window to get this into CAM 1.1.

Comment 7 Scott Seago 2020-03-30 23:41:25 UTC
Velero 1.3.1 should include the remaining part of the fix.

Comment 11 Sergio 2020-05-07 14:11:57 UTC
Verified in CAM 1.2 stage

Note: We detected that after creating the CRD, Velero needs a short amount of time to become aware of the CRD's existence before it can migrate that CRD's resources.
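One way to account for that delay is to wait for the CRD's Established condition before triggering the migration; a sketch against the source cluster, using the CRD name from the verification below:

```shell
# Block until the API server reports the CRD as ready to serve its resources,
# so Velero's discovery will already see the new type when the backup runs.
oc wait --for=condition=established \
  crd/deploycustoms.samplecontroller.k8s.io --timeout=120s
```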


In source cluster (4.2):

$ oc get crds | grep deploycustom
deploycustoms.samplecontroller.k8s.io                       2020-05-07T09:49:12Z
$ oc get deploycustom
NAME                 AGE
example-deployment   7m44s


Result in target cluster (4.3):

$ oc get crds | grep deploycustom
deploycustoms.samplecontroller.k8s.io                       2020-05-07T14:04:55Z
$ oc get deploycustom
NAME                 AGE
example-deployment   96s

Comment 13 errata-xmlrpc 2020-05-28 11:09:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2326

