Description of problem:
Custom resources in the namespace are not being migrated.

Version-Release number of selected component (if applicable):

OCP4
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0     True        False         5h32m   Cluster version is 4.1.0

OCP3
$ oc version
oc v3.11.126
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https:XXXXXXX
openshift v3.11.104
kubernetes v1.11.0+d4cacc0

Controller:
    image: quay.io/ocpmigrate/mig-controller:stable
    imageID: quay.io/ocpmigrate/mig-controller@sha256:7ec48a557240f1d2fa6ee6cd62234b0e75f178eca2a0cc5b95124e01bcd2c114

Velero:
    image: quay.io/ocpmigrate/velero:stable
    imageID: quay.io/ocpmigrate/velero@sha256:957725dec5f0fb6a46dee78bd49de9ec4ab66903eabb4561b62ad8f4ad9e6f05
    image: quay.io/ocpmigrate/migration-plugin:stable
    imageID: quay.io/ocpmigrate/migration-plugin@sha256:b4493d826260eb1e3e02ba935aaedfd5310fefefb461ca7dcd9a5d55d4aa8f35

How reproducible:
Always

Steps to Reproduce:
1. oc new-project crdstest
2. oc create -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/crd.yaml
3. oc create -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/example-foo.yaml
4. Migrate the crdstest namespace

Actual results:
In the target OCP4 cluster there is no CR created in the crdstest namespace:

$ oc get foo -n crdstest
error: the server doesn't have a resource type "foo"

Expected results:
The CR should be migrated to the crdstest namespace in the target cluster:

$ oc get foo -n crdstest
NAME          AGE
example-foo   21m

Additional info:
The backup content was this:

{"authorization.openshift.io/v1/RoleBinding":["crdstest/admin","crdstest/system:deployers","crdstest/system:image-builders","crdstest/system:image-pullers"],"rbac.authorization.k8s.io/v1/RoleBinding":["crdstest/admin","crdstest/system:deployers","crdstest/system:image-builders","crdstest/system:image-pullers"],"samplecontroller.k8s.io/v1alpha1/Foo":["crdstest/example-foo"],"v1/LimitRange":["crdstest/crdstest-core-resource-limits"],"v1/Namespace":["crdstest"],"v1/Secret":["crdstest/builder-dockercfg-jvbbq","crdstest/builder-token-spnc9","crdstest/builder-token-zmhdz","crdstest/default-dockercfg-vwj5m","crdstest/default-token-qbfbz","crdstest/default-token-r9jwf","crdstest/deployer-dockercfg-zvmkd","crdstest/deployer-token-nxlrr","crdstest/deployer-token-p5k88"],"v1/ServiceAccount":["crdstest/builder","crdstest/default","crdstest/deployer"]}

If we manually create the CRD in the target cluster and then run the migration, the CR is migrated properly.
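For reference, the manual workaround can be scripted against the target cluster before triggering the migration. This is a sketch assuming you are logged in to the target cluster and that the CRD name matches the sample-controller example (foos.samplecontroller.k8s.io); `oc wait` is used to confirm the CRD is established before the restore runs:

$ # Pre-create the CRD on the target cluster
$ oc create -f https://raw.githubusercontent.com/kubernetes/sample-controller/master/artifacts/examples/crd.yaml
$ # Block until the apiserver reports the CRD as Established
$ oc wait --for condition=established --timeout=60s crd/foos.samplecontroller.k8s.io
$ # Now run the migration; crdstest/example-foo should restore cleanly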
This is an upstream issue. I have already submitted an upstream PR to include the relevant CRDs in the backup/restore, and it has been merged. There is also a related upstream race condition: if the newly created CRD is not yet ready when the restore runs, restoring its CRs will still fail. There is an in-progress upstream PR for that. Once both are merged, we should be able to pick up the fix either when we update to the next Velero release or, if necessary, by cherry-picking the fixes into our internal Velero build.

The upstream fix that's already merged: https://github.com/vmware-tanzu/velero/pull/1831
The upstream fix that's still in progress: https://github.com/vmware-tanzu/velero/pull/1937
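To make the race concrete: a CRD is only usable once the apiserver marks it Established, so a restore that creates the CRD and then immediately creates its CRs can fail. A quick way to inspect that condition by hand (a sketch using the sample-controller CRD from the reproducer; the jsonpath fields are the standard CRD status conditions):

$ oc get crd foos.samplecontroller.k8s.io -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'
True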
It looks like the in-progress upstream PR is being actively worked on again. Once it's merged (and our Velero is upgraded to 1.2), I can cherry-pick the fix into our build. The already-merged fix is included in Velero 1.2.
Oops. I updated the wrong PR. Disregard the above comment.
The upstream commits from the still-open upstream PR have been cherry-picked into https://github.com/fusor/velero/pull/48; once that's tested and reviewed it can be merged. Once we upgrade to Velero 1.3, we will no longer need to carry this cherry-pick.
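For anyone following along, the cherry-pick itself is routine git work; roughly (a sketch with a placeholder commit SHA, assuming an `upstream` remote pointing at vmware-tanzu/velero and the open PR number from the earlier comment):

$ git remote add upstream https://github.com/vmware-tanzu/velero.git
$ # Fetch the head of the open upstream PR, then pick its commits onto our branch
$ git fetch upstream pull/1937/head
$ git cherry-pick <commit-sha>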
We ran into some issues during further testing and believe more work is required to investigate a potential upstream problem. Moving this to the next release, as we missed the window to get it into CAM 1.1.
Velero 1.3.1 should include the remaining part of the fix.
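Once a build based on 1.3.1 lands, the running Velero version can be confirmed in-cluster. A sketch, assuming CAM's default openshift-migration namespace and the usual Velero image layout with the binary at /velero; the command prints at least the client (binary) version:

$ oc -n openshift-migration exec deployment/velero -- /velero version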
Verified in CAM 1.2 stage.

Note: we detected that after the CRD is created, Velero needs a short amount of time to become aware of the CRD's existence before it is able to migrate that CRD's resources.

In the source cluster (4.2):

$ oc get crds | grep deploycustom
deploycustoms.samplecontroller.k8s.io   2020-05-07T09:49:12Z

$ oc get deploycustom
NAME                 AGE
example-deployment   7m44s

Result in the target cluster (4.3):

$ oc get crds | grep deploycustom
deploycustoms.samplecontroller.k8s.io   2020-05-07T14:04:55Z

$ oc get deploycustom
NAME                 AGE
example-deployment   96s
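A handy way to watch for the moment the new type actually becomes usable is to poll API discovery; once the apiserver publishes the resource there, clients (including Velero's periodically refreshed discovery helper) can act on it. A sketch against the target cluster, filtering on the CRD's group:

$ oc api-resources --api-group=samplecontroller.k8s.io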
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2326