Created attachment 1679232 [details]
velero logs

Description of problem:
Migration from OCP 3.11 to 4.3.5 isn't working. The status shows that the migration completed, but in reality it didn't, and there are no errors in the logs.

Version-Release number of selected component (if applicable):
OCP 3.11, OCP 4.3.5 (on IBM Cloud), mig-operator 1.0.1

How reproducible:
Maybe reproducible

Steps to Reproduce:
1. Follow the instructions from here: https://cloud.ibm.com/docs/openshift?topic=openshift-openshift_versions#ocp-3-to-4-migration

Actual results:
The migration doesn't happen, yet there are no errors in the logs. The backup and restore CRDs aren't created. Velero logs are attached.

    Message:                The migration registry resources have been created.
    Status:                 True
    Type:                   RegistriesEnsured
    Category:               Required
    Last Transition Time:   2020-04-16T05:19:45Z
    Message:                The migration plan is ready.
    Status:                 True
    Type:                   Ready
  Events:                   <none>

  Status:
    Conditions:
      Category:              Required
      Last Transition Time:  2020-04-15T18:07:33Z
      Message:               The storage is ready.
      Status:                True
      Type:                  Ready
  Events:                    <none>

  Status:
    Conditions:
      Category:              Advisory
      Durable:               true
      Last Transition Time:  2020-04-16T05:33:11Z
      Message:               The migration has completed successfully.
      Reason:                Completed
      Status:                True
      Type:                  Succeeded
    Phase:                   Completed
    Start Timestamp:         2020-04-16T05:32:59Z
  Events:                    <none>

Expected results:
Migration should complete

Additional info:
I am seeing the below error in the attached log:

time="2020-04-16T05:12:13Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:412"
time="2020-04-16T05:12:15Z" level=warning msg="A backup storage location named default has been specified for the server to use by default, but no corresponding backup storage location exists. Backups with a location not matching the default will need to explicitly specify an existing location" error="backupstoragelocations.velero.io \"default\" not found" error.file="/go/src/github.com/heptio/velero/pkg/cmd/server/server.go:318" error.function="github.com/heptio/velero/pkg/cmd/server.(*server).run" logSource="pkg/cmd/server/server.go:319"

Would you please describe the migstorage, using a command like the one below?

oc -n openshift-migration describe migstorage <migstorage name>
Name:         migstorage-sample
Namespace:    openshift-migration
Labels:       controller-tools.k8s.io=1.0
Annotations:  touch: 5c826b17-01e9-4aba-8b8c-348e94400d00
API Version:  migration.openshift.io/v1alpha1
Kind:         MigStorage
Metadata:
  Creation Timestamp:  2020-04-15T18:07:32Z
  Generation:          2
  Resource Version:    5069155
  Self Link:           /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migstorages/migstorage-sample
  UID:                 308cc268-e3f9-4d7d-a80c-4f0cf754c9fe
Spec:
  Backup Storage Config:
    Aws Bucket Name:  cacf-pt-opc4
    Aws Public URL:   https://s3.us.cloud-object-storage.appdomain.cloud
    Aws Region:       us-geo
    awsS3Url:         https://s3.us.cloud-object-storage.appdomain.cloud
    Creds Secret Ref:
      Name:                  migstorage-creds
      Namespace:             openshift-migration
  Backup Storage Provider:   aws
  Volume Snapshot Config:
    Aws Region:  us-geo
    Creds Secret Ref:
      Name:                   migstorage-creds
      Namespace:              openshift-migration
  Volume Snapshot Provider:   aws
Status:
  Conditions:
    Category:              Required
    Last Transition Time:  2020-04-15T18:07:33Z
    Message:               The storage is ready.
    Status:                True
    Type:                  Ready
Events:                    <none>
Note that from the velero logs I'm not seeing an obvious error. The warning about the missing 'default' location is fine; we don't expect a default, and after the MigPlan is created we create the BackupStorageLocation resources. Confirmed that this looks good in the velero logs:

"Got backups from backup store" backupCount=0 backupLocation=migstorage-sample-cqvvc controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:184"

After this I don't see any further work beyond the confirmation that velero is happy with the backupLocation. I would have expected to see logs from work being done, and I'm not seeing that.
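If it helps to double-check, the BackupStorageLocations the controller created for the plan can be listed directly (a minimal sketch; the location name migstorage-sample-cqvvc is taken from the log line above):

oc -n openshift-migration get backupstoragelocations.velero.io
oc -n openshift-migration get backupstoragelocations.velero.io migstorage-sample-cqvvc -o yaml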
I did notice from the migration output shown above that the migration itself is in the Completed state. Are the velero logs attached above from the source cluster or from the destination/target cluster? Also, are there any velero backup or restore CRDs in either the src or dest clusters? I wonder if 'velero backup get -n openshift-migration' on the src cluster or 'velero restore get -n openshift-migration' on the dest cluster would show anything. I've never seen a case where a migration moved to a successful Completed state without the velero operations running, so if that's what is happening here, we should probably look at the controller logs too.
Hi, I don't see any backup or restore CRDs in either the source or destination cluster:

sh-4.4$ ./velero restore get -n openshift-migration
NAME   BACKUP   STATUS   WARNINGS   ERRORS   CREATED   SELECTOR
sh-4.4$
sh-4.4$ ./velero backup get -n openshift-migration
NAME   STATUS   CREATED   EXPIRES   STORAGE LOCATION   SELECTOR
So if the velero CRDs aren't being created at all, that sounds like a problem with the controller rather than with Velero. It might be helpful to see the controller logs as well as the complete yaml output for the migplan and migmigration.
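Something along these lines should pull both (a sketch; the controller pod name varies by install, so check with 'oc get pods' first):

oc -n openshift-migration get pods
oc -n openshift-migration logs <migration-controller pod>
oc -n openshift-migration get migplan -o yaml
oc -n openshift-migration get migmigration -o yaml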
[root@oc2382025704 ~]# oc get MigPlan -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: MigPlan
  metadata:
    annotations:
      touch: 4595311c-c429-4b8d-af1f-bc4bfea7399f
    creationTimestamp: "2020-04-16T13:13:23Z"
    generation: 4
    labels:
      controller-tools.k8s.io: "1.0"
    name: migplan-sample
    namespace: openshift-migration
    resourceVersion: "5305433"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/migplan-sample
    uid: 0082df77-7fc3-4079-91d6-9952e605005b
  spec:
    destMigClusterRef:
      name: migcluster-local
      namespace: openshift-migration
    migStorageRef:
      name: migstorage-sample
      namespace: openshift-migration
    namespaces:
    - mig-test
    srcMigClusterRef:
      name: migcluster-remote
      namespace: openshift-migration
  status:
    conditions:
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:28Z"
      message: The `persistentVolumes` list has been updated with discovered PVs.
      reason: Done
      status: "True"
      type: PvsDiscovered
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:30Z"
      message: The storage resources have been created.
      status: "True"
      type: StorageEnsured
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:32Z"
      message: The migration registry resources have been created.
      status: "True"
      type: RegistriesEnsured
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:32Z"
      message: The migration plan is ready.
      status: "True"
      type: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[root@oc2382025704 ~]# oc get MigMigration -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: MigMigration
  metadata:
    annotations:
      touch: 9092518f-c6a8-44be-bd66-ce64f9cab026
    creationTimestamp: "2020-04-16T13:14:11Z"
    generation: 5
    labels:
      controller-tools.k8s.io: "1.0"
    name: migplan-sample
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      kind: MigPlan
      name: migplan-sample
      uid: 0082df77-7fc3-4079-91d6-9952e605005b
    resourceVersion: "5277280"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/migplan-sample
    uid: 5ea45cf2-a536-40f5-bb2e-988bfbe01992
  spec:
    keepAnnotations: true
    migPlanRef:
      name: migplan-sample
      namespace: openshift-migration
    stage: true
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2020-04-16T13:14:23Z"
      message: The migration has completed successfully.
      reason: Completed
      status: "True"
      type: Succeeded
    phase: Completed
    startTimestamp: "2020-04-16T13:14:11Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@oc2382025704 ~]#

controller logs -

W0416 13:05:30.365096 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:08:21.966546 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.VolumeSnapshotLocation ended with: The resourceVersion for the provided watch is too old.
W0416 13:09:23.440540 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0416 13:09:31.768404 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1alpha1.MigMigration ended with: too old resource version: 5257506 (5275035)
W0416 13:10:11.075896 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
W0416 13:10:55.735800 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Restore ended with: The resourceVersion for the provided watch is too old.
W0416 13:12:29.465770 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
{"level":"info","ts":1587042851.9383733,"logger":"migration|lbc4q","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":""}
{"level":"info","ts":1587042852.0680423,"logger":"migration|2mgm2","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"Started"}
{"level":"info","ts":1587042852.4123368,"logger":"migration|zt5pq","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"Prepare"}
{"level":"info","ts":1587042856.7348688,"logger":"migration|f8z2x","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"EnsureCloudSecretPropagated"}
{"level":"info","ts":1587042863.2297509,"logger":"migration|f8z2x","msg":"[COMPLETED]","migration":"migplan-sample"}
{"level":"info","ts":1587042863.2517893,"logger":"migration|8t7vm","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"EnsureCloudSecretPropagated"}
{"level":"info","ts":1587042869.582732,"logger":"migration|8t7vm","msg":"[COMPLETED]","migration":"migplan-sample"}
W0416 13:17:53.507632 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:18:42.932090 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Backup ended with: The resourceVersion for the provided watch is too old.
W0416 13:19:04.335432 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
W0416 13:21:23.286066 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
W0416 13:21:41.637269 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Namespace ended with: The resourceVersion for the provided watch is too old.
W0416 13:24:04.956443 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Restore ended with: The resourceVersion for the provided watch is too old.
W0416 13:25:32.640331 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:25:52.532964 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0416 13:27:16.473025 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
W0416 13:27:19.776645 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
W0416 13:28:18.088683 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Backup ended with: The resourceVersion for the provided watch is too old.
W0416 13:28:53.367165 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.VolumeSnapshotLocation ended with: The resourceVersion for the provided watch is too old.
W0416 13:31:11.790329 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Namespace ended with: The resourceVersion for the provided watch is too old.
W0416 13:33:52.580387 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map
So it looks like this was a stage migration rather than a final migration. A final migration is a two-pass migration where, in the first pass, PV data is moved over, and in the second pass, everything else comes along. A stage migration only does the "migrate PVs" step, to allow for incremental copying of persistent volumes, which in some cases will significantly shorten final migration times. In this case, though, you don't have any PVs to migrate, so the stage migration ends up being basically a no-op.
From the migmigration:

  spec:
    keepAnnotations: true
    migPlanRef:
      name: migplan-sample
      namespace: openshift-migration
    stage: true
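For a final migration, the MigMigration is created with stage: false. A rough sketch of what that would look like (the metadata name here is illustrative only; normally the UI creates this resource for you):

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: migplan-sample-final    # illustrative name
  namespace: openshift-migration
spec:
  migPlanRef:
    name: migplan-sample
    namespace: openshift-migration
  stage: false       # false = full/final migration; true = stage (PV copy only)
  keepAnnotations: true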
Hi,

The migration failed. Below are the logs:

time="2020-04-16T16:19:50Z" level=info msg="Listing items" backup=openshift-migration/migplan-sample-m2f4v group=extensions/v1beta1 logSource="pkg/backup/resource_backupper.go:224" namespace=ansible-tower-new resource=ingresses
time="2020-04-16T16:19:50Z" level=info msg="Retrieved 0 items" backup=openshift-migration/migplan-sample-m2f4v group=extensions/v1beta1 logSource="pkg/backup/resource_backupper.go:238" namespace=ansible-tower-new resource=ingresses
time="2020-04-16T16:19:52Z" level=info msg="Backup completed" controller=backup logSource="pkg/controller/backup_controller.go:537"
W0416 16:20:02.270248 1 reflector.go:289] github.com/heptio/velero/pkg/cmd/server/server.go:493: watch of *v1.Secret ended with: The resourceVersion for the provided watch is too old.
time="2020-04-16T16:20:12Z" level=info msg="Syncing contents of backup store into cluster" backupLocation=migstorage-sample-mgx67 controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:176"

oc get MigMigration -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[root@oc2382025704 ~]# oc project openshift-migration
Now using project "openshift-migration" on server "https://c100-e.us-east.containers.cloud.ibm.com:31126".

[root@oc2382025704 ~]# oc get MigMigration -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: MigMigration
  metadata:
    annotations:
      touch: dbaf7523-88c0-4cfc-94c4-32b59f37f188
    creationTimestamp: "2020-04-16T16:07:38Z"
    generation: 8
    labels:
      controller-tools.k8s.io: "1.0"
    name: migplan-sample
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      kind: MigPlan
      name: migplan-sample
      uid: caabdcfa-913c-4f0d-b92f-8dc42fa0235f
    resourceVersion: "5359196"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/migplan-sample
    uid: e20db301-bb1c-45fa-aa4f-d83b308c7910
  spec:
    keepAnnotations: true
    migPlanRef:
      name: migplan-sample
      namespace: openshift-migration
    stage: false
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2020-04-16T16:15:44Z"
      message: 'The migration has failed. See: Errors.'
      reason: InitialBackupFailed
      status: "True"
      type: Failed
    errors:
    - 'Backup: openshift-migration/migplan-sample-m2f4v partially failed.'
    phase: Completed
    startTimestamp: "2020-04-16T16:07:38Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Also, on the object storage I can see:

Total Bytes: 2.1 GB
Total Objects: 242
[root@oc2382025704 ~]# oc describe backup migplan-sample-m2f4v
Name:         migplan-sample-m2f4v
Namespace:    openshift-migration
Labels:       app.kubernetes.io/part-of=openshift-migration
              migmigration=e20db301-bb1c-45fa-aa4f-d83b308c7910
              migration-initial-backup=e20db301-bb1c-45fa-aa4f-d83b308c7910
              velero.io/storage-location=migstorage-sample-w8tnx
Annotations:  openshift.io/migrate-copy-phase: final
              openshift.io/migration-registry: 172.30.235.89:5000
              openshift.io/migration-registry-dir: /migplan-sample-registry-caabdcfa-913c-4f0d-b92f-8dc42fa0235f
API Version:  velero.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-04-16T16:16:34Z
  Generate Name:       migplan-sample-
  Generation:          1
  Resource Version:    5359466
  Self Link:           /apis/velero.io/v1/namespaces/openshift-migration/backups/migplan-sample-m2f4v
  UID:                 8d84289e-cd3d-4968-9eb2-e177a3e08cb8
Spec:
  Excluded Namespaces:
  Excluded Resources:
    persistentvolumes
    persistentvolumeclaims
  Hooks:
    Resources:
  Include Cluster Resources:  <nil>
  Included Namespaces:
    ansible-tower-new
  Included Resources:
  Label Selector:    <nil>
  Storage Location:  migstorage-sample-w8tnx
  Ttl:               720h0m0s
  Volume Snapshot Locations:
    migstorage-sample-caabdcfa-913c-4f0d-b92f-8dc42fa0235f
Status:
  Completion Timestamp:        2020-04-16T16:19:50Z
  Errors:                      3
  Expiration:                  2020-05-16T16:12:08Z
  Phase:                       PartiallyFailed
  Start Timestamp:             2020-04-16T16:12:08Z
  Validation Errors:           <nil>
  Version:                     1
  Volume Snapshots Attempted:  0
  Volume Snapshots Completed:  0
  Warnings:                    0
Events:                        <none>
[root@oc2382025704 ~]#
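For reference, the individual errors behind "Errors: 3" should also be visible through the velero CLI (a sketch, assuming the same client setup used earlier in this thread):

./velero backup describe migplan-sample-m2f4v --details -n openshift-migration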
So we need to see the velero logs for that backup. You can use the velero CLI to get them:

velero backup logs -n openshift-migration migplan-sample-m2f4v

That will show just the logs for that one backup. You can also just grab the velero pod logs, but those will include all backups executed since the velero pod started.
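If you go the pod-logs route, something like this should do it (a sketch; substitute the actual velero pod name from 'oc get pods'):

oc -n openshift-migration get pods | grep velero
oc -n openshift-migration logs <velero pod name>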
While the original problem stated here isn't really a bug -- we expect there to be no velero action taken for a stage migration with no PVs -- it's confusing, so I've added a warning condition on the migmigration when this situation occurs. PR is here: https://github.com/konveyor/mig-controller/pull/510
Verified using CAM 1.2 stage.

When we run a stage migration using a namespace where no PVC exists, a warning is reported in the MigMigration resource telling the user that there is no PVC to execute a "Stage". This warning is visible in the UI too, marking the migration execution in orange.

oc get MigMigration -o yaml
.......
  status:
    conditions:
    - category: Warn
      durable: true
      lastTransitionTime: "2020-05-08T12:45:21Z"
      message: Stage migration was run without any PVs. No Velero operations were initiated.
      status: "True"
      type: StageNoOp
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2326