Bug 1824395 - OCP migration not working
Summary: OCP migration not working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Migration Tooling
Version: 4.3.z
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Scott Seago
QA Contact: Xin jiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-16 06:00 UTC by Ashwin
Modified: 2020-05-28 11:10 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-28 11:09:56 UTC
Target Upstream Version:
Embargoed:


Attachments
velero logs (82.14 KB, text/plain)
2020-04-16 06:00 UTC, Ashwin


Links:
Red Hat Product Errata RHEA-2020:2326, last updated 2020-05-28 11:10:21 UTC

Description Ashwin 2020-04-16 06:00:15 UTC
Created attachment 1679232 [details]
velero logs

Description of problem:

Migration from OCP 3.11 to OCP 4.3.5 isn't working. The status reports that the migration completed, but in reality it did not. There are no errors in the logs.

Version-Release number of selected component (if applicable):

OCP 3.11, OCP 4.3.5 (on IBM Cloud), mig-operator 1.0.1

How reproducible:

Possibly reproducible

Steps to Reproduce:
1. Follow instructions from here - https://cloud.ibm.com/docs/openshift?topic=openshift-openshift_versions#ocp-3-to-4-migration

Actual results:

The migration does not actually happen, yet there are no errors in the logs. The Velero Backup and Restore resources are not created.
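A quick way to check for them (a sketch, assuming everything lives in the openshift-migration namespace):

oc -n openshift-migration get backups.velero.io,restores.velero.io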

Velero logs are attached.


    Message:               The migration registry resources have been created.
    Status:                True
    Type:                  RegistriesEnsured
    Category:              Required
    Last Transition Time:  2020-04-16T05:19:45Z
    Message:               The migration plan is ready.
    Status:                True
    Type:                  Ready
Events:                    <none>


Status:
  Conditions:
    Category:              Required
    Last Transition Time:  2020-04-15T18:07:33Z
    Message:               The storage is ready.
    Status:                True
    Type:                  Ready
Events:                    <none>

Status:
  Conditions:
    Category:              Advisory
    Durable:               true
    Last Transition Time:  2020-04-16T05:33:11Z
    Message:               The migration has completed successfully.
    Reason:                Completed
    Status:                True
    Type:                  Succeeded
  Phase:                   Completed
  Start Timestamp:         2020-04-16T05:32:59Z
Events:                    <none>

Expected results:

The migration should complete successfully.

Additional info:

Comment 1 Xin jiang 2020-04-16 08:25:15 UTC
I am seeing the following error in the attached log:
time="2020-04-16T05:12:13Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:412"
time="2020-04-16T05:12:15Z" level=warning msg="A backup storage location named default has been specified for the server to use by default, but no corresponding backup storage location exists. Backups with a location not matching the default will need to explicitly specify an existing location" error="backupstoragelocations.velero.io \"default\" not found" error.file="/go/src/github.com/heptio/velero/pkg/cmd/server/server.go:318" error.function="github.com/heptio/velero/pkg/cmd/server.(*server).run" logSource="pkg/cmd/server/server.go:319"

Would you please describe the migstorage, using a command like the one below:

oc -n openshift-migration describe migstorage <migstorage name>
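It may also be worth checking whether any BackupStorageLocation resources exist yet; a minimal check, assuming they would be created in the openshift-migration namespace:

oc -n openshift-migration get backupstoragelocations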

Comment 2 Ashwin 2020-04-16 09:48:53 UTC
Name:         migstorage-sample
Namespace:    openshift-migration
Labels:       controller-tools.k8s.io=1.0
Annotations:  touch: 5c826b17-01e9-4aba-8b8c-348e94400d00
API Version:  migration.openshift.io/v1alpha1
Kind:         MigStorage
Metadata:
  Creation Timestamp:  2020-04-15T18:07:32Z
  Generation:          2
  Resource Version:    5069155
  Self Link:           /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migstorages/migstorage-sample
  UID:                 308cc268-e3f9-4d7d-a80c-4f0cf754c9fe
Spec:
  Backup Storage Config:
    Aws Bucket Name:  cacf-pt-opc4
    Aws Public URL:   https://s3.us.cloud-object-storage.appdomain.cloud
    Aws Region:       us-geo
    awsS3Url:         https://s3.us.cloud-object-storage.appdomain.cloud
    Creds Secret Ref:
      Name:                 migstorage-creds
      Namespace:            openshift-migration
  Backup Storage Provider:  aws
  Volume Snapshot Config:
    Aws Region:  us-geo
    Creds Secret Ref:
      Name:                  migstorage-creds
      Namespace:             openshift-migration
  Volume Snapshot Provider:  aws
Status:
  Conditions:
    Category:              Required
    Last Transition Time:  2020-04-15T18:07:33Z
    Message:               The storage is ready.
    Status:                True
    Type:                  Ready
Events:                    <none>

Comment 4 John Matthews 2020-04-16 13:05:30 UTC
Note: from the velero logs I'm not seeing an obvious error.

The warning about the missing 'default' location is fine. We don't expect a default; after the MigPlan is created, we create the BackupStorageLocation resources ourselves.

Confirmed that this looks good in the Velero logs, based on the following entry:
"Got backups from backup store" backupCount=0 backupLocation=migstorage-sample-cqvvc controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:184"


After this I don't see any further work, beyond the confirmation that Velero is happy with the backupLocation.

I would have expected to see log entries from work being done, but I'm not seeing that.


Comment 6 Scott Seago 2020-04-16 13:17:00 UTC
I did notice from the migration output shown above that the migration itself is in the Completed state. Are the velero logs attached above from the source cluster or from the destination/target cluster?

Also, are there any Velero Backup or Restore resources in either the source or destination cluster?
I wonder if 'velero backup get -n openshift-migration' on the source cluster or 'velero restore get -n openshift-migration' on the destination cluster would show anything.

I've never seen a case where a migration moved to a successful Completed state without the velero operations running, so if that's what is happening here, we should probably look at the controller logs too.

Comment 7 Ashwin 2020-04-16 13:39:46 UTC
Hi

I don't see any Backup or Restore resources in either the source or destination cluster:

sh-4.4$ ./velero restore get -n openshift-migration
NAME   BACKUP   STATUS   WARNINGS   ERRORS   CREATED   SELECTOR
sh-4.4$ 
sh-4.4$ 
sh-4.4$ 
sh-4.4$ ./velero backup get -n openshift-migration
NAME   STATUS   CREATED   EXPIRES   STORAGE LOCATION   SELECTOR

Comment 8 Scott Seago 2020-04-16 14:00:20 UTC
So if the Velero resources aren't being created at all, that sounds like a problem with the controller rather than with Velero itself. It would be helpful to see the controller logs as well as the complete YAML output for the MigPlan and MigMigration.
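For reference, commands along these lines should gather that information; the 'migration-controller' deployment name is an assumption about how the controller is deployed in the openshift-migration namespace:

# Controller logs ('migration-controller' is an assumed deployment name;
# add -c <container> if the pod runs more than one container)
oc -n openshift-migration logs deployment/migration-controller

# Full YAML for the plan and the migration
oc -n openshift-migration get migplan migplan-sample -o yaml
oc -n openshift-migration get migmigration migplan-sample -o yaml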

Comment 9 Ashwin 2020-04-16 14:24:10 UTC
[root@oc2382025704 ~]# oc get MigPlan -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: MigPlan
  metadata:
    annotations:
      touch: 4595311c-c429-4b8d-af1f-bc4bfea7399f
    creationTimestamp: "2020-04-16T13:13:23Z"
    generation: 4
    labels:
      controller-tools.k8s.io: "1.0"
    name: migplan-sample
    namespace: openshift-migration
    resourceVersion: "5305433"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/migplan-sample
    uid: 0082df77-7fc3-4079-91d6-9952e605005b
  spec:
    destMigClusterRef:
      name: migcluster-local
      namespace: openshift-migration
    migStorageRef:
      name: migstorage-sample
      namespace: openshift-migration
    namespaces:
    - mig-test
    srcMigClusterRef:
      name: migcluster-remote
      namespace: openshift-migration
  status:
    conditions:
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:28Z"
      message: The `persistentVolumes` list has been updated with discovered PVs.
      reason: Done
      status: "True"
      type: PvsDiscovered
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:30Z"
      message: The storage resources have been created.
      status: "True"
      type: StorageEnsured
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:32Z"
      message: The migration registry resources have been created.
      status: "True"
      type: RegistriesEnsured
    - category: Required
      lastTransitionTime: "2020-04-16T13:13:32Z"
      message: The migration plan is ready.
      status: "True"
      type: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@oc2382025704 ~]# oc get MigMigration -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: MigMigration
  metadata:
    annotations:
      touch: 9092518f-c6a8-44be-bd66-ce64f9cab026
    creationTimestamp: "2020-04-16T13:14:11Z"
    generation: 5
    labels:
      controller-tools.k8s.io: "1.0"
    name: migplan-sample
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      kind: MigPlan
      name: migplan-sample
      uid: 0082df77-7fc3-4079-91d6-9952e605005b
    resourceVersion: "5277280"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/migplan-sample
    uid: 5ea45cf2-a536-40f5-bb2e-988bfbe01992
  spec:
    keepAnnotations: true
    migPlanRef:
      name: migplan-sample
      namespace: openshift-migration
    stage: true
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2020-04-16T13:14:23Z"
      message: The migration has completed successfully.
      reason: Completed
      status: "True"
      type: Succeeded
    phase: Completed
    startTimestamp: "2020-04-16T13:14:11Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@oc2382025704 ~]# 


controller logs - 


W0416 13:05:30.365096       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:08:21.966546       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.VolumeSnapshotLocation ended with: The resourceVersion for the provided watch is too old.
W0416 13:09:23.440540       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0416 13:09:31.768404       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1alpha1.MigMigration ended with: too old resource version: 5257506 (5275035)
W0416 13:10:11.075896       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
W0416 13:10:55.735800       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Restore ended with: The resourceVersion for the provided watch is too old.
W0416 13:12:29.465770       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
{"level":"info","ts":1587042851.9383733,"logger":"migration|lbc4q","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":""}
{"level":"info","ts":1587042852.0680423,"logger":"migration|2mgm2","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"Started"}
{"level":"info","ts":1587042852.4123368,"logger":"migration|zt5pq","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"Prepare"}
{"level":"info","ts":1587042856.7348688,"logger":"migration|f8z2x","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"EnsureCloudSecretPropagated"}
{"level":"info","ts":1587042863.2297509,"logger":"migration|f8z2x","msg":"[COMPLETED]","migration":"migplan-sample"}
{"level":"info","ts":1587042863.2517893,"logger":"migration|8t7vm","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"EnsureCloudSecretPropagated"}
{"level":"info","ts":1587042869.582732,"logger":"migration|8t7vm","msg":"[COMPLETED]","migration":"migplan-sample"}
W0416 13:17:53.507632       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:18:42.932090       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Backup ended with: The resourceVersion for the provided watch is too old.
W0416 13:19:04.335432       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
W0416 13:21:23.286066       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
W0416 13:21:41.637269       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Namespace ended with: The resourceVersion for the provided watch is too old.
W0416 13:24:04.956443       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Restore ended with: The resourceVersion for the provided watch is too old.
W0416 13:25:32.640331       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:25:52.532964       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0416 13:27:16.473025       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
W0416 13:27:19.776645       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
W0416 13:28:18.088683       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Backup ended with: The resourceVersion for the provided watch is too old.
W0416 13:28:53.367165       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.VolumeSnapshotLocation ended with: The resourceVersion for the provided watch is too old.
W0416 13:31:11.790329       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Namespace ended with: The resourceVersion for the provided watch is too old.
W0416 13:33:52.580387       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map

Comment 10 Scott Seago 2020-04-16 14:36:49 UTC
So it looks like this was a stage migration rather than a final migration. A final migration is a two-pass migration where, in the first pass, PV data is moved over, and in the second pass, everything else comes along. A stage migration only does the "migrate PVs" step, to allow for incremental copying of persistent volumes, which in some cases will significantly shorten final migration times.

In this case, though, you don't have any PVs to migrate, so the stage migration ends up being basically a no-op.

Comment 11 Scott Seago 2020-04-16 14:37:21 UTC
From the migmigration:
  spec:
    keepAnnotations: true
    migPlanRef:
      name: migplan-sample
      namespace: openshift-migration
    stage: true
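For comparison, a final (non-stage) migration is driven by a MigMigration with stage set to false. A minimal sketch, with a hypothetical resource name and the other fields mirroring the spec shown above:

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: migplan-sample-final      # hypothetical name, for illustration only
  namespace: openshift-migration
spec:
  migPlanRef:
    name: migplan-sample
    namespace: openshift-migration
  stage: false                    # false = run the full, final migration
  keepAnnotations: true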

Comment 12 Ashwin 2020-04-16 16:27:29 UTC
Hi,

The migration failed. Below are the logs:

time="2020-04-16T16:19:50Z" level=info msg="Listing items" backup=openshift-migration/migplan-sample-m2f4v group=extensions/v1beta1 logSource="pkg/backup/resource_backupper.go:224" namespace=ansible-tower-new resource=ingresses
time="2020-04-16T16:19:50Z" level=info msg="Retrieved 0 items" backup=openshift-migration/migplan-sample-m2f4v group=extensions/v1beta1 logSource="pkg/backup/resource_backupper.go:238" namespace=ansible-tower-new resource=ingresses
time="2020-04-16T16:19:52Z" level=info msg="Backup completed" controller=backup logSource="pkg/controller/backup_controller.go:537"
W0416 16:20:02.270248       1 reflector.go:289] github.com/heptio/velero/pkg/cmd/server/server.go:493: watch of *v1.Secret ended with: The resourceVersion for the provided watch is too old.
time="2020-04-16T16:20:12Z" level=info msg="Syncing contents of backup store into cluster" backupLocation=migstorage-sample-mgx67 controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:176"

oc get MigMigration -o yaml

apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@oc2382025704 ~]# oc project openshift-migration
Now using project "openshift-migration" on server "https://c100-e.us-east.containers.cloud.ibm.com:31126".
[root@oc2382025704 ~]# oc get MigMigration -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: MigMigration
  metadata:
    annotations:
      touch: dbaf7523-88c0-4cfc-94c4-32b59f37f188
    creationTimestamp: "2020-04-16T16:07:38Z"
    generation: 8
    labels:
      controller-tools.k8s.io: "1.0"
    name: migplan-sample
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      kind: MigPlan
      name: migplan-sample
      uid: caabdcfa-913c-4f0d-b92f-8dc42fa0235f
    resourceVersion: "5359196"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/migplan-sample
    uid: e20db301-bb1c-45fa-aa4f-d83b308c7910
  spec:
    keepAnnotations: true
    migPlanRef:
      name: migplan-sample
      namespace: openshift-migration
    stage: false
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2020-04-16T16:15:44Z"
      message: 'The migration has failed.  See: Errors.'
      reason: InitialBackupFailed
      status: "True"
      type: Failed
    errors:
    - 'Backup: openshift-migration/migplan-sample-m2f4v partially failed.'
    phase: Completed
    startTimestamp: "2020-04-16T16:07:38Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Also, on the object storage I can see:


Total Bytes: 2.1 GB
Total Objects: 242

Comment 13 Ashwin 2020-04-16 16:27:55 UTC
[root@oc2382025704 ~]# oc describe backup migplan-sample-m2f4v
Name:         migplan-sample-m2f4v
Namespace:    openshift-migration
Labels:       app.kubernetes.io/part-of=openshift-migration
              migmigration=e20db301-bb1c-45fa-aa4f-d83b308c7910
              migration-initial-backup=e20db301-bb1c-45fa-aa4f-d83b308c7910
              velero.io/storage-location=migstorage-sample-w8tnx
Annotations:  openshift.io/migrate-copy-phase: final
              openshift.io/migration-registry: 172.30.235.89:5000
              openshift.io/migration-registry-dir: /migplan-sample-registry-caabdcfa-913c-4f0d-b92f-8dc42fa0235f
API Version:  velero.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-04-16T16:16:34Z
  Generate Name:       migplan-sample-
  Generation:          1
  Resource Version:    5359466
  Self Link:           /apis/velero.io/v1/namespaces/openshift-migration/backups/migplan-sample-m2f4v
  UID:                 8d84289e-cd3d-4968-9eb2-e177a3e08cb8
Spec:
  Excluded Namespaces:
  Excluded Resources:
    persistentvolumes
    persistentvolumeclaims
  Hooks:
    Resources:
  Include Cluster Resources:  <nil>
  Included Namespaces:
    ansible-tower-new
  Included Resources:
  Label Selector:    <nil>
  Storage Location:  migstorage-sample-w8tnx
  Ttl:               720h0m0s
  Volume Snapshot Locations:
    migstorage-sample-caabdcfa-913c-4f0d-b92f-8dc42fa0235f
Status:
  Completion Timestamp:        2020-04-16T16:19:50Z
  Errors:                      3
  Expiration:                  2020-05-16T16:12:08Z
  Phase:                       PartiallyFailed
  Start Timestamp:             2020-04-16T16:12:08Z
  Validation Errors:           <nil>
  Version:                     1
  Volume Snapshots Attempted:  0
  Volume Snapshots Completed:  0
  Warnings:                    0
Events:                        <none>
[root@oc2382025704 ~]#

Comment 14 Scott Seago 2020-04-16 17:02:42 UTC
So we need to see the velero logs for that backup. You can use the velero CLI to get them: 'velero backup logs -n openshift-migration migplan-sample-m2f4v'. That will show just the logs for the one backup. You can also just grab the velero pod logs, but those will include all backups executed since the velero pod started.
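For reference, commands along these lines should pull the relevant detail; the 'deployment/velero' name is an assumption about how Velero is deployed in the openshift-migration namespace:

# Logs and a detailed status summary for just this backup
velero backup logs -n openshift-migration migplan-sample-m2f4v
velero backup describe -n openshift-migration migplan-sample-m2f4v --details

# Full Velero pod logs (covers every backup since the pod started);
# 'deployment/velero' is an assumed deployment name
oc -n openshift-migration logs deployment/velero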

Comment 15 Scott Seago 2020-04-22 14:48:59 UTC
While the original problem reported here isn't really a bug (we expect no velero action to be taken for a stage migration with no PVs), it is confusing, so I've added a warning condition on the MigMigration when this situation occurs.
PR is here: https://github.com/konveyor/mig-controller/pull/510

Comment 19 Sergio 2020-05-08 12:48:30 UTC
Verified using CAM 1.2 stage

When we run a stage migration using a namespace where no PVCs exist, a warning is reported in the MigMigration resource telling the user that there are no PVCs for the "Stage" to act on.

This warning is visible in the UI too, which marks the migration execution in orange.

oc get MigMigration -o yaml
.......
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2020-05-08T12:45:21Z"
    message: Stage migration was run without any PVs. No Velero operations were initiated.
    status: "True"
    type: StageNoOp

Comment 21 errata-xmlrpc 2020-05-28 11:09:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2326

