Bug 1824395
| Summary: | OCP migration not working | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ashwin <ashwin.lohia> | ||||
| Component: | Migration Tooling | Assignee: | Scott Seago <sseago> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Xin jiang <xjiang> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.3.z | CC: | chezhang, dwhatley, dymurray, ernelson, jmatthew, jmontleo, mberube, pvauter, sregidor, whu | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.4.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-05-28 11:09:56 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Ashwin
2020-04-16 06:00:15 UTC
I am seeing below error from the attached log: time="2020-04-16T05:12:13Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:412" time="2020-04-16T05:12:15Z" level=warning msg="A backup storage location named default has been specified for the server to use by default, but no corresponding backup storage location exists. Backups with a location not matching the default will need to explicitly specify an existing location" error="backupstoragelocations.velero.io \"default\" not found" error.file="/go/src/github.com/heptio/velero/pkg/cmd/server/server.go:318" error.function="github.com/heptio/velero/pkg/cmd/server.(*server).run" logSource="pkg/cmd/server/server.go:319" Would you please list the migstorage, just like below command. oc -n openshift-migration describe migstorage <migstorage name> Name: migstorage-sample
Namespace: openshift-migration
Labels: controller-tools.k8s.io=1.0
Annotations: touch: 5c826b17-01e9-4aba-8b8c-348e94400d00
API Version: migration.openshift.io/v1alpha1
Kind: MigStorage
Metadata:
Creation Timestamp: 2020-04-15T18:07:32Z
Generation: 2
Resource Version: 5069155
Self Link: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migstorages/migstorage-sample
UID: 308cc268-e3f9-4d7d-a80c-4f0cf754c9fe
Spec:
Backup Storage Config:
Aws Bucket Name: cacf-pt-opc4
Aws Public URL: https://s3.us.cloud-object-storage.appdomain.cloud
Aws Region: us-geo
awsS3Url: https://s3.us.cloud-object-storage.appdomain.cloud
Creds Secret Ref:
Name: migstorage-creds
Namespace: openshift-migration
Backup Storage Provider: aws
Volume Snapshot Config:
Aws Region: us-geo
Creds Secret Ref:
Name: migstorage-creds
Namespace: openshift-migration
Volume Snapshot Provider: aws
Status:
Conditions:
Category: Required
Last Transition Time: 2020-04-15T18:07:33Z
Message: The storage is ready.
Status: True
Type: Ready
Events: <none>
Note from the velero logs I'm not seeing an obvious error. The issue seen with no 'default' is fine. We don't expect a default, after MigPlan is created we will create the BackupStorageLocation resources. Confirmed that looks good from Velero logs with below in logs: "Got backups from backup store" backupCount=0 backupLocation=migstorage-sample-cqvvc controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:184" After this I don't see further work, beyond the confirmation that Velero is happy with the backupLocation. I would have expected to see logs from work being done, not seeing that. Note from the velero logs I'm not seeing an obvious error. The issue seen with no 'default' is fine. We don't expect a default, after MigPlan is created we will create the BackupStorageLocation resources. Confirmed that looks good from Velero logs with below in logs: "Got backups from backup store" backupCount=0 backupLocation=migstorage-sample-cqvvc controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:184" After this I don't see further work, beyond the confirmation that Velero is happy with the backupLocation. I would have expected to see logs from work being done, not seeing that. I did notice that from the migration output shown above, the migration itself is in the completed state. Are the velero logs attached above for the source cluster or for the destination/target cluster? Also, are there any velero backup or restore CRDs in either the src or dest clusters? I wonder if 'velero backup get -n openshift-migration' on src cluster or 'velero restore get -n openshift-migration' on dest cluster would show anything. I've never seen a case where a migration moved to a successful Completed state without the velero operations running, so if that's what is happening here, we should probably look at the controller logs too. Hi I don't see any backup or restore CRDs in either the source or destination cluster - sh-4.4$ ./velero restore get -n openshift-migration NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR sh-4.4$ sh-4.4$ sh-4.4$ sh-4.4$ ./velero backup get -n openshift-migration NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR So if the velero CRDs aren't being created at all, that sounds like a problem with the controller rather than with Velero. It might be helpful to see the controller logs as well as the complete yaml output for the migplan and migmigration. [root@oc2382025704 ~]# oc get MigPlan -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
annotations:
touch: 4595311c-c429-4b8d-af1f-bc4bfea7399f
creationTimestamp: "2020-04-16T13:13:23Z"
generation: 4
labels:
controller-tools.k8s.io: "1.0"
name: migplan-sample
namespace: openshift-migration
resourceVersion: "5305433"
selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migplans/migplan-sample
uid: 0082df77-7fc3-4079-91d6-9952e605005b
spec:
destMigClusterRef:
name: migcluster-local
namespace: openshift-migration
migStorageRef:
name: migstorage-sample
namespace: openshift-migration
namespaces:
- mig-test
srcMigClusterRef:
name: migcluster-remote
namespace: openshift-migration
status:
conditions:
- category: Required
lastTransitionTime: "2020-04-16T13:13:28Z"
message: The `persistentVolumes` list has been updated with discovered PVs.
reason: Done
status: "True"
type: PvsDiscovered
- category: Required
lastTransitionTime: "2020-04-16T13:13:30Z"
message: The storage resources have been created.
status: "True"
type: StorageEnsured
- category: Required
lastTransitionTime: "2020-04-16T13:13:32Z"
message: The migration registry resources have been created.
status: "True"
type: RegistriesEnsured
- category: Required
lastTransitionTime: "2020-04-16T13:13:32Z"
message: The migration plan is ready.
status: "True"
type: Ready
kind: List
metadata:
resourceVersion: ""
selfLink: ""
[root@oc2382025704 ~]# oc get MigMigration -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
annotations:
touch: 9092518f-c6a8-44be-bd66-ce64f9cab026
creationTimestamp: "2020-04-16T13:14:11Z"
generation: 5
labels:
controller-tools.k8s.io: "1.0"
name: migplan-sample
namespace: openshift-migration
ownerReferences:
- apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
name: migplan-sample
uid: 0082df77-7fc3-4079-91d6-9952e605005b
resourceVersion: "5277280"
selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/migplan-sample
uid: 5ea45cf2-a536-40f5-bb2e-988bfbe01992
spec:
keepAnnotations: true
migPlanRef:
name: migplan-sample
namespace: openshift-migration
stage: true
status:
conditions:
- category: Advisory
durable: true
lastTransitionTime: "2020-04-16T13:14:23Z"
message: The migration has completed successfully.
reason: Completed
status: "True"
type: Succeeded
phase: Completed
startTimestamp: "2020-04-16T13:14:11Z"
kind: List
metadata:
resourceVersion: ""
selfLink: ""
[root@oc2382025704 ~]#
controller logs -
W0416 13:05:30.365096 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:08:21.966546 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.VolumeSnapshotLocation ended with: The resourceVersion for the provided watch is too old.
W0416 13:09:23.440540 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0416 13:09:31.768404 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1alpha1.MigMigration ended with: too old resource version: 5257506 (5275035)
W0416 13:10:11.075896 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
W0416 13:10:55.735800 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Restore ended with: The resourceVersion for the provided watch is too old.
W0416 13:12:29.465770 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
{"level":"info","ts":1587042851.9383733,"logger":"migration|lbc4q","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":""}
{"level":"info","ts":1587042852.0680423,"logger":"migration|2mgm2","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"Started"}
{"level":"info","ts":1587042852.4123368,"logger":"migration|zt5pq","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"Prepare"}
{"level":"info","ts":1587042856.7348688,"logger":"migration|f8z2x","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"EnsureCloudSecretPropagated"}
{"level":"info","ts":1587042863.2297509,"logger":"migration|f8z2x","msg":"[COMPLETED]","migration":"migplan-sample"}
{"level":"info","ts":1587042863.2517893,"logger":"migration|8t7vm","msg":"[RUN]","migration":"migplan-sample","stage":true,"phase":"EnsureCloudSecretPropagated"}
{"level":"info","ts":1587042869.582732,"logger":"migration|8t7vm","msg":"[COMPLETED]","migration":"migplan-sample"}
W0416 13:17:53.507632 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:18:42.932090 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Backup ended with: The resourceVersion for the provided watch is too old.
W0416 13:19:04.335432 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
W0416 13:21:23.286066 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolumeClaim ended with: The resourceVersion for the provided watch is too old.
W0416 13:21:41.637269 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Namespace ended with: The resourceVersion for the provided watch is too old.
W0416 13:24:04.956443 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Restore ended with: The resourceVersion for the provided watch is too old.
W0416 13:25:32.640331 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.StorageClass ended with: The resourceVersion for the provided watch is too old.
W0416 13:25:52.532964 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0416 13:27:16.473025 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
W0416 13:27:19.776645 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
W0416 13:28:18.088683 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Backup ended with: The resourceVersion for the provided watch is too old.
W0416 13:28:53.367165 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.VolumeSnapshotLocation ended with: The resourceVersion for the provided watch is too old.
W0416 13:31:11.790329 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Namespace ended with: The resourceVersion for the provided watch is too old.
W0416 13:33:52.580387 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map
So it looks like this was a stage migration rather than a final migration. A final migration is a two-pass migration where, in the first pass, PV data is moved over, and in the second pass, everything else comes along. A stage migration only does the "migrate PVs" step, to allow for incremental copying of persistent volumes, which in some cases will significantly shorten final migration times. In this case, though, you don't have any PVs to migrate, so the stage migration ends up being basically a no-op. From the migmigration:
spec:
keepAnnotations: true
migPlanRef:
name: migplan-sample
namespace: openshift-migration
stage: true
Hi, The migration failed. Below are the logs time="2020-04-16T16:19:50Z" level=info msg="Listing items" backup=openshift-migration/migplan-sample-m2f4v group=extensions/v1beta1 logSource="pkg/backup/resource_backupper.go:224" namespace=ansible-tower-new resource=ingresses time="2020-04-16T16:19:50Z" level=info msg="Retrieved 0 items" backup=openshift-migration/migplan-sample-m2f4v group=extensions/v1beta1 logSource="pkg/backup/resource_backupper.go:238" namespace=ansible-tower-new resource=ingresses time="2020-04-16T16:19:52Z" level=info msg="Backup completed" controller=backup logSource="pkg/controller/backup_controller.go:537" W0416 16:20:02.270248 1 reflector.go:289] github.com/heptio/velero/pkg/cmd/server/server.go:493: watch of *v1.Secret ended with: The resourceVersion for the provided watch is too old. time="2020-04-16T16:20:12Z" level=info msg="Syncing contents of backup store into cluster" backupLocation=migstorage-sample-mgx67 controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:176" oc get MigMigration -o yaml apiVersion: v1 items: [] kind: List metadata: resourceVersion: "" selfLink: "" [root@oc2382025704 ~]# oc project openshift-migration Now using project "openshift-migration" on server "https://c100-e.us-east.containers.cloud.ibm.com:31126". [root@oc2382025704 ~]# oc get MigMigration -o yaml apiVersion: v1 items: - apiVersion: migration.openshift.io/v1alpha1 kind: MigMigration metadata: annotations: touch: dbaf7523-88c0-4cfc-94c4-32b59f37f188 creationTimestamp: "2020-04-16T16:07:38Z" generation: 8 labels: controller-tools.k8s.io: "1.0" name: migplan-sample namespace: openshift-migration ownerReferences: - apiVersion: migration.openshift.io/v1alpha1 kind: MigPlan name: migplan-sample uid: caabdcfa-913c-4f0d-b92f-8dc42fa0235f resourceVersion: "5359196" selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/migplan-sample uid: e20db301-bb1c-45fa-aa4f-d83b308c7910 spec: keepAnnotations: true migPlanRef: name: migplan-sample namespace: openshift-migration stage: false status: conditions: - category: Advisory durable: true lastTransitionTime: "2020-04-16T16:15:44Z" message: 'The migration has failed. See: Errors.' reason: InitialBackupFailed status: "True" type: Failed errors: - 'Backup: openshift-migration/migplan-sample-m2f4v partially failed.' phase: Completed startTimestamp: "2020-04-16T16:07:38Z" kind: List metadata: resourceVersion: "" selfLink: "" Also, on the object storage, I can see Total Bytes: 2.1 GB Total Objects: 242 [root@oc2382025704 ~]# oc describe backup migplan-sample-m2f4v
Name: migplan-sample-m2f4v
Namespace: openshift-migration
Labels: app.kubernetes.io/part-of=openshift-migration
migmigration=e20db301-bb1c-45fa-aa4f-d83b308c7910
migration-initial-backup=e20db301-bb1c-45fa-aa4f-d83b308c7910
velero.io/storage-location=migstorage-sample-w8tnx
Annotations: openshift.io/migrate-copy-phase: final
openshift.io/migration-registry: 172.30.235.89:5000
openshift.io/migration-registry-dir: /migplan-sample-registry-caabdcfa-913c-4f0d-b92f-8dc42fa0235f
API Version: velero.io/v1
Kind: Backup
Metadata:
Creation Timestamp: 2020-04-16T16:16:34Z
Generate Name: migplan-sample-
Generation: 1
Resource Version: 5359466
Self Link: /apis/velero.io/v1/namespaces/openshift-migration/backups/migplan-sample-m2f4v
UID: 8d84289e-cd3d-4968-9eb2-e177a3e08cb8
Spec:
Excluded Namespaces:
Excluded Resources:
persistentvolumes
persistentvolumeclaims
Hooks:
Resources:
Include Cluster Resources: <nil>
Included Namespaces:
ansible-tower-new
Included Resources:
Label Selector: <nil>
Storage Location: migstorage-sample-w8tnx
Ttl: 720h0m0s
Volume Snapshot Locations:
migstorage-sample-caabdcfa-913c-4f0d-b92f-8dc42fa0235f
Status:
Completion Timestamp: 2020-04-16T16:19:50Z
Errors: 3
Expiration: 2020-05-16T16:12:08Z
Phase: PartiallyFailed
Start Timestamp: 2020-04-16T16:12:08Z
Validation Errors: <nil>
Version: 1
Volume Snapshots Attempted: 0
Volume Snapshots Completed: 0
Warnings: 0
Events: <none>
[root@oc2382025704 ~]#
So we need to see the velero logs for that backup. You can use the velero CLI to get them "velero backup get -n openshift-migration migplan-sample-m2f4v" That will show just the logs for the one backup. You can also just grab the velero pod logs, but that will include all backups executed since the velero pod started. While the original problem stated here isn't really a bug -- we expect there to be no velero action taken for a stage migration with no PVs -- it's confusing, so I've added a warning condition on the migmigration when this situation occurs. PR is here: https://github.com/konveyor/mig-controller/pull/510 Verified using CAM 1.2 stage
When we run a stage migration using namespace where no PVC exists, a warning is reported in the MigMigration resources telling the user that there is no PVC to execute a "Stage".
This warning is visible in the UI too, marking the migration execution in orange.
oc get MigMigration -o yaml
.......
status:
conditions:
- category: Warn
durable: true
lastTransitionTime: "2020-05-08T12:45:21Z"
message: Stage migration was run without any PVs. No Velero operations were initiated.
status: "True"
type: StageNoOp
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2326 |