Bug 1752985
Summary:           Backup creation fails for a project with 1000Gi PVC

Product:           OpenShift Container Platform
Component:         Migration Tooling
Version:           4.2.0
Hardware:          Unspecified
OS:                Unspecified
Status:            CLOSED CURRENTRELEASE
Severity:          medium
Priority:          unspecified
Target Milestone:  ---
Target Release:    4.2.0
Reporter:          Roshni <rpattath>
Assignee:          Jason Montleon <jmontleo>
QA Contact:        Roshni <rpattath>
CC:                chezhang, mifiedle, sregidor
URL:               https://github.com/fusor/mig-operator/pull/110
Doc Type:          No Doc Update
Type:              Bug
Cloned to:         1757487 (view as bug list)
Bug Blocks:        1757487
Last Closed:       2019-11-21 18:38:14 UTC
Description   Roshni   2019-09-17 17:44:41 UTC
Can we get any more logs to learn what went wrong? Any errors from Restic?

How did you populate the data in the PV? Was this just a 1000GB PV/PVC that was mostly empty, or did you have data in there up to 1000GB?

What did you use for the object storage? Did you have sufficient room in object storage?

(In reply to John Matthews from comment #1)
> Can we get any more logs to learn what went wrong?
> Any errors from Restic?

No errors in restic.

> How did you populate the data in the PV?

https://gist.github.com/mffiedler/21e751f99945646998a3e42092af4da8

> Was this just a 1000GB PV/PVC that was mostly empty, or did you have data in
> there up to 1000GB?

I think I answered this above. I tried migrating with only 25 files (instead of 110) and migration was successful. I could see the PV migrated to the destination.

> What did you use for the object storage?

AWS S3 bucket.

> Did you have sufficient room in object storage?

Since it is an Amazon S3 bucket, I believe there is no restriction on how much we want to store. I am attaching a screenshot of the storage when I tried migrating 25 8.8Gi files.

Created attachment 1617236 [details]
AWS S3 storage screenshot
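
(The gist linked above is not reproduced here. Purely as a hypothetical illustration, a workload of roughly the size described, on the order of 110 files of about 8.8Gi each, could be generated with something like the following; the /data mount path, file count, and file size are assumptions, not the actual gist contents.)

  # Hypothetical sketch only: writes ~110 files of roughly 8.8Gi each into an
  # assumed PVC mount at /data.
  for i in $(seq 1 110); do
    dd if=/dev/urandom of=/data/file-${i} bs=1M count=9000 status=none
  done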
Migration was successful for 400Gi and below. When I tested with 600Gi the failure happened. I am creating the workload following these steps:
https://gist.github.com/mffiedler/21e751f99945646998a3e42092af4da8

Seems like this may be a velero issue. Looks similar to:
https://github.com/vmware-tanzu/velero/issues/1868

velero server has a timeout setting for restic. It's an hour by default. I propose we make this customizable from the MigrationController CR for the operator. I'll work on a PR.

/velero server --help | grep restic-timeout
      --restic-timeout duration   how long backups/restores of pod volumes should be allowed to run before timing out (default 1h0m0s)

restic_timeout: 1h can be modified in the MigrationController CR to allow for larger backups. Change it to 2h, 3h, ..., 24h, ..., 48h, etc. as necessary.

The issue in the bug description cannot be reproduced using the following builds. Migration was successful when restic_timeout: 3h was set in the YAML for the controller CR.

# oc describe pod/migration-operator-66495ccf7c-kckt9 -n openshift-migration | grep Image
  containerImage: image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-operator:v1.0
    Image:     image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-operator:v1.0
    Image ID:  image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-operator@sha256:db6350c9386343fef1c27b88b3fefc31fc692e97049469564bdc21dbf465454b
    Image:     image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-operator:v1.0
    Image ID:  image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-operator@sha256:db6350c9386343fef1c27b88b3fefc31fc692e97049469564bdc21dbf465454b

[root@rpattath ~]# oc get pods
NAME                               READY   STATUS      RESTARTS   AGE
mongodb-1-deploy                   0/1     Completed   0          7h44m
mongodb-1-hzn7p                    1/1     Running     0          7h44m
nodejs-mongo-persistent-1-build    0/1     Completed   0          7h44m
nodejs-mongo-persistent-1-deploy   0/1     Completed   0          7h44m
nodejs-mongo-persistent-2-deploy   0/1     Completed   0          7h43m
nodejs-mongo-persistent-2-r2lq5    1/1     Running     0          7h42m

[root@rpattath ~]# oc get pods -n openshift-migration
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-56d558c5f-l498l    1/1     Running   0          17h
migration-operator-66495ccf7c-kckt9   2/2     Running   0          17h
migration-ui-6f7df75875-rkg2k         1/1     Running   0          17h
restic-c8x6d                          1/1     Running   0          17h
restic-cz7rf                          1/1     Running   0          17h
restic-n2l5r                          1/1     Running   0          17h
velero-bdbd6cc56-gfq8p                1/1     Running   0          17h

[root@rpattath ~]# oc describe pod/controller-manager-56d558c5f-l498l -n openshift-migration | grep Image
    Image:     image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-controller:v1.0
    Image ID:  image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-controller@sha256:46c622e0fbe64165b09930738a7d111c875976b54c8236ddce328cb5470d60ab

[root@rpattath ~]# oc describe pod/migration-ui-6f7df75875-rkg2k -n openshift-migration | grep Image
    Image:     image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-ui:v1.0
    Image ID:  image-registry.openshift-image-registry.svc:5000/rhcam/openshift-migration-ui@sha256:59e60d7036ebdc5b7d29895104e7b459a53a1c004e876f50b3e79cdc2b78941c
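
For reference, a minimal sketch of how the restic_timeout change described above might be applied from the command line. The CR name (migration-controller), the namespace, the spec field path, and the velero deployment name are assumptions inferred from the comments and pod listing above, not taken from the fix itself, so confirm them against the installed mig-operator before using this.

  # Assumed CR name, namespace, and field path; verify against your installation.
  oc -n openshift-migration patch migrationcontroller migration-controller \
    --type=merge -p '{"spec":{"restic_timeout":"3h"}}'

  # Once the operator reconciles, the new timeout should appear in the velero
  # deployment's container args (deployment name assumed from the pod listing above).
  oc -n openshift-migration get deployment velero \
    -o jsonpath='{.spec.template.spec.containers[0].args}'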