We'd like to ensure this BZ is captured in the Release Notes/Known Issues (it won't be fixed in the 4.2.0 release).

+++ This bug was initially created as a clone of Bug #1752985 +++

Description of problem:
Backup creation fails for a project with a 1000Gi PVC.

Version-Release number of selected component (if applicable):

# oc describe pod/controller-manager-78d9589445-5xztn | grep Image
    Image:      quay.io/ocpmigrate/mig-controller:release-1.0
    Image ID:   quay.io/ocpmigrate/mig-controller@sha256:0f74db7171712ffc440b3d7b0f02a775ccd71238827ec856b7d090f90f2feffb

# oc describe pod/velero-58f7447985-d9hzf | grep Image
    Image:      quay.io/ocpmigrate/migration-plugin:release-1.0
    Image ID:   quay.io/ocpmigrate/migration-plugin@sha256:eb9b82c3f26bcd876bc501e18dde7cffe7e451c8c8a231959ed4d9f1127b91a6
    Image:      quay.io/ocpmigrate/velero:fusor-1.1
    Image ID:   quay.io/ocpmigrate/velero@sha256:6c16a1288bf6aca74afbb0184fa987506839c5193ae8bb2be05cb6aa0a9f3dc5

# oc describe pod/restic-9hst9 | grep Image
    Image:      quay.io/ocpmigrate/velero:fusor-1.1
    Image ID:   quay.io/ocpmigrate/velero@sha256:6c16a1288bf6aca74afbb0184fa987506839c5193ae8bb2be05cb6aa0a9f3dc5

# oc describe pod/migration-operator-5cb94b46fb-vgs5k | grep Image
    Image:      quay.io/ocpmigrate/mig-operator:release-1.0
    Image ID:   quay.io/ocpmigrate/mig-operator@sha256:c5e3a0c4ca4ec954f0c6552b367bc7b3baafa5acea833496147d0b6611bef241
    Image:      quay.io/ocpmigrate/mig-operator:release-1.0
    Image ID:   quay.io/ocpmigrate/mig-operator@sha256:c5e3a0c4ca4ec954f0c6552b367bc7b3baafa5acea833496147d0b6611bef241

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-17-001320   True        False         176m    Cluster version is 4.2.0-0.nightly-2019-09-17-001320

How reproducible:
Always

Steps to Reproduce:
1. On a 3.11 cluster, create a project with the following resources:

# oc get all -n big-pvc
NAME                                   READY     STATUS      RESTARTS   AGE
pod/postgresql-1-65lbd                 1/1       Running     0          1d
pod/rails-postgresql-example-1-build   0/1       Completed   0          1d
pod/rails-postgresql-example-1-swmkz   1/1       Running     0          1d

NAME                                                DESIRED   CURRENT   READY     AGE
replicationcontroller/postgresql-1                  1         1         1         1d
replicationcontroller/rails-postgresql-example-1    1         1         1         1d

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/postgresql                 ClusterIP   172.27.110.229   <none>        5432/TCP   1d
service/rails-postgresql-example   ClusterIP   172.26.253.220   <none>        8080/TCP   1d

NAME                                                            REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/postgresql                   1          1         1         config,image(postgresql:9.5)
deploymentconfig.apps.openshift.io/rails-postgresql-example     1          1         1         config,image(rails-postgresql-example:latest)

NAME                                                       TYPE      FROM      LATEST
buildconfig.build.openshift.io/rails-postgresql-example    Source    Git       1

NAME                                                   TYPE      FROM          STATUS     STARTED        DURATION
build.build.openshift.io/rails-postgresql-example-1    Source    Git@67d882b   Complete   25 hours ago   1m40s

NAME                                                       DOCKER REPO                                                          TAGS     UPDATED
imagestream.image.openshift.io/rails-postgresql-example    docker-registry.default.svc:5000/big-pvc/rails-postgresql-example   latest   25 hours ago

NAME                                                HOST/PORT                                                       PATH   SERVICES                   PORT    TERMINATION   WILDCARD
route.route.openshift.io/rails-postgresql-example   rails-postgresql-example-big-pvc.apps.0906-5ce.qe.rhcloud.com          rails-postgresql-example   <all>                 None

# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
pvc-b1e6180f-d8a1-11e9-b34f-029519c5614c   1000Gi     RWO            Delete           Bound    big-pvc/postgresql   gp2                     1d

2. Configure the migration CRs with the namespace to be migrated and the PV information (a sketch of such CRs follows step 3 below).
3. Start the migration.
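For the Known Issues write-up, a minimal sketch of what step 2 might look like with the migration tool's CRs. This is an illustration only, based on the mig-controller release-1.0 API; the plan/migration names (big-pvc-plan, migmigration-sample), the MigCluster names (source-cluster, host), and the MigStorage name (migstorage-sample) are assumptions, not the exact resources used in this report:

# cat <<'EOF' | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  name: big-pvc-plan             # hypothetical name
  namespace: openshift-migration
spec:
  srcMigClusterRef:              # MigCluster for the 3.11 source
    name: source-cluster
    namespace: openshift-migration
  destMigClusterRef:             # MigCluster for the 4.2 destination
    name: host
    namespace: openshift-migration
  migStorageRef:                 # MigStorage pointing at the S3 replication bucket
    name: migstorage-sample
    namespace: openshift-migration
  namespaces:
    - big-pvc                    # namespace holding the 1000Gi PVC
EOF

# cat <<'EOF' | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: migmigration-sample
  namespace: openshift-migration
spec:
  migPlanRef:
    name: big-pvc-plan
    namespace: openshift-migration
  stage: false                   # full migration, not a stage run
  quiescePods: true              # scale down source pods before the final copy
EOF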
Actual results:
Backup creation fails:

# oc logs velero-c9b9cd88f-rgj7w | grep restic | grep error
time="2019-09-17T16:15:29Z" level=error msg="Error backing up item" backup=openshift-migration/migmigration-sample-pbx7w error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1 logSource="pkg/backup/resource_backupper.go:264" name=postgresql-1-65lbd-stage namespace=big-pvc resource=pods

Expected results:
Backup and migration should be successful.

Additional info:

--- Additional comment from John Matthews on 2019-09-17 17:58:14 UTC ---

Can we get any more logs to learn what went wrong?
Any errors from Restic?

How did you populate the data in the PV?
Was this just a 1000GB PV/PVC that was mostly empty, or did you have data in there up to 1000GB?

What did you use for the object storage?
Did you have sufficient room in object storage?

--- Additional comment from Roshni on 2019-09-20 14:57:19 UTC ---

(In reply to John Matthews from comment #1)
> Can we get any more logs to learn what went wrong?
> Any errors from Restic?

No errors in restic.

> How did you populate the data in the PV?

https://gist.github.com/mffiedler/21e751f99945646998a3e42092af4da8

> Was this just a 1000GB PV/PVC that was mostly empty, or did you have data in
> there up to 1000GB?

I think I answered this above. I tried migrating with only 25 files (instead of 110) and the migration was successful. I could see the PV migrated to the destination.

> What did you use for the object storage?

AWS S3 bucket.

> Did you have sufficient room in object storage?

Since it is an Amazon S3 bucket, I believe there is no restriction on how much we can store. I am attaching a screenshot of the storage from when I tried migrating 25 8.8Gi files.

--- Additional comment from Roshni on 2019-09-20 14:58:05 UTC ---

--- Additional comment from Roshni on 2019-09-27 00:37:20 UTC ---

Migration was successful for 400Gi and below. When I tested with 600Gi, the failure happened. I am creating the workload following these steps: https://gist.github.com/mffiedler/21e751f99945646998a3e42092af4da8
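Note for the Known Issues text: the failure above is velero's restic timeout ("timed out waiting for all PodVolumeBackups to complete"), not a restic error, so one candidate workaround is to give large volumes more time to copy. A sketch, assuming the operator's MigrationController CR exposes the restic_timeout setting as in the CAM documentation; the CR name "migration-controller" and the 24h value are assumptions to be verified:

# oc patch migrationcontroller migration-controller -n openshift-migration \
    --type merge -p '{"spec":{"restic_timeout":"24h"}}'

The operator should then redeploy velero with the longer timeout. This does not change how long a 1000Gi copy actually takes; it only stops the backup from being declared failed while PodVolumeBackups are still running.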
John, Roshni commented:

> --- Additional comment from Roshni on 2019-09-27 00:37:20 UTC ---
> Migration was successful for 400Gi and below. When I tested with 600Gi, the failure happened.

Are you sure you want to document that the failure occurs at 1000Gi if it occurred at 600Gi?
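To pin down where in the 400Gi-1000Gi range the timeout starts to bite, one option (a sketch, assuming velero's standard PodVolumeBackup objects live in the openshift-migration namespace, and using the restic pod name from the version section above) is to watch the per-volume backups while a migration runs and note how far the large volume gets before the timeout:

# oc get podvolumebackups -n openshift-migration -w
# oc logs restic-9hst9 -n openshift-migration | grep -i big-pvc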
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922