Description of problem:

This is a follow-up query from the customer on Bug 1501637. The upstream gerrit change https://review.openstack.org/#/c/476503/ has been tried, and the customer has the following further findings / enquiries. We need to understand the logic of the current cinder backup service on Ceph.

Based on the code in cinder/backup/manager.py, cinder/volume/driver.py and cinder/backup/drivers/ceph.py, backup of an in-use volume does not work properly.

In this section of /usr/lib/python2.7/site-packages/cinder/backup/manager.py:

    backup_dic = self.volume_rpcapi.get_backup_device(context,

Cinder asks the volume driver/service to return a device object for the volume backup request. This device object is used later in cinder/backup/drivers/ceph.py at:

    def _backup_rbd(self, backup_id, volume_id, volume_file, volume_name, length):

where volume_file is the object returned from get_backup_device(). A snapshot is then created for this volume by the following code:

    597  source_rbd_image = volume_file.rbd_image
    .....
    643  source_rbd_image.create_snap(new_snap)

So by the logic of ceph.py, it should just use the original source volume that the user asked to back up, and that snapshot is what "rbd export-diff" is run against.

Instead, when the volume is in "in-use" status, the get_backup_device() call to the cinder volume service creates a snap-clone of the original Ceph volume and returns the snap-clone's object handle, which creates two problems:

(a) rbd export-diff in ceph.py gets the source volume mixed up: it tries to use the original cinder volume as the source of the snapshot in CMD1, but the snapshot is nowhere to be found on the original cinder volume, because the create_snap() call above is actually run against the snap-clone volume. As a result, the differential backup fails.

(b) When the step above fails, the code path in ceph.py falls back to a full backup with a brute-force copy (block by block) from the true original cinder volume to the backup Ceph volume. For an active ("in-use") volume this is clearly not a crash-consistent copy, yet the tenant gets no warning and is left with the impression that the backup was fully successful. This should not be the case if non-disruptive backup mode is offered as a service under Newton. This will have to be fixed.

Version-Release number of selected component (if applicable):
OSP 10

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
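To make the failure mode concrete, here is a minimal sketch of the flow described above. This is not the actual cinder code; backup_rbd_sketch, full_copy and the snapshot name are illustrative placeholders, and only the general shape follows the description: snapshot whatever image handle get_backup_device() returned, attempt an rbd export-diff against the original volume name, and silently fall back to a full copy when the diff cannot be taken.

import subprocess


def full_copy(volume_name, backup_image_name):
    """Placeholder for the block-by-block fallback copy described in (b)."""
    raise NotImplementedError("illustrative stub")


def backup_rbd_sketch(volume_file, volume_name, backup_image_name,
                      from_snap=None):
    # For an "in-use" volume, get_backup_device() hands back a snap-clone,
    # so this handle is NOT the original volume the user asked to back up.
    source_rbd_image = volume_file.rbd_image

    new_snap = "backup.snap.illustrative"
    source_rbd_image.create_snap(new_snap)   # the snapshot lands on the clone

    # Problem (a): the export-diff command is built against the ORIGINAL
    # volume name, where the snapshot was never created, so it fails.
    cmd = ["rbd", "export-diff", "--pool", "volumes",
           "%s@%s" % (volume_name, new_snap), "-"]
    if from_snap:
        cmd += ["--from-snap", from_snap]
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError:
        # Problem (b): silent fallback to a full, block-by-block copy of the
        # live volume; not crash consistent, yet reported as successful.
        full_copy(volume_name, backup_image_name)

The mismatch is between source_rbd_image (the snap-clone handle that receives the snapshot) and volume_name (the original volume the diff is requested from), which is why the differential path can never find its snapshot.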
On OSP 11 I was able to reproduce this as follows. In short, the "incremental" backup is not really an increment but a full copy of the volume.

1. This is the volume to be attached:

[stack@instack ~]$ openstack volume list
+--------------------------------------+--------------+-----------+------+-------------+
| ID                                   | Display Name | Status    | Size | Attached to |
+--------------------------------------+--------------+-----------+------+-------------+
| 1d12eb29-08bd-4457-98fe-0debf8dbcf59 | backupvol    | available |   10 |             |
+--------------------------------------+--------------+-----------+------+-------------+

2. Attach it to my instance:

[stack@instack ~]$ openstack server add volume rhel7-volume-backup backupvol
[stack@instack ~]$ openstack volume list
+--------------------------------------+--------------+--------+------+----------------------------------------------+
| ID                                   | Display Name | Status | Size | Attached to                                  |
+--------------------------------------+--------------+--------+------+----------------------------------------------+
| 1d12eb29-08bd-4457-98fe-0debf8dbcf59 | backupvol    | in-use |   10 | Attached to rhel7-volume-backup on /dev/vdd  |
+--------------------------------------+--------------+--------+------+----------------------------------------------+

3. Log in to the instance, mkfs and mount the volume, and copy a file to the mount directory.

4. Create a backup:

[stack@instack ~]$ cinder backup-create backupvol --force
+-----------+--------------------------------------+
| Property  | Value                                |
+-----------+--------------------------------------+
| id        | 9eed827c-4100-4b20-b520-8bab2d769521 |
| name      | None                                 |
| volume_id | 1d12eb29-08bd-4457-98fe-0debf8dbcf59 |
+-----------+--------------------------------------+

5. Check on the Ceph side:

[root@overcloud-controller-0 ~]# rbd -p backups ls
volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.9eed827c-4100-4b20-b520-8bab2d769521

[root@overcloud-controller-0 ~]# rbd -p backups info volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.9eed827c-4100-4b20-b520-8bab2d769521
rbd image 'volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.9eed827c-4100-4b20-b520-8bab2d769521':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.26e4977099a5b
        format: 2
        features: layering, striping
        flags:
        stripe unit: 4096 kB
        stripe count: 1

6. Log back in to the instance and add another file to the mount directory.

7. Create an incremental backup:

[stack@instack ~]$ cinder backup-create backupvol --force --incremental
+-----------+--------------------------------------+
| Property  | Value                                |
+-----------+--------------------------------------+
| id        | 2b72b678-4f5b-4c0c-a0e5-3dcb3c487210 |
| name      | None                                 |
| volume_id | 1d12eb29-08bd-4457-98fe-0debf8dbcf59 |
+-----------+--------------------------------------+

8. Check on the Ceph side; the "incremental" backup is 10G in size:

[root@overcloud-controller-0 ~]# rbd -p backups ls
volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.2b72b678-4f5b-4c0c-a0e5-3dcb3c487210
volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.9eed827c-4100-4b20-b520-8bab2d769521

[root@overcloud-controller-0 ~]# rbd -p backups info volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.2b72b678-4f5b-4c0c-a0e5-3dcb3c487210
rbd image 'volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.2b72b678-4f5b-4c0c-a0e5-3dcb3c487210':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.26e82583ef15d
        format: 2
        features: layering, striping
        flags:
        stripe unit: 4096 kB
        stripe count: 1
9. Neither of the backup RBD images has any snapshot:

[root@overcloud-controller-0 ~]# rbd -p backups snap ls volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.9eed827c-4100-4b20-b520-8bab2d769521
[root@overcloud-controller-0 ~]# rbd -p backups snap ls volume-1d12eb29-08bd-4457-98fe-0debf8dbcf59.backup.2b72b678-4f5b-4c0c-a0e5-3dcb3c487210
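The check in step 9 can also be scripted. Below is a small sketch, assuming the standard python-rados / python-rbd bindings, a ceph.conf at /etc/ceph/ceph.conf and a pool named "backups" as in the output above; it prints the snapshots of every image in the pool. A working incremental chain would leave snapshots on the base backup image for rbd export-diff to diff against; here the expected output is "no snapshots" for both images, matching the report.

import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("backups")
    try:
        for image_name in rbd.RBD().list(ioctx):
            image = rbd.Image(ioctx, image_name, read_only=True)
            try:
                # list_snaps() yields one dict per snapshot on the image.
                snaps = [s["name"] for s in image.list_snaps()]
                print("%s: %s" % (image_name, snaps or "no snapshots"))
            finally:
                image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()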
Tested using: python2-os-brick-2.3.1-1.el7ost.noarch

Automation result:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-storage-qe-13_director-rhel-virthost-3cont_2comp_1ceph-ipv4-vxlan-qe-storage-tests/5/testReport/tempest_storage_plugin.tests.scenario.test_volume_backup/TestVolumeBackup/Second_tempest_run___test_volume_backup_increment_restore_compute_id_2ce5e55c_4085_43c1_98c6_582525334ad7_image_volume_/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086
@jbiao: Hi, I have hit the same issue. How can it be resolved? Thanks!