Description of problem:

Glance bloats sparse raw images to full size instead of keeping them sparse when uploading into Ceph.

Related upstream bug: http://tracker.ceph.com/issues/17178

Additional info:

When I upload a sparse raw image into Ceph, the whole apparent size seems to be used once the image is uploaded. A 470MB qcow2 image converted to a sparse raw image thus takes up 10GB in Ceph instead of 1.1GB. Why doesn't the image only consume 1.1GB in Ceph?

E.g.:
~~~
/usr/bin/qemu-img convert -f qcow2 -O raw rhel-guest-image-7.2-20160302.0.x86_64.qcow2 rhel-guest-image-7.2-20160302.0.x86_64.raw
~~~

Apparent size:
~~~
[stack@undercloud-1 ~]$ ls -alh | grep rhel
-rw-rw-r--. 1 stack stack 470M Mar  4 08:44 rhel-guest-image-7.2-20160302.0.x86_64.qcow2
-rw-r--r--. 1 stack stack  10G Aug 24 00:01 rhel-guest-image-7.2-20160302.0.x86_64.raw
~~~

Disk usage of the sparse file:
~~~
[stack@undercloud-1 ~]$ du -sh * | grep rhel
470M	rhel-guest-image-7.2-20160302.0.x86_64.qcow2
1.1G	rhel-guest-image-7.2-20160302.0.x86_64.raw
~~~

~~~
[stack@undercloud-1 ~]$ glance image-create --progress --is-public true --file rhel-guest-image-7.2-20160302.0.x86_64.raw --name rhel7 --container-format bare --disk-format raw
[=============================>] 100%
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 62f33fb78ed23539b0871d1ab8c86725     |
| container_format | bare                                 |
| created_at       | 2016-08-24T19:59:10.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | raw                                  |
| id               | e6c87b66-bb5f-42c7-a322-3a202b34ca75 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | rhel7                                |
| owner            | 5d2edea478ec4bdba74b6bd5e108fe3d     |
| protected        | False                                |
| size             | 10737418240                          |
| status           | active                               |
| updated_at       | 2016-08-24T20:05:19.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
~~~

Original object count without any images in the pool vs. object count after upload of the image:
~~~
[root@overcloud-cephstorage-0 data]# rados df
pool name       KB          objects   clones   degraded   unfound   rd     rd KB   wr      wr KB
images          0           1         0        0          0         2425   4988    12250   33163297
rbd             0           0         0        0          0         0      0       0       0
vms             0           0         0        0          0         0      0       0       0
volumes         0           0         0        0          0         0      0       0       0
  total used    13409140    1
  total avail   338196524
  total space   351605664

[root@overcloud-cephstorage-0 data]# rados df
pool name       KB          objects   clones   degraded   unfound   rd     rd KB   wr      wr KB
images          10485761    1283      0        0          0         2481   5031    14819   43649058
rbd             0           0         0        0          0         0      0       0       0
vms             0           0         0        0          0         0      0       0       0
volumes         0           0         0        0          0         0      0       0       0
  total used    46906812    1283
  total avail   304698852
  total space   351605664
~~~

And:
~~~
[root@overcloud-cephstorage-0 ~]# rbd info --image 718c6f4c-e589-4b30-abf3-fbffd4090040 -p images
rbd image '718c6f4c-e589-4b30-abf3-fbffd4090040':
	size 10240 MB in 1280 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.11865b8de499
	format: 2
	features: layering, striping
	flags:
	stripe unit: 8192 kB
	stripe count: 1
~~~

Disk usage on the Ceph storage node prior to upload vs. after upload:
~~~
# before
[root@overcloud-cephstorage-0 data]# du -sh .
1.4G	.

# after
[root@overcloud-cephstorage-0 data]# du -sh .
12G	.
~~~
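As an aside, the same apparent-vs-allocated comparison that `ls` and `du` make above can be done programmatically. A minimal Python sketch (the file name is just the example image from above):

~~~
import os

# A file is sparse when the blocks actually allocated on disk cover less
# than its apparent length. st_blocks is counted in 512-byte units.
st = os.stat('rhel-guest-image-7.2-20160302.0.x86_64.raw')
apparent = st.st_size           # what `ls -l` reports (10G here)
allocated = st.st_blocks * 512  # what `du` reports (~1.1G here)
print('apparent: %d bytes, allocated on disk: %d bytes' % (apparent, allocated))
~~~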
Now, I _can_ trim the size of the file system _after_ that:

### How to trim size of images ###

Explanation about raw/sparse images and how to reduce size with fstrim:
https://yaple.net/2016/03/21/glance-ceph-and-raw-images/

Show snapshots:
~~~
[root@overcloud-cephstorage-0 data]# rbd -p images snap ls f7dc8646-0357-404a-9ca3-a05dc93bffb2
SNAPID NAME     SIZE
     8 snap 10240 MB
~~~

Remove the snapshot:
~~~
[root@overcloud-cephstorage-0 data]# rbd -p images snap unprotect f7dc8646-0357-404a-9ca3-a05dc93bffb2@snap
[root@overcloud-cephstorage-0 data]# rbd -p images snap rm f7dc8646-0357-404a-9ca3-a05dc93bffb2@snap
~~~

Map and mount:
~~~
[root@overcloud-cephstorage-0 data]# rbd -p images map f7dc8646-0357-404a-9ca3-a05dc93bffb2
[root@overcloud-cephstorage-0 data]# mount /dev/rbd0p1 /mnt
[root@overcloud-cephstorage-0 data]# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0p1     6.0G  910M  5.2G  15% /mnt
~~~

Run fstrim and umount:
~~~
[root@overcloud-cephstorage-0 data]# fstrim /mnt
[root@overcloud-cephstorage-0 data]# umount /mnt
~~~

Show the new object count and size on disk:
~~~
[root@overcloud-cephstorage-0 data]# rados -p images ls | grep rbd_data | wc -l
635

[root@overcloud-cephstorage-0 data]# rados df
pool name       KB          objects   clones   degraded   unfound   rd     rd KB   wr      wr KB
images          5167353     638       0        0          0         2229   4836    11606   33163297
rbd             0           0         0        0          0         0      0       0       0
vms             0           0         0        0          0         0      0       0       0
volumes         0           0         0        0          0         0      0       0       0
  total used    28917336    638
  total avail   322688328
  total space   351605664

[root@overcloud-cephstorage-0 data]# du -sh .
6.0G	.
~~~

Recreate and protect the snapshot:
~~~
[root@overcloud-cephstorage-0 data]# rbd -p images snap create f7dc8646-0357-404a-9ca3-a05dc93bffb2@snap
[root@overcloud-cephstorage-0 data]# rbd -p images snap protect f7dc8646-0357-404a-9ca3-a05dc93bffb2@snap
~~~

### Comparison of direct RBD upload vs. glance ###

Importing the raw image directly with `rbd import` takes up only ~1.5G on disk, and `--image-format 1` behaves the same as `--image-format 2`:

`image-format 1`
~~~
[root@overcloud-controller-0 heat-admin]# rbd --image-format 1 import rhel-guest-image-7.2-20160302.0.x86_64.raw images/test
~~~
~~~
# before upload
[root@overcloud-cephstorage-0 ~]# du -sh /srv/
1.1G	/srv/

# after upload
[root@overcloud-cephstorage-0 ~]# du -sh /srv/
2.6G	/srv/
~~~

`image-format 2`
~~~
[root@overcloud-controller-0 heat-admin]# rbd --image-format 2 import rhel-guest-image-7.2-20160302.0.x86_64.raw images/test
~~~
~~~
# before
[root@overcloud-cephstorage-0 ~]# du -sh /srv/
1.1G	/srv/

# after
[root@overcloud-cephstorage-0 ~]# du -sh /srv/
2.6G	/srv/
~~~

Now compare that to an upload with glance:
~~~
[stack@undercloud-1 ~]$ glance image-create --progress --is-public true --file rhel-guest-image-7.2-20160302.0.x86_64.raw --name rhel7 --container-format bare --disk-format raw
[=============================>] 100%
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 62f33fb78ed23539b0871d1ab8c86725     |
| container_format | bare                                 |
| created_at       | 2016-08-25T21:20:14.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | raw                                  |
| id               | d267fee0-5ddf-4ad0-930b-5411eed53591 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | rhel7                                |
| owner            | 5d2edea478ec4bdba74b6bd5e108fe3d     |
| protected        | False                                |
| size             | 10737418240                          |
| status           | active                               |
| updated_at       | 2016-08-25T21:26:50.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
~~~
~~~
[root@overcloud-cephstorage-0 ~]# du -sh /srv/
12G	/srv/
~~~
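The `rbd import` vs. glance difference above comes down to hole handling: reading a sparse file back returns its holes as runs of zero bytes, and a store that blindly writes every chunk it reads allocates RADOS objects for all of them. Below is a minimal Python sketch of such a naive chunked write loop using the python-rbd bindings; it is illustrative only, not the actual glance_store code, and the pool and image names are placeholders:

~~~
import os

import rados
import rbd

SRC = 'rhel-guest-image-7.2-20160302.0.x86_64.raw'
CHUNK = 8 * 1024 * 1024  # matches the 8192 kB object size (order 23) above

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('images')  # placeholder pool name
try:
    rbd.RBD().create(ioctx, 'test', os.path.getsize(SRC))
    with rbd.Image(ioctx, 'test') as image, open(SRC, 'rb') as f:
        offset = 0
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            # Holes in the sparse source read back as all-zero chunks, and
            # this unconditional write allocates a backing RADOS object for
            # every one of them -- hence the image ballooning to 10G.
            image.write(chunk, offset)
            offset += len(chunk)
finally:
    ioctx.close()
    cluster.shutdown()
~~~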
An additional test/suggestion was to use task-create: I tried the last suggestion (https://www.sebastien-han.fr/blog/2015/05/11/openstack-glance-a-first-glimpse-at-image-conversion/), but I ran into a bug for which I created BZ https://bugzilla.redhat.com/show_bug.cgi?id=1373571

Although that bug seems to be minor and the image actually gets converted, it doesn't save any space: the image takes the full 10GB after conversion, so sparseness is not preserved.

~~~
[stack@undercloud-1 ~]$ ls -alh /var/www/html
total 470M
drwxr-xr-x. 2 root root   57 Sep  6 08:30 .
drwxr-xr-x. 4 root root   31 Aug 18 13:24 ..
-rw-r--r--. 1 root root 470M Sep  6 08:30 rhel-guest-image-7.2-20160302.0.x86_64.qcow2
~~~

~~~
[root@overcloud-cephstorage-0 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       112G   16G   97G  14% /
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G     0   32G   0% /dev/shm
tmpfs            32G  812K   32G   1% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
tmpfs           6.3G     0  6.3G   0% /run/user/1000

[root@overcloud-cephstorage-0 ~]# rbd info images/f746cecf-7f89-4ba0-870a-6b0f2e961651
rbd image 'f746cecf-7f89-4ba0-870a-6b0f2e961651':
	size 10240 MB in 1280 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.faa71b9e64e9
	format: 2
	features: layering, striping
	flags:
	stripe unit: 8192 kB
	stripe count: 1
~~~

~~~
[stack@undercloud-1 ~]$ glance --os-image-api-version 1 image-list
+--------------------------------------+--------------------------+-------------+------------------+-------------+--------+
| ID                                   | Name                     | Disk Format | Container Format | Size        | Status |
+--------------------------------------+--------------------------+-------------+------------------+-------------+--------+
| f746cecf-7f89-4ba0-870a-6b0f2e961651 | rhel-guest-image-7.2-RAW | raw         | bare             | 10737418240 | active |
+--------------------------------------+--------------------------+-------------+------------------+-------------+--------+
~~~

According to Jason Dillaman:

"Looking at the glance code, it looks like the conversion routine creates the raw image as a local (sparse) file and then uses the same RBD glance_store routine to create the rbd image (thus creating a non-sparse image). It doesn't have the smarts to use qemu-img to directly write the image into the cluster. As a result, it only saves the user the bandwidth of uploading a sparse image to the glance backend -- it doesn't save space within the Ceph cluster. The short-term solution is to fix the rbd glance_store implementation [1].

[1] http://tracker.ceph.com/issues/17178"
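The fix referenced in [1] boils down to detecting fully zeroed chunks and seeking past them instead of writing. A hedged sketch of that idea, as a hypothetical drop-in replacement for the write loop in the earlier sketch (again, this is not the actual glance_store patch):

~~~
# Hypothetical replacement for the body of the write loop sketched earlier:
# compare each chunk against a preallocated zero buffer and seek past
# fully zeroed chunks instead of writing them.
ZERO_CHUNK = b'\x00' * CHUNK

offset = 0
while True:
    chunk = f.read(CHUNK)
    if not chunk:
        break
    if chunk == ZERO_CHUNK[:len(chunk)]:
        # Entirely zero: advancing the offset without writing leaves the
        # corresponding RADOS objects unallocated, so sparseness survives.
        offset += len(chunk)
        continue
    image.write(chunk, offset)
    offset += len(chunk)
~~~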
If a task flow isn't being utilized to automatically convert qcow2 images to raw images on the glance backend, the user must upload raw images to RBD-backed glance pools. It would be nice if the RBD glance store supported skipping fully zeroed blocks when importing the image into the RBD pool.
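For completeness, qemu-img can already write straight into the cluster: `qemu-img convert` accepts an `rbd:pool/image` target and can skip zeroed regions as it writes, which is the "smarts" Jason's comment refers to. A minimal sketch of driving that from Python (the pool/image name is a placeholder):

~~~
import subprocess

# qemu-img understands rbd: targets; converting straight into the pool
# avoids the intermediate local raw file and keeps zeroed regions thin.
subprocess.check_call([
    'qemu-img', 'convert', '-f', 'qcow2', '-O', 'raw',
    'rhel-guest-image-7.2-20160302.0.x86_64.qcow2',
    'rbd:images/rhel7-raw',  # placeholder pool/image name
])
~~~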
There's no conversion happening on the Glance side. The work on automatic image conversion has been put on hold upstream. As far as the RBD store goes, I guess it could be improved but I'm not very familiar with it or the improvement proposed here. I'd recommend writing a spec and bringing it upstream.
Indeed, you could probably copy/paste that into the glance_store upstream bug tracker and see what people think about it. This looks like a huge feature.
Closing upstream in accordance with http://tracker.ceph.com/issues/17178

Sean