Description of problem:
Glance and Nova use the Ceph RBD backend. A Nova VM booted from a Glance image should use a copy-on-write clone, but the instance's disk image does not show up as a clone of the parent Glance image.

Version-Release number of selected component (if applicable):
ceph-0.94.1-11.el7cp.x86_64
RHOSP 6 (Juno)

How reproducible:

Steps to Reproduce:
1. Create a glance image on an rbd pool.
2. Boot a nova instance from the glance image created. The instance image is in a different rbd pool.
3. Check the info on the ephemeral disk image.

Actual results:
The image does not show as a clone and does not list the glance image as a parent image.

Expected results:
The nova disk image should show as a clone. I do see from "ceph df" that it occupies very little space compared to the parent image.
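For reference, a minimal reproduction sketch; the image file name, flavor, container format, and instance pool name below are assumptions for illustration, not taken from this report:

<snip>
# Upload a raw image to the rbd-backed glance store
glance image-create --name cirros-raw1 --disk-format raw \
  --container-format bare --file cirros.raw

# Boot an instance from that image; nova keeps the ephemeral disk in its own rbd pool
nova boot --image cirros-raw1 --flavor m1.small test-vm

# Inspect the instance disk; a COW clone would include a "parent:" line
rbd info vms/<instance-uuid>_disk
</snip>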
This should be a clone (it should show a parent in 'rbd info'), but there are several ways the configuration can prevent that. Can you attach /etc/glance/glance-api.conf and /etc/nova/nova.conf? Other things to check:

1) the image must be raw according to glance
2) the capabilities for the rados user must allow access to both pools - what are the caps according to 'ceph auth list'?
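A quick way to check both points from the command line (this assumes the rados user nova uses is client.cinder, which is common in these deployments):

<snip>
# 1) The image must be raw according to glance
glance image-show <image-id> | grep disk_format

# 2) The rados user used by nova must be able to read the glance pool
#    and write the instance pool
ceph auth get client.cinder
</snip>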
Have attached glance and nova.conf. Here is a snip of both:

glance-api.conf:
<snip>
[DEFAULT]
show_image_direct_url = True
default_store = rbd

[glance_store]
stores = glance.store.rbd.Store,
rbd_store_user = glance
rbd_store_pool = images
rbd_store_chunk_size = 8
rbd_store_ceph_conf = /etc/ceph/ceph.conf
</snip>

nova.conf:
<snip>
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = fb0040d4-3b4a-48dd-9982-eb7dde3ada3a
inject_password = false
inject_key = false
inject_partition = -2
</snip>

Info on the glance image created with disk-format raw:

# glance image-show cirros-raw1
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 133eae9fb1c98f45894a4e60d8736619     |
| container_format | ovf                                  |
| created_at       | 2015-05-27T06:15:36                  |
| deleted          | False                                |
| disk_format      | raw                                  |
| id               | 74e7d399-c4d1-40fe-8d89-168aac5b920f |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | cirros-raw1                          |
| owner            | 21c276f413af40459eea2448b45021a5     |
| protected        | False                                |
| size             | 13200896                             |
| status           | active                               |
| updated_at       | 2015-05-27T06:15:41                  |
+------------------+--------------------------------------+

Booted a VM with flavor option-2 (20G):

# rbd ls -l images
74e7d399-c4d1-40fe-8d89-168aac5b920f       12891k 2
74e7d399-c4d1-40fe-8d89-168aac5b920f@snap  12891k 2 yes

# rbd ls -l vms
NAME                                        SIZE PARENT FMT PROT LOCK
5a7a2067-a50d-4e35-b967-36948dd4e53c_disk 20480M          2

# rbd info vms/5a7a2067-a50d-4e35-b967-36948dd4e53c_disk
rbd image '5a7a2067-a50d-4e35-b967-36948dd4e53c_disk':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.1ddf2ae8944a
        format: 2
        features: layering
        flags:

Auth list for openstack clients:

client.cinder
        key: AQDKUWRVX4NSBhAAF38BQwusQzOqConNJ9oPxw==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images
client.cinder-backup
        key: AQDfUWRVHCwIHRAAfjW7tcjrcbRAhoJv5BToqA==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=backups
client.glance
        key: AQDUUWRVWJUXMBAAZTmLR5VQ8crpAMIKVQD/rQ==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images

The cinder client does have access to both the images and vms pools. Am I missing something here?
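One more thing that may be worth checking is whether glance actually exposes the image's direct_url to nova; it only does so over the v2 API. A sketch using the image ID from the output above (the --os-image-api-version flag assumes a reasonably recent python-glanceclient):

<snip>
# With show_image_direct_url = True and the v2 API enabled, this should
# include a direct_url field of the form rbd://<fsid>/images/<id>/snap
glance --os-image-api-version 2 image-show 74e7d399-c4d1-40fe-8d89-168aac5b920f
</snip>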
Found the cause. It seems the downstream version of Juno, OSP 6 (openstack-glance-2014.2.3-1.el7ost.noarch), still uses glance API v1 as the default instead of v2. Changed the setting in glance-api.conf from enable_v2_api=False to enable_v2_api=True.

# rbd info vms/8ce11cac-3e6d-455c-a3af-b99034d99212_disk
rbd image '8ce11cac-3e6d-455c-a3af-b99034d99212_disk':
        size 20480 MB in 2560 objects
        order 23 (8192 kB objects)
        block_name_prefix: rbd_data.3a73ca5b3d1
        format: 2
        features: layering
        flags:
        parent: images/356f0716-6bcc-4080-8356-174dc2f4e02b@snap
        overlap: 12891 kB
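For completeness, the clone relationship can also be confirmed from the parent side (image ID taken from the rbd info output above; the expected output line is only a sketch):

<snip>
# List all clones hanging off the protected snapshot backing the glance image
rbd children images/356f0716-6bcc-4080-8356-174dc2f4e02b@snap
# expected to list something like: vms/8ce11cac-3e6d-455c-a3af-b99034d99212_disk
</snip>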
I think we have to document that we need to uncomment the line "enable_v2_api=True" in /etc/glance/glance-api.conf and restart the glance-api service. Otherwise it uses v1, in which case OpenStack does not make use of Ceph's COW clone feature when booting nova instances.

Currently glance-api.conf looks like this:
<snip>
# Allow access to version 1 of glance api
#enable_v1_api=True

# Allow access to version 2 of glance api
#enable_v2_api=True
</snip>

We need to change it to:
<snip>
# Allow access to version 1 of glance api
#enable_v1_api=True

# Allow access to version 2 of glance api
enable_v2_api=True
</snip>

I don't see this documented in the glance section of http://ceph.com/docs/master/rbd/rbd-openstack/.
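A sketch of making the change on a RHOSP 6 node; the openstack-config helper (from openstack-utils) and the openstack-glance-api service name are assumptions about the deployment:

<snip>
# Enable the glance v2 API
openstack-config --set /etc/glance/glance-api.conf DEFAULT enable_v2_api True

# Restart glance-api so the setting takes effect
systemctl restart openstack-glance-api
</snip>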
Hi Josh,

Can you confirm that the contents listed in Comment 5 (by shilp) should be documented in the Ceph release notes?

Thanks,
Kiran raje urs
That should be documented in the OpenStack+Ceph setup, not the ceph release notes. Ideally we should change the default in OSP.
Thanks Josh. I'll clone this bug to the openstack-glance component so this will get on to the OSP developers' radar.
With bz 1226454, this is now OpenStack's issue to document. Monti, do we have a "FAQ" place where this can be documented? It would be nice to document this on the Ceph side too.
Per Monday's manager review meeting, this is not a 1.3.0 release blocker. Pushing to 1.3.1 or an async update as priority is determined.
Since this issue is being tracked in Bug #1227519, can we close it?
Per Deon, no need to doc this on Ceph side. Closing bug.