Bug 1239260 - Unable to create a snapshot image from instance when glance and nova are ceph backends
Summary: Unable to create a snapshot image from instance when glance and nova are ceph...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ---
: 5.0 (RHEL 7)
Assignee: RHOS Documentation Team
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-07-05 05:38 UTC by bkopilov
Modified: 2019-10-10 09:55 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-31 05:16:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
nova logs (244.22 KB, text/plain)
2015-07-05 05:51 UTC, bkopilov
no flags Details
logs from /var/log (358.68 KB, application/x-gzip)
2015-07-05 05:56 UTC, bkopilov
no flags Details
cinder_conf (19.88 KB, application/x-gzip)
2015-07-06 07:54 UTC, bkopilov
no flags Details
nova_conf (33.57 KB, application/x-gzip)
2015-07-06 07:57 UTC, bkopilov
no flags Details
glance_conf (10.63 KB, application/x-gzip)
2015-07-06 09:24 UTC, bkopilov
no flags Details

Description bkopilov 2015-07-05 05:38:26 UTC
Description of problem:
Hi , 
RHEL 7.1 with rhos5 .  
Cinder , nova and glance use ceph backend.
nova and glance debug are enabled.
When running a tempest code , there are failures.
tempest.api.compute.images.ImagesOneServerTestJSON.test_create_delete_image

The root casue for the failure is a glance snapshot from instance.
I repeat the same test with rhos6 and did not manage to reproduce it ( on the same ceph backedn with the same pools).

From /var/log/glance/api.conf :

2015-07-05 01:08:31.316 15912 DEBUG routes.middleware [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Matched PUT /images/0b3df65a-a72c-4a95-abde-9b961dc67b45 __call__ /usr/lib/python2.7/site-packages/routes/middleware.py:100
2015-07-05 01:08:31.317 15912 DEBUG routes.middleware [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Route path: '/images/{id}', defaults: {'action': u'update', 'controller': <glance.common.wsgi.Resource object at 0x2bb7650>} __call__ /usr/lib/python2.7/site-packages/routes/middleware.py:102
2015-07-05 01:08:31.317 15912 DEBUG routes.middleware [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Match dict: {'action': u'update', 'controller': <glance.common.wsgi.Resource object at 0x2bb7650>, 'id': u'0b3df65a-a72c-4a95-abde-9b961dc67b45'} __call__ /usr/lib/python2.7/site-packages/routes/middleware.py:103
2015-07-05 01:08:31.319 15912 DEBUG glance.common.client [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Constructed URL: http://0.0.0.0:9191/images/0b3df65a-a72c-4a95-abde-9b961dc67b45 _construct_url /usr/lib/python2.7/site-packages/glance/common/client.py:411
2015-07-05 01:08:31.355 15912 DEBUG glance.registry.client.v1.client [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Registry request GET /images/0b3df65a-a72c-4a95-abde-9b961dc67b45 HTTP 200 request id req-a44a2612-73e8-48e6-b271-8773deea2544 do_request /usr/lib/python2.7/site-packages/glance/registry/client/v1/client.py:114
2015-07-05 01:08:31.357 15912 DEBUG glance.common.client [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Constructed URL: http://0.0.0.0:9191/images/0b3df65a-a72c-4a95-abde-9b961dc67b45/members _construct_url /usr/lib/python2.7/site-packages/glance/common/client.py:411
2015-07-05 01:08:31.394 15912 DEBUG glance.registry.client.v1.client [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Registry request GET /images/0b3df65a-a72c-4a95-abde-9b961dc67b45/members HTTP 200 request id req-61464e6e-3ed0-4fdd-bb77-a7ee51c57428 do_request /usr/lib/python2.7/site-packages/glance/registry/client/v1/client.py:114
2015-07-05 01:08:31.395 15912 DEBUG glance.store [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Skipping store.set_acls... not implemented. set_acls /usr/lib/python2.7/site-packages/glance/store/__init__.py:422
2015-07-05 01:08:31.456 15912 INFO glance.wsgi.server [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 389, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 378, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/keystoneclient/middleware/auth_token.py", line 684, in __call__
    return self.app(env, start_response)
 File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 378, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1296, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1260, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/paste/urlmap.py", line 203, in __call__
    return app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/site-packages/routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 644, in __call__
    request, **action_args)
  File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 668, in dispatch
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/glance/common/utils.py", line 436, in wrapped
    return func(self, req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/glance/api/v1/images.py", line 911, in update
    location)
  File "/usr/lib/python2.7/site-packages/glance/api/v1/images.py", line 672, in _get_size
    get_size_from_backend(context, location))
  File "/usr/lib/python2.7/site-packages/glance/store/__init__.py", line 290, in get_size_from_backend
    return store.get_size(loc)
  File "/usr/lib/python2.7/site-packages/glance/store/rbd.py", line 239, in get_size
    snapshot=loc.snapshot) as image:
  File "/usr/lib/python2.7/site-packages/rbd.py", line 371, in __init__
    raise make_ex(ret, 'error opening image %s at snapshot %s' % (name, snapshot))
PermissionError: error opening image 771fb9a5-429e-4f26-902b-898a0546ea98_disk_clone_c1a6df5350104f2aa46d1254cded857b at snapshot snap
2015-07-05 01:08:31.457 15912 INFO glance.wsgi.server [bb70fe65-cc3b-42e1-adb5-ed66da48325d e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19 - - -] 10.35.186.222 - - [05/Jul/2015 01:08:31] "PUT /v1/images/0b3df65a-a72c-4a95-abde-9b961dc67b45 HTTP/1.1" 500 139 0.147813
2015-07-05 01:08:51.765 15911 DEBUG glance.api.middleware.version_negotiation [-] Determining version of request: PUT /v1/images/0b3df65a-a72c-4a95-abde-9b961dc67b45 Accept:  process_request /usr/lib/python2.7/site-packages/glance/api/middleware/version_negotiation.py:44



#### Traceback from NOVA :

 log_http_response /usr/lib/python2.7/site-packages/glanceclient/common/http.py:167
2015-07-05 01:08:31.460 15653 DEBUG glanceclient.common.http [-] Request returned failure status: 500 _http_request /usr/lib/python2.7/site-packages/glanceclient/common/http.py:263
2015-07-05 01:08:31.460 15653 DEBUG nova.virt.libvirt.driver [req-e5aaeb51-405c-481d-89ff-44bf0992b573 e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19] failed to update glance snapshot /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:1618
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1616, in snapshot
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     metadata)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/nova/image/glance.py", line 384, in update
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     _reraise_translated_image_exception(image_id)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/nova/image/glance.py", line 382, in update
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     image_id, **image_meta)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/nova/image/glance.py", line 216, in call
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     return getattr(client.images, method)(*args, **kwargs)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/glanceclient/v1/images.py", line 331, in update
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     'PUT', url, headers=hdrs, body=image_data)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/glanceclient/common/http.py", line 328, in raw_request
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     return self._http_request(url, method, **kwargs)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/glanceclient/common/http.py", line 264, in _http_request
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver     raise exc.from_response(resp, body_str)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver HTTPInternalServerError: HTTPInternalServerError (HTTP 500)
2015-07-05 01:08:31.460 15653 TRACE nova.virt.libvirt.driver
2015-07-05 01:08:31.462 15653 DEBUG nova.virt.libvirt.driver [req-e5aaeb51-405c-481d-89ff-44bf0992b573 e4fd8e37490b415382674920c655f9a5 988df51c8c6e4d99b3a47672c6155c19] direct snapshot failed snapshot /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:1639
2015-07-05 01:08:31.462 15653 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2015-07-05 01:08:31.462 15653 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1621, in snapshot
2015-07-05 01:08:31.462 15653 TRACE nova.virt.libvirt.driver     reason=reason)
2015-07-05 01:08:31.462 15653 TRACE nova.virt.libvirt.driver ImageUnacceptable: Image 8a156957-19f2-4175-a2a4-cb01c3ca320e is unacceptable: failed to update glance: HTTPInternalServerError (HTTP 500)
2015-07-05 01:08:31.462 15653 TRACE nova.virt.libvirt.driver
2015-07-05 01:08:31.463 15653 DEBUG nova.openstack.common.processutils [-] Running cmd (subprocess): qemu-img convert -O raw rbd:automation-cinder/771fb9a5-429e-4f26-902b-898a0546ea98_disk:id=automation-cinder:conf=/etc/ceph/ceph.conf /var/lib/nova/instances/snapshots/tmp4ljmoV/c1a6df5350104f2aa46d1254cded857b execute /usr/lib/python2.7/site-packages/nova/openstack/common/processutils.py:154
2015-07-05 01:08:32.188 15653 DEBUG nova.virt.driver [-] Emitting event <LifecycleEvent: 1436072912.19, eab09c24-ab68-41af-bf13-64dbdae49f99 => Started> emit_event /usr/lib/python2.7/site-packages/nova/virt/driver.py:1230
2015-07-05 01:08:32.189 15653 INFO nova.compute.manager [-] [instance: eab09c24-ab68-41af-bf13-64dbdae49f99] VM Started (Lifecycle Event)


Version-Release number of selected component (if applicable):


How reproducible:
#1 create an instance from image
#2 try to create a snapshot image from above instance.

Steps to Reproduce:
1.
2.
3.

Actual results:
Unable to create a snapshot image from instance.

Expected results:
Manage to create a snapshot image from instance.


Additional info:
Attaching to the bug logs with debug enabled from nova and glance.

Benny

Comment 3 bkopilov 2015-07-05 05:51:41 UTC
Created attachment 1046126 [details]
nova logs

Comment 4 bkopilov 2015-07-05 05:56:13 UTC
Created attachment 1046127 [details]
logs from /var/log

Comment 6 Flavio Percoco 2015-07-06 07:43:20 UTC
This doesn't seem to be a Glance issue but rather a Nova one. The image exists in Glance and its status is a valid one. The snapshot also exists in Ceph but Glance fails to access it because there are wrong permissions.

Could you please upload your glance/nova/cinder configs?

Josh, what else can be asked from the ceph side to determine what the underlying permission issues are?

Comment 7 bkopilov 2015-07-06 07:54:55 UTC
Created attachment 1048691 [details]
cinder_conf

Comment 8 bkopilov 2015-07-06 07:57:29 UTC
Created attachment 1048692 [details]
nova_conf

Comment 9 Flavio Percoco 2015-07-06 09:11:13 UTC
Sorry, could you also upload Glance's configs and whatever you have for ceph ?

Comment 10 bkopilov 2015-07-06 09:24:17 UTC
Created attachment 1048735 [details]
glance_conf

Comment 12 Josh Durgin 2015-07-06 16:39:02 UTC
'ceph auth list' will show all the ceph permissions. Most likely you'll need to add 'allow rwx pool=vms' to client.glance. You can reset the permissions via the 'ceph auth caps' command: http://ceph.com/docs/master/rados/operations/user-management/#modify-user-capabilities

Comment 13 bkopilov 2015-07-06 16:46:07 UTC
Hi , 
Here are my glance pool configuration:

[client.automation-glance]
	key = AQCvqFhVNLwWHRAAWi57BSCmVePk0Kyv/Yoybg==
	caps mon = "allow r"
	caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=automation-glance"


The strange issue here is that same pool name works on rhos6 but fails with rhos5.

Benny

Comment 14 bkopilov 2015-07-06 16:46:34 UTC
Hi , 
Here are my glance pool configuration:

[client.automation-glance]
	key = AQCvqFhVNLwWHRAAWi57BSCmVePk0Kyv/Yoybg==
	caps mon = "allow r"
	caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=automation-glance"


The strange issue here is that same pool name works on rhos6 but fails with rhos5.

Benny

Comment 15 Josh Durgin 2015-07-06 16:55:01 UTC
Yeah, you'll need to append ", allow rwx pool=automation-nova" or whatever pool nova is using to those osd caps.

This is due to a feature backported first to 5 and later to 6, which may be why you didn't see this issue with 6.

Comment 16 bkopilov 2015-07-06 17:13:02 UTC
Nova uses cinder pool too :

[client.automation-cinder]
	key = AQCuqFhVVLlcJxAAcl5+NQa3EiMoXVL67vB35g==
	caps mon = "allow r"
	caps osd = "allow class-read object_prefix rbd_children,allow rwx pool=automation-cinder, allo
w rwx pool=automation-nova , allow rx pool=automation-glance"

Comment 17 bkopilov 2015-07-06 17:13:14 UTC
Nova uses cinder pool too :

[client.automation-cinder]
	key = AQCuqFhVVLlcJxAAcl5+NQa3EiMoXVL67vB35g==
	caps mon = "allow r"
	caps osd = "allow class-read object_prefix rbd_children,allow rwx pool=automation-cinder, allo
w rwx pool=automation-nova , allow rx pool=automation-glance"

Comment 18 Josh Durgin 2015-07-07 01:22:35 UTC
Nova snapshots (that may be cleaned up by glance) are just in nova's pool, not cinder's. Is it still not working?

Comment 20 Josh Durgin 2015-07-07 06:50:15 UTC
The patch that requires this is for efficient nova snapshots on rbd:
https://bugzilla.redhat.com/show_bug.cgi?id=1176818

It should be documented that glance needs rwx access to the pool nova uses for it to clean up after the snapshots.

Comment 21 Flavio Percoco 2015-07-07 10:54:44 UTC
Thanks, Josh.

Comment 23 Federico Lucifredi 2016-08-16 21:42:44 UTC
CLOSED/CURRENTRELEASE as this is the doc clone of an issue fixed with OSP6. 

No need to doc this.

Comment 24 Federico Lucifredi 2016-08-16 21:44:02 UTC
It should be documented that glance needs rwx access to the pool nova uses for it to clean up after the snapshots.

Comment 25 Flavio Percoco 2016-08-20 20:04:31 UTC
Agreed this must be documented and it's not the first time this issue has been hit because of the missing permissions. FWIW, c#24 sums it up.

Comment 26 Tim Wilkinson 2016-10-21 20:48:05 UTC
I couldn't find this in any beta-10 docs. osp9 docs do say to allow rwx for vms pool for client.openstack and the glance-api.conf file specifies rbd_store_user = openstack. I see that client.openstack does have the correct permissions but I still can not snapshot to ceph storage. 


(root@c04-h33-6018r) - (20:29) - (~)
-=>>ceph auth get client.openstack
exported keyring for client.openstack
[client.openstack]
	key = AQCI2wRY5K13KRAAecXoDV75tdjrLtXH40cBag==
	caps mon = "allow r"
	caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images"


I saw that client.glance did not have permissions to the vms pool so I added it using ceph auth caps. 


(root@c04-h33-6018r) - (20:27) - (~)
-=>>ceph auth get client.glance
exported keyring for client.glance
[client.glance]
	key = AQABmwdYH487NRAAqIi+t4uEni4uLz1AVLxpGg==
	caps mon = "allow r"
	caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms"


I've restarted all openstack services but can not snapshot ...


root@overcloud-controller-0:~
# nova image-create --show --poll a0260de6-6c5a-478d-8d65-92021b5611df rhel7.2_snap1

Server snapshotting... 0% completeERROR (NotFound): Not found (HTTP 404)


Am I missing something? This is blocking time critical testing of RHCS 2.0 in OSP10 in the scale lab.

Comment 27 Ben England 2016-10-23 22:15:49 UTC
cc'ing perf team members.  Tim's config. is an OSP-external Ceph cluster, not created by OSPd.  

OSP version here:

http://pastebin.test.redhat.com/423539

Comment 28 Tim Wilkinson 2016-10-24 14:30:11 UTC
Since my issue is with RHOSP-10, is more a functional problem , and does not appear to be documentation, I opened bz 1388119 for my specific problem.

Comment 29 Lucy Bopf 2016-11-07 06:43:39 UTC
Moving to 'NEW' to be triaged as resources allow.

Comment 30 Lucy Bopf 2017-08-31 05:16:44 UTC
Red Hat Enterprise Linux OpenStack Platform 5 is no longer supported. Per the support policy, we are no longer updating the documentation for this version.

If this bug applies to a supported version of RHOSP, please raise a new bug against that version.

I am closing this bug.


Note You need to log in before you can comment on or make changes to this bug.