Bug 1485403
| Summary: | Rbd chunks still written if glance image_size_cap is hit | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | stphilli |
| Component: | python-glance-store | Assignee: | Cyril Roelandt <cyril> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Abrams <mabrams> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.0 (Liberty) | CC: | apevec, cyril, ealcaniz, eglynn, fpercoco, hklein, jjoyce, jobernar, lhh, maugarci, mlopes, mschuppe, pablo.iranzo, pdeore, pgrist, scohen, shan, sputhenp, srevivo, tshefi |
| Target Milestone: | zstream | Keywords: | Triaged, ZStream |
| Target Release: | 8.0 (Liberty) | Flags: | tshefi: automate_bug- |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | python-glance-store-0.9.1-4.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-25 17:06:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
stphilli
2017-08-25 15:16:12 UTC
Issue repeat:

date; openstack image create test_082517_1519_max_upload --file /etc/glance/fake_image2G.img; date
Fri Aug 25 15:19:32 CEST 2017
413 Request Entity Too Large: Image exceeds the storage quota: The size of the data None will exceed the limit. None bytes remaining. (HTTP 413)
Fri Aug 25 15:19:56 CEST 2017

Created attachment 1319039
controller issue

There is no quota configured:

[ealcaniz@ealcaniz glance]$ grep quota glance-api.conf
#image_member_quota=128
#image_property_quota=128
#image_tag_quota=128
#image_location_quota=10
# Set a system wide quota for every user. This value is the total
user_storage_quota=0

There are no quotas on the pool:

# ceph osd pool get-quota images
quotas for pool 'images':
max objects: N/A
max bytes : N/A

Yes the log included is part of the attempt. The glance only complains about the quota but it writes the image in the ceph pool instead of rollback or don't write.

(In reply to Edu Alcaniz from comment #7)
> Yes the log included is part of the attempt. The glance only complains about
> the quota but it writes the image in the ceph pool instead of rollback or
> don't write.

Here, Glance has to start writing the image, because it does not know its size (you can see in the error message that the image size is 'None'). At some point, it has written so much data that it has to rollback, as you wrote. The RBD driver in glance_store tries to remove the image from the Ceph cluster, but for some reason, this fails. Could we check the Ceph logs and see what happened there? I wonder whether the deletion request never reached Ceph, or whether it reached it but could not complete.

I might be mistaken, but I can't see any actual Ceph logs in there. This is what I'd really like to read :(

Hi, What log do you need ?

I assume Ceph has logs (probably in /var/log/ceph/). They may not be enabled by default, though. I'd like to see whether Ceph logs a "please delete this image" request, sent from Glance, and whether it complains about anything.

Attaching logs files from ceph nodes but we didn't see any notes of Ceph actions.

I am really not a Ceph expert, but isn't Ceph supposed to log this kind of things? Like image creation, deletion, etc.

Is this bug something "new"? Has it only been happening since an upgrade? Do you witness the same issue on other setups?

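The expected behaviour discussed above (Glance does not know the image size up front, so it writes chunk by chunk and has to remove the partial image once the cap is exceeded) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the glance_store RBD driver: `FakeStore`, `SizeCapExceeded` and `upload_with_cap` are hypothetical names, and an in-memory dict stands in for the Ceph pool.

```python
class SizeCapExceeded(Exception):
    """Raised when the streamed data grows past the configured cap."""


class FakeStore:
    """Hypothetical in-memory stand-in for a Ceph pool (not a real API)."""

    def __init__(self):
        self.images = {}

    def create(self, name):
        self.images[name] = bytearray()

    def write(self, name, chunk):
        self.images[name].extend(chunk)

    def remove(self, name):
        self.images.pop(name, None)


def upload_with_cap(store, name, chunks, size_cap):
    """Stream chunks of unknown total size; roll back if the cap is hit."""
    store.create(name)
    written = 0
    try:
        for chunk in chunks:
            written += len(chunk)
            if written > size_cap:
                raise SizeCapExceeded(name)
            store.write(name, chunk)
    except Exception:
        # Cleanup has to run on every failure path; otherwise the chunks
        # already written stay behind in the pool.
        store.remove(name)
        raise
    return written
```

With a cap of 100 bytes, uploading two 60-byte chunks raises `SizeCapExceeded` and leaves nothing behind in `store.images`, which is the outcome the reporter expected from Glance.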
Sorry, it seems to be a new bug, but no upgrade or anything else has been performed. The issue is happening in other regions too, not only in the lab where we are replicating it.

After modifying rbd.py with LOG.error('TEST calling remove') on line 367, I executed the test again. However, the glance api.log is not printing the message.
I'm attaching the glance api.log. The image used has this id: 0788c9d0-6efc-4239-a733-593ce5a4c4f2
Okay, this is helpful information and suggests the issue is likely within glance's logic and not a lower layer. Erno identified a few patches in successive versions that may address this issue and are not currently backported.

OK, so I think this backport should be enough. I'd like to hear from Erno, who seems to have identified more than one patch to backport, and from Jon, who knows this driver better than I do.

@Edu: in the meantime, if the user could try the patch I posted and tell us whether that fixes their issue, it would be great!

(In reply to Cyril Roelandt from comment #35)
> OK, so I think this backport should be enough. I'd like to hear from Erno,
> who seems to have identified more than one patch to backport, and from Jon,
> who knows this driver better than I do.
>
> @Edu: in the meantime, if the user could try the patch I posted and tell us
> whether that fixes their issue, it would be great!

Yes, if you deliver an rpm, I could ask them to try the test package.

(In reply to Cyril Roelandt from comment #35)
> OK, so I think this backport should be enough. I'd like to hear from Erno,
> who seems to have identified more than one patch to backport, and from Jon,
> who knows this driver better than I do.
>
> @Edu: in the meantime, if the user could try the patch I posted and tell us
> whether that fixes their issue, it would be great!

Hi Cyril,

I've tested rpm python-glance-store-0.9.1-4.el7ost.src.rpm on the customer's environment; however, it seems the installation isn't finishing properly. This is what I executed:

# rpm -ivh python-glance-store-0.9.1-4.el7ost.src.rpm
Updating / installing...
1:python-glance-store-0.9.1-4.el7os################################# [100%]
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
# rpm -qa | grep python-glance-store
python-glance-store-0.9.1-3.el7ost.noarch

Then I created user cyril to avoid the warning, but the package is still not installed:

# rpm -ivh python-glance-store-0.9.1-4.el7ost.src.rpm
Updating / installing...
1:python-glance-store-0.9.1-4.el7os################################# [100%]
# rpm -qa | grep python-glance-store
python-glance-store-0.9.1-3.el7ost.noarch

Created attachment 1326002
Hopefully working package.
OK, don't create users/groups, this should not be required. I gave you the source package, my bad.
This patch should work :)
Hi Cyril,

I tried to install the new version but the package isn't being recognized as an update.

$ rpm -ivh python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm
Preparing... ################################# [100%]
package python-glance-store-0.9.1-3.el7ost.noarch (which is newer than python-glance-store-0.9.1-rhbz1485403.el7ost.noarch) is already installed
file /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py from install of python-glance-store-0.9.1-rhbz1485403.el7ost.noarch conflicts with file from package python-glance-store-0.9.1-3.el7ost.noarch
file /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.pyc from install of python-glance-store-0.9.1-rhbz1485403.el7ost.noarch conflicts with file from package python-glance-store-0.9.1-3.el7ost.noarch
file /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.pyo from install of python-glance-store-0.9.1-rhbz1485403.el7ost.noarch conflicts with file from package python-glance-store-0.9.1-3.el7ost.noarch
...
...
...

I also tried with localinstall but got the same result:

$ yum localinstall python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm
Loaded plugins: package_upload, product-id, search-disabled-repos, subscription-manager
Examining python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm: python-glance-store-0.9.1-rhbz1485403.el7ost.noarch
python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm: does not update installed package.
Nothing to do

I can't remove version 0.9.1-3 in this environment since both openstack-glance and python-glance depend on it. How can I install it?

Hm, maybe it's my weird suffix that prevents it from being recognized as an upgrade. I'm not sure, since I'm really not an RPM expert. @Jon: any idea?

Hi,

The older package is:
- python-glance-store-0.9.1-3.el7ost.noarch

The newer package is:
- python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm

The release of one starts with "-3" and the other with "-r", so the new package is not taken as an update.
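The release comparison described above can be illustrated with a simplified sketch of RPM-style segment comparison. It is an approximation written for this thread, not RPM's actual implementation: the function name `release_cmp` is hypothetical, and special cases such as tilde or caret handling are deliberately left out. The one rule that matters here is that a numeric segment is treated as newer than an alphabetic one.

```python
import re


def release_cmp(a, b):
    """Compare two release strings segment by segment (simplified).

    Returns 1 if a is newer, -1 if b is newer, 0 if they compare equal.
    """
    seg_a = re.findall(r"[0-9]+|[A-Za-z]+", a)
    seg_b = re.findall(r"[0-9]+|[A-Za-z]+", b)
    for x, y in zip(seg_a, seg_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            # A numeric segment is considered newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with more segments is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))
```

Under these assumptions, `release_cmp("3.el7ost", "rhbz1485403.el7ost")` returns 1, which matches rpm's "which is newer than" message in the output above, while a conventional bump such as `4.el7ost` compares as newer than `3.el7ost`.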
Force it instead with:

rpm -ivh --force $package

Regards,
Pablo

Tested package python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm after installing it with the --force option. It's working fine. It doesn't leave garbage in the Ceph cluster.

Verified on:
python-glance-store-0.9.1-4.el7ost.noarch
Deploy a system with Ceph (external or internal) as Glance's backend.
In glance-api.conf, set:
image_size_cap=524288000 (500 MiB, or smaller)
While in this file, verify the rbd config:
stores=glance.store.http.Store,glance.store.rbd.Store
default_store=rbd
rbd_store_pool=images -> you'll need this later below
Restart Glance services
Upload an image smaller than the limit, in my case a tiny cirros.
#openstack image create --disk-format qcow2 --container-format bare --file cirros-0.3.5-x86_64-disk.img cirros
Log on to one of the Ceph nodes and list the pool (images is the pool name, i.e. rbd_store_pool):
#rbd -p images ls -l
[root@ceph-0 ~]# rbd -p images ls -l
NAME SIZE PARENT FMT PROT LOCK
007c3e4c-e584-48e6-a7fb-fcb07b663168 12957k 2
007c3e4c-e584-48e6-a7fb-fcb07b663168@snap 12957k 2 yes
Now try to upload an image larger than 524288000 bytes:
$ openstack image create --disk-format qcow2 --container-format bare --file CentOS-7-x86_64-GenericCloud.qcow2 CentOS
/usr/lib/python2.7/site-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
from gi.repository import GnomeKeyring
413 Request Entity Too Large: Image exceeds the storage quota: The size of the data None will exceed the limit. None bytes remaining. (HTTP 413)
Great, the upload failed as expected. Now let's catch Ceph in the act:
#watch -d -n 2 rbd -p images ls -l
Every 2.0s: rbd -p images ls -l Mon Oct 2 06:34:03 2017
NAME SIZE PARENT FMT PROT LOCK
007c3e4c-e584-48e6-a7fb-fcb07b663168 12957k 2
007c3e4c-e584-48e6-a7fb-fcb07b663168@snap 12957k 2 yes
3d11526b-b725-47ff-b4c0-f3c489c3e8e8 24576k 2
Notice the third line: it only shows momentarily, as long as the upload remains below the limit. Once the limit is reached, the Glance upload stops/fails and the data is deleted from Ceph.
Below is the rbd listing after the upload failed. Only the two lines of the Cirros image remain. Verified.
[root@ceph-0 ~]# rbd -p images ls -l
NAME SIZE PARENT FMT PROT LOCK
007c3e4c-e584-48e6-a7fb-fcb07b663168 12957k 2
007c3e4c-e584-48e6-a7fb-fcb07b663168@snap 12957k 2 yes
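
The manual check above (compare the pool listing before and after the failed upload) could be scripted along these lines. This is a minimal sketch that assumes the text of `rbd -p images ls -l` has already been captured as a string, without the `watch` header line; the helper names `rbd_image_names` and `leftover_images` are hypothetical.

```python
def rbd_image_names(ls_output):
    """Return image names (snapshot rows folded in) from `rbd ls -l` text."""
    names = set()
    for line in ls_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        # "uuid@snap" rows belong to the image "uuid".
        names.add(fields[0].split("@", 1)[0])
    return names


def leftover_images(before, after):
    """Images present in the later listing that were not there before."""
    return rbd_image_names(after) - rbd_image_names(before)
```

An empty result from `leftover_images(before, after)` corresponds to the verified state shown above, where only the Cirros image and its snapshot remain in the pool.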
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:3067