Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1485403

Summary: Rbd chunks still written if glance image_size_cap is hit
Product: Red Hat OpenStack
Reporter: stphilli
Component: python-glance-store
Assignee: Cyril Roelandt <cyril>
Status: CLOSED ERRATA
QA Contact: Mike Abrams <mabrams>
Severity: high
Docs Contact:
Priority: high
Version: 8.0 (Liberty)
CC: apevec, cyril, ealcaniz, eglynn, fpercoco, hklein, jjoyce, jobernar, lhh, maugarci, mlopes, mschuppe, pablo.iranzo, pdeore, pgrist, scohen, shan, sputhenp, srevivo, tshefi
Target Milestone: zstream
Keywords: Triaged, ZStream
Target Release: 8.0 (Liberty)
Flags: tshefi: automate_bug-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-glance-store-0.9.1-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-25 17:06:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:
  controller issue (flags: none)
  Hopefully working package. (flags: none)

Description stphilli 2017-08-25 15:16:12 UTC
Description of problem:
RBD chunks are written and are not removed before Glance reports: "Image exceeds the storage quota: The size of the data None will exceed the limit. None bytes remaining."

Version-Release number of selected component (if applicable):
openstack-glance-11.0.1-4.el7ost.noarch

How reproducible:
easily

Steps to Reproduce:
1. Set up Glance with an RBD backend.
2. Set image_size_cap in glance-api.conf to 1 GB (the option takes a byte count, e.g. image_size_cap=1073741824).
3. Attempt to upload a 2 GB image to Glance and let it error out.

Actual results:
Chunks are written to Ceph RBD before Glance returns the quota error, and they are not removed afterwards.

Expected results:
If the Glance image upload fails, the partially written chunks should be removed.

Comment 3 Edu Alcaniz 2017-08-28 09:59:52 UTC
Issue reproduced:

date;openstack image create test_082517_1519_max_upload --file /etc/glance/fake_image2G.img;date
Fri Aug 25 15:19:32 CEST 2017
413 Request Entity Too Large: Image exceeds the storage quota: The size of the data None will exceed the limit. None bytes remaining. (HTTP 413)
Fri Aug 25 15:19:56 CEST 2017

Comment 4 Edu Alcaniz 2017-08-28 10:00:16 UTC
Created attachment 1319039 [details]
controller issue

Comment 5 Edu Alcaniz 2017-08-28 10:30:22 UTC
There is no quota configured

[ealcaniz@ealcaniz glance]$  grep quota glance-api.conf 
#image_member_quota=128
#image_property_quota=128
#image_tag_quota=128
#image_location_quota=10
# Set a system wide quota for every user. This value is the total
user_storage_quota=0


There are no quotas on the pool:

# ceph osd pool get-quota images
quotas for pool 'images':
  max objects: N/A
  max bytes  : N/A

Comment 7 Edu Alcaniz 2017-08-29 05:46:25 UTC
Yes, the log included is from that attempt. Glance only complains about the quota, but it writes the image to the Ceph pool instead of rolling back or not writing at all.

Comment 8 Cyril Roelandt 2017-08-29 16:29:38 UTC
(In reply to Edu Alcaniz from comment #7)
> Yes, the log included is from that attempt. Glance only complains about
> the quota, but it writes the image to the Ceph pool instead of rolling
> back or not writing at all.

Here, Glance has to start writing the image, because it does not know its size (you can see in the error message that the image size is 'None'). At some point, it has written so much data that it has to roll back, as you wrote.

The RBD driver in glance_store tries to remove the image from the Ceph cluster, but for some reason, this fails. Could we check the Ceph logs and see what happened there? I wonder whether the deletion request never reached Ceph, or whether it reached it but could not complete.
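
For context, the write-then-rollback pattern works roughly like this (a simplified sketch, not the exact glance_store driver code; the store object and the StorageQuotaFull name here are illustrative):

    class StorageQuotaFull(Exception):
        pass

    def add(image_id, image_file, chunk_size, store, limit):
        """Stream an image of unknown size, rolling back on failure."""
        bytes_written = 0
        try:
            for chunk in iter(lambda: image_file.read(chunk_size), b''):
                if bytes_written + len(chunk) > limit:
                    # With an unknown size, the cap can only be detected
                    # after some data has already been streamed in.
                    raise StorageQuotaFull()
                store.write(image_id, chunk, offset=bytes_written)
                bytes_written += len(chunk)
        except Exception:
            # Rollback: delete the chunks already written, then re-raise.
            # This bug is about this cleanup apparently not happening.
            store.delete(image_id)
            raise
        return bytes_written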

Comment 10 Cyril Roelandt 2017-08-30 15:09:46 UTC
I might be mistaken, but I can't see any actual Ceph logs in there. This is what I'd really like to read :(

Comment 12 Edu Alcaniz 2017-08-30 15:15:07 UTC
Hi, what log do you need?

Comment 13 Cyril Roelandt 2017-08-30 15:41:22 UTC
I assume Ceph has logs (probably in /var/log/ceph/). They may not be enabled by default, though.

I'd like to see whether Ceph logs a "please delete this image" request, sent from Glance, and whether it complains about anything.

Comment 17 Edu Alcaniz 2017-08-31 12:04:17 UTC
Attaching log files from the Ceph nodes, but we did not see any record of Ceph actions in them.

Comment 18 Cyril Roelandt 2017-08-31 20:33:20 UTC
I am really not a Ceph expert, but isn't Ceph supposed to log this kind of thing, like image creation, deletion, etc.?

Is this bug something "new"? Has it only been happening since an upgrade? Do you witness the same issue on other setups?

Comment 19 Edu Alcaniz 2017-09-01 05:57:46 UTC
Sorry, it seems to be a new bug; no upgrade or other change was performed.
The issue is happening in other regions too, not only in the lab where we are reproducing it.

Comment 31 Edu Alcaniz 2017-09-12 10:27:12 UTC
After modifying rbd.py to add LOG.error('TEST calling remove') at line 367, I executed the test again. However, Glance's api.log does not print the message.

I'm attaching the glance api.log. The image used has this id: 0788c9d0-6efc-4239-a733-593ce5a4c4f2
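
For reference, the instrumentation is a single log call in the driver's delete path, roughly like this (only the LOG.error line was actually added; the surrounding function is a sketch of where such a call sits in /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py):

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def _delete_image(self, image_name, snapshot_name=None):
        LOG.error('TEST calling remove')  # temporary instrumentation
        # ... existing removal logic follows ...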

Comment 33 Jon Bernard 2017-09-12 13:27:47 UTC
Okay, this is helpful information and suggests the issue likely lies within Glance's logic and not in a lower layer. Erno identified a few patches in successive versions that may address this issue and are not currently backported.

Comment 35 Cyril Roelandt 2017-09-13 14:13:40 UTC
OK, so I think this backport should be enough. I'd like to hear from Erno, who seems to have identified more than one patch to backport, and from Jon, who knows this driver better than I do.

@Edu: in the meantime, if the user could try the patch I posted and tell us whether that fixes their issue, it would be great!

Comment 36 Edu Alcaniz 2017-09-13 14:15:50 UTC
(In reply to Cyril Roelandt from comment #35)
> OK, so I think this backport should be enough. I'd like to hear from Erno,
> who seems to have identified more than one patch to backport, and from Jon,
> who knows this driver better than I do.
> 
> @Edu: in the meantime, if the user could try the patch I posted and tell us
> whether that fixes their issue, it would be great!

Yes, if you deliver an RPM, I can ask the customer to try the test package.

Comment 38 maugarci 2017-09-14 09:49:24 UTC
(In reply to Cyril Roelandt from comment #35)
> OK, so I think this backport should be enough. I'd like to hear from Erno,
> who seems to have identified more than one patch to backport, and from Jon,
> who knows this driver better than I do.
> 
> @Edu: in the meantime, if the user could try the patch I posted and tell us
> whether that fixes their issue, it would be great!

Hi Cyril,

I tested the RPM python-glance-store-0.9.1-4.el7ost.src.rpm in the customer's environment; however, the installation does not seem to complete properly.

Here is what I ran:

# rpm -ivh python-glance-store-0.9.1-4.el7ost.src.rpm 
Updating / installing...
   1:python-glance-store-0.9.1-4.el7os################################# [100%]
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root
warning: user cyril does not exist - using root
warning: group cyril does not exist - using root

# rpm -qa | grep python-glance-store
python-glance-store-0.9.1-3.el7ost.noarch

Then I created the user cyril to avoid the warnings, but the package is still not installed:

# rpm -ivh python-glance-store-0.9.1-4.el7ost.src.rpm
Updating / installing...
   1:python-glance-store-0.9.1-4.el7os################################# [100%]

# rpm -qa | grep python-glance-store
python-glance-store-0.9.1-3.el7ost.noarch

Comment 39 Cyril Roelandt 2017-09-14 13:11:04 UTC
Created attachment 1326002 [details]
Hopefully working package.

OK, don't create users/groups; that should not be required. My bad: I gave you the source package, and installing a .src.rpm only unpacks the sources into the rpmbuild tree, which is why rpm -qa still shows the old binary package.

This package should work :)

Comment 40 maugarci 2017-09-14 13:58:08 UTC
Hi Cyril,

I tried to install the new version, but the package isn't being recognized as an update.

$ rpm -ivh python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm 
Preparing...                          ################################# [100%]
	package python-glance-store-0.9.1-3.el7ost.noarch (which is newer than python-glance-store-0.9.1-rhbz1485403.el7ost.noarch) is already installed
	file /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py from install of python-glance-store-0.9.1-rhbz1485403.el7ost.noarch conflicts with file from package python-glance-store-0.9.1-3.el7ost.noarch
	file /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.pyc from install of python-glance-store-0.9.1-rhbz1485403.el7ost.noarch conflicts with file from package python-glance-store-0.9.1-3.el7ost.noarch
	file /usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.pyo from install of python-glance-store-0.9.1-rhbz1485403.el7ost.noarch conflicts with file from package python-glance-store-0.9.1-3.el7ost.noarch
...
...
...


I also tried yum localinstall, but got the same result:

$ yum localinstall python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm 
Loaded plugins: package_upload, product-id, search-disabled-repos, subscription-manager
Examining python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm: python-glance-store-0.9.1-rhbz1485403.el7ost.noarch
python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm: does not update installed package.
Nothing to do

I can't remove version 0.9.1-3 in this environment, since both openstack-glance and python-glance depend on it.

How can I install it?

Comment 41 Cyril Roelandt 2017-09-14 15:57:52 UTC
Hm, maybe it's my weird suffix that prevents it from being recognized as an upgrade. I'm not sure, since I'm really not an RPM expert. 

@Jon: any idea?

Comment 42 Pablo Iranzo Gómez 2017-09-15 13:16:01 UTC
Hi
The older package is:

- python-glance-store-0.9.1-3.el7ost.noarch

The newer package is:

- python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm

Because one release string is "3" and the other is "rhbz1485403", RPM's version comparison considers the installed package newer, so the new build is not treated as an update.

Force the installation with:

rpm -ivh --force $package
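
The comparison can be reproduced with the rpm Python bindings (a quick sketch; it assumes the rpm-python package is available). rpmvercmp treats a numeric release segment such as "3" as newer than an alphabetic one such as "rhbz", which is why the installed build wins:

    # labelCompare() takes two (epoch, version, release) tuples and
    # returns 1 if the first is newer, -1 if older, 0 if equal.
    import rpm

    installed = ('0', '0.9.1', '3.el7ost')
    candidate = ('0', '0.9.1', 'rhbz1485403.el7ost')

    print(rpm.labelCompare(installed, candidate))
    # -> 1: the numeric segment "3" sorts above the alphabetic "rhbz",
    #    so the installed package is considered newer and no upgrade happens.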

Regards,
Pablo

Comment 43 maugarci 2017-09-18 09:19:42 UTC
Tested the package python-glance-store-0.9.1-rhbz1485403.el7ost.noarch.rpm after installing it with the --force option.

It works fine. It does not leave garbage in the Ceph cluster.

Comment 49 Tzach Shefi 2017-10-02 07:37:22 UTC
Verified on:
python-glance-store-0.9.1-4.el7ost.noarch

Deploy a system with Ceph (external or internal) as Glance's backend.

In glance-api.conf, set:
image_size_cap=524288000  (or smaller)

While in this file, verify the RBD configuration:
stores=glance.store.http.Store,glance.store.rbd.Store
default_store=rbd
rbd_store_pool=images  -> you'll need this pool name below

Restart the Glance services.

Upload a smaller-than-limit image, in my case a tiny Cirros:

#openstack image create --disk-format qcow2 --container-format bare --file cirros-0.3.5-x86_64-disk.img cirros


Log on to one of the Ceph nodes:
# rbd -p images ls -l    (images is the pool name, from rbd_store_pool)

[root@ceph-0 ~]# rbd -p images ls -l
NAME                                        SIZE PARENT FMT PROT LOCK 
007c3e4c-e584-48e6-a7fb-fcb07b663168      12957k          2           
007c3e4c-e584-48e6-a7fb-fcb07b663168@snap 12957k          2 yes 


Now try to upload an image larger than 524288000 bytes:

$ openstack image create --disk-format qcow2 --container-format bare --file CentOS-7-x86_64-GenericCloud.qcow2 CentOS
/usr/lib/python2.7/site-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring
413 Request Entity Too Large: Image exceeds the storage quota: The size of the data None will exceed the limit. None bytes remaining. (HTTP 413)

Great, the upload failed as expected. Now let's catch Ceph in the act:
# watch -d -n 2 rbd -p images ls -l

Every 2.0s: rbd -p images ls -l                                    Mon Oct  2 06:34:03 2017

NAME                                        SIZE PARENT FMT PROT LOCK
007c3e4c-e584-48e6-a7fb-fcb07b663168      12957k          2
007c3e4c-e584-48e6-a7fb-fcb07b663168@snap 12957k          2 yes
3d11526b-b725-47ff-b4c0-f3c489c3e8e8      24576k          2

Notice the third line: it only shows momentarily, while the upload remains below the limit. Once the limit is reached, the Glance upload stops/fails and that image's data is deleted from Ceph.

Below is the Ceph listing after the upload failed; only the two lines for the Cirros image remain. Verified.

[root@ceph-0 ~]# rbd -p images ls -l
NAME                                        SIZE PARENT FMT PROT LOCK
007c3e4c-e584-48e6-a7fb-fcb07b663168      12957k          2
007c3e4c-e584-48e6-a7fb-fcb07b663168@snap 12957k          2 yes
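
For completeness, the same check can be scripted end to end; here is a minimal sketch (Python 3; the pool name, image file, and image name are taken from the manual steps above):

    import subprocess
    import time

    POOL = 'images'

    def rbd_listing():
        # Pool contents, as shown by "rbd -p images ls -l" above.
        return subprocess.run(['rbd', '-p', POOL, 'ls', '-l'],
                              capture_output=True, text=True).stdout

    baseline = rbd_listing()

    # The oversized upload must fail (HTTP 413 -> non-zero exit code).
    upload = subprocess.run(['openstack', 'image', 'create',
                             '--disk-format', 'qcow2',
                             '--container-format', 'bare',
                             '--file', 'CentOS-7-x86_64-GenericCloud.qcow2',
                             'CentOS'])
    assert upload.returncode != 0, 'oversized upload unexpectedly succeeded'

    time.sleep(10)  # give Glance a moment to finish deleting the chunks

    # With the fix, no orphaned objects remain: the listing matches baseline.
    assert rbd_listing() == baseline, 'leftover RBD objects in the pool'
    print('verified: no garbage left in the pool')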

Comment 52 errata-xmlrpc 2017-10-25 17:06:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3067