
Bug 858372

Summary: glance-api unresponsive during long-lived I/O-bound operations
Product: Fedora
Component: openstack-glance
Version: 18
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Reporter: Russell Bryant <rbryant>
Assignee: Eoghan Glynn <eglynn>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: akscram, alexander.sakhnov, apevec, asalkeld, bfilippov, breu, eglynn, Jan.van.Eldik, jonathansteffan, jose.castro.leon, markmc, matt_domsch, mlvov, nsantos, p, rbryant, rkukura
Last Closed: 2013-01-29 04:31:39 EST

Description Russell Bryant 2012-09-18 15:38:39 EDT
Description of problem:

While running the following test case, I ran into a problem:

https://fedoraproject.org/wiki/QA:Testcase_register_images_with_OpenStack

I started by adding an image using copy_from.  This command returned promptly.  After that, any subsequent glance API action blocks for a long time.  After a while, it recovers and resumes normal behavior (perhaps once the image download completes?)

See the following commands for an example session:

[rbryant@f17-openstack-test-day ~]$ time glance add name=f16 is_public=true disk_format=qcow2 container_format=bare copy_from=http://berrange.fedorapeople.org/images/2012-02-29/f16-x86_64-openstack-sda.qcow2
Added new image with ID: 9717656c-2564-4b15-812c-8706ca038d2c

real	0m0.715s
user	0m0.117s
sys	0m0.033s
[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size          
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                      213581824

real	1m11.992s
user	0m0.109s
sys	0m0.022s
[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size          
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                      213581824

real	1m1.287s
user	0m0.124s
sys	0m0.061s

(repeat 'time glance index' 4 more times, with times varying from 1 to 1.5 minutes)

[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size          
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                      213581824

real	0m24.255s
user	0m0.125s
sys	0m0.034s
[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size          
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                      213581824

real	0m0.443s
user	0m0.110s
sys	0m0.017s

(all further instances return quickly like this)


Version-Release number of selected component (if applicable):

python-glance-2012.2-0.5.rc1.fc18.noarch
openstack-glance-2012.2-0.5.rc1.fc18.noarch
python-glanceclient-0.4.1-1.fc18.noarch


How reproducible:

After running all of the quoted commands, I can delete the image I added, start over, and get the same results.


Expected results:

I would expect all glance API calls to be processed in a timely manner, even while an image is being downloaded (or whatever is causing this problem).
Comment 1 Russell Bryant 2012-09-18 15:54:22 EDT
After talking to Eoghan Glynn, this seems to be a confirmed issue so I have reported it upstream here: https://bugs.launchpad.net/glance/+bug/1052640
Comment 2 Eoghan Glynn 2012-09-18 17:23:06 EDT
Recently the glance copy-from logic was made asynchronous, so that a 202 response code is returned immediately, and the download from the remote location proceeds on a greenthread.

This change was the obvious candidate for causing the blockage on the subsequent API calls.

However, it turns out that even with the copy-from logic reverted to its original synchronous form, or even with just a slow direct upload, we still see concurrent API calls being blocked.

So there's something more fundamental awry on the glance dispatch path.

Continuing to investigate ...
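The symptom described above is characteristic of blocking I/O inside a cooperative scheduler: glance-api serves requests on eventlet greenthreads, so a call that never yields (e.g. a synchronous read during the image download) stalls every other request in the same process. As an illustration only, the same effect can be sketched with the stdlib asyncio event loop standing in for eventlet; the function names here are hypothetical, not glance code:

```python
import asyncio
import time

async def blocking_download(duration):
    # Blocking I/O that never yields to the event loop,
    # like a synchronous socket read during a copy-from download.
    time.sleep(duration)

async def cooperative_download(duration):
    # The same wait done cooperatively: the scheduler is free
    # to run other tasks while this one sleeps.
    await asyncio.sleep(duration)

async def measure(download):
    # Start a long "download", then time a quick concurrent
    # request (the analogue of `glance index`).
    start = time.monotonic()
    task = asyncio.ensure_future(download(0.5))
    await asyncio.sleep(0)  # yield once, as a fast request would
    latency = time.monotonic() - start
    await task
    return latency

if __name__ == "__main__":
    print("blocking:    %.2fs" % asyncio.run(measure(blocking_download)))
    print("cooperative: %.2fs" % asyncio.run(measure(cooperative_download)))
```

With the blocking variant, the quick request cannot run until the whole download finishes, matching the minute-long `glance index` times shown in the description; with the cooperative variant it returns almost immediately.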
Comment 3 Eoghan Glynn 2012-09-19 07:13:54 EDT
Patch proposed to master upstream:

  https://review.openstack.org/13279
Comment 4 Pádraig Brady 2012-09-19 08:02:12 EDT
So this is only an issue when CONF.workers=0 (i.e. left at the default)?
The code comments suggest that default is best only for testing/profiling.
Should we be changing that default?
I suppose a higher-level installer script might configure it as well.
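For reference, the option under discussion is set in glance-api.conf. A sketch of the kind of change an installer might make (the value 1 is illustrative; the appropriate count depends on the host):

```ini
# /etc/glance/glance-api.conf
[DEFAULT]
# With workers = 0 (the current default) a single process handles
# every request, so one long-lived blocking operation can stall all
# other API calls. Forking one or more worker children avoids this.
workers = 1
```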
Comment 5 Eoghan Glynn 2012-09-20 10:15:23 EDT
Pádraig,

I proposed a change to the workers config option default value upstream also:

  https://review.openstack.org/13283

Needs a +1 to get it over the line.

Cheers,
Eoghan