Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1244852

Summary:	openstack-nova-api metadata service is surprisingly slow with memcached
Product:	Red Hat OpenStack
Component:	openstack-nova
Version:	Director
Target Release:	8.0 (Liberty)
Status:	CLOSED CURRENTRELEASE
Severity:	unspecified
Priority:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Keywords:	ZStream
Type:	Bug
Doc Type:	Bug Fix
Reporter:	Attila Fazekas <afazekas>
Assignee:	Sven Anderson <svanders>
QA Contact:	nlevinki <nlevinki>
CC:	berrange, dasmith, eglynn, kchamart, sbauza, sferdjao, sgordon, srevivo, svanders, vromanso
Last Closed:	2016-05-10 14:53:13 UTC

Description Attila Fazekas 2015-07-20 15:21:29 UTC

Description of problem:
The Nova metadata service can cache instance metadata entries either locally in each nova-api process or in a centralised memcached server.

The local cache is expected to be a little faster than the external memcached service, but not 10 times faster.


Version-Release number of selected component (if applicable):
openstack-ironic-common-2015.1.0-9.el7ost.noarch
openstack-ironic-api-2015.1.0-9.el7ost.noarch
openstack-ironic-discoverd-1.1.0-5.el7ost.noarch
python-ironic-discoverd-1.1.0-5.el7ost.noarch
openstack-ironic-conductor-2015.1.0-9.el7ost.noarch
python-ironicclient-0.5.1-9.el7ost.noarch
openstack-nova-compute-2015.1.0-15.el7ost.noarch
openstack-nova-scheduler-2015.1.0-15.el7ost.noarch
openstack-nova-cert-2015.1.0-15.el7ost.noarch
openstack-nova-conductor-2015.1.0-15.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch
openstack-nova-common-2015.1.0-15.el7ost.noarch
openstack-nova-console-2015.1.0-15.el7ost.noarch
python-nova-2015.1.0-15.el7ost.noarch
openstack-nova-api-2015.1.0-15.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-15.el7ost.noarch
python-pymemcache-1.2.5-2.el7ost.noarch
python-memcached-1.48-4.el7.noarch
memcached-1.4.15-9.el7.x86_64


How reproducible:
always

Steps to Reproduce:
Compare the ab results with the [DEFAULT] memcached_servers=127.0.0.1:11211 option set in the nova configuration and without this option.


Actual results:
####################
## WITH memcached ##
####################
[heat-admin@overcloud-compute-0 ~]$ ab -c 10 -n 1000 http://169.254.169.254/latest/meta-data/local-hostname
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 169.254.169.254 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        169.254.169.254
Server Port:            80

Document Path:          /latest/meta-data/local-hostname
Document Length:        19 bytes

Concurrency Level:      10
Time taken for tests:   23.814 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      155000 bytes
HTML transferred:       19000 bytes
Requests per second:    41.99 [#/sec] (mean)
Time per request:       238.137 [ms] (mean)
Time per request:       23.814 [ms] (mean, across all concurrent requests)
Transfer rate:          6.36 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    57  236 225.0    183    2262
Waiting:       57  236 224.9    183    2262
Total:         57  236 225.0    183    2263

#######################
## Without memcached ##
#######################
[heat-admin@overcloud-compute-0 ~]$ ab -c 10 -n 1000 http://169.254.169.254/latest/meta-data/local-hostname
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 169.254.169.254 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        169.254.169.254
Server Port:            80

Document Path:          /latest/meta-data/local-hostname
Document Length:        19 bytes

Concurrency Level:      10
Time taken for tests:   0.611 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      155000 bytes
HTML transferred:       19000 bytes
Requests per second:    1636.23 [#/sec] (mean)
Time per request:       6.112 [ms] (mean)
Time per request:       0.611 [ms] (mean, across all concurrent requests)
Transfer rate:          247.67 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     2    6   4.1      5      61
Waiting:        1    6   4.1      5      61
Total:          2    6   4.1      5      61

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      6
  75%      7
  80%      8
  90%     10
  95%     11
  98%     13
  99%     18
 100%     61 (longest request)


Expected results:
On a 4-core VM, throughput should be above 200 req/sec even with memcached.
Both the undercloud and the compute VM had 4 cores.
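
To put the two runs side by side, the headline ab figures can be recomputed from the "Time taken for tests" values alone. This is only a rough sketch: the durations are copied from the output above, the helper name ab_summary is made up for illustration, and small differences from the reported rates come from ab using the unrounded duration.

```python
# Figures copied from the two ab runs in this report.
CONCURRENCY = 10
REQUESTS = 1000

runs = {
    "with memcached": 23.814,     # "Time taken for tests", in seconds
    "without memcached": 0.611,
}


def ab_summary(total_seconds, requests=REQUESTS, concurrency=CONCURRENCY):
    """Recompute requests/sec and mean time per request from the duration."""
    rps = requests / total_seconds
    # ab's first "Time per request" line is scaled by the concurrency level.
    ms_per_request = concurrency * total_seconds * 1000 / requests
    return rps, ms_per_request


summaries = {name: ab_summary(seconds) for name, seconds in runs.items()}
speedup = summaries["without memcached"][0] / summaries["with memcached"][0]

for name, (rps, ms) in summaries.items():
    print(f"{name}: {rps:.2f} req/s, {ms:.3f} ms per request")
print(f"local cache is roughly {speedup:.0f}x faster")
```

The ratio comes out at roughly 39, well beyond the "10 times" mentioned in the description, and the memcached run stays far below the 200 req/sec expected here.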


Additional info:
If you try something similar in a non-Ironic VM, it is recommended to replay only the request that reaches nova-api; otherwise you might be limited by the per-network Neutron metadata proxy service.

For example:
 ab -c 10 -n 1000 -H 'Host: 172.16.40.33:8775' -H 'Content-Length: 0' -H 'accept-encoding: gzip, deflate' -H 'x-instance-id: ba29c5dd-3814-4407-80f1-3e443fe9bae6' -H 'x-forwarded-for: 10.1.0.5' -H 'x-instance-id-signature: 25a27df9e0d7c2ded0025e4dc761528b69b04dabe1e940bab1e559169af65ece' -H 'user-agent: Python-httplib2/0.9.1 (gzip)' -H 'x-tenant-id: 626d9a79239a4e8fa8a743c56ce1ae82' http://localhost:8775/latest/meta-data/local-hostname
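
When replaying such a request for a different instance, the x-instance-id-signature header has to match the instance ID. As far as I know, the Neutron metadata proxy computes it as an HMAC-SHA256 of the instance ID, keyed with the shared secret configured on both sides (the metadata_proxy_shared_secret option). The sketch below assumes that scheme; the secret is a placeholder, so the output will not match the signature in the command above.

```python
import hashlib
import hmac


def sign_instance_id(instance_id: str, shared_secret: str) -> str:
    """Return the hex HMAC-SHA256 of instance_id keyed with shared_secret."""
    return hmac.new(
        shared_secret.encode("utf-8"),
        instance_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


# Instance ID taken from the ab example; the secret is a hypothetical value
# and must be replaced with the one from your own deployment.
INSTANCE_ID = "ba29c5dd-3814-4407-80f1-3e443fe9bae6"
SHARED_SECRET = "replace-with-your-shared-secret"

signature = sign_instance_id(INSTANCE_ID, SHARED_SECRET)
print(f"-H 'x-instance-id: {INSTANCE_ID}' "
      f"-H 'x-instance-id-signature: {signature}'")
```

The printed header pair can be pasted into the ab command line in place of the original x-instance-id and x-instance-id-signature values.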

Comment 3 Sven Anderson 2016-02-24 17:54:57 UTC

The problem is that the cache is populated with metadata while lazy-loaded fields of the instance object are still missing. In this case the "flavor" data has not been loaded yet, so it is not cached and is consequently queried from the database on every request. Triggering the lazy load before caching increases the performance by a factor of 25.

I will investigate how this can be solved properly.
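
The effect described in this comment can be illustrated with a small simulation. This is not nova code: the Instance class, load_flavor_from_db and the counter are all hypothetical stand-ins, and copy.deepcopy is used to mimic the fact that every request gets its own deserialised copy of the cache entry from memcached.

```python
import copy

flavor_queries = 0  # counts simulated database round trips


def load_flavor_from_db(instance_id):
    """Hypothetical database lookup; only counts how often it is called."""
    global flavor_queries
    flavor_queries += 1
    return {"name": "m1.small"}


class Instance:
    """Hypothetical object with a lazily loaded 'flavor' field."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self._flavor = None  # not loaded yet

    @property
    def flavor(self):
        if self._flavor is None:  # lazy load on first access
            self._flavor = load_flavor_from_db(self.instance_id)
        return self._flavor


def queries_for(preload, requests=100):
    """Return the number of DB queries needed to serve `requests` requests."""
    global flavor_queries
    flavor_queries = 0
    instance = Instance("ba29c5dd-3814-4407-80f1-3e443fe9bae6")
    if preload:
        instance.flavor  # trigger the lazy load before caching
    cached = copy.deepcopy(instance)  # snapshot stored in the cache
    for _ in range(requests):
        # Each request works on a fresh copy of the cached snapshot.
        copy.deepcopy(cached).flavor
    return flavor_queries


print("cached before lazy load:", queries_for(preload=False), "queries")
print("cached after lazy load: ", queries_for(preload=True), "queries")
```

Without the preload, the snapshot never contains the flavor, so each of the 100 simulated requests hits the database; with the preload, only the single initial query is needed.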

Comment 4 Sven Anderson 2016-02-25 16:32:20 UTC

Related RHOS5 issue: https://bugzilla.redhat.com/show_bug.cgi?id=1302413

Comment 5 Sven Anderson 2016-02-25 16:34:27 UTC

The metadata caching is flawed in general: https://bugs.launchpad.net/nova/+bug/1549814

There are fixes upstream that pre-fetch some of the data before caching, here:

https://github.com/openstack/nova/commit/3a761270581d1ac61a3b4669c130d211f1ad5a17#diff-969229657f01b56c336e01497df732d7R1226

and here:

https://github.com/openstack/nova/commit/cc41015d463e11ac11bbaaac0b5c441329dc5f0b#diff-567f52edc17aff6c473d69c341a4cb0cR513

The second alone would fix this issue, since it pre-fetches the flavor data.

Unfortunately the second change introduced the pre-fetch as a side-effect, so it cannot be backported as is.

Comment 6 Sven Anderson 2016-02-29 11:24:23 UTC

I have submitted two upstream changes that are related to this.

Disabling memcached for metadata caching:
https://review.openstack.org/#/c/285530

No parallel queries of the same data (this addresses point 2 in the description):
https://review.openstack.org/#/c/285562
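
The general idea of avoiding parallel queries for the same data can be sketched with a per-key lock: concurrent cache misses for one key wait for a single loader call instead of each querying the backend. This is an illustration only and not the content of the upstream change; the SingleFlightCache class and slow_loader function are hypothetical, and plain threads are used here for simplicity.

```python
import threading
import time
from collections import defaultdict


class SingleFlightCache:
    """In-memory cache that runs the loader at most once per key."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._key_locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects _key_locks

    def get(self, key):
        if key in self._values:  # fast path, no locking needed
            return self._values[key]
        with self._guard:
            key_lock = self._key_locks[key]
        with key_lock:
            if key not in self._values:  # re-check while holding the lock
                self._values[key] = self._loader(key)
            return self._values[key]


load_calls = 0


def slow_loader(key):
    """Hypothetical expensive lookup, e.g. a database query."""
    global load_calls
    load_calls += 1  # only ever runs while the per-key lock is held
    time.sleep(0.05)
    return f"metadata for {key}"


cache = SingleFlightCache(slow_loader)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("10.1.0.5")))
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("loader calls:", load_calls)
```

All ten threads receive the same value while the loader runs only once; without the per-key lock, each thread that misses the cache at the same time would issue its own query.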

Comment 8 Sven Anderson 2016-05-10 14:53:13 UTC

Because of the upstream fixes, there is no longer a performance issue in Liberty/OSP8.