Description of problem:

Nova API is very slow on an IPv6 environment with Nova control plane services running on a custom role: nova service-list takes almost 1 minute while nova list takes 20 seconds.

Note that this is a fresh empty overcloud installation where tempest is currently running. I wasn't able to reproduce these issues on OSP10 installed on the same environment.

[stack@undercloud-0 ~]$ time nova service-list 2> /dev/null
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| 29 | nova-conductor   | serviceapi-1.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 38 | nova-conductor   | serviceapi-2.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 68 | nova-compute     | compute-1.localdomain    | nova     | enabled | up    | 2017-03-28T15:14:36.000000 | -               |
| 71 | nova-scheduler   | serviceapi-1.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 74 | nova-scheduler   | serviceapi-2.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 77 | nova-compute     | compute-0.localdomain    | nova     | enabled | up    | 2017-03-28T15:14:31.000000 | -               |
| 80 | nova-consoleauth | serviceapi-2.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 83 | nova-consoleauth | serviceapi-1.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 86 | nova-conductor   | serviceapi-0.localdomain | internal | enabled | up    | 2017-03-28T15:14:38.000000 | -               |
| 95 | nova-scheduler   | serviceapi-0.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 98 | nova-consoleauth | serviceapi-0.localdomain | internal | enabled | up    | 2017-03-28T15:14:36.000000 | -               |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+

real    0m55.932s
user    0m0.790s
sys     0m0.110s

[stack@undercloud-0 ~]$ time nova list 2> /dev/null
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

real    0m20.051s
user    0m0.736s
sys     0m0.117s

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.5.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with IPv6 networking with 3 controllers, 3 ceph nodes, 2 computes and 3 custom roles running systemd managed services

Actual results:
Nova commands are very slow.

Expected results:
Usually nova list returns a result in ~1s.

Additional info:
Attaching the sosreports.
I'm asking for a blocker here since I don't think this can be considered a working environment even though the deployment has passed.
Deploy command:

#!/bin/bash
openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates/ \
  -r /home/stack/composable_roles/roles/roles_data.yaml \
  -e /home/stack/composable_roles/roles/nodes.yaml \
  -e /home/stack/composable_roles/roles/network-config.yaml \
  -e /home/stack/composable_roles/debug.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /home/stack/composable_roles/internal.yaml \
  -e /home/stack/composable_roles/network/network-environment-v6.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
  -e /home/stack/composable_roles/enable-tls.yaml \
  -e /home/stack/composable_roles/inject-trust-anchor.yaml \
  -e /home/stack/composable_roles/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
  -e /home/stack/composable_roles/debug.yaml \
  --log-file overcloud_deployment_27.log

roles_data.yaml: http://paste.openstack.org/show/604512/
Just for posterity, I see that a single request to show one server takes 9 seconds, yet the log lines we emit for the whole request span only ~400ms:

[heat-admin@serviceapi-2 ~]$ sudo grep req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 /var/log/nova/nova-api.log
2017-03-29 15:19:51.389 24574 DEBUG nova.api.openstack.wsgi [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] Calling method '<bound method ServersController.show of <nova.api.openstack.compute.servers.ServersController object at 0x73890d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:626
2017-03-29 15:19:51.393 24574 DEBUG nova.compute.api [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] [instance: bc7f1160-bfc0-4467-8902-d6989e2c52c6] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2349
2017-03-29 15:19:51.415 24574 DEBUG nova.policy [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] Policy check for os_compute_api:os-extended-server-attributes failed with credentials {'service_roles': [], 'user_id': u'af7ad370da914c4b85adac00265eb883', 'roles': [u'swiftoperator'], 'user_domain_id': u'default', 'service_project_id': None, 'service_user_id': None, 'service_user_domain_id': None, 'service_project_domain_id': None, 'is_admin_project': True, 'is_admin': False, 'project_id': u'0f29a52bc7384d25b6e02e6a8f360fc4', 'project_domain_id': u'default'} authorize /usr/lib/python2.7/site-packages/nova/policy.py:168
2017-03-29 15:19:51.711 24574 DEBUG neutronclient.v2_0.client [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] GET call to neutron for http://[fd00:fd00:fd00:2000::19]:9696/v2.0/ports.json?device_id=bc7f1160-bfc0-4467-8902-d6989e2c52c6 used request id req-ce6f8f3f-abab-4c08-8486-234588e41cc2 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128
2017-03-29 15:19:51.735 24574 INFO nova.osapi_compute.wsgi.server [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] fd00:fd00:fd00:2000::14 "GET /v2.1/servers/bc7f1160-bfc0-4467-8902-d6989e2c52c6 HTTP/1.1" status: 200 len: 1687 time: 9.4964290

That really makes me think that maybe this delay is coming from keystone middleware or something like that.
With keystone middleware debug enabled, I see that the connection is accepted, then a long (~10s) delay while auth'ing the token with keystone, followed by an immediate response:

2017-03-29 15:51:42.737 106427 DEBUG nova.osapi_compute.wsgi.server [req-6704d652-c50a-4960-ac6c-8a43a94c9533 - - - - -] (106427) accepted ('fd00:fd00:fd00:2000::14', 35420, 0, 0) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:867
2017-03-29 15:51:42.741 106427 DEBUG keystonemiddleware.auth_token [-] Authenticating user token process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:401
2017-03-29 15:51:51.753 106427 DEBUG keystoneauth.session [-] REQ: curl -g -i -X GET http://[fd00:fd00:fd00:2000::19]:5000 -H "Accept: application/json" -H "User-Agent: nova/15.0.1 keystonemiddleware.auth_token/4.14.0 keystoneauth1/2.18.0 python-requests/2.11.1 CPython/2.7.5" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:347
2017-03-29 15:51:51.778 106427 DEBUG keystoneauth.session [-] RESP: [300] Date: Wed, 29 Mar 2017 15:51:51 GMT Server: Apache Vary: X-Auth-Token Content-Length: 624 Content-Type: application/json RESP BODY: {"versions": {"values": [{"status": "stable", "updated": "2017-02-22T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.8", "links": [{"href": "http://[fd00:fd00:fd00:2000::19]:5000/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://[fd00:fd00:fd00:2000::19]:5000/v2.0/", "rel": "self"}, {"href": "https://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}} _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:395
2017-03-29 15:51:51.779 106427 DEBUG keystoneauth.identity.v2 [-] Making authentication request to http://[fd00:fd00:fd00:2000::19]:5000/v2.0/tokens get_auth_ref /usr/lib/python2.7/site-packages/keystoneauth1/identity/v2.py:63
2017-03-29 15:51:52.001 106427 DEBUG keystoneauth.session [-] REQ: curl -g -i -X GET http://192.168.24.12:35357/ -H "Accept: application/json" -H "User-Agent: nova/15.0.1 keystonemiddleware.auth_token/4.14.0 keystoneauth1/2.18.0 python-requests/2.11.1 CPython/2.7.5" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:347
2017-03-29 15:51:52.007 106427 DEBUG keystoneauth.session [-] RESP: [300] Date: Wed, 29 Mar 2017 15:51:52 GMT Server: Apache Vary: X-Auth-Token Content-Length: 602 Content-Type: application/json RESP BODY: {"versions": {"values": [{"status": "stable", "updated": "2017-02-22T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.8", "links": [{"href": "http://192.168.24.12:35357/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.168.24.12:35357/v2.0/", "rel": "self"}, {"href": "https://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}} _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:395
2017-03-29 15:51:52.008 106427 DEBUG keystonemiddleware.auth_token [-] Auth Token confirmed use of None apis _get_strategy_class /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_identity.py:192
2017-03-29 15:51:52.010 106427 DEBUG keystoneauth.session [-] REQ: curl -g -i -X GET http://192.168.24.12:35357/v3/auth/tokens -H "X-Subject-Token: {SHA1}4a82989a67f252a0b1499df1da8f95799a6c1bb7" -H "User-Agent: python-keystoneclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}4ac8008c14baf2a5f675a8211b83755c407163f4" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:347
2017-03-29 15:51:52.228 106427 DEBUG keystoneauth.session [-] RESP: [200] Date: Wed, 29 Mar 2017 15:51:52 GMT Server: Apache X-Subject-Token: {SHA1}4a82989a67f252a0b1499df1da8f95799a6c1bb7 Vary: X-Auth-Token,Accept-Encoding x-openstack-request-id: req-3ffa52b8-e427-401b-9cbd-499f2cd76fe9 Content-Encoding: gzip Content-Length: 2386 Content-Type: application/json RESP BODY: {"token": {"is_domain": false, "methods": ["password"], "roles": [{"id": "7ccb94cab31947ab8aaff6a58beb5028", "name": "admin"}], "expires_at": "2017-03-29T16:51:42.000000Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "ad3c065f889d435d844858b4ffda1290", "name": "admin"}, "catalog": "<removed>", "user": {"domain": {"id": "default", "name": "Default"}, "password_expires_at": null, "name": "admin", "id": "30ed2ad8ec1f45619fdca42abf077c16"}, "audit_ids": ["oLTvXzhxQ9-B8qekdeki1A"], "issued_at": "2017-03-29T15:51:42.000000Z"}} _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:395
2017-03-29 15:51:52.228 106427 DEBUG keystoneauth.session [-] GET call to identity for http://192.168.24.12:35357/v3/auth/tokens used request id req-3ffa52b8-e427-401b-9cbd-499f2cd76fe9 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:640
2017-03-29 15:51:52.230 106427 DEBUG keystonemiddleware.auth_token [-] Received request from user: user_id 30ed2ad8ec1f45619fdca42abf077c16, project_id ad3c065f889d435d844858b4ffda1290, roles admin process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:672
2017-03-29 15:51:53.478 106427 INFO nova.osapi_compute.wsgi.server [req-a18a7ebc-4384-4731-9735-431b12dade38 30ed2ad8ec1f45619fdca42abf077c16 ad3c065f889d435d844858b4ffda1290 - default default] fd00:fd00:fd00:2000::14 "GET /v2.1 HTTP/1.1" status: 302 len: 245 time: 10.7393012

So I'm pretty sure this is something related to the environment/config and not anything fundamentally incompatible with IPv6 in nova (i.e. this is not a nova regression).
I think this is the problem:

[heat-admin@serviceapi-2 ~]$ sudo grep '^memcache' /etc/nova/nova.conf
memcache_servers=inet6:[192.168.24.15]:11211,inet6:[192.168.24.6]:11211,inet6:[192.168.24.11]:11211
memcached_servers=inet6:[192.168.24.15]:11211,inet6:[192.168.24.6]:11211,inet6:[192.168.24.11]:11211

Nova is configured to contact memcached over IPv6, but those are not IPv6 addresses. I think they are being looked up as hostnames because they aren't IPv6 addresses; that lookup delays us, and then we go straight to keystone to validate the token each time.
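For comparison, a correctly generated config would put the controllers' internal API IPv6 addresses inside the inet6:[...] form, roughly like this (the addresses below are hypothetical placeholders, not values from this deployment):

memcache_servers=inet6:[fd00:fd00:fd00:2000::10]:11211,inet6:[fd00:fd00:fd00:2000::11]:11211,inet6:[fd00:fd00:fd00:2000::12]:11211
memcached_servers=inet6:[fd00:fd00:fd00:2000::10]:11211,inet6:[fd00:fd00:fd00:2000::11]:11211,inet6:[fd00:fd00:fd00:2000::12]:11211

With an IPv4 literal inside the brackets the string is not a valid IPv6 address, so, as described above, the client appears to fall back to treating it as a hostname and every token validation pays for that failed lookup before going to keystone.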
[root@serviceapi-0 hieradata]# hiera memcached_node_ips_v6
["inet6:[192.168.24.15]", "inet6:[192.168.24.6]", "inet6:[192.168.24.11]"]

Additional info: the memcached servers are running on the controller nodes, but the v4 addresses in memcached_node_ips_v6 are the ceph nodes' ctlplane addresses.
Looks like the role order is significant:
https://github.com/openstack/tripleo-heat-templates/blob/0ae9d9b346f1c5d7e29b8f281474577eca260670/overcloud.j2.yaml#L1

roles[0] in http://paste.openstack.org/show/604512/ is CephStorage.
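For context, the linked line of overcloud.j2.yaml picks the primary role purely by position, roughly like this (paraphrased, not an exact copy of the template):

{% set primary_role_name = roles[0].name %}

so with CephStorage listed first in roles_data.yaml, anything keyed off the primary role (including the memcached node IPs) ends up being computed from the Ceph nodes instead of the controllers.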
Dan (Prince), the commit has your fingerprints on it. Is this expected/documented that Controller has to be the first role?
This is the culprit: https://github.com/openstack/tripleo-heat-templates/commit/b1fe2e8d60ad3dcfeec858539343be57115dd698
Does this mean that currently the memcached service is tied to the primary role? And what is the 'primary' role?
The controller has to be the first (primary) role in roles_data.yaml. This was done to avoid hard-coding the controller in deployments. Please update your roles_data.yaml to list the controller first. I'll work on something to make the ordering less important, but it probably won't land for a while.
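In other words, the custom roles_data.yaml needs an ordering roughly like the following (a trimmed sketch only; the service lists are abbreviated and the deployment's other custom roles are omitted):

- name: Controller        # must be listed first: this is the "primary" role
  ServicesDefault:
    - OS::TripleO::Services::Memcached
    # ... remaining controller services ...
- name: Compute
  ServicesDefault:
    # ... compute services ...
- name: CephStorage
  ServicesDefault:
    # ... ceph storage services ...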
Also, to answer your question Marius: yes, memcache is currently tied to the 'primary' role, which used to be the controller. This means you can't move memcache to its own node; it needs to be on the controller. This is due to: https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.j2.yaml#L476
I placed the Controller role first in roles_data.yaml and the issue no longer reproduced.
(In reply to Ollie Walsh from comment #9)
> Dan (Prince), the commit has your fingerprints on it. Is this
> expected/documented that Controller has to be the first role?

In order to avoid hard-coding 'Controller' in various places within the tripleo-heat-templates, we created a primary_role mechanism where the first role is used for certain things. Users who edit their own roles_data.yaml files will need to be mindful of this requirement. We do have comments in some of the t-h-t templates, but I can look to find the right places to document this as well.
The ability to tag roles has been added, which allows any ordering of the roles. The only requirement is that there be a role tagged with 'primary' and 'controller'; this is traditionally the main controller role.
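With the tagging mechanism the primary role is selected by its tags rather than by its position, so the roles_data.yaml entry looks roughly like this (abbreviated sketch, not a complete role definition):

- name: Controller
  tags:
    - primary
    - controller
  ServicesDefault:
    # ... controller services ...

Any role carrying both the 'primary' and 'controller' tags is treated as the primary role, regardless of where it appears in the file.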
Verified on build 2017-12-01.4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462