Bug 1436769

Summary: Controller role is required to be the first item in the roles_data.yaml when using composable roles
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 11.0 (Ocata)
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Marius Cornea <mcornea>
Assignee: Emilien Macchi <emacchi>
QA Contact: Gurenko Alex <agurenko>
CC: apevec, aschultz, dasmith, dbecker, dclarke, dprince, lhh, mburns, morazi, owalsh, rhel-osp-director-maint, sgordon, srevivo, tvignaud
Target Milestone: ga
Target Release: 12.0 (Pike)
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170616123155.el7ost
Clones: 1437433 (view as bug list)
Last Closed: 2017-12-13 21:20:39 UTC
Type: Bug

Description Marius Cornea 2017-03-28 15:33:20 UTC
Description of problem:
The Nova API is very slow in an IPv6 environment with the Nova control plane services running on a custom role:

nova service-list takes almost a minute, while nova list takes 20 seconds. Note that this is a fresh, empty overcloud installation on which tempest is currently running. I wasn't able to reproduce these issues on OSP10 installed in the same environment.

[stack@undercloud-0 ~]$ time nova service-list 2> /dev/null
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| 29 | nova-conductor   | serviceapi-1.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 38 | nova-conductor   | serviceapi-2.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 68 | nova-compute     | compute-1.localdomain    | nova     | enabled | up    | 2017-03-28T15:14:36.000000 | -               |
| 71 | nova-scheduler   | serviceapi-1.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 74 | nova-scheduler   | serviceapi-2.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 77 | nova-compute     | compute-0.localdomain    | nova     | enabled | up    | 2017-03-28T15:14:31.000000 | -               |
| 80 | nova-consoleauth | serviceapi-2.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 83 | nova-consoleauth | serviceapi-1.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 86 | nova-conductor   | serviceapi-0.localdomain | internal | enabled | up    | 2017-03-28T15:14:38.000000 | -               |
| 95 | nova-scheduler   | serviceapi-0.localdomain | internal | enabled | up    | 2017-03-28T15:14:37.000000 | -               |
| 98 | nova-consoleauth | serviceapi-0.localdomain | internal | enabled | up    | 2017-03-28T15:14:36.000000 | -               |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+

real	0m55.932s
user	0m0.790s
sys	0m0.110s
[stack@undercloud-0 ~]$ time nova list 2> /dev/null
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

real	0m20.051s
user	0m0.736s
sys	0m0.117s 

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.5.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud with IPv6 networking: 3 controllers, 3 ceph nodes, 2 computes, and 3 custom roles running systemd-managed services

Actual results:
Nova commands are very slow.

Expected results:
nova list usually returns a result in ~1s.

Additional info:
Attaching the sosreports.

Comment 1 Marius Cornea 2017-03-28 15:34:49 UTC
I'm asking for a blocker here since I don't think this can be considered a working environment even though the deployment has passed.

Comment 3 Marius Cornea 2017-03-28 16:10:47 UTC
Deploy command:

#!/bin/bash

openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates/ \
  -r /home/stack/composable_roles/roles/roles_data.yaml \
  -e /home/stack/composable_roles/roles/nodes.yaml \
  -e /home/stack/composable_roles/roles/network-config.yaml \
  -e /home/stack/composable_roles/debug.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/composable_roles/internal.yaml \
-e /home/stack/composable_roles/network/network-environment-v6.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/composable_roles/enable-tls.yaml \
-e /home/stack/composable_roles/inject-trust-anchor.yaml \
-e /home/stack/composable_roles/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/composable_roles/debug.yaml \
--log-file overcloud_deployment_27.log

roles_data.yaml:
http://paste.openstack.org/show/604512/

Comment 4 Dan Smith 2017-03-29 15:36:35 UTC
Just for posterity, I see that a single request to show one server takes 9 seconds, yet all of the logging we emit for the whole request spans only ~400ms:

[heat-admin@serviceapi-2 ~]$ sudo grep req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 /var/log/nova/nova-api.log
2017-03-29 15:19:51.389 24574 DEBUG nova.api.openstack.wsgi [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] Calling method '<bound method ServersController.show of <nova.api.openstack.compute.servers.ServersController object at 0x73890d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:626
2017-03-29 15:19:51.393 24574 DEBUG nova.compute.api [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] [instance: bc7f1160-bfc0-4467-8902-d6989e2c52c6] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2349
2017-03-29 15:19:51.415 24574 DEBUG nova.policy [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] Policy check for os_compute_api:os-extended-server-attributes failed with credentials {'service_roles': [], 'user_id': u'af7ad370da914c4b85adac00265eb883', 'roles': [u'swiftoperator'], 'user_domain_id': u'default', 'service_project_id': None, 'service_user_id': None, 'service_user_domain_id': None, 'service_project_domain_id': None, 'is_admin_project': True, 'is_admin': False, 'project_id': u'0f29a52bc7384d25b6e02e6a8f360fc4', 'project_domain_id': u'default'} authorize /usr/lib/python2.7/site-packages/nova/policy.py:168
2017-03-29 15:19:51.711 24574 DEBUG neutronclient.v2_0.client [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] GET call to neutron for http://[fd00:fd00:fd00:2000::19]:9696/v2.0/ports.json?device_id=bc7f1160-bfc0-4467-8902-d6989e2c52c6 used request id req-ce6f8f3f-abab-4c08-8486-234588e41cc2 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128
2017-03-29 15:19:51.735 24574 INFO nova.osapi_compute.wsgi.server [req-64596a8d-bbfb-4969-bac0-b472ff7a28f6 af7ad370da914c4b85adac00265eb883 0f29a52bc7384d25b6e02e6a8f360fc4 - default default] fd00:fd00:fd00:2000::14 "GET /v2.1/servers/bc7f1160-bfc0-4467-8902-d6989e2c52c6 HTTP/1.1" status: 200 len: 1687 time: 9.4964290

That really makes me think that maybe this delay is coming from keystone middleware or something like that.

Comment 5 Dan Smith 2017-03-29 15:53:58 UTC
With keystone middleware debug enabled, I see that the connection is accepted, then there is a long (~10s) delay while the token is authenticated with keystone, followed by an immediate response:

2017-03-29 15:51:42.737 106427 DEBUG nova.osapi_compute.wsgi.server [req-6704d652-c50a-4960-ac6c-8a43a94c9533 - - - - -] (106427) accepted ('fd00:fd00:fd00:2000::14', 35420, 0, 0) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:867
2017-03-29 15:51:42.741 106427 DEBUG keystonemiddleware.auth_token [-] Authenticating user token process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:401
2017-03-29 15:51:51.753 106427 DEBUG keystoneauth.session [-] REQ: curl -g -i -X GET http://[fd00:fd00:fd00:2000::19]:5000 -H "Accept: application/json" -H "User-Agent: nova/15.0.1 keystonemiddleware.auth_token/4.14.0 keystoneauth1/2.18.0 python-requests/2.11.1 CPython/2.7.5" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:347
2017-03-29 15:51:51.778 106427 DEBUG keystoneauth.session [-] RESP: [300] Date: Wed, 29 Mar 2017 15:51:51 GMT Server: Apache Vary: X-Auth-Token Content-Length: 624 Content-Type: application/json
RESP BODY: {"versions": {"values": [{"status": "stable", "updated": "2017-02-22T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.8", "links": [{"href": "http://[fd00:fd00:fd00:2000::19]:5000/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://[fd00:fd00:fd00:2000::19]:5000/v2.0/", "rel": "self"}, {"href": "https://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}}
 _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:395
2017-03-29 15:51:51.779 106427 DEBUG keystoneauth.identity.v2 [-] Making authentication request to http://[fd00:fd00:fd00:2000::19]:5000/v2.0/tokens get_auth_ref /usr/lib/python2.7/site-packages/keystoneauth1/identity/v2.py:63
2017-03-29 15:51:52.001 106427 DEBUG keystoneauth.session [-] REQ: curl -g -i -X GET http://192.168.24.12:35357/ -H "Accept: application/json" -H "User-Agent: nova/15.0.1 keystonemiddleware.auth_token/4.14.0 keystoneauth1/2.18.0 python-requests/2.11.1 CPython/2.7.5" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:347
2017-03-29 15:51:52.007 106427 DEBUG keystoneauth.session [-] RESP: [300] Date: Wed, 29 Mar 2017 15:51:52 GMT Server: Apache Vary: X-Auth-Token Content-Length: 602 Content-Type: application/json
RESP BODY: {"versions": {"values": [{"status": "stable", "updated": "2017-02-22T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.8", "links": [{"href": "http://192.168.24.12:35357/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.168.24.12:35357/v2.0/", "rel": "self"}, {"href": "https://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}}
 _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:395
2017-03-29 15:51:52.008 106427 DEBUG keystonemiddleware.auth_token [-] Auth Token confirmed use of None apis _get_strategy_class /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_identity.py:192
2017-03-29 15:51:52.010 106427 DEBUG keystoneauth.session [-] REQ: curl -g -i -X GET http://192.168.24.12:35357/v3/auth/tokens -H "X-Subject-Token: {SHA1}4a82989a67f252a0b1499df1da8f95799a6c1bb7" -H "User-Agent: python-keystoneclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}4ac8008c14baf2a5f675a8211b83755c407163f4" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:347
2017-03-29 15:51:52.228 106427 DEBUG keystoneauth.session [-] RESP: [200] Date: Wed, 29 Mar 2017 15:51:52 GMT Server: Apache X-Subject-Token: {SHA1}4a82989a67f252a0b1499df1da8f95799a6c1bb7 Vary: X-Auth-Token,Accept-Encoding x-openstack-request-id: req-3ffa52b8-e427-401b-9cbd-499f2cd76fe9 Content-Encoding: gzip Content-Length: 2386 Content-Type: application/json
RESP BODY: {"token": {"is_domain": false, "methods": ["password"], "roles": [{"id": "7ccb94cab31947ab8aaff6a58beb5028", "name": "admin"}], "expires_at": "2017-03-29T16:51:42.000000Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "ad3c065f889d435d844858b4ffda1290", "name": "admin"}, "catalog": "<removed>", "user": {"domain": {"id": "default", "name": "Default"}, "password_expires_at": null, "name": "admin", "id": "30ed2ad8ec1f45619fdca42abf077c16"}, "audit_ids": ["oLTvXzhxQ9-B8qekdeki1A"], "issued_at": "2017-03-29T15:51:42.000000Z"}}
 _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:395
2017-03-29 15:51:52.228 106427 DEBUG keystoneauth.session [-] GET call to identity for http://192.168.24.12:35357/v3/auth/tokens used request id req-3ffa52b8-e427-401b-9cbd-499f2cd76fe9 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:640
2017-03-29 15:51:52.230 106427 DEBUG keystonemiddleware.auth_token [-] Received request from user: user_id 30ed2ad8ec1f45619fdca42abf077c16, project_id ad3c065f889d435d844858b4ffda1290, roles admin process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:672
2017-03-29 15:51:53.478 106427 INFO nova.osapi_compute.wsgi.server [req-a18a7ebc-4384-4731-9735-431b12dade38 30ed2ad8ec1f45619fdca42abf077c16 ad3c065f889d435d844858b4ffda1290 - default default] fd00:fd00:fd00:2000::14 "GET /v2.1 HTTP/1.1" status: 302 len: 245 time: 10.7393012

So, I'm pretty sure this is something related to the environment/config and not anything fundamentally incompatible with IPv6 in nova (i.e., not a nova regression).

Comment 6 Dan Smith 2017-03-29 16:17:36 UTC
I think this is the problem:

[heat-admin@serviceapi-2 ~]$ sudo grep '^memcache' /etc/nova/nova.conf
memcache_servers=inet6:[192.168.24.15]:11211,inet6:[192.168.24.6]:11211,inet6:[192.168.24.11]:11211
memcached_servers=inet6:[192.168.24.15]:11211,inet6:[192.168.24.6]:11211,inet6:[192.168.24.11]:11211

Nova is configured to contact memcached over IPv6, but the configured servers are not IPv6 addresses. I think they are being looked up as hostnames because they aren't valid IPv6 addresses; that lookup delays us, and then we go straight to keystone to validate the token each time.

Comment 7 Marius Cornea 2017-03-29 16:38:10 UTC
[root@serviceapi-0 hieradata]# hiera memcached_node_ips_v6
["inet6:[192.168.24.15]", "inet6:[192.168.24.6]", "inet6:[192.168.24.11]"]


Additional info - the memcached servers are running on the controller nodes, but the v4 addresses in memcached_node_ips_v6 are the ceph nodes' ctlplane addresses.

Comment 9 Ollie Walsh 2017-03-29 20:04:32 UTC
Dan (Prince), the commit has your fingerprints on it. Is it expected/documented that Controller has to be the first role?

Comment 10 Ollie Walsh 2017-03-29 20:05:37 UTC

This is the culprit:
https://github.com/openstack/tripleo-heat-templates/commit/b1fe2e8d60ad3dcfeec858539343be57115dd698

Comment 11 Marius Cornea 2017-03-29 20:29:25 UTC
Does this mean that currently the memcached service is tied to the primary role? And what is the 'primary' role?

Comment 12 Alex Schultz 2017-03-29 20:32:30 UTC
The controller has to be the first (primary) role in roles_data.yaml. This was done to avoid hard-coding the controller in deployments. Please update your roles_data.yaml to list the controller first. I'll work on something to make the ordering less important, but it probably won't land for a while.
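
For illustration, a minimal, abbreviated sketch of what the top of such a roles_data.yaml might look like with the Controller listed first. The role names mirror this deployment (Controller, a custom ServiceApi role, Compute), but the service lists are truncated placeholders rather than the full default role definitions:

# Abbreviated, illustrative roles_data.yaml: the Controller entry comes first,
# so it is treated as the primary role; the other roles follow it.
- name: Controller
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::Memcached
    # ... remaining controller services ...
- name: ServiceApi
  ServicesDefault:
    - OS::TripleO::Services::NovaApi
    - OS::TripleO::Services::NovaScheduler
    # ... remaining custom-role services ...
- name: Compute
  ServicesDefault:
    - OS::TripleO::Services::NovaCompute
    # ... remaining compute services ...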

Comment 13 Alex Schultz 2017-03-29 20:55:20 UTC
Also, to answer your question Marius: yes, memcache is currently tied to the 'primary' role, which has traditionally been the controller. This means you can't move memcache to its own node; it needs to stay on the controller. This is due to: https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.j2.yaml#L476

Comment 14 Marius Cornea 2017-03-30 10:06:24 UTC
I placed the Controller role first in roles_data.yaml and the issue no longer reproduced.

Comment 15 Dan Prince 2017-03-31 12:22:12 UTC
(In reply to Ollie Walsh from comment #9)
> Dan (Prince), the commit has your fingerprints on it. Is it
> expected/documented that Controller has to be the first role?

In order to avoid hard-coding 'Controller' in various places within the tripleo-heat-templates, we created a primary_role mechanism whereby the first role is used for certain things. Users who edit their own roles_data.yaml files will need to be mindful of this requirement.

We do have comments in some of the t-h-t templates but I can look to find the right places to document this as well.

Comment 16 Alex Schultz 2017-04-18 15:34:19 UTC
The ability to tag roles has been added, which allows the roles to be listed in any order. The only requirement is that there be a role tagged with both 'primary' and 'controller'; this is traditionally the main controller role.
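
For illustration, a hedged sketch of how such tags might appear in roles_data.yaml once role tagging is available (abbreviated; the service lists are placeholders, and the role ordering is deliberately reversed to show it no longer matters):

# Illustrative only: with one role carrying the 'primary' and 'controller' tags,
# the order of entries in roles_data.yaml is no longer significant.
- name: ServiceApi
  ServicesDefault:
    - OS::TripleO::Services::NovaApi
    # ... remaining custom-role services ...
- name: Controller
  tags:
    - primary
    - controller
  ServicesDefault:
    - OS::TripleO::Services::Memcached
    # ... remaining controller services ...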

Comment 21 Gurenko Alex 2017-12-11 14:26:53 UTC
Verified on build 2017-12-01.4

Comment 24 errata-xmlrpc 2017-12-13 21:20:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462