Created attachment 1279746 [details]
longer log

octavia.tests.tempest.v1.scenario.test_load_balancer_tree_minimal.TestLoadBalancerTreeMinimal.test_load_balancer_tree_minimal fails:

Traceback (most recent call last):
  File "tempest/test.py", line 96, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/test_load_balancer_tree_minimal.py", line 45, in test_load_balancer_tree_minimal
    self._create_load_balancer_tree(cleanup=False)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/base.py", line 859, in _create_load_balancer_tree
    self._set_quotas(project_id=project_id)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/base.py", line 838, in _set_quotas
    return self.quotas_client.update_quotas(project_id, **body)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/clients/quotas_client.py", line 48, in update_quotas
    url = self._QUOTAS_URL.format(project_id=project_id)
  File "tempest/lib/common/rest_client.py", line 334, in put
    return self.request('PUT', url, extra_headers, headers, body, chunked)
  File "tempest/lib/common/rest_client.py", line 644, in request
    body=body, chunked=chunked)
  File "tempest/lib/common/rest_client.py", line 533, in _request
    method, url, headers, body, self.filters)
  File "tempest/lib/auth.py", line 188, in auth_request
    filters, method, url, headers, body)
  File "tempest/lib/auth.py", line 278, in _decorate_request
    base_url = self.base_url(filters=filters, auth_data=auth_data)
  File "tempest/lib/auth.py", line 570, in base_url
    endpoint_type, catalog))
tempest.lib.exceptions.EndpointNotFound: Endpoint not found
Details: No matching service found in the catalog.
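For context on where this error originates: tempest raises EndpointNotFound when no entry in the Keystone service catalog matches the service type the client asks for. Below is a minimal, self-contained sketch of that lookup; the catalog data and the base_url function are hypothetical illustrations, not tempest's actual code.

```python
# Hypothetical, simplified model of a Keystone service catalog entry.
# A stable/ocata cloud registers Octavia under service type 'octavia';
# newer test code asks for 'load-balancer' and finds nothing.
catalog = [
    {"type": "octavia", "name": "octavia",
     "endpoints": [{"interface": "public", "url": "http://10.0.0.107:9876"}]},
]

class EndpointNotFound(Exception):
    pass

def base_url(catalog, service_type, interface="public"):
    """Return the endpoint URL for a service type, like tempest's lookup."""
    for service in catalog:
        if service["type"] != service_type:
            continue
        for ep in service["endpoints"]:
            if ep["interface"] == interface:
                return ep["url"]
    raise EndpointNotFound("No matching service found in the catalog.")

print(base_url(catalog, "octavia"))     # the type stable/ocata tests ask for
try:
    base_url(catalog, "load-balancer")  # the type master-branch tests ask for
except EndpointNotFound as exc:
    print("EndpointNotFound:", exc)
```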
I'm still looking into this, but here is what I learned so far:

Tests Branch:
=============
The above-mentioned test suite was invoked using the wrong branch. A stable/ocata Octavia should always be tested against stable/ocata Octavia scenarios (via the Octavia tempest plugin). Using the master branch (currently Pike) resulted in the 'Endpoint not found' error due to the way Octavia is registered as an endpoint with Keystone: the "Service Type" was changed from 'octavia' to 'load-balancer', see:
https://review.openstack.org/#/c/450916/

Octavia API IP Address:
=======================
This should probably be filed against the Octavia deployment steps as a separate bug (setting NEEDINFO on Brent to review this).

The Octavia API service listens on an internal controller IP address:

[root@controller-0 /]# grep bind_host /etc/octavia/conf.d/common/octavia-post-deploy.conf
#bind_host = 192.168.24.10

Nevertheless, the Octavia endpoint URL in Keystone specifies addresses on which nothing is listening:

$ openstack endpoint show ee324ac4b1e848bb98c94a7819fe794e
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| adminurl     | http://172.17.1.16:9876          |
| enabled      | True                             |
| id           | ee324ac4b1e848bb98c94a7819fe794e |
| internalurl  | http://172.17.1.16:9876          |
| publicurl    | http://10.0.0.107:9876           |
| region       | regionOne                        |
| service_id   | 326d587373c947b082253beffded3cac |
| service_name | octavia                          |
| service_type | octavia                          |
+--------------+----------------------------------+

This fails the scenario test since it cannot reach the Octavia API endpoint. Looking at other API services such as neutron-server, I noticed an haproxy process actually listens on the publicurl; I presume that's related to the high availability solution we have for multiple controllers. Perhaps we should do the same for octavia-api?
[root@controller-0 /]# netstat -ntpl | grep 9696
tcp        0      0 172.17.1.12:9696        0.0.0.0:*               LISTEN      688876/python2
tcp        0      0 172.17.1.16:9696        0.0.0.0:*               LISTEN      138228/haproxy
tcp        0      0 10.0.0.107:9696         0.0.0.0:*               LISTEN      138228/haproxy

DBDeadlock issue:
=================
I'm still looking at this issue, which does not seem to reproduce in the upstream gates. While upstream tests for stable/ocata pass [1], I get the following issue:

(Pdb) lock_session.query(models.Quotas).filter_by(project_id=project_id).with_for_update().first()
*** DBDeadlock: (pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction')
[SQL: u'SELECT quotas.project_id AS quotas_project_id, quotas.health_monitor AS quotas_health_monitor, quotas.listener AS quotas_listener, quotas.load_balancer AS quotas_load_balancer, quotas.member AS quotas_member, quotas.pool AS quotas_pool, quotas.in_use_health_monitor AS quotas_in_use_health_monitor, quotas.in_use_listener AS quotas_in_use_listener, quotas.in_use_load_balancer AS quotas_in_use_load_balancer, quotas.in_use_member AS quotas_in_use_member, quotas.in_use_pool AS quotas_in_use_pool \nFROM quotas \nWHERE quotas.project_id = %(project_id_1)s \n LIMIT %(param_1)s FOR UPDATE']
[parameters: {u'project_id_1': '33ee71cd44e44ca0b2f8d0189c78d307', u'param_1': 1}]

The code in which this happens:
https://github.com/openstack/octavia/blob/stable/ocata/octavia/db/repositories.py#L286-L287

[1] http://logs.openstack.org/38/464838/1/gate/gate-octavia-v1-dsvm-scenario-ubuntu-xenial/cc36f0a/logs/testr_results.html.gz

Will update as soon as I learn more about this.
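For the record, MySQL error 1205 above means a second transaction waited longer than innodb_lock_wait_timeout for the row lock taken by SELECT ... FOR UPDATE. The stdlib sketch below only mimics the shape of that failure with a plain lock and a timeout; it is not Octavia's code, and the function names are made up for illustration.

```python
import threading
import time

row_lock = threading.Lock()  # stands in for the FOR UPDATE row lock on quotas

def transaction_a():
    with row_lock:            # first transaction acquires the row lock...
        time.sleep(0.5)       # ...and holds it for a while

def transaction_b(timeout=0.1):
    # The second transaction runs the same SELECT ... FOR UPDATE and gives up
    # once the lock wait timeout expires, as pymysql reports with error 1205.
    if not row_lock.acquire(timeout=timeout):
        raise RuntimeError("Lock wait timeout exceeded; try restarting transaction")
    row_lock.release()

t = threading.Thread(target=transaction_a)
t.start()
time.sleep(0.05)              # ensure transaction_a holds the lock first
try:
    transaction_b()
except RuntimeError as exc:
    print(exc)
t.join()
```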
The DBDeadlock issue will be handled in bug 1454762
Moving this bug to be against TripleO, since the issue mentioned in Comment #1 blocks (among other things) these scenario tests. Re-posting the relevant part:

The Octavia API service listens on an internal controller IP address:

[root@controller-0 /]# grep bind_host /etc/octavia/conf.d/common/octavia-post-deploy.conf
#bind_host = 192.168.24.10

Nevertheless, the Octavia endpoint URL in Keystone specifies addresses on which nothing is listening:

$ openstack endpoint show ee324ac4b1e848bb98c94a7819fe794e
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| adminurl     | http://172.17.1.16:9876          |
| enabled      | True                             |
| id           | ee324ac4b1e848bb98c94a7819fe794e |
| internalurl  | http://172.17.1.16:9876          |
| publicurl    | http://10.0.0.107:9876           |
| region       | regionOne                        |
| service_id   | 326d587373c947b082253beffded3cac |
| service_name | octavia                          |
| service_type | octavia                          |
+--------------+----------------------------------+

This fails the scenario test since it cannot reach the Octavia API endpoint. Looking at other API services such as neutron-server, I noticed an haproxy process actually listens on the publicurl; I presume that's related to the high availability solution we have for multiple controllers. Perhaps we should do the same for octavia-api?

[root@controller-0 /]# netstat -ntpl | grep 9696
tcp        0      0 172.17.1.12:9696        0.0.0.0:*               LISTEN      688876/python2
tcp        0      0 172.17.1.16:9696        0.0.0.0:*               LISTEN      138228/haproxy
tcp        0      0 10.0.0.107:9696         0.0.0.0:*               LISTEN      138228/haproxy
I misread this bug; all that needs to be done here is to remove the step from the post-deployment.
Spoke too soon - realized the haproxy endpoint was also missing. Patch posted.
https://review.openstack.org/#/c/490082/ got merged to upstream master (queens)
For verification:
=================
0. Inspect the running configuration of the haproxy service that is supposed to load balance between the 3 Octavia API services; there should be 3 "server" entries, one per controller.
1. Tail the logs of the 3 Octavia API services.
2. Query the API service (you can just list the load balancers or perform any other action), repeat, and confirm the traffic is being load balanced between the controllers.
3. Kill one of the API services and repeat step #2.
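For step 0, the haproxy configuration should contain something along these lines. This is only a sketch of the expected shape; the listen section name, IP addresses, and server names below are hypothetical examples, not the deployed values:

```
listen octavia
  bind 10.0.0.107:9876
  bind 172.17.1.16:9876
  mode http
  server controller-0 172.17.1.12:9876 check
  server controller-1 172.17.1.13:9876 check
  server controller-2 172.17.1.14:9876 check
```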
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086