Bug 1451829 - Missing HAProxy process that listens to publicurl and redirects traffic to Octavia-API
Summary: Missing HAProxy process that listens to publicurl and redirects traffic to Octavia-API
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Brent Eagles
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Depends On: 1451777 1454762
Blocks: 1433523
 
Reported: 2017-05-17 15:15 UTC by Alexander Stafeyev
Modified: 2019-09-10 14:10 UTC
CC List: 16 users

Fixed In Version: puppet-tripleo-8.3.0-0.20180228184228.b3d0b2f.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:31:22 UTC
Target Upstream Version:
Embargoed:


Attachments
longer log (22.29 KB, text/plain)
2017-05-17 15:15 UTC, Alexander Stafeyev


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1728589 0 None None None 2017-10-30 13:13:43 UTC
OpenStack gerrit 490082 0 None master: MERGED puppet-tripleo: Add Octavia API endpoint to haproxy (I978b83fa5f3900d2f09c2affc59e90e150a42892) 2018-02-28 13:29:51 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:33:17 UTC

Description Alexander Stafeyev 2017-05-17 15:15:45 UTC
Created attachment 1279746 [details]
longer log

octavia.tests.tempest.v1.scenario.test_load_balancer_tree_minimal.TestLoadBalancerTreeMinimal.test_load_balancer_tree_minimal test fails:


Traceback (most recent call last):
  File "tempest/test.py", line 96, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/test_load_balancer_tree_minimal.py", line 45, in test_load_balancer_tree_minimal
    self._create_load_balancer_tree(cleanup=False)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/base.py", line 859, in _create_load_balancer_tree
    self._set_quotas(project_id=project_id)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/base.py", line 838, in _set_quotas
    return self.quotas_client.update_quotas(project_id, **body)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/clients/quotas_client.py", line 48, in update_quotas
    url = self._QUOTAS_URL.format(project_id=project_id)
  File "tempest/lib/common/rest_client.py", line 334, in put
    return self.request('PUT', url, extra_headers, headers, body, chunked)
  File "tempest/lib/common/rest_client.py", line 644, in request
    body=body, chunked=chunked)
  File "tempest/lib/common/rest_client.py", line 533, in _request
    method, url, headers, body, self.filters)
  File "tempest/lib/auth.py", line 188, in auth_request
    filters, method, url, headers, body)
  File "tempest/lib/auth.py", line 278, in _decorate_request
    base_url = self.base_url(filters=filters, auth_data=auth_data)
  File "tempest/lib/auth.py", line 570, in base_url
    endpoint_type, catalog))
tempest.lib.exceptions.EndpointNotFound: Endpoint not found
Details: No matching service found in the catalog.

Comment 1 Nir Magnezi 2017-05-22 13:00:37 UTC
I'm still looking into this, but here is what I learned so far:

Tests Branch:
=============
The above-mentioned test suite was invoked using the wrong branch. 
A stable/ocata Octavia should always be tested against stable/ocata Octavia scenarios (via the Octavia tempest plugin). Using the master branch (which is currently Pike) resulted in the 'Endpoint not found' error because of the way Octavia is registered as an endpoint with Keystone: the "Service Type" was changed from 'Octavia' to 'load-balancer', see: https://review.openstack.org/#/c/450916/
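
For reference, the service type that is actually registered in the catalog can be checked with the standard client (a quick check I would run, not output from this deployment):

$ openstack service list | grep -i -e octavia -e load-balancer
$ openstack catalog list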

Octavia API IP Address:
=======================
This should probably be filed against the Octavia deployment steps as a separate bug (setting NEEDINFO on Brent to review this).
The Octavia API service listens to an internal controller IP address:

[root@controller-0 /]# grep bind_host /etc/octavia/conf.d/common/octavia-post-deploy.conf 
#bind_host = 192.168.24.10

Nevertheless, the Octavia endpoint URL in Keystone specifies addresses that the API does not actually listen on:
$ openstack endpoint show ee324ac4b1e848bb98c94a7819fe794e
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| adminurl     | http://172.17.1.16:9876          |
| enabled      | True                             |
| id           | ee324ac4b1e848bb98c94a7819fe794e |
| internalurl  | http://172.17.1.16:9876          |
| publicurl    | http://10.0.0.107:9876           |
| region       | regionOne                        |
| service_id   | 326d587373c947b082253beffded3cac |
| service_name | octavia                          |
| service_type | octavia                          |
+--------------+----------------------------------+

This causes the scenario test to fail, since it cannot reach the Octavia API endpoint.
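
As a quick sanity check (hypothetical commands, not taken from this deployment), curling the URLs from the endpoint above should show the problem; the publicurl is expected to fail because nothing listens on it:

$ curl -i http://10.0.0.107:9876/    # publicurl - expected to fail, no listener
$ curl -i http://172.17.1.16:9876/   # internalurl - only works if something binds this address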

Looking at other API services such as neutron-server, I noticed that an haproxy process actually listens on the publicurl; I presume that is related to the high availability solution we have for multiple controllers.
Perhaps we should do the same for octavia-api?

[root@controller-0 /]# netstat -ntpl | grep 9696
tcp        0      0 172.17.1.12:9696        0.0.0.0:*               LISTEN      688876/python2      
tcp        0      0 172.17.1.16:9696        0.0.0.0:*               LISTEN      138228/haproxy      
tcp        0      0 10.0.0.107:9696         0.0.0.0:*               LISTEN      138228/haproxy    
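
For illustration only, this is roughly the kind of stanza one would expect to end up in /etc/haproxy/haproxy.cfg for octavia-api (the VIPs are taken from the outputs above; the server names and the controller-1/2 addresses are made up):

listen octavia
  bind 10.0.0.107:9876         # public VIP (publicurl)
  bind 172.17.1.16:9876        # internal API VIP (internalurl/adminurl)
  mode http
  server controller-0 172.17.1.12:9876 check fall 5 inter 2000 rise 2
  server controller-1 172.17.1.13:9876 check fall 5 inter 2000 rise 2
  server controller-2 172.17.1.14:9876 check fall 5 inter 2000 rise 2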


DBDeadlock issue:
=================

I'm still looking at this issue, which does not seem to reproduce in upstream gates.
While upstream tests for stable/ocata pass[1], I get the following issue:

(Pdb) lock_session.query(models.Quotas).filter_by(project_id=project_id).with_for_update().first()
*** DBDeadlock: (pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') [SQL: u'SELECT quotas.project_id AS quotas_project_id, quotas.health_monitor AS quotas_health_monitor, quotas.listener AS quotas_listener, quotas.load_balancer AS quotas_load_balancer, quotas.member AS quotas_member, quotas.pool AS quotas_pool, quotas.in_use_health_monitor AS quotas_in_use_health_monitor, quotas.in_use_listener AS quotas_in_use_listener, quotas.in_use_load_balancer AS quotas_in_use_load_balancer, quotas.in_use_member AS quotas_in_use_member, quotas.in_use_pool AS quotas_in_use_pool \nFROM quotas \nWHERE quotas.project_id = %(project_id_1)s \n LIMIT %(param_1)s FOR UPDATE'] [parameters: {u'project_id_1': '33ee71cd44e44ca0b2f8d0189c78d307', u'param_1': 1}]


The code in which this happens: https://github.com/openstack/octavia/blob/stable/ocata/octavia/db/repositories.py#L286-L287
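
As a side note, a minimal way to see which transaction is holding the row lock while this hangs (diagnostic sketch, assuming a local mysql/mariadb client on the controller and the standard InnoDB views):

$ mysql -e "SELECT * FROM information_schema.INNODB_LOCK_WAITS\G"
$ mysql -e "SHOW ENGINE INNODB STATUS\G"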


[1] http://logs.openstack.org/38/464838/1/gate/gate-octavia-v1-dsvm-scenario-ubuntu-xenial/cc36f0a/logs/testr_results.html.gz

Will update as soon as I learn more about this.

Comment 2 Nir Magnezi 2017-05-23 12:51:32 UTC
The DBDeadlock issue will be handled in bug 1454762

Comment 3 Nir Magnezi 2017-05-23 13:08:54 UTC
Moving this bug to be against TripleO, since the issue mentioned in Comment #1 blocks (among other things) these scenario tests.

Re-posting the relevant part:

The Octavia API service listens to an internal controller IP address:

[root@controller-0 /]# grep bind_host /etc/octavia/conf.d/common/octavia-post-deploy.conf 
#bind_host = 192.168.24.10

Nevertheless, the Octavia endpoint URL in Keystone specifies addresses that the API does not actually listen on:
$ openstack endpoint show ee324ac4b1e848bb98c94a7819fe794e
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| adminurl     | http://172.17.1.16:9876          |
| enabled      | True                             |
| id           | ee324ac4b1e848bb98c94a7819fe794e |
| internalurl  | http://172.17.1.16:9876          |
| publicurl    | http://10.0.0.107:9876           |
| region       | regionOne                        |
| service_id   | 326d587373c947b082253beffded3cac |
| service_name | octavia                          |
| service_type | octavia                          |
+--------------+----------------------------------+

This causes the scenario test to fail, since it cannot reach the Octavia API endpoint.

Looking at other API services such as neutron-server, I noticed that an haproxy process actually listens on the publicurl; I presume that is related to the high availability solution we have for multiple controllers.
Perhaps we should do the same for octavia-api?

[root@controller-0 /]# netstat -ntpl | grep 9696
tcp        0      0 172.17.1.12:9696        0.0.0.0:*               LISTEN      688876/python2      
tcp        0      0 172.17.1.16:9696        0.0.0.0:*               LISTEN      138228/haproxy      
tcp        0      0 10.0.0.107:9696         0.0.0.0:*               LISTEN      138228/haproxy

Comment 4 Brent Eagles 2017-08-02 16:08:18 UTC
I misread this bug; all that needs to be done here is to remove the step from the post-deployment.

Comment 5 Brent Eagles 2017-08-02 16:47:47 UTC
Spoke too soon - realized the haproxy endpoint was also missing. Patch posted.

Comment 10 Nir Magnezi 2017-12-06 13:43:45 UTC
https://review.openstack.org/#/c/490082/ got merged to upstream master (queens)

Comment 18 Nir Magnezi 2018-04-01 13:31:43 UTC
For verification:
=================
0. Inspect the running config of the haproxy service, which is supposed to load balance across the 3 Octavia API services; there should be 3 "server" entries, one per controller.
1. Tail the logs of the 3 Octavia API services.
2. Query the API (listing the load balancers or any other action will do), repeat, and confirm that traffic is being load balanced between the controllers.
3. Kill one of the API services and repeat step #2 (a rough command sketch follows this list).
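
A rough command sketch for the steps above (log paths, service/container names and the load balancer CLI action are assumptions about the deployed environment; adjust as needed):

# step 0: confirm the octavia frontend/backends exist in the running haproxy config
$ grep -A 10 'octavia' /etc/haproxy/haproxy.cfg
# step 1: on each controller, tail the Octavia API log
$ tail -f /var/log/octavia/api.log
# step 2: repeat a simple API call and watch which controller handles it
$ for i in $(seq 1 10); do openstack loadbalancer list; done
# step 3: stop the API service on one controller, then repeat step 2
$ systemctl stop octavia-api       # or stop the octavia_api container on containerized deployments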

Comment 21 errata-xmlrpc 2018-06-27 13:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

