Bug 1451829

Summary: Missing HAProxy process that listens on the publicurl and redirects traffic to Octavia-API
Product: Red Hat OpenStack
Reporter: Alexander Stafeyev <astafeye>
Component: puppet-tripleo
Assignee: Brent Eagles <beagles>
Status: CLOSED ERRATA
QA Contact: Alexander Stafeyev <astafeye>
Severity: high
Docs Contact:
Priority: high
Version: 11.0 (Ocata)
CC: amuller, beagles, bperkins, dbecker, ihrachys, jjoyce, jlibosva, jschluet, lpeer, majopela, mburns, morazi, nyechiel, rhel-osp-director-maint, slinaber, tvignaud
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: puppet-tripleo-8.3.0-0.20180228184228.b3d0b2f.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:31:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1451777, 1454762
Bug Blocks: 1433523
Attachments:
Description Flags
longer log none

Description Alexander Stafeyev 2017-05-17 15:15:45 UTC
Created attachment 1279746 [details]
longer log

The octavia.tests.tempest.v1.scenario.test_load_balancer_tree_minimal.TestLoadBalancerTreeMinimal.test_load_balancer_tree_minimal test fails:


Traceback (most recent call last):
  File "tempest/test.py", line 96, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/test_load_balancer_tree_minimal.py", line 45, in test_load_balancer_tree_minimal
    self._create_load_balancer_tree(cleanup=False)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/base.py", line 859, in _create_load_balancer_tree
    self._set_quotas(project_id=project_id)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/scenario/base.py", line 838, in _set_quotas
    return self.quotas_client.update_quotas(project_id, **body)
  File "/home/centos/tempest-upstream/octavia/octavia/tests/tempest/v1/clients/quotas_client.py", line 48, in update_quotas
    url = self._QUOTAS_URL.format(project_id=project_id)
  File "tempest/lib/common/rest_client.py", line 334, in put
    return self.request('PUT', url, extra_headers, headers, body, chunked)
  File "tempest/lib/common/rest_client.py", line 644, in request
    body=body, chunked=chunked)
  File "tempest/lib/common/rest_client.py", line 533, in _request
    method, url, headers, body, self.filters)
  File "tempest/lib/auth.py", line 188, in auth_request
    filters, method, url, headers, body)
  File "tempest/lib/auth.py", line 278, in _decorate_request
    base_url = self.base_url(filters=filters, auth_data=auth_data)
  File "tempest/lib/auth.py", line 570, in base_url
    endpoint_type, catalog))
tempest.lib.exceptions.EndpointNotFound: Endpoint not found
Details: No matching service found in the catalog.

Comment 1 Nir Magnezi 2017-05-22 13:00:37 UTC
I'm still looking into this, but here is what I learned so far:

Tests Branch:
=============
The above-mentioned test suite was invoked using the wrong branch.
A stable/ocata Octavia should always be tested against the stable/ocata Octavia scenarios (via the Octavia tempest plugin). Using the master branch (which is currently Pike) resulted in the 'Endpoint not found' error because of the way Octavia is registered as an endpoint with Keystone: the "Service Type" was changed from 'Octavia' to 'load-balancer', see: https://review.openstack.org/#/c/450916/
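
To illustrate why a service type mismatch produces this error, here is a minimal sketch of a catalog lookup by service type. It is not the tempest implementation; the CATALOG structure and the find_public_url helper are hypothetical, and only the octavia entry values are taken from the endpoint output in this report.

```python
# Hypothetical, simplified service catalog. A real Keystone catalog carries
# more fields; the values below mirror the endpoint shown in this bug.
CATALOG = [
    {"name": "octavia", "type": "octavia", "publicurl": "http://10.0.0.107:9876"},
]


def find_public_url(catalog, service_type):
    """Return the public URL of the first service registered under service_type."""
    for entry in catalog:
        if entry["type"] == service_type:
            return entry["publicurl"]
    # No entry is registered under the requested type.
    raise LookupError("No matching service found in the catalog: %s" % service_type)


# A stable/ocata deployment registers the service as 'octavia', so this works:
#   find_public_url(CATALOG, "octavia")
# A client that asks for 'load-balancer' (as the master-branch tests do) fails:
#   find_public_url(CATALOG, "load-balancer")  # raises LookupError
```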

Octavia API IP Address:
=======================
This should probably be filed against the Octavia deployment steps as a separate bug (setting NEEDINFO on Brent to review this).
The Octavia API service listens on an internal controller IP address:

[root@controller-0 /]# grep bind_host /etc/octavia/conf.d/common/octavia-post-deploy.conf 
#bind_host = 192.168.24.10

Nevertheless, the Octavia endpoint URL in Keystone specifies an address which differs from it:
$ openstack endpoint show ee324ac4b1e848bb98c94a7819fe794e
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| adminurl     | http://172.17.1.16:9876          |
| enabled      | True                             |
| id           | ee324ac4b1e848bb98c94a7819fe794e |
| internalurl  | http://172.17.1.16:9876          |
| publicurl    | http://10.0.0.107:9876           |
| region       | regionOne                        |
| service_id   | 326d587373c947b082253beffded3cac |
| service_name | octavia                          |
| service_type | octavia                          |
+--------------+----------------------------------+

This fails the scenario test since it cannot reach the Octavia API endpoint.

Looking at other API services such as neutron-server, I noticed that an haproxy process actually listens on the publicurl address. I presume that's related to the high availability solution we have for multiple controllers.
Perhaps we should do the same for octavia-api?

[root@controller-0 /]# netstat -ntpl | grep 9696
tcp        0      0 172.17.1.12:9696        0.0.0.0:*               LISTEN      688876/python2      
tcp        0      0 172.17.1.16:9696        0.0.0.0:*               LISTEN      138228/haproxy      
tcp        0      0 10.0.0.107:9696         0.0.0.0:*               LISTEN      138228/haproxy    
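
A quick way to confirm which of these addresses actually accept connections is a plain TCP connect test. The sketch below is only an illustration; the can_connect helper is hypothetical, and the addresses in the usage comment are the ones shown in this report.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Example usage (addresses taken from the endpoint output above):
#   can_connect("10.0.0.107", 9876)   # publicurl of octavia-api
#   can_connect("172.17.1.16", 9876)  # internalurl / adminurl
```

If nothing listens on the publicurl address and port, the call returns False, which matches the symptom of the scenario test being unable to reach the Octavia API endpoint.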


DBDeadlock issue:
=================

I'm still looking at this issue, which does not seem to reproduce in upstream gates.
While upstream tests for stable/ocata pass[1], I get the following issue:

(Pdb) lock_session.query(models.Quotas).filter_by(project_id=project_id).with_for_update().first()
*** DBDeadlock: (pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') [SQL: u'SELECT quotas.project_id AS quotas_project_id, quotas.health_monitor AS quotas_health_monitor, quotas.listener AS quotas_listener, quotas.load_balancer AS quotas_load_balancer, quotas.member AS quotas_member, quotas.pool AS quotas_pool, quotas.in_use_health_monitor AS quotas_in_use_health_monitor, quotas.in_use_listener AS quotas_in_use_listener, quotas.in_use_load_balancer AS quotas_in_use_load_balancer, quotas.in_use_member AS quotas_in_use_member, quotas.in_use_pool AS quotas_in_use_pool \nFROM quotas \nWHERE quotas.project_id = %(project_id_1)s \n LIMIT %(param_1)s FOR UPDATE'] [parameters: {u'project_id_1': '33ee71cd44e44ca0b2f8d0189c78d307', u'param_1': 1}]


The code in which this happens: https://github.com/openstack/octavia/blob/stable/ocata/octavia/db/repositories.py#L286-L287


[1] http://logs.openstack.org/38/464838/1/gate/gate-octavia-v1-dsvm-scenario-ubuntu-xenial/cc36f0a/logs/testr_results.html.gz

Will update as soon as I learn more about this.

Comment 2 Nir Magnezi 2017-05-23 12:51:32 UTC
The DBDeadlock issue will be handled in bug 1454762

Comment 3 Nir Magnezi 2017-05-23 13:08:54 UTC
Moving this bug to TripleO, since the issue mentioned in Comment #1 blocks (among other things) this scenario test.

Re-posting the relevant part:

The Octavia API service listens on an internal controller IP address:

[root@controller-0 /]# grep bind_host /etc/octavia/conf.d/common/octavia-post-deploy.conf 
#bind_host = 192.168.24.10

Nevertheless, the Octavia endpoint URL in Keystone specifies an address which differs from it:
$ openstack endpoint show ee324ac4b1e848bb98c94a7819fe794e
+--------------+----------------------------------+
| Field        | Value                            |
+--------------+----------------------------------+
| adminurl     | http://172.17.1.16:9876          |
| enabled      | True                             |
| id           | ee324ac4b1e848bb98c94a7819fe794e |
| internalurl  | http://172.17.1.16:9876          |
| publicurl    | http://10.0.0.107:9876           |
| region       | regionOne                        |
| service_id   | 326d587373c947b082253beffded3cac |
| service_name | octavia                          |
| service_type | octavia                          |
+--------------+----------------------------------+

This fails the scenario test since it cannot reach the Octavia API endpoint.

Looking at other API services such as neutron-server, I noticed that an haproxy process actually listens on the publicurl address. I presume that's related to the high availability solution we have for multiple controllers.
Perhaps we should do the same for octavia-api?

[root@controller-0 /]# netstat -ntpl | grep 9696
tcp        0      0 172.17.1.12:9696        0.0.0.0:*               LISTEN      688876/python2      
tcp        0      0 172.17.1.16:9696        0.0.0.0:*               LISTEN      138228/haproxy      
tcp        0      0 10.0.0.107:9696         0.0.0.0:*               LISTEN      138228/haproxy

Comment 4 Brent Eagles 2017-08-02 16:08:18 UTC
I misread this bug. All that needs to be done here is to remove the step from the post deployment.

Comment 5 Brent Eagles 2017-08-02 16:47:47 UTC
Spoke too soon - realized the haproxy endpoint was also missing. Patch posted.

Comment 10 Nir Magnezi 2017-12-06 13:43:45 UTC
https://review.openstack.org/#/c/490082/ was merged into upstream master (Queens).

Comment 18 Nir Magnezi 2018-04-01 13:31:43 UTC
For verification:
=================
0. Inspect the running config of the haproxy service that is supposed to load balance between the 3 Octavia API services. It should contain 3 "server" entries, one for each of the 3 controllers.
1. Tail the logs of the 3 Octavia API services.
2. Query the API service (you can just list the load balancers or perform any other action), repeat, and check that traffic is being load balanced between the controllers.
3. Kill one of the API services and repeat step #2.
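
Step 0 can be sketched as a small script that counts the "server" entries in the Octavia section of an haproxy config. This is only an illustration: the section name "octavia", the SAMPLE_CONFIG text and all addresses in it are assumptions, not taken from a real deployment, and the servers_in_section helper is hypothetical.

```python
# Keywords that start a new section in an haproxy configuration file.
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}

# Hypothetical config excerpt; names and addresses are made up for the example.
SAMPLE_CONFIG = """
listen neutron
  bind 10.0.0.107:9696
  server controller-0 172.17.1.12:9696 check
listen octavia
  bind 10.0.0.107:9876
  bind 172.17.1.16:9876
  server controller-0 172.17.1.12:9876 check
  server controller-1 172.17.1.13:9876 check
  server controller-2 172.17.1.14:9876 check
"""


def servers_in_section(config_text, section_name):
    """Return (name, address) pairs of 'server' lines in a 'listen' section."""
    servers = []
    in_section = False
    for raw_line in config_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated tokens.
        tokens = raw_line.split("#", 1)[0].split()
        if not tokens:
            continue
        if tokens[0] in SECTION_KEYWORDS:
            # Entering a new section; remember whether it is the one we want.
            in_section = (
                tokens[0] == "listen"
                and len(tokens) > 1
                and tokens[1] == section_name
            )
            continue
        if in_section and tokens[0] == "server" and len(tokens) >= 3:
            servers.append((tokens[1], tokens[2]))
    return servers


octavia_servers = servers_in_section(SAMPLE_CONFIG, "octavia")
print("octavia server entries:", len(octavia_servers))
```

For a 3-controller deployment the expectation is 3 entries; fewer would suggest that one of the Octavia API backends is missing from the haproxy config.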

Comment 21 errata-xmlrpc 2018-06-27 13:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086