Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1825171

Summary: Octavia Service Unavailable (HTTP 503) after deployment
Product: Red Hat OpenStack
Reporter: rohit londhe <rlondhe>
Component: openstack-octavia
Assignee: Carlos Goncalves <cgoncalves>
Status: CLOSED DUPLICATE
QA Contact: Bruna Bonguardo <bbonguar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 16.0 (Train)
CC: cgoncalves, ihrachys, lpeer, majopela, scohen
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-06 14:29:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description rohit londhe 2020-04-17 10:01:51 UTC
Description of problem:

After deploying Octavia by following https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/networking_guide/index#sec-octavia, using the default template /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml and letting director generate the certificates, we are not able to list load balancers:

~~~
(overcloud) [stack@undercloud templates]$ openstack loadbalancer list
Service Unavailable (HTTP 503)
~~~

The network lb-mgmt-net and its subnet were created during deployment.
The image octavia-amphora-16.0-20200226.1.x86_64 is present in the output of openstack image list.

octavia_wsgi_error_ssl.log on the controller reports:

~~~
[Thu Apr 09 00:01:18.485293 2020] [wsgi:error] [pid 37] (11)Resource temporarily unavailable: [client 192.168.xx.xx:57916] mod_wsgi (pid=37): Unable to connect to WSGI daemon process 'octavia' on '/var/run/wsgi.7.0.1.sock' after multiple attempts as listener backlog limit was exceeded or the socket does not exist.
~~~

octavia.log reports:

~~~
Could not retrieve schema from tcp:192.168.xx.xx:6641: Unknown error
~~~


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
Deploy Octavia as per the attached templates.

Actual results:

Service Unavailable (HTTP 503) for octavia


Expected results:

octavia commands should work.

Additional info:

In the initial phase of the deployment we hit this error:

TASK [Write octavia inventory] *************************************************
Monday 06 April 2020  21:20:03 +0200 (0:00:00.678)       0:47:29.821 **********
fatal: [undercloud]: FAILED! => {"changed": false, "checksum": "d951bca6d7b63f908db46a20ed6da7d52d9c9384", "msg": "Destination /var/lib/mistral/overcloud/octavia-ansible not writable"}

Applied the fix from https://bugs.launchpad.net/tripleo/+bug/1847608 to change the ownership of the directory.
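
Before redeploying, the ownership fix can be verified with a quick writability check. The following is only a minimal sketch: `is_writable_dir` is a hypothetical helper, the path is copied from the error message above, and the check is only meaningful when run as the same user that executes the failing task (which user that is depends on the deployment and is not stated in this report).

```python
import os


def is_writable_dir(path):
    """True if path is an existing directory the current user may write to."""
    # W_OK: may create files in it; X_OK: may traverse into it.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


# Path taken from the "not writable" error message above.
target = "/var/lib/mistral/overcloud/octavia-ansible"
print(target, "writable:", is_writable_dir(target))
```

Note that root usually passes this check regardless of ownership, so running it as root says little about the task's own user.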

After that, the deployment looks fine, at least from an Ansible point of view:

2020-04-06 22:10:46,805 p=440802 u=root |  PLAY RECAP *********************************************************************
2020-04-06 22:10:46,805 p=440802 u=root |  ctl-01                     : ok=49   changed=22   unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
2020-04-06 22:10:46,805 p=440802 u=root |  ctl-02                     : ok=36   changed=14   unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
2020-04-06 22:10:46,805 p=440802 u=root |  ctl-03                     : ok=36   changed=14   unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
2020-04-06 22:10:46,806 p=440802 u=root |  und01                      : ok=21   changed=9    unreachable=0    failed=0    skipped=10   rescued=0    ignored=1
2020-04-06 22:10:46,806 p=440802 u=root |  Monday 06 April 2020  22:10:46 +0200 (0:00:00.719)       0:03:17.460 **********
2020-04-06 22:10:46,806 p=440802 u=root |  ===============================================================================
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-undercloud : upload image to glance ---------------------------- 41.13s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-undercloud : convert image from qcow2 to raw ------------------- 10.65s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create subnet -------------------------------- 8.35s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create management network for load balancers --- 8.03s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-undercloud : upload pub key to overcloud ------------------------ 7.52s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create security group for health manager ----- 7.30s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create security group (get the security group id) --- 7.11s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-controller-config : create management port ---------------------- 6.49s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create security group rule to open amphora management API port --- 5.75s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create security group rule to open amphora management ssh port --- 5.38s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-controller-config : Bring the management port interface up ------ 5.22s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : create security group rule for health manager --- 5.17s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : increase quotas for project used for amphora --- 4.43s
2020-04-06 22:10:46,807 p=440802 u=root |  octavia-overcloud-config : getting management network ID ---------------- 4.05s
2020-04-06 22:10:46,808 p=440802 u=root |  Gathering Facts --------------------------------------------------------- 3.83s
2020-04-06 22:10:46,808 p=440802 u=root |  octavia-undercloud : gather facts about the service project ------------- 3.77s
2020-04-06 22:10:46,808 p=440802 u=root |  octavia-undercloud : check if amphora image file exists ----------------- 3.62s 
2020-04-06 22:10:46,808 p=440802 u=root |  octavia-controller-config : getting management port --------------------- 3.53s
2020-04-06 22:10:46,808 p=440802 u=root |  octavia-undercloud : check there's an image in glance already ----------- 3.51s
2020-04-06 22:10:46,808 p=440802 u=root |  octavia-controller-config : gather facts about the service project ------ 3.31s
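
The recap above can also be checked mechanically instead of by eye. This is a minimal sketch that assumes log lines in exactly the format shown; `parse_recap` and `hosts_with_problems` are hypothetical helper names, not part of Ansible or TripleO.

```python
import re

# Matches recap host lines such as:
#   "... u=root |  ctl-01   : ok=49   changed=22   unreachable=0   failed=0 ..."
RECAP_LINE = re.compile(r"\|\s+(?P<host>\S+)\s+:\s+(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(lines):
    """Return {host: {counter: int}} for every PLAY RECAP host line."""
    recap = {}
    for line in lines:
        match = RECAP_LINE.search(line)
        if match:
            pairs = (item.split("=") for item in match.group("counters").split())
            recap[match.group("host")] = {key: int(value) for key, value in pairs}
    return recap


def hosts_with_problems(recap):
    """List hosts that report failed or unreachable tasks."""
    return sorted(
        host for host, counters in recap.items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    )


# Two lines copied from the log above.
sample = [
    "2020-04-06 22:10:46,805 p=440802 u=root |  ctl-01                     : ok=49   changed=22   unreachable=0    failed=0    skipped=3    rescued=0    ignored=0",
    "2020-04-06 22:10:46,806 p=440802 u=root |  und01                      : ok=21   changed=9    unreachable=0    failed=0    skipped=10   rescued=0    ignored=1",
]
recap = parse_recap(sample)
print("hosts with failed/unreachable tasks:", hosts_with_problems(recap))
print("ignored errors on und01:", recap["und01"]["ignored"])
```

The check only looks at `failed` and `unreachable`, so the `ignored=1` on und01 is not flagged; a clean recap by this measure does not prove that every task did what was intended.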

This looks similar to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1755683, but in this case octavia_api looks good.

Comment 3 Carlos Goncalves 2020-04-17 10:07:27 UTC
I believe this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1797670. The fix is scheduled to be available in 16.0.2.
As a temporary workaround, please remove the OVN provider driver from the enabled_provider_drivers list in octavia.conf and restart the octavia-api container in all controllers.
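
The edit to the enabled_provider_drivers value can be sketched as follows. This is a hedged illustration, not the supported procedure: it assumes the value is a comma-separated list of `name:description` entries whose descriptions contain no commas, that the OVN entry is named `ovn`, and `drop_provider` is a hypothetical helper. The sample value is illustrative and not copied from the reporter's octavia.conf.

```python
def drop_provider(value, provider="ovn"):
    """Remove one provider from a comma-separated 'name:description' list."""
    kept = []
    for entry in value.split(","):
        # The provider name is everything before the first colon.
        name = entry.split(":", 1)[0].strip()
        if name and name != provider:
            kept.append(entry.strip())
    return ",".join(kept)


# Illustrative value only; check the real line in octavia.conf before editing.
current = "amphora:The Octavia Amphora driver,octavia:Deprecated alias,ovn:Octavia OVN driver"
print("enabled_provider_drivers =", drop_provider(current))
```

The octavia-api container still has to be restarted on every controller afterwards for the change to take effect.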

Comment 4 rohit londhe 2020-04-17 10:12:19 UTC
Hey Carlos, 

You are fast enough :) appreciated!

Sure, I'll check the fix and workaround.

Comment 5 rohit londhe 2020-04-21 14:13:36 UTC
Hello,

I disabled the OVN provider driver in the enabled_provider_drivers list in octavia.conf and restarted the octavia-api container on all controllers.
After that, the API no longer throws an error when executing openstack loadbalancer list.
However, the following error occurs when I try to create a load balancer:

2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server [-] Exception during message handling: octavia.common.exceptions.CertificateGenerationException: Could not sign the certificate request: Failed to load CA Certificate /etc/octavia/certs/ca_01.pem.
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/certificates/generator/local.py", line 49, in _validate_cert
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     ca_cert = open(CONF.certificates.ca_certificate, 'rb').read()
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server FileNotFoundError: [Errno 2] No such file or directory: '/etc/octavia/certs/ca_01.pem'
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/controller/queue/v1/endpoints.py", line 45, in create_load_balancer
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     self.worker.create_load_balancer(load_balancer_id, flavor)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     return self.call(f, *args, **kw)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     do = self.iter(retry_state=retry_state)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 319, in iter
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     return fut.result()
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     return self.__get_result()
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     raise self._exception
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     result = fn(*args, **kwargs)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/controller/worker/v1/controller_worker.py", line 344, in create_load_balancer
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     create_lb_tf.run()
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/taskflow/types/failure.py", line 339, in reraise_if_any
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     failures[0].reraise()
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/taskflow/types/failure.py", line 346, in reraise
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     raise value
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/controller/worker/v1/tasks/cert_task.py", line 47, in execute
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     validity=CONF.certificates.cert_validity_time)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/certificates/generator/local.py", line 234, in generate_cert_key_pair
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     cert = cls.sign_cert(csr, validity, **kwargs)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/certificates/generator/local.py", line 91, in sign_cert
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     cls._validate_cert(ca_cert, ca_key, ca_key_pass)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/octavia/certificates/generator/local.py", line 53, in _validate_cert
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server     .format(CONF.certificates.ca_certificate)
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server octavia.common.exceptions.CertificateGenerationException: Could not sign the certificate request: Failed to load CA Certificate /etc/octavia/certs/ca_01.pem.
2020-04-20 23:17:55.045 23 ERROR oslo_messaging.rpc.server


The file "/etc/octavia/certs/ca_01.pem" does not exist on the 3 controllers. The file /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem does not exist either.
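
The same existence check can be scripted and repeated on each controller. A minimal sketch: `missing_files` is a hypothetical helper and the two paths are the ones named above; it has to run in a context where those paths would be visible.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that are not existing regular files."""
    return [str(path) for path in map(Path, paths) if not path.is_file()]


# Paths taken from the traceback and the comment above.
CANDIDATES = [
    "/etc/octavia/certs/ca_01.pem",
    "/var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem",
]

for path in missing_files(CANDIDATES):
    print("missing:", path)
```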

I then applied https://access.redhat.com/solutions/4909471, modified octavia.conf on all controllers again and restarted octavia-api: same result.

I deleted the overcloud, cleaned the node and redeployed; the Octavia directory error is back:
TASK [Write octavia inventory] *************************************************
Monday 20 April 2020  21:27:25 +0200 (0:00:00.757)       0:41:59.657 **********
fatal: [undercloud]: FAILED! => {"changed": false, "checksum": "f95c84fe009e8462fc8fde4e3faae97b012e839c", "msg": "Destination /var/lib/mistral/overcloud/octavia-ansible not writable"}

I reapplied https://bugs.launchpad.net/tripleo/+bug/1847608, deployed again and applied the fix to octavia.conf. It still fails because ca_01.pem does not exist. As a workaround, I could force this certificate with OctaviaCaCert (and others), but I cannot find the procedure to generate those certificates.

Comment 6 Carlos Goncalves 2020-04-23 07:02:22 UTC
OK, I think I understand why it is failing for you. Let me go through it step by step to explain my reasoning.

(In reply to rohit londhe from comment #5)
> I disabled the OVN provider driver in the enabled_provider_drivers list in
> octavia.conf and restarted the octavia-api container on all controllers.
> After that, the API no longer throws an error when executing
> openstack loadbalancer list.
[...]
> The file "/etc/octavia/certs/ca_01.pem" does not exist on the 3 controllers.
> The file
> /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem
> does not exist either.

Right. The certificate files were already missing before the OVN provider driver was disabled, so disabling the provider driver won't change anything in that regard.
 
> I then applied https://access.redhat.com/solutions/4909471, modified
> octavia.conf on all controllers again and restarted octavia-api: same result.

Just applying the workaround in the file octavia-deployment-config.yaml is not enough. You also need to run an overcloud update.

> 
> I deleted the overcloud, cleaned the node and redeployed; the Octavia
> directory error is back:

[...]

Deleting the overcloud was the reason why the certificate files were not generated. In the previous step you changed TripleO to generate the certificates on UPDATE, but since you deleted the overcloud and deployed a new one, the stack action is CREATE.

> TASK [Write octavia inventory]
> *************************************************
> Monday 20 April 2020  21:27:25 +0200 (0:00:00.757)       0:41:59.657
> **********
> fatal: [undercloud]: FAILED! => {"changed": false, "checksum":
> "f95c84fe009e8462fc8fde4e3faae97b012e839c", "msg": "Destination
> /var/lib/mistral/overcloud/octavia-ansible not writable"}

This will be fixed in 16.0.2. See https://bugzilla.redhat.com/show_bug.cgi?id=1824068.

 
> I reapplied https://bugs.launchpad.net/tripleo/+bug/1847608, deployed again
> and applied the fix to octavia.conf. It still fails because ca_01.pem does
> not exist. As a workaround, I could force this certificate with
> OctaviaCaCert (and others), but I cannot find the procedure to generate
> those certificates.

I'm guessing you deleted the overcloud and created a new one again, yes? If so, the stack action is then CREATE rather than UPDATE, which contradicts the workaround in https://access.redhat.com/solutions/4909471.

Comment 7 rohit londhe 2020-05-05 01:26:00 UTC
Hello,

Thanks for your suggestions.

It worked after the following operations :
- Delete overcloud
- Reset the stack action to the value 'CREATE'
- Redeploy (without any permission error, so that the first deploy is successful; if an error occurs, delete the overcloud again, fix the error and deploy again)
- Modify octavia.conf on all controllers to remove the OVN provider driver from the enabled_provider_drivers list and restart octavia-api (as in https://bugzilla.redhat.com/show_bug.cgi?id=1825171)
- Load balancer commands are now OK and the certificates are deployed in the octavia directory. I reproduced these steps successfully on 2 platforms that had the same problem; Octavia is finally working.

It looks like https://access.redhat.com/solutions/4909471 does not work if the first deploy fails (for example because of the directory permission error, which was the initial problem).

Comment 8 Carlos Goncalves 2020-05-06 14:29:22 UTC

*** This bug has been marked as a duplicate of bug 1797670 ***