Bug 1645536 - Octavia controllers components fails to update loadbalancers after every deploy
Summary: Octavia controllers components fails to update loadbalancers after every deploy
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: Brent Eagles
QA Contact: Bruna Bonguardo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-02 13:44 UTC by Patrik Martinsson
Modified: 2023-09-07 19:33 UTC
CC List: 20 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-04 11:52:34 UTC
Target Upstream Version:
Embargoed:
pveiga: needinfo-



Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 672529 0 'None' MERGED Allow distribution of non-autogenerated certs 2021-02-15 08:29:18 UTC
OpenStack gerrit 682659 0 'None' MERGED Allow distribution of non-autogenerated certs 2021-02-15 08:29:18 UTC
Red Hat Issue Tracker OSP-28201 0 None None None 2023-09-07 19:33:34 UTC

Description Patrik Martinsson 2018-11-02 13:44:01 UTC
Description of problem:

If you have 'OctaviaGenerateCerts: true' in your octavia-template.yaml (or whatever name you use for the template containing your Octavia config) and use the 'openstack overcloud deploy' command to update your OSP cluster, the CA (from which the amphorae get their certificates) is regenerated *on* every deploy. 

You will then no longer be able to modify load balancers (amps) that you configured during a previous deploy, since they use certificates issued by the CA that was present in that previous deploy to communicate.

You will see the following lines in the Octavia logs:

2018-11-02 13:14:55.402 23 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-f1d499b2-ac22-47da-87b4-9804eaecf77e - 0516798f465c404f822a69ecc2ec5a11 - - -] Could not connect to instance. Retrying.: SSLError: ("bad handshake: Error([('rsa routines', 'RSA_padding_check_PKCS1_type_1', 'block type is not 01'), ('rsa routines', 'RSA_EAY_PUBLIC_DECRYPT', 'padding check failed'), ('asn1 encoding routines', 'ASN1_item_verify', 'EVP lib'), ('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)

Which, again, makes perfect sense, since Octavia can't verify the certificate presented by the amp. 
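The failure mode can be reproduced in isolation with plain `openssl`: a certificate issued by one CA does not verify against a freshly generated CA, even when both CAs carry the same subject name. This is a minimal sketch with throwaway keys and hypothetical helper names (`make_test_ca`, `issue_cert`, `verifies_against`); it only illustrates the mechanism and does not touch Octavia. In the report the failure surfaces as the `certificate verify failed` handshake error; here it surfaces as a non-zero exit from `openssl verify`.

```shell
# Create a throwaway self-signed CA: $1 is a file prefix (writes $1.key, $1.pem).
# Every CA gets the same subject, like a regenerated CA would.
make_test_ca() {
    openssl req -x509 -new -newkey rsa:2048 -nodes -sha256 -days 30 \
        -subj "/CN=octavia-test-ca" -keyout "$1.key" -out "$1.pem" 2>/dev/null
}

# Issue a leaf certificate (prefix $2) signed by the CA with prefix $1,
# standing in for the certificate an amphora received from the old CA.
issue_cert() {
    openssl req -new -newkey rsa:2048 -nodes -subj "/CN=amphora" \
        -keyout "$2.key" -out "$2.csr" 2>/dev/null
    openssl x509 -req -in "$2.csr" -CA "$1.pem" -CAkey "$1.key" \
        -CAcreateserial -days 30 -sha256 -out "$2.pem" 2>/dev/null
}

# Succeed only if certificate $2 chains to the CA certificate $1.
verifies_against() {
    openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Usage, in a scratch directory:
#   make_test_ca old_ca; issue_cert old_ca amp      # first deploy
#   make_test_ca new_ca                             # redeploy regenerates the CA
#   verifies_against old_ca.pem amp.pem   # succeeds
#   verifies_against new_ca.pem amp.pem   # fails, as in the Octavia logs
```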

I guess the "solution", if you will, is to use your own CA/certificates; then you would have the same CA on every deploy.
However, 'OctaviaGenerateCerts' exists for a reason and should be usable, and if the current behaviour is intended (which I can't imagine, since it makes no sense at all), it needs to be documented as such at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_guide/sec-octavia or another appropriate place. 

Assuming I'm not misunderstanding anything here, this implementation is *severely* broken. 


Version-Release number of selected component (if applicable):

OSP 13 


How reproducible:

Always reproducible when using 'OctaviaGenerateCerts: true' in your templates. 


Steps to Reproduce:

1. Use the 'openstack overcloud deploy' command from the director (with the appropriate templates to configure Octavia) to deploy a configuration to your OSP 13 installation, and include the option 'OctaviaGenerateCerts: true'.

We use the following command to deploy:

   $> openstack overcloud ${g_operation} --templates     \
     -e ${g_tripleo}/network-isolation.yaml              \
     -e ${g_tripleo}/storage/external-ceph.yaml          \
     -e ${g_tripleo}/ssl/enable-tls.yaml                 \
     -e ${g_tripleo}/ssl/inject-trust-anchor-hiera.yaml  \
     -e ${g_tripleo}/services-docker/octavia.yaml        \
     -e ${g_tripleo}/services-docker/barbican.yaml       \
     -e ${g_tripleo}/barbican-backend-simple-crypto.yaml \
     -e ${g_custom}/controller-config.yaml               \
     -e ${g_custom}/network-environment.yaml             \
     -e ${g_custom}/overcloud_images.yaml                \
     -e ${g_custom}/tls-certificates.yaml                \
     -e ${g_custom}/tls-endpoints-public-dns.yaml        \
     -e ${g_custom}/config.yaml                          \
     --timeout 90                                        \
     --validation-errors-nonfatal                        \
     --validation-warnings-fatal

2. When the stack update is complete, log in to one of the controllers, and verify that octavia-* components are up and running (docker ps | grep octavia). 



3. Check and note the serial number of the CA:

   $ > openssl x509 -text -noout -in /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem | grep Serial -A1



4. Create a load balancer and add a listener, pool, members and health monitor. Verify that it works as intended. 



5. Log in to one of the amps created by Octavia and check the serial number of the client CA:

   $ > ssh amp-ip 
   $ > sudo openssl x509 -text -noout -in /etc/octavia/certs/client_ca.pem | grep Serial -A1

   Note that the serial is the same as the one currently present on the controllers.   



6. Do a deploy from the director with the same commands and templates used in step 1.



7. When the stack update is complete, log in to one of the controllers, and verify that octavia-* components are up and running (docker ps | grep octavia). 



8. Check the serial number of the CA again. *NOTE* that it now differs from the one noted in step 3.

   $ > openssl x509 -text -noout -in /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem | grep Serial -A1



9. Try to make a change to your load balancer, for example by removing a member. 



10. Check the Octavia logs on one of the controllers and notice that Octavia cannot communicate with the amp in question to update the listener (or whatever operation you were doing), since it doesn't recognize the certificate's issuer:

2018-11-02 13:14:55.402 23 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-f1d499b2-ac22-47da-87b4-9804eaecf77e - 0516798f465c404f822a69ecc2ec5a11 - - -] Could not connect to instance. Retrying.: SSLError: ("bad handshake: Error([('rsa routines', 'RSA_padding_check_PKCS1_type_1', 'block type is not 01'), ('rsa routines', 'RSA_EAY_PUBLIC_DECRYPT', 'padding check failed'), ('asn1 encoding routines', 'ASN1_item_verify', 'EVP lib'), ('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)
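Steps 3, 5 and 8 boil down to comparing certificate serial numbers before and after a redeploy. A minimal sketch of that comparison, assuming it runs on a controller (or, for client_ca.pem, on an amphora) with `openssl` available; the helper names and the state file in the usage comment are hypothetical. `openssl x509 -serial` prints the serial as `serial=<HEX>`, without the colons shown by `-text`.

```shell
# Controller-side CA path from steps 3 and 8 above.
CA_PEM=/var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem

# Print a certificate's serial number in the form "serial=<HEX>".
cert_serial() {
    openssl x509 -noout -serial -in "$1"
}

# Compare two recorded serials; return 1 if the CA was regenerated.
compare_serials() {
    if [ "$1" = "$2" ]; then
        echo "CA unchanged: $1"
    else
        echo "CA regenerated: $1 -> $2"
        return 1
    fi
}

# Usage (hypothetical state file):
#   cert_serial "$CA_PEM" > /tmp/ca_serial.before    # before step 6
#   ... redeploy ...
#   compare_serials "$(cat /tmp/ca_serial.before)" "$(cert_serial "$CA_PEM")"
```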


Actual results:


- When using 'OctaviaGenerateCerts: true', updating load balancers created in a previous deploy doesn't work, because the CA on the controllers is regenerated every time.


Expected results:

- When using 'OctaviaGenerateCerts: true', updating load balancers created in previous (and future) deploys should work.


Additional info:

The certificate issued to the amps is currently only valid for a year, which would mean that you would end up in this situation after a year (if you didn't do a deploy). 
I think one year is way too short in this case.


Best regards, 
Patrik Martinsson 
Sweden

Comment 1 Patrik Martinsson 2018-11-06 09:54:20 UTC
Hi again, 

>> The certificate issued to the amps is currently only valid for a year, which would mean that you would end up in this situation after a year (if you didn't do a deploy). 
>> I think one year is way too short in this case.

Never mind this; I just realized through https://docs.openstack.org/octavia/queens/admin/guides/operator-maintenance.html that Octavia rotates the amp certificates automatically.

The original issue remains, though: the CA seems to be regenerated on every 'openstack overcloud deploy', which makes amp operations on load balancers from the previous deploy fail.

// Patrik

Comment 6 Gregory Thiemonge 2019-07-18 15:48:21 UTC
For production environments, we recommend that users provide their own certificates (see https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_guide/sec-octavia#configuring_octavia_certificates_and_keys), so they can choose the expiration date.

If OctaviaGenerateCerts is true (or if users want to update their own certificates), they should perform a failover of the existing load balancers after redeploying.


1. If the listener's provisioning status is ACTIVE:

The user can run "openstack loadbalancer failover <loadbalancer_id>"; the load balancer will be updated with the new certificates.


2. If the provisioning status is PENDING_UPDATE:

This means Octavia is trying to update a listener but cannot reach the amphora, since the amphora certificates haven't been updated yet.
In this case, the status changes to ERROR after a timeout (25 minutes by default); the user can then perform the failover with "openstack loadbalancer failover <loadbalancer_id>".
After the failover, the provisioning status may remain stuck in ERROR. This is a known issue that can be worked around by updating the listener with a dummy configuration value, such as "openstack loadbalancer listener set --connection-limit -1 <listener_id>" (this sets the listener's connection limit to its default value), after which the status is set to ACTIVE.


Note that while the provisioning status is ERROR or PENDING_UPDATE, the load balancer remains fully functional, but it may experience a short outage during the failover.
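The per-status procedure above could be scripted roughly as follows. This is a minimal sketch, assuming the `openstack` client is configured with credentials that can see the affected load balancers; the function names are hypothetical, load balancers in other states are deliberately skipped, and ERROR is treated as "the PENDING_UPDATE timeout has already expired", which is an assumption.

```shell
# Map a load balancer's provisioning status to the action described above.
lb_action() {
    case "$1" in
        ACTIVE|ERROR)   echo "failover" ;;
        PENDING_UPDATE) echo "wait" ;;  # retry once it has moved to ERROR
        *)              echo "skip" ;;  # other states are left alone
    esac
}

# Fail over every load balancer that is ready for it.
# Pass "echo" as the first argument for a dry run that only prints commands.
failover_after_redeploy() {
    local runner=${1:-} id status
    openstack loadbalancer list -f value -c id -c provisioning_status |
    while read -r id status; do
        case "$(lb_action "$status")" in
            # </dev/null keeps the command from consuming the piped list.
            failover) $runner openstack loadbalancer failover "$id" </dev/null ;;
            wait)     echo "$id is $status; retry after it moves to ERROR" >&2 ;;
        esac
    done
}

# Workaround for a listener stuck in ERROR after the failover: re-apply the
# default connection limit so that its status returns to ACTIVE.
reset_listener() {
    openstack loadbalancer listener set --connection-limit -1 "$1"
}
```

Running `failover_after_redeploy echo` first shows which failovers would be issued without triggering the short outage mentioned above.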

Comment 7 PURANDHAR SAIRAM MANNIDI 2019-07-23 23:44:26 UTC
@Greg, @carlos: even with the default config, the above steps are a workaround, not a permanent fix. 

Can we update the Octavia post-deploy playbooks to check for existing certificates and their validity before generating new ones? 

Also, can we increase the CA validity period or make it configurable?
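One way to read this suggestion is a guard that reuses an existing CA unless it is missing or close to expiry. A minimal sketch under that assumption; the function name and the default 30-day threshold are hypothetical, and this is not the logic of the change that eventually fixed the bug. It relies on `openssl x509 -checkend`, which exits 0 when the certificate is still valid the given number of seconds from now.

```shell
# Return 0 ("regenerate") if the CA file is missing, is not a readable
# certificate, or expires within MIN_DAYS days; return 1 ("keep it") otherwise.
should_regenerate_ca() {
    local ca_pem=$1 min_days=${2:-30}
    [ -s "$ca_pem" ] || return 0
    if openssl x509 -noout -checkend "$(( min_days * 86400 ))" -in "$ca_pem" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Usage on a controller (path taken from the report):
#   ca=/var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem
#   if should_regenerate_ca "$ca" 30; then
#       echo "CA missing or expiring soon: generate a new one"
#   else
#       echo "reusing the existing CA"
#   fi
```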

Comment 8 Gregory Thiemonge 2019-07-25 13:26:06 UTC
The team discussed this issue yesterday, and we now think that TripleO should handle the certificate expiration instead of recommending that customers provide their own certs.

We're currently checking with the Security Team whether we can set the certificate validity to a long period (>= 10 years), and whether this value could be made configurable.

There is also some work in TripleO that checks for existing certificates and prevents recreating them on each update/redeploy (https://review.opendev.org/#/c/672529/).
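For illustration, a long, configurable CA validity amounts to passing a different `-days` value when the CA is created. A minimal sketch; `make_ca`, the output file names and the subject are hypothetical, and the key is written unencrypted (`-nodes`) only to keep the sketch short, whereas a real deployment would protect it with a passphrase (cf. OctaviaCaKeyPassphrase).

```shell
# Create a self-signed CA in OUT_DIR valid for DAYS days
# (3650 days is roughly 10 years).
make_ca() {
    local out_dir=$1 days=${2:-3650}
    mkdir -p "$out_dir"
    openssl req -x509 -new -newkey rsa:2048 -sha256 -nodes \
        -days "$days" -subj "/CN=octavia-amphora-ca" \
        -keyout "$out_dir/ca.key" -out "$out_dir/ca.pem" 2>/dev/null
}

# Print a certificate's expiry as "notAfter=<date>" to confirm the validity.
cert_end_date() {
    openssl x509 -noout -enddate -in "$1"
}
```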

Comment 14 Brent Eagles 2019-08-27 19:32:14 UTC
As a side note, there is an easy workaround: when redeploying, provide an environment file at the end of the command line that sets OctaviaGenerateCerts to false, e.g.:

Create a file named disable_cert_generation.yaml with the following contents:

parameter_defaults:
   OctaviaGenerateCerts: false

Then redeploy with it as the last environment file, so that it overrides any other values:

openstack overcloud deploy --templates << other command line options >> -e disable_cert_generation.yaml
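The same workaround as two small shell helpers: one writes the override file with exactly the contents given above, the other runs the deploy with that file passed last. The function names are hypothetical, and `"$@"` stands in for the other command-line options used for the original deploy.

```shell
# Write the override file (default name as in the comment above).
write_disable_cert_generation() {
    cat > "${1:-disable_cert_generation.yaml}" <<'EOF'
parameter_defaults:
  OctaviaGenerateCerts: false
EOF
}

# Redeploy with the override as the last -e file so that it wins over any
# earlier environment file that sets OctaviaGenerateCerts: true.
redeploy_keeping_octavia_ca() {
    openstack overcloud deploy --templates "$@" -e disable_cert_generation.yaml
}
```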

Comment 25 Lon Hohberger 2019-11-08 11:47:43 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.4.1-13.el7ost.  This build is available now.

Comment 28 Bruna Bonguardo 2020-02-03 16:39:22 UTC
#Heat version:

(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep trip | grep heat
openstack-tripleo-heat-templates-8.4.1-16.el7ost.noarch


#Overcloud deployment file:

(undercloud) [stack@undercloud-0 ~]$  cat overcloud_deploy.sh 
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock1.rdu2.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
--log-file overcloud_deployment_84.log


#Octavia.yaml file:
(undercloud) [stack@undercloud-0 ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
resource_registry:
  OS::TripleO::Services::OctaviaApi: ../../docker/services/octavia-api.yaml
  OS::TripleO::Services::OctaviaHousekeeping: ../../docker/services/octavia-housekeeping.yaml
  OS::TripleO::Services::OctaviaHealthManager: ../../docker/services/octavia-health-manager.yaml
  OS::TripleO::Services::OctaviaWorker: ../../docker/services/octavia-worker.yaml
  OS::TripleO::Services::OctaviaDeploymentConfig: ../../docker/services/octavia/octavia-deployment-config.yaml

parameter_defaults:
    NeutronEnableForceMetadata: true

    # This flag enables internal generation of certificates for communication
    # with amphorae. Use OctaviaCaCert, OctaviaCaKey, OctaviaCaKeyPassphrase,
    # OctaviaClient and OctaviaServerCertsKeyPassphrase cert to configure
    # secure production environments.
    OctaviaGenerateCerts: true  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


#Checking CA serial in controller:

[root@controller-0 ~]# openssl x509 -text -noout -in /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0


#Created LB tree, one LB, one listener, one pool, 2 members. Worked successfully:

[2020-02-03 10:16:24] (overcloud) [stack@undercloud-0 ~]$ req='curl 10.0.0.222'; for i in {1..10}; do $req; echo; done
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw

#Checked serial from the amphora:

[cloud-user@amphora-cb736e59-256d-4a7f-b91b-36d14a68624a ~]$ sudo openssl x509 -text -noout -in /etc/octavia/certs/client_ca.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0


#Ran the overcloud update command successfully.


#Checking CA serial in controller:

[root@controller-1 octavia]# openssl x509 -text -noout -in /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0


#Checked serial from the amphora:

[cloud-user@amphora-cb736e59-256d-4a7f-b91b-36d14a68624a ~]$ sudo openssl x509 -text -noout -in /etc/octavia/certs/client_ca.pem  | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0

#The serial stayed the same.


#Deleted a LB member:

[2020-02-03 11:32:23] (tester) [stack@undercloud-0 ~]$ openstack loadbalancer member list lbtreevmshttp3-pool-3ptx6rbi6llo
+--------------------------------------+------+----------------------------------+---------------------+---------------------------+---------------+------------------+--------+
| id                                   | name | project_id                       | provisioning_status | address                   | protocol_port | operating_status | weight |
+--------------------------------------+------+----------------------------------+---------------------+---------------------------+---------------+------------------+--------+
| 93e5e550-6739-4769-a4b3-cda8911a1a2c |      | 001d97babbb6405ab6d9e9894ba33710 | ACTIVE              | 2001::f816:3eff:fe30:448d |          8080 | ONLINE           |      1 |
| f078c818-389d-4df9-8867-b40b033b7d44 |      | 001d97babbb6405ab6d9e9894ba33710 | ACTIVE              | 2001::f816:3eff:fe13:d9ce |          8080 | ONLINE           |      1 |
+--------------------------------------+------+----------------------------------+---------------------+---------------------------+---------------+------------------+--------+
[2020-02-03 11:32:35] (tester) [stack@undercloud-0 ~]$ openstack loadbalancer member delete lbtreevmshttp3-pool-3ptx6rbi6llo 93e5e550-6739-4769-a4b3-cda8911a1a2c


#Member was deleted successfully:

[2020-02-03 11:33:02] (tester) [stack@undercloud-0 ~]$ req='curl 10.0.0.222'; for i in {1..10}; do $req; echo; done
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk

#No "bad handshake" message in controller logs:

[root@controller-1 octavia]# grep -ir "bad handsh" /var/log/containers/octavia/
[root@controller-1 octavia]# 

[root@controller-0 ~]# grep -ir "bad handsh" /var/log/containers/octavia/
[root@controller-0 ~]# 
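The log check above can be wrapped in a small function that also distinguishes "no matches" from "log directory missing", which a bare `grep` run does not. A minimal sketch; the function name is hypothetical, and it would be run on each controller as in the verification above.

```shell
# Return 0 if no Octavia log under LOG_DIR mentions a failed TLS handshake,
# 1 (after listing the matching files) if one does, 2 if LOG_DIR is missing.
no_bad_handshake() {
    local log_dir=${1:-/var/log/containers/octavia/}
    if [ ! -d "$log_dir" ]; then
        echo "no such directory: $log_dir" >&2
        return 2
    fi
    if grep -ril "bad handshake" "$log_dir"; then
        return 1
    fi
    return 0
}
```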


Moving the bug to VERIFIED.

