Bug 1645536
Summary: | Octavia controllers components fails to update loadbalancers after every deploy | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Patrik Martinsson <martinsson.patrik> |
Component: | openstack-tripleo-heat-templates | Assignee: | Brent Eagles <beagles> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Bruna Bonguardo <bbonguar> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 13.0 (Queens) | CC: | asimonel, astafeye, bbonguar, beagles, bshephar, cgoncalves, fiezzi, gthiemon, ihrachys, jschluet, lpeer, majopela, mburns, moddi, pmannidi, pveiga, rlondhe, shdunne, slinaber, sputhenp |
Target Milestone: | z11 | Keywords: | TestOnly, Triaged, ZStream |
Target Release: | 13.0 (Queens) | Flags: | pveiga: needinfo- |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-8.4.1-6.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-02-04 11:52:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Patrik Martinsson
2018-11-02 13:44:01 UTC
Hi again,

>> The certificate issued to the amps is currently only valid for a year, which would mean that you would end up in this situation after a year (if you didn't do a deploy).
>> I think one year is way too short in this case.

Never mind this, I just realized through https://docs.openstack.org/octavia/queens/admin/guides/operator-maintenance.html that Octavia rotates the amp certificates automatically.

The original issue remains though: the CA seems to be updated on every 'openstack deploy', which makes amp operations on amphorae from a previous deploy fail.

// Patrik

For production environments, we recommend that users provide their own certificates (see https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_guide/sec-octavia#configuring_octavia_certificates_and_keys), so they can choose the expiration date.

If OctaviaGenerateCerts is true (or if the user wants to update their own certificates), the user should perform a failover of the existing load balancers after re-deploying.

1. If the listener's provisioning status is ACTIVE: the user may call "openstack loadbalancer failover <loadbalancer_id>"; the load balancer will be updated with the new certificates.

2. If the provisioning status is PENDING_UPDATE: the Octavia agent is trying to update a listener, but cannot reach it because the amphora certificates haven't been updated yet. In this case, the status is updated to ERROR after a timeout (default value is 25 min), then the user may perform the failover with "openstack loadbalancer failover <loadbalancer_id>".
   After the failover, the provisioning status may be stuck in ERROR. This is a known issue that can be worked around by updating the listener with a dummy configuration value, such as "openstack loadbalancer listener set --connection-limit -1 <listener_id>" (it sets the connection limit of the listener to its default value); the status is then set to ACTIVE.
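The decision flow in the two steps above can be sketched as a small shell helper. This is only an illustration: `recovery_step` and its second argument are hypothetical names, not part of the openstack CLI, and the mapping simply restates the comment above. The status lookup shown in the trailing comment assumes the client's usual `-f value -c <column>` output options.

```shell
#!/bin/bash
# Hypothetical helper (not part of any OpenStack tool).
# Usage: recovery_step STATUS [FAILOVER_DONE]
#   STATUS         listener provisioning status (ACTIVE, PENDING_UPDATE, ERROR)
#   FAILOVER_DONE  "yes" if a failover was already performed, default "no"
# Prints the next step suggested by the comment above.
recovery_step() {
    case "$1" in
        ACTIVE)
            # Before failover: refresh the amphora certificates.
            # After failover: nothing left to do.
            if [ "${2:-no}" = "yes" ]; then echo "done"; else echo "failover"; fi
            ;;
        PENDING_UPDATE)
            # Wait for the timeout (default 25 min) to move the status to ERROR.
            echo "wait"
            ;;
        ERROR)
            # Before failover: perform it. After failover: apply the dummy
            # listener update (--connection-limit -1) to get back to ACTIVE.
            if [ "${2:-no}" = "yes" ]; then echo "reset-listener"; else echo "failover"; fi
            ;;
        *)
            echo "unknown"
            ;;
    esac
}

# Example (not executed here; <listener_id> is a placeholder):
#   status=$(openstack loadbalancer listener show <listener_id> -f value -c provisioning_status)
#   recovery_step "$status" no
```

The helper only prints a suggestion; running "openstack loadbalancer failover" or "openstack loadbalancer listener set" is left to the operator, since a failover may cause a short outage.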
Note that while the provisioning status is ERROR or PENDING_UPDATE, the load balancer is fully functional, but during the failover the load balancer may experience a short outage.

@Greg, @carlos: even with the default config, the above steps are a workaround, not a permanent fix. Can we update the Octavia post-deploy playbooks to check for existing certificates and their validity before regenerating new ones? Also, can we increase the CA validity period or make it configurable?

The team discussed this issue yesterday and we now think that tripleo should handle the certificate expiration instead of recommending that customers provide their own certs. We're currently checking with the Security Team whether we could set the validity of the certificate to a long period (>= 10 years), and whether this value could be configurable. There is also some work in tripleo that checks for existing certificates and prevents creating them at each update/re-deploy (https://review.opendev.org/#/c/672529/).

As a side note, there is an easy workaround: provide an environment file at the end of the command line with OctaviaGenerateCerts set to false when redeploying. For example, create a file named disable_cert_generation.yaml with the contents:

    parameter_defaults:
      OctaviaGenerateCerts: false

And redeploy with it as the last environment file, ensuring that it overrides any other values:

    openstack overcloud deploy --templates << other command line options >> -e disable_cert_generation.yaml

According to our records, this should be resolved by openstack-tripleo-heat-templates-8.4.1-13.el7ost. This build is available now.
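The check requested earlier in this thread (reuse an existing CA unless it is missing or close to expiry) could look roughly like the sketch below. It is not the tripleo change referenced in the review link; `ca_action`, its default 30-day threshold, and the example path are assumptions made for illustration. It relies on `openssl x509 -checkend`, which exits with status 0 when the certificate does not expire within the given number of seconds.

```shell
#!/bin/bash
# Hypothetical helper (not the actual tripleo implementation).
# Usage: ca_action CERT_PATH [MIN_SECONDS]
# Prints "generate" when the certificate file is missing, empty, unreadable
# by openssl, or expires within MIN_SECONDS; prints "keep" otherwise.
ca_action() {
    cert_path="$1"
    min_seconds="${2:-2592000}"   # assumed threshold: 30 days in seconds

    if [ ! -s "$cert_path" ]; then
        # No usable file on disk yet: a new CA is needed.
        echo "generate"
    elif openssl x509 -checkend "$min_seconds" -noout -in "$cert_path" >/dev/null 2>&1; then
        # Certificate stays valid for at least MIN_SECONDS: reuse it so that
        # existing amphorae keep trusting the controllers.
        echo "keep"
    else
        # Expiring soon, or not parseable as a certificate.
        echo "generate"
    fi
}

# Example (path is illustrative only):
#   ca_action /etc/octavia/certs/ca_01.pem
```

Treating an unparseable file the same as a missing one is a design choice of this sketch; a stricter variant might stop and report the problem instead of regenerating.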
#Heat version:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep trip | grep heat
openstack-tripleo-heat-templates-8.4.1-16.el7ost.noarch

#Overcloud deployment file:
(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock1.rdu2.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
--log-file overcloud_deployment_84.log

#Octavia.yaml file:
(undercloud) [stack@undercloud-0 ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
resource_registry:
  OS::TripleO::Services::OctaviaApi: ../../docker/services/octavia-api.yaml
  OS::TripleO::Services::OctaviaHousekeeping: ../../docker/services/octavia-housekeeping.yaml
  OS::TripleO::Services::OctaviaHealthManager: ../../docker/services/octavia-health-manager.yaml
  OS::TripleO::Services::OctaviaWorker: ../../docker/services/octavia-worker.yaml
  OS::TripleO::Services::OctaviaDeploymentConfig: ../../docker/services/octavia/octavia-deployment-config.yaml

parameter_defaults:
  NeutronEnableForceMetadata: true

  # This flag enables internal generation of certificates for communication
  # with amphorae. Use OctaviaCaCert, OctaviaCaKey, OctaviaCaKeyPassphrase,
  # OctaviaClient and OctaviaServerCertsKeyPassphrase cert to configure
  # secure production environments.
  OctaviaGenerateCerts: true <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

#Checking CA serial in controller:
[root@controller-0 ~]# openssl x509 -text -noout -in /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0

#Created LB tree, one LB, one listener, one pool, 2 members. Worked successfully:
[2020-02-03 10:16:24] (overcloud) [stack@undercloud-0 ~]$ req='curl 10.0.0.222'; for i in {1..10}; do $req; echo; done
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server2-37cgw2ryf5iw

#Checked serial from the amphora:
[cloud-user@amphora-cb736e59-256d-4a7f-b91b-36d14a68624a ~]$ sudo openssl x509 -text -noout -in /etc/octavia/certs/client_ca.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0

#Ran overcloud update command - successfully.

#Checking CA serial in controller:
[root@controller-1 octavia]# openssl x509 -text -noout -in /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/ca_01.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0

#Checked serial from the amphora:
[cloud-user@amphora-cb736e59-256d-4a7f-b91b-36d14a68624a ~]$ sudo openssl x509 -text -noout -in /etc/octavia/certs/client_ca.pem | grep Serial -A1
        Serial Number:
            f3:2e:a7:af:df:c2:b8:c0

#Stayed the same serial.
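The manual serial comparison above could be scripted. A minimal sketch, assuming the serial strings have already been collected on the controller and on the amphora (for example with the openssl commands shown above, or with `openssl x509 -noout -serial`, which prints `serial=<HEX>`); `normalize_serial` and `serials_match` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for comparing certificate serial numbers that may be
# formatted differently (colon-separated lower case vs. "serial=HEX").

# Strip an optional "serial=" prefix, remove colons and spaces,
# and upper-case the hex digits.
normalize_serial() {
    printf '%s' "$1" | sed 's/^serial=//' | tr -d ': ' | tr 'a-f' 'A-F'
}

# Succeeds (exit status 0) when both serials are equal after normalization.
serials_match() {
    [ "$(normalize_serial "$1")" = "$(normalize_serial "$2")" ]
}

# Example with the value recorded above:
#   serials_match "f3:2e:a7:af:df:c2:b8:c0" "serial=F32EA7AFDFC2B8C0" && echo "same CA"
```

A matching serial before and after the update is what shows that the CA was not regenerated.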
#Deleted a LB member:
[2020-02-03 11:32:23] (tester) [stack@undercloud-0 ~]$ openstack loadbalancer member list lbtreevmshttp3-pool-3ptx6rbi6llo
+--------------------------------------+------+----------------------------------+---------------------+---------------------------+---------------+------------------+--------+
| id                                   | name | project_id                       | provisioning_status | address                   | protocol_port | operating_status | weight |
+--------------------------------------+------+----------------------------------+---------------------+---------------------------+---------------+------------------+--------+
| 93e5e550-6739-4769-a4b3-cda8911a1a2c |      | 001d97babbb6405ab6d9e9894ba33710 | ACTIVE              | 2001::f816:3eff:fe30:448d |          8080 | ONLINE           |      1 |
| f078c818-389d-4df9-8867-b40b033b7d44 |      | 001d97babbb6405ab6d9e9894ba33710 | ACTIVE              | 2001::f816:3eff:fe13:d9ce |          8080 | ONLINE           |      1 |
+--------------------------------------+------+----------------------------------+---------------------+---------------------------+---------------+------------------+--------+
[2020-02-03 11:32:35] (tester) [stack@undercloud-0 ~]$ openstack loadbalancer member delete lbtreevmshttp3-pool-3ptx6rbi6llo 93e5e550-6739-4769-a4b3-cda8911a1a2c

#Member was deleted successfully:
[2020-02-03 11:33:02] (tester) [stack@undercloud-0 ~]$ req='curl 10.0.0.222'; for i in {1..10}; do $req; echo; done
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk
lbtreevmshttp3-server1-5wpjmu24mbxk

#No "bad handshake" message in controller logs:
[root@controller-1 octavia]# grep -ir "bad handsh" /var/log/containers/octavia/
[root@controller-1 octavia]#
[root@controller-0 ~]# grep -ir "bad handsh" /var/log/containers/octavia/
[root@controller-0 ~]#

Moving the bug to VERIFIED.