Bug 1768063 - When External LB is used, minor updates fail while trying to work with haproxy-bundle
Summary: When External LB is used, minor updates fail while trying to work with haproxy-bundle
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: RHOS Maint
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-02 04:06 UTC by Brendan Shephard
Modified: 2022-08-23 18:25 UTC
7 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-18.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:22:07 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1851297 0 None None None 2019-11-04 20:56:29 UTC
OpenStack gerrit 692904 0 None MERGED Disable haproxy when using external LB 2020-08-12 21:56:50 UTC
Red Hat Issue Tracker OSP-3296 0 None None None 2022-08-23 18:25:24 UTC
Red Hat Product Errata RHBA-2020:0760 0 None None None 2020-03-10 11:22:45 UTC

Description Brendan Shephard 2019-11-02 04:06:14 UTC
Description of problem:
When using an External LB as per: 
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/external_load_balancing_for_the_overcloud/index

Minor updates fail in the Controller/update_tasks.yaml playbook when it tries to work with haproxy-bundle, which doesn't exist because we're using an external LB.

Version-Release number of selected component (if applicable):
RHOSP13

How reproducible:
Easily with enough resources

Steps to Reproduce:
1. Deploy the Overcloud with external LB as per the above document
2. Run the update prepare
3. Run the update on the controllers: openstack overcloud update run --nodes Controller
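
(Step 2 is the standard prepare; as a rough sketch, assuming the same templates and environment files used for the original deployment, it would look something like:)

openstack overcloud update prepare \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/external-loadbalancer-vip.yaml \
  -e <any other environment files used at deploy time>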

Actual results:
TASK [Disable the haproxy cluster resource] ************************************
Friday 01 November 2019  20:56:22 -0400 (0:00:00.373)       0:37:50.310 ******* 
FAILED - RETRYING: Disable the haproxy cluster resource (5 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (4 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (3 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (2 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (1 retries left).

fatal: [controller-0]: FAILED! => {"attempts": 5, "changed": false, "error": "Error: bundle/clone/group/master/resource 'haproxy-bundle' does not exist\n", "msg": "Failed, to set the resource haproxy-bundle to the state disable", "output": "", "rc": 1}

PLAY RECAP *********************************************************************
controller-0       : ok=22   changed=3    unreachable=0    failed=1   

Expected results:
If we include EnableLoadBalancer: False in the templates, the haproxy-related update_tasks should be skipped.
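
(A minimal environment snippet setting this parameter would look roughly like:)

parameter_defaults:
  EnableLoadBalancer: false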

Additional info:
Best solution I've been able to come up with is to do the following:
- openstack overcloud config download --config-dir blah
- vi blah/tripleo-*/Controller/update_tasks.yaml
- add a tags section to the haproxy block
- ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory --ssh-extra-args '-o UserKnownHostsFile=/dev/null' --ssh-common-args '-o StrictHostKeyChecking=no' -b update_steps_playbook.yaml --limit Controller --skip-tags haproxy_lb

This appears to have worked so far. But I'm not testing with an external LB; I'm just skipping the haproxy steps and assuming the same would be true if I had an external LB. I need someone to confirm that there are no issues with this.

Additionally, I know we have sections that move the VIP in the playbook as well:

233 - name: Move virtual IPs to another node before stopping pacemaker

Do we think it would be safe to tag and skip this task as well?


Last question: is there a mechanism that I'm missing that should skip these tasks for environments with external LBs? I have had a look and can't see anything that would facilitate this, but I could have missed it.

Comment 2 Brendan Shephard 2019-11-02 04:37:59 UTC
After tagging the relevant blocks and skipping the tags, we can see it completes successfully:

ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory --ssh-extra-args '-o UserKnownHostsFile=/dev/null' --ssh-common-args '-o StrictHostKeyChecking=no' -b update_steps_playbook.yaml --limit Controller --skip-tags haproxy_lb

[...]

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
controller-0               : ok=287  changed=124  unreachable=0    failed=0   



The only issue is that I'm not using an External LB. So there might be other issues that come into play that I'm unable to anticipate without a more thorough knowledge of the process.

Comment 3 Brendan Shephard 2019-11-02 04:41:32 UTC
Making this part of my comment public as well for anyone watching this BZ:


The tags I added are here:

 62 - block:
 63   - name: Check for haproxy Kolla configuration
[...]
 79   name: Set HAProxy upgrade facts
 80   tags: haproxy_lb


and

 81 - block:
 82   - command: cibadmin --query --xpath "//storage-mapping[@id='haproxy-cert']"
[...]
113   name: Mount TLS cert if needed
114   when:
115   - step|int == 1
116   - haproxy_containerized|bool
117   - is_bootstrap_node
118   tags: haproxy_lb

and

119 - block:
120   - name: Get docker Haproxy image
[...]
142   name: Haproxy fetch and retag container image for pacemaker
143   when: step|int == 2
144   tags: haproxy_lb

Comment 4 Brendan Shephard 2019-11-02 05:38:08 UTC
If I comment out the haproxy part of the docker-ha.yaml file and re-run the update prepare, it actually seems to remove all of the haproxy tasks:

- cp /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml ~/templates/
- vi ~/templates/docker-ha.yaml
Comment out the haproxy part:

  # HA Containers managed by pacemaker
  OS::TripleO::Services::CinderVolume: ../docker/services/pacemaker/cinder-volume.yaml
  OS::TripleO::Services::Clustercheck: ../docker/services/pacemaker/clustercheck.yaml
  #OS::TripleO::Services::HAproxy: ../docker/services/pacemaker/haproxy.yaml
  OS::TripleO::Services::MySQL: ../docker/services/pacemaker/database/mysql.yaml
  OS::TripleO::Services::RabbitMQ: ../docker/services/pacemaker/rabbitmq.yaml
  OS::TripleO::Services::Redis: ../docker/services/pacemaker/database/redis.yaml

- include that file at the end of my deployment script:
  -e /home/stack/templates/docker-ha.yaml \ 

- run the update prepare command


Then when I check swift, I can now see it commented out:
(undercloud) [stack@undercloud-0 ~]$ diff -u swift/environments/docker-ha.yaml swift_changed/environments/docker-ha.yaml 
--- swift/environments/docker-ha.yaml   2019-11-01 21:57:09.204555383 -0400
+++ swift_changed/environments/docker-ha.yaml   2019-11-02 01:23:16.948181306 -0400
@@ -16,7 +16,7 @@
   # HA Containers managed by pacemaker
   OS::TripleO::Services::CinderVolume: ../docker/services/pacemaker/cinder-volume.yaml
   OS::TripleO::Services::Clustercheck: ../docker/services/pacemaker/clustercheck.yaml
-  OS::TripleO::Services::HAproxy: ../docker/services/pacemaker/haproxy.yaml
+  #OS::TripleO::Services::HAproxy: ../docker/services/pacemaker/haproxy.yaml
   OS::TripleO::Services::MySQL: ../docker/services/pacemaker/database/mysql.yaml
   OS::TripleO::Services::RabbitMQ: ../docker/services/pacemaker/rabbitmq.yaml
   OS::TripleO::Services::Redis: ../docker/services/pacemaker/database/redis.yaml

Checking config download:
(undercloud) [stack@undercloud-0 ~]$ grep haproxy blah1/tripleo-_ktbDN-config/Controller/update_tasks.yaml 
(undercloud) [stack@undercloud-0 ~]$ grep haproxy tmpconfig/tripleo-YhykHR-config/Controller/update_tasks.yaml 
  - name: Check for haproxy Kolla configuration
    register: haproxy_kolla_config
      path: /var/lib/config-data/puppet-generated/haproxy
  - name: Check if haproxy is already containerized
      haproxy_containerized: '{{haproxy_kolla_config.stat.isdir | default(false)}}'
  tags: haproxy_lb
  - command: cibadmin --query --xpath "//storage-mapping[@id='haproxy-cert']"
    name: Check haproxy public certificate configuration in pacemaker
    register: haproxy_cert_mounted
  - name: Disable the haproxy cluster resource
      resource: haproxy-bundle
    when: haproxy_cert_mounted.rc == 6
      haproxy_public_cert_path: /etc/pki/tls/private/overcloud_endpoint.pem
      haproxy_public_tls_enabled: false
  - command: pcs resource bundle update haproxy-bundle storage-map add id=haproxy-cert
      source-dir={{ haproxy_public_cert_path }} target-dir=/var/lib/kolla/config_files/src-tls/{{
      haproxy_public_cert_path }} options=ro
    name: Add a bind mount for public certificate in the haproxy bundle
    when: haproxy_cert_mounted.rc == 6 and haproxy_public_tls_enabled|bool
  - name: Enable the haproxy cluster resource
      resource: haproxy-bundle
    when: haproxy_cert_mounted.rc == 6
  - haproxy_containerized|bool
  tags: haproxy_lb
      docker_image: 192.168.24.1:8787/rhosp13/openstack-haproxy:2019-10-31.1
      docker_image_latest: 192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest
    register: haproxy_image_id
    shell: docker images | awk '/haproxy.* pcmklatest/{print $3}' | uniq
      register: haproxy_containers_to_destroy
      shell: docker ps -a -q -f 'ancestor={{haproxy_image_id.stdout}}'
      with_items: '{{ haproxy_containers_to_destroy.stdout_lines }}'
      shell: docker rmi -f {{haproxy_image_id.stdout}}
    - haproxy_image_id.stdout != ''


This looks like the right solution. But I don't see this mentioned in the documentation. Can anyone confirm?

Comment 5 Brendan Shephard 2019-11-02 06:14:00 UTC
So doing it this way appears to also work for me (comment 4):

PLAY RECAP *********************************************************************
controller-0               : ok=280  changed=121  unreachable=0    failed=0   

Saturday 02 November 2019  02:11:42 -0400 (0:00:00.043)       0:21:36.744 ***** 
=============================================================================== 

Updated nodes - Controller
Success


I feel like this is probably the "right" way to go about it.

Comment 6 Brendan Shephard 2019-11-02 22:14:59 UTC
To get past this issue we:

- set:
resource_registry:
  OS::TripleO::Services::HAproxy: OS::Heat::None

- Included it in the update prepare command

- Verified that the haproxy parts of the update_tasks.yaml had been removed using openstack overcloud config download

- We checked in swift and could see that this had now been set in user-environment.yaml:
user-environment.yaml:  OS::TripleO::Services::HAproxy: OS::Heat::None

- Run the update: openstack overcloud update run --nodes Controller

This now completes successfully and the converge was also successful.
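
(As a rough sketch of that sequence, with a hypothetical file name for the override environment:)

- Create /home/stack/templates/no-haproxy.yaml (hypothetical name) containing:

  resource_registry:
    OS::TripleO::Services::HAproxy: OS::Heat::None

- Re-run the prepare with that file appended, then the update and the converge:

  openstack overcloud update prepare <same arguments as the original deploy> -e /home/stack/templates/no-haproxy.yaml
  openstack overcloud update run --nodes Controller
  openstack overcloud update converge <same arguments as the original deploy>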


I think we probably need to document that and include it with the "External Load Balancing for the Overcloud" document:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/external_load_balancing_for_the_overcloud/index

Comment 7 Luca Miccini 2019-11-04 15:38:59 UTC
I think we could also explicitly add 

  OS::TripleO::Services::HAproxy: OS::Heat::None

to the example environment files in THT (/usr/share/openstack-tripleo-heat-templates/environments/external-loadbalancer-vip.yaml etc).

Comment 8 Brendan Shephard 2019-11-04 20:42:57 UTC
(In reply to Luca Miccini from comment #7)
> I think we could also explicitly add 
> 
>   OS::TripleO::Services::HAproxy: OS::Heat::None
> 
> to the example environment files in THT
> (/usr/share/openstack-tripleo-heat-templates/environments/external-
> loadbalancer-vip.yaml etc).

Yeah, I agree with this. I'll commit it to master for review and we can debate it there with everyone.

Comment 9 Luca Miccini 2019-11-06 06:33:16 UTC
Merged upstream, cherry-picked to Queens here: https://review.opendev.org/693053

Comment 14 errata-xmlrpc 2020-03-10 11:22:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

