Description of problem:

When using an external LB as per
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/external_load_balancing_for_the_overcloud/index,
minor updates fail in the Controller/update_tasks.yaml playbook when it tries to work with the haproxy-bundle pacemaker resource, which doesn't exist because we're using an external LB.

Version-Release number of selected component (if applicable):

RHOSP13

How reproducible:

Easily, with enough resources.

Steps to Reproduce:
1. Deploy the overcloud with an external LB as per the document above.
2. Run the update prepare.
3. Run the update on the controllers:

   openstack overcloud update run --nodes Controller

Actual results:

TASK [Disable the haproxy cluster resource] ************************************
Friday 01 November 2019 20:56:22 -0400 (0:00:00.373) 0:37:50.310 *******
FAILED - RETRYING: Disable the haproxy cluster resource (5 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (4 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (3 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (2 retries left).
FAILED - RETRYING: Disable the haproxy cluster resource (1 retries left).
fatal: [controller-0]: FAILED! => {"attempts": 5, "changed": false, "error": "Error: bundle/clone/group/master/resource 'haproxy-bundle' does not exist\n", "msg": " Failed, to set the resource haproxy-bundle to the state disable", "output": "", "rc": 1}

PLAY RECAP *********************************************************************
controller-0               : ok=22   changed=3    unreachable=0    failed=1

Expected results:

If we include EnableLoadBalancer: False in the templates, the update_tasks related to haproxy should be skipped.

Additional info:

The best workaround I've been able to come up with is the following:

- openstack overcloud config download --config-dir blah
- vi blah/tripleo-*/Controller/update_tasks.yaml
- add a tags section to each haproxy block
- ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory --ssh-extra-args '-o UserKnownHostsFile=/dev/null' --ssh-common-args '-o StrictHostKeyChecking=no' -b update_steps_playbook.yaml --limit Controller --skip-tags haproxy_lb

This appears to have worked so far, but I'm not testing with an external LB; I'm just skipping the haproxy steps and assuming the same would hold if I had one. I need someone to confirm that there are no issues with this.

Additionally, I know we have a section that moves the VIPs in the playbook as well:

233 - name: Move virtual IPs to another node before stopping pacemaker

Do we think it would be safe to tag and skip this task as well?

Last question: is there a mechanism I'm missing that should be skipping this for environments with external LBs? I've had a look and can't see anything that would facilitate this, but I could have missed it.
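For anyone hitting this before a proper fix lands, here is roughly what one of the tagged blocks looks like after the edit. This is a sketch only: the resource name and retry count are taken from the error output above, the rest of the module parameters are inferred, and the exact block layout in the rendered update_tasks.yaml may differ (the real blocks are quoted in a later comment).

# Sketch of the workaround edit in Controller/update_tasks.yaml: add a
# tags: entry to each block that contains haproxy tasks, so that
# --skip-tags haproxy_lb skips them. "retries: 5" and "haproxy-bundle"
# match the failure above; the surrounding structure is illustrative.
- block:
    - name: Disable the haproxy cluster resource
      pacemaker_resource:
        resource: haproxy-bundle
        state: disable
        wait_for_resource: true
      register: output
      retries: 5
      until: output.rc == 0
  when: step|int == 1
  tags: haproxy_lb

With the tag in place, the same ansible-playbook invocation run with --skip-tags haproxy_lb leaves every task in the block untouched.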
After tagging the relevant blocks and skipping the tags, we can see it completes successfully:

ansible-playbook --module-path /usr/share/ansible-modules/ -i /usr/bin/tripleo-ansible-inventory --ssh-extra-args '-o UserKnownHostsFile=/dev/null' --ssh-common-args '-o StrictHostKeyChecking=no' -b update_steps_playbook.yaml --limit Controller --skip-tags haproxy_lb

[...]

PLAY RECAP *********************************************************************
controller-0               : ok=287  changed=124  unreachable=0    failed=0

The only issue is that I'm not using an external LB, so there might be other issues that come into play that I'm unable to anticipate without a more thorough knowledge of the process.
Making this part of my comment public as well for anyone watching this BZ. The tags I added are here:

 62 - block:
 63   - name: Check for haproxy Kolla configuration
[...]
 79   name: Set HAProxy upgrade facts
 80   tags: haproxy_lb

and

 81 - block:
 82   - command: cibadmin --query --xpath "//storage-mapping[@id='haproxy-cert']"
[...]
113   name: Mount TLS cert if needed
114   when:
115   - step|int == 1
116   - haproxy_containerized|bool
117   - is_bootstrap_node
118   tags: haproxy_lb

and

119 - block:
120   - name: Get docker Haproxy image
[...]
142   name: Haproxy fetch and retag container image for pacemaker
143   when: step|int == 2
144   tags: haproxy_lb
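One detail worth noting: Ansible tags applied at block level are inherited by every task inside the block, which is why a single tags: haproxy_lb per block is enough for --skip-tags haproxy_lb to catch all the nested haproxy tasks. A minimal, self-contained illustration (a hypothetical demo playbook, not part of the overcloud config):

# demo.yaml -- block-level tags are inherited by the tasks inside the block
- hosts: localhost
  gather_facts: false
  tasks:
    - block:
        - name: Inherits the haproxy_lb tag from the enclosing block
          debug:
            msg: "this is skipped when run with --skip-tags haproxy_lb"
      tags: haproxy_lb

# ansible-playbook demo.yaml --skip-tags haproxy_lb  ->  the debug task is skipped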
If I comment out the haproxy part of the docker-ha.yaml file and re-run the update prepare, it seems to remove all of the haproxy tasks:

- cp /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml ~/templates/
- vi ~/templates/docker-ha.yaml and comment out the haproxy line:

  # HA Containers managed by pacemaker
  OS::TripleO::Services::CinderVolume: ../docker/services/pacemaker/cinder-volume.yaml
  OS::TripleO::Services::Clustercheck: ../docker/services/pacemaker/clustercheck.yaml
  #OS::TripleO::Services::HAproxy: ../docker/services/pacemaker/haproxy.yaml
  OS::TripleO::Services::MySQL: ../docker/services/pacemaker/database/mysql.yaml
  OS::TripleO::Services::RabbitMQ: ../docker/services/pacemaker/rabbitmq.yaml
  OS::TripleO::Services::Redis: ../docker/services/pacemaker/database/redis.yaml

- include that file at the end of my deployment script:

  -e /home/stack/templates/docker-ha.yaml \

- run the update prepare command

Then when I check swift, I can now see it commented out:

(undercloud) [stack@undercloud-0 ~]$ diff -u swift/environments/docker-ha.yaml swift_changed/environments/docker-ha.yaml
--- swift/environments/docker-ha.yaml	2019-11-01 21:57:09.204555383 -0400
+++ swift_changed/environments/docker-ha.yaml	2019-11-02 01:23:16.948181306 -0400
@@ -16,7 +16,7 @@
   # HA Containers managed by pacemaker
   OS::TripleO::Services::CinderVolume: ../docker/services/pacemaker/cinder-volume.yaml
   OS::TripleO::Services::Clustercheck: ../docker/services/pacemaker/clustercheck.yaml
-  OS::TripleO::Services::HAproxy: ../docker/services/pacemaker/haproxy.yaml
+  #OS::TripleO::Services::HAproxy: ../docker/services/pacemaker/haproxy.yaml
   OS::TripleO::Services::MySQL: ../docker/services/pacemaker/database/mysql.yaml
   OS::TripleO::Services::RabbitMQ: ../docker/services/pacemaker/rabbitmq.yaml
   OS::TripleO::Services::Redis: ../docker/services/pacemaker/database/redis.yaml

Checking config download:

(undercloud) [stack@undercloud-0 ~]$ grep haproxy blah1/tripleo-_ktbDN-config/Controller/update_tasks.yaml
(undercloud) [stack@undercloud-0 ~]$ grep haproxy tmpconfig/tripleo-YhykHR-config/Controller/update_tasks.yaml
  - name: Check for haproxy Kolla configuration
    register: haproxy_kolla_config
      path: /var/lib/config-data/puppet-generated/haproxy
  - name: Check if haproxy is already containerized
      haproxy_containerized: '{{haproxy_kolla_config.stat.isdir | default(false)}}'
  tags: haproxy_lb
  - command: cibadmin --query --xpath "//storage-mapping[@id='haproxy-cert']"
    name: Check haproxy public certificate configuration in pacemaker
    register: haproxy_cert_mounted
  - name: Disable the haproxy cluster resource
      resource: haproxy-bundle
    when: haproxy_cert_mounted.rc == 6
    haproxy_public_cert_path: /etc/pki/tls/private/overcloud_endpoint.pem
    haproxy_public_tls_enabled: false
  - command: pcs resource bundle update haproxy-bundle storage-map add id=haproxy-cert source-dir={{ haproxy_public_cert_path }} target-dir=/var/lib/kolla/config_files/src-tls/{{ haproxy_public_cert_path }} options=ro
    name: Add a bind mount for public certificate in the haproxy bundle
    when: haproxy_cert_mounted.rc == 6 and haproxy_public_tls_enabled|bool
  - name: Enable the haproxy cluster resource
      resource: haproxy-bundle
    when: haproxy_cert_mounted.rc == 6
  - haproxy_containerized|bool
  tags: haproxy_lb
    docker_image: 192.168.24.1:8787/rhosp13/openstack-haproxy:2019-10-31.1
    docker_image_latest: 192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest
    register: haproxy_image_id
    shell: docker images | awk '/haproxy.* pcmklatest/{print $3}' | uniq
    register: haproxy_containers_to_destroy
    shell: docker ps -a -q -f 'ancestor={{haproxy_image_id.stdout}}'
    with_items: '{{ haproxy_containers_to_destroy.stdout_lines }}'
    shell: docker rmi -f {{haproxy_image_id.stdout}}
  - haproxy_image_id.stdout != ''

The first grep (against the config rendered with the modified docker-ha.yaml) returns nothing, while the second (the earlier config, which still contains the manually tagged haproxy tasks) matches throughout. This looks like the right solution, but I don't see it mentioned in the documentation. Can anyone confirm?
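Assuming two config-download directories like the ones above (tmpconfig rendered before the docker-ha.yaml change, blah1 after), the verification boils down to something like this; the directory names are just the throwaway ones from this session:

# Sketch: confirm the rendered update tasks no longer mention haproxy.
openstack overcloud config download --config-dir ~/blah1
grep -c haproxy ~/tmpconfig/tripleo-*-config/Controller/update_tasks.yaml   # many matches before the change
grep -c haproxy ~/blah1/tripleo-*-config/Controller/update_tasks.yaml || echo "no haproxy tasks rendered"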
So doing it this way (comment 4) appears to also work for me:

PLAY RECAP *********************************************************************
controller-0               : ok=280  changed=121  unreachable=0    failed=0

Saturday 02 November 2019 02:11:42 -0400 (0:00:00.043) 0:21:36.744 *****
===============================================================================

Updated nodes - Controller
Success

I feel like this is probably the "right" way to go about it.
To get past this issue we:

- Set:

  resource_registry:
    OS::TripleO::Services::HAproxy: OS::Heat::None

- Included it in the update prepare command.
- Verified with openstack overcloud config download that the haproxy parts of update_tasks.yaml had been removed.
- Checked in swift and confirmed that this had now been set in user-environment.yaml:

  user-environment.yaml:    OS::TripleO::Services::HAproxy: OS::Heat::None

- Ran the update:

  openstack overcloud update run --nodes Controller

This now completes successfully, and the converge was also successful. I think we need to document this and include it with the "External Load Balancing for the Overcloud" document:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/external_load_balancing_for_the_overcloud/index
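Concretely, that amounts to a small environment file passed to the prepare step. The file name below is made up, and the elided arguments stand for whatever template/environment arguments the deployment already uses:

$ cat ~/templates/no-haproxy.yaml        # hypothetical name; any name works
resource_registry:
  OS::TripleO::Services::HAproxy: OS::Heat::None

$ openstack overcloud update prepare \
    <existing deployment arguments> \
    -e /home/stack/templates/no-haproxy.yaml

Since later -e files override earlier ones, the override needs to come after docker-ha.yaml (or any other file that maps OS::TripleO::Services::HAproxy) in the argument list.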
I think we could also explicitly add

  OS::TripleO::Services::HAproxy: OS::Heat::None

to the example environment files in THT (/usr/share/openstack-tripleo-heat-templates/environments/external-loadbalancer-vip.yaml etc.).
(In reply to Luca Miccini from comment #7)
> I think we could also explicitly add
>
>   OS::TripleO::Services::HAproxy: OS::Heat::None
>
> to the example environment files in THT
> (/usr/share/openstack-tripleo-heat-templates/environments/external-loadbalancer-vip.yaml etc).

Yeah, I agree with this. I'll commit it to master for review and we can debate it there with everyone.
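The shape of the proposed change (a sketch only; the actual patch is in the review linked in the next comment):

# environments/external-loadbalancer-vip.yaml -- sketch of the proposed
# addition; the rest of the file (parameter_defaults etc.) stays as it is.
resource_registry:
  OS::TripleO::Services::HAproxy: OS::Heat::None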
Merged upstream; cherry-picked to Queens here: https://review.opendev.org/693053
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760