+++ This bug was initially created as a clone of Bug #2135615 +++

--- Additional comment from Ravi Singh on 2022-10-30 12:58:40 UTC ---

I am attempting the following heterogeneous combination:

-> 1 node (overcloud-controller01) booted with the combination overcloud-hardened image + UEFI.
-> The other controllers (controller 2 and 3) using overcloud-full partitioned + BIOS.

I believe this should be a supported scenario, since customers may well want to reuse old hardware that supports only BIOS, or may have some other reason why overcloud-full partitioned + BIOS suits them. Please correct me if this is not a supported scenario and only the hardened image + UEFI combination should be used.

Coming back to the issue: during provisioning the nodes boot successfully, but the playbook later fails when it tries to use the growvols utility on overcloud-controller-2 and overcloud-controller-3. It fails because this image (overcloud-full) does not contain the growvols utility; the hardened UEFI image does.

+++
2022-10-30 14:20:56.046435 | 525400db-7407-0b48-dcf8-000000000011 | TASK | Stopping playbook when no growvols utility is found
2022-10-30 14:20:56.053545 | 525400db-7407-0b48-dcf8-000000000011 | SKIPPED | Stopping playbook when no growvols utility is found | overcloud-controller-1
[WARNING]: ('overcloud-controller-1', '525400db-7407-0b48-dcf8-000000000011') missing from stats
2022-10-30 14:20:56.058177 | 525400db-7407-0b48-dcf8-000000000012 | TASK | Setting growvols path
2022-10-30 14:20:56.087519 | 525400db-7407-0b48-dcf8-000000000012 | OK | Setting growvols path | overcloud-controller-1
2022-10-30 14:20:56.089956 | 525400db-7407-0b48-dcf8-000000000012 | TIMING | Setting growvols path | overcloud-controller-1 | 0:00:50.947028 | 0.03s
2022-10-30 14:20:56.092011 | 525400db-7407-0b48-dcf8-000000000012 | FATAL | Setting growvols path | overcloud-controller-2 | error={"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to be in '/usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml': line 76, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Setting growvols path\n ^ here\n"}
2022-10-30 14:20:56.092891 | 525400db-7407-0b48-dcf8-000000000012 | TIMING | Setting growvols path | overcloud-controller-2 | 0:00:50.950017 | 0.02s
2022-10-30 14:20:56.104052 | 525400db-7407-0b48-dcf8-000000000012 | FATAL | Setting growvols path | overcloud-controller-3 | error={"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to be in '/usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml': line 76, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Setting growvols path\n ^ here\n"}
2022-10-30 14:20:56.104663 | 525400db-7407-0b48-dcf8-000000000012 | TIMING | Setting growvols path | overcloud-controller-3 | 0:00:50.961794 | 0.02s
+++

Task which got executed -> https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/playbooks/cli-overcloud-node-growvols.yaml#L76-L78

I am able to fix this by adding "when: find_growvols.rc == 0" to the following highlighted tasks, which means the command is executed only on nodes where the growvols utility is available.
~~~
- name: Find the growvols utility
  shell: >
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
    which growvols
  failed_when: false
  become: true
  register: find_growvols

- name: Stopping playbook when no growvols utility is found
  meta: end_play
  when: find_growvols.rc != 0

- name: Setting growvols path
  set_fact:
    growvols_path: "{{ find_growvols.stdout_lines[0] }}"
  when: find_growvols.rc == 0  # <<< added

- name: "Running {{ growvols_path }} {{growvols_args}}"
  shell: "{{ growvols_path }} --yes {{growvols_args}}"
  become: true
  register: run_growvols
  when: find_growvols.rc == 0  # <<< added
~~~

After the above changes, provisioning is successful.

~~~
PLAY RECAP *********************************************************************
overcloud-controller-1  : ok=20 changed=8 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
overcloud-controller-2  : ok=20 changed=8 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
overcloud-controller-3  : ok=20 changed=8 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
overcloud-novacompute-0 : ok=20 changed=8 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
~~~

The homogeneous combination (all nodes with the same image + boot_mode) works fine; I see the problem only with the heterogeneous combination.

Do you see this as a supported scenario? If yes, I can create a patch to fix it in a manner similar to the one described above.
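For comparison, a per-host variant of the same guard could be sketched as follows. `meta: end_play` ends the play for all hosts, and the log above shows its condition reported as skipped only for overcloud-controller-1, which would explain why the nodes without growvols went on to the "Setting growvols path" task and failed on the empty `stdout_lines` list. The sketch assumes Ansible 2.8 or later, where `meta: end_host` is available. It is an untested illustration and not necessarily the patch that was merged; the reworded task name and the extra `stdout_lines | length` check are assumptions added here for safety, not taken from the playbook.

```yaml
# Sketch only: assumes Ansible >= 2.8 (meta: end_host); not the merged fix.
- name: Find the growvols utility
  shell: >
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
    which growvols
  failed_when: false
  become: true
  register: find_growvols

# end_host stops the play for the current host only, without failing it,
# so hosts that do have growvols keep running the remaining tasks.
- name: Ending play on hosts where no growvols utility is found
  meta: end_host
  when: find_growvols.rc != 0 or (find_growvols.stdout_lines | length) == 0

# Only hosts where growvols was found reach this point, so index 0 exists.
- name: Setting growvols path
  set_fact:
    growvols_path: "{{ find_growvols.stdout_lines[0] }}"
```

With this shape, the later tasks would not need their own "when: find_growvols.rc == 0" guards, at the cost of depending on `end_host` being supported by the Ansible version in use.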
Wallaby backport proposed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days