Bug 2138665 - [OSP17.0] Growvols playbook fails with heterogeneous images
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Baker
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-10-30 21:38 UTC by Steve Baker
Modified: 2023-12-15 04:25 UTC (History)

Fixed In Version: tripleo-ansible-3.3.1-1.20230110200858.a09f237.el9ost
Doc Text:
Clone Of: 2135615
Environment:
Last Closed: 2023-08-16 01:12:28 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 862984 | 0 | None | MERGED | Don't use meta:end_play after growvols check | 2023-01-18 20:58:24 UTC
OpenStack gerrit | 864445 | 0 | None | MERGED | Don't use meta:end_play after growvols check | 2023-01-18 20:58:27 UTC
Red Hat Issue Tracker | OSP-19800 | 0 | None | None | None | 2022-10-30 21:42:47 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:12:52 UTC

Description Steve Baker 2022-10-30 21:38:17 UTC
+++ This bug was initially created as a clone of Bug #2135615 +++

--- Additional comment from Ravi Singh on 2022-10-30 12:58:40 UTC ---

Attempting the following heterogeneous combination:

-> 1 node (overcloud-controller01) booted with the overcloud-hardened image + UEFI.
-> The other controllers (controller 2 and 3) booted with the overcloud-full partitioned image + BIOS.

I believe this should be a supported scenario, since customers may well want to use older hardware that supports only BIOS, or may have other reasons why overcloud-full partitioned + BIOS suits them.

Please correct me if this is not a supported scenario and only the hardened image + UEFI combination should be used.

Coming back to the issue: during provisioning the nodes boot successfully, but the playbook later fails when it tries to run the growvols utility on overcloud-controller-2 and overcloud-controller-3, because their image (overcloud-full) does not contain the growvols utility.

The hardened-uefi image, by contrast, contains it.

+++
2022-10-30 14:20:56.046435 | 525400db-7407-0b48-dcf8-000000000011 |       TASK | Stopping playbook when no growvols utility is found
2022-10-30 14:20:56.053545 | 525400db-7407-0b48-dcf8-000000000011 |    SKIPPED | Stopping playbook when no growvols utility is found | overcloud-controller-1
[WARNING]: ('overcloud-controller-1', '525400db-7407-0b48-dcf8-000000000011')
missing from stats
2022-10-30 14:20:56.058177 | 525400db-7407-0b48-dcf8-000000000012 |       TASK | Setting growvols path
2022-10-30 14:20:56.087519 | 525400db-7407-0b48-dcf8-000000000012 |         OK | Setting growvols path | overcloud-controller-1
2022-10-30 14:20:56.089956 | 525400db-7407-0b48-dcf8-000000000012 |     TIMING | Setting growvols path | overcloud-controller-1 | 0:00:50.947028 | 0.03s
2022-10-30 14:20:56.092011 | 525400db-7407-0b48-dcf8-000000000012 |      FATAL | Setting growvols path | overcloud-controller-2 | error={"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to be in '/usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml': line 76, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Setting growvols path\n      ^ here\n"}
2022-10-30 14:20:56.092891 | 525400db-7407-0b48-dcf8-000000000012 |     TIMING | Setting growvols path | overcloud-controller-2 | 0:00:50.950017 | 0.02s
2022-10-30 14:20:56.104052 | 525400db-7407-0b48-dcf8-000000000012 |      FATAL | Setting growvols path | overcloud-controller-3 | error={"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to be in '/usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml': line 76, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Setting growvols path\n      ^ here\n"}
2022-10-30 14:20:56.104663 | 525400db-7407-0b48-dcf8-000000000012 |     TIMING | Setting growvols path | overcloud-controller-3 | 0:00:50.961794 | 0.02s
+++

Task that was executed: https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/playbooks/cli-overcloud-node-growvols.yaml#L76-L78
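
The "list object has no element 0" error is consistent with "which" writing nothing to stdout when the command is not found, which leaves find_growvols.stdout_lines empty on those hosts. A minimal way to observe this outside the deployment, as a sketch only: the play below, the command name "no-such-utility" and the registered variable "lookup" are hypothetical and are not part of tripleo-ansible.

```yaml
# Hypothetical stand-alone play, not taken from tripleo-ansible.
- hosts: localhost
  gather_facts: false
  tasks:
    # Same pattern as the growvols lookup: never fail, just register the result.
    - name: Look up a command that is assumed not to exist
      shell: which no-such-utility
      failed_when: false
      register: lookup

    # Prints the return code and the (expected to be empty) list of stdout lines.
    - name: Show what the lookup registered
      debug:
        msg: "rc={{ lookup.rc }} stdout_lines={{ lookup.stdout_lines }}"
```

Indexing an empty stdout_lines list with [0], as the "Setting growvols path" task does, would then raise the undefined-variable error seen in the log.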

I was able to fix this by adding "when: find_growvols.rc == 0" to the tasks highlighted below, so that they run only on nodes where the growvols utility is available.

~~~
    - name: Find the growvols utility
      shell: >
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
        which growvols
      failed_when: false
      become: true
      register: find_growvols

    - name: Stopping playbook when no growvols utility is found
      meta: end_play
      when: find_growvols.rc != 0

    - name: Setting growvols path
      set_fact:
        growvols_path: "{{ find_growvols.stdout_lines[0] }}"
      when: find_growvols.rc == 0  # <-- added

    - name: "Running {{ growvols_path }} {{growvols_args}}"
      shell: "{{ growvols_path }} --yes {{growvols_args}}"
      become: true
      register: run_growvols
      when: find_growvols.rc == 0  # <-- added
~~~

With the above changes, provisioning succeeds.
~~~
PLAY RECAP *********************************************************************
overcloud-controller-1     : ok=20   changed=8    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   
overcloud-controller-2     : ok=20   changed=8    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   
overcloud-controller-3     : ok=20   changed=8    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0   
overcloud-novacompute-0    : ok=20   changed=8    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0 
~~~
The homogeneous combination (all nodes with the same image and boot mode) works fine; I see the problem only with the heterogeneous combination.

Do you see this as a supported scenario? If so, I can create a patch that fixes it in a manner similar to the one described above.
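
One possible shape for such a patch, as a sketch only: the growvols-dependent tasks are grouped in a block guarded by a single per-host condition, and the "meta: end_play" task is dropped, in line with the title of the linked gerrit changes. The block name is hypothetical, the run task is given a static name for brevity, and this is not necessarily how the merged change is written.

```yaml
    - name: Find the growvols utility
      shell: >
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
        which growvols
      failed_when: false
      become: true
      register: find_growvols

    # Hypothetical grouping: the condition is evaluated per host, so hosts
    # without growvols skip both tasks while the other hosts continue.
    - name: Grow volumes only where growvols is available
      when: find_growvols.rc == 0
      block:
        - name: Setting growvols path
          set_fact:
            growvols_path: "{{ find_growvols.stdout_lines[0] }}"

        - name: Running growvols
          shell: "{{ growvols_path }} --yes {{ growvols_args }}"
          become: true
          register: run_growvols
```

A "when" placed on a block is applied to every task inside it, so the condition is written once instead of being repeated on each task.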

Comment 1 Steve Baker 2022-11-14 21:37:10 UTC
Wallaby backport proposed

Comment 13 errata-xmlrpc 2023-08-16 01:12:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

Comment 14 Red Hat Bugzilla 2023-12-15 04:25:46 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

