Bug 1922132

Summary: The OSP13->16.1 Upgrade fails during the overcloud system upgrade of controller-0 when running tripleo_provision_mcelog role
Product: Red Hat OpenStack Reporter: Leonid Natapov <lnatapov>
Component: openstack-tripleo-heat-templates Assignee: Jesse Pretorius <jpretori>
Status: CLOSED ERRATA QA Contact: Joe H. Rahme <jhakimra>
Severity: medium Docs Contact:
Priority: medium    
Version: 16.1 (Train) CC: astupnik, jpretori, lbezdick, mburns, shtiwari, spower
Target Milestone: z4 Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210104205662.el8ost.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-17 15:36:38 UTC Type: Bug

Description Leonid Natapov 2021-01-29 10:20:27 UTC
The OSP13->16.1 upgrade fails with the following error during the overcloud system upgrade of controller-0:

2021-01-28 22:16:41 | PLAY [Clear cached facts] ******************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | TASK [Gathering Facts] *********************************************************
2021-01-28 22:16:41 | Thursday 28 January 2021  22:16:35 +0000 (0:00:00.118)       0:00:00.118 ****** 
2021-01-28 22:16:41 | [WARNING]: Failure using method (v2_runner_on_start) in callback plugin
2021-01-28 22:16:41 | (<ansible.plugins.callback.tripleo.CallbackModule object at 0x7ff42e1fb048>):
2021-01-28 22:16:41 | 'show_per_host_start'
2021-01-28 22:16:41 | ok: [controller-0]
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Gather facts from undercloud] ********************************************
2021-01-28 22:16:41 | skipping: no hosts matched
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Gather facts from overcloud] *********************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Load global variables] ***************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | TASK [include_vars] ************************************************************
2021-01-28 22:16:41 | Thursday 28 January 2021  22:16:38 +0000 (0:00:02.824)       0:00:02.943 ****** 
2021-01-28 22:16:41 | ok: [controller-0] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Render all_nodes data as group_vars for overcloud] ***********************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | TASK [Render all_nodes data as group_vars for overcloud] ***********************
2021-01-28 22:16:41 | Thursday 28 January 2021  22:16:38 +0000 (0:00:00.089)       0:00:03.032 ****** 
2021-01-28 22:16:41 | ok: [controller-0] => {"all_nodes": null, "changed": false}
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Set all_nodes data as group_vars for overcloud] **************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | TASK [Set all_nodes data as group_vars for overcloud] **************************
2021-01-28 22:16:41 | Thursday 28 January 2021  22:16:40 +0000 (0:00:02.170)       0:00:05.203 ****** 
2021-01-28 22:16:41 | ok: [controller-0] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Manage SELinux] **********************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Generate /etc/hosts] *****************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Common roles for TripleO servers] ****************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Overcloud deploy step tasks for step 0] **********************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Server pre deployment steps] *********************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Server deployments] ******************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY [Host prep steps] *********************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | TASK [tripleo_provision_mcelog : Gather variables for each operating system] ***
2021-01-28 22:16:41 | Thursday 28 January 2021  22:16:40 +0000 (0:00:00.197)       0:00:05.401 ****** 
2021-01-28 22:16:41 | fatal: [controller-0]: FAILED! => {"msg": "No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found"}
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | NO MORE HOSTS LEFT *************************************************************
2021-01-28 22:16:41 | 
2021-01-28 22:16:41 | PLAY RECAP *********************************************************************
2021-01-28 22:16:41 | controller-0               : ok=4    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Comment 3 Jesse Pretorius 2021-01-29 11:17:13 UTC
Looking into the undercloud's /var/lib/mistral/<uuid>/deploy_steps_playbook.yaml

It's failing in this section:

- hosts: Controller:overcloud
  name: Host prep steps
  become: true
  gather_facts: "{{ gather_facts | default(false) }}"
  any_errors_fatal: yes
  vars:
    bootstrap_server_id: <redacted>
    deploy_identifier: <redacted>
    enable_debug: True
    enable_puppet: True
    enable_paunch: True
    container_cli: podman
    container_log_stdout_path: /var/log/containers/stdouts
    container_healthcheck_disabled: False
    docker_puppet_debug: False
    docker_puppet_process_count: 8
    docker_puppet_mount_host_puppet: True
  tasks:
    - name: Controller Host prep block
      when:
        - tripleo_role_name == 'Controller'
      block:
        - name: Controller Host prep steps
          delegate_to: localhost
          run_once: true
          debug:
            msg: Use --start-at-task 'Controller Host prep steps' to resume from this task
        - import_tasks: Controller/host_prep_tasks.yaml

***

And in this part of Controller/host_prep_tasks.yaml:

- import_role:
    name: tripleo_provision_mcelog
  name: import provision_mcelog
  when: false

***

This then runs https://opendev.org/openstack/tripleo-ansible/src/branch/stable/train/tripleo_ansible/roles/tripleo_provision_mcelog/tasks/main.yml, which fails at the 'Gather variables for each operating system' task. The role has no vars files, so this task should simply be skipped.

Comment 4 Jesse Pretorius 2021-01-29 11:27:44 UTC
Using the following playbook I can replicate the behaviour:

- hosts: localhost
  gather_facts: "{{ gather_facts }}"
  connection: local
  tasks:
    - name: Gather variables for each operating system
      include_vars: "{{ item }}"
      with_first_found:
        - skip: true
          files:
            - "{{ ansible_distribution | lower }}-{{ ansible_distribution_version | lower }}.yml"
            - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version | lower }}.yml"
            - "{{ ansible_os_family | lower }}-{{ ansible_distribution_major_version | lower }}.yml"
            - "{{ ansible_distribution | lower }}.yml"
            - "{{ ansible_os_family | lower }}-{{ ansible_distribution_version.split('.')[0] }}.yml"
            - "{{ ansible_os_family | lower }}.yml"
      tags:
        - always

Example execution and output:

jpretori@jpretori-mac tripleo-heat-templates % ansible-playbook /tmp/test.yml -e gather_facts=yes
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
[WARNING]: Found variable using reserved name: gather_facts

PLAY [localhost] *******************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [localhost]

TASK [Gather variables for each operating system] **********************************************************************************************************************************************************

PLAY RECAP *************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   

jpretori@jpretori-mac tripleo-heat-templates % ansible-playbook /tmp/test.yml -e gather_facts=no 
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
[WARNING]: Found variable using reserved name: gather_facts

PLAY [localhost] *******************************************************************************************************************************************************************************************

TASK [Gather variables for each operating system] **********************************************************************************************************************************************************
fatal: [localhost]: FAILED! => 
  msg: No file was found when using first_found. Use errors='ignore' to allow this task to be skipped if no files are found

PLAY RECAP *************************************************************************************************************************************************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
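
The two runs above differ only in whether facts were gathered, which suggests the lookup fails when the distribution facts used in the file names are undefined. Below is a minimal sketch of a variant of the reproducer that gathers the minimal fact subset explicitly before the lookup. It assumes the undefined facts are the trigger and that no matching vars files exist next to the playbook. The file list is shortened for illustration, and this is not necessarily the change that shipped in the advisory.

```yaml
# Hypothetical variant of the reproducer above; illustrative only.
- hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    # Collect only the minimal fact subset, which includes the
    # distribution facts (ansible_distribution, ansible_os_family, ...).
    - name: Gather minimal facts
      setup:
        gather_subset:
          - '!all'
          - min
      tags:
        - always

    # Same task as in the reproducer, now templated with facts defined.
    - name: Gather variables for each operating system
      include_vars: "{{ item }}"
      with_first_found:
        - skip: true
          files:
            - "{{ ansible_distribution | lower }}-{{ ansible_distribution_version | lower }}.yml"
            - "{{ ansible_distribution | lower }}.yml"
            - "{{ ansible_os_family | lower }}.yml"
      tags:
        - always
```

With the facts defined, the expectation is that the task is skipped rather than failing, as in the gather_facts=yes run above.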

Comment 18 errata-xmlrpc 2021-03-17 15:36:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817