Bug 1819085

Summary: [osp16] overcloud update run fails on compute node on malformed block in Compute/update_tasks.yaml
Product: Red Hat OpenStack Reporter: Sofer Athlan-Guyot <sathlang>
Component: openstack-tripleo-heat-templates Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA QA Contact: David Rosenfeld <drosenfe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 16.0 (Train) CC: mburns
Target Milestone: z2 Keywords: Regression, Triaged
Target Release: 16.0 (Train on RHEL 8.1)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200405044622.ec9970c.el8ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-14 12:16:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sofer Athlan-Guyot 2020-03-31 07:40:42 UTC
Description of problem: Hi, the overcloud update run for the compute group fails with:

2020-03-26 05:38:59 | TASK [include_tasks] ***********************************************************
2020-03-26 05:38:59 | Thursday 26 March 2020  05:38:57 +0000 (0:00:00.185)       0:00:30.227 ******** 
2020-03-26 05:38:59 | fatal: [compute-0]: FAILED! => {"reason": "A malformed block was encountered while loading a block\n\nThe error appears to be in '/var/lib/mistral/4a5aae95-729d-493b-8932-1dc494624eee/Compute/update_tasks.yaml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  when: step|int == 3\n- block:\n  ^ here\n"}
2020-03-26 05:38:59 | fatal: [compute-1]: FAILED! => {"reason": "A malformed block was encountered while loading a block\n\nThe error appears to be in '/var/lib/mistral/4a5aae95-729d-493b-8932-1dc494624eee/Compute/update_tasks.yaml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  when: step|int == 3\n- block:\n  ^ here\n"}
2020-03-26 05:38:59 | 
2020-03-26 05:38:59 | PLAY RECAP *********************************************************************
2020-03-26 05:38:59 | compute-0                  : ok=16   changed=3    unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
2020-03-26 05:38:59 | compute-1                  : ok=15   changed=3    unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   

The tripleo-heat-templates (tht) package was updated to:

openstack-tripleo-heat-templates.noarch       11.3.2-0.20200324120625.c3a8eb4.el8ost          @rhelosp-16.0     


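The generated block is indeed faulty. Instead of a list of tasks, "block:" holds a single mapping that mixes the keys of two tasks (set_fact and include_role):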


- block:
    include_role:
      name: tripleo-systemd-wrapper
    loop:
    - cmd: $(if [ -f /usr/sbin/haproxy-systemd-wrapper ]; then echo "/usr/sbin/haproxy
        -Ds"; else echo "/usr/sbin/haproxy -Ws"; fi)
      kill_script: haproxy-kill
      name: ovn_metadata_haproxy
    loop_control:
      loop_var: ovn_wrapper_item
    name: set conditions
    set_fact:
      debug_enabled: true
      haproxy_wrapper_enabled: true
    vars:
      tripleo_systemd_wrapper_cmd: '{{ ovn_wrapper_item.cmd }}'
      tripleo_systemd_wrapper_config_bind_mount: /var/lib/config-data/puppet-generated/neutron:/etc/neutron:ro
      tripleo_systemd_wrapper_container_cli: '{{ container_cli }}'
      tripleo_systemd_wrapper_debug: '{{ debug_enabled }}'
      tripleo_systemd_wrapper_docker_additional_sockets:
      - /var/lib/openstack/docker.sock
      tripleo_systemd_wrapper_image_name: undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200324.1
      tripleo_systemd_wrapper_service_dir: /var/lib/neutron
      tripleo_systemd_wrapper_service_kill_script: '{{ ovn_wrapper_item.kill_script
        }}'
      tripleo_systemd_wrapper_service_name: '{{ ovn_wrapper_item.name }}'
  when: step|int == 1

Relative to Ia898cb8b1c888aca45ba58ab99e61885a1da4f4e, this is unexpected.
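
For illustration only: Ansible expects the value of block (and of rescue/always) to be a list of tasks, whereas the generated file above has a mapping there. A minimal Python sketch of a check for that shape follows. It works on already-parsed YAML data (for example, the output of a YAML loader). The function name find_malformed_blocks, the abbreviated faulty sample, and the well_formed layout are all hypothetical; they are not taken from tripleo-heat-templates or from the actual fix.

```python
from typing import Any, List

# Task keys whose value Ansible expects to be a list of tasks.
BLOCK_KEYS = ("block", "rescue", "always")


def find_malformed_blocks(tasks: Any, path: str = "tasks") -> List[str]:
    """Return one message per block-like section that is not a list of tasks."""
    if not isinstance(tasks, list):
        return [f"{path}: expected a list of tasks, got {type(tasks).__name__}"]
    problems: List[str] = []
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            continue  # not a task mapping; outside the scope of this sketch
        for key in BLOCK_KEYS:
            if key in task:
                # Recurse so that nested blocks are checked as well.
                child_path = f"{path}[{index}].{key}"
                problems.extend(find_malformed_blocks(task[key], child_path))
    return problems


# Abbreviated form of the faulty block from this report: the keys of two
# tasks ended up in one mapping under "block".
faulty = [
    {
        "block": {
            "include_role": {"name": "tripleo-systemd-wrapper"},
            "name": "set conditions",
            "set_fact": {"debug_enabled": True, "haproxy_wrapper_enabled": True},
        },
        "when": "step|int == 1",
    }
]

# Hypothetical well-formed layout (an assumption, not the actual fix):
# the same keys split into two tasks inside a list.
well_formed = [
    {
        "block": [
            {
                "name": "set conditions",
                "set_fact": {"debug_enabled": True, "haproxy_wrapper_enabled": True},
            },
            {"include_role": {"name": "tripleo-systemd-wrapper"}},
        ],
        "when": "step|int == 1",
    }
]

if __name__ == "__main__":
    for message in find_malformed_blocks(faulty):
        print(message)
```

Run against the rendered Compute/update_tasks.yaml after loading it, a check like this would flag the block shown above; it only inspects the shape of block sections and does not validate anything else about the tasks.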


How reproducible: All osp16 jobs are affected (from GA or z1).

Comment 11 errata-xmlrpc 2020-05-14 12:16:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2114