Bug 1772201 - After a minor update to 13z9, "overcloud deploy" fails on compute steps
Summary: After a minor update to 13z9, "overcloud deploy" fails on compute steps
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Alex Schultz
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1772955
Depends On:
Blocks:
 
Reported: 2019-11-13 21:33 UTC by David Hill
Modified: 2023-10-06 18:46 UTC (History)
CC List: 6 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-21.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:22:07 UTC
Target Upstream Version:
Embargoed:


Links
System                              ID              Private  Priority  Status  Summary                           Last Updated
Launchpad                           1852606         0        None      None    None                              2019-11-14 16:03:18 UTC
OpenStack gerrit                    695758          0        'None'    MERGED  Check that facter.conf is a file  2021-01-15 16:30:10 UTC
Red Hat Issue Tracker               OSP-28291       0        None      None    None                              2023-09-07 21:02:12 UTC
Red Hat Knowledge Base (Solution)   4584141         0        None      None    None                              2019-11-13 21:45:57 UTC
Red Hat Product Errata              RHBA-2020:0760  0        None      None    None                              2020-03-10 11:22:45 UTC

Description David Hill 2019-11-13 21:33:12 UTC
Description of problem:
After a minor update to 13z9, "overcloud deploy" fails on compute steps:


2019-11-12 22:24:24Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh-ComputeDeployment_Step1-ncqbfiswytpn.2]: CREATE_FAILED  Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2019-11-12 22:24:24Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh-ComputeDeployment_Step1-ncqbfiswytpn]: UPDATE_FAILED  Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2019-11-12 22:24:25Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh.ComputeDeployment_Step1]: UPDATE_FAILED  resources.ComputeDeployment_Step1: Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2019-11-12 22:24:25Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh]: UPDATE_FAILED  Resource UPDATE failed: resources.ComputeDeployment_Step1: Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2019-11-12 22:24:25Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.ComputeDeployment_Step1: resources.AllNodesDeploySteps.Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2019-11-12 22:24:25Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.ComputeDeployment_Step1: resources.AllNodesDeploySteps.Resource CREATE failed: Error: resources[2]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2019-11-12 22:24:27Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh-ComputeDeployment_Step1-ncqbfiswytpn.1]: SIGNAL_IN_PROGRESS  Signal: deployment f5a25447-4dd1-4955-ae55-38e3fd7998a7 failed (2)
2019-11-12 22:24:28Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh-ComputeDeployment_Step1-ncqbfiswytpn.1]: CREATE_FAILED  Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2019-11-12 22:24:28Z [overcloud-AllNodesDeploySteps-uby2uxg5d4wh-ComputeDeployment_Step1-ncqbfiswytpn]: UPDATE_FAILED  Resource CREATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2

 Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: f5a25447-4dd1-4955-ae55-38e3fd7998a7
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Create puppet caching structures] ****************************************
    changed: [localhost]

    TASK [Write facter cache config] ***********************************************
    fatal: [localhost]: FAILED! => {"changed": false, "msg": "can not use content with a dir as dest"}
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/0a1e882a-daa3-4132-84c3-ed9396b5fcf7_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=27   changed=9    unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |

Version-Release number of selected component (if applicable):


How reproducible:
This environment

Steps to Reproduce:
1. Update from z8 to z9, complete the update, and then run an 'overcloud deploy'

Actual results:
Fails 

Expected results:
Succeeds

Additional info:

Comment 2 Alex Schultz 2019-11-14 15:49:13 UTC
It looks as though the configs were generated for fluentd/collectd/sensu prior to the facter cache being generated. The execution of the container configs for these containers would have created the facter.conf as a directory if it didn't previously exist.

In the logs we see:

Nov 11 15:17:49 ocd97-compute-0 systemd: Started libcontainer container 40abe582f905223301c821238d8c2272ac54170d30321b134243c508d067101f.
Nov 11 15:17:49 ocd97-compute-0 journal: + mkdir -p /etc/puppet
Nov 11 15:17:49 ocd97-compute-0 journal: + mkdir -p /etc/puppet
Nov 11 15:17:49 ocd97-compute-0 journal: + mkdir -p /etc/puppet
Nov 11 15:17:49 ocd97-compute-0 journal: + cp -a /tmp/puppet-etc/auth.conf /tmp/puppet-etc/hiera.yaml /tmp/puppet-etc/hieradata /tmp/puppet-etc/modules /tmp/puppet-etc/puppet.conf /tmp/puppet-etc/ssl /etc/puppet
Nov 11 15:17:49 ocd97-compute-0 journal: + cp -a /tmp/puppet-etc/auth.conf /tmp/puppet-etc/hiera.yaml /tmp/puppet-etc/hieradata /tmp/puppet-etc/modules /tmp/puppet-etc/puppet.conf /tmp/puppet-etc/ssl /etc/puppet
Nov 11 15:17:49 ocd97-compute-0 journal: + cp -a /tmp/puppet-etc/auth.conf /tmp/puppet-etc/hiera.yaml /tmp/puppet-etc/hieradata /tmp/puppet-etc/modules /tmp/puppet-etc/puppet.conf /tmp/puppet-etc/ssl /etc/puppet
Nov 11 15:17:49 ocd97-compute-0 journal: + rm -Rf /etc/puppet/ssl
Nov 11 15:17:49 ocd97-compute-0 journal: + rm -Rf /etc/puppet/ssl
Nov 11 15:17:49 ocd97-compute-0 journal: + rm -Rf /etc/puppet/ssl
Nov 11 15:17:49 ocd97-compute-0 journal: + echo '{"step": 6}'
Nov 11 15:17:49 ocd97-compute-0 journal: + echo '{"step": 6}'
Nov 11 15:17:49 ocd97-compute-0 journal: + echo '{"step": 6}'
Nov 11 15:17:49 ocd97-compute-0 journal: + TAGS=
Nov 11 15:17:49 ocd97-compute-0 journal: + TAGS=
Nov 11 15:17:49 ocd97-compute-0 journal: + '[' -n file,file_line,concat,augeas,cron,sensu_rabbitmq_config,sensu_client_config,sensu_check_config,sensu_check ']'
Nov 11 15:17:49 ocd97-compute-0 journal: + '[' -n file,file_line,concat,augeas,cron,collectd_client_config ']'
Nov 11 15:17:49 ocd97-compute-0 journal: + TAGS='--tags file,file_line,concat,augeas,cron,collectd_client_config'
Nov 11 15:17:49 ocd97-compute-0 journal: + TAGS='--tags file,file_line,concat,augeas,cron,sensu_rabbitmq_config,sensu_client_config,sensu_check_config,sensu_check'
Nov 11 15:17:49 ocd97-compute-0 journal: + TAGS=
Nov 11 15:17:49 ocd97-compute-0 journal: + origin_of_time=/var/lib/config-data/collectd.origin_of_time
Nov 11 15:17:49 ocd97-compute-0 journal: + touch /var/lib/config-data/collectd.origin_of_time
Nov 11 15:17:49 ocd97-compute-0 journal: + origin_of_time=/var/lib/config-data/sensu.origin_of_time
Nov 11 15:17:49 ocd97-compute-0 journal: + touch /var/lib/config-data/sensu.origin_of_time
Nov 11 15:17:49 ocd97-compute-0 journal: + '[' -n file,file_line,concat,augeas,cron,config ']'
Nov 11 15:17:49 ocd97-compute-0 journal: + TAGS='--tags file,file_line,concat,augeas,cron,config'
Nov 11 15:17:49 ocd97-compute-0 journal: + origin_of_time=/var/lib/config-data/fluentd.origin_of_time
Nov 11 15:17:49 ocd97-compute-0 journal: + touch /var/lib/config-data/fluentd.origin_of_time


This is the start of docker-puppet.py, which mounts facter.conf into the container used to run this script. The facter.conf creation doesn't happen until 16:02:

Nov 11 16:02:49 ocd97-compute-0 python: ansible-stat Invoked with checksum_algorithm=sha1 get_checksum=True follow=False checksum_algo=sha1 path=/var/lib/container-puppet/puppetlabs/facter.conf get_md5=None get_mime=True get_attributes=True

Since facter.conf is now a directory, the deployment fails:
TASK [Write facter cache config] ***********************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "can not use content with a dir as dest"}
	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/ceeec832-1552-48ac-9998-a748e883641d_playbook.retry
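
The failure mode can be illustrated outside of Ansible. The following is a minimal Python sketch, not taken from the deployment tooling: a temporary directory stands in for /var/lib/container-puppet/puppetlabs, and the helper names (write_facter_conf, reproduce) and the file content are hypothetical. It only shows that once facter.conf exists as a directory, a plain content write to that path raises an error.

```python
import os
import tempfile


def write_facter_conf(path, content):
    """Write content to path as a regular file (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write(content)


def reproduce(base_dir):
    """Return the name of the exception raised when facter.conf is a directory."""
    conf_path = os.path.join(base_dir, "facter.conf")
    # Simulate the state described above: the path was created as a
    # directory before the file was ever written.
    os.mkdir(conf_path)
    try:
        write_facter_conf(conf_path, "placeholder content\n")
    except OSError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # On Linux this is expected to report IsADirectoryError.
        print(reproduce(tmp))
```

The write cannot succeed until the directory is removed or replaced, which is consistent with the task failing on every retry in this environment.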


I'll need to look into the deployment templates to understand why collectd/sensu/fluentd config execution occurred prior to the actual deployment.


@dhill, do you know if the customer is using an out-of-band configuration for collectd/sensu/fluentd?

Comment 3 Alex Schultz 2019-11-14 15:50:48 UTC
Also, can we get the templates and undercloud logs?

Comment 4 Alex Schultz 2019-11-14 16:03:18 UTC
In the meantime, I'll work on adding some additional checks to ensure we don't hit this case again.
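
As an illustration of the kind of check meant here, the sketch below verifies that the target is not a directory before writing to it, and removes the directory if it is. This is an assumption about the general approach only; it is not the merged change (gerrit 695758, "Check that facter.conf is a file"), and the function names ensure_regular_file_target and write_config, as well as the file content, are hypothetical.

```python
import os
import shutil
import tempfile


def ensure_regular_file_target(path):
    """Remove path if it is a real directory so a file can be written there.

    Returns True if a directory was removed, False otherwise.
    """
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
        return True
    return False


def write_config(path, content):
    """Write content to path, clearing a stray directory first (hypothetical)."""
    ensure_regular_file_target(path)
    with open(path, "w") as handle:
        handle.write(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "facter.conf")
        os.mkdir(conf)  # simulate the broken state from this bug
        write_config(conf, "placeholder content\n")
        print(os.path.isfile(conf))
```

Removing the directory discards anything inside it, so a check like this is only reasonable when the directory is known to be an empty leftover, as it appears to be in this case.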

Comment 9 Alex Schultz 2020-01-30 21:19:45 UTC
*** Bug 1772955 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2020-03-10 11:22:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

