Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1892679

Summary: 'Overcloud Deployed with error' but 'openstack overcloud failures' shows no ansible error log.
Product: Red Hat OpenStack
Reporter: Sam Wan <Sam.Wan>
Component: openstack-tripleo-heat-templates
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: David Rosenfeld <drosenfe>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.1 (Train)
CC: arkady_kanevsky, aschultz, a.stripeikis, gael_rehault, gcharot, jeanpierre.roquesalane, kecarter, kholtz, kurt_hey, mburns, morazi, vladislav.belogrudov
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-25 16:11:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1824852, 1861408
Attachments:
Description                   Flags
deploy output                 none
executor log during deploy    none
deploy with --debug           none

Description Sam Wan 2020-10-29 12:41:26 UTC
Created attachment 1725061
deploy output

Description of problem:

Deploying the overcloud ends with this error:
===============================
...
2020-10-29 10:56:21Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud/1de2b203-2337-4722-8384-60dd95002004 CREATE_COMPLETE

Deploying overcloud configuration
Enabling ssh admin (tripleo-admin) for hosts:
192.168.205.22 192.168.205.8 192.168.205.21
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
Inserting TripleO short term key for 192.168.205.22
Inserting TripleO short term key for 192.168.205.8
Inserting TripleO short term key for 192.168.205.21
Starting ssh admin enablement workflow
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - COMPLETE.
Removing TripleO short term key from 192.168.205.22
Removing TripleO short term key from 192.168.205.8
Removing TripleO short term key from 192.168.205.21
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Overcloud Endpoint: http://10.1.27.124:5000
Overcloud Horizon Dashboard URL: http://10.1.27.124:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed with error
===============================

'openstack overcloud failures' reports that the ansible errors file was not found:
===============================
(undercloud) [stack@elabdir135 ~]$ openstack overcloud failures
Ansible errors file not found at /var/lib/mistral/overcloud/ansible-errors.json
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.205.2', 40534), raddr=('192.168.205.2', 13000)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.205.2', 33994), raddr=('192.168.205.2', 13989)>
(undercloud) [stack@elabdir135 ~]$
===============================

Version-Release number of selected component (if applicable):
================================
(undercloud) [stack@elabdir135 ~]$ more /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.2 GA (Train)
(undercloud) [stack@elabdir135 ~]$ rpm -qa|grep -i openstack
openstack-heat-agents-1.10.1-0.20200311091123.96b819c.el8ost.noarch
python3-openstackclient-4.0.1-1.20200817092223.bff556c.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch
ansible-role-openstack-operations-0.0.1-0.20200311080930.274739e.el8ost.noarch
openstack-ironic-python-agent-builder-2.1.1-1.20200914175356.65d0f80.el8ost.noarch
puppet-openstacklib-15.4.1-0.20200403203429.5fdf43c.el8ost.noarch
openstack-tripleo-common-11.4.1-1.20200914165651.el8ost.noarch
openstack-tripleo-validations-11.3.2-1.20200914170825.el8ost.noarch
openstack-heat-monolith-13.0.3-1.20200914171254.48b730a.el8ost.noarch
python3-openstacksdk-0.36.4-0.20200715054250.76d3b29.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-1.20200914170156.el8ost.noarch
openstack-heat-api-13.0.3-1.20200914171254.48b730a.el8ost.noarch
openstack-selinux-0.8.24-1.20200914163011.26243bf.el8ost.noarch
puppet-openstack_extras-15.4.1-0.20200528113453.371931c.el8ost.noarch
openstack-tripleo-common-containers-11.4.1-1.20200914165651.el8ost.noarch
openstack-heat-engine-13.0.3-1.20200914171254.48b730a.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200701163410.432518a.el8ost.noarch
python-openstackclient-lang-4.0.1-1.20200817092223.bff556c.el8ost.noarch
openstack-heat-common-13.0.3-1.20200914171254.48b730a.el8ost.noarch
(undercloud) [stack@elabdir135 ~]$

================================

How reproducible:


Steps to Reproduce:
1. openstack overcloud deploy  ...
2.
3.

Actual results:

The deployment fails with no detailed error message or logs.

Expected results:

A failed deployment should produce log entries or an error message that identifies the cause.

Additional info:


'openstack stack failures list' shows no error
===============================
(undercloud) [stack@elabdir135 ~]$ openstack stack failures list overcloud
(undercloud) [stack@elabdir135 ~]$ 
===============================

'openstack stack list' shows the overcloud is created
===============================
(undercloud) [stack@elabdir135 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| 1de2b203-2337-4722-8384-60dd95002004 | overcloud  | eabeb20352e44233b874b0a789941de6 | CREATE_COMPLETE | 2020-10-29T10:47:05Z | None         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
(undercloud) [stack@elabdir135 ~]$ 
===============================

But after logging in to the overcloud controller, there are no container images and no running containers.
===============================
(undercloud) [stack@elabdir135 ~]$ nova list
+--------------------------------------+----------------+--------+------------+-------------+-------------------------+
| ID                                   | Name           | Status | Task State | Power State | Networks                |
+--------------------------------------+----------------+--------+------------+-------------+-------------------------+
| d9fc6845-f47a-4e50-98b2-4c2cf7af4a23 | elabdir135com0 | ACTIVE | -          | Running     | ctlplane=192.168.205.8  |
| fc3e333d-2aa3-40a5-973d-92f827664196 | elabdir135com1 | ACTIVE | -          | Running     | ctlplane=192.168.205.21 |
| bb7393e3-b0e1-4429-9d4c-0b2fc8cee74c | elabdir135ctl0 | ACTIVE | -          | Running     | ctlplane=192.168.205.22 |
+--------------------------------------+----------------+--------+------------+-------------+-------------------------+
(undercloud) [stack@elabdir135 ~]$ ssh heat-admin@192.168.205.22
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Oct 29 08:23:06 2020 from 192.168.205.1
[heat-admin@elabdir135ctl0 ~]$ sudo podman images
REPOSITORY   TAG   IMAGE ID   CREATED   SIZE
[heat-admin@elabdir135ctl0 ~]$ sudo podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
[heat-admin@elabdir135ctl0 ~]$
==================================

The overcloud install log is misleading.
It does not help with troubleshooting the issue.

Comment 1 Sam Wan 2020-10-29 13:42:07 UTC
My deploy command and options:
============================
(undercloud) [stack@elabdir135 ~]$ more deploy.sh
openstack overcloud deploy \
--templates \
-e /home/stack/templates/node-info.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-n /home/stack/templates/network-data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
-e /home/stack/templates/dellemc-powerflex-cinder-config.yaml \
-e /home/stack/templates/dellemc-powerflex-volume-mappings.yaml \
-e /home/stack/templates/ntp.yaml \
--ntp-server 192.168.205.1 
==========================

We define extra volumes for the cinder, nova, and glance containers in /home/stack/templates/dellemc-powerflex-volume-mappings.yaml:
===========================
$ more /home/stack/templates/dellemc-powerflex-volume-mappings.yaml
parameter_defaults:
  NovaComputeOptVolumes:
    - /opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack
  CinderVolumeOptVolumes:
    - /opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack
  GlanceApiOptVolumes:
    - /opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack
========================

When I first ran 'overcloud deploy', the /opt/emc directory did not yet exist on the overcloud nodes.
So I logged in to the overcloud nodes and created the directory.
================================
[heat-admin@elabdir135ctl0 ~]$ ls -ld /opt/emc/scaleio/openstack
drwxr-xr-x. 2 root root 28 Oct 29 08:47 /opt/emc/scaleio/openstack
[heat-admin@elabdir135ctl0 ~]$
..
[heat-admin@elabdir135com0 ~]$ ls -ld /opt/emc/scaleio/openstack
drwxr-xr-x. 2 root root 28 Oct 29 08:48 /opt/emc/scaleio/openstack
[heat-admin@elabdir135com0 ~]$
...
[heat-admin@elabdir135com1 ~]$ ls -ld /opt/emc/scaleio/openstack
drwxr-xr-x. 2 root root 28 Oct 29 08:48 /opt/emc/scaleio/openstack
[heat-admin@elabdir135com1 ~]$
================================
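
The host-side directories that these volume mappings expect can be listed and checked with a short script. This is only a sketch: the dictionary is a hand-copied mirror of the YAML shown above, and the helper names `host_paths` and `missing_host_paths` are hypothetical. It would have to be run on each overcloud node, and it does not establish whether a missing directory is what caused the failure here.

```python
import os

# Hand-copied from dellemc-powerflex-volume-mappings.yaml in this report.
PARAMETER_DEFAULTS = {
    "NovaComputeOptVolumes": ["/opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack"],
    "CinderVolumeOptVolumes": ["/opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack"],
    "GlanceApiOptVolumes": ["/opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack"],
}


def host_paths(parameter_defaults):
    """Return the sorted, de-duplicated host side of every 'host:container' entry."""
    paths = set()
    for volumes in parameter_defaults.values():
        for entry in volumes:
            # The host path is everything before the first colon.
            paths.add(entry.split(":", 1)[0])
    return sorted(paths)


def missing_host_paths(paths):
    """Return the paths that are not existing directories on this machine."""
    return [path for path in paths if not os.path.isdir(path)]


if __name__ == "__main__":
    for path in missing_host_paths(host_paths(PARAMETER_DEFAULTS)):
        print(f"missing on this node: {path}")
```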

I then re-ran the overcloud deploy, but it failed again without further details:
===============================
...
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Overcloud Endpoint: http://10.1.27.124:5000
Overcloud Horizon Dashboard URL: http://10.1.27.124:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed with error
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.205.2', 56938)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.205.2', 34450), raddr=('192.168.205.2', 13004)>
sys:1: ResourceWarning: unclosed <ss
================

====================
(undercloud) [stack@elabdir135 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| 1de2b203-2337-4722-8384-60dd95002004 | overcloud  | eabeb20352e44233b874b0a789941de6 | UPDATE_COMPLETE | 2020-10-29T10:47:05Z | 2020-10-29T12:53:03Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
(undercloud) [stack@elabdir135 ~]$
=============================

Comment 2 arkady kanevsky 2020-10-29 14:20:43 UTC
Sam,
Is this independent of the storage backend we use, or is a specific backend (PowerFlex?) affected?

Comment 3 Alex Schultz 2020-10-29 19:39:09 UTC
openstack stack failures list is no longer a valid way to look for errors as of OSP16 since the errors are in ansible and no longer in heat.  If `openstack overcloud failures` is empty, it appears ansible hasn't been run. You would need to look at the mistral executor log to see where it failed. Please provide logs from the undercloud.
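
The triage order described in this comment can be sketched roughly as follows. This is an illustration only, not part of tripleoclient: the function name `triage_config_download` is hypothetical, the errors-file path is the one printed by `openstack overcloud failures` in this report, and the executor log location is an assumption about a default OSP 16 undercloud.

```python
import json
from pathlib import Path

# Path printed by `openstack overcloud failures` in this bug.
ERRORS_FILE = Path("/var/lib/mistral/overcloud/ansible-errors.json")
# Assumption: default Mistral executor log location on the undercloud.
EXECUTOR_LOG = Path("/var/log/containers/mistral/executor.log")


def triage_config_download(errors_file=ERRORS_FILE, executor_log=EXECUTOR_LOG):
    """Return a short hint about where to look for deployment errors."""
    errors_file = Path(errors_file)
    if not errors_file.exists():
        # No errors file at all: ansible most likely never ran.
        return f"ansible did not run; inspect {executor_log}"
    try:
        failures = json.loads(errors_file.read_text())
    except json.JSONDecodeError:
        return f"{errors_file} is not valid JSON; inspect {executor_log}"
    if not failures:
        return f"{errors_file} is empty; inspect {executor_log}"
    return f"{len(failures)} top-level failure entries recorded in {errors_file}"


if __name__ == "__main__":
    print(triage_config_download())
```

The sketch deliberately does not interpret the contents of the errors file, because its internal layout is not shown anywhere in this report.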

Comment 4 Sam Wan 2020-10-30 05:23:34 UTC
Created attachment 1725226
executor log during deploy

Comment 5 Sam Wan 2020-10-30 05:24:14 UTC
Hi Alex,

I deleted the overcloud and re-ran the deploy.
The issue is the same.
Please check the attached executor.log.gz, captured during the deployment.
Thanks and regards,
Sam

Comment 6 Sam Wan 2020-10-30 10:18:47 UTC
Hi Alex,

I re-ran the deploy with debug turned on.
This time there is some information that might be helpful.
=========================
Overcloud Deployed with error
Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1100, in take_action
    raise(deploy_trace)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1072, in take_action
    limit_nodes=parsed_args.limit
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 388, in config_download
    stack.stack_name)
tripleoclient.exceptions.ConfigDownloadInProgress: Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
clean_up DeployOvercloud: Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/osc_lib/shell.py", line 136, in run
    ret_val = super(OpenStackShell, self).run(argv)
  File "/usr/lib/python3.6/site-packages/cliff/app.py", line 281, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python3.6/site-packages/osc_lib/shell.py", line 176, in run_subcommand
    ret_value = super(OpenStackShell, self).run_subcommand(argv)
  File "/usr/lib/python3.6/site-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1100, in take_action
    raise(deploy_trace)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1072, in take_action
    limit_nodes=parsed_args.limit
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 388, in config_download
    stack.stack_name)
tripleoclient.exceptions.ConfigDownloadInProgress: Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud

END return value: 1
======================================

==========================================
 (undercloud) [stack@elabdir135 ~]$ openstack workflow execution show 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28
+--------------------+----------------------------------------------+
| Field              | Value                                        |
+--------------------+----------------------------------------------+
| ID                 | 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28         |
| Workflow ID        | f5c2ee12-aa17-4a31-b0d6-124a0749a5bf         |
| Workflow name      | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace |                                              |
| Description        |                                              |
| Task Execution ID  | <none>                                       |
| Root Execution ID  | <none>                                       |
| State              | RUNNING                                      |
| State info         | None                                         |
| Created at         | 2020-10-30 06:52:12                          |
| Updated at         | 2020-10-30 06:52:12                          |
+--------------------+----------------------------------------------+
(undercloud) [stack@elabdir135 ~]$
====================================================

I've already rebooted the undercloud before I re-ran the deploy command.
It looks to me like some kind of bug.

Please check the attached detailed logs for the deploy command.

Comment 7 Sam Wan 2020-10-30 10:19:20 UTC
Created attachment 1725259
deploy with --debug

Comment 8 Alex Schultz 2020-10-30 14:35:25 UTC
If you rebooted the system while it was running, you'll need to manually clear up the previous execution so you can deploy.

openstack workflow execution delete --force 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28

The executor log didn't have the previous execution information so I don't know what originally happened.  Can you please provide one that covers around the 2020-10-29T10:47:05Z timeframe?
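
Finding such leftover executions without knowing the ID in advance could look like the sketch below. It assumes the rows come from `openstack workflow execution list -f json` and that the JSON keys match the field names of the `workflow execution show` table in Comment 6. The helper names are hypothetical, and the sample row simply repeats the execution from this bug.

```python
import json

# Workflow name taken from the `workflow execution show` output in Comment 6.
CONFIG_DOWNLOAD_WORKFLOW = "tripleo.deployment.v1.config_download_deploy"


def stale_config_download_executions(executions, workflow=CONFIG_DOWNLOAD_WORKFLOW):
    """Return IDs of executions of the given workflow that are still RUNNING."""
    return [
        row["ID"]
        for row in executions
        if row.get("Workflow name") == workflow and row.get("State") == "RUNNING"
    ]


def delete_command(execution_id):
    """Build the cleanup command given in Comment 8 for one execution ID."""
    return f"openstack workflow execution delete --force {execution_id}"


if __name__ == "__main__":
    # Sample input shaped like the assumed JSON listing.
    sample = json.loads(
        '[{"ID": "3ae8d6f2-3702-4238-8be3-f4ddcae4fb28",'
        ' "Workflow name": "tripleo.deployment.v1.config_download_deploy",'
        ' "State": "RUNNING"}]'
    )
    for execution_id in stale_config_download_executions(sample):
        print(delete_command(execution_id))
```

The script only prints the commands rather than running them, so each execution can be reviewed before it is force-deleted.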

Comment 9 Red Hat Bugzilla 2023-09-14 06:09:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days