Bug 1892679
| Summary: | 'Overcloud Deployed with error' but 'openstack overcloud failures' shows no ansible error log. | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sam Wan <Sam.Wan> |
| Component: | openstack-tripleo-heat-templates | Assignee: | RHOS Maint <rhos-maint> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | Keywords: | Triaged |
| Version: | 16.1 (Train) | CC: | arkady_kanevsky, aschultz, a.stripeikis, gael_rehault, gcharot, jeanpierre.roquesalane, kecarter, kholtz, kurt_hey, mburns, morazi, vladislav.belogrudov |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Last Closed: | 2020-11-25 16:11:25 UTC | Type: | Bug |
| Bug Blocks: | 1824852, 1861408 | Bug Depends On: | |
| Attachments: | executor log during deploy (1725226), deploy with --debug (1725259) | | |
Description
Sam Wan
2020-10-29 12:41:26 UTC
my deploy command and options
============================
(undercloud) [stack@elabdir135 ~]$ more deploy.sh
openstack overcloud deploy \
--templates \
-e /home/stack/templates/node-info.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-n /home/stack/templates/network-data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
-e /home/stack/templates/dellemc-powerflex-cinder-config.yaml \
-e /home/stack/templates/dellemc-powerflex-volume-mappings.yaml \
-e /home/stack/templates/ntp.yaml \
--ntp-server 192.168.205.1
==========================
We have extra volumes for the cinder/nova/glance containers in /home/stack/templates/dellemc-powerflex-volume-mappings.yaml
===========================
$ more /home/stack/templates/dellemc-powerflex-volume-mappings.yaml
parameter_defaults:
NovaComputeOptVolumes:
- /opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack
CinderVolumeOptVolumes:
- /opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack
GlanceApiOptVolumes:
- /opt/emc/scaleio/openstack:/opt/emc/scaleio/openstack
========================
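As a side note, one way to verify on a deployed node that these extra bind mounts actually landed might be something like the following; the container name cinder_volume is an assumption based on default TripleO naming:
========================
# On an overcloud node: show the bind mounts of the cinder_volume container
sudo podman inspect cinder_volume --format '{{ .HostConfig.Binds }}'
========================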
When I first ran 'overcloud deploy', there was no /opt/emc directory on the overcloud nodes yet,
so I logged into the overcloud nodes and created it (sketched below).
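The directory was created with something along these lines (the exact command wasn't captured in the report, so this is an assumption), run on each overcloud node:

sudo mkdir -p /opt/emc/scaleio/openstack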
================================
[heat-admin@elabdir135ctl0 ~]$ ls -ld /opt/emc/scaleio/openstack
drwxr-xr-x. 2 root root 28 Oct 29 08:47 /opt/emc/scaleio/openstack
[heat-admin@elabdir135ctl0 ~]$
..
[heat-admin@elabdir135com0 ~]$ ls -ld /opt/emc/scaleio/openstack
drwxr-xr-x. 2 root root 28 Oct 29 08:48 /opt/emc/scaleio/openstack
[heat-admin@elabdir135com0 ~]$
...
[heat-admin@elabdir135com1 ~]$ ls -ld /opt/emc/scaleio/openstack
drwxr-xr-x. 2 root root 28 Oct 29 08:48 /opt/emc/scaleio/openstack
[heat-admin@elabdir135com1 ~]$
================================
I then re-ran the overcloud deploy, but it failed again without more details:
===============================
...
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Overcloud Endpoint: http://10.1.27.124:5000
Overcloud Horizon Dashboard URL: http://10.1.27.124:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed with error
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.205.2', 56938)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.205.2', 34450), raddr=('192.168.205.2', 13004)>
sys:1: ResourceWarning: unclosed <ss
====================
(undercloud) [stack@elabdir135 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| 1de2b203-2337-4722-8384-60dd95002004 | overcloud | eabeb20352e44233b874b0a789941de6 | UPDATE_COMPLETE | 2020-10-29T10:47:05Z | 2020-10-29T12:53:03Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
(undercloud) [stack@elabdir135 ~]$
=============================
Sam, is this independent of which storage backend we use, or is a specific backend (PowerFlex?) impacted?

`openstack stack failures list` is no longer a valid way to look for errors as of OSP 16, since the errors are in Ansible and no longer in Heat. If `openstack overcloud failures` is empty, it appears Ansible hasn't been run. You would need to look at the Mistral executor log to see where it failed. Please provide logs from the undercloud.
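For reference, a sketch of where to look on an OSP 16 undercloud; the log path assumes the default containerized service layout, so treat it as an assumption rather than a guarantee:
=====================================
# Mistral executor log on the undercloud (containerized services)
sudo less /var/log/containers/mistral/executor.log

# Ansible failures, once config-download has actually run
openstack overcloud failures
=====================================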
Created attachment 1725226 [details]
executor log during deploy

Hi Alex, I deleted the overcloud and re-ran the deploy; same issue. Please check the attached executor.log.gz captured during the deployment. Thanks and regards, Sam

Hi Alex,
I re-ran the deploy with debug turned on, and this time there is some information that might be helpful.
=========================
Overcloud Deployed with error
Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cliff/app.py", line 401, in run_subcommand
result = cmd.run(parsed_args)
File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
super(Command, self).run(parsed_args)
File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
return super(Command, self).run(parsed_args)
File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
return_code = self.take_action(parsed_args) or 0
File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1100, in take_action
raise(deploy_trace)
File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1072, in take_action
limit_nodes=parsed_args.limit
File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 388, in config_download
stack.stack_name)
tripleoclient.exceptions.ConfigDownloadInProgress: Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
clean_up DeployOvercloud: Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/osc_lib/shell.py", line 136, in run
ret_val = super(OpenStackShell, self).run(argv)
File "/usr/lib/python3.6/site-packages/cliff/app.py", line 281, in run
result = self.run_subcommand(remainder)
File "/usr/lib/python3.6/site-packages/osc_lib/shell.py", line 176, in run_subcommand
ret_value = super(OpenStackShell, self).run_subcommand(argv)
File "/usr/lib/python3.6/site-packages/cliff/app.py", line 401, in run_subcommand
result = cmd.run(parsed_args)
File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
super(Command, self).run(parsed_args)
File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
return super(Command, self).run(parsed_args)
File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
return_code = self.take_action(parsed_args) or 0
File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1100, in take_action
raise(deploy_trace)
File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1072, in take_action
limit_nodes=parsed_args.limit
File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 388, in config_download
stack.stack_name)
tripleoclient.exceptions.ConfigDownloadInProgress: Config download already in progress with execution id 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 for stack overcloud
END return value: 1
======================================
(undercloud) [stack@elabdir135 ~]$ openstack workflow execution show 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28
+--------------------+----------------------------------------------+
| Field | Value |
+--------------------+----------------------------------------------+
| ID | 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28 |
| Workflow ID | f5c2ee12-aa17-4a31-b0d6-124a0749a5bf |
| Workflow name | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace | |
| Description | |
| Task Execution ID | <none> |
| Root Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2020-10-30 06:52:12 |
| Updated at | 2020-10-30 06:52:12 |
+--------------------+----------------------------------------------+
(undercloud) [stack@elabdir135 ~]$
====================================================
I had already rebooted the undercloud before I re-ran the deploy command.
It looks to me like some kind of bug.
Please check the attached detailed logs for the deploy command.
Created attachment 1725259 [details]
deploy with --debug
If you rebooted the system while the deployment was running, you'll need to manually clean up the previous execution before you can deploy again:

openstack workflow execution delete --force 3ae8d6f2-3702-4238-8be3-f4ddcae4fb28

The executor log didn't have the previous execution information, so I don't know what originally happened. Can you please provide one that covers the 2020-10-29T10:47:05Z timeframe?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
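For completeness, a sketch of how a stale execution can be located before force-deleting it, assuming the standard Mistral CLI on the undercloud (`<execution-id>` is a placeholder):
=====================================
# List workflow executions still marked RUNNING after the reboot
openstack workflow execution list | grep RUNNING

# Force-delete the stale config-download execution by its ID
openstack workflow execution delete --force <execution-id>
=====================================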