Bug 1592785 - stack overcloud DELETE FAILED
Summary: stack overcloud DELETE FAILED
Keywords:
Status: CLOSED DUPLICATE of bug 1581364
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-19 10:04 UTC by Mike Abrams
Modified: 2020-12-21 19:36 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-24 16:38:07 UTC
Target Upstream Version:
Embargoed:


Attachments
ironic-conductor log (956.04 KB, application/x-gzip)
2018-06-22 15:17 UTC, Alex Schultz

Description Mike Abrams 2018-06-19 10:04:29 UTC
Description of problem:
When trying to delete a failed stack with 'openstack stack delete overcloud --wait --yes', the delete process fails.

Version-Release number of selected component (if applicable):
13   -p 2018-06-15.2

How reproducible:
sometimes

Steps to Reproduce:
1. Install RHOS 13 with 3 controllers and 2 computes.
2. Include /home/stack/swift.yaml in overcloud_deploy.sh.
3. Include the Barbican parameters in overcloud_deploy.sh.
4. The deploy will fail.
5. Try to delete the stack using: openstack stack delete overcloud --wait --yes

Actual results:
(undercloud) [stack@undercloud-0 ~]$ openstack stack delete overcloud --wait --yes
2018-06-19 09:51:42Z [overcloud]: DELETE_IN_PROGRESS  Stack DELETE started
2018-06-19 09:51:43Z [overcloud.Compute]: DELETE_IN_PROGRESS  state changed
2018-06-19 09:51:43Z [overcloud.Compute]: DELETE_FAILED  ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Server compute-0 delete failed: (500) Error destroying the instance on node ca8d3c45-d79b-4217-a08f-64c3233f01d1. Provision state still 'deleting'."
2018-06-19 09:51:43Z [overcloud]: DELETE_FAILED  Resource DELETE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Server compute-0 delete failed: (500) Error destroying the instance on node ca8d3c45-d79b-4217-a08f-64c3233f01d1. Provision state s

 Stack overcloud DELETE_FAILED 

Unable to delete 1 of the 1 stacks.
(undercloud) [stack@undercloud-0 ~]$

Expected results:
overcloud gets deleted

Additional info:

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/extra_templates.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/barbican.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/barbican-backend-simple-crypto.yaml \
-e /home/stack/swift.yaml \
--log-file overcloud_deployment_69.log
(undercloud) [stack@undercloud-0 ~]$ cat swift.yaml 
parameter_defaults:
  BarbicanSimpleCryptoGlobalDefault: True
  SwiftEncryptionEnabled: True
  DockerInsecureRegistryAddress: rhos-qe-mirror-tlv.usersys.redhat.com:5000
  DockerBarbicanApiImage: rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp13/openstack-barbican-api:2018-06-15.2
  DockerBarbicanConfigImage: rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp13/openstack-barbican-api:2018-06-15.2
  DockerBarbicanKeystoneListenerConfigImage: rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp13/openstack-barbican-keystone-listener:2018-06-15.2
  DockerBarbicanKeystoneListenerImage: rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp13/openstack-barbican-keystone-listener:2018-06-15.2
  DockerBarbicanWorkerConfigImage: rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp13/openstack-barbican-worker:2018-06-15.2
  DockerBarbicanWorkerImage: rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp13/openstack-barbican-worker:2018-06-15.2
(undercloud) [stack@undercloud-0 ~]$
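
For anyone triaging the same DELETE_FAILED state, a minimal diagnostic sketch (assuming the standard undercloud CLI clients; the node UUID is the one from this environment and is only illustrative):

  # Show which resource actually failed during the stack delete
  openstack stack failures list overcloud --long

  # Check what ironic thinks about the node that refused to delete
  openstack baremetal node list
  openstack baremetal node show ca8d3c45-d79b-4217-a08f-64c3233f01d1 \
      -f value -c provision_state -c power_state -c last_error

  # Once the node is no longer stuck in 'deleting', retry the delete
  openstack stack delete overcloud --wait --yes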

Comment 4 Alex Schultz 2018-06-22 15:17:16 UTC
Created attachment 1453757 [details]
ironic-conductor log

The delete was likely failing because IPMI failed. From a Heat perspective, a stack delete just issues a server delete, which goes through nova/ironic. Since the node was still in the 'deleting' provision state (probably because the IPMI call failed), the stack delete failed.
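
For reference, the failing power check can be re-run by hand roughly as follows (a sketch only: host, port, and username are taken from the log below, and <password> stands in for the ipmi_password configured in the node's driver_info):

  # Repeat the same IPMI query ironic-conductor was attempting
  ipmitool -I lanplus -H 172.16.0.1 -p 6234 -U admin -P <password> power status

  # Confirm what ironic recorded for the node
  openstack baremetal node show ca8d3c45-d79b-4217-a08f-64c3233f01d1 \
      -f value -c provision_state -c power_state -c last_error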

2018-06-19 05:51:55.575 21918 ERROR ironic.drivers.modules.ipmitool [req-aca21f86-bd26-4a51-9962-4321a7516160 70a32a35f8ab413aacf1626134da7e1c df8a44d7ac5547eca5426e50b8ebe8a7 - default default] IPMI Error while attempting "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -R 12 -N 5 -f /tmp/tmpe4jk39 power status" for node ca8d3c45-d79b-4217-a08f-64c3233f01d1. Error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -R 12 -N 5 -f /tmp/tmpe4jk39 power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n': ProcessExecutionError: Unexpected error while running command.
2018-06-19 05:51:55.577 21918 WARNING ironic.drivers.modules.ipmitool [req-aca21f86-bd26-4a51-9962-4321a7516160 70a32a35f8ab413aacf1626134da7e1c df8a44d7ac5547eca5426e50b8ebe8a7 - default default] IPMI power status failed for node ca8d3c45-d79b-4217-a08f-64c3233f01d1 with error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6234 -U admin -R 12 -N 5 -f /tmp/tmpe4jk39 power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n'.: ProcessExecutionError: Unexpected error while running command.
2018-06-19 05:51:55.603 21918 DEBUG ironic.conductor.manager [req-0a87110a-2e95-4aa3-954a-81598360c859 70a32a35f8ab413aacf1626134da7e1c df8a44d7ac5547eca5426e50b8ebe8a7 - default default] RPC vif_detach called for the node ca8d3c45-d79b-4217-a08f-64c3233f01d1 with vif_id ea4f72d6-8329-4bfe-a010-ae00bbf743ab vif_detach /usr/lib/python2.7/site-packages/ironic/conductor/manager.py:2991
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager [req-aca21f86-bd26-4a51-9962-4321a7516160 70a32a35f8ab413aacf1626134da7e1c df8a44d7ac5547eca5426e50b8ebe8a7 - default default] Error in tear_down of node ca8d3c45-d79b-4217-a08f-64c3233f01d1: IPMI call failed: power status.: IPMIFailure: IPMI call failed: power status.
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager Traceback (most recent call last):
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 909, in _do_node_tear_down
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     task.driver.deploy.tear_down(task)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     result = f(*args, **kwargs)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     return f(*args, **kwargs)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 498, in tear_down
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     manager_utils.node_power_action(task, states.POWER_OFF)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     return f(*args, **kwargs)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/utils.py", line 209, in node_power_action
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     if _can_skip_state_change(task, new_state):
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/utils.py", line 168, in _can_skip_state_change
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     fields.NotificationStatus.ERROR, new_state)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     self.force_reraise()
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     six.reraise(self.type_, self.value, self.tb)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/utils.py", line 158, in _can_skip_state_change
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     curr_state = task.driver.power.get_power_state(task)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     result = f(*args, **kwargs)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ipmitool.py", line 781, in get_power_state
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     return _power_status(driver_info)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ipmitool.py", line 564, in _power_status
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager     raise exception.IPMIFailure(cmd=cmd)
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager IPMIFailure: IPMI call failed: power status.
2018-06-19 05:51:55.644 21918 ERROR ironic.conductor.manager

Comment 5 Bob Fournier 2018-06-26 13:28:55 UTC
Are these virtual nodes or baremetal?  If virtual, the IPMI failures you are seeing are likely due to VBMC failures caused by the libvirt bug in RHEL 7.4/7.5 - https://bugzilla.redhat.com/show_bug.cgi?id=1581364.  The location of a patch to install can be found in that bug; otherwise, the next RHEL release with the fix is pending.  For reference, see this similar delete bug which was due to the libvirt issue - https://bugzilla.redhat.com/show_bug.cgi?id=1549571.

If these are baremetal nodes, there is an issue with the bare-metal hardware that is causing the IPMI failures.  Most likely, though, these are virtual nodes and you are hitting the libvirt issue.
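
To confirm the virtual case, a rough check on the virt host (assuming VirtualBMC provides the IPMI endpoints, as is typical in these virt setups; the domain name is a placeholder):

  # On the hypervisor: each overcloud VM should have a vbmc endpoint listening
  vbmc list

  # If the endpoint for the stuck node is down or wedged, restarting it often
  # lets the power status call succeed again
  vbmc stop <domain-name>
  vbmc start <domain-name>

  # Then re-test from the undercloud
  ipmitool -I lanplus -H 172.16.0.1 -p 6234 -U admin -P <password> power status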

Comment 6 Bob Fournier 2018-06-26 15:53:15 UTC
Closing this as a duplicate.  Please reopen if this does not appear to be due to the IPMI issue with vbmc/libvirt.

*** This bug has been marked as a duplicate of bug 1581364 ***

Comment 9 Bob Fournier 2019-01-24 14:23:58 UTC
This bug was originally filed against a virtual environment and was due to an IPMI issue with vbmc/libvirt, making it a clear duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1581364. The problem now being reported does not appear to be related.

Please open a NEW bug and provide the following:
- sosreport capturing when the problem occurs
- related case linked to external tracker
- version and package IDs for nova, ironic, etc. (example commands below)
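
A rough example of gathering that data on the undercloud (exact sosreport flags/plugins may differ by release):

  # Collect a sosreport while the failure is reproducible
  sosreport --batch --all-logs

  # Capture the relevant package versions
  rpm -qa | grep -E 'openstack-(nova|ironic|heat)|python-(nova|ironic|heat)client'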

We will close this bug again as a duplicate.

Comment 10 Artom Lifshitz 2019-01-24 16:38:07 UTC

*** This bug has been marked as a duplicate of bug 1581364 ***

