Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1549571

Summary: openstack stack delete overcloud fails
Product: Red Hat OpenStack
Reporter: Mike Abrams <mabrams>
Component: rhosp-director
Assignee: Bob Fournier <bfournie>
Status: CLOSED DUPLICATE
QA Contact: Gurenko Alex <agurenko>
Severity: urgent
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: agurenko, aschultz, athomas, bfournie, dbecker, dtantsur, mabrams, mburns, morazi, ohochman, racedoro, rhel-osp-director-maint
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-11 20:07:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mike Abrams 2018-02-27 12:38:28 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
[stack@undercloud-0 ~]$ . ./stackrc 
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 763b3460-4990-43c1-a30b-901c2dc2f829 | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| 3f053e5d-dc4f-4ba6-9610-fafcb7612ab1 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 5dd7074a-6f62-4d75-955f-421df712e3bf | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 4dbd2f4d-22b1-45f6-ab07-b4c06aa31d18 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| f57d7ad0-d0d7-43a2-ace9-84d80de343b8 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack stack delete overcloud --wait --yes
2018-02-27 12:25:52Z [overcloud]: DELETE_IN_PROGRESS  Stack DELETE started
2018-02-27 12:25:54Z [overcloud.ControllerSshKnownHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:25:55Z [overcloud.ServerIdMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:25:55Z [overcloud.ServerIdMap]: DELETE_COMPLETE  state changed
2018-02-27 12:25:56Z [overcloud.DeployedServerEnvironment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:25:56Z [overcloud.DeployedServerEnvironment]: DELETE_COMPLETE  state changed
2018-02-27 12:25:57Z [overcloud.ServerOsCollectConfigData]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:25:57Z [overcloud.ServerOsCollectConfigData]: DELETE_COMPLETE  state changed
2018-02-27 12:25:58Z [overcloud.ComputeSshKnownHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:25:58Z [overcloud.CephStorageSshKnownHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:25:59Z [overcloud.BlockStorageSshKnownHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:26:00Z [overcloud.CephStorageSshKnownHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:26:00Z [overcloud.ObjectStorageSshKnownHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:26:01Z [overcloud.BlockStorageSshKnownHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:26:01Z [overcloud.ObjectStorageSshKnownHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:26:02Z [overcloud.AllNodesDeploySteps]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:26:02Z [overcloud.ControllerSshKnownHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:26:03Z [overcloud.ComputeSshKnownHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:26:03Z [overcloud.SshKnownHostsConfig]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:26:04Z [overcloud.SshKnownHostsConfig]: DELETE_COMPLETE  state changed
2018-02-27 12:27:17Z [overcloud.AllNodesDeploySteps]: DELETE_COMPLETE  state changed
2018-02-27 12:27:18Z [overcloud.ObjectStorageMergedConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:18Z [overcloud.ObjectStorageMergedConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:18Z [overcloud.BlockStorageMergedConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:18Z [overcloud.BlockStorageMergedConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:18Z [overcloud.ComputeMergedConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:18Z [overcloud.CephStorageMergedConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:18Z [overcloud.ComputeMergedConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:18Z [overcloud.CephStorageMergedConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:18Z [overcloud.BlacklistedIpAddresses]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:18Z [overcloud.BlacklistedIpAddresses]: DELETE_COMPLETE  state changed
2018-02-27 12:27:19Z [overcloud.ControllerMergedConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:19Z [overcloud.ControllerMergedConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:19Z [overcloud.BlacklistedHostnames]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:19Z [overcloud.BlacklistedHostnames]: DELETE_COMPLETE  state changed
2018-02-27 12:27:19Z [overcloud.AllNodesExtraConfig]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:19Z [overcloud.AllNodesExtraConfig]: DELETE_COMPLETE  state changed
2018-02-27 12:27:19Z [overcloud.ComputeAllNodesValidationDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:19Z [overcloud.ObjectStorageAllNodesValidationDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:19Z [overcloud.CephStorageAllNodesValidationDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:20Z [overcloud.ControllerAllNodesValidationDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:20Z [overcloud.UpdateWorkflow]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:20Z [overcloud.UpdateWorkflow]: DELETE_COMPLETE  state changed
2018-02-27 12:27:20Z [overcloud.BlockStorageAllNodesValidationDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:21Z [overcloud.ObjectStorageAllNodesValidationDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:21Z [overcloud.CephStorageAllNodesValidationDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:21Z [overcloud.ObjectStorageAllNodesDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:21Z [overcloud.CephStorageAllNodesDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:21Z [overcloud.BlockStorageAllNodesValidationDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:22Z [overcloud.BlockStorageAllNodesDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:22Z [overcloud.ObjectStorageAllNodesDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:23Z [overcloud.CephStorageAllNodesDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:23Z [overcloud.BlockStorageAllNodesDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:27Z [overcloud.ComputeAllNodesValidationDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:28Z [overcloud.ComputeAllNodesDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:28Z [overcloud.ControllerAllNodesValidationDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:28Z [overcloud.AllNodesValidationConfig]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:28Z [overcloud.ControllerAllNodesDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:30Z [overcloud.AllNodesValidationConfig]: DELETE_COMPLETE  state changed
2018-02-27 12:27:34Z [overcloud.ComputeAllNodesDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:36Z [overcloud.ControllerAllNodesDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:36Z [overcloud.ControllerHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:36Z [overcloud.ComputeHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:36Z [overcloud.ObjectStorageHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:36Z [overcloud.allNodesConfig]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:37Z [overcloud.BlockStorageHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:37Z [overcloud.CephStorageHostsDeployment]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:37Z [overcloud.ObjectStorageHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:38Z [overcloud.allNodesConfig]: DELETE_COMPLETE  state changed
2018-02-27 12:27:38Z [overcloud.ObjectStorageServers]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:38Z [overcloud.BlockStorageHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:38Z [overcloud.CephStorageHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:38Z [overcloud.ObjectStorageServers]: DELETE_COMPLETE  state changed
2018-02-27 12:27:38Z [overcloud.RedisVirtualIP]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:39Z [overcloud.BlockStorageServers]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:39Z [overcloud.CephStorageIpListMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:39Z [overcloud.CephStorageServers]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:39Z [overcloud.BlockStorageServers]: DELETE_COMPLETE  state changed
2018-02-27 12:27:39Z [overcloud.ComputeIpListMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:39Z [overcloud.CephStorageServers]: DELETE_COMPLETE  state changed
2018-02-27 12:27:40Z [overcloud.ObjectStorageIpListMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:40Z [overcloud.ControllerIpListMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:41Z [overcloud.BlockStorageIpListMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:42Z [overcloud.CephStorageIpListMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:43Z [overcloud.CephStorageNetworkHostnameMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:43Z [overcloud.ControllerIpListMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:43Z [overcloud.ComputeIpListMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:43Z [overcloud.BlockStorageIpListMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:43Z [overcloud.CephStorageNetworkHostnameMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:43Z [overcloud.ObjectStorageIpListMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:44Z [overcloud.ComputeNetworkHostnameMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:44Z [overcloud.ComputeNetworkHostnameMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:44Z [overcloud.ObjectStorageNetworkHostnameMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:44Z [overcloud.BlockStorageNetworkHostnameMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:44Z [overcloud.ObjectStorageNetworkHostnameMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:44Z [overcloud.ControllerNetworkHostnameMap]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:44Z [overcloud.BlockStorageNetworkHostnameMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:44Z [overcloud.ControllerNetworkHostnameMap]: DELETE_COMPLETE  state changed
2018-02-27 12:27:45Z [overcloud.RedisVirtualIP]: DELETE_COMPLETE  state changed
2018-02-27 12:27:45Z [overcloud.ComputeHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:46Z [overcloud.ComputeServers]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:46Z [overcloud.ComputeServers]: DELETE_COMPLETE  state changed
2018-02-27 12:27:46Z [overcloud.ControllerHostsDeployment]: DELETE_COMPLETE  state changed
2018-02-27 12:27:46Z [overcloud.ControllerServers]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:46Z [overcloud.hostsConfig]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:46Z [overcloud.ControllerServers]: DELETE_COMPLETE  state changed
2018-02-27 12:27:48Z [overcloud.hostsConfig]: DELETE_COMPLETE  state changed
2018-02-27 12:27:48Z [overcloud.Controller]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:48Z [overcloud.BlockStorage]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:48Z [overcloud.VipHosts]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:48Z [overcloud.Compute]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:48Z [overcloud.VipHosts]: DELETE_COMPLETE  state changed
2018-02-27 12:27:48Z [overcloud.CephStorage]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:48Z [overcloud.BlockStorage]: DELETE_COMPLETE  state changed
2018-02-27 12:27:49Z [overcloud.CephStorage]: DELETE_COMPLETE  state changed
2018-02-27 12:27:49Z [overcloud.ObjectStorage]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:49Z [overcloud.ObjectStorage]: DELETE_COMPLETE  state changed
2018-02-27 12:27:49Z [overcloud.CephStorageServiceConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:49Z [overcloud.CephStorageServiceConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:49Z [overcloud.BlockStorageServiceConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:49Z [overcloud.BlockStorageServiceConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:50Z [overcloud.ObjectStorageServiceConfigSettings]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:50Z [overcloud.CephStorageServiceNames]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:50Z [overcloud.CephStorageServiceNames]: DELETE_COMPLETE  state changed
2018-02-27 12:27:50Z [overcloud.BlockStorageServiceNames]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:50Z [overcloud.BlockStorageServiceNames]: DELETE_COMPLETE  state changed
2018-02-27 12:27:50Z [overcloud.ObjectStorageServiceConfigSettings]: DELETE_COMPLETE  state changed
2018-02-27 12:27:51Z [overcloud.ObjectStorageServiceNames]: DELETE_IN_PROGRESS  state changed
2018-02-27 12:27:51Z [overcloud.ObjectStorageServiceNames]: DELETE_COMPLETE  state changed
2018-02-27 12:30:59Z [overcloud.Compute]: DELETE_FAILED  ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Server compute-0 delete failed: (500) Error destroying the instance on node 1d26a81c-1ad0-458a-b716-23ab0aa5bb8c. Provision state still 'deleting'."
2018-02-27 12:30:59Z [overcloud]: DELETE_FAILED  Resource DELETE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to "Server compute-0 delete failed: (500) Error destroying the instance on node 1d26a81c-1ad0-458a-b716-23ab0aa5bb8c. Provision state s
2018-02-27 12:31:01Z [overcloud.Controller]: DELETE_FAILED  ResourceInError: resources.Controller.resources[2].resources.Controller: Went to status ERROR due to "Server controller-2 delete failed: (500) Error destroying the instance on node 37a01247-d8de-4405-9834-da209a32310d. Provision state still 'deleting'."
2018-02-27 12:31:01Z [overcloud]: DELETE_FAILED  Resource DELETE failed: ResourceInError: resources.Controller.resources[2].resources.Controller: Went to status ERROR due to "Server controller-2 delete failed: (500) Error destroying the instance on node 37a01247-d8de-4405-9834-da209a32310d. Provision st

 Stack overcloud DELETE_FAILED 

Unable to delete 1 of the 1 stacks.
(undercloud) [stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+---------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status  | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+----------------------+
| 2586aefb-67ce-4fc8-97be-784878d3345d | overcloud  | 120fdee97e434d4fb7436b64c2aed6d1 | DELETE_FAILED | 2018-02-26T16:01:51Z | 2018-02-27T12:25:52Z |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+----------------------+
(undercloud) [stack@undercloud-0 ~]$ 
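
The failing resources, and the Ironic node UUIDs they name, can be extracted mechanically from an event log like the one above. This is a minimal sketch, assuming the output of `openstack stack delete --wait` was saved as text. The function name `failed_resources` and both regular expressions are illustrative and not part of any OpenStack tool.

```python
import re

# Matches event lines such as:
#   2018-02-27 12:30:59Z [overcloud.Compute]: DELETE_FAILED  ResourceInError: ...
EVENT_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z) "
    r"\[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)

# An Ironic node UUID quoted in the failure reason ("... node <uuid> ...").
NODE_RE = re.compile(
    r"node ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})", re.IGNORECASE
)


def failed_resources(log_text):
    """Return (resource, node_uuid or None) for every DELETE_FAILED event."""
    failures = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match or match.group("status") != "DELETE_FAILED":
            continue  # not an event line, or not a failure
        node = NODE_RE.search(match.group("reason"))
        failures.append(
            (match.group("resource"), node.group(1) if node else None)
        )
    return failures
```

The UUIDs it returns are the values to pass to `openstack baremetal node show` when checking what state Ironic left each node in.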


Steps to Reproduce:
1. install rhos13
2. . ./stackrc
3. openstack stack delete overcloud --wait --yes

Actual results:
delete fails

Expected results:
delete passes

Additional info:

Comment 4 Bob Fournier 2018-03-09 17:32:28 UTC
This looks like a power management failure. Is IPMI working correctly for this node? Are you able to run ipmitool against it? Is the BMC updated with the latest firmware?
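
One way to run that IPMI check by hand is to issue the same `power status` call that Ironic makes. The sketch below is only an illustration. It assumes `ipmitool` is installed on the undercloud and that the address, port and username come from the node's `driver_info` (`ipmi_address`, `ipmi_port`, `ipmi_username`). The helper names and the password-file path are hypothetical, and the privilege-level and retry options Ironic adds are left out.

```python
import subprocess


def ipmi_power_status_cmd(address, port, username, password_file):
    """Build an `ipmitool ... power status` argument list for one node."""
    return [
        "ipmitool",
        "-I", "lanplus",      # IPMI v2.0 / RMCP+ interface
        "-H", address,        # BMC (or vbmc) address
        "-p", str(port),      # BMC port
        "-U", username,       # IPMI user name
        "-f", password_file,  # file that holds the IPMI password
        "power", "status",
    ]


def ipmi_power_status(address, port, username, password_file, timeout=60):
    """Run the command and return (exit_code, combined stdout/stderr).

    Raises FileNotFoundError if ipmitool is not installed and
    subprocess.TimeoutExpired if the BMC does not answer in time.
    """
    proc = subprocess.run(
        ipmi_power_status_cmd(address, port, username, password_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()
```

A non-zero exit code, or a timeout, would suggest the BMC or virtual BMC is not answering, independent of Ironic and Heat.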

Comment 5 Bob Fournier 2018-04-10 13:02:07 UTC
Any update on this?

Comment 6 Bob Fournier 2018-04-17 14:49:52 UTC
Closing this for now. Please update with the requested info if you are able to reproduce it.

Comment 7 Amit Ugol 2018-05-06 06:44:55 UTC
Hi. We still see this issue, so I am reopening. Hopefully we can add more meaningful info to the bug this time.

Comment 8 Gurenko Alex 2018-05-06 06:57:47 UTC
I still experience the issue with puddle 2018-05-01.6. I would assume it all comes down to vbmc, since it is also causing other issues and keeps being updated and rewritten?

(undercloud) [stack@undercloud-0 ~]$ openstack stack delete overcloud --wait -y
2018-05-06 06:40:03Z [overcloud]: DELETE_IN_PROGRESS  Stack DELETE started
2018-05-06 06:40:03Z [overcloud.CephStorage]: DELETE_IN_PROGRESS  state changed
2018-05-06 06:40:03Z [overcloud.CephStorage]: DELETE_FAILED  ResourceInError: resources.CephStorage.resources[0].resources.CephStorage: Went to status ERROR due to "Server ceph-0 delete failed: (500) Node 9f36aecb-c678-42e3-9c7c-f808ad9d8e10 can not be updated while a state transition is in progress. (HTTP 409)"
2018-05-06 06:40:03Z [overcloud]: DELETE_FAILED  Resource DELETE failed: ResourceInError: resources.CephStorage.resources[0].resources.CephStorage: Went to status ERROR due to "Server ceph-0 delete failed: (500) Node 9f36aecb-c678-42e3-9c7c-f808ad9d8e10 can not be updated while a state transition is in

 Stack overcloud DELETE_FAILED

Unable to delete 1 of the 1 stacks.

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node show 9f36aecb-c678-42e3-9c7c-f808ad9d8e10 --fit-width
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                  | Value                                                                                                                                                                                    |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| boot_interface         | None                                                                                                                                                                                     |
| chassis_uuid           | None                                                                                                                                                                                     |
| clean_step             | {}                                                                                                                                                                                       |
| console_enabled        | False                                                                                                                                                                                    |
| console_interface      | None                                                                                                                                                                                     |
| created_at             | 2018-05-02T10:00:55+00:00                                                                                                                                                                |
| deploy_interface       | None                                                                                                                                                                                     |
| driver                 | pxe_ipmitool                                                                                                                                                                             |
| driver_info            | {u'ipmi_port': u'6232', u'ipmi_username': u'admin', u'deploy_kernel': u'ed6c7b24-a059-4de4-b8ee-669040b021ba', u'ipmi_address': u'172.16.0.1', u'deploy_ramdisk': u'4cbc442f-575c-4d54   |
|                        | -a75a-b6f5d52e0eee', u'ipmi_password': u'******'}                                                                                                                                        |
| driver_internal_info   | {u'agent_url': u'http://192.168.24.8:9999', u'root_uuid_or_disk_id': u'c7e46e23-2898-4fa9-bfc2-7de1d6c5cf49', u'is_whole_disk_image': False, u'agent_version': u'3.2.1.dev2'}            |
| extra                  | {u'hardware_swift_object': u'extra_hardware-9f36aecb-c678-42e3-9c7c-f808ad9d8e10'}                                                                                                       |
| inspect_interface      | None                                                                                                                                                                                     |
| inspection_finished_at | None                                                                                                                                                                                     |
| inspection_started_at  | None                                                                                                                                                                                     |
| instance_info          | {u'root_gb': u'17', u'display_name': u'ceph-0', u'image_source': u'496124e5-40e1-4ed8-8035-30ac3e82e30a', u'capabilities': u'{"profile": "ceph", "boot_option": "local"}', u'memory_mb': |
|                        | u'4096', u'vcpus': u'1', u'local_gb': u'19', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'}                                                 |
| instance_uuid          | 730462e6-82f1-4a8d-99ff-ac398d9cca62                                                                                                                                                     |
| last_error             | None                                                                                                                                                                                     |
| maintenance            | True                                                                                                                                                                                     |
| maintenance_reason     | During sync_power_state, max retries exceeded for node 9f36aecb-c678-42e3-9c7c-f808ad9d8e10, node state None does not match expected state 'power on'. Updating DB state to 'None'       |
|                        | Switching node to maintenance mode. Error: IPMI call failed: power status.                                                                                                               |
| management_interface   | None                                                                                                                                                                                     |
| name                   | ceph-0                                                                                                                                                                                   |
| network_interface      | flat                                                                                                                                                                                     |
| power_interface        | None                                                                                                                                                                                     |
| power_state            | None                                                                                                                                                                                     |
| properties             | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', u'cpus': u'2', u'capabilities': u'profile:ceph,boot_option:local'}                                                   |
| provision_state        | deleting                                                                                                                                                                                 |
| provision_updated_at   | 2018-05-06T05:49:14+00:00                                                                                                                                                                |
| raid_config            | {}                                                                                                                                                                                       |
| raid_interface         | None                                                                                                                                                                                     |
| reservation            | None                                                                                                                                                                                     |
| resource_class         | baremetal                                                                                                                                                                                |
| storage_interface      | noop                                                                                                                                                                                     |
| target_power_state     | None                                                                                                                                                                                     |
| target_provision_state | available                                                                                                                                                                                |
| target_raid_config     | {}                                                                                                                                                                                       |
| updated_at             | 2018-05-06T05:54:58+00:00                                                                                                                                                                |
| uuid                   | 9f36aecb-c678-42e3-9c7c-f808ad9d8e10                                                                                                                                                     |
| vendor_interface       | None                                                                                                                                                                                     |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
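
The fields that matter in the output above are `provision_state`, `target_provision_state` and `maintenance`. A rough sketch of checking them from saved CLI output follows. Both function names are illustrative, and treating a non-empty `target_provision_state` as "transition still in progress" is an assumption made for this sketch.

```python
def parse_field_table(text):
    """Parse `| Field | Value |` rows into a dict.

    Rows with an empty Field cell are treated as wrapped continuations of
    the previous value and concatenated as-is; any whitespace lost at the
    wrap point cannot be recovered.
    """
    fields = {}
    last = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders and anything that is not a row
        cells = [cell.strip() for cell in line.strip("|").split("|", 1)]
        if len(cells) != 2:
            continue
        name, value = cells
        if name == "Field":
            continue  # header row
        if name:
            fields[name] = value
            last = name
        elif last is not None:
            fields[last] += value
    return fields


def stuck_in_transition(node):
    """True if the node has a pending target state and is in maintenance."""
    target = node.get("target_provision_state")
    in_transition = target not in (None, "", "None")
    return in_transition and node.get("maintenance") == "True"
```

For the ceph-0 node shown above (`provision_state` of `deleting`, target `available`, `maintenance` set to `True`), `stuck_in_transition` would return `True`.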

I'm using the following cherry-pick on this system to work around other issues:
https://review.openstack.org/#/c/564878/

Comment 14 Bob Fournier 2018-05-07 14:05:13 UTC
Alex - I can't seem to access this system, is there an intermediate host I need to be on?

[bfournie@ibm-p8-kvm-03-guest-02 ~]$ ping seal06.qa.lab.tlv.redhat.com
PING seal06.qa.lab.tlv.redhat.com (10.35.64.6) 56(84) bytes of data.
^C
--- seal06.qa.lab.tlv.redhat.com ping statistics ---
93 packets transmitted, 0 received, 100% packet loss, time 91999ms

Note that we are currently fixing a vbmc timeout, tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1571384, that causes power-on/power-off issues on nodes when using vbmc. It's possible that this is the same problem, but we'd have to look at the logs to confirm.

Comment 16 Bob Fournier 2018-05-07 18:59:40 UTC
I was finally able to get onto seal06.qa.lab.tlv.redhat.com and SSH to undercloud-0. The baremetal nodes currently look OK, and it looks like a deployment is in progress.

(undercloud) [stack@undercloud-0 log]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| dd3f42de-4d5f-428c-b06e-7d74c0a36d98 | ceph-0       | 8cc02e61-d429-44cf-875e-c830131faefe | power on    | active             | False       |
| f61ab9a0-1373-4709-b9f6-775bf8c92683 | ceph-1       | d6ec7e2d-eda4-4cf7-abb9-06cb948fbe38 | power on    | active             | False       |
| dd842269-b332-4b05-9c68-a8a07b9377f1 | ceph-2       | 5fa2fe93-9ae6-43e0-b9d1-e3cd4c5b7619 | power on    | active             | False       |
| 9b22fe1b-22e4-47fd-bd0e-00042b9a2956 | compute-0    | 33915557-efe3-4aaf-87a3-361d0c4aa569 | power on    | active             | False       |
| 97643e11-117c-4fc8-915a-adef0cbd3e90 | compute-1    | 524c77d2-b901-438c-aa66-a92f878275f0 | power on    | active             | False       |
| 4450b6f8-008f-45f3-9560-0e2fe36742eb | controller-0 | 88143f27-3128-40a4-b422-c40fb5fe10c5 | power on    | active             | False       |
| 72c9ab9d-c1ea-43d8-bb44-653b81bf0824 | controller-1 | 48ea75db-74f7-4299-9915-826c1890a8ae | power on    | active             | False       |
| d578fd5f-4120-4d95-9388-314e57e9f9bc | controller-2 | ad52d55a-ecfc-4395-86c7-cb58052900ce | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
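
The same "nodes look OK" check can be done mechanically on a saved copy of the table above. This is a small sketch with illustrative function names. It assumes a deployed overcloud, where a healthy node is expected to be `power on`, `active` and out of maintenance.

```python
def parse_table(text):
    """Parse an ASCII table as printed by the openstack CLI into row dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders and prompts
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first row holds the column names
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_nodes(rows):
    """Names of nodes that are not powered on, not active, or in maintenance."""
    return [
        row["Name"]
        for row in rows
        if row["Power State"] != "power on"
        or row["Provisioning State"] != "active"
        or row["Maintenance"] != "False"
    ]
```

An empty result corresponds to the all-OK listing above. A node stuck in `deleting` or switched to maintenance mode would be reported by name.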

Please capture a sosreport if the problem occurs again.

Comment 18 Bob Fournier 2018-05-08 20:31:32 UTC
Alex - can you attach the sosreports to this BZ please? I keep getting failures when trying to download from drive.google.com. Thanks.

Comment 20 Bob Fournier 2018-05-10 14:15:32 UTC
Thanks Alex, yes I was able to retrieve the logs.

It's clear from the Ironic logs that we're getting these "Error in tear_down of node" errors due to IPMI failures (see below).

As you're using virtualbmc for IPMI, the issue you are seeing is most likely virtualbmc power failures caused by libvirt. This is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1571384. There is a libvirt patch described in https://bugzilla.redhat.com/show_bug.cgi?id=1576464 that has proven to resolve these virtualbmc issues.

Would it be possible to install this libvirt patch and retest?

I will leave this open for now. Eventually it should be marked as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1571384.


2018-05-08 02:12:13.173 17127 WARNING ironic.drivers.modules.ipmitool [req-6a0a9638-d21d-4fc6-8ca8-8f284d3bb17d b9b6ae49e2b249f692319e410f25d2d7 c8b1b2624f57453496d61febc7ad0c09 - default default] IPMI power status failed for node 4450b6f8-008f-45f3-9560-0e2fe36742eb with error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6236 -U admin -R 12 -N 5 -f /tmp/tmpYzwU_m power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\n'.: ProcessExecutionError: Unexpected error while running command.
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager [req-6a0a9638-d21d-4fc6-8ca8-8f284d3bb17d b9b6ae49e2b249f692319e410f25d2d7 c8b1b2624f57453496d61febc7ad0c09 - default default] Error in tear_down of node 4450b6f8-008f-45f3-9560-0e2fe36742eb: IPMI call failed: power status.: IPMIFailure: IPMI call failed: power status.
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager Traceback (most recent call last):
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 908, in _do_node_tear_down
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     task.driver.deploy.tear_down(task)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     result = f(*args, **kwargs)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     return f(*args, **kwargs)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 498, in tear_down
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     manager_utils.node_power_action(task, states.POWER_OFF)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     return f(*args, **kwargs)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/utils.py", line 209, in node_power_action
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     if _can_skip_state_change(task, new_state):
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/utils.py", line 168, in _can_skip_state_change
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     fields.NotificationStatus.ERROR, new_state)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     self.force_reraise()
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     six.reraise(self.type_, self.value, self.tb)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/conductor/utils.py", line 158, in _can_skip_state_change
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     curr_state = task.driver.power.get_power_state(task)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     result = f(*args, **kwargs)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ipmitool.py", line 781, in get_power_state
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     return _power_status(driver_info)
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ipmitool.py", line 564, in _power_status
2018-05-08 02:12:13.193 17127 ERROR ironic.conductor.manager     raise exception.IPMIFailure(cmd=cmd)
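
To check how widespread these failures are across nodes, the two message shapes in the excerpt above can be counted per node UUID. The following is only a rough sketch: the function name summarize_ipmi_failures is hypothetical, and the patterns are assumed from the two log lines quoted in this comment, so other Ironic versions may word the messages differently.

```python
import re
from collections import Counter

# Patterns assumed from the two message shapes quoted in the log excerpt;
# they are not taken from Ironic's source.
POWER_STATUS_RE = re.compile(
    r"IPMI power status failed for node (?P<node>[0-9a-f-]{36})")
TEAR_DOWN_RE = re.compile(
    r"Error in tear_down of node (?P<node>[0-9a-f-]{36})")


def summarize_ipmi_failures(lines):
    """Count power-status warnings and tear_down errors per node UUID.

    `lines` is any iterable of log lines, e.g. an open file object.
    Returns a (power_status_counts, tear_down_counts) pair of Counters.
    """
    power_status = Counter()
    tear_down = Counter()
    for line in lines:
        match = POWER_STATUS_RE.search(line)
        if match:
            power_status[match.group("node")] += 1
        match = TEAR_DOWN_RE.search(line)
        if match:
            tear_down[match.group("node")] += 1
    return power_status, tear_down
```

Passing an open handle on the ironic-conductor log from the sosreport to summarize_ipmi_failures gives one count per node. A node that shows tear_down errors without matching power status warnings would suggest a cause other than the virtualbmc issue.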

Comment 21 Bob Fournier 2018-05-11 20:07:36 UTC

*** This bug has been marked as a duplicate of bug 1571384 ***