Bug 1576073

Summary: overcloud deployment fails: Set Chassis Power Control to Up/On failed: Command not supported in present state
Product: Red Hat OpenStack
Reporter: Waldemar Znoinski <wznoinsk>
Component: openstack-ironic
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: mlammon
Severity: unspecified
Priority: unspecified
Version: 13.0 (Queens)
CC: bfournie, mburns, srevivo
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-05-10 14:45:13 UTC
Type: Bug
Attachments:
baremetal node that failed to power on (flags: none)

Description Waldemar Znoinski 2018-05-08 17:45:57 UTC
Created attachment 1433361
baremetal node that failed to power on

Description of problem:
When deploying OSP13 (any recent puddle) on a large physical machine that hosts both the undercloud and the overcloud nodes as VMs, the undercloud installation works fine but the overcloud deployment fails with:


(undercloud) [stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status  | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| e75324ea-4415-46ac-958a-72d075641062 | overcloud  | 915501887d7c498196b8595870706de1 | CREATE_FAILED | 2018-05-08T04:11:00Z | None         |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+

(undercloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+--------------+--------+----------+----------------+------------+
| ID                                   | Name         | Status | Networks | Image          | Flavor     |
+--------------------------------------+--------------+--------+----------+----------------+------------+
| 4a5ddf23-ef8c-4941-af68-6a5b6d53e336 | compute-0    | ERROR  |          | overcloud-full | compute    |
| cee22b96-820b-4f45-9add-348cef284c6e | controller-0 | ERROR  |          | overcloud-full | controller |
| bcce27a9-5d7e-491e-9efe-edafa3673df9 | controller-1 | BUILD  |          | overcloud-full | controller |
| cb42fa7a-ee30-4f91-a50f-620a2271126c | controller-2 | BUILD  |          | overcloud-full | controller |
| 74f9f0a6-c44d-4720-b5c5-6519fd112d25 | compute-1    | BUILD  |          | overcloud-full | compute    |
+--------------------------------------+--------------+--------+----------+----------------+------------+

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.Controller.0.Controller:
  resource_type: OS::TripleO::ControllerServer
  physical_resource_id: cee22b96-820b-4f45-9add-348cef284c6e
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.Controller: Went to status ERROR due to "Message: Build of instance cee22b96-820b-4f45-9add-348cef284c6e aborted: Failure prepping block device., Code: 500"
overcloud.Compute.0.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 4a5ddf23-ef8c-4941-af68-6a5b6d53e336
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: Build of instance 4a5ddf23-ef8c-4941-af68-6a5b6d53e336 aborted: Failure prepping block device., Code: 500"

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| f261b7f3-944e-44ba-996d-2c1b970609b0 | compute-0    | None          | power off   | available          | False       |
| 77e1adac-24e7-4879-b19d-8214604bedf8 | compute-1    | None          | power off   | available          | False       |
| 9a3887aa-6713-4d86-8fb7-ac93b17680f9 | controller-0 | None          | power off   | available          | False       |
| d5a8ec53-3a5b-45b8-b28f-be726d94a6aa | controller-1 | None          | power off   | available          | False       |
| 764c4790-f365-4376-a1e4-c71349d9c76e | controller-2 | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+

grep -i "ERROR.*f261b7f3" ironic-conductor.log 
2018-05-08 00:18:55.349 16754 ERROR ironic.drivers.modules.ipmitool [req-889cd9a9-2277-4351-8430-957d3e0027d3 b2654cf6293c4d879caf7ea7934e8b23 cc452732555c4e85a82de4a33902abe6 - default default] IPMI Error while attempting "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6233 -U admin -R 12 -N 5 -f /tmp/tmpKZr6_c power on" for node f261b7f3-944e-44ba-996d-2c1b970609b0. Error: Unexpected error while running command.
2018-05-08 00:18:55.387 16754 ERROR ironic.conductor.manager [req-889cd9a9-2277-4351-8430-957d3e0027d3 b2654cf6293c4d879caf7ea7934e8b23 cc452732555c4e85a82de4a33902abe6 - default default] Error in deploy of node f261b7f3-944e-44ba-996d-2c1b970609b0: IPMI call failed: power on.: IPMIFailure: IPMI call failed: power on.
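
When several nodes need checking, the same filtering can be scripted. The following is a minimal sketch, assuming the conductor log keeps the line layout seen in the entries quoted in this report (timestamp, PID, level, module, bracketed request context, message). The function name `errors_for_node` and the regular expression are illustrative and not part of ironic. Unlike the grep, it tests the level field rather than matching "ERROR" anywhere in the line.

```python
import re

# Assumed layout, taken from the log lines quoted in this report:
# "<date> <time> <pid> <LEVEL> <module> [<request context>] <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) "
    r"\[(?P<context>[^\]]*)\] (?P<message>.*)$"
)


def errors_for_node(lines, node_uuid_prefix):
    """Return (timestamp, module, message) for ERROR lines mentioning the node."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines (Command:, Exit code:, Stderr:) have no header.
            continue
        if match["level"] != "ERROR":
            continue
        if node_uuid_prefix.lower() not in match["message"].lower():
            continue
        hits.append((match["timestamp"], match["module"], match["message"]))
    return hits
```

It accepts any iterable of lines, so an open file object for ironic-conductor.log can be passed directly together with a UUID prefix such as "f261b7f3".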

 

More context around the ipmitool error in ironic-conductor.log:
2018-05-08 00:18:55.349 16754 ERROR ironic.drivers.modules.ipmitool [req-889cd9a9-2277-4351-8430-957d3e0027d3 b2654cf6293c4d879caf7ea7934e8b23 cc452732555c4e85a82de4a33902abe6 - default default] IPMI Error while attempting "ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6233 -U admin -R 12 -N 5 -f /tmp/tmpKZr6_c power on" for node f261b7f3-944e-44ba-996d-2c1b970609b0. Error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6233 -U admin -R 12 -N 5 -f /tmp/tmpKZr6_c power on
Exit code: 1
Stdout: u''
Stderr: u'Set Chassis Power Control to Up/On failed: Command not supported in present state\n': ProcessExecutionError: Unexpected error while running command.
2018-05-08 00:18:55.351 16754 WARNING ironic.drivers.modules.ipmitool [req-889cd9a9-2277-4351-8430-957d3e0027d3 b2654cf6293c4d879caf7ea7934e8b23 cc452732555c4e85a82de4a33902abe6 - default default] IPMI power action power on failed for node f261b7f3-944e-44ba-996d-2c1b970609b0 with error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6233 -U admin -R 12 -N 5 -f /tmp/tmpKZr6_c power on
Exit code: 1
Stdout: u''
Stderr: u'Set Chassis Power Control to Up/On failed: Command not supported in present state\n'.: ProcessExecutionError: Unexpected error while running command.


When the same ipmitool power on command is run by hand after the deployment fails, it works:
(undercloud) [stack@undercloud-0 ~]$ ipmitool -I lanplus -H 172.16.0.1 -L ADMINISTRATOR -p 6233 -U admin -R 12 -N 5 -P password power on
Chassis Power Control: Up/On
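
Because the command succeeds when repeated later, one way to check whether the failure is transient is to retry it a few times with a pause in between. The following is a minimal sketch under that assumption. `power_on_with_retry`, its parameters and the retry policy are hypothetical, and it is meant as a diagnostic aid, not as a fix for the underlying problem. The command line is copied from this report, including the reporter's test host, port and credentials.

```python
import subprocess
import time

# Error text reported by ipmitool in this bug's conductor log.
TRANSIENT_ERROR = "Command not supported in present state"

# Command line copied from this report (reporter's test values).
IPMI_POWER_ON = [
    "ipmitool", "-I", "lanplus", "-H", "172.16.0.1", "-L", "ADMINISTRATOR",
    "-p", "6233", "-U", "admin", "-R", "12", "-N", "5", "-P", "password",
    "power", "on",
]


def power_on_with_retry(cmd, attempts=3, delay=5.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Retries only when stderr contains TRANSIENT_ERROR; any other failure
    is returned immediately. Returns (succeeded, attempts_used, last_stderr).
    """
    stderr = ""
    for attempt in range(1, attempts + 1):
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
        if result.returncode == 0:
            return True, attempt, result.stderr
        stderr = result.stderr
        if TRANSIENT_ERROR not in stderr:
            return False, attempt, stderr
        if attempt < attempts:
            sleep(delay)
    return False, attempts, stderr
```

Calling `power_on_with_retry(IPMI_POWER_ON)` on the undercloud would show how many attempts a node needs. The `run` and `sleep` parameters exist only so the function can be exercised without a BMC.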


Depending on the sequence, some VMs (the ones we don't see the error for) sometimes get powered on and sometimes don't. This time:
[root@rhosw08 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 6     undercloud-0                   running
 -     compute-0                      shut off
 -     compute-1                      shut off
 -     controller-0                   shut off
 -     controller-1                   shut off
 -     controller-2                   shut off
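
To compare the hypervisor view with what ironic reports across repeated runs, the listing above can be parsed into a name-to-state mapping. The following is a minimal sketch, assuming the three-column text layout shown here. `parse_virsh_list` and `not_running` are illustrative helper names, not libvirt APIs.

```python
def parse_virsh_list(output):
    """Map domain name -> state from `virsh list --all` text output."""
    states = {}
    for line in output.splitlines():
        # Split into Id, Name and the rest, so "shut off" stays in one piece.
        parts = line.split(None, 2)
        # Skip blank lines, the dashed separator and the header row.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        _domain_id, name, state = parts
        states[name] = state.strip()
    return states


def not_running(states, expected):
    """Return the expected domain names that are not in the running state."""
    return sorted(name for name in expected if states.get(name) != "running")
```

Feeding it the output above and the five overcloud node names would return all five, matching the "power off" state in the baremetal node list.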



Version-Release number of selected component (if applicable):
osp13 undercloud, osp13 overcloud

[root@undercloud-0 ironic]# rpm -qa | grep -i ironic
openstack-ironic-common-10.1.2-3.el7ost.noarch
python2-ironicclient-2.2.0-1.el7ost.noarch
python-ironic-lib-2.12.1-1.el7ost.noarch
puppet-ironic-12.4.0-0.20180329034302.8285d85.el7ost.noarch
openstack-ironic-conductor-10.1.2-3.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
openstack-ironic-api-10.1.2-3.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163359.2435d97.el7ost.noarch


How reproducible:
90%


Steps to Reproduce:
1. install undercloud osp13
2. try to deploy overcloud osp13
3. observe the failure

Actual results:
overcloud deployment fails because ironic has issues powering up a VM

Expected results:
overcloud deployment succeeds

Additional info:

Comment 1 Bob Fournier 2018-05-08 17:54:15 UTC
This looks like the virtualbmc issue tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1571384, with an upstream patch in progress at https://review.openstack.org/#/c/564878/.

Comment 2 Bob Fournier 2018-05-10 14:30:26 UTC
Waldemar, can this be marked as a duplicate of bug 1576464, based on your comments in https://bugzilla.redhat.com/show_bug.cgi?id=1576464#c7?

Comment 3 Waldemar Znoinski 2018-05-10 14:41:58 UTC
Bob, I think this bug could be marked as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1571384 (it's closer than the libvirt one IMHO)

leaving the decision to you

Comment 4 Bob Fournier 2018-05-10 14:45:13 UTC
> Bob, I think this bug could be marked as a duplicate of 
> https://bugzilla.redhat.com/show_bug.cgi?id=1571384 (it's closer than the libvirt one IMHO)

I agree, this is the one we are using to track the actual virtualbmc failures due to the libvirt issue so let's close it against that one.

*** This bug has been marked as a duplicate of bug 1571384 ***