Bug 1246641 - 'openstack baremetal introspection bulk start' does not leave nodes powered off
Summary: 'openstack baremetal introspection bulk start' does not leave nodes powered off
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: y2
: Director
Assignee: chris alfonso
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-24 18:21 UTC by Ronelle Landy
Modified: 2016-04-18 07:13 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-31 16:56:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
sudo journalctl -u openstack-ironic-conductor -l --no-pager | grep 40da75b9-be0d-41aa-a5f3-4218c002c78c (144.56 KB, text/plain)
2015-07-24 18:28 UTC, John Trowbridge

Description Ronelle Landy 2015-07-24 18:21:33 UTC
Description of problem:

Before the overcloud is deployed on the HP environment, the CI jobs run 'openstack baremetal introspection bulk start'. The logs contain traces of the power-off command being executed; however, before deploy, ironic node-list shows:

$ ironic node-list
+--------------------------------------+------+---------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provision State | Maintenance |
+--------------------------------------+------+---------------+-------------+-----------------+-------------+
| 40da75b9-be0d-41aa-a5f3-4218c002c78c | None | None          | power on    | available       | False       |
| f6629bf7-e55f-4837-ab18-85f0667a097a | None | None          | power off   | available       | False       |
| 30a16686-6786-4f93-b04c-843b1a36f121 | None | None          | power on    | available       | False       |
| 9ec7a021-943a-4d56-bd7c-f277af4710bd | None | None          | power on    | available       | False       |
+--------------------------------------+------+---------------+-------------+-----------------+-------------+

The problem is that nodes left powered on from a previous deploy could result in overlapping IP addresses.

Version-Release number of selected component (if applicable):

 rpm -qa | grep openstack
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-ceilometer-central-2015.1.0-10.el7ost.noarch
openstack-tuskar-0.4.18-3.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-16.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-10.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-1.git49b57eb.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-12.el7ost.noarch
openstack-nova-api-2015.1.0-16.el7ost.noarch
python-django-openstack-auth-1.2.0-3.el7ost.noarch
openstack-nova-common-2015.1.0-16.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-4.el7ost.noarch
openstack-neutron-common-2015.1.0-12.el7ost.noarch
openstack-neutron-2015.1.0-12.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-ceilometer-common-2015.1.0-10.el7ost.noarch
openstack-ironic-common-2015.1.0-9.el7ost.noarch
openstack-nova-compute-2015.1.0-16.el7ost.noarch
openstack-nova-conductor-2015.1.0-16.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-nova-console-2015.1.0-16.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-6.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-ironic-conductor-2015.1.0-9.el7ost.noarch
openstack-ceilometer-api-2015.1.0-10.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-10.el7ost.noarch
openstack-ironic-api-2015.1.0-9.el7ost.noarch
openstack-keystone-2015.1.0-4.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-puppet-modules-2015.1.8-8.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-tempest-kilo-20150708.2.el7ost.noarch
openstack-neutron-ml2-2015.1.0-12.el7ost.noarch
openstack-nova-scheduler-2015.1.0-16.el7ost.noarch
openstack-nova-cert-2015.1.0-16.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-10.el7ost.noarch
openstack-ironic-discoverd-1.1.0-5.el7ost.noarch
openstack-selinux-0.6.37-1.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-13.el7ost.noarch

rpm -qa | grep osc
python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

How reproducible:
Mostly

Steps to Reproduce:
1. Install the undercloud on HP hardware
2. Execute 'openstack baremetal introspection bulk start'
3. Check the node status

Actual results:

Some nodes are powered on before deploy

Expected results:

All nodes should be off

Additional info:

This may be an issue with the environment and not the product.

Comment 3 John Trowbridge 2015-07-24 18:28:54 UTC
Created attachment 1055864 [details]
sudo journalctl -u openstack-ironic-conductor -l --no-pager | grep 40da75b9-be0d-41aa-a5f3-4218c002c78c

Comment 4 John Trowbridge 2015-07-24 18:38:12 UTC
The full ironic-conductor log for the first node is attached.

Some additional context:

This power state change is not coming from Ironic. We see the node get powered off after the discovery ramdisk completes:

Jul 24 12:29:10 virtblade11.virt.lab.eng.bos.redhat.com ironic-conductor[12221]: 2015-07-24 12:29:10.394 12221 DEBUG ironic.conductor.manager [-] RPC change_node_power_state called for node 40da75b9-be0d-41aa-a5f3-4218c002c78c. The desired new state is power off. change_node_power_state /usr/lib/python2.7/site-packages/ironic/conductor/manager.py:431

Jul 24 12:29:20 virtblade11.virt.lab.eng.bos.redhat.com ironic-conductor[12221]: 2015-07-24 12:29:20.963 12221 INFO ironic.conductor.utils [-] Successfully set node 40da75b9-be0d-41aa-a5f3-4218c002c78c power state to power off.


Then, about a minute later, Ironic finds the node powered on:

Jul 24 12:30:16 virtblade11.virt.lab.eng.bos.redhat.com ironic-conductor[12221]: 2015-07-24 12:30:16.414 12221 WARNING ironic.conductor.manager [-] During sync_power_state, node 40da75b9-be0d-41aa-a5f3-4218c002c78c state does not match expected state 'power off'. Updating recorded state to 'power on'.

Note that there are no logs in between showing an RPC call to change the power state. My suspicion is that something outside of the deployment is powering on the node, but I am not sure how to confirm that.
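One way to look for an out-of-band power-on (a sketch only, assuming the blades have IPMI-reachable BMCs with known credentials; BMC_HOST, BMC_USER and BMC_PASS are hypothetical placeholders, not values from this environment) is to check the BMC's System Event Log around the timestamp in question:

```shell
#!/bin/sh
# Hypothetical check for out-of-band power events: query the node's
# BMC System Event Log via IPMI and look for power entries that do
# not line up with Ironic's own RPC calls. BMC_HOST, BMC_USER and
# BMC_PASS are placeholders for the node's real IPMI endpoint.
check_power_events() {
    if [ -n "$BMC_HOST" ] && command -v ipmitool >/dev/null 2>&1; then
        ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -P "$BMC_PASS" \
            sel list | grep -i power
    else
        echo "BMC_HOST not set or ipmitool missing; nothing to query"
    fi
}

check_power_events
```

A power-on SEL entry with no matching change_node_power_state RPC in the conductor log would point at something outside Ironic (e.g. a blade enclosure manager or wake-on-LAN) cycling the node.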

Comment 5 Mike Burns 2015-08-19 16:24:41 UTC
Is this still reproducing?

Comment 6 Ronelle Landy 2015-08-19 20:51:09 UTC
Hard to tell - in CI (to keep testing operational) we are working around this by turning the nodes off via Ironic before deploy.
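For illustration, that workaround can be sketched roughly as follows (a sketch only, assuming the Kilo-era ironic CLI with OS_* credentials in the environment; the node-list table parsing here is illustrative, not the actual CI code):

```shell
#!/bin/sh
# Sketch of the CI workaround: before deploy, power off every node
# that Ironic still reports as "power on". Assumes the Kilo-era
# ironic CLI and OS_* auth variables; the table parsing below is
# illustrative, not the actual CI code.

powered_on_uuids() {
    # Pull the UUID column from "ironic node-list" rows whose
    # Power State column reads "power on".
    awk -F'|' '/power on/ { gsub(/ /, "", $2); print $2 }'
}

if command -v ironic >/dev/null 2>&1; then
    for uuid in $(ironic node-list | powered_on_uuids); do
        echo "powering off $uuid"
        ironic node-set-power-state "$uuid" off
    done
fi
```

This only masks the symptom; if something outside the deployment is powering nodes back on, they could still come up between this pass and the deploy.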

