Bug 1246641

Summary: 'openstack baremetal introspection bulk start' does not leave nodes powered off
Product: Red Hat OpenStack    Reporter: Ronelle Landy <rlandy>
Component: rhosp-director    Assignee: chris alfonso <calfonso>
Status: CLOSED INSUFFICIENT_DATA    QA Contact: yeylon <yeylon>
Severity: unspecified    Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)    CC: hbrock, jtrowbri, mburns, rhel-osp-director-maint, rlandy, srevivo, whayutin
Target Milestone: y2    Keywords: Automation, ZStream
Target Release: Director
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-08-31 16:56:55 UTC    Type: Bug
Attachments:
sudo journalctl -u openstack-ironic-conductor -l --no-pager | grep 40da75b9-be0d-41aa-a5f3-4218c002c78c (flags: none)

Description Ronelle Landy 2015-07-24 18:21:33 UTC
Description of problem:

Before the overcloud is deployed on the HP environment, the CI jobs run 'openstack baremetal introspection bulk start'. The logs show traces of the power-off command being executed; however, before deploy, 'ironic node-list' shows:

$ ironic node-list
+--------------------------------------+------+---------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provision State | Maintenance |
+--------------------------------------+------+---------------+-------------+-----------------+-------------+
| 40da75b9-be0d-41aa-a5f3-4218c002c78c | None | None          | power on    | available       | False       |
| f6629bf7-e55f-4837-ab18-85f0667a097a | None | None          | power off   | available       | False       |
| 30a16686-6786-4f93-b04c-843b1a36f121 | None | None          | power on    | available       | False       |
| 9ec7a021-943a-4d56-bd7c-f277af4710bd | None | None          | power on    | available       | False       |
+--------------------------------------+------+---------------+-------------+-----------------+-------------+

The problem is that nodes left powered on from a previous deploy could result in overlapping IP addresses.

Version-Release number of selected component (if applicable):

$ rpm -qa | grep openstack
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-ceilometer-central-2015.1.0-10.el7ost.noarch
openstack-tuskar-0.4.18-3.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-16.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-10.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-1.git49b57eb.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-12.el7ost.noarch
openstack-nova-api-2015.1.0-16.el7ost.noarch
python-django-openstack-auth-1.2.0-3.el7ost.noarch
openstack-nova-common-2015.1.0-16.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-4.el7ost.noarch
openstack-neutron-common-2015.1.0-12.el7ost.noarch
openstack-neutron-2015.1.0-12.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-ceilometer-common-2015.1.0-10.el7ost.noarch
openstack-ironic-common-2015.1.0-9.el7ost.noarch
openstack-nova-compute-2015.1.0-16.el7ost.noarch
openstack-nova-conductor-2015.1.0-16.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-nova-console-2015.1.0-16.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-6.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-ironic-conductor-2015.1.0-9.el7ost.noarch
openstack-ceilometer-api-2015.1.0-10.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-10.el7ost.noarch
openstack-ironic-api-2015.1.0-9.el7ost.noarch
openstack-keystone-2015.1.0-4.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-puppet-modules-2015.1.8-8.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-tempest-kilo-20150708.2.el7ost.noarch
openstack-neutron-ml2-2015.1.0-12.el7ost.noarch
openstack-nova-scheduler-2015.1.0-16.el7ost.noarch
openstack-nova-cert-2015.1.0-16.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-10.el7ost.noarch
openstack-ironic-discoverd-1.1.0-5.el7ost.noarch
openstack-selinux-0.6.37-1.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-13.el7ost.noarch

$ rpm -qa | grep osc
python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

How reproducible:
Mostly

Steps to Reproduce:
1. Install the undercloud on HP hardware
2. Execute 'openstack baremetal introspection bulk start'
3. Check node status

Actual results:

Some nodes are powered on before deploy

Expected results:

All nodes should be off

Additional info:

This may be an issue with the environment rather than the product.

Comment 3 John Trowbridge 2015-07-24 18:28:54 UTC
Created attachment 1055864 [details]
sudo journalctl -u openstack-ironic-conductor -l --no-pager | grep 40da75b9-be0d-41aa-a5f3-4218c002c78c

Comment 4 John Trowbridge 2015-07-24 18:38:12 UTC
The full ironic-conductor log for the first node is attached.

Some additional context: this power state change is not coming from Ironic. We see the node get powered off after the discovery ramdisk completes:

Jul 24 12:29:10 virtblade11.virt.lab.eng.bos.redhat.com ironic-conductor[12221]: 2015-07-24 12:29:10.394 12221 DEBUG ironic.conductor.manager [-] RPC change_node_power_state called for node 40da75b9-be0d-41aa-a5f3-4218c002c78c. The desired new state is power off. change_node_power_state /usr/lib/python2.7/site-packages/ironic/conductor/manager.py:431

Jul 24 12:29:20 virtblade11.virt.lab.eng.bos.redhat.com ironic-conductor[12221]: 2015-07-24 12:29:20.963 12221 INFO ironic.conductor.utils [-] Successfully set node 40da75b9-be0d-41aa-a5f3-4218c002c78c power state to power off.


Then, about a minute later, Ironic finds the node powered on:

Jul 24 12:30:16 virtblade11.virt.lab.eng.bos.redhat.com ironic-conductor[12221]: 2015-07-24 12:30:16.414 12221 WARNING ironic.conductor.manager [-] During sync_power_state, node 40da75b9-be0d-41aa-a5f3-4218c002c78c state does not match expected state 'power off'. Updating recorded state to 'power on'.

Note that there are no logs in between showing an RPC call to change the power state. My suspicion is that something outside of the deployment is powering the node on, but I am not sure how to confirm that.
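One way to spot this pattern across all nodes (a sketch, not part of the reported setup; it assumes the journal lines look like the sync_power_state WARNING quoted above) is to filter the conductor journal for mismatch warnings and pull out the affected node UUIDs. Any UUID printed here was found in a power state that Ironic did not set:

```shell
# Sketch: list nodes whose actual power state diverged from what Ironic
# last recorded. Reads ironic-conductor log lines on stdin.
flag_external_power_changes() {
  grep 'During sync_power_state' \
    | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}

# Usage (assumed invocation):
# sudo journalctl -u openstack-ironic-conductor --no-pager | flag_external_power_changes
```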

Comment 5 Mike Burns 2015-08-19 16:24:41 UTC
Is this still reproducing?

Comment 6 Ronelle Landy 2015-08-19 20:51:09 UTC
Hard to tell - in CI (to keep testing operational) we are working around this by turning the nodes off via Ironic before deploy.
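The workaround could be sketched roughly as follows (an assumption of how such a pre-deploy step might look, not the actual CI code): parse the 'ironic node-list' table for nodes still reporting 'power on', then power each one off with the Kilo-era 'ironic node-set-power-state' command before deploying.

```shell
# Sketch of the CI workaround (assumed, not the actual CI script).
# Reads 'ironic node-list' table output on stdin and prints the UUIDs
# (column 2 of the table) of nodes that still report 'power on'.
nodes_still_powered_on() {
  awk -F'|' '/power on/ { gsub(/ /, "", $2); print $2 }'
}

# Then power each such node off via Ironic before the deploy:
# ironic node-list | nodes_still_powered_on \
#   | xargs -r -n1 -I{} ironic node-set-power-state {} off
```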