1344004 – Node cleaning fails with 'Failed to tear down from cleaning for node UUID'

Summary: Node cleaning fails with 'Failed to tear down from cleaning for node UUID'

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-ironic
Sub Component:
Version:	9.0 (Mitaka)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	beta
Target Release:	10.0 (Newton)
Assignee:	Miles Gould
QA Contact:	Raviv Bar-Tal
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1335596 intelosp10bugs

Reported:	2016-06-08 13:24 UTC by Raviv Bar-Tal
Modified:	2020-08-13 08:29 UTC
CC List:	11 users
Fixed In Version:	openstack-ironic-6.1.1-0.20160907120305.0acdfca.el7ost
Doc Type:	Bug Fix
Doc Text:	Previously, 'ironic-conductor' did not correctly pass the authentication token to the 'python-neutronclient'. As a result, automatic node cleaning failed with a tear down error. With this update, OpenStack Baremetal Provisioning (ironic) was migrated to use the 'keystoneauth' sessions rather than directly constructing Identity service client objects. As a result, nodes can now be successfully torn down after cleaning.
Clone Of:
Environment:
Last Closed:	2016-12-14 15:36:33 UTC
Target Upstream Version:
Embargoed:

Attachments
ironic, keystone, neutron logs (25.42 KB, application/x-gzip) 2016-06-08 13:24 UTC, Raviv Bar-Tal

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Launchpad	1590408	0	None	None	None	2016-06-08 13:23:59 UTC
Red Hat Product Errata	RHEA-2016:2948	0	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 10 enhancement update	2016-12-14 19:55:27 UTC

Description Raviv Bar-Tal 2016-06-08 13:24:00 UTC

Created attachment 1166021
ironic, keystone, neutron logs

Description of problem:
Automatic node cleaning fails with a tear down error.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Enable automated_clean in ironic.conf.
2. In a virtual environment, or if your SSD does not support security erase, disable the erase_devices step by setting its priority to 0 (erase_devices_priority=0).
3. Restart ironic-conductor.
4. Move the node from the manageable state to available using the provide action.
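
The two settings from steps 1 and 2 can be sketched as a config fragment. The snippet below builds one with Python's standard configparser. The section names ([conductor] for automated_clean, [deploy] for erase_devices_priority) are an assumption based on the Mitaka-era layout and are not stated in this report, so check them against the sample ironic.conf shipped with your release.

```python
import configparser
import io


def render_cleaning_fragment(options=None):
    """Return an ini-formatted fragment holding the cleaning options."""
    if options is None:
        # Assumption: section names follow the Mitaka-era ironic.conf layout.
        options = {
            "conductor": {"automated_clean": "true"},
            "deploy": {"erase_devices_priority": "0"},
        }
    parser = configparser.ConfigParser()
    parser.read_dict(options)
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


def erase_devices_disabled(fragment):
    """Check that the fragment sets the erase_devices priority to 0."""
    parser = configparser.ConfigParser()
    parser.read_string(fragment)
    return parser.getint("deploy", "erase_devices_priority") == 0


fragment = render_cleaning_fragment()
print(fragment)
```

After merging these values into ironic.conf, ironic-conductor still has to be restarted (step 3) before they take effect.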

Actual results:

The node state changes to cleaning.
The node is rebooted, then shut down.
The node state changes to clean failed and maintenance is set to True.
Output of ironic node-show xxxx-xxx:
"maintenance_reason | Failed to tear down from cleaning for node xxxx-xxxx"

Expected results:
Node cleaning finishes and the node state changes to available.

Additional info:
A bug has also been opened in upstream Launchpad (1590408).

Comment 1 Miles Gould 2016-06-20 13:09:51 UTC

@rbartal I'm having trouble duplicating this problem - how did you set up your environment, and which node are you running the commands on?

Comment 2 Miles Gould 2016-06-21 13:35:26 UTC

@rbartal alternatively, can you provide me with an environment on which I can duplicate this bug?

Comment 3 Raviv Bar-Tal 2016-06-23 09:06:44 UTC

Hi Miles,
The same problem happens in both virtual and bare-metal environments.
You can connect to my seal system to test it.
Raviv

Comment 4 Miles Gould 2016-06-27 17:29:06 UTC

The failure happens when Ironic tries to list the Neutron ports to be torn down during cleaning.

 - Ironic-conductor doesn't pass an auth token to python-neutronclient
 - python-neutronclient tries to fetch one itself
 - the auth_uri setting in ironic.conf is in the wrong format for python-neutronclient (it needs /auth added to the end, but this breaks other parts of ironic)
 - if I put in a hack to add /auth, the authentication request fails anyway with the error "Expecting to find identity in auth - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error."

I'll continue to look into this tomorrow.

Comment 5 Miles Gould 2016-06-28 13:15:35 UTC

I was able to make cleaning succeed by adding the line

task.context.auth_token = keystone.get_admin_auth_token()

to ironic.dhcp.neutron.NeutronDHCPApi.delete_cleaning_ports, but that's clearly not a proper solution. I'll ask upstream where the auth token should come from.
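
As a rough illustration of why that one-line workaround helps, here is a toy model of the failure described in comment 4. It is not Ironic code: Context, Task, list_cleaning_ports and the apply_workaround flag are hypothetical stand-ins, and get_admin_auth_token only imitates the real keystone helper by returning a fixed string. The model assumes that the client's own authentication attempt always fails, as observed above.

```python
class AuthenticationError(Exception):
    """Raised when the stand-in client cannot obtain a token on its own."""


class Context:
    """Stand-in for the request context carried by a conductor task."""

    def __init__(self, auth_token=None):
        self.auth_token = auth_token


class Task:
    """Stand-in for a conductor task acting on one node."""

    def __init__(self, node_uuid, context):
        self.node_uuid = node_uuid
        self.context = context


def get_admin_auth_token():
    """Hypothetical stand-in for keystone.get_admin_auth_token()."""
    return "admin-token"


def list_cleaning_ports(token):
    """Stand-in for listing the Neutron ports to tear down.

    With no token supplied, the client tries to authenticate by itself,
    and in this model that attempt always fails (see comment 4).
    """
    if token is None:
        raise AuthenticationError("Expecting to find identity in auth")
    return ["cleaning-port-1"]


def delete_cleaning_ports(task, apply_workaround=False):
    """Return the ports that would be deleted for the task's node."""
    if apply_workaround:
        # The workaround: fill in an admin token before calling the client.
        task.context.auth_token = get_admin_auth_token()
    return list_cleaning_ports(task.context.auth_token)


task = Task("node-uuid", Context())
try:
    delete_cleaning_ports(task)
except AuthenticationError as exc:
    print("tear down failed:", exc)
print(delete_cleaning_ports(task, apply_workaround=True))
```

In the model, tear down only succeeds once the context carries a token, which matches the observation that the conductor was not passing one through.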

Comment 6 Miles Gould 2016-06-28 15:37:50 UTC

There's a patch currently in review to completely rework the way we use Keystone, which should fix this: https://review.openstack.org/#/c/236982/

Comment 7 Dmitry Tantsur 2016-09-06 15:32:17 UTC

Hi!

I believe we've fixed this in the Newton (OSP10) release. Cleaning did work for me in overcloud a few weeks ago. Could you please retest it?

Comment 8 Raviv Bar-Tal 2016-09-15 14:59:52 UTC

This cleaning scenario was tested on RHOS 10 (Newton) and passed.
The erase_devices step was disabled for the test because the disk on my machine does not support security erase.

The Ironic RPMs in this RHOS 10 puddle are:
openstack-ironic-api-6.1.1-0.20160907120305.0acdfca.el7ost.noarch
openstack-ironic-inspector-4.1.1-0.20160906074601.0276422.el7ost.noarch
openstack-ironic-conductor-6.1.1-0.20160907120305.0acdfca.el7ost.noarch
openstack-ironic-common-6.1.1-0.20160907120305.0acdfca.el7ost.noarch

Comment 13 errata-xmlrpc 2016-12-14 15:36:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
