Bug 1344004 - Node cleaning fails with 'Failed to tear down from cleaning for node UUID'
Summary: Node cleaning fails with 'Failed to tear down from cleaning for node UUID'
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Target Milestone: beta
Target Release: 10.0 (Newton)
Assignee: Miles Gould
QA Contact: Raviv Bar-Tal
Depends On:
Blocks: 1335596 intelosp10bugs
Reported: 2016-06-08 13:24 UTC by Raviv Bar-Tal
Modified: 2020-08-13 08:29 UTC (History)
11 users

Fixed In Version: openstack-ironic-6.1.1-0.20160907120305.0acdfca.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, 'ironic-conductor' did not correctly pass the authentication token to the 'python-neutronclient'. As a result, automatic node cleaning failed with a tear down error. With this update, OpenStack Baremetal Provisioning (ironic) was migrated to use the 'keystoneauth' sessions rather than directly constructing Identity service client objects. As a result, nodes can now be successfully torn down after cleaning.
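For illustration, after the migration to 'keystoneauth' sessions, ironic authenticates to neutron via options in the [neutron] section of ironic.conf rather than by passing a raw token. The fragment below is a hedged sketch only: option names follow keystoneauth conventions, and all values are placeholders, not taken from this bug.

```ini
# Illustrative only: [neutron] section using keystoneauth session options.
# Exact option names and values depend on the deployment; these are assumptions.
[neutron]
auth_type = password
auth_url = http://192.0.2.10:5000
username = ironic
password = IRONIC_PASSWORD
project_name = service
user_domain_name = Default
project_domain_name = Default
```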
Clone Of:
Last Closed: 2016-12-14 15:36:33 UTC
Target Upstream Version:

Attachments (Terms of Use)
ironic, keystone, neutron logs (25.42 KB, application/x-gzip)
2016-06-08 13:24 UTC, Raviv Bar-Tal

System ID Private Priority Status Summary Last Updated
Launchpad 1590408 0 None None None 2016-06-08 13:23:59 UTC
Red Hat Product Errata RHEA-2016:2948 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC

Description Raviv Bar-Tal 2016-06-08 13:24:00 UTC
Created attachment 1166021 [details]
ironic, keystone, neutron logs

Description of problem:
Automatic node cleaning fails with a tear-down error.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Enable automated_clean in ironic.conf.
2. In a virtual environment, or if your SSD does not support secure erase, disable the erase_devices step by setting its priority to 0 (erase_devices_priority=0).
3. Restart ironic-conductor.
4. Move the node from the 'manageable' state to 'available' (the 'provide' action).
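The configuration changes in steps 1-2 might look like the fragment below; this is a sketch, and the section placement of each option is an assumption for this release.

```ini
# Illustrative ironic.conf fragment for the reproduction steps above.
[conductor]
# Step 1: enable automatic cleaning between deployments
automated_clean = true

[deploy]
# Step 2: skip disk erasure when secure erase is unsupported
erase_devices_priority = 0
```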

Actual results:

The node state changes to 'cleaning'.
The node is rebooted, then shut down.
The node state changes to 'clean failed' and maintenance = True.
ironic node-show xxxx-xxx output:
"maintenance_reason | Failed to tear down from cleaning for node xxxx-xxxx"

Expected results:
Node cleaning finishes and the node status changes to 'available'.

Additional info:
Bug opened in upstream launchpad

Comment 1 Miles Gould 2016-06-20 13:09:51 UTC
@rbartal I'm having trouble duplicating this problem - how did you set up your environment, and which node are you running the commands on?

Comment 2 Miles Gould 2016-06-21 13:35:26 UTC
@rbartal alternatively, can you provide me with an environment on which I can duplicate this bug?

Comment 3 Raviv Bar-Tal 2016-06-23 09:06:44 UTC
Hi Miles,
The same problem happens on both virt and BM environments;
you can connect to my seal system to test it.

Comment 4 Miles Gould 2016-06-27 17:29:06 UTC
The failure happens when Ironic tries to list the Neutron ports to be torn down during cleaning.

 - Ironic-conductor doesn't pass an auth token to python-neutronclient
 - python-neutronclient tries to fetch one itself
 - the auth_uri setting in ironic.conf is in the wrong format for python-neutronclient (it needs /auth added to the end, but this breaks other parts of ironic)
 - if I put in a hack to add /auth, the authentication request fails anyway with the error "Expecting to find identity in auth - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error."

I'll continue to look into this tomorrow.

Comment 5 Miles Gould 2016-06-28 13:15:35 UTC
I was able to make cleaning succeed by adding the line

task.context.auth_token = keystone.get_admin_auth_token()

to ironic.dhcp.neutron.NeutronDHCPApi.delete_cleaning_ports, but that's clearly not a proper solution. I'll ask upstream where the auth token should come from.

Comment 6 Miles Gould 2016-06-28 15:37:50 UTC
There's a patch currently in review to completely rework the way we use Keystone, which should fix this: https://review.openstack.org/#/c/236982/

Comment 7 Dmitry Tantsur 2016-09-06 15:32:17 UTC

I believe we've fixed this in the Newton (OSP10) release. Cleaning did work for me in overcloud a few weeks ago. Could you please retest it?

Comment 8 Raviv Bar-Tal 2016-09-15 14:59:52 UTC
This cleaning scenario was tested on RHOS 10 (Newton) and passed.
The erase_devices step was disabled for the test, as the disks on my machine do not support secure erase.

The Ironic RPMs in this RHOS 10 puddle are:

Comment 13 errata-xmlrpc 2016-12-14 15:36:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

