Bug 1241424
Summary: | Can't delete bare metal nodes that are stuck and unresponsive, or put them in maintenance | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Udi Kalifon <ukalifon> |
Component: | openstack-ironic | Assignee: | Lucas Alvares Gomes <lmartins> |
Status: | CLOSED ERRATA | QA Contact: | Toure Dunnon <tdunnon> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | Director | CC: | david.costakos, dmacpher, dyocum, hbrock, jcoufal, jslagle, lmartins, mburns, nbarcet, ohochman, rhel-osp-director-maint, sclewis, srevivo, tcarlin, ukalifon |
Target Milestone: | z5 | Keywords: | Triaged, ZStream |
Target Release: | 7.0 (Kilo) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-ironic-2015.1.2-3.el7ost | Doc Type: | Known Issue |
Doc Text: |
Sometimes bare metal nodes can lock into a certain state if ironic-conductor stops abruptly. This means users cannot delete these nodes or change their state. As a workaround, log into the director's database and use the following query to set the node back to "available" state and remove the lock:
UPDATE nodes SET provision_state="available", target_provision_state=NULL, reservation=NULL WHERE uuid=<node uuid>;
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-06-15 18:04:34 UTC | Type: | Bug |
Description
Udi Kalifon
2015-07-09 08:22:37 UTC
I will need some more information about the states of the node to proceed. Can you please tell me the node-show output of both nodes?

    $ ironic node-show <uuid>

And attach the ironic-conductor log as well if possible.

Lucas, please provide the necessary workaround of updating the record manually in MySQL, and make sure the doc text exists for this. Removing from the blocker list.

Hi @chris,

Right, yeah, this is a workaround that we really want to avoid, and I've been looking at states where we can get stuck and trying to fix them to avoid this. So please use it only as a last resort.

In case you're really stuck, please put the node back to the "available" state by modifying the database as follows:

    UPDATE nodes SET provision_state="available", target_provision_state=NULL WHERE uuid=<uuid>;

For example:

    [stack@localhost devstack]$ sudo mysql -u root -p
    Enter password:
    Welcome to the MariaDB monitor. Commands end with ; or \g.
    Your MariaDB connection id is 91
    Server version: 10.0.20-MariaDB MariaDB Server

    Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    MariaDB [(none)]> use ironic;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Database changed
    MariaDB [ironic]> UPDATE nodes SET provision_state="available", target_provision_state=NULL WHERE uuid="b76e1671-7a4c-4066-be7a-dc4e97c8dddd";
    Query OK, 0 rows affected (0.06 sec)
    Rows matched: 1 Changed: 0 Warnings: 0

    MariaDB [ironic]> exit
    Bye

(In reply to Lucas Alvares Gomes from comment #6)
> Hi @chris,
>
> Right, yeah, this is a workaround that we really want to avoid, and I've been
> looking at states where we can get stuck and trying to fix them to avoid
> this. So please use it only as a last resort.
>
> In case you're really stuck, please put the node back to the "available" state
> by modifying the database as follows:
>
> UPDATE nodes SET provision_state="available", target_provision_state=NULL
> WHERE uuid=<uuid>;

Actually, it would be good to clean up the "reservation" field as well, in case the node is also locked by a specific conductor:

    UPDATE nodes SET provision_state="available", target_provision_state=NULL, reservation=NULL WHERE uuid=<uuid>;

@Lucas,

We start with the nodes unregistered and turned off. We register them with the command "openstack baremetal import --json instackenv.json", and then we see them in power state "off", provision state "available" and maintenance mode "off". I'm assuming that the states we see are just the default ones you always get when you register new nodes, and that ironic then works in the background to connect to the IPMI interfaces and update the real states of the machines. If the interfaces on some of the nodes are really down, the nodes will be "locked" when you try to configure boot for them... But of course, that's just how I see it, and I might be completely wrong because I don't really know how the code works.

Hi @Udi,

Well, kinda. The "available" provision state and maintenance "False" are defaults. But the power state when you register a node is actually None, so in the background (as a periodic task) Ironic checks the power state of the node every X seconds (defaults to 60 seconds) to see whether what it has in the database is the actual state of the node [1].

It could be that the operation somehow got stuck for a long time, since in the version we use in ospd we acquire an exclusive lock from the beginning of this operation. Upstream, @Dmitry worked to minimize the usage of exclusive locks for this problem [2], but this hasn't been backported yet.

A workaround for this lock problem right now would be to restart the ironic-conductor, which will free up all the locks that specific conductor was holding. So you don't have to change the database or anything.

[1] https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L2139-L2165
[2] https://review.openstack.org/#/c/202562/

Sorry, forgot to say that.

(In reply to Lucas Alvares Gomes from comment #14)
> Hi @Udi,
>
> Well, kinda. The "available" provision state and maintenance "False" are
> defaults. But the power state when you register a node is actually None, so
> in the background (as a periodic task) Ironic checks the power state of
> the node every X seconds (defaults to 60 seconds) to see whether what it has in
> the database is the actual state of the node [1].

Sorry, forgot to mention that if the state can't be sync'ed, Ironic will put the node in maintenance mode to alert the operator that it can't manage it [1].

[1] https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L2139-L2165

Patch merged upstream and should be fixed for y2.

Based on Udi's comment, the bug status is not up to date. Can anybody please fix that?

Patch is posted and merged upstream, but not yet backported. It was not part of the 2015.1.2 release, so it needs a manual backport.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1234
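
For anyone who ends up scripting the last-resort database workaround from the comments above, here is a minimal sketch of the same UPDATE issued as a parameterized query. It is not part of the fix and not an Ironic API: the helper names `reset_stuck_node` and `make_demo_db` are hypothetical, the table is a cut-down stand-in with only the four columns named in this bug, and it runs against an in-memory SQLite database rather than the director's MariaDB so that it can be tried safely. The "deploying", "active" and "conductor-host" values are illustrative only.

```python
import sqlite3

# Same statement as the workaround in this bug, with the UUID passed as a
# bound parameter instead of being pasted into the SQL text.
RESET_SQL = (
    "UPDATE nodes "
    "SET provision_state = 'available', "
    "target_provision_state = NULL, "
    "reservation = NULL "
    "WHERE uuid = ?"
)


def reset_stuck_node(conn, node_uuid):
    """Hypothetical helper: reset one node, return True if exactly one row matched."""
    cur = conn.execute(RESET_SQL, (node_uuid,))
    conn.commit()
    return cur.rowcount == 1


def make_demo_db():
    """Build an in-memory stand-in for the ironic 'nodes' table (assumed subset of columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        "uuid TEXT PRIMARY KEY, "
        "provision_state TEXT, "
        "target_provision_state TEXT, "
        "reservation TEXT)"
    )
    # Illustrative stuck node: mid-transition and still reserved by a conductor.
    conn.execute(
        "INSERT INTO nodes VALUES (?, ?, ?, ?)",
        ("b76e1671-7a4c-4066-be7a-dc4e97c8dddd", "deploying", "active", "conductor-host"),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(reset_stuck_node(db, "b76e1671-7a4c-4066-be7a-dc4e97c8dddd"))
    print(db.execute("SELECT * FROM nodes").fetchall())
```

Note that the row-count check behaves differently on MariaDB: as the session above shows ("Rows matched: 1 Changed: 0"), the server can report zero affected rows when the values were already set, so a script against the real database should not treat zero as "node not found" without checking. As suggested in the comments, restarting ironic-conductor is the preferred way to release locks, and the database edit remains a last resort.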