Bug 1263816

Summary: Stack-delete of overcloud does not remove instance UUID from nodes
Product: Red Hat OpenStack Reporter: Joe Talerico <jtaleric>
Component: openstack-nova Assignee: Lucas Alvares Gomes <lmartins>
Status: CLOSED ERRATA QA Contact: Raviv Bar-Tal <rbartal>
Severity: medium Docs Contact:
Priority: high    
Version: 7.0 (Kilo) CC: akrzos, berrange, dasmith, dcain, dnavale, eglynn, gdrapeau, jcoufal, jjoyce, jslagle, jtrowbri, kchamart, mburns, mcornea, nlevinki, racedoro, rcernin, rhel-osp-director-maint, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: rc Keywords: Triaged
Target Release: 10.0 (Newton)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-nova-14.0.1-1.el7ost Doc Type: Bug Fix
Doc Text:
Previously, the nova ironic virt driver wrote an instance UUID in the Bare Metal Provisioning (ironic) node before starting a deployment. If something failed between writing the UUID and starting the deployment, Compute did not remove the instance after it failed to spawn the instance. As a result, the Bare Metal Provisioning (ironic) node would have an instance UUID set and would not be picked for another deployment. With this update, if spawning an instance fails at any stage of the deployment, the ironic virt driver ensures that the instance UUID is cleaned up. As a result, nodes will not have an instance UUID set and will be picked up for a new deployment.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-14 15:15:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
OSPD9 Ironic logs none

Description Joe Talerico 2015-09-16 19:12:11 UTC
Description of problem:
Deleting the overcloud reported success; however, the ironic nodes still had references to instance UUIDs:

[stack@gprfc001 ~]$ heat stack-delete overcloud
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 972af940-06bf-4ebb-a618-3f7e6a8a7e64 | overcloud  | DELETE_IN_PROGRESS | 2015-09-16T18:26:27Z |
+--------------------------------------+------------+--------------------+----------------------+
[stack@gprfc001 ~]$ heat stack-list
+----+------------+--------------+---------------+
| id | stack_name | stack_status | creation_time |
+----+------------+--------------+---------------+
+----+------------+--------------+---------------+
[stack@gprfc001 ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| a0ae3a0b-0b90-4a2d-bfb1-b3fe6590f793 | None | 3f19de17-89a6-446c-b7f7-1ff6223ee106 | power on    | active          | False       |
| c5a347da-d07c-4f74-8616-308713dddb25 | None | 0ab03e1d-83dd-4502-a9f3-8192112096d2 | power on    | active          | False       |
| bd4e8ba4-cdc2-433a-83aa-d53cf4ba096f | None | 8244bf7c-93bf-4f8a-b705-4617fdb74e87 | power on    | active          | False       |
| 64a98320-5398-434d-8b90-2759079ced10 | None | 2b2accf2-1151-4d9e-896e-647f27c9da5c | power on    | active          | False       |
| 6cbd972b-1198-4394-8632-12384e3e6227 | None | 93ea3a55-40b2-44c9-ba81-94c9bad9e484 | power on    | active          | False       |
| 4283a41b-7fbc-4941-8e45-3f48c6c2602d | None | b782f846-b2a0-4272-b166-36f602b79ce6 | power on    | active          | False       |
| 2355c059-0207-4dff-8c78-3735d421051f | None | 30f98833-9e51-472a-963d-9860836a1ca8 | power on    | active          | False       |
| 582ce016-f42b-4db1-9d41-d794236367b1 | None | ecf5f641-0829-456b-b78c-2c58449c02f3 | power on    | active          | False       |
| d003166c-57fb-4356-95be-68305e716306 | None | 1049bc2a-0769-4193-97c4-3bd56034ccb7 | power on    | active          | False       |
| e42c5a8a-8913-4217-9ace-ed878eaaa528 | None | fba24c27-ee8a-44de-8229-186a69bddbb9 | power on    | active          | False       |
| 50353a51-1f99-4204-9127-f095d9354d8d | None | e1b09228-6065-4c5b-94a3-2e79f953f1fb | power on    | active          | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
[stack@gprfc001 ~]$ heat stack-list
+----+------------+--------------+---------------+
| id | stack_name | stack_status | creation_time |
+----+------------+--------------+---------------+
+----+------------+--------------+---------------+
[stack@gprfc001 ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

MariaDB [ironic]> select instance_uuid from nodes;
+--------------------------------------+
| instance_uuid                        |
+--------------------------------------+
| 0ab03e1d-83dd-4502-a9f3-8192112096d2 |
| 1049bc2a-0769-4193-97c4-3bd56034ccb7 |
| 2b2accf2-1151-4d9e-896e-647f27c9da5c |
| 30f98833-9e51-472a-963d-9860836a1ca8 |
| 3f19de17-89a6-446c-b7f7-1ff6223ee106 |
| 8244bf7c-93bf-4f8a-b705-4617fdb74e87 |
| 93ea3a55-40b2-44c9-ba81-94c9bad9e484 |
| b782f846-b2a0-4272-b166-36f602b79ce6 |
| e1b09228-6065-4c5b-94a3-2e79f953f1fb |
| ecf5f641-0829-456b-b78c-2c58449c02f3 |
| fba24c27-ee8a-44de-8229-186a69bddbb9 |
+--------------------------------------+
11 rows in set (0.00 sec)


Version-Release number of selected component (if applicable):
openstack-heat-api-cloudwatch-2015.1.1-1.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-common-2015.1.1-1.el7ost.noarch
openstack-heat-api-2015.1.1-1.el7ost.noarch
openstack-heat-engine-2015.1.1-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-1.el7ost.noarch


How reproducible:
not sure.

Steps to Reproduce:
1. create overcloud
2. delete overcloud

Actual results:
ironic nodes still have instance UUIDs set

Expected results:
ironic nodes are cleaned (no instance UUID remains)


Additional info:

Comment 5 John Trowbridge 2015-09-22 20:16:06 UTC
From discussing this on IRC, a simple reproducer for this would be:

1. Launch a baremetal instance
2. kill the ironic-conductor service
3. delete the instance from nova
4. restart the ironic-conductor service

This will leave the node with an instance_uuid that cannot be removed without editing the db directly.

These steps just show the issue in a really simple way. The actual cause is that yum update crashes the conductor service, so `yum update; heat stack-delete overcloud` leads to the same behavior.

Comment 6 Joe Talerico 2015-12-14 14:35:04 UTC
Just hit this in OSP8.

Comment 7 Lucas Alvares Gomes 2016-01-12 12:56:00 UTC
(In reply to John Trowbridge from comment #5)
> From discussing this on IRC, a simple reproducer for this would be:
> 
> 1. Launch a baremetal instance
> 2. kill the ironic-conductor service
> 3. delete the instance from nova
> 4. restart the ironic-conductor service
> 
> This will leave the instance with an instance_uuid that can not be deleted
> without direct editing of the db.
> 

Hi John, if you do that, the instances will continue to be marked as active in Ironic, right?

That would require people to manually delete them from Ironic by mimicking what the Ironic driver in Nova does:

$ ironic node-set-provision-state <node uuid> deleted

And then, to remove the instance_uuid:

$ ironic node-update <node uuid> remove instance_uuid

Does that work for you?
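
The two manual steps above can be sketched as a dry run. This is only an illustration, not part of the fix: `plan_node_cleanup` is a hypothetical helper name, it assumes the table layout printed by `ironic node-list` in this report, and it only prints the commands instead of running them. It also assumes nova no longer lists the instances (as in the description), since clearing the field on a node that is still in use would be wrong.

```shell
# Hypothetical helper: reads `ironic node-list` table output on stdin and
# prints (does not run) the cleanup commands for every node that still has
# an Instance UUID set.
plan_node_cleanup() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 7 {
      node = trim($2); inst = trim($4); state = trim($6)
      # Skip the header row and nodes with no instance association.
      if (node == "UUID" || inst == "None" || inst == "") next
      # Nodes still marked active have to be torn down first.
      if (state == "active")
        print "ironic node-set-provision-state " node " deleted"
      print "ironic node-update " node " remove instance_uuid"
    }'
}
```

Usage would be something like `ironic node-list | plan_node_cleanup`, reviewing the printed commands before running any of them by hand.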

Comment 8 Mike Burns 2016-04-07 20:50:54 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 10 Alex Krzos 2016-06-28 19:55:50 UTC
I also encountered this with OSPd9 after deleting an overcloud via `openstack stack delete overcloud`:

[stack@gprfc007 ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| 3c9d0f77-3a8a-4621-be8a-59662c58396f | None | 35920bdc-254b-4bc0-a31c-c7863441613e | power off   | available          | False       |
| 700f7ebd-29f4-419e-80df-68da58f13d3b | None | None                                 | power off   | available          | False       |
| 6a95b0af-4320-4b6e-9924-da8c343a5174 | None | None                                 | power off   | available          | False       |
| e618b0d5-ba09-46ef-a074-1d543fb9a892 | None | None                                 | power off   | available          | False       |
| c97ce129-ee54-4108-8481-4ede8ead7f70 | None | None                                 | power off   | available          | False       |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+

I got around this by running a workaround provided by Joe:

[stack@gprfc007 ~]$ ironic node-update 3c9d0f77-3a8a-4621-be8a-59662c58396f remove instance_uuid

I would definitely chalk this up as hard to reproduce consistently, as I had deleted and redeployed several times this past week without any issue until today.

Comment 11 Alex Krzos 2016-06-28 20:02:03 UTC
Created attachment 1173547 [details]
OSPD9 Ironic logs

Comment 12 Karthik Prabhakar 2016-08-08 23:04:12 UTC
The 'ironic node-update remove instance_uuid' workaround doesn't work for me (on OSPd9):

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| 2dd880f3-63e2-419a-a604-0b667625bb0e | None | None                                 | power off   | available          | False       |
| 682dd8c0-5710-4d5e-b95d-e158ee051ab2 | None | None                                 | power off   | available          | False       |
| 6a00b395-15e8-4621-b994-25c1af4ec8ee | None | None                                 | power off   | available          | False       |
| 13c2862c-83f9-47a8-b4f9-9be78df7fae1 | None | 2e6bc741-e711-4c3d-a067-7857bdb7beee | power off   | available          | False       |
| 0a8c9295-89fb-49f4-9ff5-7cc14c44a542 | None | None                                 | power off   | available          | False       |
| 6d1a6dbb-5748-4311-845d-86ddb6fc26f0 | None | 0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954 | power off   | available          | False       |
| bece1995-2edd-4fa6-bb79-f2730df4a461 | None | None                                 | power off   | available          | False       |
| 774da9a9-cbff-4fe9-a1e2-2e3287d125f7 | None | None                                 | power off   | available          | True        |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+

[stack@undercloud ~]$ ironic node-update 13c2862c-83f9-47a8-b4f9-9be78df7fae1 remove 2e6bc741-e711-4c3d-a067-7857bdb7beee
Couldn't apply patch '[{'path': '/2e6bc741-e711-4c3d-a067-7857bdb7beee', 'op': 'remove'}]'. Reason: u'2e6bc741-e711-4c3d-a067-7857bdb7beee' (HTTP 400)

[stack@undercloud ~]$ ironic node-update 6d1a6dbb-5748-4311-845d-86ddb6fc26f0 remove 0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954
Couldn't apply patch '[{'path': '/0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954', 'op': 'remove'}]'. Reason: u'0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954' (HTTP 400)

[stack@undercloud ~]$

Comment 13 Lucas Alvares Gomes 2016-08-10 13:42:11 UTC
(In reply to Karthik Prabhakar from comment #12)
> The 'ironic node-update remove instance_uuid' workaround doesn't work for me
> (on OSPd9):
> 
> [stack@undercloud ~]$ ironic node-list
> +--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
> | UUID                                 | Name | Instance UUID                        | Power State | Provisioning State | Maintenance |
> +--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
> | 2dd880f3-63e2-419a-a604-0b667625bb0e | None | None                                 | power off   | available          | False       |
> | 682dd8c0-5710-4d5e-b95d-e158ee051ab2 | None | None                                 | power off   | available          | False       |
> | 6a00b395-15e8-4621-b994-25c1af4ec8ee | None | None                                 | power off   | available          | False       |
> | 13c2862c-83f9-47a8-b4f9-9be78df7fae1 | None | 2e6bc741-e711-4c3d-a067-7857bdb7beee | power off   | available          | False       |
> | 0a8c9295-89fb-49f4-9ff5-7cc14c44a542 | None | None                                 | power off   | available          | False       |
> | 6d1a6dbb-5748-4311-845d-86ddb6fc26f0 | None | 0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954 | power off   | available          | False       |
> | bece1995-2edd-4fa6-bb79-f2730df4a461 | None | None                                 | power off   | available          | False       |
> | 774da9a9-cbff-4fe9-a1e2-2e3287d125f7 | None | None                                 | power off   | available          | True        |
> +--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
> 
> [stack@undercloud ~]$ ironic node-update 13c2862c-83f9-47a8-b4f9-9be78df7fae1 remove 2e6bc741-e711-4c3d-a067-7857bdb7beee
> Couldn't apply patch '[{'path': '/2e6bc741-e711-4c3d-a067-7857bdb7beee', 'op': 'remove'}]'. Reason: u'2e6bc741-e711-4c3d-a067-7857bdb7beee' (HTTP 400)
> 
> [stack@undercloud ~]$ ironic node-update 6d1a6dbb-5748-4311-845d-86ddb6fc26f0 remove 0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954
> Couldn't apply patch '[{'path': '/0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954', 'op': 'remove'}]'. Reason: u'0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954' (HTTP 400)
> 
> [stack@undercloud ~]$

The command is incorrect. The correct way to clear the instance_uuid field is:

$ ironic node-update <node uuid> remove instance_uuid

instance_uuid is the name of the field; it shouldn't be replaced with the actual UUID of the instance.
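
The HTTP 400 text quoted above shows why: whatever follows `remove` becomes the path of the patch that is sent, so it has to name a field on the node. A minimal sketch of that mapping follows. `remove_patch` is a hypothetical helper that only mirrors the shape seen in the error message; it does not talk to Ironic.

```shell
# Hypothetical helper: prints the patch that a `remove <attribute>` argument
# turns into, following the shape shown in the HTTP 400 error text.
remove_patch() {
  printf '[{"path": "/%s", "op": "remove"}]\n' "$1"
}

# Field name: the path points at a real attribute of the node.
remove_patch instance_uuid
# Instance UUID value: the path points at an attribute that does not exist.
remove_patch 2e6bc741-e711-4c3d-a067-7857bdb7beee
```

The first call yields a path of `/instance_uuid`, which the node has; the second yields `/2e6bc741-...`, which is the path rejected in comment 12.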

Comment 14 Lucas Alvares Gomes 2016-08-10 16:04:27 UTC
There's currently a patch under review upstream in Nova that seems to address this problem: https://review.openstack.org/#/c/341253/7

The patch is in Nova rather than Ironic because the ironic driver in nova is the one responsible for setting (and now cleaning up) the instance_uuid in case the deployment fails before it hits Ironic.

Comment 15 Dave Cain 2016-10-04 04:19:42 UTC
Ran into this problem in OSP9; Joe's suggestion worked for me.  Quite an irritating problem for a customer to have. Will this be fixed by OSP10?

[stack@refarch-ospd ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

[stack@refarch-ospd ~]$ ironic node-list
+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name    | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+
| 57e44040-5feb-42fd-8cbd-5b927802af46 | r630-02 | 4f2c4b38-71f9-4c89-98b1-95410efa2cbd | power off   | available          | False       |
+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+

[stack@refarch-ospd ~]$ ironic node-delete r630-02
Failed to delete node r630-02: Node 57e44040-5feb-42fd-8cbd-5b927802af46 is associated with instance 4f2c4b38-71f9-4c89-98b1-95410efa2cbd. (HTTP 409)

[stack@refarch-ospd ~]$ ironic node-update r630-02 remove instance_uuid

[stack@refarch-ospd ~]$ ironic node-list
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| 57e44040-5feb-42fd-8cbd-5b927802af46 | r630-02 | None          | power off   | available          | False       |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+

[stack@refarch-ospd ~]$ ironic node-delete r630-02
Deleted node r630-02

Comment 16 Jaromir Coufal 2016-10-05 19:28:22 UTC
Lucas, the patch seems to be merged. Can you please update the bz status?

Comment 17 Lucas Alvares Gomes 2016-10-05 19:53:01 UTC
(In reply to Jaromir Coufal from comment #16)
> Lucas, the patch seems to be merged. Can you please update the bz status?

Hi Jaromir, cool! I've checked and the patch is already present in the "rhos-10.0-patches" branch for nova.

Comment 19 Raviv Bar-Tal 2016-11-02 13:08:24 UTC
When trying the reproducer steps, I found that the behaviour has changed: deleting the stack or the nova instance while ironic-conductor is down now leaves the stack in DELETE_FAIL status in the stack list and the instance in ERROR status in nova list.
Once ironic-conductor is started and the delete command is run again, the stack/nodes are deleted and the instance UUID is removed from the ironic node.
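
A check like this one could be scripted along the following lines. It is a sketch under assumptions: `nodes_clean` is a hypothetical helper name, and it relies on the `ironic node-list` table layout shown earlier in this report.

```shell
# Hypothetical helper: reads `ironic node-list` table output on stdin.
# Prints the UUID of every node that still has an Instance UUID and
# returns non-zero if at least one such node was found.
nodes_clean() {
  awk -F'|' '
    NF >= 7 {
      node = $2; inst = $4
      gsub(/ /, "", node); gsub(/ /, "", inst)
      # Ignore the header row and nodes without an instance association.
      if (node != "UUID" && inst != "None" && inst != "") {
        print node
        stale = 1
      }
    }
    END { exit (stale ? 1 : 0) }'
}
```

For example, `ironic node-list | nodes_clean && echo "all nodes clean"` after the second delete attempt.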

Comment 21 errata-xmlrpc 2016-12-14 15:15:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html