Description of problem:

Deleting the overcloud reported success; however, the ironic nodes still had references to instance UUIDs.

[stack@gprfc001 ~]$ heat stack-delete overcloud
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 972af940-06bf-4ebb-a618-3f7e6a8a7e64 | overcloud  | DELETE_IN_PROGRESS | 2015-09-16T18:26:27Z |
+--------------------------------------+------------+--------------------+----------------------+

[stack@gprfc001 ~]$ heat stack-list
+----+------------+--------------+---------------+
| id | stack_name | stack_status | creation_time |
+----+------------+--------------+---------------+
+----+------------+--------------+---------------+

[stack@gprfc001 ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| a0ae3a0b-0b90-4a2d-bfb1-b3fe6590f793 | None | 3f19de17-89a6-446c-b7f7-1ff6223ee106 | power on    | active          | False       |
| c5a347da-d07c-4f74-8616-308713dddb25 | None | 0ab03e1d-83dd-4502-a9f3-8192112096d2 | power on    | active          | False       |
| bd4e8ba4-cdc2-433a-83aa-d53cf4ba096f | None | 8244bf7c-93bf-4f8a-b705-4617fdb74e87 | power on    | active          | False       |
| 64a98320-5398-434d-8b90-2759079ced10 | None | 2b2accf2-1151-4d9e-896e-647f27c9da5c | power on    | active          | False       |
| 6cbd972b-1198-4394-8632-12384e3e6227 | None | 93ea3a55-40b2-44c9-ba81-94c9bad9e484 | power on    | active          | False       |
| 4283a41b-7fbc-4941-8e45-3f48c6c2602d | None | b782f846-b2a0-4272-b166-36f602b79ce6 | power on    | active          | False       |
| 2355c059-0207-4dff-8c78-3735d421051f | None | 30f98833-9e51-472a-963d-9860836a1ca8 | power on    | active          | False       |
| 582ce016-f42b-4db1-9d41-d794236367b1 | None | ecf5f641-0829-456b-b78c-2c58449c02f3 | power on    | active          | False       |
| d003166c-57fb-4356-95be-68305e716306 | None | 1049bc2a-0769-4193-97c4-3bd56034ccb7 | power on    | active          | False       |
| e42c5a8a-8913-4217-9ace-ed878eaaa528 | None | fba24c27-ee8a-44de-8229-186a69bddbb9 | power on    | active          | False       |
| 50353a51-1f99-4204-9127-f095d9354d8d | None | e1b09228-6065-4c5b-94a3-2e79f953f1fb | power on    | active          | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+

[stack@gprfc001 ~]$ heat stack-list
+----+------------+--------------+---------------+
| id | stack_name | stack_status | creation_time |
+----+------------+--------------+---------------+
+----+------------+--------------+---------------+

[stack@gprfc001 ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

MariaDB [ironic]> select instance_uuid from nodes;
+--------------------------------------+
| instance_uuid                        |
+--------------------------------------+
| 0ab03e1d-83dd-4502-a9f3-8192112096d2 |
| 1049bc2a-0769-4193-97c4-3bd56034ccb7 |
| 2b2accf2-1151-4d9e-896e-647f27c9da5c |
| 30f98833-9e51-472a-963d-9860836a1ca8 |
| 3f19de17-89a6-446c-b7f7-1ff6223ee106 |
| 8244bf7c-93bf-4f8a-b705-4617fdb74e87 |
| 93ea3a55-40b2-44c9-ba81-94c9bad9e484 |
| b782f846-b2a0-4272-b166-36f602b79ce6 |
| e1b09228-6065-4c5b-94a3-2e79f953f1fb |
| ecf5f641-0829-456b-b78c-2c58449c02f3 |
| fba24c27-ee8a-44de-8229-186a69bddbb9 |
+--------------------------------------+
11 rows in set (0.00 sec)

Version-Release number of selected component (if applicable):
openstack-heat-api-cloudwatch-2015.1.1-1.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-common-2015.1.1-1.el7ost.noarch
openstack-heat-api-2015.1.1-1.el7ost.noarch
openstack-heat-engine-2015.1.1-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-1.el7ost.noarch

How reproducible:
Not sure.

Steps to Reproduce:
1. Create overcloud.
2. Delete overcloud.

Actual results:
The ironic nodes still have instance UUIDs assigned.

Expected results:
The instance UUIDs on the ironic nodes are cleaned up.

Additional info:
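For reference, the stale state above can be spotted mechanically: any ironic node whose Instance UUID no longer matches a live nova instance is a leaked reference. A minimal Python sketch of that cross-check (plain dicts stand in for the API responses; no real clients involved):

```python
# Cross-check ironic node-list against nova list: a node is "stale" when it
# still carries an instance_uuid that nova no longer knows about.

def stale_nodes(ironic_nodes, nova_instance_ids):
    """Return the ironic nodes holding an instance_uuid absent from nova."""
    live = set(nova_instance_ids)
    return [n for n in ironic_nodes
            if n["instance_uuid"] is not None and n["instance_uuid"] not in live]

nodes = [
    {"uuid": "a0ae3a0b", "instance_uuid": "3f19de17"},  # shortened for the sketch
    {"uuid": "700f7ebd", "instance_uuid": None},
]
# nova list is empty after the stack delete, so the first node is stale:
print(stale_nodes(nodes, []))
```

With the real clients, the two inputs would come from `ironic node-list` and `nova list` instead of the hard-coded dicts.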
From discussing this on IRC, a simple reproducer for this would be:

1. Launch a baremetal instance.
2. Kill the ironic-conductor service.
3. Delete the instance from nova.
4. Restart the ironic-conductor service.

This will leave the node with an instance_uuid that cannot be deleted without directly editing the db.

These steps just demonstrate the issue in a really simple way. The actual trigger is that `yum update` crashes the conductor service, so `yum update; heat stack-delete overcloud` leads to the same behavior.
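A self-contained sketch of why those four steps leak the field (a simulation with made-up classes, not the real nova/ironic code): the teardown request that would clear instance_uuid dies with the conductor, while the nova side of the delete proceeds anyway.

```python
class ConductorDown(Exception):
    pass

class FakeIronicNode:
    def __init__(self, instance_uuid):
        self.instance_uuid = instance_uuid
        self.conductor_alive = True

    def undeploy(self):
        # Clearing instance_uuid happens in the conductor; if it is dead,
        # the request is lost and the field survives.
        if not self.conductor_alive:
            raise ConductorDown()
        self.instance_uuid = None

def nova_delete(instances, instance_id, node):
    try:
        node.undeploy()
    except ConductorDown:
        pass  # pre-fix behaviour: the delete "succeeds" regardless
    instances.remove(instance_id)

node = FakeIronicNode("2e6bc741")
instances = ["2e6bc741"]
node.conductor_alive = False              # step 2: kill ironic-conductor
nova_delete(instances, "2e6bc741", node)  # step 3: delete the instance
print(instances, node.instance_uuid)      # [] 2e6bc741 -> nova is clean, ironic is not
```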
Just hit this in OSP8.
(In reply to John Trowbridge from comment #5)
> From discussing this on IRC, a simple reproducer for this would be:
>
> 1. Launch an baremetal instance
> 2. kill the ironic-conductor service
> 3. delete the instance from nova
> 4. restart the ironic-conductor service
>
> This will leave the instance with an instance_uuid that can not be deleted
> without direct editing of the db.

Hi John, if you do that, the instances will continue to be marked as active in Ironic, right? That would require people to manually delete them from Ironic by mimicking what the Ironic driver in Nova does:

$ ironic node-set-provision-state <node uuid> deleted

And to remove the instance_uuid:

$ ironic node-update <node uuid> remove instance_uuid

Does that work for you?
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
I also encountered this with OSPd9 after deleting an overcloud via `openstack stack delete overcloud`:

[stack@gprfc007 ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| 3c9d0f77-3a8a-4621-be8a-59662c58396f | None | 35920bdc-254b-4bc0-a31c-c7863441613e | power off   | available          | False       |
| 700f7ebd-29f4-419e-80df-68da58f13d3b | None | None                                 | power off   | available          | False       |
| 6a95b0af-4320-4b6e-9924-da8c343a5174 | None | None                                 | power off   | available          | False       |
| e618b0d5-ba09-46ef-a074-1d543fb9a892 | None | None                                 | power off   | available          | False       |
| c97ce129-ee54-4108-8481-4ede8ead7f70 | None | None                                 | power off   | available          | False       |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+

I got around this by running a workaround provided by Joe:

[stack@gprfc007 ~]$ ironic node-update 3c9d0f77-3a8a-4621-be8a-59662c58396f remove instance_uuid

I would definitely chalk this up as inconsistent to reproduce, as I had deleted and redeployed several times this past week without any issue until today.
Created attachment 1173547 [details] OSPD9 Ironic logs
The 'ironic node-update remove instance_uuid' workaround doesn't work for me (on OSPd9):

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| 2dd880f3-63e2-419a-a604-0b667625bb0e | None | None                                 | power off   | available          | False       |
| 682dd8c0-5710-4d5e-b95d-e158ee051ab2 | None | None                                 | power off   | available          | False       |
| 6a00b395-15e8-4621-b994-25c1af4ec8ee | None | None                                 | power off   | available          | False       |
| 13c2862c-83f9-47a8-b4f9-9be78df7fae1 | None | 2e6bc741-e711-4c3d-a067-7857bdb7beee | power off   | available          | False       |
| 0a8c9295-89fb-49f4-9ff5-7cc14c44a542 | None | None                                 | power off   | available          | False       |
| 6d1a6dbb-5748-4311-845d-86ddb6fc26f0 | None | 0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954 | power off   | available          | False       |
| bece1995-2edd-4fa6-bb79-f2730df4a461 | None | None                                 | power off   | available          | False       |
| 774da9a9-cbff-4fe9-a1e2-2e3287d125f7 | None | None                                 | power off   | available          | True        |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+

[stack@undercloud ~]$ ironic node-update 13c2862c-83f9-47a8-b4f9-9be78df7fae1 remove 2e6bc741-e711-4c3d-a067-7857bdb7beee
Couldn't apply patch '[{'path': '/2e6bc741-e711-4c3d-a067-7857bdb7beee', 'op': 'remove'}]'. Reason: u'2e6bc741-e711-4c3d-a067-7857bdb7beee' (HTTP 400)

[stack@undercloud ~]$ ironic node-update 6d1a6dbb-5748-4311-845d-86ddb6fc26f0 remove 0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954
Couldn't apply patch '[{'path': '/0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954', 'op': 'remove'}]'. Reason: u'0cbbeb4d-16f8-41f3-8f7c-c6dbed59c954' (HTTP 400)

[stack@undercloud ~]$
(In reply to Karthik Prabhakar from comment #12)
> [stack@undercloud ~]$ ironic node-update
> 13c2862c-83f9-47a8-b4f9-9be78df7fae1 remove
> 2e6bc741-e711-4c3d-a067-7857bdb7beee
> Couldn't apply patch '[{'path': '/2e6bc741-e711-4c3d-a067-7857bdb7beee',
> 'op': 'remove'}]'. Reason: u'2e6bc741-e711-4c3d-a067-7857bdb7beee' (HTTP 400)

The command is incorrect. The correct way to clean out the instance_uuid field is:

$ ironic node-update <node uuid> remove instance_uuid

instance_uuid is the name of the field; it shouldn't be replaced with the actual UUID of the instance.
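The field name matters because `node-update ... remove <x>` is translated into a JSON-patch style operation whose path is the field, which is why passing the instance's UUID produces the HTTP 400 above. A toy Python illustration of that distinction (plain dicts, not ironicclient):

```python
def apply_remove_patch(node, path):
    """Apply a JSON-patch style remove; path must name an existing field."""
    field = path.lstrip("/")
    if field not in node:
        # Roughly what the HTTP 400 in the comment above corresponds to.
        raise ValueError(f"Couldn't apply patch: {field!r}")
    node[field] = None  # ironic clears the field rather than dropping the key
    return node

node = {"uuid": "13c2862c", "instance_uuid": "2e6bc741"}

# Wrong: the instance's UUID as the path -> no such field on the node
try:
    apply_remove_patch(node, "/2e6bc741")
except ValueError as e:
    print(e)  # Couldn't apply patch: '2e6bc741'

# Right: the field name as the path
apply_remove_patch(node, "/instance_uuid")
print(node["instance_uuid"])  # None
```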
There's currently a patch under review upstream in Nova that seems to address this problem: https://review.openstack.org/#/c/341253/7

The patch is in Nova rather than Ironic because the ironic driver in nova is the one responsible for setting (and now cleaning up) the instance_uuid in case the deployment fails before it hits Ironic.
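The general shape of that fix, sketched as a self-contained simulation (hypothetical names, not the actual nova code): the driver sets instance_uuid up front to reserve the node, and rolls the reservation back if the deploy fails before reaching Ironic.

```python
class DeployFailed(Exception):
    pass

def spawn(node, instance_uuid, deploy):
    """Reserve the node, deploy, and undo the reservation on failure."""
    node["instance_uuid"] = instance_uuid  # reserve the node for this instance
    try:
        deploy(node)
    except DeployFailed:
        node["instance_uuid"] = None       # the cleanup the patch adds
        raise

node = {"uuid": "57e44040", "instance_uuid": None}

def bad_deploy(node):
    raise DeployFailed("conductor unreachable")

try:
    spawn(node, "4f2c4b38", bad_deploy)
except DeployFailed:
    pass
print(node["instance_uuid"])  # None: no stale reference left behind
```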
Ran into this problem in OSP9, Joe's suggestion worked for me. Quite an irritating problem for a customer to have, will this be fixed by OSP10? [stack@refarch-ospd ~]$ nova list +----+------+--------+------------+-------------+----------+ | ID | Name | Status | Task State | Power State | Networks | +----+------+--------+------------+-------------+----------+ +----+------+--------+------------+-------------+----------+ [stack@refarch-ospd ~]$ ironic node-list i+--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+ | UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance | +--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+ | 57e44040-5feb-42fd-8cbd-5b927802af46 | r630-02 | 4f2c4b38-71f9-4c89-98b1-95410efa2cbd | power off | available | False | +--------------------------------------+---------+--------------------------------------+-------------+--------------------+-------------+ [stack@refarch-ospd ~]$ ironic node-delete r630-02 Failed to delete node r630-02: Node 57e44040-5feb-42fd-8cbd-5b927802af46 is associated with instance 4f2c4b38-71f9-4c89-98b1-95410efa2cbd. (HTTP 409) [stack@refarch-ospd ~]$ ironic node-update r630-02 remove instance_uuid [stack@refarch-ospd ~]$ ironic node-list +--------------------------------------+---------+---------------+-------------+--------------------+-------------+ | UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance | +--------------------------------------+---------+---------------+-------------+--------------------+-------------+ | 57e44040-5feb-42fd-8cbd-5b927802af46 | r630-02 | None | power off | available | False | +--------------------------------------+---------+---------------+-------------+--------------------+-------------+ [stack@refarch-ospd ~]$ ironic node-delete r630-02 Deleted node r630-02
Lucas, the patch seems to be merged. Can you please update the bz status?
(In reply to Jaromir Coufal from comment #16) > Lucas, the patch seems to be merged. Can you please update the bz status? Hi Jaromir, cool! I've checked and the patch is already present in the "rhos-10.0-patches" branch for nova.
When trying the reproduce steps I found that the behaviour has changed: when deleting the stack, or deleting the nova instance while ironic-conductor is down, the stack now changes status to DELETE_FAILED in the stack list and the instance to ERROR in `nova list`. Once ironic-conductor is started and the delete command is run again, the stack/nodes are deleted and the instance_uuid is removed from the ironic node.
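That fail-then-retry behaviour can be sketched as a small simulation (made-up names, not the real services):

```python
class FakeNode:
    def __init__(self, instance_uuid):
        self.instance_uuid = instance_uuid
        self.conductor_alive = True

def delete_stack(node):
    """Post-fix behaviour: the delete fails loudly instead of leaking state."""
    if not node.conductor_alive:
        return "DELETE_FAILED"  # stack status; the nova instance goes ERROR
    node.instance_uuid = None
    return "DELETE_COMPLETE"

node = FakeNode("35920bdc")
node.conductor_alive = False
print(delete_stack(node), node.instance_uuid)  # DELETE_FAILED 35920bdc

node.conductor_alive = True                    # restart ironic-conductor, retry
print(delete_stack(node), node.instance_uuid)  # DELETE_COMPLETE None
```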
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html