This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2099279 - instances fails evacuation if group instance where they belongs to get removed
Summary: instances fails evacuation if group instance where they belongs to get removed
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-20 12:42 UTC by alisci
Modified: 2024-01-10 20:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-01-10 20:38:58 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1890244 | 0 | None | None | None | 2022-06-21 09:19:02 UTC
OpenStack gerrit | 847000 | 0 | None | NEW | add repoducer test for bug 1890244 | 2022-06-21 11:29:16 UTC
OpenStack gerrit | 847001 | 0 | None | NEW | ignore deleted server groups in validation | 2022-06-21 11:29:16 UTC
Red Hat Issue Tracker | OSP-15843 | 0 | None | None | None | 2024-01-10 20:38:57 UTC
Red Hat Issue Tracker | OSP-31135 | 0 | None | None | None | 2024-01-10 20:43:33 UTC

Description alisci 2022-06-20 12:42:05 UTC
Description of problem:
Host evacuation fails for instances whose server group (instance group) has been deleted.

After the failed evacuation, openstack server show reports the following fault: {'code': 404, 'created': '2022-06-20T12:21:30Z', 'message': 'Instance group <instance group uuid> could not be found.'}


Version-Release number of selected component (if applicable):
OSP 16.2.0
OSP 16.2.1


How reproducible:
always


Steps to Reproduce:
- create an instance group
- create a VM that uses the instance group just created
- delete that instance group
- shut down or reboot the compute node the instance is running on
- evacuate the host with: nova host-evacuate <compute host where the VM is running>
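Actual results:
The evacuation fails and the instance goes into ERROR state with the 404 fault shown in the description.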

Expected results:
The instance is evacuated.


Additional info:

Comment 1 alisci 2022-06-20 12:44:55 UTC
Here is the reproduction on an OSP 16.2.0 lab:

(overcloud) [stack@undercloud-0 ~]$ openstack server group create --policy affinity test-group-01
+------------+--------------------------------------+
| Field      | Value                                |
+------------+--------------------------------------+
| id         | 22fe1c6e-9593-4012-b710-51956b66b104 |
| members    |                                      |
| name       | test-group-01                        |
| policy     | affinity                             |
| project_id | 3f6a49bcd3364cca988dd994dccd0887     |
| rules      |                                      |
| user_id    | 35596b10983b4b2b80825f491356703a     |
+------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server create --volume cirros-volume --flavor m1.tiny vm01 --wait --network net1 --hint group=22fe1c6e-9593-4012-b710-51956b66b104

+-------------------------------------+------------------------------------------------------------------------------------+
| Field                               | Value                                                                              |
+-------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                             |
| OS-EXT-AZ:availability_zone         | nova                                                                               |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:hostname            | vm01                                                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000013                                                                  |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                    |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                  |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                    |
| OS-EXT-SRV-ATTR:reservation_id      | r-ccmxkx2n                                                                         |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                           |
| OS-EXT-SRV-ATTR:user_data           | None                                                                               |
| OS-EXT-STS:power_state              | Running                                                                            |
| OS-EXT-STS:task_state               | None                                                                               |
| OS-EXT-STS:vm_state                 | active                                                                             |
| OS-SRV-USG:launched_at              | 2022-06-20T12:17:54.000000                                                         |
| OS-SRV-USG:terminated_at            | None                                                                               |
| accessIPv4                          |                                                                                    |
| accessIPv6                          |                                                                                    |
| addresses                           | net1=192.168.0.78                                                                  |
| adminPass                           | zX6KahbcC6wf                                                                       |
| config_drive                        |                                                                                    |
| created                             | 2022-06-20T12:17:25Z                                                               |
| description                         | None                                                                               |
| flavor                              | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId                              | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50                           |
| host_status                         | UP                                                                                 |
| id                                  | 7dc8a912-131d-427e-869f-458d9c547c11                                               |
| image                               |                                                                                    |
| key_name                            | None                                                                               |
| locked                              | False                                                                              |
| locked_reason                       | None                                                                               |
| name                                | vm01                                                                               |
| progress                            | 0                                                                                  |
| project_id                          | 3f6a49bcd3364cca988dd994dccd0887                                                   |
| properties                          |                                                                                    |
| security_groups                     | name='default'                                                                     |
| server_groups                       | ['22fe1c6e-9593-4012-b710-51956b66b104']                                           |
| status                              | ACTIVE                                                                             |
| tags                                | []                                                                                 |
| trusted_image_certificates          | None                                                                               |
| updated                             | 2022-06-20T12:17:54Z                                                               |
| user_id                             | 35596b10983b4b2b80825f491356703a                                                   |
| volumes_attached                    | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6'           |
+-------------------------------------+------------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server group delete test-group-01

(overcloud) [stack@undercloud-0 ~]$ openstack server show vm01
+-------------------------------------+------------------------------------------------------------------------------------+
| Field                               | Value                                                                              |
+-------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                             |
| OS-EXT-AZ:availability_zone         | nova                                                                               |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:hostname            | vm01                                                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000013                                                                  |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                    |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                  |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                    |
| OS-EXT-SRV-ATTR:reservation_id      | r-ccmxkx2n                                                                         |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                           |
| OS-EXT-SRV-ATTR:user_data           | None                                                                               |
| OS-EXT-STS:power_state              | Running                                                                            |
| OS-EXT-STS:task_state               | None                                                                               |
| OS-EXT-STS:vm_state                 | active                                                                             |
| OS-SRV-USG:launched_at              | 2022-06-20T12:17:54.000000                                                         |
| OS-SRV-USG:terminated_at            | None                                                                               |
| accessIPv4                          |                                                                                    |
| accessIPv6                          |                                                                                    |
| addresses                           | net1=192.168.0.78                                                                  |
| config_drive                        |                                                                                    |
| created                             | 2022-06-20T12:17:25Z                                                               |
| description                         | None                                                                               |
| flavor                              | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId                              | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50                           |
| host_status                         | UP                                                                                 |
| id                                  | 7dc8a912-131d-427e-869f-458d9c547c11                                               |
| image                               |                                                                                    |
| key_name                            | None                                                                               |
| locked                              | False                                                                              |
| locked_reason                       | None                                                                               |
| name                                | vm01                                                                               |
| progress                            | 0                                                                                  |
| project_id                          | 3f6a49bcd3364cca988dd994dccd0887                                                   |
| properties                          |                                                                                    |
| security_groups                     | name='default'                                                                     |
| server_groups                       | []                                                                                 |
| status                              | ACTIVE                                                                             |
| tags                                | []                                                                                 |
| trusted_image_certificates          | None                                                                               |
| updated                             | 2022-06-20T12:17:54Z                                                               |
| user_id                             | 35596b10983b4b2b80825f491356703a                                                   |
| volumes_attached                    | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6'           |
+-------------------------------------+------------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.8
Warning: Permanently added '192.168.24.8' (ECDSA) to the list of known hosts.
Last login: Mon Jun 20 12:14:29 2022 from 192.168.24.1
[heat-admin@compute-1 ~]$ sudo reboot
Connection to 192.168.24.8 closed by remote host.
Connection to 192.168.24.8 closed.

(overcloud) [stack@undercloud-0 ~]$ nova host-evacuate compute-1
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| 7dc8a912-131d-427e-869f-458d9c547c11 | True              |               |
+--------------------------------------+-------------------+---------------+


(overcloud) [stack@undercloud-0 ~]$ nova migration-list
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| Id | UUID                                 | Source Node            | Dest Node              | Source Compute         | Dest Compute           | Dest Host   | Status | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type       |
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| 13 | 17180ff5-2733-4de9-bdba-4a5b561d475a | compute-1.redhat.local | compute-0.redhat.local | compute-1.redhat.local | compute-0.redhat.local | 172.17.1.56 | failed | 7dc8a912-131d-427e-869f-458d9c547c11 | None       | None       | 2022-06-20T12:21:25.000000 | 2022-06-20T12:21:30.000000 | evacuation |
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server event list vm01
+------------------------------------------+--------------------------------------+----------+----------------------------+
| Request ID                               | Server ID                            | Action   | Start Time                 |
+------------------------------------------+--------------------------------------+----------+----------------------------+
| req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 | 7dc8a912-131d-427e-869f-458d9c547c11 | evacuate | 2022-06-20T12:21:25.000000 |
| req-e8babc01-b06b-496d-81c2-70913f6d579a | 7dc8a912-131d-427e-869f-458d9c547c11 | create   | 2022-06-20T12:17:23.000000 |
+------------------------------------------+--------------------------------------+----------+----------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server event show vm01 req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 --fit
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field         | Value                                                                                                                                                                      |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action        | evacuate                                                                                                                                                                   |
| events        | [{'event': 'compute_rebuild_instance', 'start_time': '2022-06-20T12:21:28.000000', 'finish_time': '2022-06-20T12:21:30.000000', 'result': 'Error', 'traceback': '  File    |
|               | "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1372, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File                    |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 219, in decorated_function\n    kwargs[\'instance\'], e, sys.exc_info())\n  File                          |
|               | "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n  File "/usr/lib/python3.6/site-                               |
|               | packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File "/usr/lib/python3.6/site-packages/six.py", line     |
|               | 693, in reraise\n    raise value\n  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 207, in decorated_function\n    return function(self, context,   |
|               | *args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3393, in rebuild_instance\n    migration, request_spec, allocs)\n  File          |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3455, in _do_rebuild_instance_with_claim\n    self._do_rebuild_instance(*args, **kwargs)\n  File          |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3478, in _do_rebuild_instance\n    self._validate_instance_group_policy(context, instance, hints)\n  File |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1642, in _validate_instance_group_policy\n    _do_validation(context, instance, group_hint)\n  File       |
|               | "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner\n    return f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-                      |
|               | packages/nova/compute/manager.py", line 1611, in _do_validation\n    group = objects.InstanceGroup.get_by_hint(context, group_hint)\n  File "/usr/lib/python3.6/site-      |
|               | packages/nova/objects/instance_group.py", line 384, in get_by_hint\n    return cls.get_by_uuid(context, hint)\n  File "/usr/lib/python3.6/site-                            |
|               | packages/oslo_versionedobjects/base.py", line 177, in wrapper\n    args, kwargs)\n  File "/usr/lib/python3.6/site-packages/nova/conductor/rpcapi.py", line 241, in         |
|               | object_class_action_versions\n    args=args, kwargs=kwargs)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call\n                   |
|               | transport_options=self.transport_options)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send\n                                     |
|               | transport_options=transport_options)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send\n                                 |
|               | transport_options=transport_options)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send\n    raise result\n', 'host':    |
|               | 'compute-0.redhat.local', 'hostId': 'bb22b6f39cbdb7d8b08472fd93624d1c097296188a531da1b171be10'}, {'event': 'rebuild_server', 'start_time': '2022-06-20T12:21:25.000000',   |
|               | 'finish_time': '2022-06-20T12:21:28.000000', 'result': 'Success', 'traceback': None, 'host': 'controller-1.redhat.local', 'hostId':                                        |
|               | '76a7267ae706a859241caa653440761cbac48d6e2f9e78f42ffca09f'}]                                                                                                               |
| instance_uuid | 7dc8a912-131d-427e-869f-458d9c547c11                                                                                                                                       |
| message       | Error                                                                                                                                                                      |
| project_id    | 3f6a49bcd3364cca988dd994dccd0887                                                                                                                                           |
| request_id    | req-7688c1c8-fdb0-4f78-9339-7b76f888eea4                                                                                                                                   |
| start_time    | 2022-06-20T12:21:25.000000                                                                                                                                                 |
| updated_at    | 2022-06-20T12:21:30.000000                                                                                                                                                 |
| user_id       | 35596b10983b4b2b80825f491356703a                                                                                                                                           |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server show vm01
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                                                  |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                                                                 |
| OS-EXT-AZ:availability_zone         | nova                                                                                                                                   |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                                                                                 |
| OS-EXT-SRV-ATTR:hostname            | vm01                                                                                                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                                                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000013                                                                                                                      |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                                                                        |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                                                                      |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                                                                        |
| OS-EXT-SRV-ATTR:reservation_id      | r-ccmxkx2n                                                                                                                             |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                                                                               |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                                                                   |
| OS-EXT-STS:power_state              | Shutdown                                                                                                                               |
| OS-EXT-STS:task_state               | None                                                                                                                                   |
| OS-EXT-STS:vm_state                 | error                                                                                                                                  |
| OS-SRV-USG:launched_at              | 2022-06-20T12:17:54.000000                                                                                                             |
| OS-SRV-USG:terminated_at            | None                                                                                                                                   |
| accessIPv4                          |                                                                                                                                        |
| accessIPv6                          |                                                                                                                                        |
| addresses                           | net1=192.168.0.78                                                                                                                      |
| config_drive                        |                                                                                                                                        |
| created                             | 2022-06-20T12:17:25Z                                                                                                                   |
| description                         | None                                                                                                                                   |
| fault                               | {'code': 404, 'created': '2022-06-20T12:21:30Z', 'message': 'Instance group 22fe1c6e-9593-4012-b710-51956b66b104 could not be found.'} |
| flavor                              | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1'                                                     |
| hostId                              | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50                                                                               |
| host_status                         | UP                                                                                                                                     |
| id                                  | 7dc8a912-131d-427e-869f-458d9c547c11                                                                                                   |
| image                               |                                                                                                                                        |
| key_name                            | None                                                                                                                                   |
| locked                              | False                                                                                                                                  |
| locked_reason                       | None                                                                                                                                   |
| name                                | vm01                                                                                                                                   |
| project_id                          | 3f6a49bcd3364cca988dd994dccd0887                                                                                                       |
| properties                          |                                                                                                                                        |
| security_groups                     | name='default'                                                                                                                         |
| server_groups                       | []                                                                                                                                     |
| status                              | ERROR                                                                                                                                  |
| tags                                | []                                                                                                                                     |
| trusted_image_certificates          | None                                                                                                                                   |
| updated                             | 2022-06-20T12:21:40Z                                                                                                                   |
| user_id                             | 35596b10983b4b2b80825f491356703a                                                                                                       |
| volumes_attached                    | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6'                                                               |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+

Comment 2 Sylvain Bauza 2022-06-21 09:19:03 UTC
This sounds like a legitimate bug, even in master. Because of the late anti-affinity check we have in the compute service (which makes sure concurrent instance requests do not end up on the same compute host when their group has an anti-affinity policy), we look at the RequestSpec record to find out whether the instance was created with a group hint. If it was, we then look up the related InstanceGroup record (by its UUID) to determine the group policy.

Since we perform this check for every instance creation or move operation (shelve, migrate, evacuate, etc.), we can indeed hit a problem if the group was deleted before the instance is moved.

https://github.com/openstack/nova/blob/ebe08834f311e8e22bfd9685d7e6e91dab967382/nova/compute/manager.py#L3650-L3657
https://github.com/openstack/nova/blob/ebe08834f311e8e22bfd9685d7e6e91dab967382/nova/compute/manager.py#L1733
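
To make the failure mode concrete, here is a minimal, self-contained Python model of the late check described above. It is only a sketch: the names (InstanceGroup, GROUPS, get_group_by_hint, validate_group_policy, and the ignore_deleted flag) are hypothetical stand-ins for illustration and are not Nova's real classes or signatures. It shows how a group hint that outlives its group turns into a "could not be found" error, and how ignoring a deleted group would let the move proceed.

```python
# Illustrative model only: every name here is a hypothetical stand-in,
# not Nova's actual objects or method signatures.
from dataclasses import dataclass, field


class InstanceGroupNotFound(Exception):
    """Raised when a server group UUID no longer exists."""


@dataclass
class InstanceGroup:
    uuid: str
    policy: str
    # Hosts that already run a member of this group.
    hosts: set = field(default_factory=set)


# Stand-in for the server group records, keyed by UUID.
GROUPS = {}


def get_group_by_hint(group_uuid):
    """Look up a group by the UUID stored in the scheduler hints."""
    try:
        return GROUPS[group_uuid]
    except KeyError:
        raise InstanceGroupNotFound(
            f"Instance group {group_uuid} could not be found.")


def validate_group_policy(scheduler_hints, dest_host, ignore_deleted=False):
    """Return True if dest_host is acceptable for the instance's group."""
    group_uuid = scheduler_hints.get("group")
    if group_uuid is None:
        # The instance was never created with a group hint.
        return True
    try:
        group = get_group_by_hint(group_uuid)
    except InstanceGroupNotFound:
        if ignore_deleted:
            # A deleted group has no policy left to enforce.
            return True
        # Without this handling the move operation fails, as in this report.
        raise
    if group.policy == "anti-affinity" and dest_host in group.hosts:
        return False
    return True
```

With an empty GROUPS mapping, calling validate_group_policy({"group": "<uuid>"}, "compute-0") raises InstanceGroupNotFound, which mirrors the 404 fault in the reproduction, while passing ignore_deleted=True lets the same call succeed.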

I eventually found an upstream bug report that had been filed for this but had since expired due to a timeout.
https://bugs.launchpad.net/nova/+bug/1890244

I'll reopen the Nova bug. I think this BZ is definitely valid, and I'll ask our team whether someone wants to work on it.

Comment 3 smooney 2022-06-21 11:37:59 UTC
While this is being resolved on master and backported, the simplest workaround is to disable the group policy validation upcall

https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_group_policy_check_upcall

by setting 

[workarounds]
disable_group_policy_check_upcall=true

in nova.conf.

That will disable the failing code and allow the evacuation to proceed.
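
As a quick sanity check before retrying the evacuation, something like the following Python sketch could confirm that the option is actually set in a given nova.conf. It assumes the file is plain INI that the standard-library configparser can read; the helper name upcall_check_disabled is made up for this example, and Nova itself reads its configuration through oslo.config, not this code. Keep in mind that the option turns off the late group policy check altogether, so it is best treated as a temporary measure.

```python
import configparser


def upcall_check_disabled(conf_text):
    """Return True if the workaround is enabled in the given nova.conf text."""
    # strict=False tolerates repeated options; interpolation=None keeps
    # any '%' characters in other option values literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean(
        "workarounds", "disable_group_policy_check_upcall", fallback=False)


if __name__ == "__main__":
    sample = "[workarounds]\ndisable_group_policy_check_upcall=true\n"
    print(upcall_check_disabled(sample))
```

To check a real deployment, read the contents of the nova.conf used by the compute service and pass the text to the helper.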

