Bug 2099279

Summary: Instances fail evacuation if the instance group they belong to has been removed
Product: Red Hat OpenStack
Reporter: alisci <alisci>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED MIGRATED
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Priority: medium
Version: 16.2 (Train)
CC: alifshit, dasmith, eglynn, jhakimra, kchamart, riramos, sbauza, sgordon, smooney, vromanso
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Last Closed: 2024-01-10 20:38:58 UTC
Type: Bug

Description alisci 2022-06-20 12:42:05 UTC
Description of problem:
Host evacuation fails for instances whose instance group has been deleted.

After the failed evacuation, `openstack server show` reports the fault {'code': 404, 'created': '2022-06-20T12:21:30Z', 'message': 'Instance group <instance group uuid> could not be found.'}


Version-Release number of selected component (if applicable):
OSP 16.2.0
OSP 16.2.1


How reproducible:
always


Steps to Reproduce:
- Create an instance group.
- Create a VM in the previously created instance group.
- Delete that instance group.
- Shut down or reboot the compute node the instance is running on.
- Evacuate the host with: nova host-evacuate <compute node hosting the VM>
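Actual results:
The evacuation is accepted, but the migration fails and the instance ends up in ERROR state with the 404 fault shown above.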

Expected results:
The instance is evacuated.


Additional info:

Comment 1 alisci 2022-06-20 12:44:55 UTC
Here is the reproduction on an OSP 16.2.0 lab:

(overcloud) [stack@undercloud-0 ~]$ openstack server group create --policy affinity test-group-01
+------------+--------------------------------------+
| Field      | Value                                |
+------------+--------------------------------------+
| id         | 22fe1c6e-9593-4012-b710-51956b66b104 |
| members    |                                      |
| name       | test-group-01                        |
| policy     | affinity                             |
| project_id | 3f6a49bcd3364cca988dd994dccd0887     |
| rules      |                                      |
| user_id    | 35596b10983b4b2b80825f491356703a     |
+------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server create --volume cirros-volume --flavor m1.tiny vm01 --wait --network net1 --hint group=22fe1c6e-9593-4012-b710-51956b66b104

+-------------------------------------+------------------------------------------------------------------------------------+
| Field                               | Value                                                                              |
+-------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                             |
| OS-EXT-AZ:availability_zone         | nova                                                                               |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:hostname            | vm01                                                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000013                                                                  |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                    |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                  |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                    |
| OS-EXT-SRV-ATTR:reservation_id      | r-ccmxkx2n                                                                         |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                           |
| OS-EXT-SRV-ATTR:user_data           | None                                                                               |
| OS-EXT-STS:power_state              | Running                                                                            |
| OS-EXT-STS:task_state               | None                                                                               |
| OS-EXT-STS:vm_state                 | active                                                                             |
| OS-SRV-USG:launched_at              | 2022-06-20T12:17:54.000000                                                         |
| OS-SRV-USG:terminated_at            | None                                                                               |
| accessIPv4                          |                                                                                    |
| accessIPv6                          |                                                                                    |
| addresses                           | net1=192.168.0.78                                                                  |
| adminPass                           | zX6KahbcC6wf                                                                       |
| config_drive                        |                                                                                    |
| created                             | 2022-06-20T12:17:25Z                                                               |
| description                         | None                                                                               |
| flavor                              | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId                              | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50                           |
| host_status                         | UP                                                                                 |
| id                                  | 7dc8a912-131d-427e-869f-458d9c547c11                                               |
| image                               |                                                                                    |
| key_name                            | None                                                                               |
| locked                              | False                                                                              |
| locked_reason                       | None                                                                               |
| name                                | vm01                                                                               |
| progress                            | 0                                                                                  |
| project_id                          | 3f6a49bcd3364cca988dd994dccd0887                                                   |
| properties                          |                                                                                    |
| security_groups                     | name='default'                                                                     |
| server_groups                       | ['22fe1c6e-9593-4012-b710-51956b66b104']                                           |
| status                              | ACTIVE                                                                             |
| tags                                | []                                                                                 |
| trusted_image_certificates          | None                                                                               |
| updated                             | 2022-06-20T12:17:54Z                                                               |
| user_id                             | 35596b10983b4b2b80825f491356703a                                                   |
| volumes_attached                    | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6'           |
+-------------------------------------+------------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server group delete test-group-01

(overcloud) [stack@undercloud-0 ~]$ openstack server show vm01
+-------------------------------------+------------------------------------------------------------------------------------+
| Field                               | Value                                                                              |
+-------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                             |
| OS-EXT-AZ:availability_zone         | nova                                                                               |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:hostname            | vm01                                                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                             |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000013                                                                  |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                    |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                  |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                    |
| OS-EXT-SRV-ATTR:reservation_id      | r-ccmxkx2n                                                                         |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                           |
| OS-EXT-SRV-ATTR:user_data           | None                                                                               |
| OS-EXT-STS:power_state              | Running                                                                            |
| OS-EXT-STS:task_state               | None                                                                               |
| OS-EXT-STS:vm_state                 | active                                                                             |
| OS-SRV-USG:launched_at              | 2022-06-20T12:17:54.000000                                                         |
| OS-SRV-USG:terminated_at            | None                                                                               |
| accessIPv4                          |                                                                                    |
| accessIPv6                          |                                                                                    |
| addresses                           | net1=192.168.0.78                                                                  |
| config_drive                        |                                                                                    |
| created                             | 2022-06-20T12:17:25Z                                                               |
| description                         | None                                                                               |
| flavor                              | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId                              | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50                           |
| host_status                         | UP                                                                                 |
| id                                  | 7dc8a912-131d-427e-869f-458d9c547c11                                               |
| image                               |                                                                                    |
| key_name                            | None                                                                               |
| locked                              | False                                                                              |
| locked_reason                       | None                                                                               |
| name                                | vm01                                                                               |
| progress                            | 0                                                                                  |
| project_id                          | 3f6a49bcd3364cca988dd994dccd0887                                                   |
| properties                          |                                                                                    |
| security_groups                     | name='default'                                                                     |
| server_groups                       | []                                                                                 |
| status                              | ACTIVE                                                                             |
| tags                                | []                                                                                 |
| trusted_image_certificates          | None                                                                               |
| updated                             | 2022-06-20T12:17:54Z                                                               |
| user_id                             | 35596b10983b4b2b80825f491356703a                                                   |
| volumes_attached                    | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6'           |
+-------------------------------------+------------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.8
Warning: Permanently added '192.168.24.8' (ECDSA) to the list of known hosts.
Last login: Mon Jun 20 12:14:29 2022 from 192.168.24.1
[heat-admin@compute-1 ~]$ sudo reboot
Connection to 192.168.24.8 closed by remote host.
Connection to 192.168.24.8 closed.

(overcloud) [stack@undercloud-0 ~]$ nova host-evacuate compute-1
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| 7dc8a912-131d-427e-869f-458d9c547c11 | True              |               |
+--------------------------------------+-------------------+---------------+


(overcloud) [stack@undercloud-0 ~]$ nova migration-list
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| Id | UUID                                 | Source Node            | Dest Node              | Source Compute         | Dest Compute           | Dest Host   | Status | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type       |
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| 13 | 17180ff5-2733-4de9-bdba-4a5b561d475a | compute-1.redhat.local | compute-0.redhat.local | compute-1.redhat.local | compute-0.redhat.local | 172.17.1.56 | failed | 7dc8a912-131d-427e-869f-458d9c547c11 | None       | None       | 2022-06-20T12:21:25.000000 | 2022-06-20T12:21:30.000000 | evacuation |
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server event list vm01
+------------------------------------------+--------------------------------------+----------+----------------------------+
| Request ID                               | Server ID                            | Action   | Start Time                 |
+------------------------------------------+--------------------------------------+----------+----------------------------+
| req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 | 7dc8a912-131d-427e-869f-458d9c547c11 | evacuate | 2022-06-20T12:21:25.000000 |
| req-e8babc01-b06b-496d-81c2-70913f6d579a | 7dc8a912-131d-427e-869f-458d9c547c11 | create   | 2022-06-20T12:17:23.000000 |
+------------------------------------------+--------------------------------------+----------+----------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server event show vm01 req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 --fit
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field         | Value                                                                                                                                                                      |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action        | evacuate                                                                                                                                                                   |
| events        | [{'event': 'compute_rebuild_instance', 'start_time': '2022-06-20T12:21:28.000000', 'finish_time': '2022-06-20T12:21:30.000000', 'result': 'Error', 'traceback': '  File    |
|               | "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1372, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File                    |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 219, in decorated_function\n    kwargs[\'instance\'], e, sys.exc_info())\n  File                          |
|               | "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n  File "/usr/lib/python3.6/site-                               |
|               | packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File "/usr/lib/python3.6/site-packages/six.py", line     |
|               | 693, in reraise\n    raise value\n  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 207, in decorated_function\n    return function(self, context,   |
|               | *args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3393, in rebuild_instance\n    migration, request_spec, allocs)\n  File          |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3455, in _do_rebuild_instance_with_claim\n    self._do_rebuild_instance(*args, **kwargs)\n  File          |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3478, in _do_rebuild_instance\n    self._validate_instance_group_policy(context, instance, hints)\n  File |
|               | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1642, in _validate_instance_group_policy\n    _do_validation(context, instance, group_hint)\n  File       |
|               | "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner\n    return f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-                      |
|               | packages/nova/compute/manager.py", line 1611, in _do_validation\n    group = objects.InstanceGroup.get_by_hint(context, group_hint)\n  File "/usr/lib/python3.6/site-      |
|               | packages/nova/objects/instance_group.py", line 384, in get_by_hint\n    return cls.get_by_uuid(context, hint)\n  File "/usr/lib/python3.6/site-                            |
|               | packages/oslo_versionedobjects/base.py", line 177, in wrapper\n    args, kwargs)\n  File "/usr/lib/python3.6/site-packages/nova/conductor/rpcapi.py", line 241, in         |
|               | object_class_action_versions\n    args=args, kwargs=kwargs)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call\n                   |
|               | transport_options=self.transport_options)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send\n                                     |
|               | transport_options=transport_options)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send\n                                 |
|               | transport_options=transport_options)\n  File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send\n    raise result\n', 'host':    |
|               | 'compute-0.redhat.local', 'hostId': 'bb22b6f39cbdb7d8b08472fd93624d1c097296188a531da1b171be10'}, {'event': 'rebuild_server', 'start_time': '2022-06-20T12:21:25.000000',   |
|               | 'finish_time': '2022-06-20T12:21:28.000000', 'result': 'Success', 'traceback': None, 'host': 'controller-1.redhat.local', 'hostId':                                        |
|               | '76a7267ae706a859241caa653440761cbac48d6e2f9e78f42ffca09f'}]                                                                                                               |
| instance_uuid | 7dc8a912-131d-427e-869f-458d9c547c11                                                                                                                                       |
| message       | Error                                                                                                                                                                      |
| project_id    | 3f6a49bcd3364cca988dd994dccd0887                                                                                                                                           |
| request_id    | req-7688c1c8-fdb0-4f78-9339-7b76f888eea4                                                                                                                                   |
| start_time    | 2022-06-20T12:21:25.000000                                                                                                                                                 |
| updated_at    | 2022-06-20T12:21:30.000000                                                                                                                                                 |
| user_id       | 35596b10983b4b2b80825f491356703a                                                                                                                                           |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server show vm01
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                                                  |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                                                                 |
| OS-EXT-AZ:availability_zone         | nova                                                                                                                                   |
| OS-EXT-SRV-ATTR:host                | compute-1.redhat.local                                                                                                                 |
| OS-EXT-SRV-ATTR:hostname            | vm01                                                                                                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local                                                                                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000013                                                                                                                      |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                                                                        |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                                                                      |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                                                                        |
| OS-EXT-SRV-ATTR:reservation_id      | r-ccmxkx2n                                                                                                                             |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                                                                               |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                                                                   |
| OS-EXT-STS:power_state              | Shutdown                                                                                                                               |
| OS-EXT-STS:task_state               | None                                                                                                                                   |
| OS-EXT-STS:vm_state                 | error                                                                                                                                  |
| OS-SRV-USG:launched_at              | 2022-06-20T12:17:54.000000                                                                                                             |
| OS-SRV-USG:terminated_at            | None                                                                                                                                   |
| accessIPv4                          |                                                                                                                                        |
| accessIPv6                          |                                                                                                                                        |
| addresses                           | net1=192.168.0.78                                                                                                                      |
| config_drive                        |                                                                                                                                        |
| created                             | 2022-06-20T12:17:25Z                                                                                                                   |
| description                         | None                                                                                                                                   |
| fault                               | {'code': 404, 'created': '2022-06-20T12:21:30Z', 'message': 'Instance group 22fe1c6e-9593-4012-b710-51956b66b104 could not be found.'} |
| flavor                              | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1'                                                     |
| hostId                              | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50                                                                               |
| host_status                         | UP                                                                                                                                     |
| id                                  | 7dc8a912-131d-427e-869f-458d9c547c11                                                                                                   |
| image                               |                                                                                                                                        |
| key_name                            | None                                                                                                                                   |
| locked                              | False                                                                                                                                  |
| locked_reason                       | None                                                                                                                                   |
| name                                | vm01                                                                                                                                   |
| project_id                          | 3f6a49bcd3364cca988dd994dccd0887                                                                                                       |
| properties                          |                                                                                                                                        |
| security_groups                     | name='default'                                                                                                                         |
| server_groups                       | []                                                                                                                                     |
| status                              | ERROR                                                                                                                                  |
| tags                                | []                                                                                                                                     |
| trusted_image_certificates          | None                                                                                                                                   |
| updated                             | 2022-06-20T12:21:40Z                                                                                                                   |
| user_id                             | 35596b10983b4b2b80825f491356703a                                                                                                       |
| volumes_attached                    | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6'                                                               |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+

Comment 2 Sylvain Bauza 2022-06-21 09:19:03 UTC
This sounds like a legitimate bug, even on master. Because of the late anti-affinity check in the compute service (which makes sure concurrent instance requests do not land on the same compute host when their group has an anti-affinity policy), we look at the RequestSpec record to see whether the instance was created with a group hint. If we find one, we look up the related InstanceGroup record (by its UUID) to learn the group policy.

Since we run this check on every instance creation or move operation (shelve, migrate, evacuate, etc.), we can indeed hit a problem if the group was deleted before the instance is moved.

https://github.com/openstack/nova/blob/ebe08834f311e8e22bfd9685d7e6e91dab967382/nova/compute/manager.py#L3650-L3657
https://github.com/openstack/nova/blob/ebe08834f311e8e22bfd9685d7e6e91dab967382/nova/compute/manager.py#L1733
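
The failure mode described above can be modelled with a small, self-contained sketch. This is not nova's code and not necessarily the shape of the upstream fix: `GROUPS`, `get_group_by_hint`, `validate_group_policy` and `validate_group_policy_tolerant` are hypothetical names, and the hint is simplified to a single UUID string. It only shows why a stale group hint makes the strict check raise, and what a variant that tolerates a deleted group could look like.

```python
# Hypothetical, simplified model of the late group-policy check.
# Not nova's code: every name below is an assumption for illustration.


class InstanceGroupNotFound(Exception):
    """Stand-in for the 404 'Instance group ... could not be found' fault."""


# In-memory stand-in for the InstanceGroup records, keyed by UUID.
GROUPS = {}


def get_group_by_hint(group_uuid):
    """Look up a group by UUID, raising if it has been deleted."""
    try:
        return GROUPS[group_uuid]
    except KeyError:
        raise InstanceGroupNotFound(
            "Instance group %s could not be found." % group_uuid)


def validate_group_policy(host, scheduler_hints):
    """Strict check: fails if the hinted group no longer exists."""
    group_uuid = scheduler_hints.get("group")
    if group_uuid is None:
        return  # the instance was not created with a group hint
    group = get_group_by_hint(group_uuid)
    if group["policy"] == "anti-affinity" and host in group["hosts"]:
        raise RuntimeError("anti-affinity policy violated on %s" % host)


def validate_group_policy_tolerant(host, scheduler_hints):
    """Same check, but a deleted group means there is no policy to enforce."""
    try:
        validate_group_policy(host, scheduler_hints)
    except InstanceGroupNotFound:
        pass  # the group was deleted after boot; nothing left to validate
```

With a stale hint such as `{"group": "22fe1c6e-9593-4012-b710-51956b66b104"}` and an empty `GROUPS`, the strict function raises `InstanceGroupNotFound`, which mirrors the fault seen in the reproduction, while the tolerant variant returns normally. Real policy violations still propagate from both functions.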

I eventually found an upstream bug report that had been filed but later expired due to a timeout.
https://bugs.launchpad.net/nova/+bug/1890244

I'll reopen the Nova bug. I think this BZ is definitely valid, but I'll ask our team whether someone wants to work on it.

Comment 3 smooney 2022-06-21 11:37:59 UTC
While this is being resolved on master and backported, the simplest workaround is to disable the group policy validation upcall

https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_group_policy_check_upcall

by setting 

[workarounds]
disable_group_policy_check_upcall=true

in nova.conf.

That will disable the failing code and allow the evacuation to proceed.
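
To confirm that a given nova.conf actually carries the workaround, a minimal check along these lines may help. It is a sketch using only Python's standard `configparser`, not oslo.config. The function name `group_policy_upcall_disabled` is hypothetical, and the file contents are passed in as a string so the example stays self-contained.

```python
import configparser


def group_policy_upcall_disabled(conf_text):
    """Return True if the workaround option is enabled in the given text.

    Hypothetical helper: parses nova.conf-style INI text with the standard
    library only. strict=False tolerates repeated keys, and
    interpolation=None leaves '%' characters in values alone.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean(
        "workarounds", "disable_group_policy_check_upcall", fallback=False)


SAMPLE = """
[DEFAULT]
debug = false

[workarounds]
disable_group_policy_check_upcall=true
"""
```

Calling `group_policy_upcall_disabled(SAMPLE)` reports whether the option is set. A file without a `[workarounds]` section falls back to `False`, which is the assumption this sketch makes about the unset case.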