Bug 2099279
| Summary: | Instance evacuation fails if the instance group the instances belong to has been removed | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | alisci <alisci> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED MIGRATED | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | alifshit, dasmith, eglynn, jhakimra, kchamart, riramos, sbauza, sgordon, smooney, vromanso |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-01-10 20:38:58 UTC | Type: | Bug |
Here is the reproduction on an OSP 16.2.0 lab:
(overcloud) [stack@undercloud-0 ~]$ openstack server group create --policy affinity test-group-01
+------------+--------------------------------------+
| Field | Value |
+------------+--------------------------------------+
| id | 22fe1c6e-9593-4012-b710-51956b66b104 |
| members | |
| name | test-group-01 |
| policy | affinity |
| project_id | 3f6a49bcd3364cca988dd994dccd0887 |
| rules | |
| user_id | 35596b10983b4b2b80825f491356703a |
+------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server create --volume cirros-volume --flavor m1.tiny vm01 --wait --network net1 --hint group=22fe1c6e-9593-4012-b710-51956b66b104
+-------------------------------------+------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:hostname | vm01 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:instance_name | instance-00000013 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-ccmxkx2n |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | None |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2022-06-20T12:17:54.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | net1=192.168.0.78 |
| adminPass | zX6KahbcC6wf |
| config_drive | |
| created | 2022-06-20T12:17:25Z |
| description | None |
| flavor | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50 |
| host_status | UP |
| id | 7dc8a912-131d-427e-869f-458d9c547c11 |
| image | |
| key_name | None |
| locked | False |
| locked_reason | None |
| name | vm01 |
| progress | 0 |
| project_id | 3f6a49bcd3364cca988dd994dccd0887 |
| properties | |
| security_groups | name='default' |
| server_groups | ['22fe1c6e-9593-4012-b710-51956b66b104'] |
| status | ACTIVE |
| tags | [] |
| trusted_image_certificates | None |
| updated | 2022-06-20T12:17:54Z |
| user_id | 35596b10983b4b2b80825f491356703a |
| volumes_attached | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6' |
+-------------------------------------+------------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server group delete test-group-01
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm01
+-------------------------------------+------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:hostname | vm01 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:instance_name | instance-00000013 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-ccmxkx2n |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | None |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2022-06-20T12:17:54.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | net1=192.168.0.78 |
| config_drive | |
| created | 2022-06-20T12:17:25Z |
| description | None |
| flavor | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50 |
| host_status | UP |
| id | 7dc8a912-131d-427e-869f-458d9c547c11 |
| image | |
| key_name | None |
| locked | False |
| locked_reason | None |
| name | vm01 |
| progress | 0 |
| project_id | 3f6a49bcd3364cca988dd994dccd0887 |
| properties | |
| security_groups | name='default' |
| server_groups | [] |
| status | ACTIVE |
| tags | [] |
| trusted_image_certificates | None |
| updated | 2022-06-20T12:17:54Z |
| user_id | 35596b10983b4b2b80825f491356703a |
| volumes_attached | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6' |
+-------------------------------------+------------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.8
Warning: Permanently added '192.168.24.8' (ECDSA) to the list of known hosts.
Last login: Mon Jun 20 12:14:29 2022 from 192.168.24.1
[heat-admin@compute-1 ~]$ sudo reboot
Connection to 192.168.24.8 closed by remote host.
Connection to 192.168.24.8 closed.
(overcloud) [stack@undercloud-0 ~]$ nova host-evacuate compute-1
+--------------------------------------+-------------------+---------------+
| Server UUID | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| 7dc8a912-131d-427e-869f-458d9c547c11 | True | |
+--------------------------------------+-------------------+---------------+
(overcloud) [stack@undercloud-0 ~]$ nova migration-list
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| Id | UUID | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host | Status | Instance UUID | Old Flavor | New Flavor | Created At | Updated At | Type |
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
| 13 | 17180ff5-2733-4de9-bdba-4a5b561d475a | compute-1.redhat.local | compute-0.redhat.local | compute-1.redhat.local | compute-0.redhat.local | 172.17.1.56 | failed | 7dc8a912-131d-427e-869f-458d9c547c11 | None | None | 2022-06-20T12:21:25.000000 | 2022-06-20T12:21:30.000000 | evacuation |
+----+--------------------------------------+------------------------+------------------------+------------------------+------------------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server event list vm01
+------------------------------------------+--------------------------------------+----------+----------------------------+
| Request ID | Server ID | Action | Start Time |
+------------------------------------------+--------------------------------------+----------+----------------------------+
| req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 | 7dc8a912-131d-427e-869f-458d9c547c11 | evacuate | 2022-06-20T12:21:25.000000 |
| req-e8babc01-b06b-496d-81c2-70913f6d579a | 7dc8a912-131d-427e-869f-458d9c547c11 | create | 2022-06-20T12:17:23.000000 |
+------------------------------------------+--------------------------------------+----------+----------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server event show vm01 req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 --fit
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | evacuate |
| events | [{'event': 'compute_rebuild_instance', 'start_time': '2022-06-20T12:21:28.000000', 'finish_time': '2022-06-20T12:21:30.000000', 'result': 'Error', 'traceback': ' File |
| | "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1372, in decorated_function\n return function(self, context, *args, **kwargs)\n File |
| | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 219, in decorated_function\n kwargs[\'instance\'], e, sys.exc_info())\n File |
| | "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n File "/usr/lib/python3.6/site- |
| | packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n File "/usr/lib/python3.6/site-packages/six.py", line |
| | 693, in reraise\n raise value\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 207, in decorated_function\n return function(self, context, |
| | *args, **kwargs)\n File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3393, in rebuild_instance\n migration, request_spec, allocs)\n File |
| | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3455, in _do_rebuild_instance_with_claim\n self._do_rebuild_instance(*args, **kwargs)\n File |
| | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 3478, in _do_rebuild_instance\n self._validate_instance_group_policy(context, instance, hints)\n File |
| | "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1642, in _validate_instance_group_policy\n _do_validation(context, instance, group_hint)\n File |
| | "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner\n return f(*args, **kwargs)\n File "/usr/lib/python3.6/site- |
| | packages/nova/compute/manager.py", line 1611, in _do_validation\n group = objects.InstanceGroup.get_by_hint(context, group_hint)\n File "/usr/lib/python3.6/site- |
| | packages/nova/objects/instance_group.py", line 384, in get_by_hint\n return cls.get_by_uuid(context, hint)\n File "/usr/lib/python3.6/site- |
| | packages/oslo_versionedobjects/base.py", line 177, in wrapper\n args, kwargs)\n File "/usr/lib/python3.6/site-packages/nova/conductor/rpcapi.py", line 241, in |
| | object_class_action_versions\n args=args, kwargs=kwargs)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call\n |
| | transport_options=self.transport_options)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send\n |
| | transport_options=transport_options)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send\n |
| | transport_options=transport_options)\n File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send\n raise result\n', 'host': |
| | 'compute-0.redhat.local', 'hostId': 'bb22b6f39cbdb7d8b08472fd93624d1c097296188a531da1b171be10'}, {'event': 'rebuild_server', 'start_time': '2022-06-20T12:21:25.000000', |
| | 'finish_time': '2022-06-20T12:21:28.000000', 'result': 'Success', 'traceback': None, 'host': 'controller-1.redhat.local', 'hostId': |
| | '76a7267ae706a859241caa653440761cbac48d6e2f9e78f42ffca09f'}] |
| instance_uuid | 7dc8a912-131d-427e-869f-458d9c547c11 |
| message | Error |
| project_id | 3f6a49bcd3364cca988dd994dccd0887 |
| request_id | req-7688c1c8-fdb0-4f78-9339-7b76f888eea4 |
| start_time | 2022-06-20T12:21:25.000000 |
| updated_at | 2022-06-20T12:21:30.000000 |
| user_id | 35596b10983b4b2b80825f491356703a |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server show vm01
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:hostname | vm01 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:instance_name | instance-00000013 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-ccmxkx2n |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | None |
| OS-EXT-STS:power_state | Shutdown |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | 2022-06-20T12:17:54.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | net1=192.168.0.78 |
| config_drive | |
| created | 2022-06-20T12:17:25Z |
| description | None |
| fault | {'code': 404, 'created': '2022-06-20T12:21:30Z', 'message': 'Instance group 22fe1c6e-9593-4012-b710-51956b66b104 could not be found.'} |
| flavor | disk='2', ephemeral='0', , original_name='m1.tiny', ram='256', swap='0', vcpus='1' |
| hostId | 159afb1af230f986368dea681deecec28457b961fd29a7bc6a988f50 |
| host_status | UP |
| id | 7dc8a912-131d-427e-869f-458d9c547c11 |
| image | |
| key_name | None |
| locked | False |
| locked_reason | None |
| name | vm01 |
| project_id | 3f6a49bcd3364cca988dd994dccd0887 |
| properties | |
| security_groups | name='default' |
| server_groups | [] |
| status | ERROR |
| tags | [] |
| trusted_image_certificates | None |
| updated | 2022-06-20T12:21:40Z |
| user_id | 35596b10983b4b2b80825f491356703a |
| volumes_attached | delete_on_termination='False', id='e3014861-0360-4227-921d-bb40f139aaa6' |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+
This sounds like a legitimate bug, even in master. Because of the late anti-affinity check in the compute service (which makes sure concurrent instance requests do not land on the same compute host when their group has an anti-affinity policy), we look at the RequestSpec record to know whether the instance was created with a group hint. If we find one, we look up the related InstanceGroup record (by its UUID) to get the group policy. Since we verify this for every instance creation or move operation (shelve, migrate, evacuate, etc.), we can indeed hit a problem if the group was deleted before the instance is moved.

https://github.com/openstack/nova/blob/ebe08834f311e8e22bfd9685d7e6e91dab967382/nova/compute/manager.py#L3650-L3657
https://github.com/openstack/nova/blob/ebe08834f311e8e22bfd9685d7e6e91dab967382/nova/compute/manager.py#L1733

I eventually found an upstream bug report that had been created but then expired because of a timeout.
https://bugs.launchpad.net/nova/+bug/1890244

I'll reopen the Nova bug. I think this BZ is definitely valid, but I'll ask our team whether someone wants to work on it.

While this is being resolved on master and backported, the simplest workaround is to disable the group policy validation upcall
https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_group_policy_check_upcall
by setting [workarounds] disable_group_policy_check_upcall=true in nova.conf. That disables the failing code and allows the evacuation to proceed.
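To make the failure mode easier to follow, here is a minimal, self-contained model of the late check described above. It is only a sketch and not Nova's code: `GROUPS`, `get_group_by_hint`, and `validate_group_policy` are hypothetical stand-ins for the InstanceGroup table, the lookup by hint, and the compute-side validation, and the `disable_upcall` argument stands in for the `disable_group_policy_check_upcall` workaround option. The anti-affinity test itself is deliberately simplified.

```python
class InstanceGroupNotFound(Exception):
    """Stand-in for the 404 fault recorded on the instance."""


# Hypothetical in-memory replacement for the InstanceGroup records,
# keyed by group UUID.
GROUPS = {}


def get_group_by_hint(group_uuid):
    """Look up a group by the UUID stored in the scheduler hint."""
    try:
        return GROUPS[group_uuid]
    except KeyError:
        raise InstanceGroupNotFound(
            f"Instance group {group_uuid} could not be found."
        ) from None


def validate_group_policy(request_spec_hints, dest_host, disable_upcall=False):
    """Simplified model of the late group policy check on the destination.

    request_spec_hints models the hints saved at boot time; they are not
    updated when the group is deleted later.
    """
    if disable_upcall:
        # Workaround path: the check (and the failing lookup) is skipped.
        return
    group_uuid = request_spec_hints.get("group")
    if group_uuid is None:
        # The instance was never booted with a group hint.
        return
    # This lookup is what fails once the group has been deleted.
    group = get_group_by_hint(group_uuid)
    if group["policy"] == "anti-affinity" and dest_host in group["hosts"]:
        raise RuntimeError("anti-affinity policy violated on " + dest_host)
```

In this model, deleting the group removes its entry from `GROUPS` while the hint saved for the instance still carries the old UUID, so any later move operation raises `InstanceGroupNotFound` unless the check is skipped.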
Description of problem:
Host evacuation fails for instances whose instance group has been deleted. openstack server show reports the following error:
{'code': 404, 'created': '2022-06-20T12:21:30Z', 'message': 'Instance group <instance group uuid> could not be found.'}

Version-Release number of selected component (if applicable):
OSP 16.2.0
OSP 16.2.1

How reproducible:
Always

Steps to Reproduce:
- Create an instance group.
- Create a VM in the instance group created in the previous step.
- Delete that instance group.
- Shut down or reboot the compute node the instance is running on.
- Evacuate the host with: nova host-evacuate <compute host where the VM is running>

Actual results:

Expected results:
The instance gets evacuated.

Additional info: