Bug 1759545 - Build of instance 693fed91-19ec-4d5e-a64e-c742af27766b was re-scheduled: Anti-affinity instance group policy was violated
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: smooney
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-08 13:19 UTC by Jeremy
Modified: 2023-03-21 19:24 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-18 09:39:23 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker OSP-23492 (Last Updated: 2023-03-21 19:24:29 UTC)

Comment 8 smooney 2019-10-18 09:39:23 UTC
Sorry for the delay in writing this up.
We discussed this at some length last week and I concur with Matt's assessment.
The behavior in nova is working as we expect, so this is not a bug.
There are two ways forward that I can see.

1.) nova has a max_attempts config option that controls the number of hosts
that will be tried for each VM:
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.max_attempts
max_attempts defaults to 3, which is why in Matt's assessment of the instance spawn we saw 2 retries
due to the anti-affinity policy, and why on the third failed attempt the instance went to ERROR and
the stack was reverted. This value could be increased to 10 or higher to reduce the risk of
exhausting hosts before all VMs are placed. nova also has two other relevant options, host_subset_size and shuffle_best_same_weighed_hosts:
https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.host_subset_size
https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.shuffle_best_same_weighed_hosts
Setting max_attempts=10 in combination with host_subset_size=15 and shuffle_best_same_weighed_hosts=true
will significantly reduce the likelihood of a conflict. shuffle_best_same_weighed_hosts allows
nova to randomise the top N possible hosts returned by the scheduler, where N is set by host_subset_size.
Adding randomness to the scheduler will tend to spread VMs across more hosts and, as a result, reduce the
likelihood of a collision (see the nova.conf sketch after these two options).

2.) heat could be modified to create VMs serially, waiting for each VM to be ACTIVE before launching the next.
By treating each VM as a separate request issued in parallel, heat is creating the worst-case scenario with the highest
probability of conflict. In nova an instance is not recorded as assigned to a host until it is running on that host.
As this takes time, there is a window between the scheduler filtering the hosts to select a target and the VM
starting on the target host. As a result, when heat launches the second instance without waiting for the
first to be active, it does not give the nova database a chance to be updated with the location of the already running VMs
in the server group, so the anti-affinity filter cannot work correctly.
It is possible that instead of using a resource group
https://docs.openstack.org/heat/rocky/template_guide/openstack.html#OS::Heat::ResourceGroup
an auto scaling group https://docs.openstack.org/heat/rocky/template_guide/openstack.html#OS::Heat::AutoScalingGroup
may allow you to alter the way heat creates VMs, using the batch size option to serialize the VM creation
(see the heat template sketch after these two options).
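
To make option 1 concrete, the scheduler tuning described above would look roughly like the following in nova.conf on the nodes running nova-scheduler. This is only a sketch: the values are the examples given above, and the section/option names come from the configuration references linked in option 1.

    [scheduler]
    # Number of hosts that will be tried for each instance before it is
    # put into ERROR and retries stop (the default is 3).
    max_attempts = 10

    [filter_scheduler]
    # Size of the subset of best-weighed hosts the scheduler picks from.
    host_subset_size = 15
    # Randomise the ordering of hosts with the same best weight, so parallel
    # requests are less likely to all land on the same host.
    shuffle_best_same_weighed_hosts = true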
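
For option 2, a minimal Heat sketch of the idea follows. It assumes a ResourceGroup with the batch_create update policy (max_batch_size / pause_time) from the Heat template guide; whether you use that or an AutoScalingGroup as suggested above, the exact policy syntax and its behaviour on initial stack create should be verified against the Heat docs for your release. All resource names and property values here (anti_affinity_group, my_servers, my_image, my_flavor, my_network, the count, and so on) are illustrative only.

    heat_template_version: 2018-03-02

    resources:
      # Server group whose anti-affinity policy the scheduler filter enforces.
      anti_affinity_group:
        type: OS::Nova::ServerGroup
        properties:
          policies: [anti-affinity]

      # Create the members in batches of one so each VM is recorded on its
      # host before the next one is scheduled.
      my_servers:
        type: OS::Heat::ResourceGroup
        update_policy:
          batch_create:
            max_batch_size: 1
            pause_time: 0
        properties:
          count: 4
          resource_def:
            type: OS::Nova::Server
            properties:
              image: my_image
              flavor: my_flavor
              networks:
                - network: my_network
              scheduler_hints:
                group: {get_resource: anti_affinity_group}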

Given that nova is working as expected, I am closing this as not a bug from a nova perspective, but it could be considered
a heat bug or RFE. As such, this could be reopened against the heat component, but it is not something we can fix in nova
in a backportable way. We have discussed using dynamic aggregates in placement to track anti-affinity groups in the past, but
if we were to pursue that in the future it would not be possible to backport it to address this issue, and the idea has
previously been suggested and discouraged.

