1461889 – Nova compute resource_tracker incorrectly sets overcommitted nodes free_mem to 0 affecting nova-scheduler's ram_filter

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	9.0 (Mitaka)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	OSP DFG:Compute
QA Contact:	OSP DFG:Compute
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-06-15 14:18 UTC by jliberma@redhat.com
Modified:	2023-03-21 18:43 UTC
CC List:	11 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-08-22 07:49:26 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1698383	None	None	None	2017-06-16 13:53:16 UTC
OpenStack gerrit	474994	None	MERGED	Fix regression preventing reporting negative resources for overcommit	2020-11-10 10:47:39 UTC
Red Hat Issue Tracker	OSP-4644	None	None	None	2022-08-16 12:56:20 UTC

Description jliberma@redhat.com 2017-06-15 14:18:35 UTC

In the following commit:

https://github.com/openstack/nova/commit/016b810f675b20e8ce78f4c82dc9c679c0162b7a

The following lines were added to nova/compute/resource_tracker.py:

+        self.compute_node.free_ram_mb = max(0, self.compute_node.free_ram_mb)
+        self.compute_node.free_disk_gb = max(0, self.compute_node.free_disk_gb)

 
The result of this change is that if self.compute_node.free_ram_mb or self.compute_node.free_disk_gb is negative, it is set to 0. However, a comment earlier in the same file states that negative values are expected:

https://github.com/openstack/nova/blob/016b810f675b20e8ce78f4c82dc9c679c0162b7a/nova/compute/resource_tracker.py

# free ram and disk may be negative, depending on policy:

A negative free_ram_mb or free_disk_gb is valid when a compute node is overcommitted, and nova-scheduler's ram_filter relies on the negative value when it computes used memory for resource allocation:

https://github.com/openstack/nova/blob/master/nova/scheduler/filters/ram_filter.py

        memory_mb_limit = total_usable_ram_mb * ram_allocation_ratio
        used_ram_mb = total_usable_ram_mb - free_ram_mb
        usable_ram = memory_mb_limit - used_ram_mb
        if not usable_ram >= requested_ram:
            LOG.debug("%(host_state)s does not have %(requested_ram)s MB "
                    "usable ram, it only has %(usable_ram)s MB usable ram.",
                    {'host_state': host_state,
                     'requested_ram': requested_ram,
                     'usable_ram': usable_ram})
            return False

 

Artificially resetting the negative number to 0 makes used_ram_mb appear smaller than it really is, so the scheduler believes the node may be valid for scheduling. When the instance then attempts to launch on the compute node, the compute node correctly identifies that the instance would put it above the overcommit ratio, and refuses to launch.
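To make the effect concrete, here is a minimal standalone sketch of the ram_filter arithmetic quoted above, run once with the real (negative) free_ram_mb and once with the value clamped to 0. The host figures (1000 MB total, ratio 1.5, 1400 MB already used, a 256 MB request) and the helper names usable_ram_mb and host_passes are hypothetical and chosen only for illustration. This is not Nova code, and it assumes free_ram_mb is simply total RAM minus used RAM.

```python
def usable_ram_mb(total_usable_ram_mb, free_ram_mb, ram_allocation_ratio):
    """Repeat the arithmetic quoted from ram_filter.py (hypothetical helper)."""
    memory_mb_limit = total_usable_ram_mb * ram_allocation_ratio
    used_ram_mb = total_usable_ram_mb - free_ram_mb
    return memory_mb_limit - used_ram_mb


def host_passes(total_usable_ram_mb, free_ram_mb, ram_allocation_ratio,
                requested_ram):
    """Return True if the filter arithmetic would accept the host."""
    usable = usable_ram_mb(total_usable_ram_mb, free_ram_mb,
                           ram_allocation_ratio)
    return usable >= requested_ram


# Hypothetical overcommitted host (illustrative numbers only).
TOTAL_MB = 1000
RATIO = 1.5
USED_MB = 1400
REQUEST_MB = 256

# Assumption: free RAM is total minus used, so it goes negative here.
true_free_mb = TOTAL_MB - USED_MB        # -400
clamped_free_mb = max(0, true_free_mb)   # 0, as after the added lines

print(usable_ram_mb(TOTAL_MB, true_free_mb, RATIO))      # 100.0
print(usable_ram_mb(TOTAL_MB, clamped_free_mb, RATIO))   # 500.0
print(host_passes(TOTAL_MB, true_free_mb, RATIO, REQUEST_MB))     # False
print(host_passes(TOTAL_MB, clamped_free_mb, RATIO, REQUEST_MB))  # True
```

With the negative value, only 100 MB of headroom remains and the 256 MB request is rejected. With the clamped value, the same host appears to have 500 MB of headroom and passes the filter, which matches the behaviour described above.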


Looking at the original commit that included these lines, it appears that the actual problem being addressed was fixed in nova/cells/state.py and was mostly related to ironic nodes, where overcommit does not come into play.

From the commit notes and comments alone, I cannot tell why these lines were added to resource_tracker.py other than for cosmetic reasons to clean up unexpected negative numbers.

In our case, our “fix” for this has been to comment out the two lines in resource_tracker.py, but somebody more familiar with the original patch should probably determine whether that’s the correct fix to push upstream.

Comment 1 Dan Smith 2017-06-19 17:58:52 UTC

This is on its way back to upstream Mitaka:

https://review.openstack.org/#/q/I25ba6f7f4e4fab6db223368427d889d6b06a77e8,n,z

Comment 3 Lee Yarwood 2019-08-22 07:49:26 UTC

Closing as CURRENTRELEASE as this has been fixed in OSP 10 via https://review.opendev.org/#/q/I25ba6f7f4e4fab6db223368427d889d6b06a77e8,n,z and OSP 9 is now EOL.
