Bug 1461889 - Nova compute resource_tracker incorrectly sets overcommitted nodes free_mem to 0 affecting nova-scheduler's ram_filter
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-15 14:18 UTC by jliberma@redhat.com
Modified: 2023-03-21 18:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-22 07:49:26 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1698383 | 0 | None | None | None | 2017-06-16 13:53:16 UTC
OpenStack gerrit | 474994 | 0 | None | MERGED | Fix regression preventing reporting negative resources for overcommit | 2020-11-10 10:47:39 UTC
Red Hat Issue Tracker | OSP-4644 | 0 | None | None | None | 2022-08-16 12:56:20 UTC

Description jliberma@redhat.com 2017-06-15 14:18:35 UTC
In the following commit:

https://github.com/openstack/nova/commit/016b810f675b20e8ce78f4c82dc9c679c0162b7a

The following lines were added to nova/compute/resource_tracker.py:

+        self.compute_node.free_ram_mb = max(0, self.compute_node.free_ram_mb)
+        self.compute_node.free_disk_gb = max(0, self.compute_node.free_disk_gb)

 
As a result of this change, if self.compute_node.free_ram_mb or self.compute_node.free_disk_gb is negative, it is reset to 0. However, a comment earlier in the same file states that negative values are expected:

https://github.com/openstack/nova/blob/016b810f675b20e8ce78f4c82dc9c679c0162b7a/nova/compute/resource_tracker.py

# free ram and disk may be negative, depending on policy:

The value of free_ram_mb or free_disk_gb being negative is valid when a compute node is overcommitted, and the negative value is used in the computation of used memory for the purpose of resource allocation in nova-scheduler’s ram_filter:

https://github.com/openstack/nova/blob/master/nova/scheduler/filters/ram_filter.py

        memory_mb_limit = total_usable_ram_mb * ram_allocation_ratio
        used_ram_mb = total_usable_ram_mb - free_ram_mb
        usable_ram = memory_mb_limit - used_ram_mb
        if not usable_ram >= requested_ram:
            LOG.debug("%(host_state)s does not have %(requested_ram)s MB "
                    "usable ram, it only has %(usable_ram)s MB usable ram.",
                    {'host_state': host_state,
                     'requested_ram': requested_ram,
                     'usable_ram': usable_ram})
            return False

 

Artificially resetting the negative number to 0 makes used_ram_mb appear smaller than it really is, so the scheduler believes the node is valid for scheduling. When the instance then attempts to launch on the compute node, the compute node correctly identifies that the instance would push the node above the overcommit ratio and refuses to launch it.
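To make the arithmetic concrete, here is a minimal standalone sketch that repeats the ram_filter calculation quoted above with and without the max(0, ...) clamp. The node size, allocation ratio, allocated memory, and request size are hypothetical numbers chosen for illustration, and the helper function names are not part of nova.

```python
def usable_ram_mb(total_usable_ram_mb, free_ram_mb, ram_allocation_ratio):
    """Repeat the arithmetic from the ram_filter excerpt quoted above."""
    memory_mb_limit = total_usable_ram_mb * ram_allocation_ratio
    used_ram_mb = total_usable_ram_mb - free_ram_mb
    return memory_mb_limit - used_ram_mb


def passes_ram_filter(total_usable_ram_mb, free_ram_mb,
                      ram_allocation_ratio, requested_ram):
    """Return True when the host would be accepted by the filter check."""
    usable = usable_ram_mb(total_usable_ram_mb, free_ram_mb,
                           ram_allocation_ratio)
    return usable >= requested_ram


# Hypothetical overcommitted node (illustrative values only).
TOTAL_MB = 32768        # physical RAM reported by the node
RATIO = 1.5             # ram_allocation_ratio
ALLOCATED_MB = 45056    # RAM already promised to instances
REQUESTED_MB = 8192     # RAM requested by the new instance

true_free_mb = TOTAL_MB - ALLOCATED_MB    # negative on an overcommitted node
clamped_free_mb = max(0, true_free_mb)    # what the added lines report

# With the real (negative) value the host is rejected; with the clamped
# value the same host is accepted even though it cannot fit the request.
accepted_with_true_value = passes_ram_filter(
    TOTAL_MB, true_free_mb, RATIO, REQUESTED_MB)
accepted_with_clamped_value = passes_ram_filter(
    TOTAL_MB, clamped_free_mb, RATIO, REQUESTED_MB)

print("free_ram_mb (true):   ", true_free_mb, "->", accepted_with_true_value)
print("free_ram_mb (clamped):", clamped_free_mb, "->", accepted_with_clamped_value)
```

In this example the limit is 49152 MB and 45056 MB is already allocated, so only 4096 MB remains and an 8192 MB request should be filtered out. Clamping free_ram_mb to 0 makes the filter compute 16384 MB of usable RAM, so the host passes and the failure is deferred to the compute node.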


Looking at the original commit that introduced these lines, it appears that the actual problem being addressed was fixed in nova/cells/state.py and was mostly related to ironic nodes, where overcommit does not come into play.

From the commit notes and comments alone, I cannot tell why these lines were added to resource_tracker.py other than for cosmetic reasons to clean up unexpected negative numbers.

In our case, our “fix” has been to comment out the two lines in resource_tracker.py, but somebody more familiar with the original patch should probably determine whether that is the correct fix to push upstream.

Comment 1 Dan Smith 2017-06-19 17:58:52 UTC
This is on its way back to upstream Mitaka:

https://review.openstack.org/#/q/I25ba6f7f4e4fab6db223368427d889d6b06a77e8,n,z

Comment 3 Lee Yarwood 2019-08-22 07:49:26 UTC
Closing as CURRENTRELEASE as this has been fixed in OSP 10 via https://review.opendev.org/#/q/I25ba6f7f4e4fab6db223368427d889d6b06a77e8,n,z and OSP 9 is now EOL.

