Description of problem:

The resource_tracker does not cope well with VMs booted from volumes hosted on an NFS backend. This is easy to reproduce: the resource_tracker never checks where an instance's root_gb and ephemeral_gb actually live, and commenting out the following two lines in resource_tracker.py makes nova-api report the proper values:

        self.compute_node.memory_mb_used += sign * mem_usage
#       self.compute_node.local_gb_used += sign * usage.get('root_gb', 0)
#       self.compute_node.local_gb_used += sign * usage.get('ephemeral_gb', 0)

This hack is ugly, because any instance NOT using boot-from-volume could then fill up $instances_path unchecked, if that path is different from $nfs_mount_point_base:

# Top-level directory for maintaining nova's state (string value)
#state_path=/var/lib/nova
state_path=/var/lib/nova

# Where instances are stored on disk (string value)
#instances_path=$state_path/instances

# Directory where the NFS volume is mounted on the compute node (string value)
#nfs_mount_point_base=$state_path/mnt

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Deploy an overcloud with a valid storage-environment.yaml using the NFS backend
2. Boot volume-backed VMs on the overcloud until / is full, even though the NFS share still has lots of free gigs
3. The scheduler will refuse to spawn new VMs

Actual results:
Fails

Expected results:
This shouldn't fail

Additional info:
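The accounting problem above can be sketched in a few lines of Python. This is a deliberately simplified model, not the actual Nova code: the function name and the is_volume_backed/account_volume_backed parameters are invented for illustration. It shows why a volume-backed instance wrongly consumes local_gb, and why the commented-out hack is unsafe for image-backed instances.

```python
# Simplified model (NOT actual Nova code) of how resource_tracker.py
# charges disk usage: root_gb/ephemeral_gb are always counted against
# the hypervisor's local disk, even when the instance's root disk
# actually lives on a Cinder volume on the NFS backend.

def update_local_gb_used(local_gb_used, usage, sign=1,
                         is_volume_backed=False,
                         account_volume_backed=True):
    """Return the new local_gb_used after adding (sign=1) or
    removing (sign=-1) an instance.

    account_volume_backed=True mirrors the stock behaviour described
    in this report; False mirrors the commented-out hack, which skips
    disk accounting entirely.
    """
    if account_volume_backed or not is_volume_backed:
        local_gb_used += sign * usage.get('root_gb', 0)
        local_gb_used += sign * usage.get('ephemeral_gb', 0)
    return local_gb_used


# A 40 GB boot-from-volume instance:
usage = {'root_gb': 40, 'ephemeral_gb': 0}

# Stock behaviour: 40 GB is charged to local disk the instance
# never touches, so the DiskFilter eventually starves the host.
print(update_local_gb_used(0, usage, is_volume_backed=True))

# With the hack: nothing is charged -- but image-backed instances
# would also go unaccounted, which is why the hack is unsafe when
# instances_path differs from nfs_mount_point_base.
print(update_local_gb_used(0, usage, is_volume_backed=True,
                           account_volume_backed=False))
```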
Ugly patch to fix this:

--- resource_tracker.py.orig	2016-10-20 21:24:29.892850673 +0000
+++ resource_tracker.py	2016-10-20 21:47:35.282942549 +0000
@@ -714,8 +714,8 @@
             mem_usage += overhead['memory_mb']
 
         self.compute_node.memory_mb_used += sign * mem_usage
-        self.compute_node.local_gb_used += sign * usage.get('root_gb', 0)
-        self.compute_node.local_gb_used += sign * usage.get('ephemeral_gb', 0)
+#       self.compute_node.local_gb_used += sign * usage.get('root_gb', 0)
+#       self.compute_node.local_gb_used += sign * usage.get('ephemeral_gb', 0)
         self.compute_node.vcpus_used += sign * usage.get('vcpus', 0)
 
         # free ram and disk may be negative, depending on policy:
That's unfortunately a very well-known problem that stems from an early design decision: Nova has counted resources per compute node since the very beginning. When we accepted (for good reasons) shared storage for booting instances, accounting for that storage per compute node became unreasonable. Fixing this is an upstream Nova priority, and we will hopefully see a solution implemented between Ocata and Pike (OSP11 and OSP12), but it involves a lot of design changes and a new REST API called the Placement API. Consequently, backporting any of that to OSP9 is hardly feasible.

That said, there are a couple of known workarounds that help reduce the problem:
- operators can create dedicated flavors for boot-from-volume instances with a root and ephemeral size of 0
- or, if you only support BFV instances, just disable the DiskFilter

I know this is not a perfect solution and that the real resolution is somewhat mid-term, but the above are the current workarounds for most operators.

An upstream bug partially describes the problem:
https://bugs.launchpad.net/nova/+bug/1469179
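The two workarounds above can be sketched as follows. This is a hedged example, not a tested recipe: the flavor name and sizes are hypothetical, and the filter list shown is only an illustration of "the default list minus DiskFilter" -- check your deployment's actual scheduler_default_filters before editing it.

```
# Workaround 1: a dedicated flavor for boot-from-volume instances.
# --disk 0 and --ephemeral 0 mean no local disk is charged against
# the compute node. (Flavor name and RAM/vCPU sizes are examples.)
openstack flavor create --ram 2048 --vcpus 2 --disk 0 --ephemeral 0 m1.bfv

# Workaround 2: if ALL instances are volume-backed, remove DiskFilter
# from the scheduler filters in nova.conf on the controller(s), e.g.:
#
#   scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,
#       RamFilter,ComputeFilter,ComputeCapabilitiesFilter,
#       ImagePropertiesFilter,ServerGroupAntiAffinityFilter,
#       ServerGroupAffinityFilter
#
# then restart the nova-scheduler service.
```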
WONTFIX/NOTABUG, therefore QE won't automate.
We've revisited this, and the fundamental issue here is that cinder and nova are sharing a storage pool. Unfortunately this isn't something we can support. Cinder and nova must use separate storage. If they're both using NFS on the same array, they need to use separate exports from separate filesystems.
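A minimal sketch of the supported layout, with hypothetical export names and array hostname: Cinder and Nova each mount a different export backed by a different filesystem on the array, so neither service's disk accounting sees the other's usage.

```
# /etc/cinder/nfs_shares -- Cinder volumes live on their own export
nas.example.com:/export/cinder_volumes

# nova.conf on the compute nodes -- instance disks live on a separate
# export from a separate filesystem, mounted at instances_path, e.g.:
#   nas.example.com:/export/nova_instances  ->  /var/lib/nova/instances
instances_path = /var/lib/nova/instances
```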