Description of problem:
With 'preallocate_images = space' in nova.conf, the disk_over_committed calculation is wrong, because the size of the ephemeral disk ends up being deducted twice when checking the available resources. As a result, available_least, which is used by the disk filter, shows no available disk even when there is free space.

For example, creating a single instance with a 20GB ephemeral disk on a compute node with ~33GB available:

* prealloc=space
2018-03-08 11:46:54.444 12793 DEBUG nova.virt.libvirt.driver [req-e41619b7-f10d-407f-bc71-95d7e4fe610f - - - - -] mschuppert - disk_free_gb 13 get_available_resource /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5503
2018-03-08 11:46:54.445 12793 DEBUG nova.virt.libvirt.driver [req-e41619b7-f10d-407f-bc71-95d7e4fe610f - - - - -] mschuppert - _get_disk_over_committed_size_total _get_disk_over_committed_size_total /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7058
2018-03-08 11:46:54.449 12793 DEBUG nova.virt.libvirt.driver [req-e41619b7-f10d-407f-bc71-95d7e4fe610f - - - - -] mschuppert instance_domains [<libvirt.virDomain object at 0x690aa50>] _get_disk_over_committed_size_total /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7063
...
2018-03-08 11:46:54.674 12793 DEBUG nova.virt.libvirt.driver [req-e41619b7-f10d-407f-bc71-95d7e4fe610f - - - - -] mschuppert - DISK info 21474573824 _get_disk_over_committed_size_total /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7111
2018-03-08 11:46:54.674 12793 DEBUG nova.virt.libvirt.driver [req-e41619b7-f10d-407f-bc71-95d7e4fe610f - - - - -] mschuppert - disk_over_committed out 21474573824 get_available_resource /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5505
2018-03-08 11:46:54.675 12793 DEBUG nova.virt.libvirt.driver [req-e41619b7-f10d-407f-bc71-95d7e4fe610f - - - - -] mschuppert - available_least -7515930112 get_available_resource /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5507

* prealloc=none
2018-03-08 11:48:35.348 13987 DEBUG nova.virt.libvirt.driver [req-fff3e2a9-c70b-4e62-8f33-c761f3337930 - - - - -] mschuppert - disk_free_gb 33 get_available_resource /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5503
2018-03-08 11:48:35.349 13987 DEBUG nova.virt.libvirt.driver [req-fff3e2a9-c70b-4e62-8f33-c761f3337930 - - - - -] mschuppert - _get_disk_over_committed_size_total _get_disk_over_committed_size_total /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7058
2018-03-08 11:48:35.350 13987 DEBUG nova.virt.libvirt.driver [req-fff3e2a9-c70b-4e62-8f33-c761f3337930 - - - - -] mschuppert instance_domains [<libvirt.virDomain object at 0x8030450>] _get_disk_over_committed_size_total /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7063
...
2018-03-08 11:48:35.581 13987 DEBUG nova.virt.libvirt.driver [req-fff3e2a9-c70b-4e62-8f33-c761f3337930 - - - - -] mschuppert - DISK info 21474573824 _get_disk_over_committed_size_total /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7111
2018-03-08 11:48:35.582 13987 DEBUG nova.virt.libvirt.driver [req-fff3e2a9-c70b-4e62-8f33-c761f3337930 - - - - -] mschuppert - disk_over_committed out 21474573824 get_available_resource /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5505
2018-03-08 11:48:35.582 13987 DEBUG nova.virt.libvirt.driver [req-fff3e2a9-c70b-4e62-8f33-c761f3337930 - - - - -] mschuppert - available_least 13958906368 get_available_resource /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5507

Version-Release number of selected component (if applicable):
OSP8, but the code is the same in latest master at: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7856

How reproducible:
always

Steps to Reproduce:
1. set `preallocate_images = space` on the compute node
2. check the disk_available_least stats
3. create an instance
4. verify disk_available_least again

Actual results:
disk_available_least is not reduced by just the size of the preallocated ephemeral disk; it is reduced by twice that size.

Expected results:
disk_available_least is reduced only by the full size of the ephemeral disk.

Additional info:

5463     def get_available_resource(self, nodename):
5464         """Retrieve resource information.
5465
5466         This method is called when nova-compute launches, and
5467         as part of a periodic task that records the results in the DB.
5468
5469         :param nodename: unused in this driver
5470         :returns: dictionary containing resource info
5471         """
5472
5473         disk_info_dict = self._get_local_gb_info()  <----- here we get the local available disk space AFTER the qcow2 file was created, i.e. "free before instance start - full image size"; in my example 33 - 20 = 13
5474         data = {}
...
5502         disk_free_gb = disk_info_dict['free']  <---- this is our remaining 13GB free
5503         disk_over_committed = self._get_disk_over_committed_size_total()  <---- returns 20GB even though the over-commit should be 0 for a preallocated image, because disk_over_committed should be the sum of the not-yet-allocated space of all configured/requested instance disks
5504         available_least = disk_free_gb * units.Gi - disk_over_committed  <---- because of this bug, disk_free_gb is reduced here a second time, by the amount of disk_over_committed
5505         data['disk_available_least'] = available_least / units.Gi

Here is the issue - virt/libvirt/driver.py in _get_instance_disk_info:

6935     def _get_instance_disk_info(self, instance_name, xml,
6936                                 block_device_info=None):
...
7008                 else:
7009                     dk_size = int(os.path.getsize(path))  <----- 'preallocate_images = space' uses fallocate, and that allocation is not reflected by getsize(). We do not use the actual size reported by qemu-img info; we check the file size with getsize(), which does not report 20GB
7011             elif disk_type == 'block' and block_device_info:
7012                 dk_size = lvm.get_volume_size(path)
7013             else:
7014                 LOG.debug('skipping disk %(path)s (%(target)s) - unable to '
7015                           'determine if volume',
7016                           {'path': path, 'target': target})
7017                 continue
7018
7019             disk_type = driver_nodes[cnt].get('type')
7020
7021             if disk_type in ("qcow2", "ploop"):
7022                 backing_file = libvirt_utils.get_disk_backing_file(path)
7023                 virt_size = disk_api.get_disk_size(path)
7025                 over_commit_size = int(virt_size) - dk_size  <----- because of the above, over_commit_size is still 20GB even though the file is preallocated

For reference, get_disk_size, which provides the virt_size of the qcow2 file, from /usr/lib/python2.7/site-packages/nova/virt/disk/api.py:

141 def get_disk_size(path):
142     """Get the (virtual) size of a disk image
143
144     :param path: Path to the disk image
145     :returns: Size (in bytes) of the given disk image as it would be seen
146               by a virtual machine.
147     """
148     return images.qemu_img_info(path).virtual_size

* /usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py

235         if size:
236             # create_image() only creates the base image if needed, so
237             # we cannot rely on it to exist here
238             if os.path.exists(base) and size > self.get_disk_size(base):
239                 self.resize_image(size)
240
241             if (self.preallocate and self._can_fallocate() and
242                     os.access(self.path, os.W_OK)):
243                 utils.execute('fallocate', '-n', '-l', size, self.path)  <----- here we preallocate the space using fallocate

E.g. a 1MB file:

# fallocate -n -l 1000000 /tmp/test
# ll /tmp/test
-rw-r--r--. 1 root root 0 Mar 9 11:06 /tmp/test
# du -mshc /tmp/test
980K /tmp/test
# python
Python 2.7.5 (default, Aug 2 2016, 04:20:16)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.getsize("/tmp/test")
0

It works if we introduce, in virt/disk/api.py:

151 def get_phy_disk_size(path):
152     """Get the actual (allocated) size of a disk image
153
154     :param path: Path to the disk image
155     :returns: Size (in bytes) of the given disk image as allocated on
156               the host filesystem.
157     """
158     return images.qemu_img_info(path).disk_size

and change virt/libvirt/driver.py to get dk_size using get_phy_disk_size instead of int(os.path.getsize(path)):

6935     def _get_instance_disk_info(self, instance_name, xml,
6936                                 block_device_info=None):
...
7008                 else:
7009                     #dk_size = int(os.path.getsize(path))
7009                     dk_size = disk_api.get_phy_disk_size(path)
...
7021             if disk_type in ("qcow2", "ploop"):
7022                 backing_file = libvirt_utils.get_disk_backing_file(path)
7023                 virt_size = disk_api.get_disk_size(path)
7024                 over_commit_size = int(virt_size) - dk_size
7025             else:
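To make the double deduction concrete, the arithmetic of lines 5504 and 5505 can be replayed with the values from the two log runs above. This is a plain Python 3 sketch, not driver code: `GiB` is assumed to equal `units.Gi` (1024**3), and `//` stands in for the Python 2.7 integer division used at line 5505.

```python
# Assumption: matches units.Gi used by the driver.
GiB = 1024 ** 3


def available_least(disk_free_gb, disk_over_committed):
    """Replays line 5504: free space in bytes minus the reported over-commit."""
    return disk_free_gb * GiB - disk_over_committed


# disk_over_committed was 21474573824 in both runs ("disk_over_committed out").
OVER_COMMITTED = 21474573824

prealloc_space = available_least(13, OVER_COMMITTED)  # preallocate_images = space
prealloc_none = available_least(33, OVER_COMMITTED)   # preallocate_images = none

# Line 5505 divides by units.Gi; Python 2.7 integer division floors, like // here.
print(prealloc_space, prealloc_space // GiB)  # -7515930112 -7
print(prealloc_none, prealloc_none // GiB)    # 13958906368 13
```

The two results differ by exactly 20 GiB (21474836480 bytes): with preallocation the ephemeral disk is already subtracted from disk_free_gb (33 -> 13), and the unchanged over-commit subtracts it again. If the preallocated disk contributed no over-commit, available_least(13, 0) // GiB would give 13, the same as the prealloc=none case.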
The fallocate behaviour you report is unexpected, and for reference I don't see it on RHEL 7.5:

[mbooth@yellow ~]$ fallocate -l 1G test
[mbooth@yellow ~]$ ls -lh test
-rw-r--r--. 1 mbooth mbooth 1.0G Mar 16 13:06 test
[mbooth@yellow ~]$ du -sh test
1.0G test
[mbooth@yellow ~]$ python
Python 2.7.5 (default, Feb 20 2018, 09:19:12)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.getsize('test')
1073741824
>>>

I tested the same on a tmpfs just in case, and the behaviour was the same.
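The same check can be scripted. The sketch below is a rough Python 3 equivalent of the transcript, assuming Linux: `os.posix_fallocate` goes through glibc, which issues `fallocate(2)` with mode 0 (the mode `fallocate -l` uses without `-n`), and the helper then compares the apparent size from `os.path.getsize` with the allocated size derived from `st_blocks`. The helper name, path and length are illustrative only.

```python
import os
import tempfile


def allocate_and_measure(path, length):
    """Preallocate `length` bytes at offset 0 and return (apparent, allocated) sizes."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Mode-0 allocation: the file's apparent size is extended to `length`.
        os.posix_fallocate(fd, 0, length)
    finally:
        os.close(fd)
    apparent = os.path.getsize(path)           # what dk_size reads at line 7009
    allocated = os.stat(path).st_blocks * 512  # st_blocks counts 512-byte units on Linux
    return apparent, allocated


with tempfile.TemporaryDirectory() as tmp:
    apparent, allocated = allocate_and_measure(os.path.join(tmp, "test"), 1 << 20)
    print(apparent, allocated)
```

Without `-n`, getsize() and the allocated size agree, which is why the plain `fallocate -l 1G` run above looks correct.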
(In reply to Matthew Booth from comment #2)
> The fallocate behaviour you report is unexpected, and for reference I don't
> see it on RHEL 7.5:

nova uses the -n option for fallocate:

243 utils.execute('fallocate', '-n', '-l', size, self.path)

Without using -n the allocation is correct, but with it we get the described behavior.
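The difference comes from `-n`/`--keep-size`, which makes `fallocate` pass `FALLOC_FL_KEEP_SIZE` to `fallocate(2)`: blocks are reserved, but the apparent file size is left alone. Python's `os` module has no wrapper that takes this flag, so this sketch calls glibc's `fallocate` through `ctypes`. It assumes 64-bit Linux (so `off_t` is 64 bits) and returns `None` where the filesystem does not support the call; the helper name is hypothetical.

```python
import ctypes
import ctypes.util
import errno
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01  # from <linux/falloc.h>; what `fallocate -n` requests


def keep_size_allocate(path, length):
    """Reserve `length` bytes without growing the file, like `fallocate -n -l`.

    Returns (apparent, allocated) sizes in bytes, or None if unsupported here.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fallocate = libc.fallocate
    # Assumes 64-bit Linux, where off_t is a 64-bit integer.
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, length) != 0:
            err = ctypes.get_errno()
            if err in (errno.EOPNOTSUPP, errno.ENOSYS):
                return None  # filesystem or kernel cannot do keep-size allocation
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    result = keep_size_allocate(os.path.join(tmp, "test"), 1 << 20)
    print(result)
```

On a filesystem that supports it, the apparent size stays 0 while roughly 1 MiB is allocated, the same pattern as the `ll`/`du` output in the description.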
(In reply to Martin Schuppert from comment #3)
> nova uses the -n option for fallocate
> 243 utils.execute('fallocate', '-n', '-l', size, self.path)
>
> Without using -n the allocation is correct, but with we have the described
> behavior

Ah, ha! Then that's the bug. With fallocate -n we're not actually doing the fallocate at all on a qcow2 because it's not a sparse file.
(In reply to Matthew Booth from comment #4)
> Ah, ha! Then that's the bug. With fallocate -n we're not actually doing the
> fallocate at all on a qcow2 because it's not a sparse file.

For newly created instances, a change to the fallocate call could solve the issue, but what about existing instances? As we already use the information from qemu-img info for the virt_size, my idea was that we could use the same source for the actual size instead of checking the file via getsize.
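The suggestion can be sketched as reading both sizes from one `qemu-img info` call. The example below parses `qemu-img info --output=json`, whose `virtual-size` and `actual-size` fields hold the guest-visible size and the bytes allocated on the host. The helper name and the sample JSON are made up for illustration (the sample assumes a fully preallocated 20 GiB disk); in nova the corresponding values come from `images.qemu_img_info(path)` as `virtual_size` and `disk_size`.

```python
import json

# Hypothetical `qemu-img info --output=json` output for a fully preallocated
# 20 GiB qcow2 disk; real output contains more fields.
SAMPLE = """{
    "virtual-size": 21474836480,
    "filename": "disk",
    "format": "qcow2",
    "actual-size": 21474836480,
    "dirty-flag": false
}"""


def sizes_from_qemu_img_json(output):
    """Return (virtual size, allocated size) in bytes from qemu-img JSON output."""
    info = json.loads(output)
    return info["virtual-size"], info["actual-size"]


virt_size, actual_size = sizes_from_qemu_img_json(SAMPLE)
# Over-commit computed from the allocated size rather than os.path.getsize().
print(virt_size - actual_size)  # 0
```

Because the allocated size is read from the image rather than from the apparent file length, this approach would also report existing disks created with `fallocate -n` correctly.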
(In reply to Matthew Booth from comment #4)
> (In reply to Martin Schuppert from comment #3)
> > Without using -n the allocation is correct, but with we have the described
> > behavior
>
> Ah, ha! Then that's the bug. With fallocate -n we're not actually doing the
> fallocate at all on a qcow2 because it's not a sparse file.

That's not correct; I misremembered the behaviour of fallocate -n. The bug is that with fallocate -n we do allocate the requested size, which reduces free disk space by the requisite amount, but we do *not* increase the file size. That is, a 20G file reports its size as 256k.

I believe we should ditch the -n option to fallocate in imagebackend.
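The "256k" observation is consistent with the logged over-commit. Assuming the 20GB ephemeral disk has a virtual size of exactly 20 GiB (an assumption; the virtual size itself is not logged), the over_commit_size formula from line 7025 implies that os.path.getsize() returned about 256 KiB. A small Python 3 sketch of that back-calculation:

```python
GiB = 1024 ** 3

VIRT_SIZE = 20 * GiB              # assumption: virtual size of the ephemeral disk
LOGGED_OVER_COMMIT = 21474573824  # "DISK info" value from the logs


def over_commit_size(virt_size, dk_size):
    """Mirrors line 7025: virtual size minus the size measured on disk."""
    return int(virt_size) - dk_size


# Solve the formula for dk_size, i.e. what os.path.getsize(path) must have returned.
implied_dk_size = VIRT_SIZE - LOGGED_OVER_COMMIT
print(implied_dk_size, implied_dk_size / 1024)  # 262656 256.5

# If dk_size reflected the preallocated blocks instead (assumed equal to VIRT_SIZE),
# the disk would contribute no over-commit.
print(over_commit_size(VIRT_SIZE, VIRT_SIZE))  # 0
```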
(In reply to Martin Schuppert from comment #5)
> For new created instances a change to the fallocate call could solve the
> issue,
> but what about existing instances? As we already use the information from
> qemu-img
> info for the virt-size my idea was that we could use the same source for the
> actual size instead of checking the file via getsize.

We could potentially do that, too, which as you say would avoid having to remediate existing disks. I'll look into both.
According to our records, this should be resolved by openstack-nova-12.0.6-28.el7ost. This build is available now.