Description of problem:

Uploading images to glance that are larger than ~20GB in size and then booting them causes the following error:

2463:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
2464:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part
2465:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Exit code: -9
2466:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stdout: u''
2467:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stderr: u''
2468:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043]
2474:2016-12-07 12:51:56.005 28644 ERROR nova.compute.manager [req-77fdc7e1-ee3a-4d4f-9e18-b5ffc0432570 7e96b4757412457a8d2e30118109bfe0 425cbc5131c94e0a98893150bcc44fde - - -] [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Build of instance bfc6b84f-ba07-4dbb-963d-ec6243c91043 aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
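For context on the failing command: nova runs `qemu-img info` through the oslo_concurrency.prlimit wrapper, which applies resource limits (here --as=1073741824, i.e. a 1 GiB address-space cap, and --cpu=2 seconds of CPU time) in the child before exec'ing qemu-img. A minimal sketch of how such a limit makes a subprocess fail; the helper name `set_limits` and the 2 GiB allocation are illustrative, not nova's actual code:

```python
import resource
import subprocess
import sys

ONE_GIB = 1 * 1024 ** 3

def set_limits():
    # Roughly what "python -m oslo_concurrency.prlimit --as=1073741824 -- <cmd>"
    # does in the child before exec: cap the virtual address space at 1 GiB.
    resource.setrlimit(resource.RLIMIT_AS, (ONE_GIB, ONE_GIB))

# The child tries to allocate 2 GiB; under a 1 GiB RLIMIT_AS the allocation
# fails with MemoryError, so the command exits non-zero instead of succeeding.
proc = subprocess.run(
    [sys.executable, "-c", "bytearray(2 * 1024 ** 3)"],
    preexec_fn=set_limits,   # POSIX only
    capture_output=True,
)
print(proc.returncode)  # non-zero
```

A qemu-img process that needs more memory or CPU than the limits allow fails the same way, with no useful stdout/stderr, which matches the empty Stdout/Stderr in the traceback above.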
Version-Release number of selected component (if applicable):

openstack-nova-api-13.1.1-2.el7ost.noarch
openstack-nova-cert-13.1.1-2.el7ost.noarch
openstack-nova-common-13.1.1-2.el7ost.noarch
openstack-nova-compute-13.1.1-2.el7ost.noarch
openstack-nova-conductor-13.1.1-2.el7ost.noarch
openstack-nova-console-13.1.1-2.el7ost.noarch
openstack-nova-novncproxy-13.1.1-2.el7ost.noarch
openstack-nova-scheduler-13.1.1-2.el7ost.noarch
python-nova-13.1.1-2.el7ost.noarch
python-novaclient-3.3.1-1.el7ost.noarch

How reproducible:

1. Upload an image larger than 20GB on NFS.
2. Boot the instance.

We think this is related to https://bugs.launchpad.net/nova/+bug/1646181 upstream.
Just to add, we can successfully download the image manually and run qemu-img info on it:

[heat-admin@cv01 tmp(overcloudrc)]$ qemu-img info testimage.qcow2
image: testimage.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 31G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
Running a test with changed QEMU_IMG_LIMITS as mentioned in the upstream bug did not solve the issue.

~~~
$ diff -u /usr/lib/python2.7/site-packages/nova/virt/images.py.org /usr/lib/python2.7/site-packages/nova/virt/images.py
--- /usr/lib/python2.7/site-packages/nova/virt/images.py.org	2016-12-08 08:19:06.795403823 +0000
+++ /usr/lib/python2.7/site-packages/nova/virt/images.py	2016-12-08 08:24:53.570749335 +0000
@@ -40,7 +40,7 @@

 QEMU_IMG_LIMITS = processutils.ProcessLimits(
     cpu_time=2,
-    address_space=1 * units.Gi)
+    address_space=1 * units.Gi * 10)

 def qemu_img_info(path, format=None):
~~~
After a conversation with upstream QEMU folks (Dan Berrange, StefanH, et al.), two things to try:

(1) Can you try increasing the 'cpu_time' limit as well? Perhaps to 6 or 8, or more depending on the environment.

(2) Can you try removing the 'prlimit' argument from the utils.execute call, and see if that fixes the issue?

~~~
[...]
-        out, err = utils.execute(*cmd, prlimit=QEMU_IMG_LIMITS)
+        out, err = utils.execute(*cmd)
[...]
~~~
As mentioned before in comment 3, multiplying the address_space limit by 10 alone did not make a difference. However, after additionally raising cpu_time, the image is fully converted and the machine starts up:

QEMU_IMG_LIMITS = processutils.ProcessLimits(
    cpu_time=8,
    address_space=1 * units.Gi * 10)
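This is consistent with the "Exit code: -9" in the original traceback: RLIMIT_CPU terminates a process by signal once it exhausts its CPU-time allowance (SIGXCPU at the soft limit, SIGKILL at the hard limit), so a qemu-img that needs more than cpu_time seconds to scan a very large qcow2 is killed outright, yielding a negative exit code and empty stdout/stderr. A small sketch demonstrating the effect; the busy-loop child is a stand-in for qemu-img, not part of nova:

```python
import resource
import subprocess
import sys

def set_cpu_limit():
    # Like "prlimit --cpu=2": soft and hard CPU-time limit of 2 seconds,
    # applied in the child before exec.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))

# A CPU-bound child exceeds the limit and is terminated by a signal,
# so subprocess reports a negative return code (e.g. -9 for SIGKILL).
proc = subprocess.run(
    [sys.executable, "-c", "while True: pass"],
    preexec_fn=set_cpu_limit,   # POSIX only
    capture_output=True,
)
print(proc.returncode)  # negative: killed by signal
```

Raising cpu_time simply gives qemu-img enough CPU budget to finish scanning the image before the limit fires.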
Proposed and abandoned stable/mitaka backport patch: https://review.openstack.org/#/c/409775/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0467.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days