Bug 1402594

Summary: Problem booting instances from images larger than 20GB with NFS backend.
Product: Red Hat OpenStack
Reporter: Robin Cernin <rcernin>
Component: openstack-nova
Assignee: Kashyap Chamarthy <kchamart>
Status: CLOSED ERRATA
QA Contact: Prasanth Anbalagan <panbalag>
Severity: high
Priority: high
Version: 9.0 (Mitaka)
CC: awaugama, berrange, dasmith, eglynn, jschluet, jthomas, kchamart, mlopes, mschuppe, pablo.iranzo, panbalag, sbauza, sclewis, sferdjao, sgordon, srevivo, vromanso
Target Milestone: async
Keywords: Triaged, ZStream
Target Release: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-nova-13.1.2-10.el7ost
Last Closed: 2017-03-08 17:45:26 UTC
Type: Bug
Bug Blocks: 1404651    

Description Robin Cernin 2016-12-07 22:34:27 UTC
Description of problem:

Uploading images larger than ~20GB to glance and then booting instances from them (with the Nova instance store on an NFS backend) fails with the following error:

2463:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
2464:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part
2465:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Exit code: -9
2466:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stdout: u''
2467:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stderr: u''
2468:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] 
2474:2016-12-07 12:51:56.005 28644 ERROR nova.compute.manager [req-77fdc7e1-ee3a-4d4f-9e18-b5ffc0432570 7e96b4757412457a8d2e30118109bfe0 425cbc5131c94e0a98893150bcc44fde - - -] [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Build of instance bfc6b84f-ba07-4dbb-963d-ec6243c91043 aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
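
The failing command in the traceback runs qemu-img info under oslo.concurrency's prlimit wrapper, which applies CPU-time and address-space limits (via setrlimit) to the child before exec'ing it; exit code -9 means qemu-img was killed with SIGKILL rather than failing on its own. Below is a minimal sketch of that invocation, assuming the limits shown in the command line above (the image path is a placeholder):

~~~
# Sketch of how nova's qemu_img_info() runs qemu-img under resource limits;
# the image path below is a placeholder.
from oslo_concurrency import processutils
from oslo_utils import units

QEMU_IMG_LIMITS = processutils.ProcessLimits(
    cpu_time=2,                     # --cpu=2 in the traceback (seconds)
    address_space=1 * units.Gi)     # --as=1073741824 in the traceback (1 GiB)

# processutils.execute() re-runs the command through the
# oslo_concurrency.prlimit wrapper, which calls setrlimit() before exec.
# If qemu-img exceeds the CPU-time limit while probing a large image, the
# kernel kills it with SIGKILL, which shows up as "Exit code: -9" with
# empty stdout/stderr, exactly as in the log above.
out, err = processutils.execute(
    'env', 'LC_ALL=C', 'LANG=C',
    'qemu-img', 'info', '/var/lib/nova/instances/_base/<checksum>.part',
    prlimit=QEMU_IMG_LIMITS)
~~~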

Version-Release number of selected component (if applicable):

openstack-nova-api-13.1.1-2.el7ost.noarch
openstack-nova-cert-13.1.1-2.el7ost.noarch
openstack-nova-common-13.1.1-2.el7ost.noarch
openstack-nova-compute-13.1.1-2.el7ost.noarch
openstack-nova-conductor-13.1.1-2.el7ost.noarch
openstack-nova-console-13.1.1-2.el7ost.noarch
openstack-nova-novncproxy-13.1.1-2.el7ost.noarch
openstack-nova-scheduler-13.1.1-2.el7ost.noarch
python-nova-13.1.1-2.el7ost.noarch
python-novaclient-3.3.1-1.el7ost.noarch


How reproducible:

1. Upload an image larger than 20GB (Nova instance store on an NFS backend).
2. Boot an instance from the image.

We think this is related to the upstream bug https://bugs.launchpad.net/nova/+bug/1646181.
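
For completeness, a rough reproduction sketch of the two steps above using python-glanceclient and python-novaclient (equivalent to the glance/nova CLI); the credentials, endpoint, image file, flavor, and names are placeholders for the affected environment:

~~~
from keystoneauth1 import loading, session
from glanceclient import Client as GlanceClient
from novaclient import client as nova_client

# Placeholder credentials -- substitute values for the affected environment.
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://controller:5000/v2.0',
                                username='admin', password='secret',
                                project_name='admin')
sess = session.Session(auth=auth)

glance = GlanceClient('2', session=sess)
nova = nova_client.Client('2', session=sess)

# 1. Upload a qcow2 image larger than ~20GB.
image = glance.images.create(name='bigimage', disk_format='qcow2',
                             container_format='bare')
with open('testimage.qcow2', 'rb') as f:
    glance.images.upload(image.id, f)

# 2. Boot an instance from it; with the NFS-backed instance store the build
#    aborts with the InvalidDiskInfo error quoted above.
flavor = nova.flavors.find(name='m1.large')
nova.servers.create(name='bigimage-test', image=image.id, flavor=flavor)
~~~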

Comment 2 Martin Schuppert 2016-12-08 08:07:38 UTC
Just to add: we can successfully download the image manually and run qemu-img info on it:

[heat-admin@cv01 tmp(overcloudrc)]$ qemu-img info testimage.qcow2
image: testimage.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 31G
cluster_size: 65536
Format specific information:    
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 3 Martin Schuppert 2016-12-08 09:06:56 UTC
Running a test with QEMU_IMG_LIMITS changed as mentioned in the upstream bug did not solve the issue:

~~~
$ diff -u /usr/lib/python2.7/site-packages/nova/virt/images.py.org /usr/lib/python2.7/site-packages/nova/virt/images.py
--- /usr/lib/python2.7/site-packages/nova/virt/images.py.org	2016-12-08 08:19:06.795403823 +0000
+++ /usr/lib/python2.7/site-packages/nova/virt/images.py	2016-12-08 08:24:53.570749335 +0000
@@ -40,7 +40,7 @@
 
 QEMU_IMG_LIMITS = processutils.ProcessLimits(
     cpu_time=2,
-    address_space=1 * units.Gi)
+    address_space=1 * units.Gi * 10)
 
 
 def qemu_img_info(path, format=None):
~~~

Comment 4 Kashyap Chamarthy 2016-12-08 10:02:09 UTC
After a conversation with upstream QEMU folks (Dan Berrange, StefanH, et
al), two things to try:

(1) Can you try increasing the 'cpu_time' limit as well? Perhaps to 6 or
    8, or more, depending on the environment.

(2) Can you try removing the 'prlimit' argument from the utils.execute
    call, and see if that fixes the issue?

    [...]
    - out, err = utils.execute(*cmd, prlimit=QEMU_IMG_LIMITS)
    + out, err = utils.execute(*cmd)
    [...]

Comment 5 Martin Schuppert 2016-12-08 12:44:28 UTC
As mentioned in comment 3, multiplying the address_space limit by 10 did not make a difference. In addition, after raising cpu_time, the image is fully converted and the machine starts up:


    QEMU_IMG_LIMITS = processutils.ProcessLimits(
        cpu_time=8,
        address_space=1 * units.Gi * 10)

Comment 8 Jon Schlueter 2016-12-14 14:42:39 UTC
Proposed (and since abandoned) stable/mitaka backport patch: https://review.openstack.org/#/c/409775/

Comment 16 errata-xmlrpc 2017-03-08 17:45:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0467.html

Comment 17 Red Hat Bugzilla 2023-09-14 03:35:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days