Bug 1402594 - Problem booting instances from images larger than 20GB with NFS backend. [NEEDINFO]
Summary: Problem booting instances from images larger than 20GB with NFS backend.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 9.0 (Mitaka)
Assignee: Kashyap Chamarthy
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks: 1404651
Reported: 2016-12-07 22:34 UTC by Robin Cernin
Modified: 2020-06-11 13:07 UTC (History)

Fixed In Version: openstack-nova-13.1.2-10.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1404651 (view as bug list)
Environment:
Last Closed: 2017-03-08 17:45:26 UTC
Target Upstream Version:
mlopes: needinfo? (kchamart)




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1646181 | 0 | None | None | None | 2016-12-08 08:09:22 UTC
Red Hat Knowledge Base (Solution) | 2800111 | 0 | None | None | None | 2016-12-27 08:24:48 UTC
Red Hat Product Errata | RHBA-2017:0467 | 0 | normal | SHIPPED_LIVE | openstack-nova bug fix advisory | 2017-03-08 22:44:46 UTC

Description Robin Cernin 2016-12-07 22:34:27 UTC
Description of problem:

Uploading images larger than ~20GB to glance and then booting instances from them fails with the following error:

2463:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
2464:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part
2465:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Exit code: -9
2466:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stdout: u''
2467:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stderr: u''
2468:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] 
2474:2016-12-07 12:51:56.005 28644 ERROR nova.compute.manager [req-77fdc7e1-ee3a-4d4f-9e18-b5ffc0432570 7e96b4757412457a8d2e30118109bfe0 425cbc5131c94e0a98893150bcc44fde - - -] [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Build of instance bfc6b84f-ba07-4dbb-963d-ec6243c91043 aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
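
For readers of the log above: "Exit code: -9" follows the Python subprocess convention in which a negative return code means the child was terminated by the signal with that number. The sketch below decodes such a value using only the standard library; describe_exit_code is a hypothetical helper name, not part of nova or oslo.concurrency.

```python
import signal


def describe_exit_code(returncode):
    """Describe a subprocess-style return code (hypothetical helper)."""
    if returncode < 0:
        # Negative values mean "terminated by signal number -returncode".
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


print(describe_exit_code(-9))  # prints "killed by SIGKILL"
```

A SIGKILL with empty stdout and stderr would be consistent with the process being stopped for exceeding one of the prlimit resource limits shown in the Command line, although the log alone does not say which one.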

Version-Release number of selected component (if applicable):

openstack-nova-api-13.1.1-2.el7ost.noarch
openstack-nova-cert-13.1.1-2.el7ost.noarch
openstack-nova-common-13.1.1-2.el7ost.noarch
openstack-nova-compute-13.1.1-2.el7ost.noarch
openstack-nova-conductor-13.1.1-2.el7ost.noarch
openstack-nova-console-13.1.1-2.el7ost.noarch
openstack-nova-novncproxy-13.1.1-2.el7ost.noarch
openstack-nova-scheduler-13.1.1-2.el7ost.noarch
python-nova-13.1.1-2.el7ost.noarch
python-novaclient-3.3.1-1.el7ost.noarch


How reproducible:

1. Upload an image larger than 20GB (NFS backend).
2. Boot an instance from it.

We think this is related to the upstream bug https://bugs.launchpad.net/nova/+bug/1646181.

Comment 2 Martin Schuppert 2016-12-08 08:07:38 UTC
Just to add: we can successfully download the image manually and run qemu-img info on it:

[heat-admin@cv01 tmp(overcloudrc)]$ qemu-img info testimage.qcow2
image: testimage.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 31G
cluster_size: 65536
Format specific information:    
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 3 Martin Schuppert 2016-12-08 09:06:56 UTC
Running a test with changed QEMU_IMG_LIMITS as mentioned in the upstream bug did not solve the issue.

~~~
$ diff -u /usr/lib/python2.7/site-packages/nova/virt/images.py.org /usr/lib/python2.7/site-packages/nova/virt/images.py
--- /usr/lib/python2.7/site-packages/nova/virt/images.py.org	2016-12-08 08:19:06.795403823 +0000
+++ /usr/lib/python2.7/site-packages/nova/virt/images.py	2016-12-08 08:24:53.570749335 +0000
@@ -40,7 +40,7 @@
 
 QEMU_IMG_LIMITS = processutils.ProcessLimits(
     cpu_time=2,
-    address_space=1 * units.Gi)
+    address_space=1 * units.Gi * 10)
 
 
 def qemu_img_info(path, format=None):
~~~
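
To make the effect of the patch above concrete, here is a small sketch of the prlimit wrapper prefix it would produce, modelled purely on the Command line logged in the description. prlimit_prefix is a hypothetical helper written for illustration and GI is assumed to equal oslo_utils.units.Gi (1024 ** 3); this is not the oslo.concurrency implementation.

```python
GI = 1024 ** 3  # assumed equal to oslo_utils.units.Gi


def prlimit_prefix(address_space, cpu_time):
    """Hypothetical helper: mirrors only the shape of the logged command."""
    return [
        "/usr/bin/python2", "-m", "oslo_concurrency.prlimit",
        "--as=%d" % address_space,
        "--cpu=%d" % cpu_time,
        "--",
    ]


original = prlimit_prefix(1 * GI, 2)      # limits before the patch
patched = prlimit_prefix(1 * GI * 10, 2)  # limits after the patch

print(" ".join(original))
print(" ".join(patched))
```

Only the --as value changes (1073741824 becomes 10737418240); --cpu stays at 2, so a failure caused by the CPU-time limit would not be affected by this change.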

Comment 4 Kashyap Chamarthy 2016-12-08 10:02:09 UTC
After a conversation with upstream QEMU folks (Dan Berrange, StefanH, et
al), two things to try:

(1) Can you try increasing the 'cpu_time' limit as well, perhaps to 6 or
    8, or more depending on the environment?

(2) Can you try removing the 'prlimit' argument from the utils.execute
    call, and see if that fixes the issue?

    [...]
    - out, err = utils.execute(*cmd, prlimit=QEMU_IMG_LIMITS)
    + out, err = utils.execute(*cmd)
    [...]
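
To check outside of Nova whether a CPU-time limit alone can produce a signal-killed child with no output, a minimal standard-library sketch follows. It assumes a Linux host; run_with_cpu_limit is a hypothetical helper, and the busy-loop child is only a stand-in for a slow "qemu-img info", not a reproduction of it.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with RLIMIT_CPU (soft and hard) set to cpu_seconds."""
    def set_limit():
        # Runs in the child just before exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    return subprocess.run(argv, preexec_fn=set_limit).returncode


# Stand-in workload: a child process that does nothing but burn CPU.
busy_child = [sys.executable, "-c", "while True: pass"]
rc = run_with_cpu_limit(busy_child, 1)
print("return code:", rc)
```

A negative return code here means the kernel terminated the child once it used up its CPU-time budget. This does not explain why qemu-img would need more than 2 CPU seconds on these images; it only illustrates the failure mode that raising 'cpu_time' is meant to avoid.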

Comment 5 Martin Schuppert 2016-12-08 12:44:28 UTC
As mentioned in comment 3, multiplying address_space by 10 on its own made no difference. With cpu_time raised as well, the image is fully converted and the instance starts up:


    QEMU_IMG_LIMITS = processutils.ProcessLimits(
        cpu_time=8,
        address_space=1 * units.Gi * 10)

Comment 8 Jon Schlueter 2016-12-14 14:42:39 UTC
A stable/mitaka backport patch was proposed and then abandoned: https://review.openstack.org/#/c/409775/

Comment 16 errata-xmlrpc 2017-03-08 17:45:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0467.html

