Bug 1402594 - Problem booting instances from images larger than 20GB with NFS backend.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 9.0 (Mitaka)
Assignee: Kashyap Chamarthy
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks: 1404651
Reported: 2016-12-07 22:34 UTC by Robin Cernin
Modified: 2023-09-14 03:36 UTC

Fixed In Version: openstack-nova-13.1.2-10.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1404651 (view as bug list)
Environment:
Last Closed: 2017-03-08 17:45:26 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1646181 | 0 | None | None | None | 2016-12-08 08:09:22 UTC
Red Hat Issue Tracker | OSP-28618 | 0 | None | None | None | 2023-09-14 03:36:26 UTC
Red Hat Knowledge Base (Solution) | 2800111 | 0 | None | None | None | 2016-12-27 08:24:48 UTC
Red Hat Product Errata | RHBA-2017:0467 | 0 | normal | SHIPPED_LIVE | openstack-nova bug fix advisory | 2017-03-08 22:44:46 UTC

Description Robin Cernin 2016-12-07 22:34:27 UTC
Description of problem:

Uploading images to glance that are larger than ~20GB in size and then booting them causes the following error: 

2463:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
2464:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part
2465:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Exit code: -9
2466:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stdout: u''
2467:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stderr: u''
2468:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] 
2474:2016-12-07 12:51:56.005 28644 ERROR nova.compute.manager [req-77fdc7e1-ee3a-4d4f-9e18-b5ffc0432570 7e96b4757412457a8d2e30118109bfe0 425cbc5131c94e0a98893150bcc44fde - - -] [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Build of instance bfc6b84f-ba07-4dbb-963d-ec6243c91043 aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
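
A note on reading `Exit code: -9` in the log above: in Python's subprocess convention, a negative return code means the child was terminated by the signal with that number, and signal 9 is SIGKILL on Linux. The following is only a minimal Python 3 sketch for decoding such a value on a POSIX system; `describe_exit` is a hypothetical helper name and is not part of nova or oslo.concurrency.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess-style return code into a readable description."""
    if returncode >= 0:
        return "exited with status %d" % returncode
    # Negative values encode the number of the signal that ended the child.
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return "killed by signal %d (%s)" % (signum, name)


print(describe_exit(-9))
```

A SIGKILL with empty stdout and stderr, as seen here, suggests the process was killed from outside rather than failing on its own.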

Version-Release number of selected component (if applicable):

openstack-nova-api-13.1.1-2.el7ost.noarch
openstack-nova-cert-13.1.1-2.el7ost.noarch
openstack-nova-common-13.1.1-2.el7ost.noarch
openstack-nova-compute-13.1.1-2.el7ost.noarch
openstack-nova-conductor-13.1.1-2.el7ost.noarch
openstack-nova-console-13.1.1-2.el7ost.noarch
openstack-nova-novncproxy-13.1.1-2.el7ost.noarch
openstack-nova-scheduler-13.1.1-2.el7ost.noarch
python-nova-13.1.1-2.el7ost.noarch
python-novaclient-3.3.1-1.el7ost.noarch


How reproducible:

1. Upload image larger than 20GB on NFS.
2. Boot the instance.

We think this is related to upstream bug https://bugs.launchpad.net/nova/+bug/1646181.

Comment 2 Martin Schuppert 2016-12-08 08:07:38 UTC
Just to add: we can successfully download the image manually and run qemu-img info on it:

[heat-admin@cv01 tmp(overcloudrc)]$ qemu-img info testimage.qcow2
image: testimage.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 31G
cluster_size: 65536
Format specific information:    
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 3 Martin Schuppert 2016-12-08 09:06:56 UTC
Running a test with QEMU_IMG_LIMITS changed as suggested in the upstream bug (address_space raised from 1 GiB to 10 GiB) did not solve the issue.

~~~
$ diff -u /usr/lib/python2.7/site-packages/nova/virt/images.py.org /usr/lib/python2.7/site-packages/nova/virt/images.py
--- /usr/lib/python2.7/site-packages/nova/virt/images.py.org	2016-12-08 08:19:06.795403823 +0000
+++ /usr/lib/python2.7/site-packages/nova/virt/images.py	2016-12-08 08:24:53.570749335 +0000
@@ -40,7 +40,7 @@
 
 QEMU_IMG_LIMITS = processutils.ProcessLimits(
     cpu_time=2,
-    address_space=1 * units.Gi)
+    address_space=1 * units.Gi * 10)
 
 
 def qemu_img_info(path, format=None):
~~~

Comment 4 Kashyap Chamarthy 2016-12-08 10:02:09 UTC
After a conversation with upstream QEMU folks (Dan Berrange, StefanH, et
al), two things to try:

(1) Can you try increasing the 'cpu_time' limit as well?  Perhaps to 6 or
    8, or more, depending on the environment.

(2) Can you try removing the 'prlimit' argument from the utils.execute
    call, and see if that fixes the issue?

    [...]
    - out, err = utils.execute(*cmd, prlimit=QEMU_IMG_LIMITS)
    + out, err = utils.execute(*cmd)
    [...]

Comment 5 Martin Schuppert 2016-12-08 12:44:28 UTC
As mentioned in comment 3, multiplying address_space by 10 alone made no difference. After additionally raising cpu_time to 8, the image is fully converted and the instance starts up.


    QEMU_IMG_LIMITS = processutils.ProcessLimits(
        cpu_time=8,
        address_space=1 * units.Gi * 10)
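
To experiment with these two limits outside nova, here is a rough standalone sketch that runs a command under a CPU-time and address-space limit using only the Python 3 standard library on Linux. It is an approximation for testing and not the oslo.concurrency implementation; `run_limited` and `_capped` are hypothetical names, and the defaults simply mirror the values above (8 seconds, 10 GiB).

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def _capped(limit, wanted):
    """Return a (soft, hard) pair no higher than the current hard limit."""
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    return (wanted, wanted)


def run_limited(cmd, cpu_time=8, address_space=10 * GIB):
    """Run cmd with RLIMIT_CPU (seconds) and RLIMIT_AS (bytes) applied."""
    def apply_limits():
        # Runs in the child between fork and exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_CPU,
                           _capped(resource.RLIMIT_CPU, cpu_time))
        resource.setrlimit(resource.RLIMIT_AS,
                           _capped(resource.RLIMIT_AS, address_space))
    return subprocess.run(cmd, preexec_fn=apply_limits,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout.strip())
```

Replacing the command with `["qemu-img", "info", path]` for an image on the NFS mount, and varying `cpu_time`, is one way to check whether the CPU limit is what kills the process in a given environment.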

Comment 8 Jon Schlueter 2016-12-14 14:42:39 UTC
Proposed (and since abandoned) stable/mitaka backport patch: https://review.openstack.org/#/c/409775/

Comment 16 errata-xmlrpc 2017-03-08 17:45:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0467.html

Comment 17 Red Hat Bugzilla 2023-09-14 03:35:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days