Description of problem:

Uploading images to glance that are larger than ~20GB in size and then booting them causes the following error:

2463:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
2464:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part
2465:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Exit code: -9
2466:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stdout: u''
2467:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Stderr: u''
2468:2016-12-07 12:51:55.682 28644 ERROR nova.compute.manager [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043]
2474:2016-12-07 12:51:56.005 28644 ERROR nova.compute.manager [req-77fdc7e1-ee3a-4d4f-9e18-b5ffc0432570 7e96b4757412457a8d2e30118109bfe0 425cbc5131c94e0a98893150bcc44fde - - -] [instance: bfc6b84f-ba07-4dbb-963d-ec6243c91043] Build of instance bfc6b84f-ba07-4dbb-963d-ec6243c91043 aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/8a3fc144ac7fe323269310f45a2df09c221a669b.part : Unexpected error while running command.
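For context on the failing command: nova runs `qemu-img info` through the oslo_concurrency.prlimit wrapper, which applies resource limits (here --as=1073741824, i.e. a 1 GiB address-space cap, and --cpu=2 seconds of CPU time) in the child before exec'ing qemu-img. A minimal sketch of how such a limit makes a subprocess fail; the helper name `set_limits` and the 2 GiB allocation are illustrative, not nova's actual code:

```python
import resource
import subprocess
import sys

ONE_GIB = 1 * 1024 ** 3

def set_limits():
    # Roughly what "python -m oslo_concurrency.prlimit --as=1073741824 -- <cmd>"
    # does in the child before exec: cap the virtual address space at 1 GiB.
    resource.setrlimit(resource.RLIMIT_AS, (ONE_GIB, ONE_GIB))

# The child tries to allocate 2 GiB; under a 1 GiB RLIMIT_AS the allocation
# fails with MemoryError, so the command exits non-zero instead of succeeding.
proc = subprocess.run(
    [sys.executable, "-c", "bytearray(2 * 1024 ** 3)"],
    preexec_fn=set_limits,   # POSIX only
    capture_output=True,
)
print(proc.returncode)  # non-zero
```

A qemu-img process that needs more memory or CPU than the limits allow fails the same way, with no useful stdout/stderr, which matches the empty Stdout/Stderr in the traceback above.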
Version-Release number of selected component (if applicable):

openstack-nova-api-13.1.1-2.el7ost.noarch
openstack-nova-cert-13.1.1-2.el7ost.noarch
openstack-nova-common-13.1.1-2.el7ost.noarch
openstack-nova-compute-13.1.1-2.el7ost.noarch
openstack-nova-conductor-13.1.1-2.el7ost.noarch
openstack-nova-console-13.1.1-2.el7ost.noarch
openstack-nova-novncproxy-13.1.1-2.el7ost.noarch
openstack-nova-scheduler-13.1.1-2.el7ost.noarch
python-nova-13.1.1-2.el7ost.noarch
python-novaclient-3.3.1-1.el7ost.noarch

How reproducible:

1. Upload an image larger than 20GB on NFS.
2. Boot the instance.

We think this is related to https://bugs.launchpad.net/nova/+bug/1646181 upstream.
Just to add, we can successfully download the image manually and run qemu-img info on it:

[heat-admin@cv01 tmp(overcloudrc)]$ qemu-img info testimage.qcow2
image: testimage.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 31G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
Running a test with changed QEMU_IMG_LIMITS as mentioned in the upstream bug did not solve the issue.

~~~
$ diff -u /usr/lib/python2.7/site-packages/nova/virt/images.py.org /usr/lib/python2.7/site-packages/nova/virt/images.py
--- /usr/lib/python2.7/site-packages/nova/virt/images.py.org	2016-12-08 08:19:06.795403823 +0000
+++ /usr/lib/python2.7/site-packages/nova/virt/images.py	2016-12-08 08:24:53.570749335 +0000
@@ -40,7 +40,7 @@

 QEMU_IMG_LIMITS = processutils.ProcessLimits(
     cpu_time=2,
-    address_space=1 * units.Gi)
+    address_space=1 * units.Gi * 10)

 def qemu_img_info(path, format=None):
~~~
After a conversation with upstream QEMU folks (Dan Berrange, StefanH, et al.), two things to try:

(1) Can you try increasing the 'cpu_time' limit as well? Perhaps to 6 or 8, or more depending on the environment.

(2) Can you try removing the 'prlimit' argument from the utils.execute call, and see if that fixes the issue?

~~~
[...]
-        out, err = utils.execute(*cmd, prlimit=QEMU_IMG_LIMITS)
+        out, err = utils.execute(*cmd)
[...]
~~~
As mentioned before in comment 3, multiplying the address_space limit by 10 alone did not make a difference. However, after additionally raising cpu_time, the image is fully converted and the machine starts up:

QEMU_IMG_LIMITS = processutils.ProcessLimits(
    cpu_time=8,
    address_space=1 * units.Gi * 10)
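This is consistent with the "Exit code: -9" in the original traceback: RLIMIT_CPU terminates a process by signal once it exhausts its CPU-time allowance (SIGXCPU at the soft limit, SIGKILL at the hard limit), so a qemu-img that needs more than cpu_time seconds to scan a very large qcow2 is killed outright, yielding a negative exit code and empty stdout/stderr. A small sketch demonstrating the effect; the busy-loop child is a stand-in for qemu-img, not part of nova:

```python
import resource
import subprocess
import sys

def set_cpu_limit():
    # Like "prlimit --cpu=2": soft and hard CPU-time limit of 2 seconds,
    # applied in the child before exec.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))

# A CPU-bound child exceeds the limit and is terminated by a signal,
# so subprocess reports a negative return code (e.g. -9 for SIGKILL).
proc = subprocess.run(
    [sys.executable, "-c", "while True: pass"],
    preexec_fn=set_cpu_limit,   # POSIX only
    capture_output=True,
)
print(proc.returncode)  # negative: killed by signal
```

Raising cpu_time simply gives qemu-img enough CPU budget to finish scanning the image before the limit fires.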
Proposed and abandoned stable/mitaka backport patch: https://review.openstack.org/#/c/409775/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0467.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days