Bug 1034433

Summary: qemu-img create with NBD backing file hangs in Rawhide
Product: [Fedora] Fedora Reporter: Richard W.M. Jones <rjones>
Component: qemuAssignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: amit.shah, berrange, cfergeau, crobinso, dwmw2, eblake, hreitz, itamar, kwolf, pbonzini, rjones, scottt.tw, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-03 13:54:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 910269    

Description Richard W.M. Jones 2013-11-25 20:15:18 UTC
Description of problem:

Note this only fails in Fedora Rawhide.  In Fedora 19 & 20 it
works as you would expect.

cd /tmp
truncate -s 1G test.raw
qemu-nbd test.raw -t -p 9999

In another window:

cd /tmp
qemu-img create -f qcow2 -b nbd:localhost:9999 -o backing_fmt=raw test.qcow2

The qemu-img command will hang indefinitely.

Version-Release number of selected component (if applicable):

qemu-img-1.7.0-0.1.rc1.fc21.x86_64

How reproducible:

100%

Steps to Reproduce:
[see above]

Comment 1 Richard W.M. Jones 2013-11-25 20:35:43 UTC
This also fails with upstream qemu.  'git bisect' says:

ba2ab2f2ca4150a7e314fbb19fa158bd8ddc36eb is the first bad commit
commit ba2ab2f2ca4150a7e314fbb19fa158bd8ddc36eb
Author: Max Reitz <mreitz>
Date:   Thu Oct 24 20:35:06 2013 +0200

    qcow2: Flush image after creation
    
    Opening the qcow2 image with BDRV_O_NO_FLUSH prevents any flushes during
    the image creation. This means that the image has not yet been flushed
    to disk when qemu-img create exits. This flush is delayed until the next
    operation on the image involving opening it without BDRV_O_NO_FLUSH and
    closing (or directly flushing) it. For large images and/or images with a
    small cluster size and preallocated metadata, this flush may take a
    significant amount of time and may occur unexpectedly.
    
    Reopening the image without BDRV_O_NO_FLUSH right before the end of
    qcow2_create2() results in hoisting the potentially costly flush into
    the image creation, which is expected to take some time (whereas
    successive image operations may be not).
    
    Signed-off-by: Max Reitz <mreitz>
    Reviewed-by: Eric Blake <eblake>
    Reviewed-by: Benoit Canet <benoit>
    Signed-off-by: Kevin Wolf <kwolf>

Comment 2 Richard W.M. Jones 2013-11-25 20:39:29 UTC
I've confirmed this that the commit in comment 1 is the first
one which causes the hang in qemu-img.  Max, any ideas?

Comment 3 Richard W.M. Jones 2013-11-25 20:46:00 UTC
BTW I don't even agree with the premise of this commit.

In libguestfs we usually want to *avoid* flushing anything to
disk if at all possible, because the majority of images we're dealing
with are transient and throwaway.  Don't force syncing and flushing
behaviour.  If people want they can call sync/fsync themselves.

Comment 4 Cole Robinson 2013-11-26 16:22:20 UTC
CCing kevin as well. Kevin, summary: there's a qemu-img regression queued for 1.7, Rich bisected it Comment #1

Comment 5 Kevin Wolf 2013-11-26 16:37:35 UTC
We're too late now for 1.7.0.

The symptoms don't ring a bell, it looks rather odd that a backing file can make
it hang. I guess Max should just debug it.

(In reply to Richard W.M. Jones from comment #3)
> BTW I don't even agree with the premise of this commit.
> 
> In libguestfs we usually want to *avoid* flushing anything to
> disk if at all possible, because the majority of images we're dealing
> with are transient and throwaway.  Don't force syncing and flushing
> behaviour.  If people want they can call sync/fsync themselves.

Hm... So you create the image and then only work with cache=unsafe on it?

Comment 6 Richard W.M. Jones 2013-11-26 16:41:44 UTC
> > In libguestfs we usually want to *avoid* flushing anything to
> > disk if at all possible, because the majority of images we're dealing
> > with are transient and throwaway.  Don't force syncing and flushing
> > behaviour.  If people want they can call sync/fsync themselves.
> 
> Hm... So you create the image and then only work with cache=unsafe on it?

For transient stuff that can be easily recreated, yes.

For example, virt-builder (building new images from scratch)
will do everything with cache=unsafe.  The final step is to
call fsync on the file[1] to commit it to disk before exiting
with a 0 status.

Provided a program using virt-builder checks the exit status
this is safe.

[1] Although this is only necessary because of the default
of qemu using cache=none.  If people know what they are doing
they can disable this fsync in virt-builder.

Comment 7 Richard W.M. Jones 2013-11-26 16:43:16 UTC
(ie. see --no-sync option and notes here:
http://libguestfs.org/virt-builder.1.html )

Comment 8 Hanna Czenczek 2013-11-27 20:13:02 UTC
The issue's cause is apparently that bdrv_img_create opens the backing file and does not close it before it calls bdrv_create. Most block drivers open an empty file, write to it, and then close it when the image has been created. The aforementioned commit makes qcow2 close the complete image file and reopen it, but now it contains data indicating it is actually backed.

This results in the NBD client trying to maintain two connections to the NBD server (one from bdrv_img_create and one from the bdrv_open in qcow2_create2). The NBD server does not accept the second connection since it allows only one at a time. qemu-img create will accordingly hang in some kind of a deadlock.

I have sent a patch which makes bdrv_img_create close the backing file before calling bdrv_create, fixing the issue.

As for flushing at all: I think, there should be an option to specify the caching mode used by bdrv_(img_)create.

Comment 9 Hanna Czenczek 2013-11-27 20:22:30 UTC
On second thought, it is probably not a good idea to take this bug if it isn't for RHEL, thus I reverted the assignee.

Comment 10 Cole Robinson 2013-12-02 19:21:59 UTC
Latest patch is here:

https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg04203.html

Comment 11 Cole Robinson 2013-12-03 13:54:33 UTC
Fixed in qemu-1.7.0-1 built now for rawhide