Bug 524734 - KVM guest ext3 errors at shutdown when using virtio and a qcow2 backing file
Summary: KVM guest ext3 errors at shutdown when using virtio and a qcow2 backing file
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Mark McLoughlin
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F12VirtBlocker
 
Reported: 2009-09-21 23:55 UTC by Tom Horsley
Modified: 2009-10-12 07:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-12 07:16:26 UTC


Attachments
screenshot of the froze up shutdown screen (15.84 KB, image/png)
2009-09-21 23:55 UTC, Tom Horsley
The xml definition for the new "u2" ubuntu machine (1.38 KB, text/plain)
2009-10-02 02:20 UTC, Tom Horsley
The qemu log file for the u2 virtual machine (2.70 KB, text/plain)
2009-10-02 02:29 UTC, Tom Horsley

Description Tom Horsley 2009-09-21 23:55:38 UTC
Created attachment 362017
screenshot of the froze up shutdown screen

Description of problem:

I finally got my ubuntu KVM to run under rawhide's virt-manager, so decided
to try qcow2 again, converting my raw disk image file to qcow2 to save
as a common base image, then using it as the base of another qcow2 file to
run the KVM with.

All seemed well, working much better than it did when I tried the same
thing in fedora 11, but then I ran shutdown -h now in the KVM, and
it spewed a batch of errors about problems with journal as it was
shutting down and hung.

Version-Release number of selected component (if applicable):
qemu-img-0.10.92-4.fc12.x86_64
qemu-system-x86-0.10.92-4.fc12.x86_64
qemu-common-0.10.92-4.fc12.x86_64
qemu-kvm-0.10.92-4.fc12.x86_64
gpxe-roms-qemu-0.9.7-5.fc12.noarch


How reproducible:
Only tried it once.

Steps to Reproduce:
1. See above.

Actual results:
hang at shutdown with journal errors

Expected results:
clean shutdown

Additional info:

Comment 1 Tom Horsley 2009-09-23 22:59:28 UTC
I've now tried this a few more times, and continued to get errors with the
disk using virtio (not always the journal errors above, sometimes it would
hang at boot or flake out randomly in other ways).

I converted the disk to use an emulated ide mode such as is used by
default with windows xp, and the problems all disappeared.

I should probably add that I was using kernel 2.6.31-33.fc12.x86_64 for
all this testing, and I have a 4 processor machine with the ubuntu
KVM also set to use 4 vcpus.

Comment 2 Mark McLoughlin 2009-10-01 08:56:47 UTC
Tom: could you give us more details about the guest? Could you also supply /var/log/libvirt/qemu/$guest.log ?

Perhaps also worth trying Fedora 11 and Fedora 12 guests

hch, kwolf: any ideas on this? sounds like it only happens with qcow2+virtio. My guess is that no cache= parameter is supplied

Comment 3 Tom Horsley 2009-10-01 10:00:22 UTC
I'll give it another try to get a log, but I'm pretty sure I remember
looking in the log when it happened and the only thing in it was the
initial qemu-kvm command line - no messages after that.

The guest is 64 bit ubuntu 9.04 installed via the "alternate" text based
installer iso image.

Comment 4 Tom Horsley 2009-10-01 12:30:27 UTC
Also, this may be something silly I did wrong. I only recently managed to
discover the deeply nested, jargon camouflaged, dialog in virt-manager
that would let me create a qcow2 image at install time, so this qcow2
image is really the result of using qemu-img to convert the raw image
I got from the initial install to a qcow2 image. I didn't change any parameters in
the xml definition of the storage for the virtual machine, I just made
the file it referred to be a qcow2 image. If changes to other attributes
were required, they didn't happen (though I did sort of assume it ought to
recognize the image format and do whatever was necessary automagically).

Comment 5 Mark McLoughlin 2009-10-01 13:13:07 UTC
Tom: it's always best to include the guest XML config and log file, so we can spot whether the config is correct

If you have:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>

then you need to change it to:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>

but I assume you don't have the <driver> line at all

You could try making it:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>

and see if that helps
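
The XML change suggested above could also be scripted. The following is a minimal sketch, assuming the guest definition is available as a string (for example, saved from `virsh dumpxml`); the function name `set_disk_driver` is hypothetical and not part of libvirt. It only rewrites the `<driver>` element of each `<disk device='disk'>` entry and leaves everything else alone.

```python
import xml.etree.ElementTree as ET


def set_disk_driver(domain_xml, image_format="qcow2", cache=None):
    """Return domain XML with <driver> set on every <disk device='disk'>.

    Hypothetical helper: adds a <driver name='qemu'> element if the disk
    has none, sets its type, and optionally sets the cache attribute.
    """
    root = ET.fromstring(domain_xml)
    # Materialize the list first, since a <driver> child may be inserted.
    for disk in list(root.iter("disk")):
        if disk.get("device") != "disk":
            continue
        driver = disk.find("driver")
        if driver is None:
            driver = ET.Element("driver", {"name": "qemu"})
            disk.insert(0, driver)
        driver.set("type", image_format)
        if cache is not None:
            driver.set("cache", cache)
    return ET.tostring(root, encoding="unicode")
```

Calling `set_disk_driver(xml_text, "qcow2", cache="none")` would produce the third variant shown above; the result still has to be loaded back into libvirt by whatever means you normally use.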

Comment 6 Tom Horsley 2009-10-02 02:16:40 UTC
OK, I've narrowed it down. What I really need to make it fail is not
just a single qcow2 image but a qcow2 image with another qcow2 image
backing it up. I reinstalled (running on a rawhide system) a new ubuntu
image from scratch, this time using a qcow2 image in the initial install
by diving down into the "use existing storage" button and creating one
in there. That image worked fine with all the virt-manager created
defaults. The disk definition looked like:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/u2.img.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>

But then I shut down the machine, and diddled the disk images to rename the
original image and make a new image based on it, so that I had this
combination:

[root@zooty images]# ls -l u2*
-r--r--r-- 1 root root 2801205248 2009-10-01 21:42 u2-base.img
-rw-rw-rw- 1 root root    6946816 2009-10-01 22:04 u2.img.img
[root@zooty images]# qemu-img info u2-base.img
image: u2-base.img
file format: qcow2
virtual size: 20G (20971520000 bytes)
disk size: 2.6G
cluster_size: 65536
[root@zooty images]# qemu-img info u2.img.img 
image: u2.img.img
file format: qcow2
virtual size: 20G (20971520000 bytes)
disk size: 6.5M
cluster_size: 65536
backing file: u2-base.img (actual path: u2-base.img)
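
For anyone trying to reproduce the layout listed above, here is a minimal sketch that only builds the `qemu-img create` command line for an overlay on top of a base image; it does not run anything. The helper names are hypothetical. The `-F` backing-format option is an assumption about the installed QEMU: newer releases expect it, while the 0.10.x build from this report may not accept it, so it is left off by default.

```python
import shlex


def overlay_cmd(base_image, overlay_image, backing_format=None):
    """Build the argv for creating a qcow2 overlay backed by base_image."""
    cmd = ["qemu-img", "create", "-f", "qcow2", "-b", base_image]
    if backing_format:
        # Assumption: only pass -F if your qemu-img version supports it.
        cmd += ["-F", backing_format]
    cmd.append(overlay_image)
    return cmd


def format_cmd(cmd):
    """Render an argv list as a shell-safe string for copy and paste."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # File names taken from the listing above; run from the images directory.
    print(format_cmd(overlay_cmd("u2-base.img", "u2.img.img")))
```

The printed command assumes the original image has already been renamed to `u2-base.img`, as described in this comment.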

Now when I boot the kvm, it randomly fails, sometimes during boot, sometimes
later. Adding cache='none' does not change this behavior, I still get random
failures.

However, if I change the disk device to look like this:

    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/ubu.img.img'/>
      <target dev='hda' bus='ide'/>
    </disk>

then everything works perfectly fine, even with the qcow2 based on another
qcow2.
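
The IDE workaround described above amounts to changing the `<target>` of the disk. A minimal sketch of that edit, assuming the domain XML is held in a string; `switch_virtio_to_ide` is a hypothetical helper name, and the `vdX` to `hdX` rename simply mirrors the two disk definitions shown in this comment:

```python
import xml.etree.ElementTree as ET


def switch_virtio_to_ide(domain_xml):
    """Return domain XML with virtio disk targets changed to IDE."""
    root = ET.fromstring(domain_xml)
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("bus") != "virtio":
            continue
        target.set("bus", "ide")
        dev = target.get("dev", "")
        if dev.startswith("vd"):
            # vda -> hda, vdb -> hdb, and so on.
            target.set("dev", "hd" + dev[2:])
    return ET.tostring(root, encoding="unicode")
```

This is only a workaround for testing; it trades virtio performance for the emulated IDE path.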

Comment 7 Tom Horsley 2009-10-02 02:20:06 UTC
Created attachment 363413
The xml definition for the new "u2" ubuntu machine

Here's the current incarnation of the xml definition. This one has the
cache=none attribute (which didn't help).

Comment 8 Tom Horsley 2009-10-02 02:29:53 UTC
Created attachment 363416
The qemu log file for the u2 virtual machine

Nothing much in here but qemu command lines, but I add it for completeness.

Comment 9 Mark McLoughlin 2009-10-02 08:12:23 UTC
Great stuff, Tom - thanks

Christoph: summary seems to be that this is specific to virtio and qcow2 with a backing file

Comment 10 Mark McLoughlin 2009-10-06 15:32:24 UTC
Kevin is looking at this, I think

Comment 11 Mark McLoughlin 2009-10-09 07:41:42 UTC
http://lists.gnu.org/archive/html/qemu-devel/2009-10/msg00767.html

Kevin, it sounds like Anthony is fine with this patch for the stable branch, so I'll go ahead and pull it into F-12 today

Comment 12 Kevin Wolf 2009-10-09 07:55:48 UTC
I was going to ask you to apply it today, but you're just too quick for me. :-)

Yes, please do. It's the only sane thing we can do for a code base that will be used by real users soon.

Comment 13 Mark McLoughlin 2009-10-09 14:36:13 UTC
* Fri Oct  9 2009 Mark McLoughlin <markmc@redhat.com> - 2:0.11.0-6
- Fix fs errors with virtio and qcow2 backing file (#524734)
- Fix ksm initscript errors on kernel missing ksm (#527653)
- Add missing Requires(post): getent, useradd, groupadd (#527087)

Comment 14 Mark McLoughlin 2009-10-09 16:01:59 UTC
tag request: https://fedorahosted.org/rel-eng/ticket/2429

Comment 15 Mark McLoughlin 2009-10-12 07:16:26 UTC
tagged now for F12 GA

