Bug 497170 - qemu segfaults with read-only disk images
Summary: qemu segfaults with read-only disk images
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Glauber Costa
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 500185
Depends On:
Blocks: F11VirtTarget
Reported: 2009-04-22 16:24 UTC by Richard W.M. Jones
Modified: 2009-05-20 00:53 UTC (History)
CC List: 6 users

Fixed In Version: 0.10.4-4.fc11
Clone Of:
Environment:
Last Closed: 2009-05-20 00:53:12 UTC
Type: ---
Embargoed:


Attachments
readonly-disk-backtrace.txt (6.77 KB, text/plain)
2009-05-13 15:36 UTC, Mark McLoughlin

Description Richard W.M. Jones 2009-04-22 16:24:25 UTC
Description of problem:

When qemu is given a read-only -drive parameter, then certain operations
on that drive (such as mounting partitions from the drive) eventually
cause qemu to print this message:

raw_aio_remove: aio request not found!

and segfault.

Making the image writable removes the error.  *However* note that
I want the drive to be read-only.
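
For anyone following along, here is a minimal C sketch of why a read-only image behaves differently once the guest issues a write. It assumes (nothing in this report confirms it) that the emulator ends up holding a descriptor opened without write access. The function name and the scratch path are made up for illustration and are not taken from qemu.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file, reopen it read-only, and
 * attempt a 512-byte write at offset 0.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the scratch file could not be set up.
 */
static int write_errno_on_readonly_fd(void)
{
    char path[] = "/tmp/ro-image-XXXXXX";   /* scratch path, an assumption */
    char sector[512] = {0};
    int result;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    /* Reopen without write access, as for a read-only disk image. */
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        unlink(path);
        return -1;
    }

    errno = 0;
    ssize_t n = pwrite(fd, sector, sizeof(sector), 0);
    result = (n < 0) ? errno : 0;   /* save errno before cleanup calls */

    close(fd);
    unlink(path);
    return result;
}
```

On Linux the write is rejected with EBADF, so under this assumption every guest write has to be reported back to the guest as an I/O error instead of reaching the disk.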

Version-Release number of selected component (if applicable):

qemu 0.10-12.fc11.x86_64

How reproducible:

Very reliably, particularly with a RHEL guest image.

Steps to Reproduce:
1. chmod -w RHEL.img
2. qemu -drive file=RHEL.img
3. try mounting a filesystem inside qemu
  
Actual results:

segfaults with error

Expected results:

should not segfault

Additional info:

Comment 1 Richard W.M. Jones 2009-04-22 17:05:45 UTC
Here is a realistic reproducer and stack trace:

$ gdb --args qemu-system-x86_64 -drive file=/dev/mapper/Guests-RHEL39FV32 -m 384 -kernel /usr/lib64/guestfs/vmlinuz.fedora-10.x86_64 -initrd /usr/lib64/guestfs/initramfs.fedora-10.x86_64.img -append "console=ttyS0" -nographic -serial stdio

bash-3.2# mount /dev/sda1 /
[Thread 0x7fa52eafa910 (LWP 31843) exited]
[New Thread 0x7fa52eafa910 (LWP 31845)]
[Thread 0x7fa52eafa910 (LWP 31845) exited]
raw_aio_remove: aio request not found!
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd ca/00:02:4d:00:00/00:00:00:00:00/e0 tag 0 dma 1024 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: link is slow to respond, please be patient (ready=0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting link
ata1.00: configured for MWDMA2
ata1: EH complete
raw_aio_remove: aio request not found!
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd ca/00:02:4d:00:00/00:00:00:00:00/e0 tag 0 dma 1024 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: link is slow to respond, please be patient (ready=0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting link
ata1.00: configured for MWDMA2
ata1: EH complete

Program received signal SIGSEGV, Segmentation fault.
0x000000000041a6b5 in qemu_paio_cancel (fd=<value optimized out>, 
    aiocb=0x21ef6f0) at posix-aio-compat.c:235
235	        TAILQ_REMOVE(&request_list, aiocb, node);
Missing separate debuginfos, use: debuginfo-install SDL-1.2.13-8.fc11.x86_64 gdbm-1.8.0-31.fc11.x86_64 libICE-1.0.4-7.fc11.x86_64 libSM-1.1.0-4.fc11.x86_64 libXext-1.0.99.1-2.fc11.x86_64 libXtst-1.0.3-5.fc11.x86_64 libasyncns-0.7-2.fc11.x86_64 libattr-2.4.43-3.fc11.x86_64 libcap-2.16-2.fc11.x86_64 pulseaudio-libs-0.9.15-3.test5.fc11.x86_64 tcp_wrappers-libs-7.6-54.fc11.x86_64
(gdb) bt
#0  0x000000000041a6b5 in qemu_paio_cancel (fd=<value optimized out>, 
    aiocb=0x21ef6f0) at posix-aio-compat.c:235
#1  0x000000000041b1a8 in raw_aio_cancel (blockacb=<value optimized out>)
    at block-raw-posix.c:682
#2  0x0000000000432930 in ide_dma_cancel (bm=0x22e5e60)
    at /usr/src/debug/qemu-kvm-0.10/qemu/hw/ide.c:2973
#3  0x0000000000432998 in bmdma_cmd_writeb (opaque=0x22e5e60, addr=0, val=0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/hw/ide.c:2987
#4  0x00000000004074db in cpu_outb (env=0x21b0e80, addr=0, val=0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/vl.c:453
#5  0x000000004271b632 in ?? ()
#6  0x0000000017897108 in ?? ()
#7  0xffffffff81221000 in ?? ()
#8  0x00000000021b0e80 in ?? ()
#9  0x00000000004be275 in phys_page_find (index=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10/qemu/exec.c:389
#10 tlb_set_page_exec (index=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10/qemu/exec.c:1983
#11 0x000000000052bfe3 in tlb_fill (addr=0, is_write=<value optimized out>, 
    mmu_idx=<value optimized out>, retaddr=0x0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/target-i386/op_helper.c:4774
#12 0x00000000004c0c12 in __ldb_cmmu (addr=18446744071581080137, mmu_idx=0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/softmmu_template.h:135
#13 0x00000000004c456b in cpu_x86_exec (env1=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10/qemu/cpu-exec.c:626
#14 0x000000000040ca4c in main_loop ()
    at /usr/src/debug/qemu-kvm-0.10/qemu/vl.c:3862
#15 main () at /usr/src/debug/qemu-kvm-0.10/qemu/vl.c:6126
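
The trace ends in TAILQ_REMOVE at posix-aio-compat.c:235, shortly after "raw_aio_remove: aio request not found!". One way such a pattern can crash is by unlinking a request that is no longer on the list. Below is a minimal sketch of a guarded cancel under that assumption. The types and function names are hypothetical; this is neither the QEMU code nor the upstream fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical request type; not QEMU's AIO control block. */
struct request {
    int id;
    bool queued;                  /* true only while linked into the list */
    TAILQ_ENTRY(request) node;
};

TAILQ_HEAD(request_queue, request);

enum cancel_result { CANCELED, NOT_QUEUED };

/* Link a request at the tail of the pending queue. */
static void submit(struct request_queue *q, struct request *r)
{
    assert(!r->queued);
    TAILQ_INSERT_TAIL(q, r, node);
    r->queued = true;
}

/* A worker takes the oldest request off the queue in order to service it. */
static struct request *dequeue(struct request_queue *q)
{
    struct request *r = TAILQ_FIRST(q);
    if (r != NULL) {
        TAILQ_REMOVE(q, r, node);
        r->queued = false;
    }
    return r;
}

/* Unlink only if the request is still on the list; otherwise report it. */
static enum cancel_result cancel(struct request_queue *q, struct request *r)
{
    if (!r->queued)
        return NOT_QUEUED;
    TAILQ_REMOVE(q, r, node);
    r->queued = false;
    return CANCELED;
}
```

With the flag in place, cancelling a request that a worker has already picked up returns NOT_QUEUED and leaves the list pointers untouched.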

Comment 2 Richard W.M. Jones 2009-04-22 18:14:47 UTC
Good news.

I have tested this with the latest qemu from svn (r7228), and the bug appears
to be fixed.

I have some more testing to do, and if that works I will close the bug.

Comment 3 Richard W.M. Jones 2009-04-22 18:34:45 UTC
Yes, this is looking good with qemu from svn.

Comment 4 Richard W.M. Jones 2009-04-23 13:01:03 UTC
Reopening and blocking F11VirtTarget.

Comment 5 Mark McLoughlin 2009-04-23 16:18:33 UTC
These patches are waiting to be pulled into the qemu stable branch and should fix it:

http://lists.gnu.org/archive/html/qemu-devel/2009-04/msg01276.html

Comment 6 Fedora Admin XMLRPC Client 2009-05-07 12:12:47 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 7 Fedora Admin XMLRPC Client 2009-05-07 12:13:48 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 8 Fedora Admin XMLRPC Client 2009-05-07 12:14:13 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Fedora Admin XMLRPC Client 2009-05-07 17:58:35 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 10 Mark McLoughlin 2009-05-11 17:41:14 UTC
*** Bug 500185 has been marked as a duplicate of this bug. ***

Comment 11 Fedora Update System 2009-05-13 11:59:08 UTC
qemu-0.10.4-2.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-2.fc11

Comment 12 Mark McLoughlin 2009-05-13 12:07:50 UTC
Rich, could you try qemu-0.10.4-2.fc11 and bump its karma if it fixes your problem?

Comment 13 Richard W.M. Jones 2009-05-13 14:11:46 UTC
Installing qemu-system-x86-0.10.4-2.fc11.x86_64.rpm gives:

Usage:  {start|stop|status|restart|condrestart}
warning: %post(qemu-system-x86-2:0.10.4-2.fc11.x86_64) scriptlet failed, exit status 1

rpm -V qemu-system-x86 gives no output which I assume means
that the failing script (/etc/sysconfig/modules/kvm.modules) is the
same as the one in the package.  I certainly haven't consciously
edited this file ever.

Comment 14 Richard W.M. Jones 2009-05-13 14:32:28 UTC
I'm afraid to say this new qemu doesn't solve this problem,
although it fails in a different way (it now just abruptly
segfaults when I do the same test).  Sorry :-(

Comment 15 Mark McLoughlin 2009-05-13 14:33:27 UTC
(In reply to comment #13)
> Installing qemu-system-x86-0.10.4-2.fc11.x86_64.rpm gives:
> 
> Usage:  {start|stop|status|restart|condrestart}
> warning: %post(qemu-system-x86-2:0.10.4-2.fc11.x86_64) scriptlet failed, exit
> status 1

Thanks Rich, I should have noticed that. Mixup between %{source1} and %{source2}.

qemu-0.10.4-3.fc11 is coming:

https://koji.fedoraproject.org/koji/buildinfo?buildID=102024

Comment 16 Mark McLoughlin 2009-05-13 14:34:10 UTC
(In reply to comment #14)
> I'm afraid to say this new qemu doesn't solve this problem,
> although it fails in a different way (it now just abruptly
> segfaults when I do the same test).  Sorry :-(

Thanks for trying; backtrace?

Comment 17 Fedora Update System 2009-05-13 15:07:04 UTC
qemu-0.10.4-3.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-3.fc11

Comment 18 Richard W.M. Jones 2009-05-13 15:32:26 UTC
Here's the command I'm using:

gdb --args /usr/bin/qemu-kvm -drive file=/dev/mapper/Guests-RHEL39FV32 -m 384 -no-reboot -kernel vmlinuz.rawhide.x86_64 -initrd initramfs.rawhide.x86_64.img -append 'panic=1 console=ttyS0 guestfs=10.0.2.4:6666 guestfs_verbose=1' -nographic -serial stdio -net channel,6666:unix:/tmp/sock,server,nowait -net user,vlan=0 -net nic,model=virtio,vlan=0

The initramfs in this case is modified so it gives me a shell inside the guest.  At the
shell I do:

mount /dev/sda1 /

(note that I only have *read* access, not write access, to the guest block device).

KVM segfaults about 10-20 seconds after the mount command.

(gdb) thread apply all bt

Thread 2 (Thread 0x7ff55eee0910 (LWP 3815)):
#0  0x000000000046cb43 in bdrv_aio_cancel (acb=0x1864010) at block.c:1471
#1  0x0000000000434140 in ide_dma_cancel (bm=0x1864e60)
    at /usr/src/debug/qemu-kvm-0.10.4/hw/ide.c:2973
#2  0x00000000004341a8 in bmdma_cmd_writeb (opaque=0x1864e60, addr=49152, 
    val=0) at /usr/src/debug/qemu-kvm-0.10.4/hw/ide.c:2987
#3  0x000000000051ed88 in kvm_outb (opaque=<value optimized out>, addr=49152, 
    data=0 '\0') at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:684
#4  0x000000000054c249 in handle_io (vcpu=<value optimized out>, 
    run=<value optimized out>, kvm=<value optimized out>) at libkvm.c:735
#5  kvm_run (vcpu=<value optimized out>, run=<value optimized out>, 
    kvm=<value optimized out>) at libkvm.c:964
#6  0x000000000051f569 in kvm_cpu_exec (env=0x0)
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:205
#7  0x000000000051f850 in kvm_main_loop_cpu (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:414
#8  ap_main_loop (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:451
#9  0x00000030fca0687a in start_thread () from /lib64/libpthread.so.0
#10 0x00000030fbee04cd in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ff5784f3740 (LWP 3812)):
#0  0x00000030fbed9092 in select () from /lib64/libc.so.6
#1  0x0000000000409c33 in qemu_select (tv=<value optimized out>, 
    xfds=<value optimized out>, wfds=<value optimized out>, 
    rfds=<value optimized out>, max_fd=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.4/vl.c:3669
#2  main_loop_wait (tv=<value optimized out>, xfds=<value optimized out>, 
    wfds=<value optimized out>, rfds=<value optimized out>, 
    max_fd=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.4/vl.c:3768
#3  0x000000000051f02a in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:596
#4  0x000000000040e9c4 in main_loop ()
    at /usr/src/debug/qemu-kvm-0.10.4/vl.c:3831
#5  main () at /usr/src/debug/qemu-kvm-0.10.4/vl.c:6127

Comment 19 Mark McLoughlin 2009-05-13 15:36:27 UTC
Created attachment 343792
readonly-disk-backtrace.txt

stack trace obtained with a build of qemu-kvm-0.10.4 from git

Comment 20 Mark McLoughlin 2009-05-13 15:40:34 UTC
(gdb) p acb
$1 = (BlockDriverAIOCB *) 0xcb3010
(gdb) p *acb
$2 = {pool = 0x280000570108086, bs = 0x400001018000, cb = 0, opaque = 0x0, 
  next = 0xc001}
(gdb) p *acb->pool
Cannot access memory at address 0x280000570108086
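
The garbage in *acb would be consistent with a control block that had already been released or reused by the time ide_dma_cancel ran, though that is only a guess from the dump above. Here is a minimal sketch of the usual defence: the completion path detaches the request from its owner before freeing it, so a later cancel finds nothing to do. All names are hypothetical and this is not the qemu implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct dma_state;

/* Hypothetical in-flight request; stands in for an AIO control block. */
struct aio_request {
    struct dma_state *owner;
};

/* Hypothetical device-side state with at most one request in flight. */
struct dma_state {
    struct aio_request *inflight;   /* NULL when nothing is pending */
    int last_result;
    int cancel_count;
};

/* Start a transfer; returns 0 on success, -1 if allocation fails. */
static int dma_start(struct dma_state *s)
{
    assert(s->inflight == NULL);
    struct aio_request *req = malloc(sizeof(*req));
    if (req == NULL)
        return -1;
    req->owner = s;
    s->inflight = req;
    return 0;
}

/* Completion (success or error): detach from the owner, then free. */
static void dma_complete(struct aio_request *req, int result)
{
    struct dma_state *s = req->owner;
    s->last_result = result;
    s->inflight = NULL;             /* no stale pointer left behind */
    free(req);
}

/* Cancel is a no-op when the request has already completed. */
static void dma_cancel(struct dma_state *s)
{
    if (s->inflight == NULL)
        return;
    free(s->inflight);
    s->inflight = NULL;
    s->cancel_count++;
}
```

The point of the sketch is only the ordering: the owner's pointer is cleared on every completion path, including the error path that a failed write would take.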

Comment 21 Glauber Costa 2009-05-13 20:46:48 UTC
Rich,

you mentioned that this does not appear upstream.
Is that in qemu upstream, or qemu-kvm?
If this is qemu, this might be a kvm specific problem.

Otherwise, do you think you can identify the commit that causes the problem?

If it's too much of a pain, don't bother. Bisecting qemu is a PITA ;-(

Comment 22 Richard W.M. Jones 2009-05-13 22:16:16 UTC
Hi Glauber ... Yes, this *doesn't* appear in upstream QEMU or KVM, which is what I generally
use to test / use libguestfs.

I use both qemu from git, and KVM from F-12 (eg. 2:0.10.50-3.kvm85).

[Although KVM from F-12 has another annoying boot-time bug (bug 500564).]

I understand that bisecting is time-consuming.  However, maybe I can have a go
at it tomorrow, if you ping me on IRC and talk me through it (I've never used
git-bisect before).

Comment 23 Fedora Update System 2009-05-14 02:56:34 UTC
qemu-0.10.4-3.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update qemu'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-4954

Comment 24 Richard W.M. Jones 2009-05-14 07:36:27 UTC
I'm assuming here that -3 will fail in the same way, so I'll
put the status back to ASSIGNED.

Comment 25 Mark McLoughlin 2009-05-14 11:06:39 UTC
0.10.4 was supposed to contain fixes for this DMA AIO cancellation stuff, but it turns out only half of them were back-ported. I've submitted the rest of them to qemu-devel for the next stable release and cherry-picked them into F-11. They fix the problem for me.

* Thu May 14 2009 Mark McLoughlin <markmc> - 2:0.10.4-4
- Cherry pick more DMA AIO cancellation fixes from upstream (#497170)

Comment 26 Fedora Update System 2009-05-14 11:33:47 UTC
qemu-0.10.4-4.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-4.fc11

Comment 27 Richard W.M. Jones 2009-05-14 13:23:45 UTC
Mixed results with 0.10.4-4.

Good thing is that qemu-kvm doesn't crash.

Not so good is that it cannot mount any filesystem if
the block device is readonly, instead giving lengthy kernel
messages and eventually not being able to even read the
superblock.  (Note: this does work OK with the QEMU from
Rawhide).

Comment 28 Mark McLoughlin 2009-05-14 14:22:17 UTC
(In reply to comment #27)
> Mixed results with 0.10.4-4.
> 
> Good thing is that qemu-kvm doesn't crash.

Good. Please bump the update's karma

> Not so good is that it cannot mount any filesystem if
> the block device is readonly, instead giving lengthy kernel
> messages and eventually not being able to even read the
> superblock.  (Note: this does work OK with the QEMU from
> Rawhide).  

Hmm, it's working a bit better than that for me - if I boot a guest with a read-only image, the guest hangs late in boot for me, well after mounting filesystems.

Please file a new bug for this

Comment 29 Fedora Update System 2009-05-15 23:35:40 UTC
qemu-0.10.4-4.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update qemu'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-5050

Comment 30 Fedora Update System 2009-05-19 03:37:05 UTC
qemu-0.10.4-5.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-5.fc11

Comment 31 Fedora Update System 2009-05-20 00:52:52 UTC
qemu-0.10.4-4.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

