Bug 497170

Summary: qemu segfaults with read-only disk images
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: qemu
Assignee: Glauber Costa <gcosta>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: high
Version: rawhide
CC: dwmw2, frank.arnold, gcosta, markmc, mgoldman, virt-maint
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: 0.10.4-4.fc11
Doc Type: Bug Fix
Last Closed: 2009-05-19 20:53:12 EDT
Type: ---
Bug Blocks: 480594    
Attachments:
Description Flags
readonly-disk-backtrace.txt none

Description Richard W.M. Jones 2009-04-22 12:24:25 EDT
Description of problem:

When qemu is given a read-only -drive parameter, then certain operations
on that drive (such as mounting partitions from the drive) eventually
cause qemu to print this message:

raw_aio_remove: aio request not found!

and segfault.

Making the image writable removes the error.  *However* note that
I want the drive to be read-only.
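
For context, here is a minimal Python sketch of the host-side condition this report appears to involve: a file descriptor opened read-only rejects writes. It is not qemu code. The helper names `try_write_readonly` and `demo` and the scratch file standing in for the disk image are assumptions made for illustration, and this report does not establish that qemu reaches its crash through exactly this error.

```python
import errno
import os
import tempfile


def try_write_readonly(path):
    """Open path read-only, attempt a write, and return the errno (or None)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.write(fd, b"x")
    except OSError as exc:
        # The write is refused because the descriptor is not open for writing.
        return exc.errno
    finally:
        os.close(fd)
    return None


def demo():
    """Run the check against a scratch file (hypothetical stand-in for RHEL.img)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 512)
        path = tmp.name
    try:
        return try_write_readonly(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(errno.errorcode.get(demo()))
```

The point of the sketch is only that a write to a read-only backing file fails with an error rather than succeeding, so whatever issued the write has to handle that failure.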

Version-Release number of selected component (if applicable):

qemu 0.10-12.fc11.x86_64

How reproducible:

Very reliably, particularly with a RHEL guest image.

Steps to Reproduce:
1. chmod -w RHEL.img
2. qemu -drive file=RHEL.img
3. try mounting a filesystem inside qemu
  
Actual results:

segfaults with error

Expected results:

should not segfault

Additional info:
Comment 1 Richard W.M. Jones 2009-04-22 13:05:45 EDT
Here is a realistic reproducer and stack trace:

$ gdb --args qemu-system-x86_64 -drive file=/dev/mapper/Guests-RHEL39FV32 -m 384 -kernel /usr/lib64/guestfs/vmlinuz.fedora-10.x86_64 -initrd /usr/lib64/guestfs/initramfs.fedora-10.x86_64.img -append "console=ttyS0" -nographic -serial stdio

bash-3.2# mount /dev/sda1 /
[Thread 0x7fa52eafa910 (LWP 31843) exited]
[New Thread 0x7fa52eafa910 (LWP 31845)]
[Thread 0x7fa52eafa910 (LWP 31845) exited]
raw_aio_remove: aio request not found!
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd ca/00:02:4d:00:00/00:00:00:00:00/e0 tag 0 dma 1024 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: link is slow to respond, please be patient (ready=0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting link
ata1.00: configured for MWDMA2
ata1: EH complete
raw_aio_remove: aio request not found!
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd ca/00:02:4d:00:00/00:00:00:00:00/e0 tag 0 dma 1024 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: link is slow to respond, please be patient (ready=0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting link
ata1.00: configured for MWDMA2
ata1: EH complete
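
The guest kernel lines above can be decoded to see what the guest was asking the disk to do. The sketch below assumes the field order printed by the Linux libata error handler (command/feature:count:LBA low:LBA mid:LBA high, then five "HOB" bytes, then the device register) and a 28-bit LBA command. The function name `decode_taskfile` and the two-entry opcode table are illustrative assumptions, not part of any kernel or qemu interface.

```python
import re

# Two opcodes from the ATA command set; enough for this log.
ATA_COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

# cmd CC/FF:NN:LL:MM:HH/<five HOB bytes>/DD
TASKFILE_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/"
    r"(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/"
    r"(?P<device>[0-9a-f]{2})"
)


def decode_taskfile(line):
    """Decode a libata 'cmd ...' log line into command name, sector count and LBA."""
    match = TASKFILE_RE.search(line)
    if match is None:
        raise ValueError("no libata taskfile found in line")
    regs = {key: int(value, 16) for key, value in match.groupdict().items()}
    # 28-bit LBA: low nibble of the device register holds bits 24..27.
    lba = (
        ((regs["device"] & 0x0F) << 24)
        | (regs["lbah"] << 16)
        | (regs["lbam"] << 8)
        | regs["lbal"]
    )
    return {
        "command": ATA_COMMANDS.get(regs["cmd"], "unknown"),
        # A count of 0 means 256 sectors for 28-bit commands.
        "sectors": regs["nsect"] or 256,
        "lba": lba,
    }


if __name__ == "__main__":
    line = "ata1.00: cmd ca/00:02:4d:00:00/00:00:00:00:00/e0 tag 0 dma 1024 out"
    print(decode_taskfile(line))
```

Under those assumptions the timed-out command is a two-sector WRITE DMA, which matches the "dma 1024 out" in the same log line (2 x 512 bytes) and suggests that the mount attempt issues a write to the read-only drive.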

Program received signal SIGSEGV, Segmentation fault.
0x000000000041a6b5 in qemu_paio_cancel (fd=<value optimized out>, 
    aiocb=0x21ef6f0) at posix-aio-compat.c:235
235	        TAILQ_REMOVE(&request_list, aiocb, node);
Missing separate debuginfos, use: debuginfo-install SDL-1.2.13-8.fc11.x86_64 gdbm-1.8.0-31.fc11.x86_64 libICE-1.0.4-7.fc11.x86_64 libSM-1.1.0-4.fc11.x86_64 libXext-1.0.99.1-2.fc11.x86_64 libXtst-1.0.3-5.fc11.x86_64 libasyncns-0.7-2.fc11.x86_64 libattr-2.4.43-3.fc11.x86_64 libcap-2.16-2.fc11.x86_64 pulseaudio-libs-0.9.15-3.test5.fc11.x86_64 tcp_wrappers-libs-7.6-54.fc11.x86_64
(gdb) bt
#0  0x000000000041a6b5 in qemu_paio_cancel (fd=<value optimized out>, 
    aiocb=0x21ef6f0) at posix-aio-compat.c:235
#1  0x000000000041b1a8 in raw_aio_cancel (blockacb=<value optimized out>)
    at block-raw-posix.c:682
#2  0x0000000000432930 in ide_dma_cancel (bm=0x22e5e60)
    at /usr/src/debug/qemu-kvm-0.10/qemu/hw/ide.c:2973
#3  0x0000000000432998 in bmdma_cmd_writeb (opaque=0x22e5e60, addr=0, val=0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/hw/ide.c:2987
#4  0x00000000004074db in cpu_outb (env=0x21b0e80, addr=0, val=0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/vl.c:453
#5  0x000000004271b632 in ?? ()
#6  0x0000000017897108 in ?? ()
#7  0xffffffff81221000 in ?? ()
#8  0x00000000021b0e80 in ?? ()
#9  0x00000000004be275 in phys_page_find (index=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10/qemu/exec.c:389
#10 tlb_set_page_exec (index=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10/qemu/exec.c:1983
#11 0x000000000052bfe3 in tlb_fill (addr=0, is_write=<value optimized out>, 
    mmu_idx=<value optimized out>, retaddr=0x0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/target-i386/op_helper.c:4774
#12 0x00000000004c0c12 in __ldb_cmmu (addr=18446744071581080137, mmu_idx=0)
    at /usr/src/debug/qemu-kvm-0.10/qemu/softmmu_template.h:135
#13 0x00000000004c456b in cpu_x86_exec (env1=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10/qemu/cpu-exec.c:626
#14 0x000000000040ca4c in main_loop ()
    at /usr/src/debug/qemu-kvm-0.10/qemu/vl.c:3862
#15 main () at /usr/src/debug/qemu-kvm-0.10/qemu/vl.c:6126
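
The "aio request not found!" message followed by a crash in TAILQ_REMOVE is consistent with a cancel path unlinking a request that is no longer on the pending list. That reading is an assumption, not something confirmed in this report. The toy model below is not qemu's code: `Request`, `RequestList`, and the `queued` flag are hypothetical names, and Python's AttributeError on None stands in for the invalid pointer dereference a C program would hit.

```python
class Request:
    """One pending I/O request with intrusive list links."""

    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None
        self.queued = False


class RequestList:
    """Doubly linked list with a sentinel head, loosely modelled on a TAILQ."""

    def __init__(self):
        self.head = Request("<head>")
        self.head.prev = self.head
        self.head.next = self.head

    def append(self, req):
        tail = self.head.prev
        tail.next = req
        req.prev = tail
        req.next = self.head
        self.head.prev = req
        req.queued = True

    def unlink(self, req):
        # Unconditional unlink: trusts that req is currently on the list.
        req.prev.next = req.next
        req.next.prev = req.prev
        req.prev = None
        req.next = None
        req.queued = False

    def cancel(self, req):
        """Guarded cancel: only unlink requests that are still queued."""
        if not req.queued:
            return False
        self.unlink(req)
        return True

    def names(self):
        out = []
        node = self.head.next
        while node is not self.head:
            out.append(node.name)
            node = node.next
        return out


if __name__ == "__main__":
    pending = RequestList()
    a, b = Request("a"), Request("b")
    pending.append(a)
    pending.append(b)
    print(pending.cancel(a), pending.cancel(a), pending.names())
```

Calling `unlink` a second time on the same request fails because its links are already cleared, while `cancel` turns the second call into a harmless no-op. A real fix would also have to decide what to report to the caller when the request has already completed.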
Comment 2 Richard W.M. Jones 2009-04-22 14:14:47 EDT
Good news.

I have tested this with the latest qemu from svn (r7228), and the bug appears
to be fixed.

I have some more testing to do, and if that works I will close the bug.
Comment 3 Richard W.M. Jones 2009-04-22 14:34:45 EDT
Yes, this is looking good with qemu from svn.
Comment 4 Richard W.M. Jones 2009-04-23 09:01:03 EDT
Reopening and blocking F11VirtTarget.
Comment 5 Mark McLoughlin 2009-04-23 12:18:33 EDT
These patches are waiting to be pulled into the qemu stable branch and should fix it:

http://lists.gnu.org/archive/html/qemu-devel/2009-04/msg01276.html
Comment 6 Fedora Admin XMLRPC Client 2009-05-07 08:12:47 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 7 Fedora Admin XMLRPC Client 2009-05-07 08:13:48 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 8 Fedora Admin XMLRPC Client 2009-05-07 08:14:13 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 9 Fedora Admin XMLRPC Client 2009-05-07 13:58:35 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 10 Mark McLoughlin 2009-05-11 13:41:14 EDT
*** Bug 500185 has been marked as a duplicate of this bug. ***
Comment 11 Fedora Update System 2009-05-13 07:59:08 EDT
qemu-0.10.4-2.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-2.fc11
Comment 12 Mark McLoughlin 2009-05-13 08:07:50 EDT
Rich could you try qemu-0.10.4-2.fc11 and bump its karma if it fixes your problem?
Comment 13 Richard W.M. Jones 2009-05-13 10:11:46 EDT
Installing qemu-system-x86-0.10.4-2.fc11.x86_64.rpm gives:

Usage:  {start|stop|status|restart|condrestart}
warning: %post(qemu-system-x86-2:0.10.4-2.fc11.x86_64) scriptlet failed, exit status 1

rpm -V qemu-system-x86 gives no output which I assume means
that the failing script (/etc/sysconfig/modules/kvm.modules) is the
same as the one in the package.  I certainly haven't consciously
edited this file ever.
Comment 14 Richard W.M. Jones 2009-05-13 10:32:28 EDT
I'm afraid to say this new qemu doesn't solve this problem,
although it fails in a different way (it now just abruptly
segfaults when I do the same test).  Sorry :-(
Comment 15 Mark McLoughlin 2009-05-13 10:33:27 EDT
(In reply to comment #13)
> Installing qemu-system-x86-0.10.4-2.fc11.x86_64.rpm gives:
> 
> Usage:  {start|stop|status|restart|condrestart}
> warning: %post(qemu-system-x86-2:0.10.4-2.fc11.x86_64) scriptlet failed, exit
> status 1

Thanks Rich, I should have noticed that. Mixup between %{source1} and %{source2}

qemu-0.10.4-3.fc11 is coming:

https://koji.fedoraproject.org/koji/buildinfo?buildID=102024
Comment 16 Mark McLoughlin 2009-05-13 10:34:10 EDT
(In reply to comment #14)
> I'm afraid to say this new qemu doesn't solve this problem,
> although it fails in a different way (it now just abruptly
> segfaults when I do the same test).  Sorry :-(

Thanks for trying; backtrace?
Comment 17 Fedora Update System 2009-05-13 11:07:04 EDT
qemu-0.10.4-3.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-3.fc11
Comment 18 Richard W.M. Jones 2009-05-13 11:32:26 EDT
Here's the command I'm using:

gdb --args /usr/bin/qemu-kvm -drive file=/dev/mapper/Guests-RHEL39FV32 -m 384 -no-reboot -kernel vmlinuz.rawhide.x86_64 -initrd initramfs.rawhide.x86_64.img -append 'panic=1 console=ttyS0 guestfs=10.0.2.4:6666 guestfs_verbose=1' -nographic -serial stdio -net channel,6666:unix:/tmp/sock,server,nowait -net user,vlan=0 -net nic,model=virtio,vlan=0

The initramfs in this case is modified so it gives me a shell inside the guest.  At the
shell I do:

mount /dev/sda1 /

(note that I only have *read* access, not write access, to the guest block device).

KVM segfaults about 10-20 seconds after the mount command.

(gdb) thread apply all bt

Thread 2 (Thread 0x7ff55eee0910 (LWP 3815)):
#0  0x000000000046cb43 in bdrv_aio_cancel (acb=0x1864010) at block.c:1471
#1  0x0000000000434140 in ide_dma_cancel (bm=0x1864e60)
    at /usr/src/debug/qemu-kvm-0.10.4/hw/ide.c:2973
#2  0x00000000004341a8 in bmdma_cmd_writeb (opaque=0x1864e60, addr=49152, 
    val=0) at /usr/src/debug/qemu-kvm-0.10.4/hw/ide.c:2987
#3  0x000000000051ed88 in kvm_outb (opaque=<value optimized out>, addr=49152, 
    data=0 '\0') at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:684
#4  0x000000000054c249 in handle_io (vcpu=<value optimized out>, 
    run=<value optimized out>, kvm=<value optimized out>) at libkvm.c:735
#5  kvm_run (vcpu=<value optimized out>, run=<value optimized out>, 
    kvm=<value optimized out>) at libkvm.c:964
#6  0x000000000051f569 in kvm_cpu_exec (env=0x0)
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:205
#7  0x000000000051f850 in kvm_main_loop_cpu (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:414
#8  ap_main_loop (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:451
#9  0x00000030fca0687a in start_thread () from /lib64/libpthread.so.0
#10 0x00000030fbee04cd in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ff5784f3740 (LWP 3812)):
#0  0x00000030fbed9092 in select () from /lib64/libc.so.6
#1  0x0000000000409c33 in qemu_select (tv=<value optimized out>, 
    xfds=<value optimized out>, wfds=<value optimized out>, 
    rfds=<value optimized out>, max_fd=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.4/vl.c:3669
#2  main_loop_wait (tv=<value optimized out>, xfds=<value optimized out>, 
    wfds=<value optimized out>, rfds=<value optimized out>, 
    max_fd=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.4/vl.c:3768
#3  0x000000000051f02a in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.10.4/qemu-kvm.c:596
#4  0x000000000040e9c4 in main_loop ()
    at /usr/src/debug/qemu-kvm-0.10.4/vl.c:3831
#5  main () at /usr/src/debug/qemu-kvm-0.10.4/vl.c:6127
Comment 19 Mark McLoughlin 2009-05-13 11:36:27 EDT
Created attachment 343792 [details]
readonly-disk-backtrace.txt

stack trace obtained with a build of qemu-kvm-0.10.4 from git
Comment 20 Mark McLoughlin 2009-05-13 11:40:34 EDT
(gdb) p acb
$1 = (BlockDriverAIOCB *) 0xcb3010
(gdb) p *acb
$2 = {pool = 0x280000570108086, bs = 0x400001018000, cb = 0, opaque = 0x0, 
  next = 0xc001}
(gdb) p *acb->pool
Cannot access memory at address 0x280000570108086
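
An observation on the values printed above, offered as a sketch rather than a diagnosis: read as raw little-endian bytes (the host is x86_64), the first fields of `*acb` decode cleanly as the start of a PCI configuration header. The field offsets follow the standard PCI type 0 header; the assumption that the `next` field lines up with BAR4 relies on the five pointer-sized struct members shown by gdb being laid out back to back. The function name `decode_as_pci_header` is hypothetical.

```python
import struct


def decode_as_pci_header(pool, bs, next_field):
    """Reinterpret three 64-bit struct fields as PCI config space bytes."""
    raw = struct.pack("<QQ", pool, bs)  # bytes 0x00..0x0f of the structure
    (vendor_id, device_id, command, status,
     revision, prog_if, subclass, base_class,
     cache_line, latency, header_type, bist) = struct.unpack("<HHHHBBBBBBBB", raw)
    bar4 = next_field & 0xFFFFFFFF  # offset 0x20 in both layouts
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "class_code": (base_class, subclass, prog_if),
        "header_type": header_type,
        "bar4_is_io": bool(bar4 & 1),      # bit 0 set marks an I/O space BAR
        "bar4_io_base": bar4 & ~0x3,       # low two bits are flags for I/O BARs
    }


if __name__ == "__main__":
    header = decode_as_pci_header(0x280000570108086, 0x400001018000, 0xC001)
    for key, value in header.items():
        print(key, value)
```

With the values from this comment, the vendor ID comes out as 0x8086 (Intel), the device ID as 0x7010, the class code as mass storage / IDE, and the I/O base as 0xc000, which is the same port (49152) passed to bmdma_cmd_writeb in the comment 18 backtrace. That would fit `acb` pointing at the IDE controller's PCI state instead of a live request, but the thread does not confirm this.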
Comment 21 Glauber Costa 2009-05-13 16:46:48 EDT
Rich,

you mentioned that this does not appear upstream.
Is that in qemu upstream, or qemu-kvm?
If this is qemu, this might be a kvm specific problem.

Otherwise, do you think you can identify the commit that causes the problem?

If it gives you too much of a pain, don't bother. bisecting qemu is a PITA ;-(
Comment 22 Richard W.M. Jones 2009-05-13 18:16:16 EDT
Hi Glauber ... Yes, this *doesn't* appear in upstream QEMU or KVM, which is what I generally
use to test / use libguestfs.

I use both qemu from git, and KVM from F-12 (eg. 2:0.10.50-3.kvm85).

[Although KVM from F-12 has another annoying boot-time bug (bug 500564).]

I understand that bisecting is time-consuming.  However, maybe I can have a go
at it tomorrow, if you ping me on IRC and talk me through it (I've never used
git-bisect before).
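
For anyone else who has not bisected before, the idea is a binary search over the commit history. The sketch below is a simplified model of that search, not of git itself: the commit names and the `is_bad` predicate are hypothetical, and it assumes the history is linear, the first commit is good, the last is bad, and the behaviour flips exactly once.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Precondition (assumed, not checked): commits[0] is good, commits[-1] is
    bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = ["c%d" % i for i in range(10)]  # hypothetical commit names
    tested = []

    def is_bad(commit):
        # Stand-in for "build this commit and run the mount test".
        tested.append(commit)
        return int(commit[1:]) >= 6

    print(first_bad(history, is_bad), tested)
```

Each test halves the remaining range, so the number of builds grows with the logarithm of the number of commits. Since upstream works here and 0.10 does not, the same search with the predicate inverted would look for the commit that fixed the problem rather than the one that introduced it.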
Comment 23 Fedora Update System 2009-05-13 22:56:34 EDT
qemu-0.10.4-3.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update qemu'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-4954
Comment 24 Richard W.M. Jones 2009-05-14 03:36:27 EDT
I'm assuming here that -3 will fail in the same way, so I'll
put the status back to ASSIGNED.
Comment 25 Mark McLoughlin 2009-05-14 07:06:39 EDT
0.10.4 was supposed to contain fixes for this DMA AIO cancellation stuff, but it turns out only half of them were back-ported. I've submitted the rest of them to qemu-devel for the next stable release and cherry-picked them into F-11. They fix the problem for me.

* Thu May 14 2009 Mark McLoughlin <markmc@redhat.com> - 2:0.10.4-4
- Cherry pick more DMA AIO cancellation fixes from upstream (#497170)
Comment 26 Fedora Update System 2009-05-14 07:33:47 EDT
qemu-0.10.4-4.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-4.fc11
Comment 27 Richard W.M. Jones 2009-05-14 09:23:45 EDT
Mixed results with 0.10.4-4.

Good thing is that qemu-kvm doesn't crash.

Not so good is that it cannot mount any filesystem if
the block device is readonly, instead giving lengthy kernel
messages and eventually not being able to even read the
superblock.  (Note: this does work OK with the QEMU from
Rawhide).
Comment 28 Mark McLoughlin 2009-05-14 10:22:17 EDT
(In reply to comment #27)
> Mixed results with 0.10.4-4.
> 
> Good thing is that qemu-kvm doesn't crash.

Good. Please bump the update's karma

> Not so good is that it cannot mount any filesystem if
> the block device is readonly, instead giving lengthy kernel
> messages and eventually not being able to even read the
> superblock.  (Note: this does work OK with the QEMU from
> Rawhide).  

Hmm, it's working a bit better than that for me - if I boot a guest with a read-only image, the guest hangs late in boot for me, well after mounting filesystems.

Please file a new bug for this
Comment 29 Fedora Update System 2009-05-15 19:35:40 EDT
qemu-0.10.4-4.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update qemu'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-5050
Comment 30 Fedora Update System 2009-05-18 23:37:05 EDT
qemu-0.10.4-5.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/qemu-0.10.4-5.fc11
Comment 31 Fedora Update System 2009-05-19 20:52:52 EDT
qemu-0.10.4-4.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.