Red Hat Bugzilla – Bug 1255144
qemu-kvm Segmentation fault in ahci_write_fis_d2h ahci.c
Last modified: 2015-09-08 10:45:35 EDT
Description of problem:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install OS X Yosemite using these instructions http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/
2. Do some CPU intensive task like compressing a large file
qemu-kvm crashes with a segmentation fault
Program received signal SIGSEGV, Segmentation fault.
ahci_write_fis_d2h (ad=0x55555794e8b0, cmd_fis=cmd_fis@entry=0x0) at hw/ide/ahci.c:728
728 uint64_t tbl_addr = le64_to_cpu(ad->cur_cmd->tbl_addr);
#0 0x00005555557969e9 in ahci_write_fis_d2h (ad=0x55555794e8b0, cmd_fis=cmd_fis@entry=0x0) at hw/ide/ahci.c:728
#1 0x0000555555796a4f in ahci_cmd_done (cmd_fis=0x0, ad=0x55555794e8b0) at hw/ide/ahci.c:1291
#2 0x0000555555796a4f in ahci_cmd_done (dma=0x55555794e8b0) at hw/ide/ahci.c:1284
#3 0x0000555555791437 in ide_flush_cb (s=0x55555794e968) at hw/ide/core.c:485
#4 0x0000555555791437 in ide_flush_cb (opaque=0x55555794e968, ret=<optimized out>) at hw/ide/core.c:924
#5 0x000055555585239e in bdrv_co_em_bh (opaque=0x7fffcc2f72d0) at block.c:4929
#6 0x000055555584c334 in aio_bh_poll (ctx=ctx@entry=0x5555562cd660) at async.c:85
#7 0x000055555585ad10 in aio_dispatch (ctx=0x5555562cd660) at aio-posix.c:137
#8 0x000055555584c1be in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
#9 0x00007ffff64c9a8a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#10 0x0000555555859943 in main_loop_wait () at main-loop.c:200
#11 0x0000555555859943 in main_loop_wait (timeout=<optimized out>) at main-loop.c:245
#12 0x0000555555859943 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:494
#13 0x000055555560366b in main () at vl.c:1798
#14 0x000055555560366b in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4362
jsnow, does this look familiar at all?
It looks like ad->cur_cmd is possibly invalid. That was a case that occurred during migration, which should be disabled for 2.3.x. I suspect this variable is prone to races, and I tried to reduce its usage for 2.4.0, so newer versions might actually alleviate this crash.
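To illustrate the failure mode suspected here: line 728 in the trace reads ad->cur_cmd->tbl_addr, so an invalid ad->cur_cmd faults as soon as it is dereferenced. Below is a minimal sketch of guarding that kind of access. The types and the function name are hypothetical, simplified stand-ins, not QEMU's actual structures and not the fix that went into any release.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a command header; not QEMU's real type. */
typedef struct CmdHeaderSketch {
    uint64_t tbl_addr;
} CmdHeaderSketch;

/* Hypothetical stand-in for the per-port device state. */
typedef struct DeviceSketch {
    CmdHeaderSketch *cur_cmd;   /* assumed to be NULL when no command is in flight */
} DeviceSketch;

/*
 * Copies the table address into *out and returns true when a current
 * command exists; returns false instead of dereferencing a NULL pointer.
 */
static bool sketch_get_tbl_addr(const DeviceSketch *ad, uint64_t *out)
{
    if (ad == NULL || ad->cur_cmd == NULL) {
        return false;
    }
    *out = ad->cur_cmd->tbl_addr;
    return true;
}
```

A check like this only catches a NULL pointer. If cur_cmd is stale because of a race, the guard would not help, which is why the debug build and "print ad->cur_cmd" output are still needed to tell the two cases apart.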
It looks like the stack trace here has some optimizations compiled in, so it's strange to read. If it's at all possible, I recommend building QEMU from upstream source ( https://github.com/qemu/qemu/archive/v2.3.1.tar.gz ) and compiling with something like
# ./configure --target-list=x86_64-softmmu --enable-debug
Then you can use the qemu-system-x86_64 binary and attempt to reproduce the crash. Try "bt full" and "print ad->cur_cmd" in the frame where it faults.
You can also try upgrading to the 2.4 version, but I'm curious about the nature of the crash in the 2.3.1 version in case it is exposing a problem elsewhere in the code.
I'm pretty interested to see your results either way, since I don't have an OSX image to test with.
Created attachment 1069114
The backtrace changed completely, but the crash is reproducible using qemu-2.3.1 compiled with:
'./configure' '--target-list=x86_64-softmmu' '--enable-debug' '--enable-kvm' '--enable-spice' '--prefix=/home/meeuw/git/qemu/'
print ad->cur_cmd obviously doesn't work for this backtrace.
I tried qemu (git) master a few weeks ago; it didn't segfault, but OS X froze anyway.
This bug seems to be related to e1000-82545em and slirp. Sorry for the fuss.
(In reply to Dick Marinus from comment #4)
> This bug seems to be related to e1000-82545em and slirp. Sorry for the fuss.
I tried asking upstream, but the stacktrace didn't make a lot of sense to the SLIRP maintainers either.
If you run into it again, try "thread apply all bt full" to see if one of the other threads is the one actually segfaulting, so we can see what's really going on.
No worries on the noise.