Bug 1255144 - qemu-kvm Segmentation fault in ahci_write_fis_d2h ahci.c
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: qemu
Version: 22
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-08-19 14:21 EDT by Dick Marinus
Modified: 2015-09-08 10:45 EDT

Doc Type: Bug Fix
Last Closed: 2015-09-05 14:48:06 EDT
Type: Bug

Attachments
BZ1255144-gdb-bt-full.txt (2.35 KB, text/plain)
2015-09-01 15:06 EDT, Dick Marinus

Description Dick Marinus 2015-08-19 14:21:34 EDT
Description of problem:


Version-Release number of selected component (if applicable):
qemu-kvm-2.3.1-1.fc22.x86_64


How reproducible:


Steps to Reproduce:
1. Install OS X Yosemite using these instructions http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/
2. Do some CPU intensive task like compressing a large file

Actual results:
qemu-kvm crashes with a segmentation fault

Additional info:

gdb backtrace:
Program received signal SIGSEGV, Segmentation fault.
ahci_write_fis_d2h (ad=0x55555794e8b0, cmd_fis=cmd_fis@entry=0x0) at hw/ide/ahci.c:728
728             uint64_t tbl_addr = le64_to_cpu(ad->cur_cmd->tbl_addr);


(gdb) bt
#0  0x00005555557969e9 in ahci_write_fis_d2h (ad=0x55555794e8b0, cmd_fis=cmd_fis@entry=0x0) at hw/ide/ahci.c:728
#1  0x0000555555796a4f in ahci_cmd_done (cmd_fis=0x0, ad=0x55555794e8b0) at hw/ide/ahci.c:1291
#2  0x0000555555796a4f in ahci_cmd_done (dma=0x55555794e8b0) at hw/ide/ahci.c:1284
#3  0x0000555555791437 in ide_flush_cb (s=0x55555794e968) at hw/ide/core.c:485
#4  0x0000555555791437 in ide_flush_cb (opaque=0x55555794e968, ret=<optimized out>) at hw/ide/core.c:924
#5  0x000055555585239e in bdrv_co_em_bh (opaque=0x7fffcc2f72d0) at block.c:4929
#6  0x000055555584c334 in aio_bh_poll (ctx=ctx@entry=0x5555562cd660) at async.c:85
#7  0x000055555585ad10 in aio_dispatch (ctx=0x5555562cd660) at aio-posix.c:137
#8  0x000055555584c1be in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
    at async.c:219
#9  0x00007ffff64c9a8a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#10 0x0000555555859943 in main_loop_wait () at main-loop.c:200
#11 0x0000555555859943 in main_loop_wait (timeout=<optimized out>) at main-loop.c:245
#12 0x0000555555859943 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:494
#13 0x000055555560366b in main () at vl.c:1798
#14 0x000055555560366b in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4362
Comment 1 Cole Robinson 2015-08-31 16:59:47 EDT
jsnow, this look familiar at all?
Comment 2 John Snow 2015-08-31 18:30:08 EDT
It looks like ad->cur_cmd is possibly invalid, which was a case that occurred during migration, and should be disabled for 2.3.x. This is a variable that I suspect is prone to races and I tried to reduce its usage for 2.4.0, so newer versions might actually alleviate this crash.

It looks like the stack trace here has some optimizations compiled in, so it's strange to read. If it's at all possible, I recommend building QEMU from upstream source ( https://github.com/qemu/qemu/archive/v2.3.1.tar.gz ) and compiling with something like

# ./configure --target-list=x86_64-softmmu --enable-debug

Then you can use the qemu-system-x86_64 binary and attempt to reproduce the crash. Try "bt full" and "print ad->cur_cmd" in the frame where it faults.

You can also try upgrading to the 2.4 version, but I'm curious about the nature of the crash in the 2.3.1 version in case it is exposing a problem elsewhere in the code.

I'm pretty interested to see your results either way, since I don't have an OSX image to test with.
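
For anyone following along, the build-and-debug steps above could be scripted roughly as follows. This is only a sketch: the function names, the `QEMU_BIN` variable, and its default path are assumptions, and the `handle SIGUSR1` line assumes QEMU uses that signal internally, so that gdb does not stop on it before the real crash. Drop that line if it does not apply to your setup.

```shell
#!/bin/sh
# Sketch only: QEMU_BIN and both function names are hypothetical.
# The default path assumes an in-tree build of the x86_64-softmmu target.
QEMU_BIN="${QEMU_BIN:-./x86_64-softmmu/qemu-system-x86_64}"

# Configure and build a debug binary; run this from the unpacked source tree.
build_debug_qemu() {
    ./configure --target-list=x86_64-softmmu --enable-debug && make
}

# Start the guest under gdb in batch mode. When the process stops on the
# SIGSEGV, gdb prints a full backtrace and the value of ad->cur_cmd.
# The print only succeeds if the faulting frame has a variable named "ad".
run_under_gdb() {
    gdb -batch \
        -ex 'handle SIGUSR1 nostop noprint pass' \
        -ex run \
        -ex 'bt full' \
        -ex 'print ad->cur_cmd' \
        --args "$QEMU_BIN" "$@"
}
```

Call `run_under_gdb` with whatever guest arguments you normally pass to qemu-kvm, then reproduce the crash inside the guest.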
Comment 3 Dick Marinus 2015-09-01 15:06:25 EDT
Created attachment 1069114
BZ1255144-gdb-bt-full.txt

The backtrace completely changed, but the crash is reproducible using qemu-2.3.1 compiled with:

'./configure' '--target-list=x86_64-softmmu' '--enable-debug' '--enable-kvm' '--enable-spice' '--prefix=/home/meeuw/git/qemu/'

print ad->cur_cmd obviously doesn't work for this backtrace.

I tried qemu (git) master a few weeks ago and it didn't segfault, but OS X crashed (froze) anyway.
Comment 4 Dick Marinus 2015-09-05 14:48:44 EDT
This bug seems to be related to e1000-82545em and slirp. Sorry for the fuss.
Comment 5 John Snow 2015-09-08 10:45:35 EDT
(In reply to Dick Marinus from comment #4)
> This bug seems to be related to e1000-82545em and slirp. Sorry for the fuss.

I tried asking upstream, but the stacktrace didn't make a lot of sense to the SLIRP maintainers either.

If you run into it again, try "thread apply all bt full" to see if one of the other threads is the one actually segfaulting, so we can see what's really going on.

No worries on the noise.
--js
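
A minimal sketch of capturing that all-threads backtrace to a file, so it can be attached to the bug. The function name, `QEMU_BIN`, and `BT_LOG` (with their defaults) are assumptions, and the `handle SIGUSR1` line assumes QEMU uses that signal internally.

```shell
#!/bin/sh
# Sketch only: the binary path and the log file name are hypothetical defaults.
QEMU_BIN="${QEMU_BIN:-./x86_64-softmmu/qemu-system-x86_64}"
BT_LOG="${BT_LOG:-qemu-all-threads-bt.txt}"

# Run QEMU under gdb; once it stops on the crash, write a full backtrace of
# every thread to BT_LOG. QEMU's own stdout and stderr land in the same file.
collect_all_thread_bt() {
    gdb -batch \
        -ex 'handle SIGUSR1 nostop noprint pass' \
        -ex run \
        -ex 'thread apply all bt full' \
        --args "$QEMU_BIN" "$@" > "$BT_LOG" 2>&1
}
```

If the faulting thread is not the one shown in the original trace, it should appear in this output alongside the others.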
