Bug 487720 - qemu-kvm segfaults on startup in SDL_memcpyMMX/SSE
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: SDL
Version: rawhide
Hardware: x86_64 Linux
Priority: low
Severity: medium
Assigned To: Thomas Woerner
QA Contact: Fedora Extras Quality Assurance
Keywords: Patch
Duplicates: 487018 491131 494146 494449
Blocks: F11VirtBlocker
Reported: 2009-02-27 11:40 EST by Jay Fenlason
Modified: 2014-08-31 19:29 EDT

Doc Type: Bug Fix
Last Closed: 2009-04-08 05:54:18 EDT

Attachments
Proposed fix for the cpuid register clobbering problem (2.16 KB, patch)
2009-02-27 17:55 EST, Eduardo Habkost

Description Jay Fenlason 2009-02-27 11:40:15 EST
Description of problem:
Attempting to run qemu-kvm on my AMD x86_64 box (fenlason-lab4.boston.devel.redhat.com) exits with a segfault immediately after opening a window.

Version-Release number of selected component (if applicable):
kvm-84-1.fc11.x86_64

How reproducible:
Always

Steps to Reproduce:
1. ssh -Y root@fenlason-lab4.boston.devel.redhat.com  (ask me for the root password if needed)
2. cd /local/home/rawhide/root
3. qemu-kvm obsd44.img  (or any other image)
  
Actual results:
New window opens, segfault

Expected results:
A new window with a running VM in it.

Additional info:
Comment 1 Eduardo Habkost 2009-02-27 15:01:16 EST
It's crashing inside SDL. I don't yet see what is making 'src' point to an invalid address.

(gdb) bt full
#0  0x00000037cb217ef7 in SDL_memcpySSE () at src/video/SDL_blit.c:141
No locals.
#1  SDL_BlitCopy (info=<value optimized out>) at src/video/SDL_blit.c:172
        src = 0x4113e010 <Address 0x4113e010 out of bounds>
        dst = 0x1c77d90 ""
        w = <value optimized out>
        h = 15
        srcskip = <value optimized out>
        dstskip = <value optimized out>
#2  0x00000037cb217d3a in SDL_SoftBlit (src=0x1bea260, srcrect=<value optimized out>, dst=0x1b381b0, dstrect=0x7fff530b4820) at src/video/SDL_blit.c:97
        info = {s_pixels = 0x7f734113e010 "<unprintable bytes>", s_width = 720, s_height = 16, s_skip = 0, d_pixels = 0x1c77d90 "", d_width = 720, d_height = 16, d_skip = 0,
          aux_data = 0x0, src = 0x1bea2c0, table = 0x0, dst = 0x1bdfd70}
        okay = 1
        src_locked = <value optimized out>
        dst_locked = 1
#3  0x00000037cb22e0dc in SDL_LowerBlit (src=0x1bea260, srcrect=0x7fff530b47d0, dst=0xb40, dstrect=0xb40) at src/video/SDL_surface.c:440
        do_blit = 0x1c77d90
        hw_srcrect = {x = 2880, y = 0, w = 0, h = 0}
        hw_dstrect = {x = 720, y = 0, w = 0, h = 0}
#4  0x00000037cb22e2b7 in SDL_UpperBlit (src=0x1c77d90, srcrect=<value optimized out>, dst=0xb40, dstrect=0xb40) at src/video/SDL_surface.c:530
        sr = {x = 0, y = 0, w = 720, h = 16}
        fulldst = {x = 0, y = 0, w = 0, h = 0}
        srcx = 1
        srcy = 0
        w = 29851024
        h = <value optimized out>
#5  0x00000000004927cf in sdl_update (ds=<value optimized out>, x=0, y=0, w=720, h=<value optimized out>) at sdl.c:64
        rec = {x = 0, y = 0, w = 720, h = 16}
#6  0x0000000000000000 in ?? ()
No symbol table info available.
Comment 2 Eduardo Habkost 2009-02-27 16:28:36 EST
Found the issue:

At this point, %rbx carries the 'src' value, which will be passed to SDL_memcpySSE(). After the call to SDL_HasSSE(), %rbx is corrupted: its upper 32 bits are cleared (0x7fffec3a2010 becomes 0xec3a2010 in the session below). I don't know whether it is a gcc issue or an issue in some asm code inside SDL_HasSSE().
 
(gdb)
169             if(SDL_HasSSE())
3: /x $rbx = 0x7fffec3a2010
1: x/10i $rip
0x127e6a <SDL_BlitCopy+58>:     callq  0x118100 <SDL_HasSSE@plt>
0x127e6f <SDL_BlitCopy+63>:     test   %eax,%eax
0x127e71 <SDL_BlitCopy+65>:     mov    0x20(%rsp),%edx
0x127e75 <SDL_BlitCopy+69>:     je     0x127f74 <SDL_BlitCopy+324>
0x127e7b <SDL_BlitCopy+75>:     test   %ebp,%ebp
0x127e7d <SDL_BlitCopy+77>:     je     0x127f63 <SDL_BlitCopy+307>
0x127e83 <SDL_BlitCopy+83>:     lea    0x7(%rdx),%r8d
0x127e87 <SDL_BlitCopy+87>:     test   %edx,%edx
0x127e89 <SDL_BlitCopy+89>:     movslq 0x34(%rsp),%rcx
0x127e8e <SDL_BlitCopy+94>:     lea    -0x1(%rbp),%r14d
(gdb) fr
#0  SDL_BlitCopy (info=<value optimized out>) at src/video/SDL_blit.c:169
169             if(SDL_HasSSE())
(gdb) ni
0x0000000000127e6f      169             if(SDL_HasSSE())
3: /x $rbx = 0xec3a2010
1: x/10i $rip
0x127e6f <SDL_BlitCopy+63>:     test   %eax,%eax
0x127e71 <SDL_BlitCopy+65>:     mov    0x20(%rsp),%edx
0x127e75 <SDL_BlitCopy+69>:     je     0x127f74 <SDL_BlitCopy+324>
0x127e7b <SDL_BlitCopy+75>:     test   %ebp,%ebp
0x127e7d <SDL_BlitCopy+77>:     je     0x127f63 <SDL_BlitCopy+307>
0x127e83 <SDL_BlitCopy+83>:     lea    0x7(%rdx),%r8d
0x127e87 <SDL_BlitCopy+87>:     test   %edx,%edx
0x127e89 <SDL_BlitCopy+89>:     movslq 0x34(%rsp),%rcx
0x127e8e <SDL_BlitCopy+94>:     lea    -0x1(%rbp),%r14d
0x127e92 <SDL_memcpySSE>:       mov    %edx,%r9d


I am running SDL-1.2.13-7.fc11.x86_64.


Tip: to reproduce the bug more easily under gdb without getting KVM involved (the KVM-specific threads sometimes confuse gdb), run:

$ dd if=/dev/zero of=/tmp/zero.img bs=1M count=20
$ qemu-kvm -no-kvm /tmp/zero.img
Comment 3 Eduardo Habkost 2009-02-27 17:13:25 EST
This is where the problem happens:

static __inline__ int CPU_getCPUIDFeatures(void)
{
        int features = 0;
#if defined(__GNUC__) && ( defined(i386) || defined(__x86_64__) )
        __asm__ (
"        movl    %%ebx,%%edi\n"
"        xorl    %%eax,%%eax         # Set up for CPUID instruction    \n"
"        cpuid                       # Get and save vendor ID          \n"
"        cmpl    $1,%%eax            # Make sure 1 is valid input for CPUID\n"
"        jl      1f                  # We dont have the CPUID instruction\n"
"        xorl    %%eax,%%eax                                           \n"
"        incl    %%eax                                                 \n"
"        cpuid                       # Get family/model/stepping/features\n"
"        movl    %%edx,%0                                              \n"
"1:                                                                    \n"
"        movl    %%edi,%%ebx\n"
        : "=m" (features)
        :
        : "%eax", "%ecx", "%edx", "%edi"
        );
[...]

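It only saves and restores the lower 32 bits of %rbx (that is, %ebx). On x86_64, writing %ebx zeroes the upper 32 bits of %rbx, so the caller's 64-bit value is lost.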
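
Note: a minimal sketch of the effect described in this comment, written in plain C rather than inline asm so that it runs on any machine. The helper name save_restore_low32 is hypothetical and not part of SDL. It only models the two movl instructions, under the assumption (true on x86_64) that a 32-bit register write zero-extends into the full 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not SDL code: models a 64-bit register whose
 * low 32 bits are saved before cpuid and written back afterwards. */
static uint64_t save_restore_low32(uint64_t reg)
{
    /* movl %ebx,%edi : only the low half of the register is saved */
    uint32_t saved = (uint32_t)reg;

    /* cpuid would overwrite %ebx here */

    /* movl %edi,%ebx : the 32-bit write zero-extends, so the upper
     * half of the 64-bit register is now 0 */
    return (uint64_t)saved;
}
```

With the %rbx value from the gdb session in comment 2, save_restore_low32(0x7fffec3a2010) gives 0xec3a2010, which matches the value gdb shows after the call to SDL_HasSSE(). The same relation holds between s_pixels (0x7f734113e010) and src (0x4113e010) in the backtrace in comment 1.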
Comment 4 Eduardo Habkost 2009-02-27 17:55:02 EST
Created attachment 333551
Proposed fix for the cpuid register clobbering problem
Comment 5 Eduardo Habkost 2009-02-28 12:37:15 EST
Patch posted upstream: http://lists.libsdl.org/pipermail/sdl-libsdl.org/2009-February/068912.html
Comment 6 Eduardo Habkost 2009-03-01 12:07:15 EST
*** Bug 487018 has been marked as a duplicate of this bug. ***
Comment 7 Tom London 2009-03-01 13:08:31 EST
I can confirm that the patch to SDL makes qemu-kvm "work for me".

Thanks.....
Comment 8 Mace Moneta 2009-03-11 14:03:21 EDT
The mailing list indicates that this has been fixed in the upstream SDL 1.3:

http://lists.libsdl.org/pipermail/sdl-libsdl.org/2009-March/068931.html
Comment 9 Eduardo Habkost 2009-03-11 14:22:55 EDT
Adding to F11Blocker.
Comment 10 Mark McLoughlin 2009-03-20 12:34:04 EDT
Nice catch Eduardo; moving to F11VirtBlocker
Comment 11 Mark McLoughlin 2009-03-20 13:50:30 EDT
*** Bug 491131 has been marked as a duplicate of this bug. ***
Comment 12 Joachim Frieben 2009-04-04 07:55:02 EDT
Issue also applies to the latest vintage package qemu-0.10-5.fc11.x86_64:

$ qemu -m 512 -boot d -cdrom ./jaunty-desktop-i386.iso -localtime -monitor stdio -no-kqemu

crashes, and 'dmesg' reports

qemu[4566]: segfault at 876f1010 ip 0000003fd0417f07 sp 00007fffb82f8590 error 4 in libSDL-1.2.so.0.11.2[3fd0400000+6b000]
qemu[4610]: segfault at 5ffdf010 ip 0000003fd0417f07 sp 00007fff90bc3c70 error 4 in libSDL-1.2.so.0.11.2[3fd0400000+6b000]
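
Note: a small sketch for reading the two dmesg lines above. The helper names fault_offset and looks_truncated are made up for illustration. The first subtracts the mapping base (the number in brackets) from ip to get the offset of the faulting instruction inside libSDL. The second checks whether a fault address fits in 32 bits, which is only a hint, since a legitimate low address would pass the same check.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the faulting instruction inside the mapped library:
 * ip minus the mapping base printed in brackets by the kernel. */
static uint64_t fault_offset(uint64_t ip, uint64_t map_base)
{
    return ip - map_base;
}

/* Nonzero when the address has no bits set above bit 31, which is how
 * a 64-bit pointer looks after its upper half has been zeroed. */
static int looks_truncated(uint64_t fault_addr)
{
    return (fault_addr >> 32) == 0;
}
```

For both lines, fault_offset(0x3fd0417f07, 0x3fd0400000) is 0x17f07, so the two crashes happen at the same instruction in libSDL. Both fault addresses (0x876f1010 and 0x5ffdf010) fit in 32 bits, which is consistent with the truncated 'src' pointer analysed in comments 2 and 3.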
Comment 13 Mark McLoughlin 2009-04-06 05:23:09 EDT
*** Bug 494146 has been marked as a duplicate of this bug. ***
Comment 14 Mark McLoughlin 2009-04-06 05:28:38 EDT
*** Bug 494075 has been marked as a duplicate of this bug. ***
Comment 15 Thomas Woerner 2009-04-07 10:40:43 EDT
Is the fix in SDL-1.3 sufficient to fix this problem?
Comment 16 Eduardo Habkost 2009-04-07 10:48:38 EDT
(In reply to comment #15)
> Is the fix in SDL-1.3 sufficient to fix this problem?  

Yes. The code in SDL SVN trunk should fix the issue too, because it has a new #ifdef block that makes it use %rbx/%rdi on x86_64.
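
Note: a rough sketch of what a 64-bit-safe variant could look like, assuming GCC-style inline asm on x86_64. This is neither the attached patch nor the actual upstream code, and the function name cpuid_features_sketch is made up for illustration. The only change relative to the routine quoted in comment 3 is that the save and restore use the full %rbx/%rdi registers. On other architectures the sketch simply returns 0.

```c
#include <assert.h>

/* Illustrative only: returns the EDX feature flags from CPUID leaf 1,
 * or 0 when not built with a GCC-compatible compiler for x86_64. */
static int cpuid_features_sketch(void)
{
        int features = 0;
#if defined(__GNUC__) && defined(__x86_64__)
        __asm__ __volatile__ (
"        movq    %%rbx,%%rdi         # Save all 64 bits of rbx         \n"
"        xorl    %%eax,%%eax         # Leaf 0: highest supported leaf  \n"
"        cpuid                                                         \n"
"        cmpl    $1,%%eax            # Is leaf 1 available?            \n"
"        jl      1f                                                    \n"
"        xorl    %%eax,%%eax                                           \n"
"        incl    %%eax               # Leaf 1: feature flags           \n"
"        cpuid                                                         \n"
"        movl    %%edx,%0                                              \n"
"1:                                                                    \n"
"        movq    %%rdi,%%rbx         # Restore all 64 bits of rbx      \n"
        : "=m" (features)
        :
        : "%rax", "%rcx", "%rdx", "%rdi", "cc"
        );
#endif
        return features;
}
```

Bit 25 of the returned value is the SSE flag that a check such as SDL_HasSSE() is interested in. Because %rbx is saved and restored at full width, a caller that keeps a 64-bit pointer in %rbx across the call sees it unchanged.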
Comment 17 Thomas Woerner 2009-04-07 11:47:50 EDT
Please have a look at SDL-1.2.13-9.fc11 in rawhide:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1283010
Comment 18 Mark McLoughlin 2009-04-07 11:55:06 EDT
For easy reference, the patch is:

http://cvs.fedoraproject.org/viewvc/rpms/SDL/devel/SDL-1.2.13-rh487720.patch?revision=1.1&view=markup
Comment 19 Adam Goode 2009-04-07 11:58:14 EDT
ppc doesn't crash, it just produces this message and hangs:

invalid/unsupported opcode: 00 - 18 - 01 (00004070) 00000004 1
invalid/unsupported opcode: 00 - 04 - 17 (000095c8) 000095ec 0


I will test tonight with new SDL.
Comment 20 Eduardo Habkost 2009-04-07 12:06:41 EDT
(In reply to comment #19)
> ppc doesn't crash, it just produces this message and hangs:
> 
> invalid/unsupported opcode: 00 - 18 - 01 (00004070) 00000004 1
> invalid/unsupported opcode: 00 - 04 - 17 (000095c8) 000095ec 0

The bug being handled here is x86_64-specific. If you have an issue, it is a different problem.


> 
> 
> I will test tonight with new SDL.  

Additional testing never hurts, of course, but I doubt your ppc problem is related to SDL.
Comment 21 Adam Goode 2009-04-07 12:29:21 EDT
Yes, this is definitely possible. I will test tonight and possibly de-duplicate my ppc bug.
Comment 22 Jay Fenlason 2009-04-07 13:45:03 EDT
Using the SDL downloaded from koji, my reproducer no longer segfaults on startup.
Comment 23 Joachim Frieben 2009-04-07 13:52:07 EDT
I can confirm that SDL-1.2.13-9.fc11.x86_64 solves the issue.
Comment 24 Mace Moneta 2009-04-07 14:05:32 EDT
Confirmed here as well.
Comment 25 Tom London 2009-04-07 14:48:53 EDT
Confirmed here too....
Comment 26 Thomas Woerner 2009-04-08 05:54:18 EDT
Closing as RAWHIDE.
Comment 27 Zdenek Kabelac 2009-04-08 08:04:29 EDT
Hmm, I've installed this koji version of SDL. It no longer crashes, but it also doesn't refresh my SDL qemu screen. Is this a new bug, or is it the result of using the new SDL library with the qemu code?
Comment 28 Thomas Woerner 2009-04-08 10:50:34 EDT
*** Bug 494449 has been marked as a duplicate of this bug. ***
Comment 29 Ryan C. Gordon 2009-07-01 15:17:22 EDT
We just applied this patch to the upstream libsdl.org Subversion repository, thanks!

--ryan.
