487720 – qemu-kvm segfaults on startup in SDL_memcpyMMX/SSE

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	SDL
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Thomas Woerner
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (4):	487018 491131 494146 494449
Depends On:
Blocks:	F11VirtBlocker

Reported:	2009-02-27 16:40 UTC by Jay Fenlason
Modified:	2014-08-31 23:29 UTC
CC List:	19 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-04-08 09:54:18 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Proposed fix for the cpuid register clobbering problem (2.16 KB, patch) 2009-02-27 22:55 UTC, Eduardo Habkost

Description Jay Fenlason 2009-02-27 16:40:15 UTC

Description of problem:
Attempting to run qemu-kvm on my AMD x86_64 box (fenlason-lab4.boston.devel.redhat.com) exits with a segfault immediately after opening a window.

Version-Release number of selected component (if applicable):
kvm-84-1.fc11.x86_64

How reproducible:
Always

Steps to Reproduce:
1. ssh -Y root.devel.redhat.com  (ask me for the root password if needed)
2. cd /local/home/rawhide/root
3. qemu-kvm obsd44.img  (or any other image)
  
Actual results:
New window opens, segfault

Expected results:
New window with a running VM in it.

Additional info:

Comment 1 Eduardo Habkost 2009-02-27 20:01:16 UTC

It's crashing inside SDL. I don't see yet what is making 'src' point to an invalid address.

(gdb) bt full
#0  0x00000037cb217ef7 in SDL_memcpySSE () at src/video/SDL_blit.c:141
No locals.
#1  SDL_BlitCopy (info=<value optimized out>) at src/video/SDL_blit.c:172
        src = 0x4113e010 <Address 0x4113e010 out of bounds>
        dst = 0x1c77d90 ""
        w = <value optimized out>
        h = 15
        srcskip = <value optimized out>
        dstskip = <value optimized out>
#2  0x00000037cb217d3a in SDL_SoftBlit (src=0x1bea260, srcrect=<value optimized out>, dst=0x1b381b0, dstrect=0x7fff530b4820) at src/video/SDL_blit.c:97
        info = {s_pixels = 0x7f734113e010 "���", s_width = 720, s_height = 16, s_skip = 0, d_pixels = 0x1c77d90 "", d_width = 720, d_height = 16, d_skip = 0,
          aux_data = 0x0, src = 0x1bea2c0, table = 0x0, dst = 0x1bdfd70}
        okay = 1
        src_locked = <value optimized out>
        dst_locked = 1
#3  0x00000037cb22e0dc in SDL_LowerBlit (src=0x1bea260, srcrect=0x7fff530b47d0, dst=0xb40, dstrect=0xb40) at src/video/SDL_surface.c:440
        do_blit = 0x1c77d90
        hw_srcrect = {x = 2880, y = 0, w = 0, h = 0}
        hw_dstrect = {x = 720, y = 0, w = 0, h = 0}
#4  0x00000037cb22e2b7 in SDL_UpperBlit (src=0x1c77d90, srcrect=<value optimized out>, dst=0xb40, dstrect=0xb40) at src/video/SDL_surface.c:530
        sr = {x = 0, y = 0, w = 720, h = 16}
        fulldst = {x = 0, y = 0, w = 0, h = 0}
        srcx = 1
        srcy = 0
        w = 29851024
        h = <value optimized out>
#5  0x00000000004927cf in sdl_update (ds=<value optimized out>, x=0, y=0, w=720, h=<value optimized out>) at sdl.c:64
        rec = {x = 0, y = 0, w = 720, h = 16}
#6  0x0000000000000000 in ?? ()
No symbol table info available.

Comment 2 Eduardo Habkost 2009-02-27 21:28:36 UTC

Found the issue:

At this point, %rbx carries the 'src' value that will be passed to SDL_memcpySSE(). After the call to SDL_HasSSE(), %rbx is corrupted: in the session below it changes from 0x7fffec3a2010 to 0xec3a2010, so the upper 32 bits are lost. I don't know yet whether this is a gcc issue or an issue in some asm code inside SDL_HasSSE().
 
(gdb)
169             if(SDL_HasSSE())
3: /x $rbx = 0x7fffec3a2010
1: x/10i $rip
0x127e6a <SDL_BlitCopy+58>:     callq  0x118100 <SDL_HasSSE@plt>
0x127e6f <SDL_BlitCopy+63>:     test   %eax,%eax
0x127e71 <SDL_BlitCopy+65>:     mov    0x20(%rsp),%edx
0x127e75 <SDL_BlitCopy+69>:     je     0x127f74 <SDL_BlitCopy+324>
0x127e7b <SDL_BlitCopy+75>:     test   %ebp,%ebp
0x127e7d <SDL_BlitCopy+77>:     je     0x127f63 <SDL_BlitCopy+307>
0x127e83 <SDL_BlitCopy+83>:     lea    0x7(%rdx),%r8d
0x127e87 <SDL_BlitCopy+87>:     test   %edx,%edx
0x127e89 <SDL_BlitCopy+89>:     movslq 0x34(%rsp),%rcx
0x127e8e <SDL_BlitCopy+94>:     lea    -0x1(%rbp),%r14d
(gdb) fr
#0  SDL_BlitCopy (info=<value optimized out>) at src/video/SDL_blit.c:169
169             if(SDL_HasSSE())
(gdb) ni
0x0000000000127e6f      169             if(SDL_HasSSE())
3: /x $rbx = 0xec3a2010
1: x/10i $rip
0x127e6f <SDL_BlitCopy+63>:     test   %eax,%eax
0x127e71 <SDL_BlitCopy+65>:     mov    0x20(%rsp),%edx
0x127e75 <SDL_BlitCopy+69>:     je     0x127f74 <SDL_BlitCopy+324>
0x127e7b <SDL_BlitCopy+75>:     test   %ebp,%ebp
0x127e7d <SDL_BlitCopy+77>:     je     0x127f63 <SDL_BlitCopy+307>
0x127e83 <SDL_BlitCopy+83>:     lea    0x7(%rdx),%r8d
0x127e87 <SDL_BlitCopy+87>:     test   %edx,%edx
0x127e89 <SDL_BlitCopy+89>:     movslq 0x34(%rsp),%rcx
0x127e8e <SDL_BlitCopy+94>:     lea    -0x1(%rbp),%r14d
0x127e92 <SDL_memcpySSE>:       mov    %edx,%r9d


I am running SDL-1.2.13-7.fc11.x86_64.


Tip: to reproduce the bug more easily under gdb without getting KVM involved (sometimes the KVM-specific threads confuse gdb), you can reproduce the bug using:

$ dd if=/dev/zero of=/tmp/zero.img bs=1M count=20
$ qemu-kvm -no-kvm /tmp/zero.img

Comment 3 Eduardo Habkost 2009-02-27 22:13:25 UTC

This is where the problem happens:

static __inline__ int CPU_getCPUIDFeatures(void)
{
        int features = 0;
#if defined(__GNUC__) && ( defined(i386) || defined(__x86_64__) )
        __asm__ (
"        movl    %%ebx,%%edi\n"
"        xorl    %%eax,%%eax         # Set up for CPUID instruction    \n"
"        cpuid                       # Get and save vendor ID          \n"
"        cmpl    $1,%%eax            # Make sure 1 is valid input for CPUID\n"
"        jl      1f                  # We dont have the CPUID instruction\n"
"        xorl    %%eax,%%eax                                           \n"
"        incl    %%eax                                                 \n"
"        cpuid                       # Get family/model/stepping/features\n"
"        movl    %%edx,%0                                              \n"
"1:                                                                    \n"
"        movl    %%edi,%%ebx\n"
        : "=m" (features)
        :
        : "%eax", "%ecx", "%edx", "%edi"
        );
[...]

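The asm saves and restores only the lower 32 bits of %rbx (%ebx) through %edi. On x86_64, cpuid overwrites %ebx and the upper 32 bits of %rbx are cleared, so any 64-bit value the caller kept in %rbx comes back truncated.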
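
To illustrate the effect, here is a minimal C model of that save/restore sequence. It is not SDL code, and the function name is made up for this sketch. It treats "movl %ebx,%edi" as a truncation to 32 bits and "movl %edi,%ebx" as a zero-extending write, which is how 32-bit register writes behave on x86_64:

```c
#include <stdint.h>

/*
 * Hypothetical model of the buggy sequence, not taken from SDL:
 * a 64-bit register value is saved and restored through its 32-bit alias.
 */
static uint64_t save_restore_via_32bit(uint64_t rbx)
{
    /* movl %ebx,%edi : only the low 32 bits of %rbx are copied */
    uint32_t edi = (uint32_t)rbx;

    /* movl %edi,%ebx : writing %ebx zero-extends into the full %rbx */
    return (uint64_t)edi;
}
```

Fed with the %rbx value from the gdb session in comment 2, save_restore_via_32bit(0x7fffec3a2010) yields 0xec3a2010, which matches the corrupted %rbx seen after SDL_HasSSE() returned.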

Comment 4 Eduardo Habkost 2009-02-27 22:55:02 UTC

Created attachment 333551
Proposed fix for the cpuid register clobbering problem

Comment 5 Eduardo Habkost 2009-02-28 17:37:15 UTC

Patch posted upstream: http://lists.libsdl.org/pipermail/sdl-libsdl.org/2009-February/068912.html

Comment 6 Eduardo Habkost 2009-03-01 17:07:15 UTC

*** Bug 487018 has been marked as a duplicate of this bug. ***

Comment 7 Tom London 2009-03-01 18:08:31 UTC

I can confirm that the patch to SDL makes qemu-kvm "work for me".

Thanks.....

Comment 8 Mace Moneta 2009-03-11 18:03:21 UTC

The mailing list indicates that this has been fixed in the upstream SDL 1.3:

http://lists.libsdl.org/pipermail/sdl-libsdl.org/2009-March/068931.html

Comment 9 Eduardo Habkost 2009-03-11 18:22:55 UTC

Adding to F11Blocker.

Comment 10 Mark McLoughlin 2009-03-20 16:34:04 UTC

Nice catch Eduardo; moving to F11VirtBlocker

Comment 11 Mark McLoughlin 2009-03-20 17:50:30 UTC

*** Bug 491131 has been marked as a duplicate of this bug. ***

Comment 12 Joachim Frieben 2009-04-04 11:55:02 UTC

Issue also applies to the latest vintage package qemu-0.10-5.fc11.x86_64:

$ qemu -m 512 -boot d -cdrom ./jaunty-desktop-i386.iso -localtime -monitor stdio -no-kqemu

crashes, and 'dmesg' reports

qemu[4566]: segfault at 876f1010 ip 0000003fd0417f07 sp 00007fffb82f8590 error 4 in libSDL-1.2.so.0.11.2[3fd0400000+6b000]
qemu[4610]: segfault at 5ffdf010 ip 0000003fd0417f07 sp 00007fff90bc3c70 error 4 in libSDL-1.2.so.0.11.2[3fd0400000+6b000]

Comment 13 Mark McLoughlin 2009-04-06 09:23:09 UTC

*** Bug 494146 has been marked as a duplicate of this bug. ***

Comment 14 Mark McLoughlin 2009-04-06 09:28:38 UTC

*** Bug 494075 has been marked as a duplicate of this bug. ***

Comment 15 Thomas Woerner 2009-04-07 14:40:43 UTC

Is the fix in SDL-1.3 sufficient to fix this problem?

Comment 16 Eduardo Habkost 2009-04-07 14:48:38 UTC

(In reply to comment #15)
> Is the fix in SDL-1.3 sufficient to fix this problem?  

Yes. The code on SDL SVN trunk should fix the issue too, because it has a new #ifdef block that makes it use %rbx/%rdi on x86_64.
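
For readers without the patch at hand, a rough sketch of what such an x86_64 variant could look like follows. It is an illustration derived from the snippet in comment 3, not the exact upstream or Fedora patch. The function name cpuid_features_sketch is hypothetical, and on anything other than GCC-compatible x86_64 builds the sketch simply returns 0.

```c
/*
 * Sketch only: same logic as the snippet in comment 3, but the full
 * 64-bit %rbx is saved in %rdi around the cpuid instructions.
 */
static int cpuid_features_sketch(void)
{
    int features = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
"        movq    %%rbx,%%rdi         # Save all 64 bits of rbx\n"
"        xorl    %%eax,%%eax         # Leaf 0: highest supported leaf\n"
"        cpuid\n"
"        cmpl    $1,%%eax            # Is leaf 1 available?\n"
"        jl      1f\n"
"        xorl    %%eax,%%eax\n"
"        incl    %%eax               # Leaf 1: feature flags\n"
"        cpuid\n"
"        movl    %%edx,%0            # Store the edx feature bits\n"
"1:\n"
"        movq    %%rdi,%%rbx         # Restore all 64 bits of rbx\n"
        : "=m" (features)
        :
        : "%rax", "%rcx", "%rdx", "%rdi", "cc"
    );
#endif
    return features;
}
```

The only changes relative to the original sequence are the two movq instructions and the 64-bit register names in the clobber list. Bit 25 of the returned value is the SSE flag, which every x86_64 CPU sets.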

Comment 17 Thomas Woerner 2009-04-07 15:47:50 UTC

Please have a look at SDL-1.2.13-9.fc11 in rawhide:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1283010

Comment 18 Mark McLoughlin 2009-04-07 15:55:06 UTC

For easy reference, the patch is:

http://cvs.fedoraproject.org/viewvc/rpms/SDL/devel/SDL-1.2.13-rh487720.patch?revision=1.1&view=markup

Comment 19 Adam Goode 2009-04-07 15:58:14 UTC

ppc doesn't crash, it just produces this message and hangs:

invalid/unsupported opcode: 00 - 18 - 01 (00004070) 00000004 1
invalid/unsupported opcode: 00 - 04 - 17 (000095c8) 000095ec 0


I will test tonight with new SDL.

Comment 20 Eduardo Habkost 2009-04-07 16:06:41 UTC

(In reply to comment #19)
> ppc doesn't crash, it just produces this message and hangs:
> 
> invalid/unsupported opcode: 00 - 18 - 01 (00004070) 00000004 1
> invalid/unsupported opcode: 00 - 04 - 17 (000095c8) 000095ec 0

The bug being handled here is x86_64-specific. If you have an issue, it is a different problem.


> 
> 
> I will test tonight with new SDL.  

Additional testing never hurts, of course, but I doubt your ppc problem is related to SDL.

Comment 21 Adam Goode 2009-04-07 16:29:21 UTC

Yes, this is definitely possible. I will test tonight and possibly de-duplicate my ppc bug.

Comment 22 Jay Fenlason 2009-04-07 17:45:03 UTC

Using the SDL downloaded from koji, my reproducer no longer segfaults on startup.

Comment 23 Joachim Frieben 2009-04-07 17:52:07 UTC

I can confirm that SDL-1.2.13-9.fc11.x86_64 solves the issue.

Comment 24 Mace Moneta 2009-04-07 18:05:32 UTC

Confirmed here as well.

Comment 25 Tom London 2009-04-07 18:48:53 UTC

Confirmed here too....

Comment 26 Thomas Woerner 2009-04-08 09:54:18 UTC

Closing as RAWHIDE.

Comment 27 Zdenek Kabelac 2009-04-08 12:04:29 UTC

Hmm, I've installed this koji version of SDL. It no longer crashes, but my SDL qemu screen is also not refreshing. Is this a new bug, or is it the result of using the new SDL library with the qemu code?

Comment 28 Thomas Woerner 2009-04-08 14:50:34 UTC

*** Bug 494449 has been marked as a duplicate of this bug. ***

Comment 29 Ryan C. Gordon 2009-07-01 19:17:22 UTC

We just applied this patch to the upstream libsdl.org Subversion repository, thanks!

--ryan.
