Description of problem:
Attempting to run qemu-kvm on my AMD x86_64 box (fenlason-lab4.boston.devel.redhat.com) exits with a segfault immediately after opening a window.

Version-Release number of selected component (if applicable):
kvm-84-1.fc11.x86_64

How reproducible:
Always

Steps to Reproduce:
1. ssh -Y root.devel.redhat.com (ask me for the root password if needed)
2. cd /local/home/rawhide/root
3. qemu-kvm obsd44.img (or any other image)

Actual results:
New window opens, segfault

Expected results:
New window with running VM in it.

Additional info:
It's crashing inside SDL. I don't see yet what is making 'src' point to an invalid address.

(gdb) bt full
#0  0x00000037cb217ef7 in SDL_memcpySSE () at src/video/SDL_blit.c:141
No locals.
#1  SDL_BlitCopy (info=<value optimized out>) at src/video/SDL_blit.c:172
        src = 0x4113e010 <Address 0x4113e010 out of bounds>
        dst = 0x1c77d90 ""
        w = <value optimized out>
        h = 15
        srcskip = <value optimized out>
        dstskip = <value optimized out>
#2  0x00000037cb217d3a in SDL_SoftBlit (src=0x1bea260, srcrect=<value optimized out>, dst=0x1b381b0, dstrect=0x7fff530b4820) at src/video/SDL_blit.c:97
        info = {s_pixels = 0x7f734113e010 "���", s_width = 720, s_height = 16, s_skip = 0,
          d_pixels = 0x1c77d90 "", d_width = 720, d_height = 16, d_skip = 0,
          aux_data = 0x0, src = 0x1bea2c0, table = 0x0, dst = 0x1bdfd70}
        okay = 1
        src_locked = <value optimized out>
        dst_locked = 1
#3  0x00000037cb22e0dc in SDL_LowerBlit (src=0x1bea260, srcrect=0x7fff530b47d0, dst=0xb40, dstrect=0xb40) at src/video/SDL_surface.c:440
        do_blit = 0x1c77d90
        hw_srcrect = {x = 2880, y = 0, w = 0, h = 0}
        hw_dstrect = {x = 720, y = 0, w = 0, h = 0}
#4  0x00000037cb22e2b7 in SDL_UpperBlit (src=0x1c77d90, srcrect=<value optimized out>, dst=0xb40, dstrect=0xb40) at src/video/SDL_surface.c:530
        sr = {x = 0, y = 0, w = 720, h = 16}
        fulldst = {x = 0, y = 0, w = 0, h = 0}
        srcx = 1
        srcy = 0
        w = 29851024
        h = <value optimized out>
#5  0x00000000004927cf in sdl_update (ds=<value optimized out>, x=0, y=0, w=720, h=<value optimized out>) at sdl.c:64
        rec = {x = 0, y = 0, w = 720, h = 16}
#6  0x0000000000000000 in ?? ()
No symbol table info available.
Found the issue:

At this point, %rbx carries the 'src' value, that will be passed to SDL_memcpySSE(). After the call to SDL_HasSSE(), %rbx gets corrupted. I don't know if it is a gcc issue or an issue on some asm code inside SDL_HasSSE().

(gdb)
169         if(SDL_HasSSE())
3: /x $rbx = 0x7fffec3a2010
1: x/10i $rip
0x127e6a <SDL_BlitCopy+58>:  callq  0x118100 <SDL_HasSSE@plt>
0x127e6f <SDL_BlitCopy+63>:  test   %eax,%eax
0x127e71 <SDL_BlitCopy+65>:  mov    0x20(%rsp),%edx
0x127e75 <SDL_BlitCopy+69>:  je     0x127f74 <SDL_BlitCopy+324>
0x127e7b <SDL_BlitCopy+75>:  test   %ebp,%ebp
0x127e7d <SDL_BlitCopy+77>:  je     0x127f63 <SDL_BlitCopy+307>
0x127e83 <SDL_BlitCopy+83>:  lea    0x7(%rdx),%r8d
0x127e87 <SDL_BlitCopy+87>:  test   %edx,%edx
0x127e89 <SDL_BlitCopy+89>:  movslq 0x34(%rsp),%rcx
0x127e8e <SDL_BlitCopy+94>:  lea    -0x1(%rbp),%r14d
(gdb) fr
#0  SDL_BlitCopy (info=<value optimized out>) at src/video/SDL_blit.c:169
169         if(SDL_HasSSE())
(gdb) ni
0x0000000000127e6f      169         if(SDL_HasSSE())
3: /x $rbx = 0xec3a2010
1: x/10i $rip
0x127e6f <SDL_BlitCopy+63>:  test   %eax,%eax
0x127e71 <SDL_BlitCopy+65>:  mov    0x20(%rsp),%edx
0x127e75 <SDL_BlitCopy+69>:  je     0x127f74 <SDL_BlitCopy+324>
0x127e7b <SDL_BlitCopy+75>:  test   %ebp,%ebp
0x127e7d <SDL_BlitCopy+77>:  je     0x127f63 <SDL_BlitCopy+307>
0x127e83 <SDL_BlitCopy+83>:  lea    0x7(%rdx),%r8d
0x127e87 <SDL_BlitCopy+87>:  test   %edx,%edx
0x127e89 <SDL_BlitCopy+89>:  movslq 0x34(%rsp),%rcx
0x127e8e <SDL_BlitCopy+94>:  lea    -0x1(%rbp),%r14d
0x127e92 <SDL_memcpySSE>:    mov    %edx,%r9d

I am running SDL-1.2.13-7.fc11.x86_64.

Tip: to reproduce the bug more easily under gdb without getting KVM involved (sometimes the KVM-specific threads confuse gdb), you can reproduce the bug using:

$ dd if=/dev/zero of=/tmp/zero.img bs=1M count=20
$ qemu-kvm -no-kvm /tmp/zero.img
This is where the problem happens:

static __inline__ int CPU_getCPUIDFeatures(void)
{
        int features = 0;
#if defined(__GNUC__) && ( defined(i386) || defined(__x86_64__) )
        __asm__ (
"        movl    %%ebx,%%edi\n"
"        xorl    %%eax,%%eax         # Set up for CPUID instruction    \n"
"        cpuid                       # Get and save vendor ID          \n"
"        cmpl    $1,%%eax            # Make sure 1 is valid input for CPUID\n"
"        jl      1f                  # We dont have the CPUID instruction\n"
"        xorl    %%eax,%%eax                                           \n"
"        incl    %%eax                                                 \n"
"        cpuid                       # Get family/model/stepping/features\n"
"        movl    %%edx,%0                                              \n"
"1:                                                                    \n"
"        movl    %%edi,%%ebx\n"
        : "=m" (features)
        :
        : "%eax", "%ecx", "%edx", "%edi"
        );
[...]

It only saves and restores the lower 32-bits of %rbx (%ebx).
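To make the effect of the 32-bit save/restore concrete, here is a small portable C model of it. This is only a sketch for illustration: the function names save_restore_32 and save_restore_64 are hypothetical and are not part of SDL. It relies on the x86_64 rule that a write to a 32-bit register clears the upper 32 bits of the corresponding 64-bit register.

```c
#include <stdint.h>

/* Hypothetical model of "movl %ebx,%edi ... movl %edi,%ebx" on x86_64.
 * Only the low 32 bits are saved, and writing %ebx zero-extends into
 * %rbx, so the upper half of the original value is lost. */
static uint64_t save_restore_32(uint64_t rbx)
{
    uint32_t edi = (uint32_t)rbx;   /* movl %ebx,%edi */
    return (uint64_t)edi;           /* movl %edi,%ebx (upper 32 bits = 0) */
}

/* Hypothetical model of a full-width save/restore via a 64-bit register. */
static uint64_t save_restore_64(uint64_t rbx)
{
    uint64_t rdi = rbx;             /* movq %rbx,%rdi */
    return rdi;                     /* movq %rdi,%rbx */
}
```

Feeding in the value gdb showed before the call, save_restore_32(0x7fffec3a2010) gives 0xec3a2010, which is exactly what %rbx held after SDL_HasSSE() returned. The same pattern shows up in the backtrace, where s_pixels is 0x7f734113e010 but src is 0x4113e010.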
Created attachment 333551 [details] Proposed fix for the cpuid register clobbering problem
Patch posted upstream: http://lists.libsdl.org/pipermail/sdl-libsdl.org/2009-February/068912.html
*** Bug 487018 has been marked as a duplicate of this bug. ***
I can confirm that the patch to SDL makes qemu-kvm "work for me". Thanks.....
The mailing list indicates that this has been fixed in the upstream SDL 1.3: http://lists.libsdl.org/pipermail/sdl-libsdl.org/2009-March/068931.html
Adding to F11Blocker.
Nice catch Eduardo; moving to F11VirtBlocker
*** Bug 491131 has been marked as a duplicate of this bug. ***
Issue also applies to the latest vintage package qemu-0.10-5.fc11.x86_64:

$ qemu -m 512 -boot d -cdrom ./jaunty-desktop-i386.iso -localtime -monitor stdio -no-kqemu

crashes, and 'dmesg' reports

qemu[4566]: segfault at 876f1010 ip 0000003fd0417f07 sp 00007fffb82f8590 error 4 in libSDL-1.2.so.0.11.2[3fd0400000+6b000]
qemu[4610]: segfault at 5ffdf010 ip 0000003fd0417f07 sp 00007fff90bc3c70 error 4 in libSDL-1.2.so.0.11.2[3fd0400000+6b000]
*** Bug 494146 has been marked as a duplicate of this bug. ***
*** Bug 494075 has been marked as a duplicate of this bug. ***
Is the fix in SDL-1.3 sufficient to fix this problem?
(In reply to comment #15)
> Is the fix in SDL-1.3 sufficient to fix this problem?

Yes. The code on SDL SVN trunk should fix the issue too, because it has a new #ifdef block that makes it use %rbx/%rdi on x86_64.
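For readers who want to see what a 64-bit save/restore of this kind could look like, here is a minimal sketch. It is not the actual upstream code or the attached patch. The names cpuid_features_sketch, cpuid_sketch_supported and has_sse_sketch are hypothetical, and the block assumes a GCC-compatible compiler. On anything other than x86_64 it does nothing and returns 0.

```c
/* Returns 1 when the inline-asm path below is compiled in, else 0. */
static int cpuid_sketch_supported(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* Sketch: read the CPUID leaf 1 feature flags (EDX) while preserving
 * the full 64-bit %rbx, which CPUID overwrites. */
static int cpuid_features_sketch(void)
{
    int features = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
"        movq    %%rbx,%%rdi     # save all 64 bits of rbx\n"
"        xorl    %%eax,%%eax     # leaf 0: highest supported leaf\n"
"        cpuid\n"
"        cmpl    $1,%%eax        # is leaf 1 valid?\n"
"        jl      1f\n"
"        xorl    %%eax,%%eax\n"
"        incl    %%eax           # leaf 1: feature flags\n"
"        cpuid\n"
"        movl    %%edx,%0\n"
"1:\n"
"        movq    %%rdi,%%rbx     # restore all 64 bits of rbx\n"
    : "=m" (features)
    :
    : "rax", "rcx", "rdx", "rdi", "cc"
    );
#endif
    return features;
}

/* Bit 25 of EDX from CPUID leaf 1 is the SSE feature flag. */
static int has_sse_sketch(void)
{
    return (cpuid_features_sketch() & (1 << 25)) != 0;
}
```

The only changes relative to the code quoted earlier in this bug are the movq/%rbx/%rdi pair and the 64-bit clobber names. The rest of the sequence is kept so the difference is easy to see.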
Please have a look at SDL-1.2.13-9.fc11 in rawhide: http://koji.fedoraproject.org/koji/taskinfo?taskID=1283010
For easy reference, the patch is: http://cvs.fedoraproject.org/viewvc/rpms/SDL/devel/SDL-1.2.13-rh487720.patch?revision=1.1&view=markup
ppc doesn't crash, it just produces this message and hangs:

invalid/unsupported opcode: 00 - 18 - 01 (00004070) 00000004 1
invalid/unsupported opcode: 00 - 04 - 17 (000095c8) 000095ec 0

I will test tonight with new SDL.
(In reply to comment #19)
> ppc doesn't crash, it just produces this message and hangs:
>
> invalid/unsupported opcode: 00 - 18 - 01 (00004070) 00000004 1
> invalid/unsupported opcode: 00 - 04 - 17 (000095c8) 000095ec 0

The bug being handled here is x86_64-specific. If you have an issue, it is a different problem.

> I will test tonight with new SDL.

Additional testing never hurts, of course, but I doubt your ppc problem is related to SDL.
Yes, this is definitely possible. I will test tonight and possibly de-duplicate my ppc bug.
Using the SDL downloaded from koji, my reproducer no longer segfaults on startup.
I can confirm that SDL-1.2.13-9.fc11.x86_64 solves the issue.
Confirmed here as well.
Confirmed here too....
Closing as RAWHIDE.
Hmm, I've installed this koji version of SDL. It no longer crashes, but my SDL qemu screen is not refreshing either. Is this a new bug, or is it the result of using the new SDL library with the existing qemu code?
*** Bug 494449 has been marked as a duplicate of this bug. ***
We just applied this patch to the upstream libsdl.org Subversion repository, thanks! --ryan.