+++ This bug was initially created as a clone of Bug #210422 +++

Description of problem:
Started running kernel-xen (in Domain-0) and QEMU no longer works. No kqemu is used; qemu runs fully as a non-privileged user, as a completely regular process. qemu run in a XEN domain on the same host, with a kernel-2.6.16 built from linux-2.6-xen.hg, works fine. Both Domain-0 and the XEN domain run RawHide.i386.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-1.2747.fc6.i686
xen-3.0.2-44.i386
qemu-0.8.2-3.fc6.i386
SDL-1.2.10-6.2.i386
alsa-lib-1.0.12-2.fc6.i386
glibc-2.5-3.i686
libX11-1.0.3-4.fc6.i386
libXau-1.0.1-3.1.i386
libXcursor-1.1.7-1.1.i386
libXdmcp-1.0.1-2.1.i386
libXext-1.0.1-2.1.i386
libXfixes-4.0.1-2.1.i386
libXrandr-1.1.1-3.1.i386
libXrender-0.9.1-3.1.i386

How reproducible:
Always.

Steps to Reproduce:
1. qemu -cdrom /dev/zero -net none -m 1

Actual results:
Could not open '/dev/kqemu' - QEMU acceleration layer not activated
[segv]

Expected results:
Could not open '/dev/kqemu' - QEMU acceleration layer not activated
[displayed window containing the Bochs BIOS screen with a failed boot]

Additional info:
Core file etc. available upon request, but you should be able to reproduce it yourself easily. I am not fully certain it is XEN-specific, but I use QEMU fairly often and it last worked on a non-XEN kernel.

Program terminated with signal 11, Segmentation fault.
#0  cpu_x86_exec (env1=0x9d70998) at /usr/src/debug/qemu-0.8.2/cpu-exec.c:772
772         gen_func();
(gdb) bt
#0  cpu_x86_exec (env1=0x9d70998) at /usr/src/debug/qemu-0.8.2/cpu-exec.c:772
#1  0x08050968 in main_loop () at /usr/src/debug/qemu-0.8.2/vl.c:5069
#2  0x08051de3 in main (argc=1536, argv=0x0) at /usr/src/debug/qemu-0.8.2/vl.c:6221
Previous frame inner to this frame (corrupt stack?)

-- Additional comment from srostedt on 2006-10-16 21:55 EST --

I just tried this with
kernel-xen-2.6.18-1.2784.fc6
xen-3.0.2-44
qemu-0.8.2-3.fc6
and it worked for me. Could you verify whether the latest kernel-xen fixes this problem?

-- Additional comment from jkratoch on 2006-10-17 14:09 EST --

Created an attachment (id=138700)
qemu -cdrom /dev/zero -net none -m 1

kernel-xen-2.6.18-1.2798.fc6.i686
xen-3.0.2-45.el5.i386
qemu-0.8.2-3.fc6.i386

It is a pity you could not reproduce it. Are you really running i386 (32-bit)?

-- Additional comment from jkratoch on 2006-10-19 14:00 EST --

It can be worked around by
  echo 0 >/proc/sys/kernel/exec-shield
(still on kernel-xen-2.6.18-1.2798.fc6.i686), as suggested by Caolan McNamara in Bug 210748. I am still not aware of the specific cause, but I assume you already know it.

-- Additional comment from srostedt on 2006-10-20 21:55 EST --

No, I hadn't noticed that this was i386-only. You did mention that you were using it, but I wasn't. After switching to i386 I was able to get it to segfault. OK, now that I have something that doesn't work, I can take a closer look at it. I have also updated this BZ to state that this is not for all hardware, but only for i686.

-- Additional comment from srostedt on 2006-10-24 12:19 EST --

The fix for bz 200382 seems to have caused this bug. Will look into it further.

-- Additional comment from srostedt on 2006-10-25 10:26 EST --

OK, I've confirmed that the fix for 200382 caused this problem. I have a patch that has already been submitted to the maintainers, but I must first confirm that the patch doesn't break 200382 before I close this.
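For context on the crash site in the original report: cpu-exec.c:772 is the indirect call into qemu's dynamically generated translation code. The following is a minimal, hypothetical C sketch, not qemu's actual code (names are invented for illustration), of that pattern: machine code is written into an anonymous PROT_EXEC mapping and then called through a function pointer. Exec-shield on i386 approximates NX by limiting the code segment to the highest executable mapping, so a call like this can fault if the limit fixup logic does not extend the segment to cover the buffer.

/* Minimal sketch, not qemu's code: execute one generated instruction
 * (a single x86 "ret") from an anonymous executable mapping.  This is
 * the kind of indirect call that the gen_func() line performs.
 * x86-only; may also be blocked by SELinux execmem policy. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    static const unsigned char ret_insn = 0xc3;     /* x86 "ret" */
    void (*gen_code)(void);
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memcpy(buf, &ret_insn, sizeof ret_insn);

    gen_code = (void (*)(void))buf;
    gen_code();                      /* faults if the CS limit excludes buf */
    puts("generated code executed fine");
    return 0;
}

If exec-shield is disabled (the echo 0 >/proc/sys/kernel/exec-shield workaround above), the code segment is left covering the full address space and such calls always succeed, which is consistent with the workaround avoiding the segfault.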
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.
QE ack for RHEL5B2.
*** Bug 212588 has been marked as a duplicate of this bug. ***
OK, after a lot of testing last night, I've got some results.

First of all, I can no longer reproduce the bug reported in bug 200382. Even using that same pre-FC6 rawhide kernel, with a recompiled hypervisor to get past the flood of timer messages that the original 1.2439 kernel+HV swamps the console with, everything works fine. I really don't know what precisely triggered the exec_limit = ~0UL path, and I can't reproduce it now, although I can definitely trigger the exec-shield GPF path in general.

However, I can force that path to be taken by setting the limit to -1 whenever the path is hit. Doing so, with the bug 200382 patch otherwise completely reverted, I can reproduce exactly the GPF exec_limit=0xffffffff fixups that used to cause problems:

#GPF fixup (0[seg:0]) at 00110918, CPU#1. exec_limit: ffffffff, user_cs: 0000ffff/00cffb00, CPU_cs: 000004f4/00c0fb00.

and this succeeds just fine on a RHEL-5 kernel+hypervisor.

Furthermore, for the exact GPF infinite loops that we were getting before:

#GPF fixup (0[seg:0]) at 080c76e1, CPU#0. exec_limit: ffffffff, user_cs: 0000ffff/00cffb00, CPU_cs: 000067ff/00cffb00.

we were trying to change the limit to 0xfffff from 0xf67ff, with all other bits of the current and intended CS the same; so even the newly proposed patch to test only the limit bits would not actually make the slightest difference to the problem that was happening in bug 200382.

In short, I think we need to simply remove linux-2.6-xen-execshield-lazy-exec-limit.patch entirely --- the fixed form cannot possibly fix the problem it was originally written for, and I have tested that the unpatched RHEL-5 kernel is just as effective as the "fixed" one at not crashing qemu/mono etc.
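To make the limit comparison above concrete, here is a small self-contained C sketch (assuming the standard i386 GDT descriptor layout; the descriptor words are copied verbatim from the GPF fixup messages above) that extracts the raw 20-bit limit field from a code-segment descriptor:

/* Sketch only: decode the 20-bit limit field from the two 32-bit words
 * of an i386 segment descriptor, using the values printed in the GPF
 * fixup messages above.  Limit bits 15:0 live in the low word; limit
 * bits 19:16 live in bits 19:16 of the high word. */
#include <stdio.h>

static unsigned int desc_limit(unsigned int lo, unsigned int hi)
{
    return (lo & 0xffffu) | (hi & 0x000f0000u);
}

int main(void)
{
    printf("user_cs limit: %#x\n", desc_limit(0x0000ffff, 0x00cffb00)); /* 0xfffff */
    printf("CPU_cs  limit: %#x\n", desc_limit(0x000067ff, 0x00cffb00)); /* 0xf67ff */
    return 0;
}

Both descriptors have the granularity bit set, so the raw limits are in 4 KB pages: 0xfffff covers the full 4 GB address space (matching exec_limit: ffffffff), while 0xf67ff stops short of it, and every other bit of the two descriptors is identical, which is the point being made above.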
Please change bugzilla status to POST once the removal of linux-2.6-xen-execshield-lazy-exec-limit.patch is posted to rhkernel-list.
in kernel-2.6.18-1.2744.el5