Version-Release number of selected component:
gnome-shell-3.11.2-3.fc21

Additional info:
reporter:         libreport-2.1.9
backtrace_rating: 4
cmdline:          /usr/bin/gnome-shell
crash_function:   js::Shape::finalize
executable:       /usr/bin/gnome-shell
kernel:           3.13.0-0.rc1.git0.1.fc21.x86_64
runlevel:         N 5
type:             CCpp
uid:              1000

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 js::Shape::finalize at /usr/src/debug/mozjs17.0.0/js/src/jspropertytree.cpp:210
 #1 finalize<js::Shape> at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:355
 #2 FinalizeTypedArenas<js::Shape> at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:419
 #3 js::gc::FinalizeArenas at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:460
 #4 foregroundFinalize at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:3803
 #5 SweepPhase at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:3823
 #6 IncrementalCollectSlice at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:4245
 #7 GCCycle at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:4408
 #8 Collect at /usr/src/debug/mozjs17.0.0/js/src/jsgc.cpp:4516
 #9 js_InvokeOperationCallback at /usr/src/debug/mozjs17.0.0/js/src/jscntxt.cpp:1028

Potential duplicate: bug 1028813
Created attachment 828936 [details] File: backtrace
Created attachment 828937 [details] File: cgroup
Created attachment 828938 [details] File: core_backtrace
Created attachment 828939 [details] File: dso_list
Created attachment 828940 [details] File: environ
Created attachment 828941 [details] File: exploitable
Created attachment 828944 [details] File: limits
Created attachment 828946 [details] File: maps
Created attachment 828948 [details] File: open_fds
Created attachment 828950 [details] File: proc_pid_status
Created attachment 828952 [details] File: var_log_messages
This seems to happen quite randomly ... the only thing I can say is that it happens more often when under high CPU load or when many windows are open.

reporter:         libreport-2.1.9
backtrace_rating: 4
cmdline:          /usr/bin/gnome-shell
crash_function:   js::Shape::finalize
executable:       /usr/bin/gnome-shell
kernel:           3.12.3-1.fc21.x86_64
package:          gnome-shell-3.11.2-3.fc21
reason:           Process /usr/bin/gnome-shell was killed by signal 11 (SIGSEGV)
runlevel:         N 5
type:             CCpp
uid:              1000
Another user experienced a similar problem:

Just switching windows from xchat to abrt (while it was busy reporting the *last* time Shell crashed :>)

reporter:         libreport-2.1.10
backtrace_rating: 4
cmdline:          /usr/bin/gnome-shell
crash_function:   js::Shape::finalize
executable:       /usr/bin/gnome-shell
kernel:           3.13.0-0.rc4.git1.1.fc21.x86_64
package:          gnome-shell-3.11.2-3.fc21
reason:           gnome-shell killed by SIGSEGV
runlevel:         N 5
type:             CCpp
uid:              1001
Another user experienced a similar problem:

Just happened during regular use of the desktop, no specific trigger.

reporter:         libreport-2.1.10
backtrace_rating: 4
cmdline:          /usr/bin/gnome-shell
crash_function:   js::Shape::finalize
executable:       /usr/bin/gnome-shell
kernel:           3.13.0-0.rc7.git1.1.fc21.x86_64
package:          gnome-shell-3.11.3-1.fc21
reason:           gnome-shell killed by SIGSEGV
runlevel:         N 5
type:             CCpp
uid:              1001
*** Bug 1034468 has been marked as a duplicate of this bug. ***
*** Bug 1037692 has been marked as a duplicate of this bug. ***
*** Bug 1035285 has been marked as a duplicate of this bug. ***
<owen> adamw: the js crashes are some sort of memory corruption issue, and aren't going to be debuggable without a high quality valgrind log or a triggerable-at-will reproducer in the hands of a developer
<owen> adamw: That is, the backtrace at time of crash is unlikely to provide a useful clue
So, another attempt to pin down a common thread between people hitting this: what graphics adapter does everyone have?
VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4850] (prog-if 00 [VGA controller]) Subsystem: PC Partner Limited Sapphire HD 4850 512MB GDDR3 PCI-E Dual Slot Fansink
00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09) Subsystem: Lenovo Device [17aa:21f9] Kernel driver in use: i915
and I have an NVIDIA, so *that* ain't it either. Owen, could you perhaps give us a bit more detail on exactly what getting a 'high quality valgrind log' would entail? I'm willing to sit in front of a slow-as-molasses Shell for a while to try and get one, but I don't want to suffer through that if there's a danger the result would be useless :) https://wiki.gnome.org/Valgrind seems mostly focused on debugging memory leaks, not corruption - would one of the invocations there be appropriate for this case, or should we use something different? Thanks!
Btw, does this reproduce at all when there is *not* high system load? It seems to have been triggered by a yum upgrade for me today. High system load sounds like a plausible way to expose race conditions.
In my case gnome-shell is sometimes killed by SIGSEGV in js::Shape::finalize while the computer is idle.
Adam, would this help? https://wiki.gnome.org/Projects/GnomeShell/Debugging
seppo: yeah, happens to me all the time on something as simple as an alt-tab when the system is otherwise idle. mathieu: I'm not sure that's specific enough to this case. I asked the devs on IRC yesterday and they told me to ask the mozjs devs, so I'll do that, just haven't got around to it yet.
so I'm working on this now, but in case I get distracted, here's what I got from the devs:

<jimb> adamw: You need to build SpiderMonkey with --enable-valgrind
<jimb> adamw: https://developer.mozilla.org/en-US/docs/Debugging_Mozilla_with_Valgrind

so, the plan is to build mozjs24 with --enable-valgrind - I don't think I then need to rebuild gjs or gnome-shell - then try and get a log here. My scratch build of mozjs24 is at http://koji.fedoraproject.org/koji/taskinfo?taskID=6503082 .
http://koji.fedoraproject.org/koji/taskinfo?taskID=6503123 is a scratch build which is actually *working*.
well, I installed that mozjs24 plus the gjs and gnome-shell debuginfo packages and tried:

env G_SLICE=always-malloc valgrind --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=20 --log-file=/home/adamw/gnome-shell_valgrind.log --smc-check=all-non-file gnome-shell --replace

but that doesn't seem to launch gnome-shell fully: the log cuts off early and shell never seems to start. anyone else have any luck?
Created attachment 862340 [details] valgrind log on gnome-shell startup

Insanely slow. After waiting patiently for a while, there are a couple of memory errors in valgrind. I'd expect gnome-shell would eventually have come up, but running this without a really powerful machine with a lot of memory isn't a good idea. My laptop with 4G of memory definitely fell short. HTH

PS: I also needed to install the cogl and clutter debuginfos to get full information in valgrind.
Finally managed to catch a crash in valgrind. Ran:

env G_SLICE=always-malloc valgrind --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=12 --log-file=/home/adamw/gnome-shell_valgrind.log --smc-check=all-non-file gnome-shell --replace

note, 12 callers not 20. With 12 callers it runs just about bearably, with 16GB of RAM. I just hope the crash was the JS one. The valgrind log is here:

https://www.happyassassin.net/temp/gnome-shell_valgrind.log

I'll try and attach it as well, but it may be too large.
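For anyone who wants to repeat the run with a different stack depth, here is a minimal sketch that assembles the invocation quoted in this comment as an argument list. The function name `valgrind_command` and its parameters are made up for illustration; the flags themselves are exactly the ones used above, and the only thing varied between the failed and the successful run was `--num-callers` (20 versus 12).

```python
import shlex


def valgrind_command(log_file, num_callers=12):
    """Build the memcheck invocation from this comment as an argv list.

    Hypothetical helper: only the flags are taken from the thread.
    """
    return [
        "env", "G_SLICE=always-malloc",
        "valgrind",
        "--tool=memcheck",
        "--leak-check=full",
        "--leak-resolution=high",
        f"--num-callers={num_callers}",   # 20 was too slow here, 12 was bearable
        f"--log-file={log_file}",
        "--smc-check=all-non-file",
        "gnome-shell", "--replace",
    ]


if __name__ == "__main__":
    # Only prints the command; it does not replace the running Shell.
    print(shlex.join(valgrind_command("/home/adamw/gnome-shell_valgrind.log")))
```

The sketch deliberately stops at printing the command, since actually running it replaces the current Shell and is very slow.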
Created attachment 863053 [details] valgrind log of a crash
I see this:

==6838== Process terminating with default action of signal 11 (SIGSEGV)
==6838==  Access not within mapped region at address 0x1D9C4000
==6838==    at 0x3692F68C78: js::jit::BaselineScript::pcForReturnOffset(JSScript*, unsigned int) (BaselineJIT.cpp:683)
==6838==    by 0x3692FDACE9: js::jit::IonFrameIterator::baselineScriptAndPc(JSScript**, unsigned char**) const (IonFrames.cpp:217)
==6838==    by 0x3692FDC961: js::jit::GetPcScript(JSContext*, JSScript**, unsigned char**) (IonFrames.cpp:1051)
==6838==    by 0x3692E4D3AF: js_InferFlags(JSContext*, unsigned int) (jscntxtinlines.h:536)
==6838==    by 0x3692E565F9: js::GetPropertyHelper(JSContext*, JS::Handle<JSObject*>, JS::Handle<long>, unsigned int, JS::MutableHandle<JS::Value>) (jsobj.cpp:3538)
==6838==    by 0x3692F6518F: js::jit::DoGetPropFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICGetProp_Fallback*, JS::MutableHandle<JS::Value>, JS::MutableHandle<JS::Value>) (BaselineIC.cpp:5453)
==6838==    by 0xCB7C0AD: ???
==6838==    by 0xCB8193E: ???
==6838==    by 0x37970B37: ???
==6838==    by 0xCB73853: ???
==6838==    by 0x3692F67384: EnterBaseline(JSContext*, js::jit::EnterJitData&) [clone .part.191] (BaselineJIT.cpp:105)
==6838==    by 0x3692F676D9: js::jit::EnterBaselineMethod(JSContext*, js::RunState&) (BaselineJIT.cpp:81)
==6838==  If you believe this happened as a result of a stack
==6838==  overflow in your program's main thread (unlikely but
==6838==  possible), you can try to increase the size of the
==6838==  main thread stack using the --main-stacksize= flag.
==6838==  The main thread stack size used in this run was 8388608.
mozjs folks asked me to add another valgrind parameter and do it again, so...

==14803== Process terminating with default action of signal 11 (SIGSEGV)
==14803==  Bad permissions for mapped region at address 0x17D54060
==14803==    at 0x17D54060: ???
==14803==    by 0x3692F67384: EnterBaseline(JSContext*, js::jit::EnterJitData&) [clone .part.191] (BaselineJIT.cpp:105)
==14803==    by 0x3692F676D9: js::jit::EnterBaselineMethod(JSContext*, js::RunState&) (BaselineJIT.cpp:81)
==14803==    by 0x3692CFD182: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2334)
==14803==    by 0x3692CFDBC7: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:438)
==14803==    by 0x3692CEFD2B: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:500)
==14803==    by 0x3692E0EADE: js::CallOrConstructBoundFunction(JSContext*, unsigned int, JS::Value*) (jsfun.cpp:1212)
==14803==    by 0x3692CEFD72: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (jscntxtinlines.h:321)
==14803==    by 0x3692E0EEDF: js_fun_apply(JSContext*, unsigned int, JS::Value*) (jsfun.cpp:982)
==14803==    by 0x3692CEFD72: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (jscntxtinlines.h:321)
==14803==    by 0x3692CFFBED: js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value*, JS::Value*) (Interpreter.cpp:531)
==14803==    by 0x3692F5F9DF: js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) (BaselineIC.cpp:7007)
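Since these logs are large, it may help to pull out just the fatal-signal block before pasting it into a comment. The following is a rough sketch, not a general valgrind parser: it assumes the usual `==PID==` line prefix and the `at`/`by` frame markers seen in the excerpts above, and the function name `fatal_signal_summary` is invented for this example.

```python
import re

# Valgrind prefixes each log line with "==<pid>==".
PREFIX = re.compile(r"^==(\d+)==(.*)$")


def fatal_signal_summary(lines):
    """Return (pid, signal_line, reason, frames) for the first fatal signal.

    Returns None if no "Process terminating" line is found.
    """
    it = iter(lines)
    for raw in it:
        m = PREFIX.match(raw.rstrip("\n"))
        if not m or not m.group(2).strip().startswith("Process terminating"):
            continue
        pid, signal_line = m.group(1), m.group(2).strip()
        reason, frames = None, []
        for raw2 in it:
            m2 = PREFIX.match(raw2.rstrip("\n"))
            if not m2 or m2.group(1) != pid:
                break  # line from another process, or not valgrind output
            body = m2.group(2).strip()
            if body.startswith(("at ", "by ")):
                frames.append(body[3:])   # drop the "at "/"by " marker
            elif frames:
                break                     # first non-frame line ends the stack
            elif reason is None and body:
                reason = body             # e.g. "Access not within mapped region ..."
        return pid, signal_line, reason, frames
    return None
```

To use it on a real log, open the file and pass the file object straight in, for example `fatal_signal_summary(open(path, errors="replace"))`, then print the reason and the first few frames.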
Created attachment 863090 [details] better valgrind log of a crash
Also filed with mozilla as https://bugzilla.mozilla.org/show_bug.cgi?id=972725 , as the mozilla folks suggested an upstream report may be appropriate.
*** Bug 1052964 has been marked as a duplicate of this bug. ***
So at this point mozjs folks say:

"So, this is probably an error which need to be fixed in the way the JS API is used in gjs_value_from_g_argument, the trick with --vgdb-error=0, is something which can be used by developers, to find out more about the context of valgrind error reports. (such as running 'where full' in gdb, instead of 'bt')"

I can't get a run through gdb to work, though - it never fully initializes Shell, it's like when I tried running it through valgrind with num-callers=20, like it's just too much work or something. It kills the existing Shell, and the new one starts to start up, but never seems to reach the point of actually running (I never see window decorations). No, gdb wasn't at a break point, I checked.

They also asked me to build mozjs24 with --enable-debug, and get yet another valgrind crash log with that. I tried. The mozjs24 with --enable-debug is here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6531078

gjs rebuilt against that mozjs24 - which you need, or else Shell will crash on start - is here:

https://www.happyassassin.net/temp/gjs-1.39.3-2.1.fc21.x86_64.rpm
https://www.happyassassin.net/temp/gjs-debuginfo-1.39.3-2.1.fc21.x86_64.rpm

but when running Shell through valgrind with those builds, it didn't crash once for me - heisenbug! It *does* still crash when not running through valgrind. If anyone else wants to try, please do. I'll try again shortly.
Hmm, could it be an infinite recursion loop on our side, or some cleanup cycle on the C side?
Latest from upstream (02-18, this isn't new) is:

-----
(In reply to Nicolas B. Pierron [:nbp] from comment #16)
> Terrence: Any idea what could cause AssertHeapIsIdleOrIterating to fail
> during an interruption callback.

This asserts that we are not actively in a GC or otherwise tracing the heap, e.g. for JS_IterateCompartmentsCellIters for about:memory, JS_DumpHeap for debugging, etc. So this would imply that script is running from a finalizer, a JSTracer callback, while using CellIter, or perhaps someone is just calling the interrupt hook manually in one of this places. This is not allowed and, as far as I know there is nowhere in SpiderMonkey proper where this can happen. However, I think in the past we've had trouble with finalizers in Gecko accidentally trying to run scripts.
----

Today I managed to catch one of the crashes in gdb (but not the gdb-via-valgrind arrangement someone requested, just direct in gdb) with a --debug build of mozjs24, and got this:

Assertion failure: !rt->isHeapCollecting(), at /builddir/build/BUILD/mozjs-24.2.0/js/src/jsapi.cpp:206

Program received signal SIGSEGV, Segmentation fault.
0x0000003ac9652590 in AssertHeapIsIdleOrIterating (rt=<optimized out>) at /usr/src/debug/mozjs-24.2.0/js/src/jsapi.cpp:206
206 JS_ASSERT(!rt->isHeapCollecting());

is that any use to anyone?
also an interesting comment from upstream:

https://bugzilla.mozilla.org/show_bug.cgi?id=972725#c24

"SpiderMonkey embeddings /must not/ call back into the API from a finalizer, full stop. We do allow API usage, including running (almost) arbitrary script code, during GC, but /only/ during the JSGCCallback when the phase is JSGC_END. Gecko has the same need: it implements something called "delayed finalization." The idea is that when finalizers need to interact with SpiderMonkey they push the operation into a list, then run these operations in order when they get the JSGC_END callback. I guess gnome-shell needs something similar."

But interestingly, I haven't seen this bug lately on either of my Rawhide systems, I'm almost sure. Seems like it magically got solved, somehow or other. mozjs hasn't changed, but gjs has had a few bumps in March and April. I can't recall exactly when it stopped happening to me.
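To make the "delayed finalization" idea from the upstream comment concrete, here is a toy model of it. This is only an illustration of the pattern as described in that quote, not gjs or Gecko code: `ToyRuntime` and all its methods are hypothetical stand-ins, not SpiderMonkey API. Finalizers push their work onto a list instead of calling into the engine, and the list is drained in order once collection has ended (the role the quote gives to the JSGC_END callback).

```python
from collections import deque


class ToyRuntime:
    """Stand-in for a JS engine; every name here is hypothetical."""

    def __init__(self):
        self.collecting = False
        self.deferred = deque()   # operations queued by finalizers
        self.log = []

    def api_call(self, what):
        # Same spirit as the failed !rt->isHeapCollecting() assertion:
        # the API must not be entered while the heap is being collected.
        if self.collecting:
            raise RuntimeError("API call while the heap is being collected")
        self.log.append(what)

    def finalize(self, name):
        # A finalizer must not call api_call() itself, so it queues the work.
        self.deferred.append(lambda: self.api_call(f"release {name}"))

    def collect(self, garbage):
        self.collecting = True
        try:
            for name in garbage:
                self.finalize(name)
        finally:
            self.collecting = False
        self.on_gc_end()

    def on_gc_end(self):
        # Collection is over, so API use is allowed again; drain in FIFO order.
        while self.deferred:
            self.deferred.popleft()()


if __name__ == "__main__":
    rt = ToyRuntime()
    rt.collect(["window-actor", "signal-closure"])
    print(rt.log)
```

If `finalize` called `api_call` directly instead of queueing, the guard would fire during `collect`, which is the toy equivalent of the assertion failure caught in gdb earlier in this bug.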
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
No comments for a long time.