Bug 802903
Summary: | [abrt] gnome-shell-3.2.2.1-1.fc16: __GI___libc_malloc: Process /usr/bin/gnome-shell was killed by signal 6 (SIGABRT) | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Dave Jeffery <david.richard.jeffery>
Component: | gnome-shell | Assignee: | Owen Taylor <otaylor>
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 16 | CC: | maxamillion, nalimilan, otaylor, rstrode, samkraju, walters
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Unspecified | |
Whiteboard: | abrt_hash:485fb63cf66f81ecaf7b0a3cb5919c396d4cfdf1 | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-02-13 20:26:20 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Dave Jeffery
2012-03-13 17:36:34 UTC
Created attachment 569738 [details]
File: dso_list
Created attachment 569739 [details]
File: var_log_messages
Created attachment 569740 [details]
File: maps
Created attachment 569741 [details]
File: backtrace
Hey, thanks for getting this trace using G_SLICE=malloc. Would you be kind enough to get another one? We'd need you to run the Shell like this:

    G_SLICE=malloc MALLOC_CHECK_=2 gnome-shell --replace

or even better

    G_SLICE=malloc G_DEBUG=gc-friendly valgrind --tool=memcheck gnome-shell --replace

(for the latter, you'll need to install Valgrind)

Thanks!

Hm, as I said in the other bug, the commands are rather:

    G_SLICE=always-malloc MALLOC_CHECK_=2 gnome-shell --replace

or

    G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind --tool=memcheck gnome-shell --replace

Just tried to send a backtrace using Valgrind but ABRT said reporting was disabled as the backtrace was unusable. Will try again later.

AFAIK, abrt shouldn't be triggered at all if you run gnome-shell in Valgrind. Even if it is, the Valgrind output is what we're interested in (not the ABRT backtrace). Can you attach it?

Created attachment 570021 [details]
3 Valgrind logs for Gnome Shell running under AMD/Catalyst
It's very difficult to get Gnome Shell to crash when running under Valgrind as the sheer volume of errors means it stops every 15 seconds or so.
Am I doing something wrong?
Created attachment 570027 [details]
Valgrind log for Gnome Shell crash on AMD/Catalyst driver
Yes! Finally managed to get a Valgrind crash log for Gnome Shell crashing in this manner.
I hope it helps! Dave
Thanks! This is full of memory errors coming from fglrx, which means it's probably AMD's fault here. ;-) This will help convince them they need to fix something. (For reference, the master bug is probably: https://bugzilla.redhat.com/show_bug.cgi?id=702257)

Those errors aren't necessarily harmful; you can imagine a graphics driver copying to memory that wasn't allocated, since it could be mapped graphics card memory.

That log shows the message:

    org.gnome.Shell already exists on bus and --replace not specified

which means it didn't crash, it just didn't start since gnome-shell was already running.

If either Ray or Milan can tell me how to produce more useful output and explain what I need to do, I'll be happy to help.

Well, Milan's idea is pretty good.

1) sudo debuginfo-install gnome-shell
2) bring up a terminal
3) (as your user) run:

    env G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind --log-file=/tmp/shell.log --tool=memcheck gnome-shell --replace

   Note the --replace at the end.
4) at this point things will be very slow. If you can make it crash under this very unpleasant set of circumstances then it would help track down the problem. Unfortunately, it may be unusable.

I'd also like to know if ~/.xsession-errors has those libfolks warnings only under crash circumstances or all the time, or just that one time.

Dave, what version of evolution-data-server are you running?

(In reply to comment #13)
> Those errors aren't necessarily harmful, you can imagine a graphics driver
> copying to memory that wasn't allocated, since it could be mapped graphics card
> memory.

Ah, didn't know that.

> That log shows the message:
>
> org.gnome.Shell already exists on bus and --replace not specified
>
> which means it didn't crash, it just didn't start since gnome-shell was already
> running.

Yeah, I saw that too, but I thought it was from the instance that automatically started in that terminal after Ctrl+C was hit. But indeed, gnome-shell does not restart in the same terminal in that case AFAIK. Anyway, a trace of the crash would be good.

I'm running: evolution-data-server-3.2.3-2.fc16 (64-bit)

In the morning I'll follow the instructions in Comment 15. I'll try and see when the libfolks warnings occur tomorrow as well.

Created attachment 570487 [details]
.xsession-errors BEFORE a crash happens
Created attachment 570488 [details]
.xsession-errors AFTER a crash happens
So one thing that's interesting is this message:

    (gnome-shell:2467): Clutter-WARNING **: Unable to compile the GLSL shader: Fragment shader failed to compile with the following errors:
    ERROR: 1:1: error(#105) #version must occur before any other statement in the program
    ERROR: error(#273) 1 compilation errors. No code generated

That's most likely coming from src/st/st-scroll-view-fade.c, in st_scroll_view_fade_init(), where it does:

    if (clutter_feature_available (CLUTTER_FEATURE_SHADERS_GLSL))
      {
        shader = cogl_create_shader (COGL_SHADER_TYPE_FRAGMENT);
        cogl_shader_source (shader, fade_glsl_shader);
        cogl_shader_compile (shader);
        if (!cogl_shader_is_compiled (shader))
          {
            gchar *log_buf = cogl_shader_get_info_log (shader);

            g_warning (G_STRLOC ": Unable to compile the fade shader: %s",
                       log_buf);
            g_free (log_buf);

            cogl_handle_unref (shader);
            shader = COGL_INVALID_HANDLE;
          }
      }

fade_glsl_shader starts like this:

    static const gchar *fade_glsl_shader =
    "uniform sampler2D tex;\n"
    "uniform float height;\n"
    "uniform float width;\n"

So it doesn't have #version at the start. According to the spec it doesn't need it:

    Version 1.10 of the language does not require shaders to include this
    directive, and shaders that do not include a #version directive will be
    treated as targeting version 1.10. Shaders that specify #version 100 will
    be treated as targeting version 1.00 of the OpenGL ES Shading Language

but it might be worth trying to add it.

It could be the warning is unrelated to the problem, or it could be the error path caused by this warning is what's causing the heap corruption. Either way, fixing the warning isn't sufficient to fix the bug, but it's worth pursuing nonetheless, I think.

So one other thing you could try, Dave, if you don't mind... as root, run:

1) cp /usr/share/gnome-shell/theme/gnome-shell.css /usr/share/gnome-shell/theme/gnome-shell.css.original
2) sed -i -e 's/fade-offset: .*;/fade-offset: 0px;/' /usr/share/gnome-shell/theme/gnome-shell.css

then restart the shell and see if the Clutter-WARNING goes away and see if the crashes stop.

Created attachment 570882 [details]
GNOME Shell Valgrind Crash log
This is the log I received when I was running GNOME Shell under Valgrind as requested - when it crashed, I was thrown out of my desktop and, via a terminal-style screen, back to GDM.
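The earlier suggestion to try adding a #version line to the fade shader amounts to making that directive the first line of the source string. Below is a minimal sketch in plain C, assuming GLSL 1.10 (`#version 110`). The helper name `with_version_directive` is hypothetical and is not part of Cogl, Clutter, or gnome-shell, and whether this would silence the warning seen with fglrx is untested.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Cogl or gnome-shell function): returns a newly
 * allocated copy of `source` that begins with a #version directive.
 * If the source already starts with one, it is copied unchanged.
 * The caller frees the result; NULL is returned on allocation failure. */
char *
with_version_directive (const char *source)
{
  static const char directive[] = "#version 110\n"; /* assumed: GLSL 1.10 */
  size_t dlen = sizeof directive - 1;
  size_t slen = strlen (source);
  char *out;

  if (strncmp (source, "#version", 8) == 0)
    {
      out = malloc (slen + 1);
      if (out != NULL)
        memcpy (out, source, slen + 1);
      return out;
    }

  out = malloc (dlen + slen + 1);
  if (out == NULL)
    return NULL;

  memcpy (out, directive, dlen);          /* directive goes first */
  memcpy (out + dlen, source, slen + 1);  /* then the original source + NUL */
  return out;
}
```

The returned string would then be the one handed to the shader compiler in place of the original source.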
(In reply to comment #23)
> This is the log I received when I was running GNOME Shell under Valgrind as
> requested - when it crashed, I was thrown out of my desktop and, via a
> terminal-style screen, back to GDM.

And you weren't able to get the end of the log? Looks truncated to me. I guess you did not start gnome-shell from a virtual console, but from gnome-terminal? The problem with this is that gnome-terminal got killed too (not sure why...).

Ah, sorry I didn't know about Virtual Terminals. I've just Googled them - Ctrl+Alt+F2? I'll try again now.

So I don't know if this is the cause of the problem, but there is a real bug spotted by your log:

    ==6045== Conditional jump or move depends on uninitialised value(s)
    ==6045==    at 0x334C43FE6A: meta_window_actor_update_bounding_region_and_borders (meta-window-actor.c:1650)
    ==6045==    by 0x334C440B13: meta_window_actor_pre_paint (meta-window-actor.c:1916)
    ==6045==    by 0x334C436217: meta_repaint_func (compositor.c:1192)
    ==6045==    by 0x3184E7A314: _clutter_run_repaint_functions (clutter-main.c:2978)
    ==6045==    by 0x3184E7AC6C: clutter_clock_dispatch (clutter-master-clock.c:367)
    ==6045==    by 0x3169444ACC: g_main_context_dispatch (gmain.c:2441)
    ==6045==    by 0x31694452C7: g_main_context_iterate (gmain.c:3089)
    ==6045==    by 0x3169445814: g_main_loop_run (gmain.c:3297)
    ==6045==    by 0x334C456AB0: meta_run (main.c:555)
    ==6045==    by 0x4029E0: main (main.c:571)

Looking at the code for meta_window_actor_update_bounding_region_and_borders I see it does:

    MetaFrameBorders borders;

    if (window->priv->frame != NULL)
      meta_frame_calc_borders (frame, &borders);

    /* some code here that uses borders */

    priv->last_borders = borders;

The problem is that if window->priv->frame == NULL then borders will contain bogus, uninitialized values. I'm not sure how that would cause heap corruption, though. It seems like it would just cause drawing artifacts / "screen dirt". Do you have any windows you use regularly that don't have title bars?
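The uninitialized-borders problem described above can be illustrated with a small standalone sketch. The types and functions here (`Borders`, `Frame`, `calc_borders`, `current_borders`) are hypothetical stand-ins, not the real mutter API. The sketch shows one common remedy, zero-initializing the struct so that the frameless case yields defined values; it is not necessarily the fix that was applied upstream.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MetaFrameBorders and the window frame. */
typedef struct { int left, right, top, bottom; } Borders;
typedef struct { int border_width; int title_height; } Frame;

/* Fills in borders from a frame; only valid when frame is non-NULL. */
static void
calc_borders (const Frame *frame, Borders *borders)
{
  borders->left = borders->right = borders->bottom = frame->border_width;
  borders->top = frame->border_width + frame->title_height;
}

/* Returns the borders for a window. A frameless window (frame == NULL)
 * gets all-zero borders instead of whatever happened to be on the stack. */
Borders
current_borders (const Frame *frame)
{
  Borders borders = { 0, 0, 0, 0 };  /* defined values for the NULL case */

  if (frame != NULL)
    calc_borders (frame, &borders);

  return borders;
}
```

With this shape, any later comparison or copy of the struct reads defined data, which is what Valgrind's "depends on uninitialised value(s)" message is complaining about.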
The only instance of graphical corruption I get is in Totem. If I make a window full screen and then make the window normal size again you get some random pixels on the left hand and bottom edge of the Totem window. Before I get the crash all the title bars disappear from all my windows for about a second.

I've since realised the crash logs I got yesterday were for a different bug - video crashing in Subtitle Editor rather than the bug I was looking for. That's why the crash logs are truncated. I tried for nearly two hours yesterday to get a good crash log when running through valgrind with no success - the bug just doesn't seem to happen when the desktop is running that slowly. Perhaps something is happening too soon - before something else has finished - and that's what's causing this crash. I'll try again to get a good crash log for you - I'm sorry I'm making such a mess of it!

Thanks for doing so much of the troubleshooting!

There's this in your log, too:

    ==6045== Invalid read of size 8
    ==6045==    at 0x334C466190: meta_later_remove (util.c:914)
    ==6045==    by 0x334C466316: call_idle_later (util.c:834)
    ==6045==    by 0x3169444ACC: g_main_context_dispatch (gmain.c:2441)
    ==6045==    by 0x31694452C7: g_main_context_iterate (gmain.c:3089)
    ==6045==    by 0x3169445814: g_main_loop_run (gmain.c:3297)
    ==6045==    by 0x334C456AB0: meta_run (main.c:555)
    ==6045==    by 0x4029E0: main (main.c:571)
    ==6045==  Address 0x284d3d08 is 8 bytes inside a block of size 16 free'd
    ==6045==    at 0x4A0662E: free (vg_replace_malloc.c:366)
    ==6045==    by 0x316944B7E2: g_free (gmem.c:263)
    ==6045==    by 0x31694606BE: g_slice_free1 (gslice.c:907)
    ==6045==    by 0x3169461399: g_slist_delete_link (gslist.c:583)
    ==6045==    by 0x334C466148: meta_later_remove (util.c:919)
    ==6045==    by 0x334C466316: call_idle_later (util.c:834)
    ==6045==    by 0x3169444ACC: g_main_context_dispatch (gmain.c:2441)
    ==6045==    by 0x31694452C7: g_main_context_iterate (gmain.c:3089)
    ==6045==    by 0x3169445814: g_main_loop_run (gmain.c:3297)
    ==6045==    by 0x334C456AB0: meta_run (main.c:555)
    ==6045==    by 0x4029E0: main (main.c:571)

That invalid read is bad, it's this code here:

    void
    meta_later_remove (guint later_id)
    {
      GSList *l;

      for (l = laters; l; l = l->next)
        {
          MetaLater *later = l->data;
          if (later->id == later_id)
            {
              laters = g_slist_delete_link (laters, l);
              /* If this was a "repaint func" later, we just let the
               * repaint func run and get removed
               */
              destroy_later (later);
            }
        }
    }

Somehow the laters list is getting destroyed and then later iterated through. That could be caused by or the cause of heap corruption (or both). I'll poke and see if I see anything in the surrounding code. I'm still curious about the answer to comment 22.

Oh, that code snippet from comment 28 has an obvious problem in it. g_slist_delete_link invalidates l and then l = l->next is immediately run. This could cause all sorts of weird behavior.

Mind giving the test packages here a try (without valgrind to maximize the probability the bug will happen) and telling me if the bug goes away?

http://koji.fedoraproject.org/koji/taskinfo?taskID=3908111

I've just installed them - I'll reboot and see how I get on today!

Crashed almost immediately. The backtrace is at: https://bugzilla.redhat.com/show_bug.cgi?id=702257

Alright, thanks.

I'm trying the suggestion in Comment 22 now - I'll let you know how I get on.

Over an hour without a single crash. Looks like the CSS was the problem!

My desktop is much, much more stable since I made the change to the CSS file suggested in Comment 22. However I did get one crash, and that's logged in the original report along with the backtrace.

After several hours' use: a single crash in several hours of use since making that CSS change. Normally I would have got between four and six crashes an hour. Plus something odd I've noticed in addition: a game my friend wrote in Pygame is actually playable now - it was very unresponsive and slow before.
Many thanks for your help - it's so nice I don't have ABRT popping up all the time!

And just to be sure, if you look at the file, the changes weren't reverted by an unintentional update?

No, it still says:

    StScrollView.vfade { -st-vfade-offset: 0px; }

It was 68px before.

Well, I think there are two spots in the CSS: one that normally says 68px and one that normally says 24px. The 24px one says 0px as well, yea?

This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
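For reference, the iteration problem noted earlier in the thread (g_slist_delete_link invalidating l just before l = l->next runs) is an instance of a general singly linked list pitfall. The sketch below shows one safe pattern in plain C with a hand-rolled list. The `Node` type and the `list_*` functions are hypothetical, and this is not the patch that went into the test packages. In GLib-based code the equivalent approaches would be to save l->next before deleting the link, or to stop iterating once the matching element has been removed.

```c
#include <stdlib.h>

/* Hypothetical minimal singly linked list keyed by an id. */
typedef struct Node { unsigned id; struct Node *next; } Node;

/* Pushes a new node at the front; returns the new head
 * (or the old head unchanged if allocation fails). */
Node *
list_prepend (Node *head, unsigned id)
{
  Node *n = malloc (sizeof *n);
  if (n == NULL)
    return head;
  n->id = id;
  n->next = head;
  return n;
}

/* Removes the first node with the given id and returns the new head.
 * The successor is read while the node is still valid, and the loop
 * stops right after the free, so freed memory is never dereferenced. */
Node *
list_remove_id (Node *head, unsigned id)
{
  Node **link = &head;

  while (*link != NULL)
    {
      Node *cur = *link;
      if (cur->id == id)
        {
          *link = cur->next;  /* unlink before freeing */
          free (cur);
          break;              /* ids are assumed unique: nothing more to do */
        }
      link = &cur->next;
    }

  return head;
}

/* Counts the nodes in the list. */
size_t
list_length (const Node *head)
{
  size_t n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}
```

The pointer-to-pointer walk means head removal needs no special case, and the break after free is what the original loop was missing.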