Red Hat Bugzilla – Bug 55119
Panel crashes on login
Last modified: 2007-04-18 12:37:46 EDT
Description of Problem:
Got a new laptop and installed 7.2 on it. On first login the panel crashed.
This could be some race condition that's dependent on CPU speed / amount of
work being done on first login etc.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. log in on first boot after install
Crash in the panel (attached output from bug-buddy)
I'm attaching the info saved from bug-buddy
Created attachment 35121 [details]
output saved from bug-buddy
Yuck, looks like some sort of memory corruption caused by gwmh getting into some
specific state - most likely totally unreproducible...
see also http://bugzilla.gnome.org/show_bug.cgi?id=63352 and the Red Hat bug
backlinked from there
In case it helps, I can reproduce this reliably on my Dell Inspiron 5000 laptop
(P3-650). It happens much more reliably when the power is connected (i.e.
running at 650). With the power disconnected (speedstep kicks in and running at
500mhz) it is far less likely to do this.
Oh, it is just dawning on me that this is all happening on laptops. Doh. I know
this bug. ;-)
On some laptops with a bad BIOS, apps that touch /proc/apm will segfault,
at least a lot of the time. IIRC. Something along those lines. I know it was
some bug with buggy BIOS and apps segfaulting trying to use apm.
If you do "cat /proc/apm" does cat segfault? Can I get laptop make/model for
people seeing the bug?
cat /proc/apm doesn't segfault. I did it about 100 times, wrote a script that
did it 1000 times.
On a whim, I disabled the battery monitor (I didn't use it when I was running
RH 7.1) and the panel still crashed. So that wasn't it. Without the battery
monitor, the panel shouldn't be touching /proc/apm correct?
I haven't generated a bug-buddy report since I haven't seen the need to e-mail
it, but if it would be helpful I can.
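For anyone else who wants to repeat the /proc/apm read test described above, here is a minimal sketch. The helper name stress_read is made up for illustration and this is not the script that was actually used; it only counts how many reads exit non-zero.

```sh
#!/bin/sh
# Sketch only: stress_read is a hypothetical helper, not the script used above.
# It reads FILE COUNT times and prints how many of those reads exited non-zero.
stress_read() {
    file=$1
    count=$2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # A reader killed by SIGSEGV is normally reported by the shell
        # as exit status 139 (128 + 11), which is counted here as a failure.
        cat "$file" > /dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

Something like "stress_read /proc/apm 1000" would then print the number of failed reads. As noted in this thread, a clean run with cat does not necessarily rule out a more complicated trigger.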
The thing is, the panel touches /proc/apm itself to decide whether to run the
battery monitor by default. I don't think failure of cat to segfault necessarily
invalidates the theory, since cat is such a simple app. There may be some more
complicated trigger than just touching the file. I'll go ask a kernel person today.
I've a Dell Inspiron 8000 on which I've run RH7.0 for almost a year without a
hitch. I upgraded the bios (maybe bad move) to the A17 release from Dell. I
then immediately installed RH7.2. Possibly due to the same problem reported
here, I've re-installed the OS approximately 8 times in the last week.
In addition to the panel crashing, which it does after some installs but not
others, I've had the following problems some of the time:
* gnome_terminal won't start (segmentation fault)
* under gnome: the shared libraries have been corrupted (can't ls -l them).
* under gnome: the panel.d directory has been corrupted (can't ls -l them).
* under KDE: the icon .png files have been corrupted (can't ls -l them).
* under KDE: can't read the icon.png files (file ownership group, filesize, and
date tag are all wrong)
I've made sure to repartition the drive with each install and check for bad
blocks. I've tried using an ext2 filesystem instead of ext3. I've tried
running KDE instead of gnome. None of these permutations helped. I just
installed RH7.1 to see if that makes a difference. So far so good, but it's been
running < 1 hour.
I haven't narrowed it down to whether the problem is in starting X, rebooting
the machine, with the bios, or with the laptop management apps, but it seems
pretty clear that my problem and maybe the originator's problem is with the file
I have had a similar experience on both a laptop and a desktop computer, so I do
not think it is related to apm either. The panel crashes randomly when opening a new
session, and especially when the computer was freshly booted.
When the panel crashes, I can restart it by hand, by just typing 'panel&' in
some terminal window.
It then gives a warning which is maybe useful:
Gtk-WARNING **: gtk_signal_disconnect_by_data(): could not find handler
containing data (0x818A2A8)
Doesn't it mean that the panel is trying to access some memory area pointed to
by an uninitialised pointer? Sometimes the area is in a protected page and
kaboom, sometimes not: the pointee does not contain meaningful information and
the panel issues the above warning.
Just a guess.
I have the exact same problem.
I was able to get around it by deleting all the .gnome* configuration and then
letting it be recreated on the next login.
After things were recreated I never had this problem again.
Sorry I didn't try to figure exactly what configuration file(s) were causing the
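If someone wants to try the same workaround without losing the evidence, one option is to move the .gnome* entries aside instead of deleting them, so the old files can still be inspected afterwards. A minimal sketch follows; the function name and the backup directory name are invented for illustration and nobody in this thread has confirmed this exact procedure.

```sh
#!/bin/sh
# Sketch only: backup_gnome_config and the "gnome-config-backup" directory
# name are hypothetical. Moves every .gnome* entry under the given home
# directory (default: $HOME) into a fresh backup directory and prints its path.
backup_gnome_config() {
    home_dir=${1:-$HOME}
    backup_dir="$home_dir/gnome-config-backup.$$"
    mkdir -p "$backup_dir" || return 1
    for entry in "$home_dir"/.gnome*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$entry" ] || continue
        mv "$entry" "$backup_dir"/ || return 1
    done
    echo "$backup_dir"
}
```

Run it while logged out of GNOME (for example from a text console), then log in again so the configuration is recreated. Comparing the backup against the fresh files might help narrow down which file matters.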
Ok, bugs 55119, 55276 and 58297 here look like the same problem (which I have
also). If you check at gnome.org for
http://bugzilla.gnome.org/show_bug.cgi?id=69333 you can see this problem is
reported by a LOT of people and looks specific to RH7.2.
Look at bugzilla.gnome.org for about 500 reports of the same crash :(
Red Hat really needs to update their gnome-core package to the latest release,
if not we'll end up having the same problem with the next release...
There have been no crash reports for 126.96.36.199, and all people asked to try this
release have said that it fixed it for them. One small lead is that it seems to
happen only with accounts created during install. If they remove the account and
create a new one all is fine. (This has to be equivalent to 'rm ~/.gnome' maybe?)
I spent a couple hours going through all the dups on gnome.org last night.
There are several distinct crashes in there. The vast majority of people didn't
include a backtrace in the report, so I don't really know which one is "the"
Anyhow, I'll either get the new gnome-core or carefully sort through the
patches since our current gnome-core and apply some of them.
*** Bug 52496 has been marked as a duplicate of this bug. ***
*** Bug 57003 has been marked as a duplicate of this bug. ***
*** Bug 55276 has been marked as a duplicate of this bug. ***
*** Bug 58297 has been marked as a duplicate of this bug. ***
*** Bug 56023 has been marked as a duplicate of this bug. ***
This bug seen on RH7.0, RH7.1, RH7.2
on Dell Inspiron 5000e and 7500 (ie ATI Rage Mobility P and N chipsets)
Also on various Acer Servers using ATI Mach64 2/4 Meg RAM on-board chipsets
All systems fixed by doing CTRL-ALT-BACKSPACE before login
Fault was cleared on RH7.0 by updating to latest Ximian Gnome 1.4 (oops!)
Inspiron 7500 has seemingly braindamaged APM hardware - so forget it. 5000e
Neither Dell system probes well via Xconfigurator.
No closer to reproducing this or figuring out the problem :-/
I got another lead from someone - can people post their "xdpyinfo" output?
Or just note whether you are using an 8-bit or other less-common bit depth?
8-bit does work here (I would have been surprised if it didn't I guess), but I'm
still wondering if it has something to do with the specific X visual.
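For people who would rather report just the bit depth than paste the whole xdpyinfo output, a small filter along these lines should work. It assumes xdpyinfo prints a line of the form "depth of root window:    24 planes"; the function name root_depth is made up for illustration.

```sh
#!/bin/sh
# Sketch only: root_depth is a hypothetical helper. It reads xdpyinfo output
# on stdin and prints the number from the "depth of root window" line,
# assuming the "<N> planes" format described above.
root_depth() {
    sed -n 's/^[[:space:]]*depth of root window:[[:space:]]*\([0-9][0-9]*\) planes.*$/\1/p'
}
```

Usage would be "xdpyinfo | root_depth". The visual class is not extracted by this, so the full output is still more useful if the X visual turns out to matter.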
I'm running 24-bit. I'll try 8 and 16 bit tonight when I get home to see if
there is any change with the problem.
Created attachment 49804 [details]
gdb backtrace, while Panel crash dialog still open
Excellent! If you can reproduce it, is there any chance you could do "export
MALLOC_CHECK_=2" in /etc/profile or somewhere?
That backtrace is inside g_malloc() which means memory got corrupted somehow;
MALLOC_CHECK_=2 may convince it to crash closer to the root cause of the problem.
Note trailing underscore in the env variable name.
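If editing /etc/profile is not convenient, the variable can also be set for a single command. A minimal sketch, assuming the panel is started by hand from a terminal; the wrapper name with_malloc_check is invented for illustration.

```sh
#!/bin/sh
# Sketch only: with_malloc_check is a hypothetical wrapper. It runs the given
# command with MALLOC_CHECK_=2 in its environment (note the trailing
# underscore) without changing the environment of the calling shell.
with_malloc_check() {
    env MALLOC_CHECK_=2 "$@"
}
```

For example, "with_malloc_check panel" from a terminal inside the session. With the value 2, glibc's malloc checking is meant to abort as soon as it detects heap corruption, rather than crashing later in an unrelated malloc call.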
Another dup just hit gnome.org:
Again it happens only on an Inspiron for them.
I went through a hundred or so of the gnome.org dups a couple weeks ago, the
create_menu_at() backtrace in that bug and the
gdk_window_foreign_new() backtrace on this bug are the most common traces.
However neither backtrace is actually telling us enough to fix the problem
without debug symbols/MALLOC_CHECK_/etc.
*** Bug 65309 has been marked as a duplicate of this bug. ***
Just for completeness. Here's a backtrace with MALLOC_CHECK_=2. This is with
16-bit colors and a Rage Mobility P/M AGP 2x card.
This is also running the very latest gnome-core from CVS.
#0 0x404c8e82 in gdk_window_add_filter () from /usr/lib/libgdk-1.2.so.0
#1 0x0805baec in task_new (window=0x8131b10) at gwmh.c:1726
#2 0x0805be74 in client_list_sync (xwindow_ids=0x815d690, n_ids=1)
#3 0x0805ac7d in gwmh_desk_update (imask=GWMH_DESK_INFO_CLIENT_LIST)
#4 0x0805b53e in gwmh_idle_handler (data=0x0) at gwmh.c:1490
#5 0x404e7ddc in g_idle_dispatch (source_data=0x805b500,
dispatch_time=0xbffff670, user_data=0x0) at gmain.c:1367
#6 0x404e6e41 in g_main_dispatch (dispatch_time=0xbffff670) at gmain.c:656
#7 0x404e7445 in g_main_iterate (block=1, dispatch=1) at gmain.c:877
#8 0x404e75d4 in g_main_run (loop=0x81bce00) at gmain.c:935
#9 0x403e888f in gtk_main () from /usr/lib/libgtk-1.2.so.0
#10 0x0805ea46 in main (argc=1, argv=0xbffff774) at main.c:657
#11 0x40514647 in __libc_start_main (main=0x805e70c <main>, argc=1,
ubp_av=0xbffff774, init=0x8056c70 <_init>, fini=0x80a78d0 <_fini>,
rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffff76c)
I think this should be closed out here. It's not Red Hat specific. The only clue
I've found is that commenting out client_list_sync() in gwmh.c and just making
it return TRUE makes the crash go away.
As we never figured this out and we are unlikely to do a 7.x update
at this point anyway, closing bug.