Description of Problem:
Got a new laptop and installed 7.2 on it. On first login the panel crashed. This could be a race condition that depends on CPU speed, the amount of work being done on first login, etc.

Version-Release number of selected component (if applicable):
gnome-core-1.4.0.4-38

How Reproducible:

Steps to Reproduce:
1. Log in on first boot after install

Actual Results:
Crash in the panel (attached output from bug-buddy)

Expected Results:
No crash

Additional Information:
I'm attaching the info saved from bug-buddy
Created attachment 35121: output saved from bug-buddy
Yuck, looks like some sort of memory corruption caused by gwmh getting into some specific state - most likely totally unreproducible...
see also http://bugzilla.gnome.org/show_bug.cgi?id=63352 and the Red Hat bug backlinked from there
In case it helps, I can reproduce this reliably on my Dell Inspiron 5000 laptop (P3-650). It happens much more reliably when the power is connected (i.e. running at 650 MHz). With the power disconnected (SpeedStep kicks in and it runs at 500 MHz) it is far less likely to do this.
Oh, it is just dawning on me that this is all happening on laptops. Doh. I know this bug. ;-) On some laptops with a bad BIOS, apps that touch /proc/apm will segfault, at least a lot of the time. IIRC. Something along those lines. I know it was some bug with buggy BIOS and apps segfaulting trying to use apm. If you do "cat /proc/apm" does cat segfault? Can I get laptop make/model for people seeing the bug?
cat /proc/apm doesn't segfault. I did it about 100 times, wrote a script that did it 1000 times. On a whim, I disabled the battery monitor (I didn't use it when I was running RH 7.1) and the panel still crashed. So that wasn't it. Without the battery monitor, the panel shouldn't be touching /proc/apm correct? I haven't generated a bug-buddy report since I haven't seen the need to e-mail it, but if it would be helpful I can.
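For anyone who wants to repeat the /proc/apm test, here is a minimal sketch of the kind of loop described above. It is not the original script (that was never attached); the function name stress_read is made up for illustration. It relies on the usual shell convention that a process killed by SIGSEGV reports exit status 139 (128 + 11).

```shell
# Hypothetical helper: read FILE with cat COUNT times and print how many
# of those runs exited with a non-zero status. A cat that segfaults shows
# up as status 139 (128 + SIGSEGV) in the per-run message on stderr.
stress_read() {
    file=$1
    count=$2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$file" > /dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
            echo "run $i: cat exited with status $status" >&2
        fi
        i=$((i + 1))
    done
    # Only the failure count goes to stdout, so it is easy to capture.
    echo "$failures"
}
```

Running "stress_read /proc/apm 1000" would then print the number of failed reads. As noted in the thread, a clean result here does not rule out the APM theory, since cat is a much simpler program than the panel.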
The thing is, the panel touches /proc/apm itself to decide whether to run the battery monitor by default. I don't think failure of cat to segfault necessarily invalidates the theory, since cat is such a simple app. There may be some more complicated trigger than just touching the file. I'll go ask a kernel person today.
I have a Dell Inspiron 8000 on which I ran RH7.0 for almost a year without a hitch. I upgraded the BIOS (maybe a bad move) to the A17 release from Dell, then immediately installed RH7.2. Possibly due to the same problem reported here, I've reinstalled the OS approximately 8 times in the last week. In addition to the panel crashing, which it does after some installs but not others, I've had the following problems some of the time:
* gnome_terminal won't start (segmentation fault)
* under GNOME: the shared libraries have been corrupted (can't ls -l them; read/write error)
* under GNOME: the panel.d directory has been corrupted (can't ls -l it; read/write error)
* under KDE: the icon .png files have been corrupted (can't ls -l them; read/write error)
* under KDE: can't read the icon .png files (file ownership, group, file size, and date are all wrong)
I've made sure to repartition the drive with each install and check for bad blocks. I've tried using an ext2 filesystem instead of ext3. I've tried running KDE instead of GNOME. None of these permutations helped. I just installed RH7.1 to see if that makes a difference. So far so good, but it's been running < 1 hour. I haven't narrowed it down to whether the problem is in starting X, rebooting the machine, the BIOS, or the laptop management apps, but it seems pretty clear that my problem, and maybe the originator's problem, is with the file system.
I have had a similar experience on both a laptop and a desktop computer, so I do not think it is related to APM either. The panel crashes randomly when opening a new session, especially when the computer was freshly booted. When the panel crashes, I can restart it by hand, just by typing 'panel&' in a terminal window. It then gives a warning which may be useful:
Gtk-WARNING **: gtk_signal_disconnect_by_data(): could not find handler containing data (0x818A2A8)
Doesn't this mean that the panel is trying to access a memory area pointed to by an uninitialised pointer? Sometimes the area is in a protected page and kaboom; sometimes it is not, the pointee does not contain meaningful information, and the panel issues the above warning. Just a guess.
I have the exact same problem. I was able to get around it by deleting all the .gnome* configuration and then letting it be recreated on the next login. After things were recreated I never had this problem again. Sorry I didn't try to figure out exactly which configuration file(s) were causing the problem.
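A gentler variant of this workaround is to move the .gnome* entries aside rather than delete them, so the old configuration is still around if someone wants to work out which file triggers the crash. The sketch below is only an illustration: the function name stash_gnome_config and the backup directory name are made up, and it should be run while not logged in to a GNOME session.

```shell
# Hypothetical helper: move every .gnome* entry in a home directory into a
# timestamped backup directory instead of deleting it. Prints the backup
# directory on success. Defaults to $HOME when no directory is given.
stash_gnome_config() {
    home_dir=${1:-$HOME}
    backup_dir="$home_dir/gnome-config-backup.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    for entry in "$home_dir"/.gnome*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$entry" ] || continue
        mv "$entry" "$backup_dir"/ || return 1
    done
    echo "$backup_dir"
}
```

After running "stash_gnome_config" and logging in again, the configuration should be recreated from scratch, as described above. Files can then be copied back from the backup directory one at a time to narrow down the culprit.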
Ok, bugs 55119, 55276 and 58297 here look like the same problem (which I have also). If you check at gnome.org for http://bugzilla.gnome.org/show_bug.cgi?id=59500 or http://bugzilla.gnome.org/show_bug.cgi?id=69333 you can see this problem is reported by a LOT of people and looks specific to RH7.2.
Look at bugzilla.gnome.org for about 500 reports of the same crash :( Red Hat really needs to update their gnome-core package to the latest release; if not, we'll end up having the same problem with the next release... There have been no crash reports for 1.4.0.6, and all the people asked to try this release have said that it fixed it for them. One small lead is that it seems to happen only with accounts created during install. If they remove the account and create a new one, all is fine. (This has to be equivalent to 'rm ~/.gnome', maybe?) Kjartan
I spent a couple hours going through all the dups on gnome.org last night. There are several distinct crashes in there. The vast majority of people didn't include a backtrace in the report, so I don't really know which one is "the" crash. :-/ Anyhow, I'll either get the new gnome-core or carefully sort through the patches since our current gnome-core and apply some of them.
*** Bug 52496 has been marked as a duplicate of this bug. ***
*** Bug 57003 has been marked as a duplicate of this bug. ***
*** Bug 55276 has been marked as a duplicate of this bug. ***
*** Bug 58297 has been marked as a duplicate of this bug. ***
*** Bug 56023 has been marked as a duplicate of this bug. ***
This bug has been seen on RH7.0, RH7.1 and RH7.2 on Dell Inspiron 5000e and 7500 (i.e. ATI Rage Mobility P and N chipsets), and also on various Acer servers using on-board ATI Mach64 chipsets with 2/4 MB of RAM.
All systems were fixed by doing CTRL-ALT-BACKSPACE before login.
The fault was cleared on RH7.0 by updating to the latest Ximian Gnome 1.4 (oops!).
The Inspiron 7500 has seemingly braindamaged APM hardware - so forget it. The 5000e seems OK.
Neither Dell system probes well via Xconfigurator.
No closer to reproducing this or figuring out the problem :-/
I got another lead from someone - can people post their "xdpyinfo" output? Or just note whether you are using an 8-bit or other less-common bit depth?
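To make the bit-depth reports easy to compare, something like the following could be used to pull just the root window depth out of the xdpyinfo output. This is a sketch that assumes xdpyinfo prints a line of the form "depth of root window:    24 planes"; the function name root_depth is made up for illustration.

```shell
# Hypothetical helper: read xdpyinfo output on stdin and print only the
# depth of the root window (e.g. 8, 16 or 24). Prints nothing if the
# expected "depth of root window" line is not present.
root_depth() {
    sed -n 's/^ *depth of root window: *\([0-9][0-9]*\) planes.*$/\1/p'
}
```

Usage would be "xdpyinfo | root_depth" from a terminal inside the X session. The full xdpyinfo output is still more useful for checking the specific visual, so please attach that as well if possible.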
8-bit does work here (I would have been surprised if it didn't I guess), but I'm still wondering if it has something to do with the specific X visual.
I'm running 24-bit. I'll try 8 and 16 bit tonight when I get home to see if there is any change with the problem.
Created attachment 49804: gdb backtrace, while Panel crash dialog still open
Excellent! If you can reproduce it, is there any chance you could do "export MALLOC_CHECK_=2" in /etc/profile or somewhere? That backtrace is inside g_malloc() which means memory got corrupted somehow; MALLOC_CHECK_=2 may convince it to crash closer to the root cause of the problem. Note trailing underscore in the env variable name.
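For anyone who would rather not edit /etc/profile, a minimal sketch of setting the variable for a single shell and checking that child processes inherit it. It assumes glibc's malloc, where a value of 2 makes the process abort as soon as a heap inconsistency is detected; the "panel &" restart line is the one mentioned earlier in this thread and is left commented out.

```shell
# Enable glibc's malloc consistency checks for everything started from
# this shell. Assumption: glibc malloc, where 2 means "abort immediately
# when an error is detected". Note the trailing underscore in the name.
export MALLOC_CHECK_=2

# Confirm that child processes actually inherit the setting.
sh -c 'echo "MALLOC_CHECK_ is set to: ${MALLOC_CHECK_:-unset}"'

# Then restart the panel from this same shell so it picks the variable up:
# panel &
```

Setting it in /etc/profile, as suggested above, has the advantage that the panel started at login also gets it, which matters if the crash only happens on a fresh session.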
Another dup just hit gnome.org: http://bugzilla.gnome.org/show_bug.cgi?id=76037 Again it happens only on an Inspiron for them. I went through a hundred or so of the gnome.org dups a couple of weeks ago; the create_menu_at() backtrace in that bug and the gdk_window_foreign_new() backtrace on this bug are the most common traces. However, neither backtrace actually tells us enough to fix the problem without debug symbols, MALLOC_CHECK_, etc.
*** Bug 65309 has been marked as a duplicate of this bug. ***
Just for completeness, here's a backtrace with MALLOC_CHECK_=2. This is with 16-bit colors and a Rage Mobility P/M AGP 2x card. This is also running the very latest gnome-core from CVS.
#0 0x404c8e82 in gdk_window_add_filter () from /usr/lib/libgdk-1.2.so.0
#1 0x0805baec in task_new (window=0x8131b10) at gwmh.c:1726
#2 0x0805be74 in client_list_sync (xwindow_ids=0x815d690, n_ids=1) at gwmh.c:1860
#3 0x0805ac7d in gwmh_desk_update (imask=GWMH_DESK_INFO_CLIENT_LIST) at gwmh.c:1100
#4 0x0805b53e in gwmh_idle_handler (data=0x0) at gwmh.c:1490
#5 0x404e7ddc in g_idle_dispatch (source_data=0x805b500, dispatch_time=0xbffff670, user_data=0x0) at gmain.c:1367
#6 0x404e6e41 in g_main_dispatch (dispatch_time=0xbffff670) at gmain.c:656
#7 0x404e7445 in g_main_iterate (block=1, dispatch=1) at gmain.c:877
#8 0x404e75d4 in g_main_run (loop=0x81bce00) at gmain.c:935
#9 0x403e888f in gtk_main () from /usr/lib/libgtk-1.2.so.0
#10 0x0805ea46 in main (argc=1, argv=0xbffff774) at main.c:657
#11 0x40514647 in __libc_start_main (main=0x805e70c <main>, argc=1, ubp_av=0xbffff774, init=0x8056c70 <_init>, fini=0x80a78d0 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffff76c) at ../sysdeps/generic/libc-start.c:129
(gdb)
HTH
I think this should be closed out here. It's not Red Hat specific. The only clue I've found is that commenting out client_list_sync() in gwmh.c and just making it return TRUE makes the crash go away.
As we never figured this out and we are unlikely to do a 7.x update at this point anyway, closing bug.