Bug 149772
Summary: | segfault on __glXInitialize over ssh tunnel (not when without -O) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ian Pilcher <arequipeno> |
Component: | xorg-x11 | Assignee: | X/OpenGL Maintenance List <xgl-maint> |
Status: | CLOSED ERRATA | QA Contact: | David Lawrence <dkl> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | jakub, mharris, tgl |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-04-20 09:35:58 UTC | Type: | --- |
Bug Depends On: | |||
Bug Blocks: | 136450 |
Description
Ian Pilcher
2005-02-26 04:26:02 UTC
Of all the ironies, this problem doesn't occur when the remote display is Cygwin/X on Windows XP!

Just attempted to reproduce your problem here and no dice. A wild guess is that this may be caused by something like a broken GLX extension. What graphics driver are you using?

The remote display in this case was my IBM ThinkPad T30 with an ATI Radeon Mobility 7500. I just tried again -- ssh'ed from my office (Matrox Millennium G200 AGP) to my wife's computer (also FC3) and typed oocalc. Same result, segfault before the splash screen appears. Reversing things -- ssh'ing from my wife's computer (Matrox Millennium G450 DVI) to the home system -- same result again. Oddly, I can ssh -Y localhost and start oocalc just fine. I'm puzzled.

Working for me with openoffice.org-1.1.3-9.5.0, can you verify that it's still broken for you?
if so, try running
> gdb /usr/lib/ooo-1.1/program/soffice.bin
(gdb) run
... wait until it crashes...
(gdb) bt
and paste the output of bt here
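When reading a pasted bt, it can help to find the innermost frame that is not in libc, since that often points at the library that made the failing call. Below is a minimal Python sketch, assuming frames in the "#N 0xADDR in func () from /path/lib.so" format that gdb prints. The helper names parse_frames and first_frame_outside are hypothetical, and the sample frames are copied from the backtrace posted later in this report.

```python
import re

# Matches frames such as:
#   #1 0x00fe3c06 in __glXInitialize () from /usr/X11R6/lib/libGL.so.1
#   #4 0x08048604 in main () at crash.c:22
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>\S+)\s+\(.*?\)"
    r"(?:\s+from\s+(?P<lib>\S+))?"
)

# Sample frames taken from the soffice.bin backtrace in this report.
SAMPLE_BT = """\
#0 0x00bb4ae5 in free () from /lib/tls/libc.so.6
#1 0x00fe3c06 in __glXInitialize () from /usr/X11R6/lib/libGL.so.1
#2 0x00fdf3db in __indirect_glCompressedTexSubImage3D () from /usr/X11R6/lib/libGL.so.1
#3 0x00fe0453 in glXGetConfig () from /usr/X11R6/lib/libGL.so.1
#4 0x00e197af in X11SalOpenGL::MakeVisualWeights () from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
"""


def parse_frames(bt_text):
    """Return a list of (frame number, function, library or None) tuples."""
    return [
        (int(m.group("num")), m.group("func"), m.group("lib"))
        for m in FRAME_RE.finditer(bt_text)
    ]


def first_frame_outside(frames, skip_libs=("libc.so",)):
    """Return the innermost frame whose library is not listed in skip_libs."""
    for num, func, lib in frames:
        if lib is None or not any(name in lib for name in skip_libs):
            return num, func, lib
    return None


if __name__ == "__main__":
    print(first_frame_outside(parse_frames(SAMPLE_BT)))
```

For the sample above, the first non-libc frame is in libGL.so.1 rather than in any OpenOffice.org library, which is consistent with the later conclusion in this thread that the problem is at the X level.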
Problem is still 100% reproducible with openoffice.org-1.1.3-9.5.0.fc3. I just tried it "both ways" between my home office system and my wife's system.

My home office system is a dual 1GHz Pentium III, 2GB RAM, Matrox G200 AGP w/ 16MB; my wife's system is an 866MHz Pentium III, 512MB RAM, Matrox G450 DVI w/ 32MB. Both are fully updated (as of 16 Mar 2005) Fedora Core 3.

Since gdb isn't installed on my wife's system, I used it as the "client" (i.e. the system running the X server), and I used my home office system as the "server" (i.e. the system running OpenOffice.org).

Here is the output ("home" is my home office system; "charo" is my wife's system):

[pilcher@charo ~]$ ssh -Y home
pilcher@home's password:
Last login: Wed Mar 16 09:32:44 2005 from charo
[pilcher@home ~]$ nedit /etc/printcap &
[1] 28712
[pilcher@home ~]$
[1]+ Done nedit /etc/printcap
[pilcher@home ~]$ oocalc
Segmentation fault
[pilcher@home ~]$ which oocalc
/usr/bin/oocalc
[pilcher@home ~]$ gdb /usr/lib/ooo-1.1/program/soffice.bin
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run
Starting program: /usr/lib/ooo-1.1/program/soffice.bin
(no debugging symbols found)... (message repeated)
[Thread debugging using libthread_db enabled]
[New Thread -1208305984 (LWP 28776)]
(no debugging symbols found)... (message repeated)
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208305984 (LWP 28776)]
0x00bb4ae5 in free () from /lib/tls/libc.so.6
(gdb) bt
#0 0x00bb4ae5 in free () from /lib/tls/libc.so.6
#1 0x00fe3c06 in __glXInitialize () from /usr/X11R6/lib/libGL.so.1
#2 0x00fdf3db in __indirect_glCompressedTexSubImage3D () from /usr/X11R6/lib/libGL.so.1
#3 0x00fe0453 in glXGetConfig () from /usr/X11R6/lib/libGL.so.1
#4 0x00e197af in X11SalOpenGL::MakeVisualWeights () from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#5 0x00e3d65c in SalDisplay::BestVisual () from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#6 0x00e3c33d in SalXLib::Init () from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#7 0x00e3bbab in SalData::Init () from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#8 0x00e4724c in create_SalInstance () from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#9 0x03a3af82 in vcl::SolarThreadExecutor::execute () from /usr/lib/ooo-1.1/program/libvcl645li.so
#10 0x03a3bda7 in CreateSalInstance () from /usr/lib/ooo-1.1/program/libvcl645li.so
#11 0x0387b838 in InitVCL () from /usr/lib/ooo-1.1/program/libvcl645li.so
#12 0x0387b70d in SVMain () from /usr/lib/ooo-1.1/program/libvcl645li.so
#13 0x03a3c03d in main () from /usr/lib/ooo-1.1/program/libvcl645li.so
#14 0x00b67e33 in __libc_start_main () from /lib/tls/libc.so.6
#15 0x08063a41 in _start ()
(gdb)

I'm seeing what seems to be a closely related behavior with openoffice.org-1.1.3-6.5.0.fc3 on an up-to-date FC3 machine. I am ssh'ing to it from an old HPUX machine (don't ask) and typing "ooffice", and what happens is that the HP's X server instantly locks up :-( and doesn't recover short of being killed. Meanwhile the ooffice process on the FC3 machine goes away immediately or nearly so; it does not drop a corefile though. This works fine if I use a normal non-SSH X connection.
Also, this did *not* happen with the immediately prior FC3 version of openoffice. I don't think the ssh software has changed recently at either end, either. One possible clue is that on the HP, I see the X server has spawned an "ogld" child process before it hangs. The ogld is pretty wedged too --- it has to be killed separately.

caolanm->ian: That crash is coming from glXGetConfig, from libgl. See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139712 for a similar looking report. On that issue I attached a sample standalone program. Can you try running that little demo program over your tunnel instead of OOo and see if the little testcase fails as well?

caolanm->tgl: Don't know much about that, give the demo program I mention a whirl though on the hope that ogld is "opengl daemon" or something and that this is related to libgl on the FC3 side and not really directly an openoffice.org thing

Bug #139712 also indicates that newer X.Org releases fix the issue occurring in that bug report. You might want to upgrade to 6.8.2 from fc3-testing and see if that fixes the problem or not. It might just go away for free.

cmc->Ian: were you able to try the testcase program to see if it also crashed?

Sorry for the delay. I missed the reference to the test program.

Short answer: The testcase also crashes when run over the ssh tunnel.
Long answer: The testcase only crashes when built with optimization (-O).

[pilcher@home temp]$ c99 -O -g -Wall -W -pedantic -o crash crash.c -L/usr/X11R6/lib -lX11 -lGL
[pilcher@home temp]$ ./crash
Segmentation fault
[pilcher@home temp]$ gdb ./crash
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run
Starting program: /home/pilcher/temp/crash
[Thread debugging using libthread_db enabled]
[New Thread -1208940864 (LWP 32268)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208940864 (LWP 32268)]
0x00bb4ab9 in free () from /lib/tls/libc.so.6
(gdb) bt
#0 0x00bb4ab9 in free () from /lib/tls/libc.so.6
#1 0x006eec06 in __glXInitialize () from /usr/X11R6/lib/libGL.so.1
#2 0x006ea3db in __indirect_glCompressedTexSubImage3D () from /usr/X11R6/lib/libGL.so.1
#3 0x006eb453 in glXGetConfig () from /usr/X11R6/lib/libGL.so.1
#4 0x08048604 in main () at crash.c:22
[pilcher@home temp]$ c99 -g -Wall -W -pedantic -o crash crash.c -L/usr/X11R6/lib -lX11 -lGL
[pilcher@home temp]$ ./crash
libGL error: XF86DRIAuthConnection failed
libGL error: reverting to (slow) indirect rendering
All OK

Well it's not openoffice.org specific, happening with an X lib. Though optimization specific is interesting.

I tried the testgl.c program shown in bug #139712 on my setup with ssh to HPUX. I did *not* compile with optimization, just
gcc testgl.c -o testgl -L/usr/X11R6/lib -lX11 -lGL
This is able to lock up my HPUX X server just like the ooffice case. There is no core dump on the Fedora side so far as I can tell, but it does print this on stderr:
Xlib: sequence lost (0x10000 > 0xa) in reply type 0x0!
X Error of failed request: 0
Major opcode of failed request: 0 ()
Serial number of failed request: 0
Current serial number in output stream: 10
This pretty much lets ooffice off the hook: the problem is at the X level. I'm currently using
xorg-x11-libs-6.8.1-12.FC3.21
xorg-x11-Mesa-libGL-6.8.1-12.FC3.21
Will try updated X libs as suggested by Mike.

I have updated both systems to xorg-x11-6.8.2-1.FC3.13. Starting oocalc via an SSH tunnel no longer segfaults.
I now get the same message that I got when I compiled the test program without optimization:

[pilcher@charo ~]$ libGL error: XF86DRIAuthConnection failed
libGL error: reverting to (slow) indirect rendering

After this, things appear to work. Is this the expected behavior when using a remote display?

I've updated to
openoffice.org-1.1.3-9.5.0.fc3
xorg-x11-6.8.2-1.FC3.13
and I still see the X server freeze over ssh ...

Tom: We've reviewed this report, and the initial report filed by Ian
is about openoffice itself segfaulting, not the X server. Ian has
indicated that updating both systems to the newer xorg-x11-6.8.2-1.FC3.13
release has resolved this problem for him, so Ian's bug is resolved
now. Reading through the comments you've added, it seems that the
problem you are experiencing is unrelated to the problem reported by
Ian, as your problem is about your X server crashing rather than openoffice
crashing.
>I am ssh'ing to it from an old HPUX machine (don't ask) and
>typing "ooffice", and what happens is that the HP's X server
>instantly locks up :-(
If the X server is crashing, then there is a bug in HP's X server if
running an application can cause it to crash. I'd recommend reporting
that to Hewlett Packard's customer service centre for them to
investigate, as it sounds serious.
Hope this helps.
Ian: Ok, thanks for the update. Hardware acceleration via DRI is only available when the client is running on the same system as the X server. Hardware acceleration with remote clients is not implemented in the DRI and so remote OpenGL clients always use Mesa software fallbacks.

I'm not sure why you are seeing that error message however. Do you have any libGL environment variables set by chance?

If you consider the error message to be a bug, please open a new bug report separately so we can investigate the error message.

Thanks.

Setting status to "ERRATA", now that problem is confirmed fixed in a released update.
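One quick way to answer the question about libGL environment variables is to list everything in the environment whose name starts with LIBGL_. The Python sketch below assumes that this prefix covers the variables of interest (Mesa's LIBGL_DEBUG and LIBGL_ALWAYS_INDIRECT follow it). The function name libgl_variables is hypothetical.

```python
import os


def libgl_variables(environ=None):
    """Return the LIBGL_* variables from environ (default: os.environ)."""
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(env.items())
        if name.startswith("LIBGL_")
    }


if __name__ == "__main__":
    found = libgl_variables()
    if found:
        for name, value in found.items():
            print(f"{name}={value}")
    else:
        print("no LIBGL_* variables set")
```

Running it in the same shell that launches oocalc over the ssh tunnel would show whether anything is overriding libGL's defaults in that session.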