Bug 149772 - segfault on __glXInitialize over ssh tunnel (not when without -O)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks: FC4Blocker
 
Reported: 2005-02-26 04:26 UTC by Ian Pilcher
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-20 09:35:58 UTC
Type: ---
Embargoed:



Description Ian Pilcher 2005-02-26 04:26:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050224 Firefox/1.0.1 Fedora/1.0.1-1.3.1

Description of problem:
Trying to start oowriter or oocalc over an SSH connection, the program
segfaults almost immediately; nothing is ever displayed on the X screen.

Version-Release number of selected component (if applicable):
openoffice.org-1.1.3-6.5.0.fc3

How reproducible:
Always

Steps to Reproduce:
1.  ssh -Y from one Fedora Core 3 system to another.
2.  Try to start OpenOffice.org (oocalc or oowriter).
3.  Enjoy the segfaulty goodness!
  

Actual Results:  Segmentation fault.  Nothing appears on the X screen.

Expected Results:  OpenOffice.org splash screen, followed by application window, should appear on
the remote X display.

Additional info:

Both systems are fully updated (as of 02/25/05) Fedora Core 3.  I am able to
use other X applications (Nedit, GnuCash, Thunderbird) over the SSH tunnel.

Comment 1 Ian Pilcher 2005-02-26 17:32:33 UTC
Of all the ironies, this problem doesn't occur when the remote display is
Cygwin/X on Windows XP!

Comment 2 Sitsofe Wheeler 2005-02-27 14:01:09 UTC
Just attempted to reproduce your problem here and no dice. A wild guess is that
this may be caused by something like a broken GLX extension. What graphics
driver are you using?

Comment 3 Ian Pilcher 2005-02-27 20:31:11 UTC
The remote display in this case was my IBM ThinkPad T30 with an ATI Radeon
Mobility 7500.  I just tried again -- ssh'ed from my office (Matrox Millennium
G200 AGP) to my wife's computer (also FC3) and typed oocalc.  Same result,
segfault before the splash screen appears.  Reversing things -- ssh'ing from
my wife's computer (Matrox Millennium G450 DVI) to the home system -- same
result again.

Oddly, I can ssh -Y localhost and start oocalc just fine.  I'm puzzled.

Comment 4 Caolan McNamara 2005-03-16 14:33:56 UTC
Working for me with openoffice.org-1.1.3-9.5.0, can you verify that it's still
broken for you?

If so, try running

> gdb /usr/lib/ooo-1.1/program/soffice.bin
(gdb) run
... wait until it crashes...
(gdb) bt
and paste the output of bt here
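
The same steps can also be run non-interactively. This is only a sketch, assuming a gdb that accepts the long-standing -batch and -x options; the output file name soffice-bt.txt is a placeholder, and the run is skipped if gdb or the binary is missing:

```shell
# Binary path as given above; adjust if OOo is installed elsewhere.
prog=/usr/lib/ooo-1.1/program/soffice.bin

# Write the two gdb commands (run, then backtrace) to a temporary command file.
cmds=$(mktemp)
printf 'run\nbt\n' > "$cmds"

if command -v gdb >/dev/null 2>&1 && [ -x "$prog" ]; then
    # -batch makes gdb exit after the command file has been processed.
    gdb -batch -x "$cmds" "$prog" 2>&1 | tee soffice-bt.txt
else
    echo "gdb or $prog not available; skipping"
fi
```

The saved output can then be pasted into the bug as-is.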

Comment 5 Ian Pilcher 2005-03-16 15:44:17 UTC
Problem is still 100% reproducible with openoffice.org-1.1.3-9.5.0.fc3.  I
just tried it "both ways" between my home office system and my wife's system.
My home office system is a dual 1GHz Pentium III, 2GB RAM, Matrox G200 AGP w/
16MB; my wife's system is a 866MHz Pentium III, 512MB RAM, Matrox G450 DVI w/
32MB.  Both are fully updated (as of 16 Mar 2005) Fedora Core 3.

Since gdb isn't installed on my wife's system, I used it as the "client" (i.e.
the system running the X server), and I used my home office system as the
"server" (i.e. the system running OpenOffice.org).  Here is the output ("home"
is my home office system; "charo" is my wife's system):

[pilcher@charo ~]$ ssh -Y home
pilcher@home's password:
Last login: Wed Mar 16 09:32:44 2005 from charo
[pilcher@home ~]$ nedit /etc/printcap &
[1] 28712
[pilcher@home ~]$
[1]+  Done                    nedit /etc/printcap
[pilcher@home ~]$ oocalc
Segmentation fault
[pilcher@home ~]$ which oocalc
/usr/bin/oocalc
[pilcher@home ~]$ gdb /usr/lib/ooo-1.1/program/soffice.bin
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols
found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /usr/lib/ooo-1.1/program/soffice.bin
(no debugging symbols found)...(no debugging symbols found)...(no debugging
symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...[Thread debugging using libthread_db enabled]
[New Thread -1208305984 (LWP 28776)]
(no debugging symbols found)...(no debugging symbols found)...(no debugging
symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols found)...(no
debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208305984 (LWP 28776)]
0x00bb4ae5 in free () from /lib/tls/libc.so.6
(gdb) bt
#0  0x00bb4ae5 in free () from /lib/tls/libc.so.6
#1  0x00fe3c06 in __glXInitialize () from /usr/X11R6/lib/libGL.so.1
#2  0x00fdf3db in __indirect_glCompressedTexSubImage3D ()
   from /usr/X11R6/lib/libGL.so.1
#3  0x00fe0453 in glXGetConfig () from /usr/X11R6/lib/libGL.so.1
#4  0x00e197af in X11SalOpenGL::MakeVisualWeights ()
   from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#5  0x00e3d65c in SalDisplay::BestVisual ()
   from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#6  0x00e3c33d in SalXLib::Init ()
   from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#7  0x00e3bbab in SalData::Init ()
   from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#8  0x00e4724c in create_SalInstance ()
   from /usr/lib/ooo-1.1/program/libvclplug_gen645li.so
#9  0x03a3af82 in vcl::SolarThreadExecutor::execute ()
   from /usr/lib/ooo-1.1/program/libvcl645li.so
#10 0x03a3bda7 in CreateSalInstance ()
   from /usr/lib/ooo-1.1/program/libvcl645li.so
#11 0x0387b838 in InitVCL () from /usr/lib/ooo-1.1/program/libvcl645li.so
#12 0x0387b70d in SVMain () from /usr/lib/ooo-1.1/program/libvcl645li.so
#13 0x03a3c03d in main () from /usr/lib/ooo-1.1/program/libvcl645li.so
#14 0x00b67e33 in __libc_start_main () from /lib/tls/libc.so.6
#15 0x08063a41 in _start ()
(gdb)


Comment 6 Tom Lane 2005-03-18 02:19:04 UTC
I'm seeing what seems to be a closely related behavior with
openoffice.org-1.1.3-6.5.0.fc3 on an up-to-date FC3 machine.

I am ssh'ing to it from an old HPUX machine (don't ask) and
typing "ooffice", and what happens is that the HP's X server
instantly locks up :-( and doesn't recover short of being killed.
Meanwhile the ooffice process on the FC3 machine goes away immediately
or nearly so; it does not drop a corefile though.  This works fine
if I use a normal non-SSH X connection.  Also, this did *not*
happen with the immediately prior FC3 version of openoffice.
I don't think the ssh software has changed recently at either end,
either.

One possible clue is that on the HP, I see the X server has spawned
an "ogld" child process before it hangs.  The ogld is pretty wedged
too --- it has to be killed separately.

Comment 7 Caolan McNamara 2005-03-18 09:10:23 UTC
caolanm->ian: That crash is coming from glXGetConfig, from libGL. See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139712 for a similar
looking report. On that issue I attached a sample standalone program. Can you
try running that little demo program over your tunnel instead of OOo and see if
the little testcase fails as well?

caolanm->tgl: Don't know much about that. Give the demo program I mention a
whirl though, in the hope that ogld is "opengl daemon" or something and that this
is related to libGL on the FC3 side and not really directly an openoffice.org thing.

Comment 8 Mike A. Harris 2005-03-18 11:17:52 UTC
Bug #139712 also indicates that a newer X.Org fixes the issue occurring
in that bug report.  You might want to upgrade to 6.8.2 from fc3-testing
and see if that fixes the problem or not.  It might just go away for free.

Comment 9 Caolan McNamara 2005-03-22 14:22:03 UTC
cmc->Ian: were you able to try the testcase program to see if it also crashed?

Comment 10 Ian Pilcher 2005-03-22 16:27:00 UTC
Sorry for the delay.  I missed the reference to the test program.

Short answer:  The testcase also crashes when run over the ssh tunnel.

Long answer:  The testcase only crashes when built with optimization (-O).

[pilcher@home temp]$ c99 -O -g -Wall -W -pedantic -o crash crash.c
-L/usr/X11R6/lib -lX11 -lGL
[pilcher@home temp]$ ./crash
Segmentation fault
[pilcher@home temp]$ gdb ./crash
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /home/pilcher/temp/crash
[Thread debugging using libthread_db enabled]
[New Thread -1208940864 (LWP 32268)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208940864 (LWP 32268)]
0x00bb4ab9 in free () from /lib/tls/libc.so.6
(gdb) bt
#0  0x00bb4ab9 in free () from /lib/tls/libc.so.6
#1  0x006eec06 in __glXInitialize () from /usr/X11R6/lib/libGL.so.1
#2  0x006ea3db in __indirect_glCompressedTexSubImage3D ()
   from /usr/X11R6/lib/libGL.so.1
#3  0x006eb453 in glXGetConfig () from /usr/X11R6/lib/libGL.so.1
#4  0x08048604 in main () at crash.c:22


[pilcher@home temp]$ c99 -g -Wall -W -pedantic -o crash crash.c -L/usr/X11R6/lib
-lX11 -lGL
[pilcher@home temp]$ ./crash
libGL error: XF86DRIAuthConnection failed
libGL error: reverting to (slow) indirect rendering
All OK
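
The two builds above can be compared in one pass. This is a rough sketch, assuming crash.c is the testcase from bug #139712 saved in the current directory; compare_builds is a hypothetical helper name, and the link flags are the ones used in the transcript:

```shell
# Build the testcase without and with -O, run each binary, and report the
# exit status. In most shells a process killed by SIGSEGV reports 128 + 11.
compare_builds() {
    src=$1
    for opt in "" "-O"; do
        if [ ! -f "$src" ]; then
            echo "$src not found; skipping build with flags '$opt'"
            continue
        fi
        # $opt is deliberately unquoted so the empty case adds no argument.
        c99 $opt -g -o crash "$src" -L/usr/X11R6/lib -lX11 -lGL || continue
        status=0
        ./crash || status=$?
        echo "flags='$opt' exit status: $status"
    done
}

compare_builds crash.c
```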


Comment 11 Caolan McNamara 2005-03-22 18:40:01 UTC
Well, it's not openoffice.org specific; it's happening in an X lib. That it is
optimization specific is interesting, though.

Comment 12 Tom Lane 2005-03-22 20:59:53 UTC
I tried the testgl.c program shown in bug #139712 on my setup with ssh to HPUX.
I did *not* compile with optimization, just
gcc testgl.c -o testgl -L/usr/X11R6/lib -lX11 -lGL
This is able to lock up my HPUX X server just like the ooffice case.  There is
no core dump on the Fedora side so far as I can tell, but it does print this on
stderr:

Xlib: sequence lost (0x10000 > 0xa) in reply type 0x0!
X Error of failed request:  0
  Major opcode of failed request:  0 ()
  Serial number of failed request:  0
  Current serial number in output stream:  10

This pretty much lets ooffice off the hook: the problem is at the X level.
I'm currently using
xorg-x11-libs-6.8.1-12.FC3.21
xorg-x11-Mesa-libGL-6.8.1-12.FC3.21
Will try updated X libs as suggested by Mike. 

Comment 13 Ian Pilcher 2005-03-29 20:43:19 UTC
I have updated both systems to xorg-x11-6.8.2-1.FC3.13.  Starting oocalc via
an SSH tunnel no longer segfaults.  I now get the same message that I got when
I compiled the test program without optimization:

    [pilcher@charo ~]$ libGL error: XF86DRIAuthConnection failed
    libGL error: reverting to (slow) indirect rendering

After this, things appear to work.

Is this the expected behavior when using a remote display?

Comment 14 Tom Lane 2005-04-04 15:03:52 UTC
I've updated to
openoffice.org-1.1.3-9.5.0.fc3
xorg-x11-6.8.2-1.FC3.13
and I still see the X server freeze over ssh ...

Comment 16 Mike A. Harris 2005-04-20 09:33:59 UTC
Tom:  We've reviewed this report, and the initial report filed by Ian
is about openoffice itself segfaulting, not the X server.  Ian has
indicated that updating both systems to the newer xorg-x11-6.8.2-1.FC3.13
release has resolved this problem for him, so Ian's bug is resolved
now.  Reading through the comments you've added, it seems that the
problem you are experiencing is unrelated to the problem reported by
Ian, as your problem is about your X server crashing rather than openoffice
crashing.

>I am ssh'ing to it from an old HPUX machine (don't ask) and
>typing "ooffice", and what happens is that the HP's X server
>instantly locks up :-(

If the X server is crashing, then there is a bug in HP's X server if
running an application can cause it to crash.  I'd recommend reporting
that to Hewlett Packard's customer service centre for them to
investigate, as it sounds serious.

Hope this helps.


Comment 17 Mike A. Harris 2005-04-20 09:35:58 UTC
Ian: Ok, thanks for the update.  Hardware acceleration via DRI is only
available when the client is running on the same system as the X server.
Hardware acceleration with remote clients is not implemented in the DRI
and so remote OpenGL clients always use Mesa software fallbacks.
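
The local/remote distinction can be roughly checked from the shell. This is a sketch only, assuming that a DISPLAY value with a host part (such as the localhost:10.0 that sshd typically sets for forwarded sessions) means the client is not talking to the X server over a local socket; display_kind is a hypothetical helper, not part of any X tool:

```shell
# Classify an X display name: names starting with ":" or "unix:" use a
# local socket; anything with a host part goes over a network connection.
display_kind() {
    case "$1" in
        "")         echo "unset" ;;
        :*|unix:*)  echo "local" ;;
        *)          echo "network" ;;
    esac
}

echo "DISPLAY=${DISPLAY:-} -> $(display_kind "${DISPLAY:-}")"
```

A "network" result would be consistent with libGL falling back to indirect rendering, as in the messages above.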

I'm not sure why you are seeing that error message however.  Do you have
any libGL environment variables set by chance?  If you consider the
error message to be a bug, please open a new bug report separately
so we can investigate the error message.
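
One quick way to answer the environment variable question is to list anything with the LIBGL_ prefix, which Mesa's libGL uses for settings such as LIBGL_DEBUG and LIBGL_ALWAYS_INDIRECT. A small sketch; list_libgl_env is a hypothetical helper name:

```shell
# Print every environment variable whose name starts with LIBGL_,
# or a short note if none are set.
list_libgl_env() {
    env | grep '^LIBGL_' || echo "no LIBGL_* variables set"
}

list_libgl_env
```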

Thanks.

Setting status to "ERRATA", now that problem is confirmed fixed in a
released update.

