Bug 127134

Summary: vnc server crashes with xorg
Product: [Fedora] Fedora
Reporter: John Rayburn <johnrayburn>
Component: vnc
Assignee: Tim Waugh <twaugh>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 2
CC: tao
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-29 11:14:33 UTC
Attachments:
  Description                                Flags
  vnc log file from crash                    none
  patch to make vnc use fb instead of cfb*   none

Description John Rayburn 2004-07-02 13:45:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040207 Firefox/0.8

Description of problem:
We use vnc to pair program at our company.  We are all running Fedora
Core 2.  The following is how we do this:

 vncserver -AlwaysShared -geometry 1600x1200
 vncviewer 127.0.0.1:1

So I connect to my local vncserver and another developer connects.

However, after a few minutes the vncserver crashes, leaving no Xvnc
processes running, but the lock files remain.

Version-Release number of selected component (if applicable):
vnc-server-4.0-1.beta4.11 and vnc-server-4.0-3

How reproducible:
Always

Steps to Reproduce:
1. Start vncserver in AlwaysShared mode
2. Connect one or more clients
3. Server crashes usually within an hour the first time, and then
crashes within several minutes after that
    

Actual Results:  Xvnc process crashes.  Log file found below.

Expected Results:  Xvnc should not crash

Additional info:

I've done a fair bit of searching, and I have seen that people have
had crashes when running on Fedora Core 2 and some seem possibly
related to the Xorg usage, however I've found no fixes.  I tried
installing the binaries from realvnc.com, and the crashes continue,
both on my machine and when connected to another machine.

Any help would be greatly appreciated.  If there is any more
information that I can provide, please let me know. 

We are often using the Eclipse development platform when the crash
occurs, but no specific relationship seems to exist.

Comment 1 John Rayburn 2004-07-02 13:47:12 UTC
Created attachment 101597 [details]
vnc log file from crash

This is the vnc log file.  Are there any other files of use to help diagnose?

Comment 2 Tim Waugh 2004-07-02 14:09:43 UTC
Yes.  Do you have a core file? (Don't attach it, I just need to know
if you have it.)

Please fetch and install the vnc-debuginfo package for the version you
are using at the moment.  For 4.0-3 you can find it here:

http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/debug/

If you have a core file, run:

gdb /usr/bin/Xvnc core.xxx (where core.xxx is your core file)

then at the (gdb) prompt type 'bt' and press enter.  It is that output
that I need to see.

If you don't have a core file, you'll need to attach gdb to the
process ID of the Xvnc process.  You can find this out from the System
Monitor utility in the System Tools part of the main menu on the
panel.  Once you have the process ID, run this command in a terminal
that is *not* inside the VNC session:

gdb /usr/bin/Xvnc 5241 (use the real process ID instead of 5241)

Then at the (gdb) prompt type 'c' and press enter.  Once VNC crashes
again, you should find this session is back at the (gdb) prompt: type
'bt' and press enter at this stage.

Let me know if I'm not being clear.  Thanks.

Comment 3 John Rayburn 2004-07-02 15:44:49 UTC
Results from gdb (the instructions were good):

GNU gdb Red Hat Linux (6.0post-0.20040223.19rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
 
Attaching to program: /usr/bin/Xvnc, process 18917
Error while mapping shared library sections:
: Success.
Error while reading shared library symbols:
: No such file or directory.
Reading symbols from /usr/lib/libfreetype.so.6...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x0043e402 in ?? ()
(gdb) c
Continuing.
 
Program received signal SIGSEGV, Segmentation fault.
0x00172242 in _int_free () from /lib/tls/libc.so.6
(gdb) bt
#0  0x00172242 in _int_free () from /lib/tls/libc.so.6
#1  0x0017373b in free () from /lib/tls/libc.so.6
#2  0x080b3cf8 in cfb16DestroyPixmap (pPixmap=0x229780) at cfbpixmap.c:109
#3  0x0810e420 in ShmDestroyPixmap (pPixmap=0x229780) at shm.c:325
#4  0x0804fdc9 in FreeResource (id=37812699, skipDeleteFuncType=0) at resource.c:546
#5  0x080637e6 in ProcFreePixmap (client=0xa2e7bf0) at dispatch.c:1530
#6  0x08068c42 in Dispatch () at dispatch.c:454
#7  0x0804ef48 in main (argc=20, argv=0xfee684a4, envp=0xf81020e8) at main.c:440
#8  0x00125ad4 in __libc_start_main () from /lib/tls/libc.so.6
#9  0x0804e441 in _start ()
(gdb)


Comment 4 Tim Waugh 2004-07-02 16:06:12 UTC
Thanks.  This crash is happening in the X code (xorg-x11/XFree86)
rather than from VNC-specific code, so it's difficult to trace why it
is happening unfortunately.

Does it make any difference what colour depth you use for the server?
Do the crashes continue if you start it with:

vncserver -AlwaysShared -geometry 1600x1200 -depth 24

for example?

Comment 5 Bastien Nocera 2004-07-20 10:12:47 UTC
The problem also occurs with the latest vnc (4.0 final) compiled on RHEL3.

#0  0xb7438ea5 in _int_free () from /lib/tls/libc.so.6
#1  0xb7437e68 in free () from /lib/tls/libc.so.6
#2  0x080aff48 in cfb16DestroyPixmap (pPixmap=0xb74fc380) at cfbpixmap.c:109
#3  0x081083a4 in ShmDestroyPixmap (pPixmap=0x841c628) at shm.c:317
#4  0x08062865 in dixDestroyPixmap (value=0xf8100078, pid=12620644)
    at dispatch.c:1472
#5  0x0804f41b in FreeResource (id=12620644, skipDeleteFuncType=0)
    at resource.c:533
#6  0x08062a86 in ProcFreePixmap (client=0x8387250) at dispatch.c:1531
#7  0x08060d57 in Dispatch () at dispatch.c:450
#8  0x0804e100 in main (argc=19, argv=0xbffff264, envp=0xf8100078)
    at main.c:435

We can reproduce the crash using IBM WSAD (a modified Eclipse): create
a new Java project and type some text such as "public class
Wibble {}".  The class name gets underlined.  Play with the mouse and
highlighting for a while, and the server will eventually crash.

Dan Berrange managed to get a better backtrace with ElectricFence:
#0  cfb16ClippedLineCopy (pDrawable=0x1, pGC=0xb2315002, x1=-2, y1=16, x2=0, y2=18, boxp=0xb22ecff4, shorten=1) at cfb8lineCO.c:1474
#1  0x080b7e38 in cfb16LineSS1Rect (pDrawable=0xb2314f3c, pGC=0xb16b8f8c, mode=0, npt=41, pptInit=0xb140d38c) at cfb8lineCO.c:1220
#2  0x080636cb in ProcPolyLine (client=0xb2bccf28) at dispatch.c:1849
#3  0x08060a97 in Dispatch () at dispatch.c:450
#4  0x0804ddf0 in main (argc=19, argv=0xbfffc204, envp=0xb2315002) at main.c:435
#5  0xb73d2748 in __libc_start_main () from /lib/tls/libc.so.6
#6  0x0804d861 in _start ()

The problem happens at all colour depths.

Comment 6 Daniel Berrangé 2004-07-20 10:34:23 UTC
The precise steps to reproduce are

 * Launch VNC
 * Launch WebSphere Application Developer 5.1
 * Create a new Java Project
 * Create a new Java file, eg Foo.java
 * Type in some code with a deliberate mistake, eg a class name that
doesn't match the file name:

    public class Wibble {
    }

   WSAD will underline the word 'Wibble' with a red wavy line

 * Now highlight part of the word Wibble with the mouse from right to
left. eg start highlight at letter 'l' and move mouse to the left
towards letter 'W'.

The server will die with signal 11 (SIGSEGV) during this highlight operation.

Comment 7 Daniel Berrangé 2004-07-20 10:57:26 UTC
I've uploaded a Core dump & XScope traffic log illustrating the crash to:

http://people.redhat.com/berrange/vnc/

NB, the crash is occurring in the PolyLine request with sequence number
00008e6e.

NB2, the XScope log is a little misleading, since there is a bug
preventing it from displaying -ve co-ordinates in the request. For the
first 8 co-ordinates from the request 0008e63, the 'x' value is
actually -ve - check with the core dump to verify.

The crash occurs during processing of the line segment between points
8 & 9, ie the line segment from

(-2,16) to (0, 18)
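
To show why this particular segment ends up in the clipped-line path
(cfb16ClippedLineCopy in the backtrace in comment 5), here is a rough
sketch of endpoint classification.  It assumes a clip box covering the
whole 1600x1200 screen, which the core dump may well contradict.  The
box, the CLIP_* bits and both function names are made up for
illustration and are not the X server's own:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcode bits, in the style of Cohen-Sutherland clipping. */
enum { CLIP_LEFT = 1, CLIP_RIGHT = 2, CLIP_ABOVE = 4, CLIP_BELOW = 8 };

/* Clip box: (x1, y1) is the inclusive top-left corner,
 * (x2, y2) is the exclusive bottom-right corner. */
struct box { int x1, y1, x2, y2; };

/* Return the sides of the box that the point (x, y) lies outside of.
 * A result of 0 means the point is inside the box. */
int outcode(int x, int y, const struct box *b)
{
    int code = 0;

    assert(b != NULL);
    if (x < b->x1)
        code |= CLIP_LEFT;
    else if (x >= b->x2)
        code |= CLIP_RIGHT;
    if (y < b->y1)
        code |= CLIP_ABOVE;
    else if (y >= b->y2)
        code |= CLIP_BELOW;
    return code;
}

/* A segment can be drawn without clipping only if both endpoints
 * are inside the box. */
int needs_clipping(int xa, int ya, int xb, int yb, const struct box *b)
{
    return (outcode(xa, ya, b) | outcode(xb, yb, b)) != 0;
}
```

Under that assumed box, (-2,16) is left of the box and (0,18) is inside
it, so the segment has to be clipped before any pixel is written.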


Comment 8 Tim Waugh 2004-07-20 11:06:48 UTC
Can anyone reproduce this problem inside Xnest (rather than VNC)?

Comment 9 Bastien Nocera 2004-07-20 11:12:31 UTC
From the backtrace generated by Daniel, it looks like the crash would
be in cfbrrop.h line 260:
#define RROP_SOLID(dst)     (*(dst) = DoRRop (*(dst), rrop_and, rrop_xor))
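
For anyone not familiar with the cfb macros, a minimal sketch of what
that line does per pixel.  It assumes DoRRop has the reduced-raster-op
form ((dst & and) ^ xor); do_rrop and rrop_solid_checked are
hypothetical names that do not exist in cfb.  The macro both reads and
writes *dst, so if dst was computed from a bad coordinate or a bad
drawable pointer, the access itself is where the fault shows up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed form of the reduced raster op: keep the destination bits
 * selected by and_mask, then flip the bits selected by xor_mask. */
uint16_t do_rrop(uint16_t dst, uint16_t and_mask, uint16_t xor_mask)
{
    return (uint16_t)((dst & and_mask) ^ xor_mask);
}

/* Hypothetical bounds-checked variant of RROP_SOLID for a 16-bit
 * framebuffer of npixels entries.  Returns 1 if the pixel was written,
 * 0 if the index was rejected. */
int rrop_solid_checked(uint16_t *fb, size_t npixels, long index,
                       uint16_t and_mask, uint16_t xor_mask)
{
    assert(fb != NULL);
    if (index < 0 || (size_t)index >= npixels)
        return 0;   /* the unchecked macro would read and write here */
    fb[index] = do_rrop(fb[index], and_mask, xor_mask);
    return 1;
}
```

With and_mask = 0 the old pixel is discarded and xor_mask becomes the
new value, which is the plain "copy" case.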


Comment 10 Bastien Nocera 2004-07-20 12:51:13 UTC
I haven't managed to reproduce the bug with Xnest running under
ElectricFence (and I tried hard...)

Comment 12 Tim Waugh 2004-07-20 13:11:51 UTC
Needs analysis by someone with more X knowledge.  Mike: any ideas?

For the record, Daniel's Xvnc.21168.core.gz was created by
vnc-server-4.0-0.beta4.1.1.i386.rpm.

Comment 13 John Rayburn 2004-07-28 13:55:20 UTC
I was wondering whether there is any news on this bug,
or if any bugs against x/xorg had been filed so I could track it.

Thanks!

John Rayburn

Comment 14 Tim Waugh 2004-07-29 15:03:28 UTC
It's not yet clear which package is at fault.  mharris: any input?

Comment 15 Daniel Berrangé 2004-08-06 13:03:21 UTC
FYI, In reference to comment #10 - looking at the Imakefile in
xc/programs/Xserver, the Xnest server does not include the mfb/cfb
subdirectories (where the troublesome code is located) in its build,
which explains why the bug couldn't be reproduced with Xnest. The
other X server types do, however, all seem to include the mfb/cfb
directories.

Comment 16 Tim Waugh 2004-08-06 14:01:14 UTC
It was my suggestion to try Xnest: the idea was to try to take VNC out
of the equation and see if the bug remained.

Comment 17 John Rayburn 2004-08-10 13:38:04 UTC
Since Xnest does not use the troublesome code, what can we do next to
pin this down?  Sorry if this comment is useless, just trying to see
how I can help.

Comment 18 Tim Waugh 2004-08-10 13:52:57 UTC
I think it needs someone with X expertise to take a close look at the
stack trace and offer a clue about where to look next.

Comment 19 John Rayburn 2004-08-10 14:52:33 UTC
Is there another site, x related maybe, where I can submit this bug to
get X expertise in on it?

Comment 23 Mike A. Harris 2004-08-15 01:31:49 UTC
Changing OS release/version to Fedora Core 2, as indicated in bug
report comments above.

Comment 27 Kristian Høgsberg 2004-08-24 18:17:14 UTC
Created attachment 103034 [details]
patch to make vnc use fb instead of cfb*

I was able to reproduce this with plain eclipse 3.0 as described in #6 - right
clicking on the selected 'Wibble' is also a quick way to crash Xvnc it seems. 
I tried debugging this for a little while but the cfb (color framebuffer)
module in the X.org code that Xvnc uses is basically intractable, and actively
being phased out in favor of Keith Packard's generic fb implementation.

So instead of actually fixing the bug in cfb, I chickened out and changed Xvnc to
use the fb implementation instead.  This fixes the problem for me; I wasn't
able to crash Xvnc with the patch applied.  The fb code is designed to be a
drop-in replacement for the mfb and cfb layers and the patch to Xvnc is pretty
small.  Even so, this is definitely something we should try to get upstream to
the http://realvnc.com guys, but it would be cool if people could test this a
bit first.

I've attached a patch to the RPM package and I've put some RPMs up here:

  http://freedesktop.org/~krh/vnc

not sure if they work with FC2 though, they were compiled with gcc/g++ 3.4.1
from rawhide.

Comment 28 Tim Waugh 2004-08-26 15:10:55 UTC
Current rawhide now includes this patch (thanks!), in vnc-4.0-4.

I have also made test packages for Red Hat Enterprise Linux 3
available, here:

ftp://people.redhat.com/twaugh/tmp/vnc/

Please try these packages and say whether they fix the problem for
you.  I also need to know if there are any other regressions.

Comment 29 Tim Waugh 2004-08-26 16:37:56 UTC
NB: a problem has been discovered with 4.0-0.beta4.1.2, and I will be
making a new test package soon.

Comment 30 Tim Waugh 2004-08-26 21:57:08 UTC
Please try 4.0-0.beta4.1.3 from the same location.

Comment 31 Tim Waugh 2004-09-29 11:14:33 UTC
4.0-5 should fix it.

Comment 32 John Flanagan 2004-12-20 21:56:38 UTC
An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-461.html


Comment 33 Billy Biggs 2005-02-10 00:08:50 UTC
Just curious, was this information ever passed upstream?  I've had
another report of Xvnc crashing when using Eclipse on the Eclipse
bugzilla.