Bug 11404

Summary: Xaw/Xt application coredumps regularly; didn't before
Product: [Retired] Red Hat Raw Hide
Reporter: Jonathan Kamens <jik>
Component: XFree86
Assignee: Bernhard Rosenkraenzer <bero>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium    
Version: 1.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2000-05-30 18:27:30 UTC
Attachments:
Patch for three different Xaw memory bugs (flags: none)

Description Jonathan Kamens 2000-05-14 03:21:00 UTC
Xrn started coredumping for me regularly when I upgraded to XFree86
4.0-0.8.  Nothing about xrn changed, and I've run it under memory debuggers
for weeks at a time in the past without any problems.  I can only assume
that one of the libraries in XFree86 4.0-0.8 has a bug that wasn't in
previous XFree86 versions.

I've looked at the stack traces from a couple of crashes, and they occur in
different locations.  If we assume that there's only one bug rather than
several, then it is most likely a memory corruption bug, which would cause a
coredump at some later point rather than where the corruption occurs.
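
To illustrate why a single memory corruption bug can surface in unrelated
places, here is a minimal sketch.  It is not taken from xrn or Xaw: the arena
layout and the names set_name and count_is_sane are hypothetical.  Both
"objects" live inside one buffer so that the overrun stays within memory the
program owns and the example remains well-defined C.

```c
#include <string.h>

#define NAME_OFFSET   0
#define NAME_SIZE     8   /* the name field occupies bytes 0..7 */
#define COUNT_OFFSET  8   /* a one-byte counter stored right after the name */

/* Hypothetical arena holding two adjacent "objects". */
static unsigned char arena[16];

/* Buggy writer: copies len bytes without checking len against NAME_SIZE.
 * Callers in this sketch keep len <= sizeof arena, so the copy never
 * leaves the buffer; it only spills into the neighbouring field. */
static void set_name(const char *src, size_t len)
{
    memcpy(arena + NAME_OFFSET, src, len);
}

/* Unrelated reader that runs later: this is where the damage becomes
 * visible, far away from the code that caused it. */
static int count_is_sane(void)
{
    return arena[COUNT_OFFSET] <= 10;
}
```

Storing 3 in arena[COUNT_OFFSET] and then calling set_name("ABCDEFGHZ", 9)
overwrites the counter with the ninth byte, so count_is_sane() fails even
though nothing in the reader is wrong.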

I'll try to find the time to compile the X libraries with debugging enabled
and see if I can track down the problem.  In the meantime, I thought I
should let you know about it.

Comment 1 Jonathan Kamens 2000-05-15 00:18:59 UTC
I compiled the X libraries with debugging and ran xrn against the debugging
libraries.  I got a stack trace, but I can't make much sense of it because the
top frame on the stack is corrupt (which suggests memory corruption, the theory
I mentioned previously):

#0  0x80cf7e0 in ?? ()
#1  0x400434b7 in DestroyVScrollBar (ctx=0x81239a0) at Text.c:834
#2  0x400485f9 in XawTextDestroy (w=0x81239a0) at Text.c:3576
#3  0x40095e30 in Phase2Destroy (widget=0x81239a0) at Destroy.c:154
#4  0x40095c3e in Recursive (widget=0x81239a0, proc=0x40095d40 <Phase2Destroy>)
    at Destroy.c:88
#5  0x40096199 in XtPhase2Destroy (widget=0x81239a0) at Destroy.c:271
#6  0x4009629d in _XtDoPhase2Destroy (app=0x80914e8, dispatch_level=1)
    at Destroy.c:318
#7  0x4009ac5a in XtDispatchEvent (event=0xbffff7e8) at Event.c:1508
#8  0x4009b075 in XtAppMainLoop (app=0x80914e8) at Event.c:1644
#9  0x8060e2f in main (argc=1, argv=0xbffff8d4) at xrn.c:164

I'm going to try linking against static X libraries; sometimes that causes more
helpful stack traces to be produced when there's a crash.

Comment 2 Jonathan Kamens 2000-05-15 01:10:59 UTC
There appears to be at least one compiler bug causing the problems I am seeing.
I have gcc-2.95.3-0.20000323 installed, and I was able to duplicate the crashes
when I compiled XFree86 for myself.  I therefore suspect that whatever was used
to compile XFree86-4.0-0.8 is either the same gcc version or another one with
the same bug.  Here is why I suspect that there's a compiler bug....

I linked xrn against the static X libraries which I compiled with -g and against
ElectricFence.  It got a segv inside UpdateTextInLine in xc/lib/Xaw/Text.c.
When I examined why, I found that the variable "line" in this function had a
value higher than the number of lines available in the Text widget.  However,
this value is different from the value gdb says is being passed into the
function, and this difference shows up at the very first line of the function,
before anything is done that could trash the value!  I.e.:

Breakpoint 1, UpdateTextInLine (ctx=0x412aae38, line=14, x1=124, x2=680)
    at Text.c:1781
(gdb) print line
$1 = 42
(gdb)

Note that line 1781 is the first line of the function UpdateTextInLine, so the
value of "line" should be identical to what gdb says was passed into it, but it
isn't.

When I recompile Text.o with "-O" instead of "-O2", the problem still occurs.
However, when I remove "-O" completely, the problem goes away.

When I remove the automatic initialization of the "lt" variable in
UpdateTextInLine and replace it with "lt = ctx->text.lt.info + line" as the
first line of the function, the problem goes away for "-O", but it still
persists with "-O2".
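
For reference, the two forms being compared look roughly like the sketch
below.  The types are simplified stand-ins, not the real Xaw declarations;
only the member names that appear in this report (text.lt.info, text.lt.lines,
position, textWidth) are taken from it, and the function names are
hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins for the Xaw line-table types (assumed layout). */
typedef struct { long position; int textWidth; } LineEntry;
typedef struct { int lines; LineEntry *info; } LineTable;
typedef struct { struct { LineTable lt; } text; } TextCtx;

/* Original form: "lt" is initialized in its declaration. */
static LineEntry *entry_init_in_decl(TextCtx *ctx, int line)
{
    LineEntry *lt = ctx->text.lt.info + line;
    return lt;
}

/* Workaround form: declare first, then assign as the first statement. */
static LineEntry *entry_assign_first(TextCtx *ctx, int line)
{
    LineEntry *lt;
    lt = ctx->text.lt.info + line;
    return lt;
}
```

In C these two functions mean exactly the same thing, so a behavioral
difference between them points either at the compiler or at something else
(such as earlier memory corruption) disturbing the program.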

I am not enough of a compiler hacker to be able to debug this any further.  If
there is any additional information I can provide to help debug it, please let
me know.

Comment 3 Jonathan Kamens 2000-05-15 14:17:59 UTC
While I still believe that there is a compiler bug, I also just found an Xaw bug
after compiling Xaw without optimization to eliminate the compiler bug.  The
function UpdateTextInLine in xc/lib/Xaw/Text.c goes past the end of an array
when updating the last line in a text widget.  Here's a patch:

--- xc/lib/Xaw/Text.c~	Sun Aug 22 09:23:49 1999
+++ xc/lib/Xaw/Text.c	Mon May 15 10:12:58 2000
@@ -1791,7 +1791,9 @@
     XawTextSinkFindPosition(ctx->text.sink, lt->position,
 			    from_x, x1 - from_x,
 			    False, &left, &width, &height);
-    if (x2 >= lt->textWidth - from_x)
+    if (line == ctx->text.lt.lines)
+	right = -1;
+    else if (x2 >= lt->textWidth - from_x)
 	right = lt[1].position - 1;
     else {
 	from_x += width;
@@ -1800,7 +1802,7 @@
 				False, &right, &width, &height);
     }

-    if (right + 1 <= lt[1].position)
+    if ((right < 0) || (right + 1 <= lt[1].position))
 	++right;

     /* Mark text interval to be repainted */
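
The kind of guard the patch adds can be shown in isolation with the sketch
below.  It is not the Xaw code: the types are simplified stand-ins, the
function name last_position_in_line is hypothetical, and it assumes (inferred
from the patch's "line == ctx->text.lt.lines" test) that the table holds
entries 0 through "lines" inclusive, so that info[lines + 1] does not exist.

```c
#include <stddef.h>

/* Simplified stand-ins for the Xaw line-table types (assumed layout). */
typedef struct { long position; int textWidth; } LineEntry;
typedef struct { int lines; LineEntry *info; } LineTable;

/* Returns the position just before the start of the following line, or -1
 * when "line" has no following entry to read.  Assumes info[] holds
 * lines + 1 entries, indexed 0..lines. */
static long last_position_in_line(const LineTable *lt, int line)
{
    if (line < 0 || line >= lt->lines)
        return -1;                      /* info[line + 1] would be out of range */
    return lt->info[line + 1].position - 1;
}
```

The -1 result plays the same role as "right = -1" in the patch: the caller
gets a sentinel instead of a value read from beyond the end of the array.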

Comment 4 Jonathan Kamens 2000-05-15 20:29:59 UTC
Curiouser and curiouser.  I've already discovered a compiler bug and an Xaw bug
in the course of tracking down the crashes which prompted me to file this bug
report.  Now I've come full circle -- despite fixing the Xaw bug and installing
libraries with a workaround for the compiler bug (I installed libraries compiled
with just "-g"), I'm back at the original segfault when I link xrn against
efence:

Starting program: /c/build/xrn/xrn

  Electric Fence 2.0.5 Copyright (C) 1987-1998 Bruce Perens.

Program received signal SIGSEGV, Segmentation fault.
0x40051f91 in TextSinkResize (w=0x4326ef8c) at Text.c:3057
(gdb) where
#0  0x40051f91 in TextSinkResize (w=0x4326ef8c) at Text.c:3057
#1  0x4004c753 in DestroyVScrollBar (ctx=0x433c3e38) at Text.c:834
#2  0x40052fcc in XawTextDestroy (w=0x433c3e38) at Text.c:3578
#3  0x400ad65d in Phase2Destroy (widget=0x433c3e38) at Destroy.c:154
#4  0x400ad39f in Recursive (widget=0x433c3e38,
    proc=0x400ad4c4 <Phase2Destroy>) at Destroy.c:88
#5  0x400adaab in XtPhase2Destroy (widget=0x433c3e38) at Destroy.c:271
#6  0x400adc4e in _XtDoPhase2Destroy (app=0x40344dc8, dispatch_level=1)
    at Destroy.c:318
#7  0x400b3b7c in XtDispatchEvent (event=0xbffff7e8) at Event.c:1508
#8  0x400b407f in XtAppMainLoop (app=0x40344dc8) at Event.c:1644
#9  0x8060edf in main (argc=1, argv=0xbffff8d4) at xrn.c:164

I have thus far been unable to track it down, but I'm pretty sure there's a real
bug here.

Comment 5 Jonathan Kamens 2000-05-16 02:08:59 UTC
OK, I found and fixed the bug leading to the crash whose stack trace I gave in
my last comment.  Then I encountered another crash, which I found and fixed.  I
will attach a patch file containing all of my current Xaw fixes, for the three
different bugs I've found, but note that I still think there's a good chance
that there's a compiler bug as I described earlier.  Whether or not that's the
case, the patch I'll attach should still be applied.

Comment 6 Jonathan Kamens 2000-05-16 02:09:59 UTC
Created attachment 245
Patch for three different Xaw memory bugs

Comment 7 Jonathan Kamens 2000-05-17 12:54:59 UTC
OK, I'm going to back off on the "compiler bug" claim.  After installing a new
Xaw library with the three fixes I've already submitted to you and once again
compiling the Xaw Text.c with "-O2 -m486 -fno-strength-reduce", I'm no longer
seeing the problem I reported previously with UpdateTextInLine.  So I suspect
that one of the three memory bugs I fixed was stomping on memory and making
it seem as if the compiler was messing up.  If I discover more I'll let you
know.

Comment 8 Bernhard Rosenkraenzer 2000-06-15 16:00:19 UTC
Fixed in 4.0-21