21579 – Applix segvs on startup with glibc-2.2

Summary: Applix segvs on startup with glibc-2.2

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	glibc
Sub Component:
Version:	7.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Aaron Brown
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2000-12-01 13:22 UTC by Stephen Tweedie
Modified:	2016-11-24 15:15 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2000-12-08 23:20:36 UTC
Embargoed:

Description Stephen Tweedie 2000-12-01 13:22:21 UTC

Applix 4.4.1 (announcing itself as "4.41 (1021.221.13)") segvs on startup
with the glibc-2.2 errata applied.  The entire result of running applix is:

>>>
[sct@spock] ~ $ /opt/applix/applix 
[sct@spock] ~ $ axmain: signal Segmentation fault
axnet error, axmain already started. Try Again.
axmain: signal Segmentation fault
>>>

glibc-2.1.94-3 is fine.

Comment 1 Stephen Tweedie 2000-12-01 14:43:04 UTC

Additional info, in case it helps to track down an obvious error:

The failure is always in the same place: doing a stat.  The ltrace shows

[pid 4803] fopen("/net/opt/applix/axdata/fontmetri"..., "r") = 0x084c3938
[pid 4803] fileno(0x084c3938)                     = 11
[pid 4803] _fxstat(1, 11, 0xbfffd17c, 0x082e0030, 21 <unfinished ...>
[pid 4803] --- SIGSEGV (Segmentation fault) ---

and the corresponding strace gives

[pid  4852] open("/net/opt/applix/axdata/fontmetrics/gallium/fs/aplxfont/fonts.dir", O_RDONLY) = 11
[pid  4852] --- SIGSEGV (Segmentation fault) ---

so the _fxstat is failing before it gets as far as the system call.  Using
glibc-2.1.94-3, the ltrace shows

[pid 4833] fopen("/net/opt/applix/axdata/fontmetri"..., "r") = 0x084c3938
[pid 4833] fileno(0x084c3938)                     = 11
[pid 4833] _fxstat(1, 11, 0xbfffd17c, 0x082e0030, 21) = 0

at the same location.  The arguments are precisely the same, but the call only
fails under glibc-2.2.
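
For reference, the call sequence in the ltrace output can be sketched in C as below. This is only an illustration: `stat_via_stdio` is a hypothetical helper name, not anything taken from Applix or the X font code, and it assumes a POSIX system.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from Applix or the X font code): open a file
 * with stdio, fetch its descriptor, and stat it, mirroring the
 * fopen -> fileno -> fstat sequence in the ltrace output.
 * Returns 0 on success and -1 on failure.
 */
int stat_via_stdio(const char *path, struct stat *out)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);      /* the trace shows fd 11 at this point */
    int rc = fstat(fd, out);  /* in the traced binary, fstat calls _fxstat(1, fd, buf) */

    fclose(fp);
    return rc;
}
```

On a working system the helper returns 0 for any readable file. In the failing case described here, the process never gets as far as the fstat system call.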

Comment 2 Jakub Jelinek 2000-12-01 14:56:49 UTC

Can you please run gdb on the generated core?
It looks like __fxstat with vers == _STAT_VER_KERNEL, which does just:
  if (vers == _STAT_VER_KERNEL)
    return INLINE_SYSCALL (fstat, 2, fd, CHECK_1 ((struct kernel_stat *) buf));
so I'd like you to find the crashing $pc and disassemble that routine,
to make sure we're looking at the same code.

Comment 3 Stephen Tweedie 2000-12-01 15:29:55 UTC

There's no core file --- axmain has set up a sig11 handler.

However, sending the process a SIGSTOP lets me attach gdb to it, and I can get
as far as the segv that way.  I see:

>>>>
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0 in ?? ()
(gdb) bt
#0  0x0 in ?? ()
#1  0x83082f4 in fstat ()
#2  0x8307510 in FontFileReadDirectory ()
#3  0x82e0046 in FontFileInitFPE ()
[ lots more application stack frames ]
>>>>

which is calling fstat from deep in font server country.  Dumping the args list
for frame 1 on the stack shows the args (0x0000000b, 0x08307510), so we look
like we are in the right territory.  The fstat disassembly begins:

(gdb) disas 0x83082f4
Dump of assembler code for function fstat:
0x83082e0 <fstat>:	push   %ebp
0x83082e1 <fstat+1>:	mov    %esp,%ebp
0x83082e3 <fstat+3>:	push   %esi
0x83082e4 <fstat+4>:	push   %ebx
0x83082e5 <fstat+5>:	mov    0x8(%ebp),%ebx
0x83082e8 <fstat+8>:	mov    0xc(%ebp),%esi
0x83082eb <fstat+11>:	push   %esi
0x83082ec <fstat+12>:	push   %ebx
0x83082ed <fstat+13>:	push   $0x1
0x83082ef <fstat+15>:	call   0x80918b4 <_fxstat>
0x83082f4 <fstat+20>:	add    $0xc,%esp

so we're definitely seeing the _fxstat entry point here.  _fxstat itself is

(gdb) disas 0x80918b4
Dump of assembler code for function _fxstat:
0x80918b4 <_fxstat>:	jmp    *0x8424b08
0x80918ba <_fxstat+6>:	push   $0x8b8
0x80918bf <_fxstat+11>:	jmp    0x8090734 <_init+52>

but

(gdb) x 0x8424b08
0x8424b08 <__DTOR_END__+1132>:	0x00000000
(gdb) 

so no wonder we have jumped into hyperspace.  The memory around 0x8424b08 all
looks valid so this isn't a complete memory wipe we're looking at.  Running from
scratch with a breakpoint at main shows that the jump vector at 0x8424b08 points
to <_fxstat+6> initially, so it looks like we're doing a fixup to NULL at some
point.
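
As a rough model of what the disassembly shows, the `jmp *0x8424b08` is an indirect call through a pointer-sized slot, so a NULL in that slot sends control to address 0x0. The sketch below imitates that with a plain C function pointer. The names `fxstat_fn`, `lazy_stub` and `call_through_slot` are hypothetical and exist only for illustration; the real stub and slot are produced by the linker and ld.so, not written like this.

```c
#include <assert.h>
#include <stddef.h>

/* Signature modelled on the _fxstat(vers, fd, buf) call in the trace. */
typedef int (*fxstat_fn)(int vers, int fd, void *buf);

/* Hypothetical stand-in for the lazy-binding stub at <_fxstat+6>. */
int lazy_stub(int vers, int fd, void *buf)
{
    (void)vers; (void)fd; (void)buf;
    return 0;
}

/*
 * Call through a jump slot only if it is non-NULL.
 * Returns 1 and stores the callee's result in *result on success,
 * or 0 if the slot is NULL (the state found at 0x8424b08).
 */
int call_through_slot(fxstat_fn slot, int vers, int fd, void *buf, int *result)
{
    assert(result != NULL);
    if (slot == NULL)
        return 0;             /* an unchecked "jmp *slot" would land at 0x0 */
    *result = slot(vers, fd, buf);
    return 1;
}
```

The real stub performs no such check, which is why the backtrace shows frame #0 at 0x0 directly above fstat.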

Comment 4 Stephen Tweedie 2000-12-01 15:41:40 UTC

Btw, according to the ltrace, this is the process's first invocation of _fxstat,
although there were several prior successful _xstat calls.

Comment 5 Jakub Jelinek 2000-12-01 16:11:14 UTC

_fxstat (and likewise _xstat and _lxstat) have not been exported from glibc
since glibc 2.1, so I wonder why it worked with earlier glibc versions.
Applix is a glibc 2.0 application, right? Even then, it is pretty strange,
because /usr/lib/libc_nonshared.a in RH5.2 referenced __fxstat (that symbol
is exported from glibc), so probably Applix just used its own magic fstat or
whatever (because libc_nonshared used vers 3 (aka _STAT_VER_LINUX) while
Applix uses vers 1 (aka _STAT_VER_KERNEL)).
Can you perhaps run it with both glibc 2.2 and glibc-2.1.94-3, with
LD_DEBUG=all set in the environment?

Comment 6 Stephen Tweedie 2000-12-01 16:23:25 UTC

glibc-2.1.94-3:

05099:	symbol=_fxstat;  lookup in file=/net/opt/applix/axdata/axmain
05099:	symbol=_fxstat;  lookup in file=/lib/libNoVersion.so.1
05099:	binding file /net/opt/applix/axdata/axmain to /lib/libNoVersion.so.1: normal symbol `_fxstat'

glibc-2.2-5:

05071:	symbol=_fxstat;  lookup in file=/net/opt/applix/axdata/axmain
05071:	symbol=_fxstat;  lookup in file=/usr/X11R6.local/lib/libX11.so.6
05071:	symbol=_fxstat;  lookup in file=/lib/libdl.so.2
05071:	symbol=_fxstat;  lookup in file=/lib/libcrypt.so.1
05071:	symbol=_fxstat;  lookup in file=/usr/lib/libstdc++.so.2.8
05071:	symbol=_fxstat;  lookup in file=/lib/libm.so.6
05071:	symbol=_fxstat;  lookup in file=/lib/libc.so.6
05071:	symbol=_fxstat;  lookup in file=/lib/ld-linux.so.2
<then continues with next symbol --- no bind happens>

So we are finding _fxstat in /lib/libNoVersion.so.1 in glibc-2.1.94, but we
aren't even looking in that library with glibc-2.2.  _fxstat _is_ in the
libNoVersion for glibc-2.2-5, but ld.so isn't looking there.  I can attach the
full debug file from 2.2 if you want: it's only 140k compressed.
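
The difference between the two logs can be modelled as a first-match search over an ordered list of loaded objects. The sketch below is a simplification under stated assumptions: the symbol tables are cut down to one illustrative entry each, the scope arrays `scope_2_1_94` and `scope_2_2_5` list only some of the objects in the logs, and `find_definer` is a hypothetical name. It is not how ld.so is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct object {
    const char *name;
    const char *const *defs;  /* NULL-terminated list of defined symbols */
};

/* Illustrative, heavily reduced symbol tables (assumptions, not real dumps). */
const char *const axmain_defs[]    = { "FontFileReadDirectory", NULL };
const char *const noversion_defs[] = { "_fxstat", NULL };
const char *const libc_defs[]      = { "__fxstat", NULL };

/* Search scope as seen with glibc-2.1.94-3: libNoVersion is present. */
const struct object scope_2_1_94[] = {
    { "/net/opt/applix/axdata/axmain", axmain_defs },
    { "/lib/libNoVersion.so.1",        noversion_defs },
    { "/lib/libc.so.6",                libc_defs },
};

/* Search scope as seen with glibc-2.2-5: libNoVersion is never searched. */
const struct object scope_2_2_5[] = {
    { "/net/opt/applix/axdata/axmain", axmain_defs },
    { "/lib/libc.so.6",                libc_defs },
};

/* Return the name of the first object defining `symbol`, or NULL if none does. */
const char *find_definer(const struct object *scope, size_t count,
                         const char *symbol)
{
    assert(scope != NULL && symbol != NULL);
    for (size_t i = 0; i < count; i++)
        for (const char *const *d = scope[i].defs; *d != NULL; d++)
            if (strcmp(*d, symbol) == 0)
                return scope[i].name;
    return NULL;
}
```

In this model, looking up `_fxstat` in `scope_2_1_94` finds it in /lib/libNoVersion.so.1, while the same lookup in `scope_2_2_5` returns NULL, which corresponds to the "no bind happens" case in the glibc-2.2-5 log.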

Comment 7 Jakub Jelinek 2000-12-01 17:20:10 UTC

I'll look at the libNoVersion loading code over the weekend.
Thanks for tracing this down.

Comment 8 Stephen Tweedie 2000-12-01 18:07:59 UTC

Do you want the LD_DEBUG log?

Comment 9 Jakub Jelinek 2000-12-04 14:07:01 UTC

No, I think I've nailed it down (the _dl_map_object interface is changing all
the time).  We'll see after the glibc-2.2-6 build (after I nail down one more
issue).

Comment 10 David Woodhouse 2000-12-08 22:44:44 UTC

ETA?

Comment 11 Jakub Jelinek 2000-12-08 23:20:32 UTC

Most probably I will push it through the build system tomorrow.
I spent some time today fixing other glibc issues and, with
the exception of one unreproducible report, all are fixed.

Comment 12 Jakub Jelinek 2000-12-19 09:34:33 UTC

Fixed in glibc-2.2-9 errata.
