Applix 4.4.1 (announcing itself as "4.41 (1021.221.13)") segvs on startup
with the glibc-2.2 errata applied. The entire result of running applix is:
[sct@spock] ~ $ /opt/applix/applix
[sct@spock] ~ $ axmain: signal Segmentation fault
axnet error, axmain already started. Try Again.
axmain: signal Segmentation fault
glibc-2.1.94-3 is fine.
Additional info, in case it helps to track down an obvious error:
The failure is always in the same place: doing a stat. The ltrace shows
[pid 4803] fopen("/net/opt/applix/axdata/fontmetri"..., "r") = 0x084c3938
[pid 4803] fileno(0x084c3938) = 11
[pid 4803] _fxstat(1, 11, 0xbfffd17c, 0x082e0030, 21 <unfinished ...>
[pid 4803] --- SIGSEGV (Segmentation fault) ---
and the corresponding strace gives
O_RDONLY) = 11
[pid 4852] --- SIGSEGV (Segmentation fault) ---
so the _fxstat is failing before it gets as far as the system call. Using
glibc-2.1.94-3, the ltrace shows
[pid 4833] fopen("/net/opt/applix/axdata/fontmetri"..., "r") = 0x084c3938
[pid 4833] fileno(0x084c3938) = 11
[pid 4833] _fxstat(1, 11, 0xbfffd17c, 0x082e0030, 21) = 0
at the same location --- the arguments are precisely the same but the call
succeeds.
Can you please run gdb on the generated core?
It looks like __fxstat is being called with vers == _STAT_VER_KERNEL, which does just:
if (vers == _STAT_VER_KERNEL)
return INLINE_SYSCALL (fstat, 2, fd, CHECK_1 ((struct kernel_stat *) buf));
so I'd be interested if you could find the crashing $pc and disassemble that
routine to make sure we're looking at the same code.
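For readers following along, the version dispatch under discussion can be sketched roughly as below. This is an illustration only, not the glibc source. The names fxstat_sketch, SKETCH_STAT_VER_KERNEL and SKETCH_STAT_VER_LINUX are hypothetical, the values 1 and 3 are the ones quoted in this report, and an ordinary fstat() call stands in for the raw system call.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the stat interface versions mentioned in
   this report (1 = kernel struct layout, 3 = libc struct layout). */
enum {
    SKETCH_STAT_VER_KERNEL = 1,
    SKETCH_STAT_VER_LINUX  = 3
};

/* Sketch of a versioned fstat wrapper: the caller passes the version of
   the struct layout it was compiled against, and the wrapper either
   fills the buffer or rejects a version it does not know. */
int fxstat_sketch(int vers, int fd, struct stat *buf)
{
    switch (vers) {
    case SKETCH_STAT_VER_KERNEL:
    case SKETCH_STAT_VER_LINUX:
        /* Assumption: a real implementation issues the system call
           directly and, for the libc layout, converts the kernel struct
           afterwards. Plain fstat() stands in for both paths here. */
        return fstat(fd, buf);
    default:
        errno = EINVAL;
        return -1;
    }
}
```

The point of the sketch is that the wrapper itself is trivial, so a crash before the system call is more likely to be in how the call reaches the wrapper than in the wrapper body.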
There's no core file --- axmain has set up a sig11 handler.
However, sending the process a SIGSTOP lets me attach gdb to it, and I can get
as far as the segv that way. I see:
Program received signal SIGSEGV, Segmentation fault.
0x0 in ?? ()
#0 0x0 in ?? ()
#1 0x83082f4 in fstat ()
#2 0x8307510 in FontFileReadDirectory ()
#3 0x82e0046 in FontFileInitFPE ()
[ lots more application stack frames ]
which is calling fstat from deep in font server country. Dumping the args list
for frame 1 on the stack shows the args (0x0000000b, 0x08307510), so we look
like we are in the right territory. The fstat disassembly begins
(gdb) disas 0x83082f4
Dump of assembler code for function fstat:
0x83082e0 <fstat>: push %ebp
0x83082e1 <fstat+1>: mov %esp,%ebp
0x83082e3 <fstat+3>: push %esi
0x83082e4 <fstat+4>: push %ebx
0x83082e5 <fstat+5>: mov 0x8(%ebp),%ebx
0x83082e8 <fstat+8>: mov 0xc(%ebp),%esi
0x83082eb <fstat+11>: push %esi
0x83082ec <fstat+12>: push %ebx
0x83082ed <fstat+13>: push $0x1
0x83082ef <fstat+15>: call 0x80918b4 <_fxstat>
0x83082f4 <fstat+20>: add $0xc,%esp
so we're definitely seeing the _fxstat entry point here. _fxstat itself is
(gdb) disas 0x80918b4
Dump of assembler code for function _fxstat:
0x80918b4 <_fxstat>: jmp *0x8424b08
0x80918ba <_fxstat+6>: push $0x8b8
0x80918bf <_fxstat+11>: jmp 0x8090734 <_init+52>
(gdb) x 0x8424b08
0x8424b08 <__DTOR_END__+1132>: 0x00000000
so no wonder we have jumped into hyperspace. The memory around 0x8424b08 all
looks valid so this isn't a complete memory wipe we're looking at. Running from
scratch with a breakpoint at main shows that the jump vector at 0x8424b08 points
to <_fxstat+6> initially, so it looks like we're doing a fixup to NULL at some
point.
Btw, according to the ltrace, this is the process's first invocation of _fxstat,
although there were several prior successful _xstat calls.
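As a rough model of what the disassembly above shows, the jump slot behaves like a function pointer that starts out pointing at a resolver stub and is overwritten on first use. The sketch below simulates that in plain C. All names (jump_slot, resolver_stub, lookup_result, real_fxstat_sketch, SKETCH_WOULD_SEGV) are hypothetical, and a sentinel return value stands in for the jump to 0x0 so that the example does not crash. How the real slot came to hold 0 is not established at this point in the report; the sketch only shows why a NULL fixup is fatal on the next call.

```c
#include <stddef.h>

typedef int (*stat_fn)(int fd);

/* Sentinel returned where the real process would jump to 0x0 and segv. */
enum { SKETCH_WOULD_SEGV = -2 };

/* Hypothetical target the slot should end up pointing at. */
int real_fxstat_sketch(int fd) { return fd >= 0 ? 0 : -1; }

int resolver_stub(int fd);

/* Result of the simulated symbol lookup; NULL models a bad fixup. */
stat_fn lookup_result = real_fxstat_sketch;

/* The jump slot. It starts out pointing at the resolver stub, much as the
   vector at 0x8424b08 initially pointed at <_fxstat+6>. */
stat_fn jump_slot = resolver_stub;

/* The first call lands here: patch the slot, then forward the call. */
int resolver_stub(int fd)
{
    jump_slot = lookup_result;
    if (jump_slot == NULL)
        return SKETCH_WOULD_SEGV;
    return jump_slot(fd);
}

/* Stand-in for the `jmp *slot` at the start of the stub for _fxstat. */
int call_through_slot(int fd)
{
    if (jump_slot == NULL)
        return SKETCH_WOULD_SEGV;   /* guard so the sketch does not crash */
    return jump_slot(fd);
}

/* Put the slot back into its unresolved state with a chosen lookup result. */
void reset_slot(stat_fn result)
{
    lookup_result = result;
    jump_slot = resolver_stub;
}
```

With reset_slot(real_fxstat_sketch), the first call_through_slot(11) resolves the slot and returns 0. With reset_slot(NULL), the same call returns SKETCH_WOULD_SEGV, which corresponds to the "0x0 in ?? ()" frame in the backtrace.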
_fxstat (and likewise _xstat and _lxstat) have not been exported from glibc
since glibc 2.1, so I wonder why it worked with earlier glibcs.
Applix is a glibc 2.0 application, right? Even then, it is pretty strange,
because /usr/lib/libc_nonshared.a in RH5.2 referenced __fxstat (that symbol
is exported from glibc), so Applix probably just used its own magic fstat or
whatever (libc_nonshared used vers 3, aka _STAT_VER_LINUX, while
Applix uses vers 1, aka _STAT_VER_KERNEL).
Can you perhaps run it both with glibc 2.2 and glibc-2.1.94-3 with
LD_DEBUG=all set in the environment?
05099: symbol=_fxstat; lookup in file=/net/opt/applix/axdata/axmain
05099: symbol=_fxstat; lookup in file=/lib/libNoVersion.so.1
05099: binding file /net/opt/applix/axdata/axmain to /lib/libNoVersion.so.1:
normal symbol `_fxstat'
05071: symbol=_fxstat; lookup in file=/net/opt/applix/axdata/axmain
05071: symbol=_fxstat; lookup in file=/usr/X11R6.local/lib/libX11.so.6
05071: symbol=_fxstat; lookup in file=/lib/libdl.so.2
05071: symbol=_fxstat; lookup in file=/lib/libcrypt.so.1
05071: symbol=_fxstat; lookup in file=/usr/lib/libstdc++.so.2.8
05071: symbol=_fxstat; lookup in file=/lib/libm.so.6
05071: symbol=_fxstat; lookup in file=/lib/libc.so.6
05071: symbol=_fxstat; lookup in file=/lib/ld-linux.so.2
<then continues with next symbol --- no bind happens>
So we are finding _fxstat in /lib/libNoVersion.so.1 in glibc-2.1.94, but we
aren't even looking in that library with glibc-2.2. _fxstat _is_ in the
libNoVersion for glibc-2.2-5, but ld.so isn't looking there. I can attach the
full debug file from 2.2 if you want: it's only 140k compressed.
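The difference between the two logs comes down to which objects are in the search scope. A minimal sketch of a first-match lookup over an ordered scope is below. The types and names (object_sketch, first_definer, scope_2_1_94, scope_2_2) are hypothetical, the object order is copied from the two logs above, and the symbol tables are reduced to the entries that matter here. They are not real export lists.

```c
#include <stddef.h>
#include <string.h>

/* One loaded object and the symbols it defines (NULL-terminated list). */
struct object_sketch {
    const char *name;
    const char *const *symbols;
};

/* Return the name of the first object in scope that defines `symbol`,
   or NULL if no object in scope does. */
const char *first_definer(const struct object_sketch *scope, size_t n,
                          const char *symbol)
{
    for (size_t i = 0; i < n; i++)
        for (const char *const *s = scope[i].symbols; *s != NULL; s++)
            if (strcmp(*s, symbol) == 0)
                return scope[i].name;
    return NULL;
}

/* Illustrative symbol tables only. */
const char *const no_syms[]        = { NULL };
const char *const noversion_syms[] = { "_fxstat", NULL };
const char *const libc_syms[]      = { "__fxstat", NULL };

/* Scope as searched in the glibc-2.1.94 log. */
const struct object_sketch scope_2_1_94[] = {
    { "/net/opt/applix/axdata/axmain", no_syms },
    { "/lib/libNoVersion.so.1",        noversion_syms },
};

/* Scope as searched in the glibc-2.2 log: libNoVersion.so.1 is absent. */
const struct object_sketch scope_2_2[] = {
    { "/net/opt/applix/axdata/axmain",    no_syms },
    { "/usr/X11R6.local/lib/libX11.so.6", no_syms },
    { "/lib/libdl.so.2",                  no_syms },
    { "/lib/libcrypt.so.1",               no_syms },
    { "/usr/lib/libstdc++.so.2.8",        no_syms },
    { "/lib/libm.so.6",                   no_syms },
    { "/lib/libc.so.6",                   libc_syms },
    { "/lib/ld-linux.so.2",               no_syms },
};
```

In this model, first_definer(scope_2_1_94, 2, "_fxstat") returns "/lib/libNoVersion.so.1", while first_definer(scope_2_2, 8, "_fxstat") returns NULL because the only object that defines the symbol is never searched.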
I'll look at the libNoVersion loading code over the weekend.
Thanks for tracing this down.
Do you want the LD_DEBUG log?
No, I think I've nailed it down (the _dl_map_object interface is changing all
the time); we will see after the glibc-2.2-6 build (after I nail down one more issue).
I will most probably push it through the build system tomorrow.
I spent some time today fixing other glibc issues, and with
the exception of one unreproducible report all are fixed.
Fixed in glibc-2.2-9 errata.