Bug 754517 - tex segfaults on 64bit
Summary: tex segfaults on 64bit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: texlive
Version: 16
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Jindrich Novy
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 756977
Reported: 2011-11-16 17:49 UTC by Jeremiah
Modified: 2013-07-02 23:53 UTC
CC List: 7 users

Fixed In Version: texlive-2007-66.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-21 16:59:10 UTC
Type: ---
Embargoed:


Attachments
Build Log from Mock (15.29 KB, text/x-log)
2011-11-16 17:49 UTC, Jeremiah
Root Log from Mock (44.99 KB, text/x-log)
2011-11-16 17:49 UTC, Jeremiah

Description Jeremiah 2011-11-16 17:49:18 UTC
Created attachment 534063
Build Log from Mock

Description of problem:

tex segfaults when building dvipdfm on a 64-bit build:
+ tex dvipdfm
This is TeX, Version 3.141592 (Web2C 7.5.6)
/var/tmp/rpm-tmp.aDHHn2: line 59: 16512 Segmentation fault      tex dvipdfm

Version-Release number of selected component (if applicable):
2007-65.fc16

How reproducible:
Always: build dvipdfm in a 64-bit chroot. The 32-bit build succeeds, but the 64-bit tex segfaults.

Steps to Reproduce:
1. Use Mock to build a 64-bit dvipdfm package.

Actual results:
+ tex dvipdfm
This is TeX, Version 3.141592 (Web2C 7.5.6)
/var/tmp/rpm-tmp.aDHHn2: line 59: 16512 Segmentation fault      tex dvipdfm

Expected results:
The package builds successfully and tex does not segfault.

Additional info:

Comment 1 Jeremiah 2011-11-16 17:49:47 UTC
Created attachment 534064
Root Log from Mock

Comment 2 Lorenzo Buzzi 2011-11-17 07:51:29 UTC
I can confirm this issue.
On a fresh, clean installation of Fedora 16 x86-64, I tried:
+ tex <file.tex>
and
+ the system check in Kile

Both fail with a segmentation fault.

I can provide any further information needed to solve this.

Comment 3 Lorenzo Buzzi 2011-11-29 15:15:34 UTC
The issue is still present even after updating from texlive-2007-38 to texlive-2007-40 (ran 'yum update' today).

Comment 4 Karel Klíč 2011-11-29 17:35:34 UTC
/usr/bin/tex crashes on all input files I have tried (texlive-2007-65.fc16.x86_64). Backtrace is always the same:

#0  __memcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:168
#1  0x00007ffff7604b8d in _IO_file_xsgetn (fp=0x9a8040, data=<optimized out>, n=176) at fileops.c:1427
#2  0x00007ffff75f8fe3 in _IO_fread (buf=<optimized out>, size=8, count=22, fp=0x9a8040) at iofread.c:44
#3  0x00000000004385cd in fread (__stream=<optimized out>, __n=22, __size=8, __ptr=0x7ffef6874018) at /usr/include/bits/stdio2.h:287
#4  do_undump (p=0x7ffef6874018 <Address 0x7ffef6874018 out of bounds>, item_size=8, nitems=22, in_file=<optimized out>) at texextra.c:1831
#5  0x00000000004086b9 in loadfmtfile () at texini.c:3073
#6  0x000000000040c79d in mainbody () at texini.c:4317
#7  0x0000000000401c4e in main (ac=<optimized out>, av=<optimized out>) at texextra.c:349

This is blocking builds of packages that use TeX, such as emacs-auctex.

Comment 5 Karel Klíč 2011-11-30 10:13:58 UTC
Valgrind output:
==13555== Memcheck, a memory error detector
==13555== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==13555== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==13555== Command: tex emacs-reference-booklet-3.tex
==13555== 
This is TeX, Version 3.141592 (Web2C 7.5.6)
==13555== Warning: set address range perms: large range [0x3941a040, 0xb9f8bb50) (undefined)
==13555== Invalid write of size 1
==13555==    at 0x5359E13: __GI_memcpy (memcpy.S:168)
==13555==    by 0x5343B8C: _IO_file_xsgetn (fileops.c:1427)
==13555==    by 0x5337FE2: fread (iofread.c:44)
==13555==    by 0x4385CC: do_undump (stdio2.h:287)
==13555==    by 0x4086B8: loadfmtfile (texini.c:3073)
==13555==    by 0x40C79C: mainbody (texini.c:4317)
==13555==    by 0x401C4D: main (texextra.c:349)
==13555==  Address 0xffffffffb941a048 is not stack'd, malloc'd or (recently) free'd
==13555== 
==13555== 
==13555== Process terminating with default action of signal 11 (SIGSEGV)
==13555==  Access not within mapped region at address 0xFFFFFFFFB941A048
==13555==    at 0x5359E13: __GI_memcpy (memcpy.S:168)
==13555==    by 0x5343B8C: _IO_file_xsgetn (fileops.c:1427)
==13555==    by 0x5337FE2: fread (iofread.c:44)
==13555==    by 0x4385CC: do_undump (stdio2.h:287)
==13555==    by 0x4086B8: loadfmtfile (texini.c:3073)
==13555==    by 0x40C79C: mainbody (texini.c:4317)
==13555==    by 0x401C4D: main (texextra.c:349)
==13555==  If you believe this happened as a result of a stack
==13555==  overflow in your program's main thread (unlikely but
==13555==  possible), you can try to increase the size of the
==13555==  main thread stack using the --main-stacksize= flag.
==13555==  The main thread stack size used in this run was 8388608.
==13555== 
==13555== HEAP SUMMARY:
==13555==     in use at exit: 2,168,109,218 bytes in 76,499 blocks
==13555==   total heap usage: 121,288 allocs, 44,789 frees, 2,171,444,993 bytes allocated
==13555== 
==13555== LEAK SUMMARY:
==13555==    definitely lost: 2,097 bytes in 120 blocks
==13555==    indirectly lost: 768 bytes in 58 blocks
==13555==      possibly lost: 0 bytes in 0 blocks
==13555==    still reachable: 2,168,106,353 bytes in 76,321 blocks
==13555==         suppressed: 0 bytes in 0 blocks
==13555== Rerun with --leak-check=full to see details of leaked memory
==13555== 
==13555== For counts of detected and suppressed errors, rerun with: -v
==13555== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Comment 6 Karel Klíč 2011-11-30 10:15:19 UTC
strace:

access("/var/lib/texmf/web2c/tex/tex.fmt", R_OK) = 0
stat("/var/lib/texmf/web2c/tex/tex.fmt", {st_mode=S_IFREG|0644, st_size=247113, ...}) = 0
open("/var/lib/texmf/web2c/tex/tex.fmt", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=247113, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d34ec7000
read(3, "W2TX\0\0\0\4tex\0\7\251^\327\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17"..., 4096) = 4096
mmap(NULL, 1724416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d34cd5000
mmap(NULL, 1728512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d342b4000
mmap(NULL, 2159484928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4cb3742000
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4cb3641000
read(3, "\0\0)\343\0\0)\353\0\0*\17\0\0*9\0\0*k\0\0*\223\0\0*\303\0\0*\361"..., 4096) = 4096
read(3, "\0\0k\307\0\0k\316\0\0k\327\0\0k\342\0\0k\351\0\0k\362\0\0k\375\0\0l\6"..., 4096) = 4096
mmap(NULL, 2002944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4cb3458000
read(3, "naltyrelpenaltypredisplaypenalty"..., 24576) = 24576
read(3, "shcong@vereqnotinc@ncelrightleft"..., 4096) = 4096
--- {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f4c33742018} (Segmentation fault) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Comment 7 Karel Klíč 2011-11-30 11:03:17 UTC
Debugging:

[karel@redhat 09]$ gdb -args tex emacs-reference-booklet-3.tex 
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)

(gdb) break texini.c:3073
Breakpoint 1 at 0x40869d: file texini.c, line 3073.

(gdb) run
Starting program: /usr/bin/tex emacs-reference-booklet-3.tex
This is TeX, Version 3.141592 (Web2C 7.5.6)

Breakpoint 1, loadfmtfile () at texini.c:3073
3073	      undumpthings ( mem [p ], q + 2 - p ) ;
(gdb) p mem
$1 = (memoryword *) 0x7ffef6874018
(gdb) p q
$2 = 20
(gdb) p q + 2 - p
$3 = 22
(gdb) p mem[p]
Cannot access memory at address 0x7ffef6874018
(gdb) p mem
$4 = (memoryword *) 0x7ffef6874018
(gdb) p *mem
Cannot access memory at address 0x7ffef6874018
(gdb) p memmax
$5 = 1499999
(gdb) p memmin
$6 = 268435455
(gdb) p zmem
$7 = (memoryword *) 0x7ffef6874018
(gdb) p extramembot
$8 = -268435455
(gdb) p extramemtop
$9 = 0
(gdb) p yzmem
$10 = (memoryword *) 0x7fff76874010
(gdb) p mem
$11 = (memoryword *) 0x7ffef6874018
(gdb) p membot
$12 = 0
(gdb) p memtop
$13 = 1499999

extramembot obviously should not be negative.
Second run:

[karel@redhat 09]$ gdb -args tex emacs-reference-booklet-3.tex 
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)

(gdb) run
Starting program: /usr/bin/tex emacs-reference-booklet-3.tex

Breakpoint 1, main (ac=2, av=0x7fffffffe328) at texextra.c:340
340	{
(gdb) p extramembot
$1 = 0
(gdb) watch extramembot
Hardware watchpoint 2: extramembot
(gdb) cont
Continuing.
Hardware watchpoint 2: extramembot

Old value = 0
New value = -268435455
0x0000000000402066 in initialize () at texini.c:123
123	    mubytecswrite [i ]= -268435455L ;

texini.c:

122  {register integer for_end; i = 0 ;for_end = 128 ; if ( i <= for_end) do 
123    mubytecswrite [i ]= -268435455L ;
124   while ( i++ < for_end ) ;} 
125  mubytekeep = 0 ;

The loop runs for i = 0 through 128 inclusive, so it writes 129 integers to mubytecswrite.
However, mubytecswrite is only 128 integers long!

In texd.h, where halfword is a 32-bit int:
EXTERN halfword mubytecswrite[128]  ;
EXTERN integer mubyteskip  ;

The program would still work if the compiler left the memory layout unchanged, i.e. if mubyteskip were located right after mubytecswrite, because it is zeroed at texini.c:125.

Comment 8 Karel Klíč 2011-11-30 11:09:33 UTC
So the mubytecswrite initialization overwrites extramembot, replacing its value 0 with -268435455.

The variables are indeed adjacent in memory:

[karel@redhat ~]$ ls -l /usr/lib/debug/.build-id/e8/bcf8e080af0d967912f1dc219d2aee0e39e171
lrwxrwxrwx 1 root root 19 Nov 29 19:21 /usr/lib/debug/.build-id/e8/bcf8e080af0d967912f1dc219d2aee0e39e171 -> ../../../../bin/tex

[karel@redhat ~]$ eu-readelf --symbols /usr/lib/debug/.build-id/e8/bcf8e080af0d967912f1dc219d2aee0e39e171.debug

237: 000000000064d518      4 OBJECT  GLOBAL DEFAULT       25 pagetail
238: 000000000064d520    512 OBJECT  GLOBAL DEFAULT       25 mubytecswrite
239: 000000000064d720      4 OBJECT  GLOBAL DEFAULT       25 extramembot
240: 000000000064d724      4 OBJECT  GLOBAL DEFAULT       25 inputptr

Comment 9 Karel Klíč 2011-11-30 11:20:27 UTC
I think it might be caused by the newer builds being compiled with
LDFLAGS='-Wl,-z,relro '

The linker's relro option reorders the sections in the RW segment, which breaks TeX's memory layout assumptions.

Checking...

Comment 10 Jindrich Novy 2011-11-30 11:42:53 UTC
Well, it is actually caused by an off-by-one error in the .ch files. It is now fixed in Rawhide; F16 will follow soon.

Comment 11 Karel Klíč 2011-11-30 12:26:23 UTC
Ok, thanks.

Comment 12 Jeff Mitchell 2011-12-06 14:48:51 UTC
Just ran into this today.

Karel, thanks for all the work you did debugging.

Jindrich, looking forward to those packages  :-)

Comment 13 Fedora Update System 2011-12-10 09:41:29 UTC
texlive-2007-66.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/texlive-2007-66.fc16

Comment 14 Fedora Update System 2011-12-11 21:59:45 UTC
Package texlive-2007-66.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing texlive-2007-66.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16995/texlive-2007-66.fc16
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2011-12-21 16:59:10 UTC
texlive-2007-66.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

