Bug 754517

Summary: tex segfaults on 64bit
Product: [Fedora] Fedora
Reporter: Jeremiah <JMiahMan>
Component: texlive
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16
CC: emmanuel.kowalski, jnovy, kklic, lorenzo.buzzi, mitchell, pertusus, pknirsch
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: texlive-2007-66.fc16
Doc Type: Bug Fix
Last Closed: 2011-12-21 16:59:10 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 756977    
Attachments:
Description              Flags
Build Log from Mock      none
Root Log from Mock       none

Description Jeremiah 2011-11-16 17:49:18 UTC
Created attachment 534063 [details]
Build Log from Mock

Description of problem:

tex segfaults when building dvipdfm on a 64bit build with:
+ tex dvipdfm
This is TeX, Version 3.141592 (Web2C 7.5.6)
/var/tmp/rpm-tmp.aDHHn2: line 59: 16512 Segmentation fault      tex dvipdfm

Version-Release number of selected component (if applicable):
2007-65.fc16

How reproducible:
Try to build dvipdfm in a 64-bit chroot: the 32-bit build works fine, but 64-bit tex segfaults.

Steps to Reproduce:
1. Use Mock to build a 64-bit dvipdfm package.
  
Actual results:
+ tex dvipdfm
This is TeX, Version 3.141592 (Web2C 7.5.6)
/var/tmp/rpm-tmp.aDHHn2: line 59: 16512 Segmentation fault      tex dvipdfm

Expected results:
The package builds successfully and tex doesn't segfault.

Additional info:

Comment 1 Jeremiah 2011-11-16 17:49:47 UTC
Created attachment 534064 [details]
Root Log from Mock

Comment 2 Lorenzo Buzzi 2011-11-17 07:51:29 UTC
I confirm the existence of this issue.
On a fresh clean installation of Fedora 16 x86-64, I tried:
+ tex <file.tex>
and
+ the system check in Kile

Both fail with a segmentation fault.

I am available to send any further information needed in order to solve this.

Comment 3 Lorenzo Buzzi 2011-11-29 15:15:34 UTC
The issue is still present even after updating from texlive-2007-38 to texlive-2007-40 (ran 'yum update' today).

Comment 4 Karel Klíč 2011-11-29 17:35:34 UTC
/usr/bin/tex crashes on all input files I have tried (texlive-2007-65.fc16.x86_64). Backtrace is always the same:

#0  __memcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:168
#1  0x00007ffff7604b8d in _IO_file_xsgetn (fp=0x9a8040, data=<optimized out>, n=176) at fileops.c:1427
#2  0x00007ffff75f8fe3 in _IO_fread (buf=<optimized out>, size=8, count=22, fp=0x9a8040) at iofread.c:44
#3  0x00000000004385cd in fread (__stream=<optimized out>, __n=22, __size=8, __ptr=0x7ffef6874018) at /usr/include/bits/stdio2.h:287
#4  do_undump (p=0x7ffef6874018 <Address 0x7ffef6874018 out of bounds>, item_size=8, nitems=22, in_file=<optimized out>) at texextra.c:1831
#5  0x00000000004086b9 in loadfmtfile () at texini.c:3073
#6  0x000000000040c79d in mainbody () at texini.c:4317
#7  0x0000000000401c4e in main (ac=<optimized out>, av=<optimized out>) at texextra.c:349

This is blocking builds of packages that use TeX, such as emacs-auctex.

Comment 5 Karel Klíč 2011-11-30 10:13:58 UTC
Valgrind output:
==13555== Memcheck, a memory error detector
==13555== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==13555== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==13555== Command: tex emacs-reference-booklet-3.tex
==13555== 
This is TeX, Version 3.141592 (Web2C 7.5.6)
==13555== Warning: set address range perms: large range [0x3941a040, 0xb9f8bb50) (undefined)
==13555== Invalid write of size 1
==13555==    at 0x5359E13: __GI_memcpy (memcpy.S:168)
==13555==    by 0x5343B8C: _IO_file_xsgetn (fileops.c:1427)
==13555==    by 0x5337FE2: fread (iofread.c:44)
==13555==    by 0x4385CC: do_undump (stdio2.h:287)
==13555==    by 0x4086B8: loadfmtfile (texini.c:3073)
==13555==    by 0x40C79C: mainbody (texini.c:4317)
==13555==    by 0x401C4D: main (texextra.c:349)
==13555==  Address 0xffffffffb941a048 is not stack'd, malloc'd or (recently) free'd
==13555== 
==13555== 
==13555== Process terminating with default action of signal 11 (SIGSEGV)
==13555==  Access not within mapped region at address 0xFFFFFFFFB941A048
==13555==    at 0x5359E13: __GI_memcpy (memcpy.S:168)
==13555==    by 0x5343B8C: _IO_file_xsgetn (fileops.c:1427)
==13555==    by 0x5337FE2: fread (iofread.c:44)
==13555==    by 0x4385CC: do_undump (stdio2.h:287)
==13555==    by 0x4086B8: loadfmtfile (texini.c:3073)
==13555==    by 0x40C79C: mainbody (texini.c:4317)
==13555==    by 0x401C4D: main (texextra.c:349)
==13555==  If you believe this happened as a result of a stack
==13555==  overflow in your program's main thread (unlikely but
==13555==  possible), you can try to increase the size of the
==13555==  main thread stack using the --main-stacksize= flag.
==13555==  The main thread stack size used in this run was 8388608.
==13555== 
==13555== HEAP SUMMARY:
==13555==     in use at exit: 2,168,109,218 bytes in 76,499 blocks
==13555==   total heap usage: 121,288 allocs, 44,789 frees, 2,171,444,993 bytes allocated
==13555== 
==13555== LEAK SUMMARY:
==13555==    definitely lost: 2,097 bytes in 120 blocks
==13555==    indirectly lost: 768 bytes in 58 blocks
==13555==      possibly lost: 0 bytes in 0 blocks
==13555==    still reachable: 2,168,106,353 bytes in 76,321 blocks
==13555==         suppressed: 0 bytes in 0 blocks
==13555== Rerun with --leak-check=full to see details of leaked memory
==13555== 
==13555== For counts of detected and suppressed errors, rerun with: -v
==13555== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Comment 6 Karel Klíč 2011-11-30 10:15:19 UTC
strace:

access("/var/lib/texmf/web2c/tex/tex.fmt", R_OK) = 0
stat("/var/lib/texmf/web2c/tex/tex.fmt", {st_mode=S_IFREG|0644, st_size=247113, ...}) = 0
open("/var/lib/texmf/web2c/tex/tex.fmt", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=247113, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d34ec7000
read(3, "W2TX\0\0\0\4tex\0\7\251^\327\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17"..., 4096) = 4096
mmap(NULL, 1724416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d34cd5000
mmap(NULL, 1728512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4d342b4000
mmap(NULL, 2159484928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4cb3742000
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4cb3641000
read(3, "\0\0)\343\0\0)\353\0\0*\17\0\0*9\0\0*k\0\0*\223\0\0*\303\0\0*\361"..., 4096) = 4096
read(3, "\0\0k\307\0\0k\316\0\0k\327\0\0k\342\0\0k\351\0\0k\362\0\0k\375\0\0l\6"..., 4096) = 4096
mmap(NULL, 2002944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4cb3458000
read(3, "naltyrelpenaltypredisplaypenalty"..., 24576) = 24576
read(3, "shcong@vereqnotinc@ncelrightleft"..., 4096) = 4096
--- {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f4c33742018} (Segmentation fault) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Comment 7 Karel Klíč 2011-11-30 11:03:17 UTC
Debugging:

[karel@redhat 09]$ gdb -args tex emacs-reference-booklet-3.tex 
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)

(gdb) break texini.c:3073
Breakpoint 1 at 0x40869d: file texini.c, line 3073.

(gdb) run
Starting program: /usr/bin/tex emacs-reference-booklet-3.tex
This is TeX, Version 3.141592 (Web2C 7.5.6)

Breakpoint 1, loadfmtfile () at texini.c:3073
3073	      undumpthings ( mem [p ], q + 2 - p ) ;
(gdb) p mem
$1 = (memoryword *) 0x7ffef6874018
(gdb) p q
$2 = 20
(gdb) p q + 2 - p
$3 = 22
(gdb) p mem[p]
Cannot access memory at address 0x7ffef6874018
(gdb) p mem
$4 = (memoryword *) 0x7ffef6874018
(gdb) p *mem
Cannot access memory at address 0x7ffef6874018
(gdb) p memmax
$5 = 1499999
(gdb) p memmin
$6 = 268435455
(gdb) p zmem
$7 = (memoryword *) 0x7ffef6874018
(gdb) p extramembot
$8 = -268435455
(gdb) p extramemtop
$9 = 0
(gdb) p yzmem
$10 = (memoryword *) 0x7fff76874010
(gdb) p mem
$11 = (memoryword *) 0x7ffef6874018
(gdb) p membot
$12 = 0
(gdb) p memtop
$13 = 1499999

extramembot obviously should not be negative.
Second run:

[karel@redhat 09]$ gdb -args tex emacs-reference-booklet-3.tex 
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)

(gdb) run
Starting program: /usr/bin/tex emacs-reference-booklet-3.tex

Breakpoint 1, main (ac=2, av=0x7fffffffe328) at texextra.c:340
340	{
(gdb) p extramembot
$1 = 0
(gdb) watch extramembot
Hardware watchpoint 2: extramembot
(gdb) cont
Continuing.
Hardware watchpoint 2: extramembot

Old value = 0
New value = -268435455
0x0000000000402066 in initialize () at texini.c:123
123	    mubytecswrite [i ]= -268435455L ;

texini.c:

122  {register integer for_end; i = 0 ;for_end = 128 ; if ( i <= for_end) do 
123    mubytecswrite [i ]= -268435455L ;
124   while ( i++ < for_end ) ;} 
125  mubytekeep = 0 ;

It writes 129 integers to mubytecswrite (indices 0 through 128).
However, mubytecswrite is only 128 integers long!

texd.h, halfword is 32-bit int:
EXTERN halfword mubytecswrite[128]  ;
EXTERN integer mubyteskip  ;

The program would work if the compiler left the memory layout unchanged, i.e. if mubyteskip were located right after mubytecswrite, because it is zeroed on texini.c:125.

Comment 8 Karel Klíč 2011-11-30 11:09:33 UTC
So mubytecswrite initialization overwrites the value 0 in extramembot by storing -268435455 there.

The variables are neighbouring in memory indeed:

[karel@redhat ~]$ ls -l /usr/lib/debug/.build-id/e8/bcf8e080af0d967912f1dc219d2aee0e39e171
lrwxrwxrwx 1 root root 19 Nov 29 19:21 /usr/lib/debug/.build-id/e8/bcf8e080af0d967912f1dc219d2aee0e39e171 -> ../../../../bin/tex

[karel@redhat ~]$ eu-readelf --symbols /usr/lib/debug/.build-id/e8/bcf8e080af0d967912f1dc219d2aee0e39e171.debug

237: 000000000064d518      4 OBJECT  GLOBAL DEFAULT       25 pagetail
238: 000000000064d520    512 OBJECT  GLOBAL DEFAULT       25 mubytecswrite
239: 000000000064d720      4 OBJECT  GLOBAL DEFAULT       25 extramembot
240: 000000000064d724      4 OBJECT  GLOBAL DEFAULT       25 inputptr

Comment 9 Karel Klíč 2011-11-30 11:20:27 UTC
I think it might be caused by having the newer builds compiled with
LDFLAGS='-Wl,-z,relro '

The relro linker option reorders the sections in the RW segment, which breaks TeX's memory layout assumptions.

Checking...

Comment 10 Jindrich Novy 2011-11-30 11:42:53 UTC
Well, it is actually caused by an off-by-one error in the .ch files. It is now fixed in rawhide. The F16 fix will follow soon.

Comment 11 Karel Klíč 2011-11-30 12:26:23 UTC
Ok, thanks.

Comment 12 Jeff Mitchell 2011-12-06 14:48:51 UTC
Just ran into this today.

Karel, thanks for all the work you did debugging.

Jindrich, looking forward to those packages  :-)

Comment 13 Fedora Update System 2011-12-10 09:41:29 UTC
texlive-2007-66.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/texlive-2007-66.fc16

Comment 14 Fedora Update System 2011-12-11 21:59:45 UTC
Package texlive-2007-66.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing texlive-2007-66.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16995/texlive-2007-66.fc16
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2011-12-21 16:59:10 UTC
texlive-2007-66.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.