Bug 183304

Summary: does not build on ppc64
Product: [Fedora] Fedora Reporter: Jens Petersen <petersen>
Component: emacs Assignee: Chip Coldwell <coldwell>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: rawhide CC: jakub, roland, varekova
Target Milestone: ---   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
Whiteboard:
Fixed In Version: 21.4-15 Doc Type: Bug Fix
Last Closed: 2006-07-26 17:50:28 UTC Type: ---
Attachments:
Description Flags
backport from upstream emacs-22; patch to fix unexelf.c (unexec) on ppc64 none

Description Jens Petersen 2006-02-28 02:42:23 UTC
Description of problem:
emacs doesn't currently build on ppc64 with the current toolchain.
Do we need a ppc64 build of emacs actually, or is emacs.ppc sufficient?

How reproducible:
every time

Steps to Reproduce:
1. ppc64$ rpmbuild -bb emacs.spec

Actual results:
Wrote /usr/src/redhat/BUILD/emacs-21.4/lib-src/fns-21.4.1.el
Dumping under names emacs and emacs-21.4.1
1142336 pure bytes used
./emacs -q -batch -f list-load-path-shadows
Fatal error (6).make[1]: *** [emacs] Aborted
make[1]: Leaving directory `/usr/src/redhat/BUILD/emacs-21.4/src'
(export PARALLEL; PARALLEL=0; cd leim; /usr/bin/make all - --jobserver-fds=3,4 -j \
  CC='gcc' CFLAGS='-DMAIL_USE_LOCKF -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc'
CPPFLAGS='-D_BSD_SOURCE  \' \
  LDFLAGS='-L/usr/lib64' MAKE='/usr/bin/make')
make[1]: Entering directory `/usr/src/redhat/BUILD/emacs-21.4/leim'
if [ -d quail ]; then true; else make quail; fi
if [ -f quail/CCDOSPY.elc ]; then true; else \
 EMACSLOADPATH=/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp ../src/emacs -batch
--no-init-file --no-site-file --multibyte -l
/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp/international/titdic-cnv \
  --eval '(batch-titdic-convert t)' -dir quail
/usr/src/redhat/BUILD/emacs-21.4/leim/CXTERM-DIC; fi
Fatal error (6)./bin/sh: line 1: 16151 Aborted                
EMACSLOADPATH=/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp ../src/emacs -batch
--no-init-file --no-site-file --multibyte -l
/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp/international/titdic-cnv --eval
'(batch-titdic-convert t)' -dir quail
/usr/src/redhat/BUILD/emacs-21.4/leim/CXTERM-DIC
make[1]: *** [quail/CCDOSPY.elc] Error 134
make[1]: Leaving directory `/usr/src/redhat/BUILD/emacs-21.4/leim'
make: *** [leim] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.57421 (%build)

Expected results:
successful build

Additional info:
The last successful build was emacs-21.4-5 apparently (May 2005).

# gdb /usr/src/redhat/BUILD/emacs-21.4/src/emacs
GNU gdb Red Hat Linux (6.3.0.0-1.34rh)
:
This GDB was configured as "ppc64-redhat-linux-gnu"...Using host libthread_db
library "/lib64/libthread_db.so.1".

(gdb) r
Starting program: /usr/src/redhat/BUILD/emacs-21.4/src/emacs

Program received signal SIGABRT, Aborted.
0x00000080006f1348 in .__kill () from /lib64/libc.so.6
(gdb) bt
#0  0x00000080006f1348 in .__kill () from /lib64/libc.so.6
#1  0x00000000100bdc3c in abort () at emacs.c:387
#2  0x0000000010123c10 in wrong_type_argument (predicate=1152921504879138344,
    value=8385533672634017024) at data.c:117
#3  0x00000000100e4880 in Fget_buffer (name=6) at buffer.c:268
#4  0x00000000100e5604 in Fget_buffer_create (name=8385533672634017024)
    at buffer.c:338
#5  0x000000001002745c in message_dolog (m=0x101ab1e0 "", nbytes=0, nlflag=1,
    multibyte=0) at xdisp.c:5657
#6  0x00000000100bf190 in main (argc=Variable "argc" is not available.
) at emacs.c:1317

Comment 1 Chip Coldwell 2006-07-18 20:31:24 UTC
Finally got a ppc64 system running FC6 (rawhide).  The origin of this bug is
pretty deep.  Here's the scenario.

Vmessages_buffer_name is declared in src/xdisp.c:474 as

static Lisp_Object Vmessages_buffer_name;

where a Lisp_Object is defined in src/lisp.h:221 as

typedef
union Lisp_Object
  {
    /* Used for comparing two Lisp_Objects;
       also, positive integers can be accessed fast this way.  */
    EMACS_INT i;

    struct
      {
	EMACS_INT val  : VALBITS;
	EMACS_INT type : GCTYPEBITS + 1;
      } s;
    struct
      {
	EMACS_UINT val : VALBITS;
	EMACS_INT type : GCTYPEBITS + 1;
      } u;
    struct
      {
	EMACS_UINT val		: VALBITS;
	enum Lisp_Type type	: GCTYPEBITS;
	/* The markbit is not really part of the value of a Lisp_Object,
	   and is always zero except during garbage collection.  */
	EMACS_UINT markbit	: 1;
      } gu;
  }
Lisp_Object;

Now set the environment up just like make did and run emacs under gdb:

# export EMACSLOADPATH=/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp
# gdb ../src/emacs
GNU gdb Red Hat Linux (6.5-3.fc6rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu"...Using host libthread_db
library "/lib64/libthread_db.so.1".

(gdb) set args -batch --no-init-file --no-site-file --multibyte -l
/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp/international/titdic-cnv --eval
'(batch-titdic-convert t)' -dir quail
/usr/src/redhat/BUILD/emacs-21.4/leim/CXTERM-DIC
(gdb) break main
Breakpoint 1 at 0x100b8324: file emacs.c, line 714.
(gdb) run
Starting program: /usr/src/redhat/BUILD/emacs-21.4/src/emacs -batch
--no-init-file --no-site-file --multibyte -l
/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp/international/titdic-cnv --eval
'(batch-titdic-convert t)' -dir quail
/usr/src/redhat/BUILD/emacs-21.4/leim/CXTERM-DIC

Breakpoint 1, main (argc=12, argv=0xfffffe89808, envp=0xfffffe89870)
    at emacs.c:714
714       int skip_args = 0;
(gdb) p/x Vmessages_buffer_name
$1 = 0x6c5f706f73697469

This is the value that eventually causes trouble.  Now, it seems to me that a
variable declared static (even if it is a union) without an explicit initializer
should be initialized to zero.  It's in the BSS section, right?
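
A side observation on the value gdb printed: every byte of 0x6c5f706f73697469
is printable ASCII, which would be consistent with string data, rather than
zero or a tagged Lisp pointer, sitting at that address.  The following is a
minimal sketch and not part of Emacs; the name word_to_ascii is made up for
illustration, and the bytes are read most-significant first, which is how a
big-endian machine such as ppc64 lays the word out in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from Emacs): write the 8 bytes of `word`,
   most significant first, into `out` as a NUL-terminated string.
   Non-printable bytes are shown as '.'.  `out` needs room for 9 chars. */
void word_to_ascii(uint64_t word, char out[9])
{
    assert(out != NULL);
    for (int k = 0; k < 8; k++) {
        unsigned char byte = (unsigned char)(word >> (56 - 8 * k));
        out[k] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

Calling word_to_ascii(0x6c5f706f73697469ULL, buf) leaves the string
"l_positi" in buf.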

(gdb) p/x &Vmessages_buffer_name
$2 = 0x101dd108

Now let's see what our binary says:

# objdump -h emacs

emacs:     file format elf64-powerpc

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
[ ... ]
 22 .data         004c0070  00000000101dcf90  00000000101dcf90  001ccf90  2**4
                  CONTENTS, ALLOC, LOAD, DATA
 23 .bss          00000000  000000001069d000  000000001069d000  0068d000  2**0
                  ALLOC
[ ... ]

Goodness! Vmessages_buffer_name has an address in the .data section, not the .bss!
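
The range check behind that conclusion can be done mechanically.  The sketch
below is illustrative only: struct section, objdump_rows and
section_containing are hypothetical names, and the table holds just the two
rows quoted from the objdump output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one "objdump -h" row. */
struct section {
    const char *name;
    uint64_t vma;   /* start address */
    uint64_t size;  /* length in bytes */
};

/* The two rows shown in the objdump output. */
static const struct section objdump_rows[] = {
    { ".data", 0x101dcf90ULL, 0x004c0070ULL },
    { ".bss",  0x1069d000ULL, 0x00000000ULL },
};

/* Return the name of the section whose [vma, vma + size) range holds
   addr, or NULL if none of the listed sections does. */
const char *section_containing(const struct section *table, size_t count,
                               uint64_t addr)
{
    assert(table != NULL);
    for (size_t k = 0; k < count; k++) {
        if (addr >= table[k].vma && addr - table[k].vma < table[k].size)
            return table[k].name;
    }
    return NULL;
}
```

For the address 0x101dd108 the function returns ".data": the offset from the
start of .data is 0x178, well inside its size of 0x4c0070.  The .bss row has
size zero, so no address can fall inside it.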

Could this be a compiler bug?

Chip


Comment 2 Jakub Jelinek 2006-07-18 20:43:46 UTC
I bet that the emacs binary you are running is the dumped one already.
And emacs, during dumping, turns the .bss section into .data and
saves there whatever values that memory contained during dumping.

Comment 3 Chip Coldwell 2006-07-19 14:34:26 UTC
(In reply to comment #2)
> I bet that the emacs binary you are running is the dumped one already.
> And emacs, during dumping, turns the .bss section into .data and
> saves there whatever values that memory contained during dumping.

It seems to be more complicated than that.  I ran gdb on an i386 emacs
binary, and Vmessages_buffer_name is in the .data section of that one,
too.  Also, it is initialized to a reasonable value:

(gdb) p/x Vmessages_buffer_name
$3 = 0x382dc354
(gdb) x/s ((struct Lisp_String *)(Vmessages_buffer_name & ((1 << 28) - 1)))->data
0x82ad940:       "*Messages*"

(N.B. the most significant 4 bits of a Lisp_Object are a type tag.)  I can't
figure out how it is initialized, because as far as I can tell the function
responsible for doing so, syms_of_xdisp, hasn't been executed at the point
where I was checking this value.  Still digging.
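
The tag/value split used in the gdb expression can be written out as a small
sketch.  This is not the Emacs source: the names DEMO_VALBITS, demo_tag and
demo_value are invented for illustration, and the 28-bit width is taken only
from the (1 << 28) - 1 mask in the gdb command, so it applies to the 32-bit
i386 build examined here.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low bits holding the value; taken from the
   (1 << 28) - 1 mask in the gdb session, not from lisp.h. */
enum { DEMO_VALBITS = 28 };

static_assert(DEMO_VALBITS > 0 && DEMO_VALBITS < 32,
              "shift must stay within a 32-bit word");

/* Upper bits of the 32-bit word: the type tag. */
uint32_t demo_tag(uint32_t obj)
{
    return obj >> DEMO_VALBITS;
}

/* Lower DEMO_VALBITS bits of the word: the pointer or integer payload. */
uint32_t demo_value(uint32_t obj)
{
    return obj & ((UINT32_C(1) << DEMO_VALBITS) - 1);
}
```

For the value 0x382dc354, demo_tag gives 0x3 and demo_value gives 0x082dc354,
which is the address gdb treats as a struct Lisp_String in the command above.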

Chip


Comment 4 Chip Coldwell 2006-07-19 17:43:20 UTC
(In reply to comment #3)
> I can't
> figure out how it is initialized, because as far as I can tell the function
> responsible for doing so, syms_of_xdisp, hasn't been executed at the point
> where I was checking this value.

RMS is too clever by half.  Emacs builds its own C runtime, ecrt0.o, and then
links with the -nostdlib switch.  gdb doesn't trace through the crtbegin/crtend
stuff.  That's why I'm not able to catch the initialization of 
Vmessages_buffer_name.

Ugh.

Chip


Comment 5 Chip Coldwell 2006-07-19 18:11:01 UTC
(In reply to comment #4)
> 
> Emacs builds its own C-Run Time, ecrt0.o, and then
> links with the -nostdlib switch. 

I take it back.  ecrt0.o is only built for some architectures, not including
ppc64.  However, emacs is linked with -nostdlib, and there must be something
funny going on to initialize all these variables.

Chip


Comment 6 Chip Coldwell 2006-07-19 19:34:39 UTC
(In reply to comment #2)
> I bet that the emacs binary you are running is the dumped one already.
> And emacs, during dumping, turns the .bss section into .data and
> saves there whatever values that memory contained during dumping.

Actually, you were right.  And, in fact, it seems like it is the process
of dumping that is messing up.  If I run 

TERM=vt100 ./temacs -l loadup

it works just fine.  But if I run

./temacs -batch -l loadup dump

the resulting emacs binary has problems.  With gcc not optimizing, I get
this error from emacs

# ./emacs
emacs: Wrong type argument: stringp, -336799784416090785

The argument it is complaining about is exactly the Vmessages_buffer_name
mentioned earlier.  This variable is correctly initialized by temacs when
I debug it.

Ugh.

Chip


Comment 7 Chip Coldwell 2006-07-20 15:15:13 UTC
(In reply to comment #6)
> 
> Actually, you were right.  And, in fact, it seems like it is the process
> of dumping that is messing up.

In fact, if I strip the temacs binary before running

./temacs -l loadup -batch dump

it will seg-fault at line 947 of unexelf.c as it unexec's the temacs binary.
This is during the process of inserting the new data section (that was
created from the old bss) just ahead of a new (empty) bss section.

Chip


Comment 8 Chip Coldwell 2006-07-21 17:55:40 UTC
After more playing around I discovered that I could get a seg-fault from

./temacs -l loadup -batch dump

on i386 if I build temacs without compiler optimization.

The command above seg-faults in the unexec function (file unexelf.c) while 
executing this line:

          memcpy (NEW_SECTION_H (nn).sh_offset + new_base,
                  (caddr_t) OLD_SECTION_H (n).sh_addr,
                  new_data2_size);

I unrolled the memcpy thus:

          p = NEW_SECTION_H (nn).sh_offset + new_base;
          q = (caddr_t) OLD_SECTION_H (n).sh_addr;
          for (i = 0; i < new_data2_size; i++)
            p[i] = q[i];

ran the debugger and found the segfault happens when

(gdb) p/x q+i
$5 = 0x82f0000
(gdb) p/x i
$8 = 0xed160
(gdb) p/x new_bss_addr
$10 = 0x852a000

In the meantime, if I look in /proc/[PID]/maps I find this:

08048000-081fa000 r-xp 00000000 fd:00 1311728    /home/coldwell/rpm/BUILD/emacs-21.4/src/temacs
081fa000-08203000 rw-p 001b2000 fd:00 1311728    /home/coldwell/rpm/BUILD/emacs-21.4/src/temacs
08203000-082f0000 rw-p 08203000 00:00 0
08337000-0852a000 rw-p 08337000 00:00 0          [heap]
b730e000-b7da3000 rw-p b730e000 00:00 0

The problem is that the Linux kernel has set up the process virtual
memory with a hole in it, and when the memcpy steps into this hole, it
seg-faults.
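
The numbers above line up as follows.  The source pointer is
q = 0x82f0000 - 0xed160 = 0x8202ea0, which lies in the second mapping, and
the faulting address 0x82f0000 is exactly where the third mapping ends; the
next mapping does not start until 0x8337000, a gap of 0x47000 bytes.  The
sketch below replays that walk.  It is illustrative only: struct mapping,
temacs_maps and first_unmapped are hypothetical names, and the end of the
copy range is assumed to be new_bss_addr (0x852a000), since new_data2_size
is not shown in the session.  Any end address past 0x82f0000 gives the same
result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of /proc/PID/maps, reduced to its [start, end) range. */
struct mapping {
    uint32_t start;
    uint32_t end;
};

/* The five mappings listed above, in ascending order. */
static const struct mapping temacs_maps[] = {
    { 0x08048000u, 0x081fa000u },
    { 0x081fa000u, 0x08203000u },
    { 0x08203000u, 0x082f0000u },
    { 0x08337000u, 0x0852a000u },
    { 0xb730e000u, 0xb7da3000u },
};

/* Return the first address in [from, to) that no mapping covers, or
   `to` if the whole range is mapped.  `maps` must be sorted by start. */
uint32_t first_unmapped(const struct mapping *maps, size_t count,
                        uint32_t from, uint32_t to)
{
    assert(maps != NULL && from <= to);
    uint32_t addr = from;
    for (size_t k = 0; k < count && addr < to; k++) {
        if (maps[k].end <= addr)
            continue;           /* mapping lies entirely below addr */
        if (maps[k].start > addr)
            break;              /* gap before this mapping */
        addr = maps[k].end;     /* covered up to the end of this mapping */
    }
    return addr < to ? addr : to;
}
```

With from = 0x08202ea0 and to = 0x0852a000 the function returns 0x082f0000,
the same address at which the unrolled copy loop faulted.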

This is probably also the origin of the bug on PPC64.

Chip


Comment 9 Chip Coldwell 2006-07-26 17:27:26 UTC
Created attachment 133080
backport from upstream emacs-22; patch to fix unexelf.c (unexec) on ppc64

Comment 10 Chip Coldwell 2006-07-26 17:30:37 UTC
(In reply to comment #8)
>
> The problem is that the Linux kernel has set up the process virtual
> memory with a hole in it, and when the memcpy steps into this hole, it
> seg-faults.
> 
> This is probably also the origin of the bug on PPC64.

Not true.  The hole in the process virtual address space is the work of
ExecShield; it is avoided during the normal rpm build process on i386/x86_64
by specifying "setarch -R make ...".

The real problem was that PowerPC has the .plt in the .bss section.  I
backported a patch from the upstream emacs-22 that seems to fix the problem.
A brew build is in progress as I write this.

Chip




Comment 11 Chip Coldwell 2006-07-26 17:50:28 UTC
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=135258

emacs-21.4-15 built.