Bug 183304
Summary: | does not build on ppc64 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Jens Petersen <petersen>
Component: | emacs | Assignee: | Chip Coldwell <coldwell>
Status: | CLOSED RAWHIDE | CC: | jakub, roland, varekova
Severity: | medium | Priority: | medium
Version: | rawhide | Hardware: | ppc64
OS: | Linux | Doc Type: | Bug Fix
Fixed In Version: | 21.4-15 | Last Closed: | 2006-07-26 17:50:28 UTC

Description
Jens Petersen
2006-02-28 02:42:23 UTC
Finally got a ppc64 system running FC6 (rawhide). The origin of this bug is pretty deep. Here's the scenario.

Vmessages_buffer_name is declared in src/xdisp.c:474 as

    static Lisp_Object Vmessages_buffer_name;

where a Lisp_Object is defined in src/lisp.h:221 as

    typedef union Lisp_Object
      {
        /* Used for comparing two Lisp_Objects;
           also, positive integers can be accessed fast this way.  */
        EMACS_INT i;

        struct
          {
            EMACS_INT val  : VALBITS;
            EMACS_INT type : GCTYPEBITS + 1;
          } s;
        struct
          {
            EMACS_UINT val : VALBITS;
            EMACS_INT type : GCTYPEBITS + 1;
          } u;
        struct
          {
            EMACS_UINT val : VALBITS;
            enum Lisp_Type type : GCTYPEBITS;
            /* The markbit is not really part of the value of a Lisp_Object,
               and is always zero except during garbage collection.  */
            EMACS_UINT markbit : 1;
          } gu;
      }
    Lisp_Object;

Now set the environment up just like make did and run emacs under gdb:

    # export EMACSLOADPATH=/usr/src/redhat/BUILD/emacs-21.4/leim/../lisp
    # gdb ../src/emacs
    GNU gdb Red Hat Linux (6.5-3.fc6rh)
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "ppc64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

    (gdb) set args -batch --no-init-file --no-site-file --multibyte -l /usr/src/redhat/BUILD/emacs-21.4/leim/../lisp/international/titdic-cnv --eval '(batch-titdic-convert t)' -dir quail /usr/src/redhat/BUILD/emacs-21.4/leim/CXTERM-DIC
    (gdb) break main
    Breakpoint 1 at 0x100b8324: file emacs.c, line 714.
    (gdb) run
    Starting program: /usr/src/redhat/BUILD/emacs-21.4/src/emacs -batch --no-init-file --no-site-file --multibyte -l /usr/src/redhat/BUILD/emacs-21.4/leim/../lisp/international/titdic-cnv --eval '(batch-titdic-convert t)' -dir quail /usr/src/redhat/BUILD/emacs-21.4/leim/CXTERM-DIC

    Breakpoint 1, main (argc=12, argv=0xfffffe89808, envp=0xfffffe89870) at emacs.c:714
    714       int skip_args = 0;
    (gdb) p/x Vmessages_buffer_name
    $1 = 0x6c5f706f73697469

This is the value that eventually causes trouble. Now, it seems to me that a variable declared static (even if it is a union) without an explicit initializer should be initialized to zero. It's in the BSS section, right?

    (gdb) p/x &Vmessages_buffer_name
    $2 = 0x101dd108

Now let's see what our binary says:

    # objdump -h emacs
    [
    emacs:     file format elf64-powerpc

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
    ...
    ]
     22 .data         004c0070  00000000101dcf90  00000000101dcf90  001ccf90  2**4
                      CONTENTS, ALLOC, LOAD, DATA
     23 .bss          00000000  000000001069d000  000000001069d000  0068d000  2**0
                      ALLOC
    [ ... ]

Goodness! Vmessages_buffer_name has an address in the .data section, not the .bss! Could this be a compiler bug?

Chip

I bet that the emacs binary you are running is the dumped one already. And emacs, during its dumping, turns the .bss section into .data and saves there whatever values that memory contained during dumping.

(In reply to comment #2)
> I bet that the emacs binary you are running is the dumped one already.
> And emacs, during its dumping, turns the .bss section into .data and
> saves there whatever values that memory contained during dumping.

It seems to be more complicated than that. I ran gdb on an i386 emacs binary, and Vmessages_buffer_name is in the .data section of that one, too. Also, it is initialized to a reasonable value:

    (gdb) p/x Vmessages_buffer_name
    $3 = 0x382dc354
    (gdb) x/s ((struct Lisp_String *)(Vmessages_buffer_name & ((1 << 28) - 1)))->data
    0x82ad940:       "*Messages*"

(N.B. the most significant 4 bits of a Lisp_Object are a type tag.)

I can't figure out how it is initialized, because as far as I can tell the function responsible for doing so, syms_of_xdisp, hasn't been executed at the point where I was checking this value. Still digging.

Chip

(In reply to comment #3)
> I can't
> figure out how it is initialized, because as far as I can tell the function
> responsible for doing so, syms_of_xdisp, hasn't been executed at the point
> where I was checking this value.

RMS is too clever by half. Emacs builds its own C runtime, ecrt0.o, and then links with the -nostdlib switch. gdb doesn't trace through the crtbegin/crtend stuff. That's why I'm not able to catch the initialization of Vmessages_buffer_name. Ugh.

Chip

(In reply to comment #4)
>
> Emacs builds its own C runtime, ecrt0.o, and then
> links with the -nostdlib switch.

I take it back. ecrt0.o is only built for some architectures, not including ppc64. However, emacs is linked with -nostdlib, and there must be something funny going on to initialize all these variables.

Chip

(In reply to comment #2)
> I bet that the emacs binary you are running is the dumped one already.
> And emacs, during its dumping, turns the .bss section into .data and
> saves there whatever values that memory contained during dumping.

Actually, you were right. And, in fact, it seems like it is the process of dumping that is messing up. If I run

    TERM=vt100 ./temacs -l loadup

it works just fine. But if I run

    ./temacs -batch -l loadup dump

the resulting emacs binary has problems. With gcc not optimizing, I get this error from emacs:

    # ./emacs
    emacs: Wrong type argument: stringp, -336799784416090785

The argument it is complaining about is exactly the Vmessages_buffer_name mentioned earlier. This variable is correctly initialized by temacs when I debug it. Ugh.

Chip

(In reply to comment #6)
>
> Actually, you were right. And, in fact, it seems like it is the process
> of dumping that is messing up.

In fact, if I strip the temacs binary before running

    ./temacs -l loadup -batch dump

it will seg-fault at line 947 of unexelf.c as it unexec's the temacs binary. This is during the process of inserting the new data section (that was created from the old bss) just ahead of a new (empty) bss section.

Chip

After more playing around I discovered that I could get a seg-fault from

    ./temacs -l loadup -batch dump

on i386 if I build temacs without compiler optimization. The command above seg-faults in the unexec function (file unexelf.c) while executing this line:

    memcpy (NEW_SECTION_H (nn).sh_offset + new_base,
            (caddr_t) OLD_SECTION_H (n).sh_addr,
            new_data2_size);

I unrolled the memcpy thus:

    p = NEW_SECTION_H (nn).sh_offset + new_base;
    q = (caddr_t) OLD_SECTION_H (n).sh_addr;
    for(i=0; i<new_data2_size; i++)
      p[i] = q[i];

ran the debugger and found the segfault happens when

    (gdb) p/x q+i
    $5 = 0x82f0000
    (gdb) p/x i
    $8 = 0xed160
    (gdb) p/x new_bss_addr
    $10 = 0x852a000

In the meantime, if I look in /proc/[PID]/maps I find this:

    08048000-081fa000 r-xp 00000000 fd:00 1311728 /home/coldwell/rpm/BUILD/emacs-21.4/src/temacs
    081fa000-08203000 rw-p 001b2000 fd:00 1311728 /home/coldwell/rpm/BUILD/emacs-21.4/src/temacs
    08203000-082f0000 rw-p 08203000 00:00 0
    08337000-0852a000 rw-p 08337000 00:00 0 [heap]
    b730e000-b7da3000 rw-p b730e000 00:00 0

The problem is that the Linux kernel has set up the process virtual memory with a hole in it (here 082f0000-08337000), and when the memcpy steps into this hole, it seg-faults.

This is probably also the origin of the bug on PPC64.

Chip

Created attachment 133080 [details]
backport from upstream emacs-22; patch to fix unexelf.c (unexec) on ppc64
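
The address arithmetic in the previous comment can be cross-checked with a small sketch. It is only an illustration, not code from unexelf.c: `struct map_range` and `first_unmapped` are hypothetical names, and the ranges would be copied by hand from the /proc/[PID]/maps listing. Given sorted mappings, it walks a copy window and reports the first address that no mapping covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one line of /proc/[PID]/maps as [start, end). */
struct map_range {
    uintptr_t start;
    uintptr_t end;
};

/* Return the first address in [from, to) that no range covers, or `to`
   if the whole window is mapped.  Assumes `maps` is sorted by start
   address, which is how /proc/[PID]/maps lists them. */
uintptr_t first_unmapped(const struct map_range *maps, size_t n,
                         uintptr_t from, uintptr_t to)
{
    uintptr_t cur = from;

    assert(from <= to);
    for (size_t i = 0; i < n && cur < to; i++) {
        if (maps[i].end <= cur)
            continue;            /* mapping lies entirely below the cursor */
        if (maps[i].start > cur)
            break;               /* a gap starts exactly at the cursor */
        cur = maps[i].end;       /* cursor is covered: skip to mapping end */
    }
    return cur < to ? cur : to;
}
```

With the first four ranges from the listing, a window starting at q (0x82f0000 - 0xed160 = 0x8202ea0) and ending at new_bss_addr (0x852a000; using that as the end of the copy is an assumption), the function returns 0x82f0000, the same address gdb printed for q+i at the fault.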
(In reply to comment #8)
>
> The problem is that the Linux kernel has set up the process virtual
> memory with a hole in it, and when the memcpy steps into this hole, it
> seg-faults.
>
> This is probably also the origin of the bug on PPC64.

Not true. The hole in the process virtual address space is the work of ExecShield; it is avoided during the normal rpm build process on i386/x86_64 by specifying "setarch -R make ...".

The real problem was that PowerPC has the .plt in the .bss section. I backported a patch from the upstream emacs-22 that seems to fix the problem. A brew build is in progress as I write this.

Chip

>
> Chip
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=135258

emacs-21.4-15 built.
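
For anyone re-reading the gdb expression in comment #3, the tag/value split it relies on can be sketched in a few lines of C. This is a sketch under stated assumptions, not the macros Emacs itself uses: the 28-bit value width is taken from the `(1 << 28) - 1` mask in that gdb command, it applies only to the 32-bit i386 build, and the `sketch_*` names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 28 value bits, as implied by the gdb mask in comment #3. */
enum { SKETCH_VALBITS = 28 };

/* Top bits of the word: the type tag (per the union quoted earlier,
   this field also carries the GC mark bit). */
uint32_t sketch_tag(uint32_t obj)
{
    return obj >> SKETCH_VALBITS;
}

/* Low bits of the word: for a string, the address of the Lisp_String. */
uint32_t sketch_val(uint32_t obj)
{
    return obj & ((UINT32_C(1) << SKETCH_VALBITS) - 1);
}

/* Usage check against the i386 value printed in comment #3. */
void sketch_selfcheck(void)
{
    assert(sketch_tag(UINT32_C(0x382dc354)) == 0x3u);
    assert(sketch_val(UINT32_C(0x382dc354)) == UINT32_C(0x082dc354));
}
```

The low 28 bits of 0x382dc354 give 0x082dc354, which is the pointer gdb dereferenced to reach "*Messages*".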