Bug 239344
Summary: | emacs crashes on startup | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ulrich Drepper <drepper> | ||||
Component: | glibc | Assignee: | Jakub Jelinek <jakub> | ||||
Status: | CLOSED RAWHIDE | QA Contact: | |||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | rawhide | CC: | abartlet, adam, ash, bkb, cra, dann, daryll, dwalsh, dwmw2, grgustaf, herrold, ismail, jik, kim-rh, michal, mishu, mitr, mtasaka, ndbecker2, oliva, p.patruno, raj.khem, ralston, roland, sathya.satissh, stickster, tim, tromey, v.pomerol, warlord, wwoods | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | 2.6-2 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2007-05-22 16:23:57 UTC | Type: | --- | ||||
Bug Blocks: | 150226 | ||||||
Attachments: |
|
Description
Ulrich Drepper
2007-05-07 18:44:00 UTC
(In reply to comment #0)
>
> In gdb it looks like this:
>
> #0 0x0000003815c30565 in raise () from /lib64/libc.so.6
> #1 0x0000003815c31ee0 in abort () from /lib64/libc.so.6
> #2 0x0000003815c68c8b in __libc_message () from /lib64/libc.so.6
> #3 0x0000003815c714eb in _int_malloc () from /lib64/libc.so.6
> #4 0x0000003815c7240d in malloc () from /lib64/libc.so.6
> #5 0x000000000050ac5c in emacs_blocked_malloc (size=768, ptr=<value optimized out>) at alloc.c:1244
> #6 0x0000003815c27e6d in _nl_intern_locale_data () from /lib64/libc.so.6
> #7 0x0000003815c286fa in _nl_load_locale_from_archive () from /lib64/libc.so.6
> #8 0x0000003815c2783d in _nl_find_locale () from /lib64/libc.so.6
> #9 0x0000003815c2721e in setlocale () from /lib64/libc.so.6
> #10 0x00000000004b285f in main (argc=2, argv=0x7fffad69fba8) at emacs.c:1077
> #11 0x0000003815c1da54 in __libc_start_main () from /lib64/libc.so.6
> #12 0x000000000040d9e9 in _start ()

This is interesting. It looks like the glibc setlocale is calling malloc, which is intercepted by a badly-behaving emacs malloc. In fact, one can easily verify that setlocale is calling malloc:

    #include <locale.h>
    #include <stdio.h>
    #include <malloc.h>

    /* Log the size and caller of every malloc request, then fail it. */
    void *mhook(size_t size, const void *caller)
    {
        printf("%zu %p\n", size, caller);
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        __malloc_hook = mhook;
        setlocale(LC_ALL, "ISO8859-1");
        return 0;
    }

When I run this I get

    (gdb) run
    Starting program: /home/boston/coldwell/foo
    568 0x3b3625eb4a
    44 0x3b3622d2c1
    44 0x3b3622d2c1

    Program exited normally.
    (gdb) p setlocale
    $1 = {<text variable, no debug info>} 0x3b36226e40 <setlocale>

Can we try one thing quickly -- would you be willing to test emacs-22.0.99 (the latest pretest version) to see if this problem has already been solved upstream? Repo/rpms are here:

http://people.redhat.com/coldwell/emacs/fedora/

Chip

I downloaded the new RPMs. First, there is no emacs program included anymore.

Second, the results are even worse.
Now I don't get a backtrace, it just crashes. The backtrace is generated by glibc when it finds a memory problem, so this means that it's a different problem. strace reports:

    [...]
    set_tid_address(0x2aaaaaad7090) = 10711
    set_robust_list(0x2aaaaaad70a0, 0x18) = 0
    rt_sigaction(SIGRTMIN, {0x3816c052a0, [], SA_RESTORER|SA_SIGINFO, 0x3816c0dcb0}, NULL, 8) = 0
    rt_sigaction(SIGRT_1, {0x3816c05320, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3816c0dcb0}, NULL, 8) = 0
    rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
    getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
    getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
    setrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
    open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=65167440, ...}) = 0
    mmap(NULL, 65167440, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaad8000
    close(3) = 0
    --- SIGSEGV (Segmentation fault) @ 0 (0) ---

The lines I removed are just ld.so doing its work. The set_tid_address() call is part of the initializers. Since the crash happens after locale-archive is opened my guess is that it is still in setlocale(), but now at a different place.

(In reply to comment #2)
> I downloaded the new RPMs. First, there is no emacs program included anymore.
>
> Second, the results are even worse. Now I don't get a backtrace, it just
> crashes.

I don't follow: if there's no emacs program, how can it crash?

    $ rpm -qlp emacs-22.0.99-1.fc7.x86_64.rpm
    /usr/bin/emacs-22.0.99
    /usr/share/applications/gnu-emacs.desktop

/usr/bin/emacs-22.0.99 is the binary; /usr/bin/emacs should be a symlink to it via /etc/alternatives/emacs. Is that not working, too?

> set_tid_address(0x2aaaaaad7090) = 10711
> set_robust_list(0x2aaaaaad70a0, 0x18) = 0
> rt_sigaction(SIGRTMIN, {0x3816c052a0, [], SA_RESTORER|SA_SIGINFO, 0x3816c0dcb0}, NULL, 8) = 0
> rt_sigaction(SIGRT_1, {0x3816c05320, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3816c0dcb0}, NULL, 8) = 0
> rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
> getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
> getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
> setrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
> open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=65167440, ...}) = 0
> mmap(NULL, 65167440, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaad8000
> close(3) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---

Locale again. What happens if you

    $ setarch x86_64 -R /usr/bin/emacs-22.0.99

?

Chip

There is no program "emacs" but there is "emacs-22.0.99".

The use of setarch doesn't change anything. This is from ltrace:

    [...]
    setlocale(6, "" <unfinished ...>
    pthread_self(5, 0x3815c2dc3e, 56, 5, 4) = 0x2aaaaaad7000
    pthread_mutex_lock(0x9c5e20, 0x3815c2dc3e, 56, 5, 4) = 0
    mallopt(0xfffffffe, 0, 0x3816c0f638, 1, 4) = 1
    malloc(5) = 0xa843d0
    pthread_mutex_unlock(0x9c5e20, 5, 0xa843c0, 0x3815f4b9f0, 4) = 0
    pthread_self(0xa843d0, 0x3815c2867e, 0xa843d0, 0, 0xfefefefefefefeff) = 0x2aaaaaad7000
    pthread_mutex_lock(0x9c5e20, 0x3815c2867e, 0xa843d0, 0, 0xfefefefefefefeff) = 0
    free(0xa843d0) = <void>
    pthread_mutex_unlock(0x9c5e20, 0xa843d0, 33, 0xa843c0, 0xa843d0) = 0
    pthread_self(120, 0x3815c2868d, 192, 0x137a50, 0x2aaaaaadc808) = 0x2aaaaaad7000
    pthread_mutex_lock(0x9c5e20, 0x3815c2868d, 192, 0x137a50, 0x2aaaaaadc808) = 0
    mallopt(0xfffffffe, 0, 0x3816c0f638, 1, 0x2aaaaaadc808) = 1
    malloc(120) = 0xe06bb0
    pthread_mutex_unlock(0x9c5e20, 120, 0xe06ba0, 0x3815f4bab0, 0x2aaaaaadc808) = 0
    pthread_self(12, 0x3815c76dd2, -256, 0x5c5b5c3d31535000, 0xfefefefefefefeff) = 0x2aaaaaad7000
    pthread_mutex_lock(0x9c5e20, 0x3815c76dd2, -256, 0x5c5b5c3d31535000, 0xfefefefefefefeff) = 0
    mallopt(0xfffffffe, 0, 0x3816c0f638, 1, 0xfefefefefefefeff) = 1
    malloc(12) = 0xa84550
    pthread_mutex_unlock(0x9c5e20, 12, 0xa84540, 0x3815f4b9f0, 0xfefefefefefefeff) = 0
    pthread_self(768, 0x3815c27e6d, 88, 0, 0x9c5e20) = 0x2aaaaaad7000
    pthread_mutex_lock(0x9c5e20, 0x3815c27e6d, 88, 0, 0x9c5e20) = 0
    mallopt(0xfffffffe, 0, 0x3816c0f638, 1, 0x9c5e20) = 1
    malloc(768 <unfinished ...>

Inside the setlocale call libc calls malloc (as it should be). emacs's malloc intercepts the calls and plays around with them. All these mallopt() calls are from emacs.

I get crashes on i386 as well. But at a different place. Having glibc.i686 0:2.5.90-22

Here is a backtrace:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1208473920 (LWP 5371)]
    0x00cfb7ee in _int_malloc () from /lib/libc.so.6
    (gdb) bt
    #0 0x00cfb7ee in _int_malloc () from /lib/libc.so.6
    #1 0x00cfd04e in malloc () from /lib/libc.so.6
    #2 0x08145b1d in ?? ()
    #3 0x00cfd005 in malloc () from /lib/libc.so.6
    #4 0x4eb7891b in XOpenDisplay () from /usr/lib/libX11.so.6
    #5 0x080c5721 in ceil ()
    #6 0x08056a14 in ceil ()
    #7 0x080f3b76 in ceil ()
    #8 0x00ca8f30 in __libc_start_main () from /lib/libc.so.6
    #9 0x08051b31 in ceil ()

FWIW, the crashes go away after reverting to glibc-2.5.90-21.

(In reply to comment #7)
> FWIW, the crashes go away after reverting to glibc-2.5.90-21.

It's worth a lot. This might be a glibc regression. I'll ask Jakub to take a look.

Chip

Created attachment 154614 [details]
Disable mmap-ing from the start
Here's a patch which fixes things for me.

emacs tries to work around some implementation details. Specifically, when dumping, no mmap-ed pages must be present. The program calls mallopt() in an attempt to achieve that. The problem is that it is already too late by then.

The remedy is to make sure that setting is known from the start. This is possible using an environment variable.

Patch from comment #9 works for me. Thanks, Ulrich!

Moreover emacs recompiled with the same patch does not seem to be affected by bug 224611 anymore. That with two caveats:
- I did not try window resizing with emacs-22.0.95-1.fc7 before it became impossible to start it at all, so I do not know what actually affected bug 224611
- with the current Preference setup I am not really sure what the status of "Assistive Technology Preferences" is; I _think_ that I am toggling that on subsequent logouts and logins but I do not see how to tell that for sure.

(In reply to comment #10)
> Here's a patch which fixes things for me.
>
> emacs tries to work around some implementation details. Specifically, when
> dumping, no mmap-ed pages must be present. The program calls mallopt() in an
> attempt to achieve that. The problem is that it is already too late by then.
>
> The remedy is to make sure that setting is known from the start. This is
> possible using an environment variable.

Ulrich: Thank you very much for all the work you've put into debugging and fixing this problem. I scratch-built some new F7 packages that include your patch:

http://people.redhat.com/coldwell/emacs/fedora/7

If someone could please verify that the problem is fixed in these packages I will commit the change. Also, I should probably notify upstream.

Chip

(In reply to comment #12)
> If someone could please verify that the problem is fixed in these packages I
> will commit the change. Also, I should probably notify upstream.

Does not work for me. I'm downloading the sources now and will take a look.
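
For readers following the mmap discussion above, here is a minimal C sketch of the setting involved, assuming a glibc system. It is not the attached patch. It only illustrates that once mmap-based allocation is switched off with mallopt(M_MMAP_MAX, 0), a large request should no longer be served by mmap(). The helper names are hypothetical, and the count of mmap-ed chunks is read from mallinfo2() (or mallinfo() on glibc older than 2.33).

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt, M_MMAP_MAX, mallinfo */
#include <stdlib.h>

/* Number of chunks glibc currently serves through mmap(). */
static size_t mmapped_chunk_count(void)
{
#if __GLIBC_PREREQ(2, 33)
    return mallinfo2().hblks;
#else
    return (size_t) mallinfo().hblks;
#endif
}

/* Hypothetical helper: returns 1 if a request of `size` bytes raised the
   mmap-ed chunk count, 0 if it did not, -1 if malloc failed. */
int large_request_uses_mmap(size_t size)
{
    assert(size > 0);
    size_t before = mmapped_chunk_count();
    char *p = malloc(size);
    if (p == NULL)
        return -1;
    *(volatile char *) p = 1;   /* touch the block so the allocation is kept */
    size_t after = mmapped_chunk_count();
    free(p);
    return after > before;
}

/* Hypothetical helper: turn off mmap-based allocation.
   mallopt() returns 1 on success and 0 on error. */
int disable_malloc_mmap(void)
{
    return mallopt(M_MMAP_MAX, 0);
}
```

Calling disable_malloc_mmap() before large_request_uses_mmap((size_t)1 << 20) is expected to give 0 on glibc. The point made in the thread is about timing: a mallopt() call only affects allocations made after it, whereas an environment variable such as MALLOC_MMAP_MAX_ is read when the allocator initializes, so the limit is in place from the very first allocation.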
I recompiled the .src.rpm on my machine and all is fine now. I have no idea, something is screwed up in the build system you've been using.

(The symlink "emacs" is still missing but that's a different bug.)

The binary RPMs did not work for me either (Pentium 4, updated rawhide). Will try src.rpm.

(In reply to comment #14)
> I recompiled the .src.rpm on my machine and all is fine now. I have no idea,
> something is screwed up in the build system you've been using.

I wonder if I need a "BuildRequires: autoconf". Although that's strange, because the Makefile should get built from Makefile.in by the configure script.

> (The symlink "emacs" is still missing but that's a different bug.)

That I have not been able to reproduce (i.e. it works for me). Do you have /usr/sbin/alternatives?

Chip

(In reply to comment #16)
> I wonder if I need a "BuildRequires: autoconf". Although that's strange,
> because the Makefile should get built from Makefile.in by the configure script.

I doubt autoconf makes a difference.

Using MALLOC_MMAP_MAX_ is _still_ a hack. The problem is that emacs, after the first call to its own malloc, still enables mmap-based allocation for other callers.

What glibc is used in the build root? I don't know exactly how the dumper works these days. If the dumped heap is connected with the heap in use in the emacs binary itself, using the new glibc at build time is essential. We have changed the format of freed blocks. This is a change which mustn't make a difference to any program but dumpers are a special case.

So, maybe try a BuildRequires for glibc >= 2.5.90-22.

> That I have not been able to reproduce (i.e. it works for me). Do you have
> /usr/sbin/alternatives?

Yep, that code is there.

(In reply to comment #17)
>
> Using MALLOC_MMAP_MAX_ is _still_ a hack. The problem is that emacs, after the
> first call to its own malloc, still enables mmap-based allocation for other callers.
>
> What glibc is used in the build root?
glibc-2.5.90-21

> If the dumped heap is connected with the heap in use in the emacs
> binary itself, using the new glibc at build time is essential. We have changed
> the format of freed blocks.

Did this change happen between glibc-2.5.90-21 and glibc-2.5.90-22?

> So, maybe try a BuildRequires for glibc >= 2.5.90-22.

Wouldn't it also need a Requires: glibc >= 2.5.90-22?

Chip

(In reply to comment #18)
> Did this change happen between glibc-2.5.90-21 and glibc-2.5.90-22?

Yes.

> Wouldn't it also need a Requires: glibc >= 2.5.90-22?

No, the old format is backward compatible. We just have two additional pointers in some cases.

Anyway, if this is the problem then emacs is still seriously broken and this is a workaround. And note that the MALLOC_MMAP_MAX_ setting is still necessary.

(In reply to comment #19)
> (In reply to comment #18)
> > Did this change happen between glibc-2.5.90-21 and glibc-2.5.90-22?
>
> Yes.

Is the same format used by the upstream glibc?

> Anyway, if this is the problem then emacs is still seriously broken and this is
> a workaround.

In particular, the dumper is seriously broken. It seems that it's always the dumper.

Chip

(In reply to comment #20)
> Is the same format used by the upstream glibc?

We would never diverge on something like this. I've committed the change upstream before the rawhide build.

(In reply to comment #21)
> (In reply to comment #20)
> > Is the same format used by the upstream glibc?
>
> We would never diverge on something like this. I've committed the change
> upstream before the rawhide build.

Good. My concern was the reaction on the emacs-devel list. I'll point them to this bug.

Chip

emacs is still crashing and this is breaking builds that use emacs -batch -f batch-byte-compile and such. Nu?

This effectively prevents building gettext and desktop-file-utils here at least. Probably other modules as well. Any news on when packages which work around or fix the problem can be expected?
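
The "two additional pointers" remark about the freed-block format can be pictured with a short C sketch. The two structs below are simplified, hypothetical stand-ins, not copies of glibc's internal definitions. The names fd_nextsize and bk_nextsize are the ones used later in this bug, and the helper functions are invented for illustration. The sketch only shows that the old layout is a prefix of the new one, and how many bytes of a freed block an older allocator would never have written.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical view of a free chunk before the change. */
struct old_free_chunk {
    size_t prev_size;
    size_t size;
    struct old_free_chunk *fd;   /* next free chunk in the bin */
    struct old_free_chunk *bk;   /* previous free chunk in the bin */
};

/* Same view after the change: two extra links, used only for large chunks. */
struct new_free_chunk {
    size_t prev_size;
    size_t size;
    struct new_free_chunk *fd;
    struct new_free_chunk *bk;
    struct new_free_chunk *fd_nextsize;
    struct new_free_chunk *bk_nextsize;
};

/* Returns 1 if every old field keeps its offset in the new layout. */
int old_layout_is_prefix(void)
{
    return offsetof(struct old_free_chunk, prev_size)
               == offsetof(struct new_free_chunk, prev_size)
        && offsetof(struct old_free_chunk, size)
               == offsetof(struct new_free_chunk, size)
        && offsetof(struct old_free_chunk, fd)
               == offsetof(struct new_free_chunk, fd)
        && offsetof(struct old_free_chunk, bk)
               == offsetof(struct new_free_chunk, bk);
}

/* Bytes of the new chunk view that an old-format writer never filled in. */
size_t unwritten_link_bytes(void)
{
    assert(sizeof(struct new_free_chunk) > sizeof(struct old_free_chunk));
    return sizeof(struct new_free_chunk)
         - offsetof(struct new_free_chunk, fd_nextsize);
}
```

This is why a heap image written by the old code can look valid to ordinary programs yet be a problem for a dumper: the shared fields line up, but the two extra links hold whatever happened to be in memory when the heap was saved.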
fwiw, a local rebuild of:

http://people.redhat.com/coldwell/emacs/fedora/7/src/emacs-22.0.99-2.fc7.src.rpm

works for me, too.

(In reply to comment #24)
> This effectively prevents building gettext and desktop-file-utils here at least.
> Probably other modules as well. Any news on when packages which work around or
> fix the problem can be expected?

I'm working on mastering koji right now. The Fedora build root does have the right glibc version; brew dist-fc7 doesn't. ETA tomorrow sometime.

Chip

*** Bug 240124 has been marked as a duplicate of this bug. ***

*** Bug 240150 has been marked as a duplicate of this bug. ***

Same problem on i386. I can't get a good backtrace out of gdb:

    >gdb -core core.1290
    GNU gdb Red Hat Linux (6.6-8.fc7rh)
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux-gnu".
    (no debugging symbols found)
    Using host libthread_db library "/lib/libthread_db.so.1".
    Core was generated by `emacs'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x0017a402 in __kernel_vsyscall ()
    (gdb) where
    #0 0x0017a402 in __kernel_vsyscall ()
    #1 0x44d0b396 in ?? ()
    #2 0x080f573f in ?? ()
    #3 0x0000050a in ?? ()
    #4 0x0000000b in ?? ()
    #5 0x00000000 in ?? ()

    >rpm -qa \*debuginfo\*|sort
    atk-debuginfo-1.18.0-1.fc7
    cairo-debuginfo-1.4.4-1.fc7
    emacs-debuginfo-22.0.99-2.fc7
    expat-debuginfo-1.95.8-9
    fontconfig-debuginfo-2.4.2-2.fc7
    freetype-debuginfo-2.3.4-1.fc7
    giflib-debuginfo-4.1.3-8
    glib2-debuginfo-2.12.11-1.fc7
    glibc-debuginfo-2.5.90-22
    glibc-debuginfo-common-2.5.90-22
    gtk2-debuginfo-2.10.11-5.fc7
    libICE-debuginfo-1.0.3-1.fc7
    libjpeg-debuginfo-6b-37
    libpng-debuginfo-1.2.16-1.fc7
    libSM-debuginfo-1.0.2-1
    libtiff-debuginfo-3.8.2-7.fc7
    libX11-debuginfo-1.0.3-8.fc7
    libXau-debuginfo-1.0.3-1.fc7
    libXcursor-debuginfo-1.1.8-1
    libXdmcp-debuginfo-1.0.2-2.fc7
    libXext-debuginfo-1.0.1-2.1
    libXfixes-debuginfo-4.0.3-1
    libXft-debuginfo-2.1.12-1.fc7
    libXi-debuginfo-1.0.4-1
    libXinerama-debuginfo-1.0.2-1.fc7
    libXpm-debuginfo-3.5.6-1
    libXrandr-debuginfo-1.2.0-3.fc7
    libXrender-debuginfo-0.9.2-1.fc7
    ncurses-debuginfo-5.6-6.20070303.fc7
    pango-debuginfo-1.16.4-1.fc7
    zlib-debuginfo-1.2.3-10.fc7

Ok, trying again with gdb /usr/bin/emacs-22.0.99:

    Core was generated by `emacs'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x0017a402 in __kernel_vsyscall ()
    (gdb) where
    #0 0x0017a402 in __kernel_vsyscall ()
    #1 0x44d0b396 in kill () from /lib/libc.so.6
    #2 0x080f573f in fatal_error_signal (sig=11) at emacs.c:397
    #3 <signal handler called>
    #4 0x44d4a9fd in _int_malloc (av=0x44e33120, bytes=1336) at malloc.c:4406
    #5 0x44d4c04e in *__GI___libc_malloc (bytes=1336) at malloc.c:3535
    #6 0x08147c8d in emacs_blocked_malloc (size=1336, ptr=0x44ed1144) at alloc.c:1244
    #7 0x44d4be90 in __libc_calloc (n=1, elem_size=1336) at malloc.c:3845
    #8 0x44ed1144 in XOpenDisplay (display=0xbfb62c2d ":0.0") at OpenDis.c:144
    #9 0x080c6601 in x_display_ok (display=0xbfb62c2d ":0.0") at xterm.c:10482
    #10 0x08056c12 in init_display () at dispnew.c:6798
    #11 0x080f51a6 in main (argc=Cannot access memory at address 0x7266206f
    ) at emacs.c:1658
    #12 0x44cf7f30 in __libc_start_main (main=0x80f4120 <main>, argc=1, ubp_av=0xbfb62154, init=0x81a27e0 <__libc_csu_init>, fini=0x81a27d0 <__libc_csu_fini>, rtld_fini=0x44cd1660 <_dl_fini>, stack_end=0xbfb6214c) at libc-start.c:222
    #13 0x08051b91 in _start ()

BTW, I've seen this on x86 and x86_64, with current rawhide.

*** Bug 240262 has been marked as a duplicate of this bug. ***

The 22.0.99-2 still doesn't work. The symptoms changed, though. Now glibc prints

    *** glibc detected *** emacs-22.0.99: double free or corruption (!prev): 0x0000000000e674a0 ***

and then hangs in the attempt to create a backtrace on a mutex.

I'll download the 22.0.99 sources and see whether this is anything new from the 22.0.95 version I tested.

I rebuilt emacs-22.0.99-3.fc7.src.rpm myself and it works just fine. I haven't seen any BuildRequires line in the .spec file so it might just be that some of the build roots are not up-to-date.

(In reply to comment #34)
> I rebuilt emacs-22.0.99-3.fc7.src.rpm myself and it works just fine. I haven't
> seen any BuildRequires line in the .spec file so it might just be that some of
> the build roots are not up-to-date.

Thanks.
I've just done a build in koji for all archs -- the x86_64 rpms are at

http://koji.fedoraproject.org/koji/taskinfo?taskID=9324

Can you please install that one and verify that the problem is fixed? If so, I'll ask rel-eng to push this package out to the mirrors.

Chip

    $ uname -a
    Linux datatype 2.6.21-1.3142.fc7 #1 SMP Mon May 7 21:07:42 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

    $ rpm -q emacs
    emacs-22.0.99-3.fc7

    $ emacs-22.0.99
    *** glibc detected *** emacs-22.0.99: double free or corruption (!prev): 0x0000000000e674a0 ***

And it just hangs...

(In reply to comment #36)
> $ uname -a
> Linux datatype 2.6.21-1.3142.fc7 #1 SMP Mon May 7 21:07:42 EDT 2007 x86_64
> x86_64 x86_64 GNU/Linux
>
> $ rpm -q emacs
> emacs-22.0.99-3.fc7
>
> $ emacs-22.0.99
> *** glibc detected *** emacs-22.0.99: double free or corruption (!prev):
> 0x0000000000e674a0 ***
>
>
>
> And it just hangs...

Could you please try rebuilding from the .src.rpm yourself? The .src.rpm is here

http://koji.fedoraproject.org/koji/taskinfo?taskID=9323

Chip, I think you missed the crucial info from comment #33 and #34. I did try your latest build and it's no good. Rebuilding it on my machine makes it work. The only issue I see is that you have no BuildRequires for glibc >= 2.5.90-22 in the .spec file.

Yes, it works when I build myself, on x86_64, with glibc-devel 2.5.90-22.

(In reply to comment #38)
> Chip, I think you missed the crucial info from comment #33 and #34. I did try
> your latest build and it's no good. Rebuilding it on my machine makes it work.
> The only issue I see is that you have no BuildRequires for glibc >= 2.5.90-22
> in the .spec file.

Right, but such a BuildRequires would have been satisfied anyway:

    $ koji latest-pkg dist-fc7-build glibc
    Build                                     Tag                   Built by
    ----------------------------------------  --------------------  ----------------
    glibc-2.6-1                               dist-fc7              roland

Chip

*** Bug 240420 has been marked as a duplicate of this bug. ***

In comment #11 I noted that emacs-22.0.95-1.fc7 with a patch from Uli and recompiled with glibc-2.5.90-22 does behave. Today an update showed up to glibc-2.6-1 and we are back to square one:

    *** glibc detected *** emacs: malloc(): memory corruption: 0x0000000002904d10 ***
    .....
    7fff14378000-7fff1438e000 rw-p 7fff14378000 00:00 0 [stack]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vdso]
    Aborted (core dumped)
    ....
    Core was generated by `emacs'.
    Program terminated with signal 6, Aborted.
    #0 0x00002aaaabde15b5 in raise () from /lib64/libc.so.6
    (gdb) where
    #0 0x00002aaaabde15b5 in raise () from /lib64/libc.so.6
    #1 0x00002aaaabde3060 in abort () from /lib64/libc.so.6
    #2 0x00002aaaabe19e1b in __libc_message () from /lib64/libc.so.6
    #3 0x00002aaaabe21d8c in _int_malloc () from /lib64/libc.so.6
    #4 0x00002aaaabe2369d in malloc () from /lib64/libc.so.6
    #5 0x000000000050b9cc in emacs_blocked_malloc (size=432, ptr=<value optimized out>) at alloc.c:1244
    #6 0x00002aaaabdd8ebd in _nl_intern_locale_data () from /lib64/libc.so.6
    #7 0x00002aaaabdd974a in _nl_load_locale_from_archive () from /lib64/libc.so.6
    #8 0x00002aaaabdd888d in _nl_find_locale () from /lib64/libc.so.6
    #9 0x00002aaaabdd826e in setlocale () from /lib64/libc.so.6
    #10 0x00000000004b3502 in main (argc=1, argv=0x7fff1438ac58) at emacs.c:1077
    #11 0x00002aaaabdceaa4 in __libc_start_main () from /lib64/libc.so.6
    #12 0x000000000040e6d9 in _start ()

All familiar stuff.

Recompiling emacs on every glibc change does not sound like a very good proposition.

(In reply to comment #42)
> In comment #11 I noted that emacs-22.0.95-1.fc7 with a patch from Uli
> and recompiled with glibc-2.5.90-22 does behave. Today an update showed
> up to glibc-2.6-1

Very strange that a glibc update from 2.5.90-22 to 2.6-1 showed up when F7 is presumably in a deep freeze right now. Nonetheless,

> Recompiling emacs on every glibc change does not sound like a very
> good proposition.

your point is well taken.
I'm up to my eyebrows in unexec right now, grokking more and more of it with every passing hour ....

Chip

(In reply to comment #42)
> In comment #11 I noted that emacs-22.0.95-1.fc7 with a patch from Uli
> and recompiled with glibc-2.5.90-22 does behave. Today an update showed
> up to glibc-2.6-1 and we are back to square one:
>
> *** glibc detected *** emacs: malloc(): memory corruption: 0x0000000002904d10 ***
> .....
> 7fff14378000-7fff1438e000 rw-p 7fff14378000 00:00 0 [stack]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vdso]
> Aborted (core dumped)
> ....
> Core was generated by `emacs'.
> Program terminated with signal 6, Aborted.
> #0 0x00002aaaabde15b5 in raise () from /lib64/libc.so.6
> (gdb) where
> #0 0x00002aaaabde15b5 in raise () from /lib64/libc.so.6
> #1 0x00002aaaabde3060 in abort () from /lib64/libc.so.6
> #2 0x00002aaaabe19e1b in __libc_message () from /lib64/libc.so.6
> #3 0x00002aaaabe21d8c in _int_malloc () from /lib64/libc.so.6
> #4 0x00002aaaabe2369d in malloc () from /lib64/libc.so.6
> #5 0x000000000050b9cc in emacs_blocked_malloc (size=432,
> ptr=<value optimized out>) at alloc.c:1244

BTW, the line alloc.c:1244 contains

    value = (void *) malloc (size);

and we can see from the stack trace that size=432. So it's clearly something else corrupting the glibc allocator state that causes this library call to fail (since generally we wouldn't expect malloc(432) to result in a seg-fault).

Chip

(In reply to comment #43)
> > #5 0x000000000050b9cc in emacs_blocked_malloc (size=432,
> > ptr=<value optimized out>) at alloc.c:1244
>
> BTW, the line alloc.c:1244 contains
>
> value = (void *) malloc (size);
>
> and we can see from the stack trace that size=432. So it's clearly something
> else corrupting the glibc allocator state

More to the point, the allocator is discovering memory corruption and calling abort. This is an assert failing.
Chip

(In reply to comment #45)
>
> More to the point, the allocator is discovering memory corruption and calling
> abort. This is an assert failing.

One of the asserts in do_check_remalloced_chunk.

Chip

OK, here's a theory as to what might be going on here. The function dump_emacs contains

    malloc_state_ptr = malloc_get_state ();

just before the call to unexec. Then malloc_initialize_hook (bound to the weak symbol __malloc_initialize_hook) does

    if (initialized)
      {
        [ ... ]
        malloc_set_state (malloc_state_ptr);
    #ifndef XMALLOC_OVERRUN_CHECK
        free (malloc_state_ptr);
    #endif
      }

The "initialized" variable is zero in the dumping emacs and nonzero in the dumped emacs. The intention of this is to save the state of the glibc allocator in the dumping emacs and restore it when the dumped emacs starts back up again. I believe that for some reason this procedure is failing, so that when the dumped emacs starts, the glibc allocator is not in a consistent state.

After rebuilding 22.0.99-2.fc7 on my own system with glibc-2.5.90-22, emacs runs properly. I just now updated to glibc-2.6-1, and emacs continues to run properly without needing a local rebuild.

    emacs-22.0.99-2.fc7
    glibc-2.6-1

If this information is useful: after rebuilding 22.0.99-3.fc7 on my own system with glibc-2.5.90-22, emacs runs properly, using the command emacs-22.0.99.

Rebuilding emacs-23.0.0.1 with glibc-2.6-1 seems to fix it.

(In reply to comment #50)
> Rebuilding emacs-23.0.0.1 with glibc-2.6-1 seems to fix it.

That's not interesting information. Rebuilding emacs-22.0.99 with glibc-2.6-1 locally also fixed the problem. I do not have any reason to believe that emacs-23.0.0.1 has fixed the underlying bug. I really do not have a good idea of what the underlying bug is yet, and I would rather not lower the priority of this bugzilla until I do.

Chip

Could you also change the hardware from x86_64 to "all"? This problem exists on i386, too.

*** Bug 240579 has been marked as a duplicate of this bug. ***

(In reply to comment #42)
> In comment #11 I noted that emacs-22.0.95-1.fc7 with a patch from Uli
> and recompiled with glibc-2.5.90-22 does behave. Today an update showed
> up to glibc-2.6-1 and we are back to square one:
>
> *** glibc detected *** emacs: malloc(): memory corruption: 0x0000000002904d10 ***
> .....
> 7fff14378000-7fff1438e000 rw-p 7fff14378000 00:00 0 [stack]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vdso]
> Aborted (core dumped)
> ....
> Core was generated by `emacs'.

Do you still have this core (or could you produce another like it)? Is it somewhere I could get it?

Chip

> Do you still have this core (or could you produce another like it)?

Producing such a core is quite easy: 'ulimit -c unlimited' and start an emacs not recompiled for a particular glibc. Does that not work for you?

> Is it somewhere I could get it?

Not that easily. It is 35 Megs of data and it is for emacs-22.0.95-1.fc7 I recompiled myself on x86_64. If you still think that this will be useful please contact me off bugzilla.

Does this bug also happen when emacs is built without gtk? (Take out the --with-gtk flag from the spec file, or don't pass it to configure if you build emacs that way). I don't know if this is relevant, but when built with Gtk emacs does some pthread calls before and after each malloc and free call.

(In reply to comment #56)
> Does this bug also happen when emacs is built without gtk? (Take out the
> --with-gtk flag from the spec file, or don't pass it to configure if you build
> emacs that way).

Well, I tried it. I built two emacs packages in koji without Ulrich's patch, one --without-gtk and one --with-gtk and tried them both on an F7 box. Both worked just fine. I'm perplexed.

In the meantime, there has been a new pretest tarball released (22.0.990).

Chip

(In reply to comment #57)
> (In reply to comment #56)
> > Does this bug also happen when emacs is built without gtk? (Take out the
> > --with-gtk flag from the spec file, or don't pass it to configure if you build
> > emacs that way).
>
> Well, I tried it. I built two emacs packages in koji without Ulrich's patch,
> one --without-gtk and one --with-gtk and tried them both on an F7 box. Both
> worked just fine. I'm perplexed.
>
> In the meantime, there has been a new pretest tarball released (22.0.990).
>
> Chip
>

Rebuilding with glibc-2.6-1 seems to fix it. Strange, but true (for me anyway).

Neal, it will fix it, yes, but only until something else gets upgraded that messes again with the heap or something else. Then, emacs will start crashing again. The update doesn't even have to be to glibc. Maybe it doesn't even take an upgrade, just a prelink.

There are multiple bugs on the glibc side (unfortunately malloc_set_state the way it is used by emacs makes many internal details of the malloc implementation part of the ABI):

1) the

    2007-04-30  Ulrich Drepper <drepper>
                Jakub Jelinek <jakub>

        [BZ #4349]
        * malloc/malloc.c: Keep separate list for first blocks on the bin
        lists with a given size. This helps skipping over list elements
        we know won't fit in two places.
        Inspired by a patch by Tomash Brechko <tomash.brechko>.

change affects the ABI, as if emacs is dumped with glibc before this patch and run with glibc after this patch, the fd_nextsize/bk_nextsize pointers are left uninitialized, but the new glibc relies on them having correct values. This is solvable by recomputing these pointers in malloc_set_state.

2) I made an accidental commit that enabled MALLOC_DEBUG (in the

    2007-05-07  Ulrich Drepper <drepper>
                Jakub Jelinek <jakub>

        * malloc/arena.c (heap_info): Add mprotect_size field, adjust pad.
        (new_heap): Initialize mprotect_size.
        (grow_heap): When growing, only mprotect from mprotect_size till
        new_size if mprotect_size is smaller. When shrinking, use PROT_NONE
        MMAP for __libc_enable_secure only, otherwise use MADV_DONTNEED.

check-in) and

    2007-05-13  Ulrich Drepper <drepper>

        * malloc/malloc.c [MALLOC_DEBUG]: Keep track of current maximum
        number of mmaps. n_mmaps_max is the target.
        * malloc/hooks.c: Likewise.
        * malloc/arena.c: Likewise.

change added a field to struct malloc_save_state without bumping version number. That together means another ABI change for malloc_save_state. We certainly have to revert the MALLOC_DEBUG setting (that's expensive) and something has to be done about the 05-13 change (state changing shouldn't depend on MALLOC_DEBUG).

*** Bug 240750 has been marked as a duplicate of this bug. ***

Shouldn't fixing these glibc issues (eg. by Jakub's patch discussed in http://sourceware.org/ml/libc-hacker/2007-05/msg00015.html) be an F7 release blocker?

Sorry, I'm a dumbass, I just realized it already is, sorry for the noise.

*** Bug 237177 has been marked as a duplicate of this bug. ***

*** Bug 240816 has been marked as a duplicate of this bug. ***

People can try this build:

http://koji.fedoraproject.org/koji/buildinfo?buildID=6952

Jakub hasn't moved it yet but the build seems fine. I'm running it here already. Please report results in any case.

The bits from koji seem to work ok here too. Thanks!

Jesse

(In reply to comment #66)
> People can try this build:
>
> http://koji.fedoraproject.org/koji/buildinfo?buildID=6952

This works:

    glibc-2.6-2
    emacs-22.0.99-2.fc7

With glibc-2.6-1 I get a crash on startup.

Thanks!

Also tested

    emacs-22.0.95-1 built on glibc-2.5.90-22
    glibc-2.6-2 : works!

    emacs-22.0.990-2 built on glibc-2.6-2 : works!

Thanks very much, Jakub.

Chip |
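
The second glibc issue described above (a field added to the saved allocator state without bumping its version number) comes down to a general rule for any dumped binary state: the reader can only trust a blob whose version it recognizes. The C sketch below illustrates that rule under stated assumptions. Every name in it (saved_state, restore_state, STATE_MAGIC, mmap_peak and so on) is hypothetical, and it is not glibc's malloc_set_state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STATE_MAGIC   0x444c5758L  /* arbitrary marker for this sketch */
#define STATE_VERSION 2L           /* must be bumped whenever the layout changes */

enum { RESTORE_OK = 0, RESTORE_BAD_MAGIC = -1,
       RESTORE_BAD_VERSION = -2, RESTORE_TRUNCATED = -3 };

struct saved_state {
    long magic;
    long version;
    unsigned long mmap_limit;
    unsigned long mmap_peak;   /* hypothetical field added in version 2 */
};

/* Copy a dumped blob into `live` only if its header matches this reader. */
int restore_state(struct saved_state *live, const void *blob, size_t blob_size)
{
    struct saved_state in;
    const size_t header_size = offsetof(struct saved_state, mmap_limit);

    assert(live != NULL && blob != NULL);
    if (blob_size < header_size)
        return RESTORE_TRUNCATED;
    memcpy(&in, blob, header_size);      /* read magic and version only */
    if (in.magic != STATE_MAGIC)
        return RESTORE_BAD_MAGIC;
    if (in.version != STATE_VERSION)
        return RESTORE_BAD_VERSION;      /* layout may differ: refuse it */
    if (blob_size < sizeof in)
        return RESTORE_TRUNCATED;
    memcpy(&in, blob, sizeof in);
    *live = in;
    return RESTORE_OK;
}

/* Build a blob stamped with `version` and try to restore it. */
int try_restore_with_version(long version)
{
    struct saved_state dumped = { STATE_MAGIC, version, 0, 0 };
    struct saved_state live = { 0, 0, 0, 0 };
    return restore_state(&live, &dumped, sizeof dumped);
}
```

With this check, try_restore_with_version(1) is rejected and try_restore_with_version(STATE_VERSION) succeeds. If the writer changes the layout but keeps stamping the old version, the check passes and the reader interprets the bytes with the wrong layout, which is the kind of silent mismatch the comment above describes.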