Bug 81259

Summary: crash in malloc/mallopt
Product: [Retired] Red Hat Linux
Reporter: Aleksey Nogin <aleksey>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 9
CC: fweimer, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Last Closed: 2003-04-17 18:03:45 UTC
Bug Blocks: 79578    

Description Aleksey Nogin 2003-01-07 07:21:13 UTC
I've just got the following stack trace in a "frozen" Mozilla:

#0  0x420e0ca3 in pthread_setcanceltype () from /lib/i686/libc.so.6
#1  0x40e395a2 in nsProfileLock::Unlock() ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#2  0x40e389ba in nsProfileLock::RemovePidLockFiles() ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#3  0x40e389ef in nsProfileLock::FatalSignalHandler(int, siginfo*, void*) ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#4  <signal handler called>
#5  0x42070f2c in mallopt () from /lib/i686/libc.so.6
#6  0x42070262 in malloc () from /lib/i686/libc.so.6
#7  0x4009b063 in JS_ArenaAllocate () from /usr/lib/libmozjs.so
#8  0x400e9686 in js_NewBufferTokenStream () from /usr/lib/libmozjs.so
#9  0x400e9563 in js_NewTokenStream () from /usr/lib/libmozjs.so
#10 0x40099821 in JS_CompileUCFunctionForPrincipals () from /usr/lib/libmozjs.so

[ omitting the remaining 82 frames as they are probably irrelevant ]

This is not the Phoebe Mozilla, but BuildID 2002123101, compiled (with gcc 3.2)
on Red Hat 8.0 and running on Phoebe. Still, malloc shouldn't segfault and
pthread_setcanceltype (whatever that is) shouldn't freeze...

Again, this is Phoebe (upgraded from 8.0) with
kernel-2.4.20-2.2 (i686 UP, booted with acpi=off)
glibc-2.3.1-21 (i686)

P.S. Potentially related - bug 80370.

Comment 1 Aleksey Nogin 2003-01-07 12:19:18 UTC
Just had another one of those:

(gdb) bt
#0  0x420e0ca3 in pthread_setcanceltype () from /lib/i686/libc.so.6
#1  0x40e545a2 in nsProfileLock::Unlock() ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#2  0x40e539ba in nsProfileLock::RemovePidLockFiles() ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#3  0x40e539ef in nsProfileLock::FatalSignalHandler(int, siginfo*, void*) ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#4  <signal handler called>
#5  0x42071452 in mallopt () from /lib/i686/libc.so.6
#6  0x420703d6 in free () from /lib/i686/libc.so.6
#7  0x42068db0 in fopen () from /lib/i686/libc.so.6
#8  0x482f780f in native_ShockwaveFlash_IsPlaying ()
   from /usr/lib/mozilla/plugins/libflashplayer.so
#9  0x482f67e1 in native_ShockwaveFlash_IsPlaying ()
   from /usr/lib/mozilla/plugins/libflashplayer.so
#10 0x482c9a16 in nsFtpState::mSessionStartTime ()
   from /usr/lib/mozilla/plugins/libflashplayer.so


Comment 2 Bill Nottingham 2003-01-08 04:58:18 UTC
Please try a later glibc build from rawhide.

Comment 3 Bill Nottingham 2003-01-08 04:59:07 UTC
Also, do you have any plugins installed and active on pages where it hangs, such
as java?

Comment 4 Jakub Jelinek 2003-01-09 01:15:38 UTC
Ideally please retry with glibc-2.3.1-32 or later from rawhide.

Comment 5 Aleksey Nogin 2003-01-11 02:20:04 UTC
Using glibc-2.3.1-32 (i686) and kernel-2.4.20-2.10 (UP, 686) I still see this
crash. Not in Mozilla (at least not yet), but this time this is in an OCaml
program. gdb shows:

(gdb) bt
#0  0x4207393c in mallopt () from /lib/tls/libc.so.6
#1  0x42072c72 in malloc () from /lib/tls/libc.so.6
#2  0x08058006 in stat_alloc ()

and it is 100% reproducible (at least for me - it would probably take a lot of
effort to replicate it on another machine).

Comment 6 Aleksey Nogin 2003-01-11 03:55:35 UTC
Actually, I was wrong in comment #5 when I said that it would be hard to
reproduce elsewhere. The crash happens at such an early stage
that the program does not need any special input: just run it as it is. It's
~2 MB, so I cannot attach it, but I posted it at http://nogin.org/bug81259.exe

P.S. Marking Severity: high because it completely prevents me from running
some software I really need (http://nogin.org/bug81259.exe is just an
example; I seem to be having the same problem with a number of OCaml
programs).

Comment 7 Jakub Jelinek 2003-01-20 18:47:55 UTC
It looks like at least bug81259.exe was compiled on RHL 8.0 or earlier, but
was linked on current rawhide, right?
Don't do that, it won't work.
My guess comes from the fact that it uses pthread_cond_*@GLIBC_2.3.2 symbols,
but it seems to malloc only a 12-byte pthread_cond_t (that's what the size used
to be, and still is for pthread_cond_*@GLIBC_2.0; the new pthread_cond_t type
is 48 bytes long).
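
Editor's note on comment 7: the mismatch described above comes down to an object compiled against old headers reserving fewer bytes for pthread_cond_t than the newer library functions expect to write. The following is only a minimal sketch of that idea, not code from the affected program. The names OLD_COND_SIZE, cond_buffer_too_small, cond_new and cond_free are hypothetical, and the value 12 is simply the old size quoted in the comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Old pthread_cond_t size quoted in comment 7 (hypothetical constant name). */
#define OLD_COND_SIZE 12

/* Returns nonzero if a buffer of `allocated` bytes cannot hold the
   pthread_cond_t described by the headers used for this compilation. */
int cond_buffer_too_small(size_t allocated)
{
    return allocated < sizeof(pthread_cond_t);
}

/* Allocates and initialises a condition variable, taking the size from
   the headers in use at compile time instead of a hard-coded number.
   Returns NULL on failure. */
pthread_cond_t *cond_new(void)
{
    pthread_cond_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    if (pthread_cond_init(c, NULL) != 0) {
        free(c);
        return NULL;
    }
    return c;
}

/* Destroys and releases a condition variable obtained from cond_new(). */
void cond_free(pthread_cond_t *c)
{
    assert(c != NULL);
    pthread_cond_destroy(c);
    free(c);
}
```

If cond_buffer_too_small(OLD_COND_SIZE) returns nonzero on a given system, an object file that still reserves only that many bytes would let the library write past the end of the allocation, which is the kind of heap damage that later shows up as a crash inside malloc or free. Recompiling against the current headers makes sizeof pick up the right value.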

Comment 8 Aleksey Nogin 2003-01-20 19:16:44 UTC
Yes, OCaml was compiled on 8.0. Not sure about linking, but I guess the OCaml
compiler does link in some pre-compiled binaries (e.g. compiled on 8.0) when
creating a custom runtime (e.g. when running on Phoebe).

BTW, with glibc-2.3.1-32 kernel-2.4.20-2.15 I am still seeing weird Mozilla
hangs (just had at least two of them in the last hour):

(gdb) bt
#0  0xffffe002 in ?? ()
#1  0x410316e2 in nsProfileLock::Unlock() ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#2  0x41030afa in nsProfileLock::RemovePidLockFiles() ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#3  0x41030b2f in nsProfileLock::FatalSignalHandler(int, siginfo*, void*) ()
   from /usr/lib/mozilla-1.3b/components/libprofile.so
#4  0x4023a670 in funlockfile () from /lib/tls/libpthread.so.0
#5  0x42072de6 in free () from /lib/tls/libc.so.6
#6  0x4206b4c0 in fopen () from /lib/tls/libc.so.6
#7  0x4aeed80f in native_ShockwaveFlash_IsPlaying ()
   from /usr/lib/mozilla/plugins/libflashplayer.so
#8  0x4aeec7e1 in native_ShockwaveFlash_IsPlaying ()
   from /usr/lib/mozilla/plugins/libflashplayer.so
#9  0x4aebfa16 in completed.1 () from /usr/lib/mozilla/plugins/libflashplayer.so
#10 0x41b5afda in nsPluginHostImpl::GetPluginFactory(char const*, nsIPlugin**) ()
   from /usr/lib/mozilla-1.3b/components/libgkplugin.so
[...]

This is with Mozilla 2003011417 compiled on 8.0 and running on Phoebe 8.0.92+. 
Both recent crashes have involved the flash plugin, but I am not sure whether
this is flash-specific or not.

Comment 9 Jakub Jelinek 2003-01-20 19:20:35 UTC
For mozilla, I'd suggest LD_PRELOAD=/usr/lib/libefence.so.0, MALLOC_CHECK_=3
or some other memory allocation debugger.
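
Editor's note on comment 9: MALLOC_CHECK_ is read from the environment when the process starts, so it has to be set before the program under test is launched. Normally this is done from the shell. The sketch below shows the same thing from a small C launcher, assuming a POSIX system. The function names enable_malloc_check and run_with_malloc_check are hypothetical, and the value "3" is the one suggested in the comment. On much newer glibc releases the checking code may live in a separate preloadable library, so the variable alone may have no effect there.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Sets MALLOC_CHECK_ to the value suggested in comment 9.
   Returns 0 on success, -1 on failure (as setenv does). */
int enable_malloc_check(void)
{
    return setenv("MALLOC_CHECK_", "3", 1);
}

/* Replaces the current process with the program named by argv[0],
   with MALLOC_CHECK_ set in its environment.
   Returns -1 only if the environment could not be set or exec failed. */
int run_with_malloc_check(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    if (enable_malloc_check() != 0)
        return -1;
    execvp(argv[0], argv);
    return -1; /* reached only when execvp fails */
}
```

A main function would typically pass &argv[1] to run_with_malloc_check, so that the launcher's own arguments name the program to debug. Preloading Electric Fence follows the same pattern with the LD_PRELOAD variable and the library path given in the comment.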

Comment 10 Aleksey Nogin 2003-01-20 23:30:51 UTC
You were right - the OCaml crash (comment #5) went away after I recompiled
the OCaml compiler on Phoebe (this still means no binary compatibility for
OCaml stuff, ouch).  I have not figured out the Mozilla crash yet (it's not
clear how to reproduce it).

Comment 11 Aleksey Nogin 2003-01-21 01:40:04 UTC
Applying EFence to Mozilla does not seem to be possible - Mozilla needs too much
memory, and together with efence it runs out of it before displaying a single
window (and I have 512M RAM + 1G swap, most of it free).

Comment 12 Aleksey Nogin 2003-01-21 07:32:53 UTC
Hm, even after recompiling the Ocaml compiler, I am still seeing crashes with
Ocaml binaries:

(gdb) bt
#0  0xffffe002 in ?? ()
#1  0x4002f18f in __pthread_initialize_minimal () from /lib/tls/libpthread.so.0
(gdb) thread apply all bt

Thread 2 (process 28128):
#0  0x084862a2 in Red_black_set_mem_560 ()
#1  0x00000b00 in ?? ()

Thread 1 (process 28130):
#0  0xffffe002 in ?? ()
#1  0x4002f18f in __pthread_initialize_minimal () from /lib/tls/libpthread.so.0


Comment 13 Aleksey Nogin 2003-01-21 22:00:21 UTC
Please ignore the previous comment (comment #12) - there it seems to be crashing
in Red_black_set_mem, not in __pthread_initialize_minimal, and it also crashes
in Red_black_set_mem on 8.0, so it's possible it's a bug in the OCaml compiler
(theoretically, an OCaml program that passed the type-checker is not supposed to
ever segfault), or somewhere else.

Comment 14 Ulrich Drepper 2003-04-17 18:03:45 UTC
This is no bug in glibc.  Object files are not portable from one release to another.