Bug 4437 - Child dumps core (sig11) before reaching code after fork
Product: Red Hat Linux
Classification: Retired
Component: glibc
i386 Linux
Priority: low  Severity: low
Assigned To: Cristian Gafton
Depends On:
Reported: 1999-08-09 04:17 EDT by remco
Modified: 2015-09-01 21:41 EDT
0 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 1999-08-16 14:50:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description remco 1999-08-09 04:17:25 EDT
The crashing application is a feature-rich scheduler
tightly integrated with Oracle (8.1.5).  It does not
show this problem on linux-2.0.35 with glibc-2.0.7
(Oracle 8.0.5), nor on 8 other Unix flavours.

The scheduler crashes when it fork/execs two agents.
These agents connect to the Oracle database by means of
the BEQ protocol (a helper process is fork/exec'd by
the Oracle OCI library).  The last fork() call returns
the child PID to the parent, but the child process never
reaches the code after the fork: the child dumps core
before that.
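
A minimal diagnostic sketch, not taken from the scheduler: the hypothetical helper child_reaches_post_fork_code() forks once, lets the child exit immediately from the code after fork(), and has the parent use waitpid() with WIFSIGNALED()/WTERMSIG() to tell a normal exit apart from a child killed by SIGSEGV (signal 11), as happened here. WCOREDUMP() is a common extension available in glibc rather than strict POSIX.

```c
#define _GNU_SOURCE            /* for strsignal() and WCOREDUMP() on glibc */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of the scheduler or glibc).
 * Returns 0 if the child reached the code after fork() and exited
 * normally, the signal number if the child was killed by a signal
 * (SIGSEGV is 11), or -1 on any other outcome or error. */
int child_reaches_post_fork_code(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: the "code after the fork". A real agent would exec()
         * here; _exit() avoids flushing stdio buffers inherited from
         * the parent. */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        fprintf(stderr, "child %ld killed by signal %d (%s)%s\n",
                (long)pid, sig, strsignal(sig),
                WCOREDUMP(status) ? ", core dumped" : "");
        return sig;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Called from the affected program at the point where the agents are forked, a return value of 0 means the child got past fork(); 11 means it died with SIGSEGV before reaching its own code.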

Here's a typescript of a gdb session opening the core file:

(gdb) GNU gdb with Linux support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are
welcome to change it and/or distribute copies of it under
certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show
warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Core was generated by `jcs master agent LX05'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_nisplus.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libnss_files.so.2...done.
Reading symbols from /lib/libnss_nis.so.2...done.
Reading symbols from /lib/libnss_dns.so.2...done.
Reading symbols from /lib/libresolv.so.2...done.
#0  __pthread_mutex_init (mutex=0x0, mutex_attr=0xbfff8024)
at spinlock.h:59
spinlock.h:59: No such file or directory.
(gdb) info stack
#0  __pthread_mutex_init (mutex=0x0, mutex_attr=0xbfff8024)
at spinlock.h:59
#1  0x40041950 in __fresetlockfiles () at lockfile.c:83
#2  0x4003fbc8 in fork () at ptfork.c:92
#3  0x80f47cd in rsdmfor ()
#4  0x80ee315 in rsssrsc ()
#5  0x804c129 in main ()
#6  0x40090cb3 in __libc_start_main (main=0x804c040 <main>,
    argv=0xbffffc84, init=0x804afdc <_init>, fini=0x849879c
    rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffc7c)
    at ../sysdeps/generic/libc-start.c:78

The machine involved is a dual Pentium III (2x450MHz)
with 512 MB of RAM.  I've tried the SMP kernel, the
non-SMP kernel and Linux-2.2.10ac12.  The "dmesg" utility shows no related

Please adjust the priority/severity to your taste ;-)
Comment 1 Cristian Gafton 1999-08-10 21:54:59 EDT
I will need a small example that I can use to test for the glibc
problem. Also, have you tried the newer glibc packages available from
Comment 2 remco 1999-08-11 18:22:59 EDT
I tried the glibc from Rawhide as you suggested, but it shows the
same problem.

Here's a patch I applied to glibc as a workaround:

==== CUT HERE ====
*** glibc.old/linuxthreads/lockfile.c   Wed Aug 11 19:52:27 1999
--- glibc/linuxthreads/lockfile.c       Wed Aug 11 19:48:52 1999
*** 80,86 ****
--- 80,90 ----
    __pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE_NP);

    for (fp = _IO_list_all; fp != NULL; fp = fp->_chain)
+ #if 0
      __pthread_mutex_init (fp->_lock, &attr);
+ #else
+     if (fp->_lock) __pthread_mutex_init (fp->_lock, &attr);
+ #endif

    __pthread_mutexattr_destroy (&attr);
==== CUT HERE ====

I still do not know why this problem occurs.  Somehow the
_lock member gets set to 0, causing __pthread_mutex_init to
segfault.  I have been unable to isolate the problem in a
small piece of sample code.  Any ideas?
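
To make the failure mode concrete, here is a simplified model of the lockfile.c loop with the patch's NULL check applied. It is a sketch under stated assumptions: struct stream and reset_stream_locks() are hypothetical stand-ins for FILE, _IO_list_all and __fresetlockfiles(), and it uses the POSIX name PTHREAD_MUTEX_RECURSIVE instead of glibc's PTHREAD_MUTEX_RECURSIVE_NP. Without the guard, a stream whose lock pointer is NULL would pass mutex=0x0 to the mutex init call, matching frame #0 of the backtrace.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a stdio FILE: only the fields the loop uses. */
struct stream {
    pthread_mutex_t *lock;   /* may be NULL, as observed in the core dump */
    struct stream *chain;    /* next stream, like fp->_chain */
};

/* Re-initialise every stream lock as a recursive mutex, skipping streams
 * whose lock pointer is NULL (the guard the workaround patch adds).
 * Returns the number of locks initialised. */
int reset_stream_locks(struct stream *list)
{
    pthread_mutexattr_t attr;
    int count = 0;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

    for (struct stream *fp = list; fp != NULL; fp = fp->chain) {
        if (fp->lock != NULL) {   /* without this check, init gets NULL */
            pthread_mutex_init(fp->lock, &attr);
            count++;
        }
    }

    pthread_mutexattr_destroy(&attr);
    return count;
}
```

Note that the guard only avoids the crash; it does not explain why a stream ends up on the list with a NULL lock pointer in the first place.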

BTW, there's a core file distributed with the glibc source tree.
Comment 3 Cristian Gafton 1999-08-16 14:50:59 EDT
patch applied in glibc-2.1.2-5 and later
Comment 4 openshift-github-bot 2015-09-01 21:41:24 EDT
Commit pushed to master at https://github.com/openshift/origin

Fix for issue #4437 - restarting the haproxy router still dispatches
connections to a downed backend.
