4437 – Child dumps core (sig11) before reaching code after fork

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	glibc
Version:	6.0
Hardware:	i386
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Cristian Gafton

Reported:	1999-08-09 08:17 UTC by remco
Modified:	2016-11-24 12:10 UTC
Last Closed:	1999-08-16 18:50:03 UTC

Description remco 1999-08-09 08:17:25 UTC

The crashing application is a feature-rich scheduler
tightly integrated with Oracle (8.1.5).  The problem does
not occur on linux-2.0.35 with glibc-2.0.7 (Oracle 8.0.5)
or on 8 other Unix flavours.

The scheduler crashes when it fork/execs two agents.
These agents connect to the Oracle database by means of
the BEQ protocol (the Oracle OCI library fork/execs a
helper process).  The last fork() call returns the child
pid to the parent, but the child process never reaches
the code after the fork; it dumps core before that.
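
As a rough illustration of how the parent side can confirm that a
forked child died from signal 11, here is a minimal sketch.  It does
not reproduce the crash itself.  The helper name
run_child_and_get_signal and the two child functions are hypothetical
and only serve the example; the child that fails simply raises SIGSEGV
on purpose.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the scheduler or of glibc):
 * fork, run child_fn in the child, and report how the child ended.
 * Returns the terminating signal number, 0 if the child was not
 * killed by a signal, or -1 if fork() or waitpid() failed.
 */
static int run_child_and_get_signal(void (*child_fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: run the payload, then leave without flushing stdio. */
        child_fn();
        _exit(0);
    }

    /* Parent: fork() already returned the child pid; now reap it. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);   /* e.g. SIGSEGV (11 on Linux/i386) */

    return 0;
}

/* Example payloads: one child returns cleanly, one kills itself. */
static void child_ok(void) { }
static void child_segv(void) { raise(SIGSEGV); }
```

Calling run_child_and_get_signal(child_segv) should report SIGSEGV,
which matches what the parent would see when the child dumps core
before it reaches any code after fork().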

Here's a typescript of a gdb session opening the
core-file:

(gdb) GNU gdb 4.17.0.11 with Linux support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Core was generated by `jcs master agent LX05'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /home/oracle/product/lx815/lib/libskgxp8.so...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_nisplus.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libnss_files.so.2...done.
Reading symbols from /lib/libnss_nis.so.2...done.
Reading symbols from /lib/libnss_dns.so.2...done.
Reading symbols from /lib/libresolv.so.2...done.
#0  __pthread_mutex_init (mutex=0x0, mutex_attr=0xbfff8024) at spinlock.h:59
spinlock.h:59: No such file or directory.
(gdb) info stack
#0  __pthread_mutex_init (mutex=0x0, mutex_attr=0xbfff8024) at spinlock.h:59
#1  0x40041950 in __fresetlockfiles () at lockfile.c:83
#2  0x4003fbc8 in fork () at ptfork.c:92
#3  0x80f47cd in rsdmfor ()
#4  0x80ee315 in rsssrsc ()
#5  0x804c129 in main ()
#6  0x40090cb3 in __libc_start_main (main=0x804c040 <main>, argc=2,
    argv=0xbffffc84, init=0x804afdc <_init>, fini=0x849879c <_fini>,
    rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffc7c)
    at ../sysdeps/generic/libc-start.c:78
(gdb)

The machine involved is a dual Pentium III (2x450MHz)
with 512 MB of RAM.  I've tried the SMP kernel, the
non-SMP kernel, and Linux-2.2.10ac12.  The "dmesg"
utility shows no related messages.

Please adjust the priority/severity to your taste ;-)

Comment 1 Cristian Gafton 1999-08-11 01:54:59 UTC

I will need a small example that I can use to test for the glibc
problem. Also, have you tried the newer glibc packages available from
rawhide?

Comment 2 remco 1999-08-11 22:22:59 UTC

I tried glibc from rawhide as you proposed, but it shows the
same problem.

Here's a patch I applied to glibc as a workaround:

==== CUT HERE ====
*** glibc.old/linuxthreads/lockfile.c   Wed Aug 11 19:52:27 1999
--- glibc/linuxthreads/lockfile.c       Wed Aug 11 19:48:52 1999
***************
*** 80,86 ****
--- 80,90 ----
    __pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE_NP);

    for (fp = _IO_list_all; fp != NULL; fp = fp->_chain)
+ #if 0
      __pthread_mutex_init (fp->_lock, &attr);
+ #else
+     if (fp->_lock) __pthread_mutex_init (fp->_lock, &attr);
+ #endif

    __pthread_mutexattr_destroy (&attr);
  #endif
==== CUT HERE ====

I still do not know why this problem occurs.  Somehow the
_lock member gets set to 0, causing __pthread_mutex_init to
segfault.  I have been unable to isolate the problem in a
small piece of sample code.  Any ideas?
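
For readers who want to see the guard from the patch in isolation, the
following is a minimal sketch built on assumptions rather than on the
glibc internals.  struct stream and reset_stream_locks are hypothetical
stand-ins for the stdio stream list and __fresetlockfiles, and the
public PTHREAD_MUTEX_RECURSIVE type is used in place of the internal
PTHREAD_MUTEX_RECURSIVE_NP.  It shows only the NULL check; it does not
explain why a stream ends up with a NULL lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a stdio stream entry.  Only the lock
 * pointer and the chain link are modelled here.
 */
struct stream {
    pthread_mutex_t *lock;   /* may be NULL, as seen for fp->_lock */
    struct stream *chain;    /* next stream on the list, or NULL   */
};

/*
 * Reinitialise the lock of every stream on the list, skipping
 * entries whose lock pointer is NULL instead of dereferencing it.
 * Returns the number of locks that were initialised.
 */
static int reset_stream_locks(struct stream *head)
{
    pthread_mutexattr_t attr;
    int count = 0;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

    for (struct stream *s = head; s != NULL; s = s->chain) {
        if (s->lock != NULL) {          /* the guard added by the patch */
            pthread_mutex_init(s->lock, &attr);
            count++;
        }
    }

    pthread_mutexattr_destroy(&attr);
    return count;
}
```

With a three-entry list whose middle entry has a NULL lock, the
function initialises two mutexes and skips the third entry, where the
unguarded loop would have passed a NULL pointer to the mutex
initialiser.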


BTW, there's a core file distributed with the glibc source tree.

Comment 3 Cristian Gafton 1999-08-16 18:50:59 UTC

patch applied in glibc-2.1.2-5 and later

Comment 4 openshift-github-bot 2015-09-02 01:41:24 UTC

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/16c9e6aa7306fac2458923dc422c5f0b5682c43a
Fix for issue #4437 - restarting the haproxy router still dispatches
connections to a downed backend.
