Bug 88178

Summary: db4 CDB mode environment creation fails, probably due to glibc pthreads issue
Product: [Retired] Red Hat Raw Hide
Reporter: matti aarnio <matti.aarnio>
Component: db4
Assignee: Jeff Johnson <jbj>
Status: CLOSED WONTFIX
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium
Version: 1.0
CC: jorton
Hardware: i386
OS: Linux
Last Closed: 2004-11-13 22:37:55 UTC
Attachments:
Description Flags
db4-cdbmode testcase none

Description matti aarnio 2003-04-07 12:40:15 UTC
Description of problem:
  With current rawhide packages of glibc, gcc and its libraries, binutils,
  autoconf and friends (glibc-2.3.2-11.9), opening a db4 environment in
  CDB mode fails with "Function not implemented".

Version-Release number of selected component (if applicable):
   Current rawhide files of glibc, compiler, tools, db4.

How reproducible:
  Consistent

Steps to Reproduce:
1. rawhide system
2. attached test program
3. run.
    
Actual results:
# ./db4-cdb-test 
 .. db_env_create err= 0  
db4-cdb-test: /var/tmp/__db.001: unable to initialize environment lock: Function not implemented
 .. env->open() err= 38  

Expected results:
# ./db4-cdb-test 
 .. db_env_create err= 0  
 .. env->open() err= 0  
 .. db_create() err= 0  
 .. db->open() err= 2  (read-only opening presently non-existent database...)



Additional info:
With the Red Hat 8.0 update glibc (glibc-2.3.2-4.80.i686.rpm and companions)
the result is the same.

On a machine with glibc-2.3.1-38, the binaries compiled on the rawhide machine
work just fine, as intended.



An attempt to  rpmbuild -bb  the db4 package without optimizations, to help
tracking of syscalls, failed miserably.  Possibly just rawhide indigestion...

Stepping through the calls in gdb raised a high-probability suspicion that the
pthread_mutexattr_setpshared()  call fails with  "ENOSYS".   Reading the glibc
source does not give any definite indication that this is even possible!
Possibly I studied the wrong source...

Comment 1 matti aarnio 2003-04-07 12:46:15 UTC
Created attachment 90948 [details]
db4-cdbmode testcase

The header comments may disagree; use this to compile:
  gcc -g -o db4-cdb-test db4-cdb-test.c -lpthread -ldb-4.0

This attempts to create a Sleepycat CDB environment in /var/tmp
and join it.  It then opens, READ ONLY, a database which (likely) does not
exist in there.

Comment 2 matti aarnio 2003-04-07 13:46:37 UTC
A slightly edited gdb session, with sources in place and a suitable 'directory'
command given to gdb:

(gdb) step
__db_pthread_mutex_init (dbenv=0x80498c8, mutexp=0x80498c8, flags=0)
    at ../mutex/mut_pthread.c:68
68              ret = 0;
65      {
69              memset(mutexp, 0, sizeof(*mutexp));
79              if (LF_ISSET(MUTEX_THREAD) || F_ISSET(dbenv, DB_ENV_PRIVATE)) {
89              pthread_condattr_t condattr, *condattrp = NULL;
90              pthread_mutexattr_t mutexattr, *mutexattrp = NULL;
92              if (!F_ISSET(mutexp, MUTEX_THREAD)) {
93                      ret = pthread_mutexattr_init(&mutexattr);
94                      if (ret == 0)
95                              ret = pthread_mutexattr_setpshared(
97                      mutexattrp = &mutexattr;
100             if (ret == 0)
(gdb) print ret
$4 = 38
....

  Definitely smells like a bad  pthread_mutexattr_setpshared()  thing..
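
The call sequence from mut_pthread.c in the session above can also be exercised
outside db4. What follows is only a minimal sketch, not the db4 code: the
function names probe_pshared_mutex() and report_pshared_mutex() are made up
for illustration, and only standard POSIX calls are used. On a system where
process-shared mutexes are not implemented, the probe would be expected to
return the same ENOSYS seen in gdb.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 0 if a process-shared mutex can be set up,
 * otherwise the error code returned by the failing pthread call.
 */
int probe_pshared_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int ret;

    ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;

    /* The call suspected of failing in the gdb session. */
    ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (ret == 0) {
        ret = pthread_mutex_init(&mutex, &attr);
        if (ret == 0)
            pthread_mutex_destroy(&mutex);
    }

    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Hypothetical helper: prints the probe result in readable form. */
void report_pshared_mutex(void)
{
    int ret = probe_pshared_mutex();

    if (ret == 0)
        printf("process-shared mutexes: supported\n");
    else
        printf("process-shared mutexes: failed, err= %d (%s)\n",
               ret, strerror(ret));
}
```

Calling report_pshared_mutex() from a small main() and linking with -lpthread
should be enough to tell whether the failure is in db4 itself or in the
pthread layer underneath it.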


Comment 3 matti aarnio 2003-04-09 19:01:45 UTC
Moving to glibc, as the bug seems to be there.

Comment 4 matti aarnio 2003-04-09 19:09:01 UTC
Known to happen on i386 (the i686 build of glibc, actually).
Can't say anything about e.g. Alpha et al.

Comment 5 Joe Orton 2003-04-27 08:56:06 UTC
I think this is really a db4 issue: db4 is configured to depend on
pthread_mutexattr_setpshared working, but it doesn't when running an earlier
kernel.  glibc is not doing anything wrong, it just started supporting
setpshared recently.

*** This bug has been marked as a duplicate of 86381 ***

Comment 6 matti aarnio 2003-04-28 16:57:44 UTC
The Kernel I am running there is:

  2.4.18-23.8.0smp

which is an RH development errata kernel for RH 8.0.
That kernel is stable; anything later is prone to hang the box.

Running a later kernel (e.g. 2.4.20/21) _is_not_ an option at present.


Comment 7 matti aarnio 2003-06-12 07:01:55 UTC
With up-to-date rawhide packages of kernel and glibc, along with the previous
db4 (hmm.. overnight there are newer versions of the kernel, at least):
  kernel-2.4.20-20.1.2007.nptl.i686
  glibc-2.3.2-48.i686
  db4-4.1.25-2
Apparently db4 was just "rebuilt" in between -2 and -3.
POSIX mutexes using nptl are already in -1.

What happens:
285                 err = prv->ZSE->env->open(prv->ZSE->env,
(gdb) next
router: unable to join the environment
289                 if (err) prv->ZSE->env->err(prv->ZSE->env, err, "envhome
<%s> open failed", prv->ZSE->envhome ? prv->ZSE->envhome : "NULL");
(gdb) list
284
285                 err = prv->ZSE->env->open(prv->ZSE->env,
286                                      prv->ZSE->envhome,
287                                      prv->ZSE->envflags,
288                                      prv->ZSE->envmode);
289                 if (err) prv->ZSE->env->err(prv->ZSE->env, err, "envhome
<%s> open failed", prv->ZSE->envhome ? prv->ZSE->envhome : "NULL");
290
291                 if (err) return err; /* Uhh.. */
292             }
293
(gdb) next
router: envhome </opt/mail/db> open failed: Resource temporarily unavailable
(gdb) print prv->ZSE->envhome
$2 = 0x8c09588 "/opt/mail/db"
(gdb) print/o prv->ZSE->envflags 
$3 = 044001  (DB_INIT_CDB | DB_INIT_MPOOL | DB_CREATE)
(gdb) print/o prv->ZSE->envmode 
$4 = 0600


So the error diagnostics differ from the original ones, but the thing still
refuses to function in a presumably good thread environment.

Comment 8 matti aarnio 2003-06-12 11:08:08 UTC
The latest bug appears to come from the different behaviour of the  O_DIRECT
option for open(2) between FreeBSD and Linux.

In FreeBSD that flag does not bring in Linux's special requirement that
writes/reads be done in page-size units (or exact multiples), with memory areas
beginning at a page boundary.

The latest Red Hat Rawhide  db4-4.1.25-3.src.rpm  does contain a  configure.ac
test that checks whether the O_DIRECT flag behaves the way FreeBSD expects.
Now if the kernel on the compilation machine happens to IGNORE that flag (e.g.
is old enough!), that configure test produces FreeBSD-like results at
compilation, and the resulting binary fails to function when run on a kernel
that does understand the flag!

Your compilation environment needs fixing, then packages need recompiling.
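
To make the Linux requirement described above concrete, here is a small sketch
of the alignment rules as I understand them. It is not taken from db4: the
helper names direct_io_aligned() and alloc_direct_io_buffer() are made up for
illustration, and using the page size as the alignment unit is an assumption
(the exact unit depends on kernel and filesystem).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a transfer satisfies the alignment
 * rules described above (buffer address, length and file offset are all
 * multiples of 'align'), 0 otherwise.
 */
int direct_io_aligned(const void *buf, size_t len, off_t offset, size_t align)
{
    if (align == 0 || offset < 0)
        return 0;

    return (uintptr_t)buf % align == 0 &&
           len % align == 0 &&
           (uintmax_t)offset % align == 0;
}

/*
 * Hypothetical helper: allocates a buffer whose start address is a
 * multiple of 'align'. Returns NULL on failure; release with free().
 */
void *alloc_direct_io_buffer(size_t len, size_t align)
{
    void *buf = NULL;

    if (posix_memalign(&buf, align, len) != 0)
        return NULL;
    return buf;
}
```

A caller would typically pass sysconf(_SC_PAGESIZE) as 'align'. Code written
against the FreeBSD behaviour uses arbitrary buffers and lengths, which is
exactly what such a check would reject.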

Comment 9 Ulrich Drepper 2004-09-28 06:29:15 UTC
Reassigning to db4 again, since this is a db4 build issue and not a glibc
problem.

Comment 10 Jeff Johnson 2004-11-13 22:37:55 UTC
There are 2 problems here.

The earlier failure is in testing whether posix mutexes are
shareable. Only a kernel that supports futexes has shared
posix mutexes. So db4-4.1.25 fails because it is compiled
with --enable-posixmutexes. Either run a kernel that supports
futexes, or build db4 without the --enable-posixmutexes
configure option. Unfortunately, there are no other solutions.

The latter problem has to do with newfangled O_DIRECT semantics
in the kernel that break db-4.1.25 builds. There is a fix in
later db4-4.1.25 src.rpm's, and db-4.2.52 and later have incorporated
the changes to db4's configure to avoid using O_DIRECT.

So the end result here is basically WONTFIX, in the sense that
the db4 problems are symptoms of other development, and
there is no known answer other than what I've outlined above.