Bug 91933 - Disable posix-mutex/NPTL support
Summary: Disable posix-mutex/NPTL support
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: db4
Version: 2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jeff Johnson
QA Contact:
URL:
Whiteboard:
Duplicates: 86381 109922 112159 112673 115306 124792
Depends On:
Blocks: FC3Target FC4Target
Reported: 2003-05-29 23:17 UTC by Enrico Scholz
Modified: 2007-11-30 22:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-11-17 13:10:48 UTC
Type: ---
Embargoed:


Attachments
strace dump (20.37 KB, text/plain), 2003-10-31 02:03 UTC, Enrico Scholz
simple testcase (372 bytes, text/plain), 2003-11-05 17:57 UTC, Enrico Scholz
my spec with which it works (18.82 KB, text/plain), 2003-11-15 11:30 UTC, Tomas Janousek
patch to db4-4.1.25-14 db4.spec to disable nptl threads (and resulting hangs :-) (2.45 KB, patch), 2004-01-11 22:50 UTC, Chris Schanzle
output from 'ldd -v /lib/libdb-4.1.so' on machine that BerkeleyDB fails (1.14 KB, text/plain), 2004-03-10 13:04 UTC, Andrew Speer
output from 'ldd -v /lib/libdb-4.1.so' on machine that BerkeleyDB works on (1.22 KB, text/plain), 2004-03-10 13:06 UTC, Andrew Speer
glibc.spec diff against 101.4 (1.03 KB, patch), 2004-03-24 15:43 UTC, Toshio Kuratomi
add --disable-pthreadsmutexes option (2.68 KB, patch), 2004-05-04 01:21 UTC, Radu Greab
rpm spec patch for db4-4.2.52-3.1 (1.55 KB, patch), 2004-05-04 01:22 UTC, Radu Greab
wrapper script around db_stat (206 bytes, text/plain), 2004-05-07 02:12 UTC, Radu Greab
updated rpm spec file for #59 (3.87 KB, patch), 2004-05-07 02:13 UTC, Radu Greab
Patch against FC3 db4.spec (2.42 KB, patch), 2005-07-10 20:18 UTC, Pedro Lamarão

Description Enrico Scholz 2003-05-29 23:17:52 UTC
Description of problem:

db4 should never be compiled with '--enable-posixmutexes' because it causes too
many problems that cannot be solved.

The current (posix-enabled) db4 does not work with non-NPTL kernels and so
packages like subversion can not operate on systems with vanilla kernels or RHL
kernels for i[345]86 architectures.



Version-Release number of selected component (if applicable):

db4-4.1.25-1

Comment 1 Jeremy Katz 2003-10-21 19:39:10 UTC
4.1.25-12 has this

Comment 2 Bill Nottingham 2003-10-21 19:39:24 UTC
Fixed in current builds, have both nptl and non nptl db.

Comment 3 Enrico Scholz 2003-10-31 02:01:06 UTC
I still get

| svnadmin create /tmp/x1
| svn: Berkeley DB error
| svn: Berkeley DB error while creating environment for filesystem /tmp/x1/db:
| Invalid argument

--------

db4-4.1.25-14
subversion-0.32.1-1
glibc-2.3.2-101


Comment 4 Enrico Scholz 2003-10-31 02:03:32 UTC
Created attachment 95622 [details]
strace dump

I forgot: vanilla 2.4.22 kernel on an i686

Comment 5 Enrico Scholz 2003-11-05 17:57:12 UTC
Created attachment 95736 [details]
simple testcase

$ gcc locktest.c -l db -o locktest
$ mkdir .dbtest
$ rm -f .dbtest/* ; ./locktest 
open: Function not implemented
$ rm -f .dbtest/* ; LD_ASSUME_KERNEL=2.2.5 ./locktest 
open: Invalid argument

Comment 6 Joe Orton 2003-11-13 08:58:00 UTC
*** Bug 109922 has been marked as a duplicate of this bug. ***

Comment 7 Joe Orton 2003-11-13 22:34:56 UTC
109922 is another reporter running an i586 kernel, hence using
/lib/libdb-4.1.so - Nalin, I thought this was supposed to be Really
Fixed Now Once And For All?


Comment 8 Tomas Janousek 2003-11-15 11:30:13 UTC
Created attachment 95993 [details]
my spec with which it works

I've got it working. I edited the spec file as in the attachment (I added
--disable-posixmutexes and maybe some other changes; I do not remember, I played
with it too much :)
Also, I renamed /lib/tls to /lib/tls.zal (zal as in zaloha, "backup" in English)
and /usr/lib/tls to /usr/lib/tls.zal.
It works!! (2.4.22 kernel patched with grsec etc. - no NPTL)

Comment 9 Jeff Johnson 2003-12-13 23:00:50 UTC
Here are the locktest.c results with db-4.2.52, including +/- nptl
libraries and with the kernel abi-note:
    $ cc -o t -I/usr/include/db4 -ldb-4.2 t.c
    $ mkdir .dbtest
    $ ./t
    $ rm .dbtest/*
    $ LD_ASSUME_KERNEL=2.2.5 ./t
    open: Invalid argument


Comment 10 Jeff Johnson 2003-12-13 23:09:22 UTC
*** Bug 86381 has been marked as a duplicate of this bug. ***

Comment 11 Jeff Johnson 2003-12-13 23:14:48 UTC
Note 2.6.0 kernel though ...

Comment 12 Jeff Johnson 2003-12-15 18:55:40 UTC
*** Bug 112159 has been marked as a duplicate of this bug. ***

Comment 13 Tomas Janousek 2003-12-15 18:58:09 UTC
It seems this is a significant bug when duplicates keep appearing
this often, doesn't it? So what about solving it? I posted a working
.spec, so where's the problem?

Comment 14 Enrico Scholz 2003-12-16 04:18:54 UTC
@ comment #9:

same with db4-4.2.52-1:

$ rm -rf .dbtest; mkdir .dbtest; ./a.out 
open: Function not implemented

$ rm -rf .dbtest; mkdir .dbtest; LD_ASSUME_KERNEL=2.2.5 ./a.out 
open: Invalid argument

$ uname -sr
Linux 2.4.23ensc-2


Comment 15 Jeremy Van Veelen 2003-12-27 16:58:02 UTC
I've successfully compiled db4 using Tomas's spec file and it creates
subversion repositories just fine now. :)

My specs:
Linux h24-69-77-88 2.4.22-1.2115.nptl #1 Wed Oct 29 15:20:17 EST 2003
i686 i686 i386 GNU/Linux
running vanilla (stable updates only) fedora core 1

Comment 16 Chris Schanzle 2004-01-11 22:44:15 UTC
I'm disappointed this hasn't gotten more attention.  I think there is
a non-intel processor problem here.

Cyrus won't run on non-intel processors with the supplied nptl-enabled
db4 (tested on VIA C3 and Athlon XP).  Tried stock Fedora kernels OR
2.4.{22,23,24}.  Works fine with db4 compiled w/o nptl threads.  On an
Intel P4 laptop (Dell 5150), cyrus ran out of the box just fine.

An Athlon 3000+ system originally (months ago) installed and up2dated
just fine.  I wiped it clean and started fresh, and with a bunch of
updates now, up2date will just hang in a variety of spots.  Installing
the non-nptl db4 made everything happy.

I whittled down the spec changes required to get a no_nptl version
of db4.  Will attach it.

Comment 17 Chris Schanzle 2004-01-11 22:50:36 UTC
Created attachment 96883 [details]
patch to db4-4.1.25-14 db4.spec to disable nptl threads (and resulting hangs :-)

Comment 18 Leonard den Ottolander 2004-01-12 13:27:24 UTC
Thanks for that patch. Might come in handy when I want to get rpm
running on my Cyrix P166 (bug #103078).


Comment 19 Leonard den Ottolander 2004-01-12 13:34:19 UTC
You might want to chop the whole fourth hunk from the patch, as it
only adds unnecessary spaces.



Comment 20 Need Real Name 2004-01-19 01:33:44 UTC
Would someone be so kind as to post some RPMs with the new spec?

Comment 21 Aleksey Nogin 2004-01-19 05:32:36 UTC
*** Bug 112673 has been marked as a duplicate of this bug. ***

Comment 22 Thomas Zehetbauer 2004-01-19 11:58:48 UTC
Got hit by this bug too, Fedora Core with db4-4.2.52-1 and vanilla
2.4.24 kernel; tried multiple versions of subversion; solved for one
machine by upgrading to kernel 2.6.1; unsolved for the other (because
2.6.1 is still broken http://bugme.osdl.org/show_bug.cgi?id=1855)

Comment 23 Tomas Janousek 2004-01-19 18:18:57 UTC
RPMS here...

http://tomi.nomi.cz/download/db4-no-nptl/

Comment 24 Need Real Name 2004-01-20 16:00:48 UTC
I have installed db4 from the RPMS that Tomas Janousek posted, and I am getting:

[root@kbreit kbreit]# svnadmin recover /web/svn/
Acquiring exclusive lock on repository db.
Recovery is running, please stand by...
Recovery completed.
svn: Berkeley DB error
svn: Berkeley DB error while opening 'uuids' table for filesystem
/web/svn/db:
Invalid argument

This seems to be the same problem.  Is it?

Comment 25 Tomas Janousek 2004-01-20 16:06:17 UTC
Hmmm :(
But those RPMS work for me (2.4.24 + some patches)

Comment 26 Need Real Name 2004-01-20 16:09:34 UTC
It seems there was a corrupt db file.  I redid the repository
and it works.  I take back my old comment.

Comment 27 Thomas Zehetbauer 2004-01-20 16:11:33 UTC
@Tomas Janousek: These are RPMS of db4-4.1.25 but Fedora currently
comes with and depends on db4-4.2.52


Comment 28 Tomas Janousek 2004-01-20 16:14:06 UTC
To Thomas Zehetbauer:
Are you sure about that? yum update; rpm -q db4 shows 4.2.25. Am I making a mistake?

Comment 29 Need Real Name 2004-01-20 16:29:45 UTC
Does 4.2.25 have the same posix-mutex/NPTL support issues that we've
been discussing here?

Comment 30 Thomas Zehetbauer 2004-01-20 17:14:14 UTC
Tomas Janousek: No, rpm -q db4 shows db4-4.2.52-1

mrproper: It seems that the current db4-4.2.52-1 build has
the same NPTL issue; strace shows that set_thread_area(...) fails;
svnadmin says "svn: Berkeley DB error while creating environment for
filesystem svn-test/db: Invalid argument"; the problem goes away with
kernel 2.6

Comment 31 Tomas Janousek 2004-01-20 19:07:23 UTC
To: Thomas Zehetbauer
Using Fedora Core 1, having db4-4.1.25, understood?

Comment 32 Toshio Kuratomi 2004-02-01 05:01:56 UTC
Does Comment #30 mean db4 will work correctly on i[345]86's with the
latest development db4 _and_ kernel 2.6 or that one problem has been
solved but other(s) remain (as in Thomas Zehetbauer's original Comment
#22)?

Comment 33 Toshio Kuratomi 2004-02-02 04:42:57 UTC
Okay, so here are some dumb questions that may be useful or may be
barking up the wrong tree:

1) Is the purpose of /lib vs /lib/tls to provide libraries for linking
on non-NPTL and NPTL enabled kernel revs/architectures?
2) Would simply adding --disable-posixmutexes when building the
dist/non-tls version of db4 while leaving posixmutexes enabled for the
dist/tls version make things work for everyone?

Related question:
1) Does the kernel NPTL really not work for i586 and lower
architectures?  Someone reported that he had success with the svnadmin
create command on his machine after recompiling _glibc_ to include
NPTL on his i586.  Has he created bugs in his glibc thread
implementation or is NPTL support available in the kernel for more
architectures and just needs to be enabled at the glibc level?

Comment 34 Joe Orton 2004-02-11 09:49:16 UTC
*** Bug 115306 has been marked as a duplicate of this bug. ***

Comment 35 Andrew Speer 2004-03-10 13:00:27 UTC
I have just come across this bug also, using Perl CPAN module 
BerkeleyDB. 
 
The module worked OK for me on one machine, but not on another (both 
identical FC 1 machines using db4-4.1.25-14 (or more precisely 
db4-devel-4.1.25-14, as that is what the CPAN module linked 
against).   
 
The only difference I can see in the machines is that one is a 
Genuine Intel Processor machine, and one is a VIA C3 processor. Not 
sure how this can be relevant, but it is the only diff I can see. 
Both are running FC1, and both had all updates installed that were
available at the time of this post.

Looking at the "ldd -v" outputs I can see that they are different
(supplied as attachments). That is about the limit of my fault 
finding ability for this sort of problem. If I can do any other 
testing to help please let me know. 
 
I installed Tomas Janousek's RPMS from the link supplied and my 
BerkeleyDB module sprang into life, so (for me) his RPMS fix the 
problem. 
 
Hope this info helps ... 
 

Comment 36 Andrew Speer 2004-03-10 13:04:58 UTC
Created attachment 98423 [details]
output from 'ldd -v /lib/libdb-4.1.so' on machine that BerkeleyDB fails

Comment 37 Andrew Speer 2004-03-10 13:06:05 UTC
Created attachment 98424 [details]
output from 'ldd -v /lib/libdb-4.1.so' on machine that BerkeleyDB works on

Comment 38 Leonard den Ottolander 2004-03-10 13:27:30 UTC
Andrew,

The CPU architecture most probably is relevant, because of the
supported instructions and registers (also see bug 103078). Do a
$ cat /proc/cpuinfo
and look for "tsc" and "cmov". Any of these missing? If so disable all
occurrences of --enable-posixmutexes (see "comment #0").
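
The flag check described above can be sketched as a small shell helper. This is only an illustration and was not posted in the thread: the function names `missing_flags` and `cpu_flags` are made up here, and the sketch assumes a Linux /proc/cpuinfo that contains a "flags" line.

```shell
# Report which of the requested CPU flags are absent from a flags string.
# Usage: missing_flags "<space-separated flags>" flag1 flag2 ...
missing_flags() {
    flags=" $1 "
    shift
    missing=""
    for f in "$@"; do
        case "$flags" in
            *" $f "*) ;;                      # flag is present
            *) missing="$missing $f" ;;       # flag is absent
        esac
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Read the first "flags" line from /proc/cpuinfo (assumes Linux).
cpu_flags() {
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1
}
```

Running `missing_flags "$(cpu_flags)" tsc cmov` prints the names of any missing flags and an empty line when both are present.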


Comment 39 Tomas Janousek 2004-03-10 13:31:10 UTC
Leonard,
my RPMS do exactly that - disable posixmutexes.

Comment 40 Andrew Speer 2004-03-12 01:18:37 UTC
Leonard,

You are correct about the missing flags:

Intel CPU flags:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 mmx fxsr sse

Via CPU flags:

fpu de tsc msr cx8 mtrr pge mmx 3dnow

The cmov flag is missing from the VIA CPU flag set. 

I guess my question is whether there is any advantage in having
posix-mutexes enabled in the default package, because of problems
people will get when trying to use programs like Subversion etc on
non-Intel CPU's.

FYI I had the 2.6.0 kernel installed on the VIA box before I reported
this bug, and that was the kernel version under which I first saw the
bug. 

As part of the bug tracing process I downgraded to the
2.4.22-1.2129.nptl kernel. So if it is of any use I can say that the
"official" db4 packages did not work on my VIA processor machine under
either 2.6 or 2.4.nptl kernels.

Since I have access to both types of CPU's and can install 2.4,
2.4.nptl, and 2.6 kernels is there any testing you would like me to do
to see if we can get one package that works with all 3 kernels, and on
both CPU types ?



Comment 41 Toshio Kuratomi 2004-03-24 15:24:53 UTC
After some googling, bugzilla searching, and package testing I think
that glibc is a partial culprit in this mess.  It seems that i486 and
above processors should be capable of handling NPTL, but the
i486=>i586 (and some non-Intel processors) get the i386 glibc rpm
instead of the i686 rpm.  Therefore they're getting a glibc that can't
handle NPTL while their processors are capable of it.

Here's what I did to test: downloaded the 101.4 glibc SRPM.  Edited a
few macros at the top of the spec to target the i486 for NPTL and TLS.
 rebuilt the rpm with rpmbuild -ba --target=i486 glibc.spec

Installed the rpms on my K6-MMX (with tsc but without cmov) and voilà!
svnadmin create works.

I don't have an i486 around to test on.  Could other people try these
out and see if it works on their processors?  (But be careful to have
some statically linked recovery methods worked out -- if the glibc
doesn't work for you, there could be some pretty nasty recovery
situations.)  (And be sure to have the standard db4, not the nptl
disabled version otherwise the test won't prove much.) RPM location:
http://www.tiki-lounge.com/~toshio/nptl/

As rpm no longer runs on i386 (Bug #103078) I'd like to see glibc
compiled for i486 be the least common-denominator target instead of
i386.  (I ran across references to Debian and SuSE doing this in my
googling but haven't checked out the package archives to see what
exactly this means.)  This way we have a glibc which can perform these
functions across all platforms (Correct me if I'm overly optimistic).

It's a bit too hard for me to generate new glibc packages every time I
want to upgrade as it's a tremendous time and disk hog (~2GB of HD
space.)  And I don't understand what all the *arches macros in the
spec file mean.  I would be much relieved if someone who better
understood glibc were to take care of targeting the proper CPUs in
the proper macros.

So whaddya say?  Retarget this bug for glibc and make the next release
with another binary architecture?

Comment 42 Toshio Kuratomi 2004-03-24 15:43:04 UTC
Created attachment 98829 [details]
glibc.spec diff against 101.4

Complete spec is on the website mentioned above.

Have not tested the k6 additions yet, only the i486 ones.

Comment 43 Ian Soboroff 2004-03-25 15:49:03 UTC
I can verify this bug on Fedora 1 (testing) on my laptop, which is a
Transmeta-based Fujitsu P-2110.  I have the i686 glibc and DB
4.1.25-14 gives me the DB_PRIVATE error.  I tried Tomas's RPMs and
modifying the SPEC myself to just add --disable-posixmutexes, but that
didn't work for me.

I found success by building DB 4.2.52 from the source tarball from
Sleepycat, installing in /usr/local/db-4.2, and pointing my
application at that.  I didn't even specify a posixmutexes argument to
configure, but let it sort it out.  The only hitch was that db_dump185
wouldn't build.

Unless 4.2 has some magic fix for this problem, I suspect that the RPM
wizardry in the db-4.1.25 spec file is bogus for some configurations.

My application btw is a Java app which uses DB's Java API.  I haven't
even attempted to point, say, rpm at the new DB install.

Comment 44 Ivan Zilic Schmidt 2004-03-29 01:27:29 UTC
This is a very big problem. Cyrus imap won't work with the original db4
rpms. I tried the proposed db4.1, which worked fine with cyrus-imap, but now I
cannot compile sendmail-8.11.7: I get "incompatible type for argument
4 of indirect function call" when trying to compile sendmail. I could
downgrade to the 2.4.20 kernel, but that kernel doesn't recognize my
southbridge (via 8237) and my hard drive won't work with DMA. I'm so
disappointed. 5 years using linux and never saw something like this.

Comment 45 Rex Dieter 2004-03-30 16:53:21 UTC
Regarding comment #42 and comment #43, I don't think rebuilding glibc
patched to provide tls support for pre i686 is a (fully) proper
solution.  

I did that (on a rh90 k6 box), and everything *seemed* fine for
awhile, at least until I tried to use mozilla, which now crashes silently.

Comment 46 Toshio Kuratomi 2004-03-30 17:52:40 UTC
Rex: How does mozilla crash?  Do you have any other debugging info? 
I'm using Mozilla right now without crashing so far but this is the
first time since I rebuilt libc.  I'll do my web browsing from there
for a while to see what happens but if you could tell me how long I
should try running it in order to make it crash on me I'd appreciate it.

The %changelog lists several fixes to glibc's NPTL since rh90 but I
don't know if one of those might have fixed things or not.

Comment #43: What is the DB_PRIVATE error?

Comment 47 Rex Dieter 2004-03-30 18:28:09 UTC
> how does mozilla crash?

mozilla(-1.6) dies silently on startup.  
$ LD_ASSUME_KERNEL=2.4.18 mozilla
works.
$ mozilla
doesn't.

Comment 48 Toshio Kuratomi 2004-03-30 20:11:28 UTC
I've got a fresh download of mozilla-1.6 from mozilla.org and my
recompiled glibc-2.3.2-101.4 on FC1 and there's no crash on startup. 
Maybe one of the fixes between rh9 glibc and 101.4 fixed it?

Comment 49 Enrico Scholz 2004-03-30 20:22:56 UTC
Recompiling glibc is not an option for me since NPTL is a kernel issue
too, and I have to use a vanilla kernel.

Comment 50 Tomas Janousek 2004-03-30 20:27:37 UTC
We all know that it is not needed to argue that you need to use
vanilla because any Linux distro _MUST_ allow use of _AT LEAST_
vanilla kernel!

Comment 51 Toshio Kuratomi 2004-03-30 21:00:26 UTC
What about kernel 2.6?  Will the NPTL in vanilla 2.6 coupled with a
recompiled glibc work (for FC2)?  If so, a glibc targeted for i486
would fix things for FC2.

I see your point that FC1 NPTL is problematic, though.  I agree that a
db4 update would be appropriate (or for Fedora Alternatives to
materialize.)

Comment 52 Tomas Janousek 2004-04-16 14:29:59 UTC
RPMs of non-NPTL db 4.2.52 at http://tomi.nomi.cz/download/db4-no-nptl/
(build from SRPM got for subversion 1.0.1 - thus for Fedora Core 1)

Comment 53 Chris Wilson 2004-04-23 22:28:47 UTC
I used Tomi's latest RPMs (comment #52) and now I can create 
repositories without problems (using vanilla 2.6.3). Before that, I 
always got "invalid argument", even using db42 which I built myself
without NPTL support, by removing everything from %{nptl_arches} and 
%{nptl_java_arches}. 
 
Thanks Tomi! 
 
Redhat, PLEASE STOP MESSING ABOUT WITH NPTL! It's not safe and not 
kind! 
 
Cheers, Chris. 
 

Comment 54 Paul Jakma 2004-05-03 11:18:53 UTC
I have this problem with FC 1.92 and cyrus-imapd:

May  3 12:01:01 edwards sieve[20890]: DBERROR db4: Berkeley DB library
configured to support only DB_PRIVATE environments
May  3 12:01:01 edwards sieve[20890]: DBERROR: dbenv->open
'/var/lib/imap/db' failed: Invalid argument
May  3 12:01:01 edwards sieve[20890]: DBERROR: init() on berkeley

The machine is a Compaq Deskpro 4000 with an Intel Pentium 233MMx
(P55C? F00F bug), CPU flags are:

flags           : fpu vme de pse tsc msr mce cx8 mmx

No CMOV flag listed. It has an i386 glibc, as expected (there are no
i586 builds):

# rpm -q --qf '%{ARCH}\n' glibc-common
i386

Kernel: 2.6.3-1.96

Exporting LD_ASSUME_KERNEL=2.4.18 in /etc/sysconfig/cyrus-imapd does
not fix the problem unfortunately.

Is there any hope this will be fixed some time soon? I also have K6
machines which use i386 glibc's. And where is the problem exactly, in
the i386 compiled glibc or in the kernel, or both? What needs to be
recompiled to work around this problem?


Comment 55 Chris Wilson 2004-05-03 11:25:14 UTC
Paul, please try following Tomi's instructions in Comment #52 to 
build a new DB4 RPM with NPTL disabled, and then install it, and see 
if it fixes your problem (it did for me). 

Comment 56 Radu Greab 2004-05-04 01:19:20 UTC
I had to build db4-4.2.52-3.1 for RH 7.3, 8.0 and 9 in order to
install subversion on some systems and I encountered the same problem.

The solution for me was to patch db4's configure script to add an
option, --disable-pthreadsmutexes, to be used when configuring the
non-NPTL version of db4.

The type of mutexes currently used on non-NPTL builds,
POSIX/pthreads/library/private, is not suitable because these
mutexes work only for applications that open the database in
DB_PRIVATE mode (only one process will access the database). The above
locktest.c works with the non-NPTL version if the DB_PRIVATE flag is
added to the open call. When --disable-pthreadsmutexes is used, db4 is
forced to ignore all types of POSIX threads mutexes and use another
type of mutex, x86/gcc-assembly on x86. x86/gcc-assembly mutexes are
used for example on RH 7.3 and RH 8.0.

So, with the patch to the configure script and the spec file,
POSIX/pthreads/library mutexes are used by the NPTL version of db4,
while x86/gcc-assembly mutexes are used by the non-NPTL version (other
gcc-assembly mutexes should be used on the other platforms).

The resulting db4 rpm passed the tests (locktest.c above and "svnadmin
create") on the following:
RH 9: PPro (distribution i686 kernel, glibc i686 -> NPTL system), AMD
K6 (distribution i586 kernel, glibc i386 -> non-NPTL system), Athlon
(distribution kernel i686, glibc i386 -> non-NPTL system)
RH 8.0 PIII (non-NPTL system)
RH 7.3 PII (non-NPTL system)

I did not completely rebuild the rpm on FC1 and FC2, only ran
rpmbuild -bc to check that the correct type of mutexes is selected:
FC2 test2: 
$ grep db_cv_mutex rpm/BUILD/db-*/dist/dist-*/config.cache
rpm/BUILD/db-4.2.52/dist/dist-notls/config.cache:db_cv_mutex=${db_cv_mutex=x86/gcc-assembly}
rpm/BUILD/db-4.2.52/dist/dist-tls/config.cache:db_cv_mutex=${db_cv_mutex=POSIX/pthreads/library}

FC1:
$ grep db_cv_mutex rpm/BUILD/db-*/dist/dist-*/config.cache
db_cv_mutex=${db_cv_mutex=x86/gcc-assembly}
rpm/BUILD/db-4.1.25/dist/dist-notls/config.cache:db_cv_mutex=${db_cv_mutex=x86/gcc-assembly}
rpm/BUILD/db-4.1.25/dist/dist-tls/config.cache:db_cv_mutex=${db_cv_mutex=POSIX/pthreads/library}

If the patches are accepted, could Red Hat push upstream the configure
patch, after it is eventually cleaned/improved? Thanks!
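
The config.cache check shown above can be wrapped in a small helper. This is a minimal sketch, assuming the cache line has the `db_cv_mutex=${db_cv_mutex=...}` form that appears in the grep output; the function name `mutex_type` is hypothetical and is not part of the attached patches.

```shell
# Print the mutex implementation recorded in a db4 config.cache file.
# Assumes a line of the form: db_cv_mutex=${db_cv_mutex=<type>}
mutex_type() {
    # \$ matches a literal dollar sign; the trailing $ anchors end of line.
    sed -n 's/^db_cv_mutex=\${db_cv_mutex=\(.*\)}$/\1/p' "$1"
}
```

Called on the dist-notls and dist-tls cache files of a build, it should print one mutex type per file, which makes it easy to confirm that the two builds really differ.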


Comment 57 Radu Greab 2004-05-04 01:21:26 UTC
Created attachment 99936 [details]
add --disable-pthreadsmutexes option

Comment 58 Radu Greab 2004-05-04 01:22:55 UTC
Created attachment 99937 [details]
rpm spec patch for db4-4.2.52-3.1

Comment 59 Radu Greab 2004-05-07 02:10:31 UTC
More testing revealed two problems with the resulting rpm when
gcc-assembly mutexes are used for the non-NPTL version of db4.

The cause of the problems is the way shared memory regions are
implemented and used by db4. The size and layout of these regions
depend on the mutex type. The shared regions are different between
the two versions of db4 because the size of the pthread mutexes is
different than the size of gcc-assembly mutexes. See
file:///usr/share/doc/db4-devel-4.2.52/ref/env/region.html
and db-4.2.52/dbinc/region.h for details.

When the regions are created in the per-process heap memory, there are
no problems because the memory is private to the process.

When the regions are created in the system memory without file backing
or system memory backed by the filesystem, we may encounter the first
problem. It is obvious that applications using different types of
mutexes should not access concurrently the same database
environment. But the same application may use different mutex
implementations (machine booted with NPTL kernel, then rebooted with
non-NPTL kernel), or several applications using different types of mutexes
may access the same database environment sequentially. When the mutex type
becomes different, the __db.### files used to back the shared memory
regions become invalid and may cause the application to stall or
crash.

Example with the locktest.c program, on an NPTL system:
$ rm .dbtest/*
$ ./locktest
$ ls -l .dbtest/
total 32
-rw-rw-r--    1 machbuild machbuild    16384 May  6 21:47 __db.001
-rw-rw-r--    1 machbuild machbuild   270336 May  6 21:47 __db.002
-rw-rw-r--    1 machbuild machbuild   450560 May  6 21:47 __db.003
$ LD_ASSUME_KERNEL=2.4.1 ./locktest
open: Resource temporarily unavailable

The region files can be removed by applications when they shut down,
with DB_ENV->remove, but probably few applications do this.

So when the mutex type changes, the user has to manually remove the
region files if the applications don't do this themselves.
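
The manual cleanup described above might look like the following. This is a hedged sketch and not part of the attached patches: `remove_region_files` is a hypothetical helper, it assumes the region files follow the `__db.NNN` naming shown in the listing, and it should only be run while no process has the environment open.

```shell
# Remove stale region backing files (__db.001, __db.002, ...) from a
# Berkeley DB environment directory, leaving logs and databases alone.
remove_region_files() {
    env_dir=$1
    for f in "$env_dir"/__db.[0-9][0-9][0-9]; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        rm -f -- "$f"
        echo "removed $f"
    done
}
```

For the test directory used in this thread the call would be `remove_region_files .dbtest`. Applications that can call DB_ENV->remove themselves do not need this.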


The second problem is smaller and easily worked around: db_stat, to
display environment statistics, makes use of the db4 internal data
structures describing regions and becomes dependent at compile time on
the mutex type used by the library. It seems that the other binaries
from db4-utils are not affected by this. So I made a wrapper script
around db_stat that detects if the system is NPTL or not and uses the
NPTL or non-NPTL db_stat.
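
The idea behind such a wrapper can be sketched as follows. This is not the attached script, whose contents are not reproduced here. It assumes glibc's `getconf GNU_LIBPTHREAD_VERSION` is available to report the thread library in use, and the two db_stat paths are hypothetical placeholders.

```shell
# Classify a thread-library version string as reported by
# `getconf GNU_LIBPTHREAD_VERSION`, e.g. "NPTL 0.60" or "linuxthreads-0.10".
thread_flavor() {
    case "$1" in
        NPTL*) echo nptl ;;
        *)     echo non-nptl ;;   # includes an empty or unknown string
    esac
}

# Pick the db_stat binary that matches the running thread library.
# Both paths are hypothetical; adjust them to wherever the two builds live.
pick_db_stat() {
    version=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)
    if [ "$(thread_flavor "$version")" = nptl ]; then
        echo /usr/lib/db4/tls/db_stat
    else
        echo /usr/lib/db4/notls/db_stat
    fi
}
```

A wrapper built on this would end with `exec "$(pick_db_stat)" "$@"`. Treating an unknown version string as non-NPTL is a design choice of this sketch.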


I will attach this wrapper script and an updated rpm spec file for
review. I also built rpms for RH 9.0, FC 1 and FC 2 test 3, but I have
no place to host them online. If someone has the place and interest to
host them, please contact me.


The long term solution to these problems seems to be an improvement to
db4 to do runtime detection of available mutex types on Linux: if a
private database environment is not requested, then it should try to
use pthread mutexes on NPTL systems and gcc-assembly mutexes on
non-NPTL systems. The regions should record the type of mutexes used,
become independent of mutex size and allow on-the-fly conversion of
regions and backing files from one mutex type to another (of course
only if no other application is using the database environment).


What is the opinion of Red Hat and others? Is this solution with
gcc-assembly mutexes for non-NPTL systems good enough? Is Red Hat
waiting for or working on a longer term fix as envisioned above, another type
of fix, or waiting for non-NPTL systems to become extinct?


PS: the updated rpm spec file patch has two minor fixes:
- cxx_common.h and cxx_except.h don't exist anymore
- the %files sections were including the %{_libdir}/libdb.so,
  %{_libdir}/libdb_cxx.so etc. links, but these links have not been
  created

I have read that Ulrich Drepper said that db4 should always be linked
with -lpthread, so I added it to the build function because, in the
case of gcc-assembly mutexes, db4 does not link with pthread by
default.


Comment 60 Radu Greab 2004-05-07 02:12:16 UTC
Created attachment 100059 [details]
wrapper script around db_stat

Comment 61 Radu Greab 2004-05-07 02:13:22 UTC
Created attachment 100060 [details]
updated rpm spec file for #59

Comment 62 Joe Orton 2004-05-30 20:59:05 UTC
*** Bug 124792 has been marked as a duplicate of this bug. ***

Comment 63 Chris Croome 2004-06-08 15:13:32 UTC
I have just upgraded a RedHat 9 box that has an AMD K6 300 processor to
Fedora 2. 

I was using the Subversion RPMS from David Summers
(http://summersoft.fay.ar.us/pub/subversion/latest/) and before doing
the upgrade I dumped all my repos.

After the upgrade I can't create new repos:

 # svnadmin create test
 svn: Berkeley DB error while creating environment for filesystem test/db:
 Invalid argument

I did try installing the RPMS from
http://www.tiki-lounge.com/~toshio/nptl/ however I get:

 # rpm -Uvh /home/chris/src/db42-*
 /etc/security/selinux/file_contexts: No such file or directory
 error: Failed dependencies:
        db4 = 4.2.52 is needed by (installed) pam-0.77-40
        db4 is needed by (installed) postfix-2.1.1-5.tls.fc2
        db4-devel is needed by (installed) apr-util-devel-0.9.5-0.1

So I'm going to re-read this thread and see if I can work out what to
do next... :-/


Comment 64 Aleksandar Milivojevic 2004-06-22 05:14:44 UTC
Radu,

Is there any reason for having different types of mutexes for NPTL and
non-NPTL systems?  Why not use gcc-assembly on both and ditch the
Posix ones completely as it was initially suggested?

BTW, I'm recompiling db4 RPMs on FC2 using your patches as I'm writing
this (kinda slow on 200MHz Pentium MMX).  Hopefully Cyrus IMAPD will
start working after this...

Comment 65 Radu Greab 2004-06-22 13:20:04 UTC
The POSIX mutexes are based on a standard while the gcc-assembly ones
are not. And I suppose that NPTL is both the present and the future.


Comment 66 Aleksandar Milivojevic 2004-06-24 03:59:06 UTC
Chris,

I've rebuilt db4 RPM's for Fedora Core 2 using (current) Radu's
patches.  It seems that everything is working fine with them (at least
on my Pentium MMX).  If you are interested, you can download them from:

   http://24.79.220.4/db4/

It's dynamically allocated IP that changes once or twice a year at
most.  I hope that Red Hat / Fedora folks will have an official fix before
it changes next time ;-)

Do something like "rpm -Uhv --replacepkgs --replacefiles" to install
those that you need (replacepkgs is needed since you already have the
same version installed, I'm not sure why replacefiles is needed).

Comment 67 Martin Gregory 2004-09-25 02:36:48 UTC
Can someone post some simple instructions about how to fix the problem?

I tried installing the db42 rpm posted above, but it was pretty clear
that it didn't replace the existing db4 libraries as far as svn was
concerned.

I tried various combinations of uninstalling db4 before installing
db42, but it didn't help.

I don't know what to do with an "rpm spec".

Help!  Thanks!

Comment 68 Aleksandar Milivojevic 2004-09-30 14:53:09 UTC
Martin, have you tried installing db4 packages from my web site?  See
comment #66 for details.  Those are the same as original db4 packages,
built using patches from Radu that disable NPTL on architectures that
do not support it.  Even the version number is the same (might be good
idea if it was different, but oh well).

Comment 69 David Rees 2004-10-13 16:12:01 UTC
I can verify that building new db4 packages with the db_stat_wrapper
and db-4.2.52-disable-pthreadsmutexes.patch allows me to build svn
repositories on a K6-2 machine now.

Comment 70 Matt England 2004-10-14 06:00:36 UTC
I could not get any of the above methods/processes to work on my 
system.  I instead used the following:

http://www.svnforum.org/forum/viewtopic.php?p=350#350

See more of the overview of this stuff at:

http://www.svnforum.org/forum/viewtopic.php?p=350

-Matt

Comment 71 Matt England 2004-10-14 13:21:14 UTC
[quote="mattengland"]http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/
db4-devel-4.0.14-14.i386.html
http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-4.0.14-14.i386.
html[/quote]

Can anyone verify (officially or not) that these rpms are legitimate?

So far I have seen no problems with my subversion system that uses them.

-Matt

Comment 72 Matt England 2004-10-14 13:23:05 UTC
Oops, I should have edited this last post a little more thoroughly before 
posting, here's an update:

From http://www.svnforum.org/forum/viewtopic.php?p=350#350 :

http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-devel-4.0.14-14.
i386.html
http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-4.0.14-14.i386.
html

Can anyone verify (officially or not) that these rpms are legitimate?

So far I have seen no problems with my subversion system that uses them.

-Matt

Comment 73 Matt England 2004-10-14 13:24:51 UTC
Darnit, the above links to the rpmfind website still hard-wrap.  Sorry, I 
know of no way around this for this comment-posting engine.  You'll 
have to hand-copy/cut-and-paste the above URLs (or visit them from the 
svnforum.org link).

Thanks for any help,
-Matt

Comment 74 Jeff Johnson 2004-11-14 03:58:11 UTC
OK, it's approaching time to close this bug because it is
attached to the FC4 tracking bug, and I believe it is better
to make the choice early in the release cycle rather than later.

The choice for RHEL and Fedora Core is going to be compiling
with --enable-posixmutexes. That is what has been done for Red
Hat distros since RHL 9, and so is likelier to provide a
consistent, stable and predictably performing platform.

So unless I hear compelling reasons otherwise, I'm going
to close this WONTFIX mid-next week.



Comment 75 Jonathan Kamens 2004-11-14 04:08:31 UTC
Jeff, can you please clarify exactly what will break if what you
propose is done.  This ticket is simply too long and has too many
conflicting details in it for me to be able to figure it out.

Will db4 compiled as you've proposed work with non-NPTL kernels?

Will db4 compiled as you've proposed work with non-Intel processors?


Comment 76 Jeff Johnson 2004-11-14 14:05:00 UTC
Nothing will break, at least for Red Hat packages. I am
basically just saying that db4 will continue to be built
as it has since RHL9, because that's the best answer imho.

The issue is that the "official" 2.4 kernel does not support
NPTL. So db4 with --enable-posixmutexes as built by Red Hat
will break if/when a kernel that does not support NPTL is booted.
The behavior of db4 within applications can also change in
surprising ways if/when LD_ASSUME_KERNEL is used inappropriately.

You cannot simply bottom-line the problem and simply ask "work?".
The issue with the reproducer has to do with a dbenv. The major
applications that use a dbenv are rpm (which has internal db4
so that it can be built with or without posix mutexes to "work")
and subversion, (which could internalize db4 similarly to rpm,
but chooses to use system libdb instead). Other applications,
like perl/python/ruby usually do not need dbenv locking.

Shared posix mutexes as used by db4 cannot work without NPTL
(which is a set of measure 0 for Red Hat built packages, db4
in AS2.1 is built differently).
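
For anyone unsure which thread library their glibc is actually providing,
a rough run-time check can be sketched as follows. This is not part of the
original report: it assumes a glibc system where
`getconf GNU_LIBPTHREAD_VERSION` works (Python exposes the same value
through `os.confstr`), and the function names are hypothetical.

```python
import os


def classify_thread_library(version_string):
    """Map a GNU_LIBPTHREAD_VERSION string to 'nptl', 'linuxthreads' or 'unknown'."""
    text = (version_string or "").strip().lower()
    if text.startswith("nptl"):
        return "nptl"
    if text.startswith("linuxthreads"):
        return "linuxthreads"
    return "unknown"


def current_thread_library():
    """Ask the C library which thread implementation this process is using."""
    try:
        return classify_thread_library(os.confstr("CS_GNU_LIBPTHREAD_VERSION"))
    except (ValueError, OSError, AttributeError):
        # The name is not known on this platform (for example a non-glibc
        # libc), or os.confstr is not available at all.
        return "unknown"


if __name__ == "__main__":
    print(current_thread_library())
```

A result other than "nptl" suggests that a db4 built with
--enable-posixmutexes is likely to hit the problem described in this bug.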

db4 compiled to use posix mutexes works fine on all platforms
that support NPTL. In fact, from my experience with rpm, and from
private mail from Sleepycat, and from the locking scheme rationale
written up within db4 sources, db4 works better with posix mutexes
than with other possible locking schemes, on all platforms.

So the issue is whether to compile for portability and lowest
common denominator to minimize risk, or to continue with the
known good (for platforms that support NPTL, that is all 2.6
kernels, and all Red Hat platforms except AS2.1 where db-4.0.14
is compiled differently) scheme used since RHL9.

No matter what, all Red Hat platforms since RHL9 have been
compiled with --enable-posixmutexes, and any change is very
unlikely to be backported and deployed everywhere. Changing
the locking scheme at this point is hardly feasible, there's
already a boatload of software that has db4 compiled with
--enable-posixmutexes.

So some decision has to be made, and I'm trying to make the
decision early in the fc4 devel cycle rather than later so
that discussion is possible.

And yes, this ticket has way too many issues to sort out,
the other reason to close. I encourage anyone who does have
specific problems with Red Hat packaging to open a separate ticket.




Comment 77 Radu Greab 2004-11-14 14:57:57 UTC
db4 not working completely on non-NPTL x86 machines is the single
issue in this ticket. Yes, db4 has been compiled with
--enable-posixmutexes since RHL9, causing subversion, cyrus-imapd and
maybe other applications to not work on non-NPTL x86 machines.

Now glibc from FC3 made NPTL available on i486 and i586 too, leaving
without NPTL support only i386 machines. If Red Hat says that non-NPTL
platforms are not supported anymore, then the ticket could be closed.
If not, a solution has to be found and implemented.

Jeff, I don't know if you read it, but in #56 I offered an alternative
to completely disable POSIX mutexes, as requested initially. The
alternative is to have one single db4 rpm containing db4 with
--enable-posixmutexes on NPTL platforms, and db4 with gcc-assembly
mutexes on non-NPTL platforms.
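
The selection logic behind that alternative could look roughly like the
sketch below. It is only an illustration: the function name is
hypothetical, --enable-posixmutexes is the option named in this bug, and
--with-mutex=x86/gcc-assembly is the option named in comment #93, not
something taken from the patch in #56.

```python
def db4_configure_args(has_nptl):
    """Return the mutex-related configure arguments for one db4 build."""
    if has_nptl:
        # The option Red Hat has used since RHL 9, according to this bug.
        return ["--enable-posixmutexes"]
    # Assumption: the gcc-assembly mutex option from comment #93 is the
    # right choice for an x86 host without NPTL.
    return ["--with-mutex=x86/gcc-assembly"]


if __name__ == "__main__":
    for has_nptl in (True, False):
        print(has_nptl, " ".join(db4_configure_args(has_nptl)))
```

A single rpm would then carry both builds and pick one at run time, which
is the part that adds packaging complexity.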

Comment 78 Jeff Johnson 2004-11-14 18:41:04 UTC
Yup, db4+posixmutexes not working on non-NPTL is the issue here.

And what I'm saying -- to make it perfectly clear -- is
that this bug will be closed WONTFIX because no Red Hat
platform is non-NPTL.

Disabling posix mutexes is no option either imho, as that introduces
yet another incompatible change variable into an already complicated
puzzle.

Adding Yet Another Build of db4 within the db4 package is possible,
but adds another level of complexity to an already complicated
problem.

There are two other possibilities that you do not mention
(but I suspect you know ;-):

a) internalizing db4 in important applications like svn and httpd
so that each application can choose whatever locking scheme suits
it best. This in fact is what Sleepycat has recommended for many years now.

b) building a separate package other than "db4" that compiles
db4 appropriately for non-NPTL applications. Much of the content
of this bug is coordinating a build of db4 that removes nptl,
well known.

I'll be happy to do b) and maintain outside of RHEL and FC if
you wish. It's far easier to do that than to try to achieve consensus
on how db4 should be built for all possible distros and kernels
and glibc and applications and ...

Comment 79 Aleksandar Milivojevic 2004-11-14 20:53:56 UTC
Jeff,

Looking at release notes for Core 3, they state that all Pentium
(i586) class processors are supported.  If that is the case, either
NPTL should be backported to all Pentium class processors (which would
automatically fix this bug), or applications and/or libraries should
be fixed in such a way so that they don't utilize NPTL on older Intel
processors.  In the latter case, IMHO, we should either have a db4 library
that detects this during runtime, or we should have two separate
packages (one for i586, and another one for i686).

Either that, or change release notes for Core 4 to state that only
Celeron/Pentium III (i686) class processors and newer are supported. 
I have a Fedora Core 2 machine that has (currently) officially
supported Intel Pentium MMX processor.  And I'm hit by this bug.

Comment 80 Jonathan Kamens 2004-11-14 21:33:44 UTC
Jeff, I would love to be using an official Red Hat kernel with NPTL
support so that this problem would just go away for me.  The problem
is that I can't use the 2.4.x Red Hat kernels either because they lock
up on me or because they don't support modules that I need (to be
frank, it was so long ago that I figured out I couldn't use them that
I don't remember why anymore), and I can't use the 2.6.x Red Hat
kernels because every time I boot my system with them, it locks up
hard, sometimes within seconds and sometimes within days but never
more than a few days after I boot.  I have a long-standing open bug
about this which Red Hat has been unable to do anything about.

In contrast, with recent stock 2.4.#-pac# kernels I can run for weeks
without any lockups (I do occasionally lock up because of an ide-scsi
or osst issue I haven't bothered to troubleshoot because it happens
rarely and because both the ide-scsi and osst code have changed
significantly in 2.6.x so it's not obvious that my troubleshooting
would be useful).

So by saying you're not going to make it possible to use Red Hat's db4
packages on non-NPTL machines, you're essentially saying that you will
no longer support my hardware, despite the fact that it is hardware
that certainly should be supported (SuperMicro S2DGU motherboard with
dual 550MHz Pentium III Katmai CPUs).

I'd be happy to stop making a fuss about this ticket if Red Hat could
just fix bug 126936 so I could actually use a current 2.6.x kernel.  I
am willing and able to provide any information requested of me, and to
perform any steps requested of me, to help debug that problem, but so
far it does not seem like anyone has devoted any effort to figuring
out what information should be requested or what steps I should be
asked to perform.


Comment 81 Aleksandar Milivojevic 2004-11-15 00:09:52 UTC
I've looked a bit more into my machine after writing my last comment,
and re-read some previous comments.

What I currently have is:
   Intel Pentium MMX 200MHz processor
   CPU flags (/proc/cpuinfo): fpu vme de pse tsc msr mce cx8 mmx
   glibc-2.3.3-27.1.i386.rpm
   kernel-2.6.8-1.521.i586.rpm

When using stock db4 packages, Cyrus doesn't work.  When using Radu's
patched packages, Cyrus works.

By reading previous comments, the problem is that above does not give
me NPTL enabled environment.  The question is, which component is
problematic here?  glibc?  kernel?  both?  Leonard wrote in his
comment that for NPTL, tsc and cmov must be present in CPU flags (I'm
missing cmov).  Why is this the case?  I always thought that support
is first made for the generic architecture, and then optimized for
greatest and latest in CPU world (otherwise, we end up with this kind
of problems that we have now).  Would it be too hard and/or time
consuming to implement it without using those two (and any other not
common on all i386 or all i586) in i386/i586 versions of RPMs?
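
To compare other machines against the flags listed above, something like
the following sketch could be used. The tsc/cmov requirement is only what
was reported earlier in this bug, not a statement about what glibc really
checks, and the function name is hypothetical.

```python
def missing_cpu_flags(cpuinfo_text, required=("tsc", "cmov")):
    """Return the required flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    # No 'flags' line found: treat every required flag as missing.
    return list(required)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(missing_cpu_flags(handle.read()))
```

For the Pentium MMX flags quoted above, this reports cmov as the only
missing flag.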

Radu wrote that glibc from Fedora Core 3 has NPTL support for i586
(and currently unsupported i486, but not on generic i386).  Does this
mean that if I upgrade my current install to Core 3, the problem would
go away?  Would backport of this support to Core 2 solve the problem?
 Would we need backport of support in kernel (if it is needed)?  If
fixing glibc alone would work, and if backport wouldn't take too much
time, could this be done?

If the problem is fixed in Core 3, yeah sure, mark this bug as fixed,
closed, wontfix, whatever.  However, if it isn't, then it should
really be solved first.  I'll be upgrading my MMX machine to FC3, so
I'll soon find out.  Probably not next week, so I guess I'll miss
Jeff's deadline.

Yeah, I know.  Lot of questions (if anybody has time to answer them),
couple of personal opinions (if anybody cares to read them), and no
solutions.

Comment 82 Toshio Kuratomi 2004-11-15 01:49:46 UTC
Re Comment #75:
  Will db4 compiled as you've proposed work with non-NPTL kernels?
No.  However, this means vanilla 2.4 kernels only.  RH 2.4 and all 2.6
kernels have NPTL.  Since FC2+ is 2.6 based and FC1 is already EOL,
the kernel part of the equation seems irrelevant for discussions of
what to do about the bug in FC4.

  Will db4 compiled as you've proposed work with non-Intel processors?
Yes.  NPTL will work on any machine with an i486 or greater
instruction set.  The reason previous versions have not worked on
anything less than an i686 is that there was no i486+ glibc package. 
Only a i386 and a i686.  So i386, i486, and i586 machines were all
lumped together with a non-NPTL glibc.  In FC3, the "i386" glibc
package is compiled with i486 instructions so that NPTL can be
enabled.  This means FC3 is fixed for everything except i386 (which is
not supported according to the release notes and was discussed at
length on fedora-devel.)
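
The situation described in this answer can be summarized as a small
lookup. This is a sketch only: the table restates what is said in this
comment (FC2 glibc has NPTL on i686 only, FC3 glibc on i486 and up) and
was not derived from the glibc spec files; all names are hypothetical.

```python
# Ordered from oldest to newest x86 instruction-set level.
X86_LEVELS = ["i386", "i486", "i586", "i686"]

# Lowest CPU level whose glibc package has NPTL, as described in this
# comment. These values are an assumption based on the thread.
MIN_NPTL_LEVEL = {"FC2": "i686", "FC3": "i486"}


def glibc_has_nptl(release, cpu_level):
    """Return True if the release's glibc offers NPTL on this CPU level."""
    minimum = MIN_NPTL_LEVEL[release]
    return X86_LEVELS.index(cpu_level) >= X86_LEVELS.index(minimum)


if __name__ == "__main__":
    for release in ("FC2", "FC3"):
        for level in X86_LEVELS:
            print(release, level, glibc_has_nptl(release, level))
```

This matches the reports above: an i586-class machine fails on FC2 and
should work on FC3, while a true i386 is left out in both.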

Re Comment #80
I think fixing bug 126936 is the way to go.  FC1 has been EOL'd which
means there isn't any supported Fedora Core that shipped with a 2.4
kernel (let alone a non-NPTL 2.4 kernel).  Kernel 2.6 is the way
forward and fixing bugs there is more useful for the development of
the distribution than disabling posix-mutexes in FC4 where running a
2.4 kernel is enough of a hack that it almost qualifies as a separate
distro.  If you have to run a 2.4 kernel from outside the distro until
2.6 runs on your hardware, you can run a non-distro db4 package with
disabled NPTL as well.

Re Comment #81
I just installed FC3 glibc and dependents (glibc*i386.rpm libselinux*
nscd* nptl-devel*... 11 packages) on my FC2 AMD-K6 (Same cpuflags as
your P-MMX).  It resolves this bug.  (Hoping nothing else bites me
until I do a full upgrade :-)

Comment 83 Aleksandar Milivojevic 2004-11-15 02:28:14 UTC
Toshio, thanks for your answer.  I guess solution from FC3 would be
good for almost everybody.  Might be good idea to put big flashy
warning in glibc spec file not to change compilation flags until NPTL
is backported all the way back to i386.

BTW, if glibc in FC3 is compiled with i486 instruction set, wouldn't
it be more correct (consistent) if glibc package names are *.i486.rpm?
 That glibc package is not going to work correctly on an i386,
shouldn't that be reflected in packages' architecture?

I'm not subscribed to fedora-devel.  Does the decision to compile glibc
with i486 instructions mean that all other packages in the distribution
will soon follow (distro is rather unusable on i386, if glibc requires
at least i486 for distro to function properly)?

Comment 84 Jeff Johnson 2004-11-15 03:18:29 UTC
Re: #79 From Aleksandar Milivojevic (alex)

The term "supported" by Red Hat has little to do with
whether db4 continues to be compiled with --enable-posixmutexes.

I am only saying that this bug is gonna be closed WONTFIX early
in the FC4 release cycle so that there is sufficient time for
discussion. There are too many issues here to resolve the bug,
and bugzilla is not the forum for discussing whether I -- as a
Red Hat employee -- am telling you that Red Hat no longer supports
your hardware. It's a job mon, I fix bugs, and I attempt to supply
RFE's. In the case of db4 supporting NPTL, all I can say is that
-- with both RHEL3 already, and RHEL4 almost, and all of FC{1,2,3},
already deployed with db4 compiled with --enable-posixmutexes -- that
reverting to another form of behavior makes little sense to me.
And attempting both +/- NPTL in the same package adds a level of
complexity that makes it difficult (if not impossible) to meet user
expectations, which are more or less
    I want to add -ldb to my build and have my application use
    Berkeley DB everywhere and "work" always.
I would *love* to be able to tell you that there is a solution
that does that. There isn't.

But I'll see if I can't buy your vote by getting #126936 expedited ;-)



Comment 85 Jeff Johnson 2004-11-15 03:29:23 UTC
Re: #81 From Aleksandar Milivojevic (alex)

Yes, a quite complicated mixture, it's entirely unclear whether
to blame the kernel, glibc, db4, me or Red Hat, isn't it? ;-)

If I were you, I would embed db4 into Cyrus, compile to the
least common denominator, i.e. non-NPTL and so without
--enable-posixmutexes. That would clearly make you happy.

That is the underlying issue in choosing how system-wide
db4 should be compiled. Is the glass half-full or half-empty?
Everyone has a different glass. I do believe that the most
featureful choice (i.e. with NPTL) is (and was) the best decision
for compiling db4.

Backporting NPTL all the way back to i386 is possible iirc, but
the result is pig slow and painful to use. That's what I believe,
anyways, there are better forums to get more accurate answers than
here in db4 bugzilla ;-)

Comment 86 Aleksandar Milivojevic 2004-11-15 14:45:17 UTC
Jeff, first thanks for your answers.  I'll solve my problem (db4/Cyrus
on Pentium MMX) by upgrading to FC3.  Until then, I'll simply use
Radu's patched db4 packages on FC2 (which will keep my glass half full).

As for NPTL on i386.  I'm no expert on i386 assembly (my assembly
hacking stopped somewhere around i8086 and MC68000 era), however I
don't see why it wouldn't be possible to make efficient implementation
for i386.  Anyhow, if NPTL is needed for fully working (Red
Hat/Fedora) system, even pig slow is better than not working.  The
only question I see is, is it worth the effort to implement?  Is i486
glibc (glibc-*.i386.rpm in FC3 is really i486 glibc, right? and it is
going to stay that way? is glibc for FC2 going to be updated in this
way?) what will make everybody happy, or is there anybody who really
needs "pure" i386 glibc with NPTL support?

Anyhow, if we have i586 and i686 kernels only, maybe it would make
sense to have i586 and i686 glibc only (or even pure i586 system).

Comment 87 Jeff Johnson 2004-11-16 13:21:52 UTC
The issue involves more than assembly language. The kernel
provides futexes, which are a small (like 4 byte) piece
of memory that is shared between processes for locking.
futexes are very very lightweight and fast, unlike
alternative means to implement inter-process locks.

Shared posix mutexes unify inter-process locks and inter-thread
locks, permitting one "standard" locking scheme for Berkeley DB
for both inter-process and inter-thread locks.

There's nothing preventing glibc/i486 with NPTL from being
released as an update for FC2 in principle. Meanwhile, I'm
pretty sure (but have no i486 and so cannot check) that the FC3
glibc can probably be installed on FC2. As always, glibc upgrades
should be approached carefully.




Comment 88 Aleksandar Milivojevic 2004-11-16 21:27:37 UTC
Jeff, thanks for taking time to clarify why there's no i386 NPTL glibc.

Anyhow, one thing keeps bugging me: if releasing i486 glibc with NPTL
for FC2 would fix this bug, why not do that and close it as
fixed/resolved/whatever?  Seems relatively simple and straightforward
to me.  Or am I too naive (wouldn't be the first time ;-) )?

Comment 89 Jeff Johnson 2004-11-16 21:50:56 UTC
Not too naive, NPTL is a quite complicated deploy ;-)

All of glibc and kernel and application and run-time environment (i.e.
don't use LD_ASSUME_KERNEL) and internal featureset (i.e. "i386"
no longer means ix86) are pieces of the puzzle.

Now that the i486 kernel/glibc packages support NPTL in FC3, there
is an acceptable solution for almost everyone, with a few
gotchas like
   a) still no NPTL solution for *exactly* an i386 yet.
   b) still some problems that block deploy of NPTL on certain
   HW (that's my read of #126936).

But I trust that the remaining problems will be dealt with
to everyone's satisfaction.

No matter what, db4 just uses a wonderful technology -- NPTL --
not anything else, so this ain't really the best bug to suggest
alternative non-NPTL solutions for, say, kernel problems.

Nor do I think that db4 should be subject to the lowest common
denominator, i.e. built without shared posix mutexes, because
shared posix mutexes unify thread and process locks, and I suspect
that is going to be needed more and more by, say, java. So far,
db4 has been built to prefer inter-process locks over inter-thread
locks, but java is going to change the application mix imho.

So WONTFIX in a couple more days, and early, rather than late,
in the FC4 release cycle so that discussion is possible is my
goal here.


Comment 90 Jeff Johnson 2004-11-17 13:10:48 UTC
As warned, WONTFIX closure. If that is not satisfactory,
then by all means, discuss on fedora-devel.

Comment 91 taj 2005-02-23 22:00:23 UTC
This bug has regressed sometime in February 2005.  rpm is showing the
same problem

x86_64
2.4.21-4.ELsmp
glibc-2.3.4-10
rpm-4.4.1-2
db4-4.3.27-1

(yes a strange combination)

I'll just leave the bug closed as WONTFIX but thought I'd drop a note
as things did work up until this last userspace look at rawhide.

Backing up to 4.3.3-8 returned functionality.



Comment 92 Naadir Jeewa 2005-04-10 16:16:39 UTC
The NPTL problem is going to affect more than just pure i386 systems, i.e. Xen
and UML.

Comment 93 Pedro Lamarão 2005-07-10 20:14:49 UTC
With the attached patch, Fedora Core 3 works normally under a UML kernel.
It was tested in a Linode: http://www.linode.com/

The diff is much smaller than the one already available in this bug, and has
three major modifications:

- it empties the three variables set in the top of the spec file; I couldn't
build it with java_arches so it was emptied too;
- it adds a line to mkdir the top-level lib/; this specfile does NOT build in
non-NPTL arch;
- it adds the configure option --with-mutex=x86/gcc-assembly to explicitly not
use NPTL mutexes.

I suspect this spec could/should be tweaked to be able to build Java, and
perhaps to choose a better alternative mutex implementation than x86/gcc-assembly.

Comment 94 Pedro Lamarão 2005-07-10 20:18:28 UTC
Created attachment 116576
Patch against FC3 db4.spec

This patch adds the configure option --with-mutex=x86/gcc-assembly

Comment 95 Aleksandar Milivojevic 2005-07-14 00:05:13 UTC
Hmmm...  But this is not needed on FC3.  Both glibc and kernel in FC3 have NPTL
support.

BTW, has anybody attempted backporting glibc fix from FC3 to FC2?

Comment 96 Michael Smith 2006-09-11 17:34:38 UTC
It seems this bug is still coming up for some. I experienced it with CentOS 4
(2.6.16) under Xen. The above patch again fixed it though. I have working RPM
for db4-4.2.52-7.1 (current CentOS) if anyone needs it.

