Description of problem: db4 should never be compiled with '--enable-posixmutexes' because it causes too many problems that cannot be solved. The current (posix-enabled) db4 does not work with non-NPTL kernels, so packages like subversion cannot operate on systems with vanilla kernels or with RHL kernels for i[345]86 architectures.
Version-Release number of selected component (if applicable): db4-4.1.25-1
4.1.25-12 has this
Fixed in current builds, which have both NPTL and non-NPTL db.
I still get:
svnadmin create /tmp/x1
svn: Berkeley DB error
svn: Berkeley DB error while creating environment for filesystem /tmp/x1/db:
Invalid argument
--------
db4-4.1.25-14
subversion-0.32.1-1
glibc-2.3.2-101
Created attachment 95622: strace dump
I forgot to mention: vanilla 2.4.22 kernel on an i686.
Created attachment 95736: simple testcase
$ gcc locktest.c -l db -o locktest
$ mkdir .dbtest
$ rm -f .dbtest/* ; ./locktest
open: Function not implemented
$ rm -f .dbtest/* ; LD_ASSUME_KERNEL=2.2.5 ./locktest
open: Invalid argument
*** Bug 109922 has been marked as a duplicate of this bug. ***
Bug 109922 is from another reporter running an i586 kernel, and hence using /lib/libdb-4.1.so. Nalin, I thought this was supposed to be Really Fixed Now Once And For All?
Created attachment 95993: my spec with which it works
I've got it working. I edited the spec file as in the attachment (I added --disable-posixmutexes and maybe some other changes; I do not remember, I played with it too much :). Also, I renamed /lib/tls to /lib/tls.zal (zal as in zaloha, which means backup) and /usr/lib/tls to /usr/lib/tls.zal.
It works!! (2.4.22 kernel patched with grsec etc., no NPTL)
Here are the locktest.c results with db-4.2.52, including both the NPTL and non-NPTL libraries and the kernel ABI note:
$ cc -o t -I/usr/include/db4 -ldb-4.2 t.c
$ mkdir .dbtest
$ ./t
$ rm .dbtest/*
$ LD_ASSUME_KERNEL=2.2.5 ./t
open: Invalid argument
*** Bug 86381 has been marked as a duplicate of this bug. ***
Note that this is with a 2.6.0 kernel, though...
*** Bug 112159 has been marked as a duplicate of this bug. ***
This seems to be a significant bug, given how often duplicates appear, doesn't it? So what about fixing it? I posted a working .spec, so where's the problem?
Re comment #9: same with db4-4.2.52-1:
$ rm -rf .dbtest; mkdir .dbtest; ./a.out
open: Function not implemented
$ rm -rf .dbtest; mkdir .dbtest; LD_ASSUME_KERNEL=2.2.5 ./a.out
open: Invalid argument
$ uname -sr
Linux 2.4.23ensc-2
I've successfully compiled db4 using Tomas's spec file, and it creates subversion repositories just fine now. :)
My system:
Linux h24-69-77-88 2.4.22-1.2115.nptl #1 Wed Oct 29 15:20:17 EST 2003 i686 i686 i386 GNU/Linux
running vanilla (stable updates only) Fedora Core 1.
I'm disappointed this hasn't gotten more attention. I think there is a non-Intel processor problem here. Cyrus won't run on non-Intel processors with the supplied NPTL-enabled db4 (tested on a VIA C3 and an Athlon XP), with either stock Fedora kernels or 2.4.{22,23,24}. It works fine with db4 compiled without NPTL threads. On an Intel P4 laptop (Dell 5150), cyrus ran out of the box just fine.
An Athlon 3000+ system originally (months ago) installed and up2dated just fine. I wiped it clean and started fresh, and after a bunch of updates, up2date now hangs at a variety of points. Installing the non-NPTL db4 made everything happy.
I whittled down the spec changes required to get a no_nptl version of db4. Will attach it.
Created attachment 96883: patch to db4-4.1.25-14 db4.spec to disable NPTL threads (and the resulting hangs :-)
Thanks for that patch. It might come in handy when I want to get rpm running on my Cyrix P166 (bug #103078).
You might want to chop the whole fourth hunk from the patch, as it only adds unnecessary whitespace.
Would someone be so kind as to post some RPMs built with the new spec?
*** Bug 112673 has been marked as a duplicate of this bug. ***
I got hit by this bug too: Fedora Core with db4-4.2.52-1 and a vanilla 2.4.24 kernel. I tried multiple versions of subversion. Solved on one machine by upgrading to kernel 2.6.1; unsolved on the other (because 2.6.1 is still broken, see http://bugme.osdl.org/show_bug.cgi?id=1855).
RPMs here: http://tomi.nomi.cz/download/db4-no-nptl/
I have installed db4 from the RPMs that Tomas Janousek posted, and I am getting:
[root@kbreit kbreit]# svnadmin recover /web/svn/
Acquiring exclusive lock on repository db.
Recovery is running, please stand by...
Recovery completed.
svn: Berkeley DB error
svn: Berkeley DB error while opening 'uuids' table for filesystem /web/svn/db:
Invalid argument
This seems to be the same problem. Is it?
Hmm :( But those RPMs work for me (2.4.24 plus some patches).
It seems there was a corrupt db file. I recreated the repository and it works. I take back my earlier comment.
@Thomas Janousek: These are RPMs of db4-4.1.25, but Fedora currently ships with and depends on db4-4.2.52.
To Thomas Zehetbauer: are you sure about that? After yum update, rpm -q db4 shows 4.2.25. Am I making a mistake?
Does 4.2.25 have the same posix-mutex/NPTL support issues that we've been discussing here?
Thomas Janousek: No, rpm -q db4 shows db4-4.2.52-1.
mrproper: It seems that the current db4-4.2.52-1 build has the same NPTL issue. strace shows that set_thread_area(...) fails, and svnadmin says "svn: Berkeley DB error while creating environment for filesystem svn-test/db: Invalid argument". The problem goes away with kernel 2.6.
To Thomas Zehetbauer: I'm using Fedora Core 1 with db4-4.1.25. Understood?
Does comment #30 mean that db4 will work correctly on i[345]86 machines with the latest development db4 _and_ kernel 2.6, or that one problem has been solved but others remain (as in Thomas Zehetbauer's original comment #22)?
Okay, so here are some dumb questions that may be useful or may be barking up the wrong tree:
1) Is the purpose of /lib vs. /lib/tls to provide libraries for linking on non-NPTL and NPTL-enabled kernel revisions/architectures?
2) Would simply adding --disable-posixmutexes when building the dist/non-tls version of db4, while leaving posixmutexes enabled for the dist/tls version, make things work for everyone?
Related question:
1) Does kernel NPTL really not work on i586 and lower architectures? Someone reported success with the svnadmin create command after recompiling _glibc_ to include NPTL on his i586. Has he introduced bugs into his glibc thread implementation, or is NPTL support available in the kernel for more architectures and just needs to be enabled at the glibc level?
*** Bug 115306 has been marked as a duplicate of this bug. ***
I have just come across this bug too, using the Perl CPAN module BerkeleyDB. The module worked for me on one machine but not on another. Both are identical FC1 machines using db4-4.1.25-14 (or, more precisely, db4-devel-4.1.25-14, as that is what the CPAN module linked against), and both had all available updates installed at the time of this post. The only difference I can see is that one has a Genuine Intel processor and the other a VIA C3 processor. I'm not sure how this can be relevant, but it is the only difference I can see.
Looking at the "ldd -v" outputs, I can see that they differ (supplied as attachments). That is about the limit of my fault-finding ability for this sort of problem. If I can do any other testing to help, please let me know.
I installed Tomas Janousek's RPMs from the link supplied and my BerkeleyDB module sprang into life, so (for me) his RPMs fix the problem. Hope this info helps...
Created attachment 98423: output from 'ldd -v /lib/libdb-4.1.so' on the machine where BerkeleyDB fails
Created attachment 98424: output from 'ldd -v /lib/libdb-4.1.so' on the machine where BerkeleyDB works
Andrew, the CPU architecture is most probably relevant, because of the supported instructions and registers (also see bug 103078). Run
$ cat /proc/cpuinfo
and look for "tsc" and "cmov". Is either of them missing? If so, disable all occurrences of --enable-posixmutexes (see comment #0).
Leonard, my RPMs do exactly that: they disable posixmutexes.
Leonard, you are correct about the missing flags:
Intel CPU flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
VIA CPU flags: fpu de tsc msr cx8 mtrr pge mmx 3dnow
The cmov flag is missing from the VIA CPU flag set. I guess my question is whether there is any advantage in having posix-mutexes enabled in the default package, given the problems people will get when trying to use programs like Subversion on non-Intel CPUs.
FYI, I had the 2.6.0 kernel installed on the VIA box before I reported this bug, and that was the kernel version under which I first saw it. As part of tracing the bug I downgraded to the 2.4.22-1.2129.nptl kernel. So, if it is of any use, I can say that the "official" db4 packages did not work on my VIA machine under either the 2.6 or the 2.4 NPTL kernel.
Since I have access to both types of CPU and can install 2.4, 2.4 NPTL, and 2.6 kernels, is there any testing you would like me to do to see if we can get one package that works with all 3 kernels and on both CPU types?
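A minimal C sketch of the cpuinfo check suggested above, assuming a Linux /proc/cpuinfo whose "flags" line lists features as space-separated tokens; the helper names has_cpu_flag and cpuinfo_has_flag are hypothetical. It only reports whether a flag such as cmov is present. That a missing cmov leads to the i386 glibc, and therefore to a non-NPTL system, is the hypothesis discussed in this thread, not something the code verifies.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` occurs in `flags` as a whole whitespace-separated token,
 * so that "pse" does not match "pse36". */
static bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    if (n == 0)
        return false;
    for (const char *p = flags; (p = strstr(p, flag)) != NULL; p++) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char after = p[n];
        bool ends = after == '\0' || after == ' ' || after == '\t' || after == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}

/* Returns 1 or 0 depending on whether the first "flags" line of
 * /proc/cpuinfo lists `flag`, or -1 if no such line could be read
 * (not Linux, or a cpuinfo format without a "flags" line). */
static int cpuinfo_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    char line[8192];
    int result = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon != NULL)
                result = has_cpu_flag(colon + 1, flag) ? 1 : 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Fed the two flag lists quoted in the comment above, has_cpu_flag finds cmov on the Intel machine and not on the VIA C3, while tsc is present on both.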
After some googling, bugzilla searching, and package testing, I think glibc is a partial culprit in this mess. It seems that i486 and later processors should be capable of handling NPTL, but i486 and i586 machines (and some non-Intel processors) get the i386 glibc rpm instead of the i686 rpm. Therefore they get a glibc that can't handle NPTL even though their processors are capable of it.
Here's what I did to test: I downloaded the 101.4 glibc SRPM, edited a few macros at the top of the spec to target the i486 for NPTL and TLS, and rebuilt the rpm with
rpmbuild -ba --target=i486 glibc.spec
I installed the rpms on my K6-MMX (with tsc but without cmov) and voila! svnadmin create works. I don't have an i486 around to test on.
Could other people try these out and see if they work on their processors? (But be careful to have some statically linked recovery methods worked out: if the glibc doesn't work for you, there could be some pretty nasty recovery situations.) (And be sure to have the standard db4, not the NPTL-disabled version, otherwise the test won't prove much.) RPM location: http://www.tiki-lounge.com/~toshio/nptl/
As rpm no longer runs on i386 (Bug #103078), I'd like to see glibc compiled for i486 become the lowest-common-denominator target instead of i386. (I ran across references to Debian and SuSE doing this in my googling, but haven't checked their package archives to see exactly what this means.) This way we would have a glibc that can perform these functions across all platforms (correct me if I'm overly optimistic).
It's a bit too hard for me to generate new glibc packages every time I want to upgrade, as it's a tremendous time and disk hog (~2GB of HD space), and I don't understand what all the *arches macros in the spec file mean. I would be much relieved if someone who understood glibc better were to take care of targeting the proper CPUs in the proper macros.
So whaddya say? Retarget this bug to glibc and make the next release with another binary architecture?
Created attachment 98829: glibc.spec diff against 101.4
The complete spec is on the website mentioned above. I have not tested the K6 additions yet, only the i486 ones.
I can verify this bug on Fedora 1 (testing) on my laptop, a Transmeta-based Fujitsu P-2110. I have the i686 glibc, and DB 4.1.25-14 gives me the DB_PRIVATE error. I tried Tomas's RPMs, and also modifying the SPEC myself to just add --disable-posixmutexes, but that didn't work for me.
I found success by building DB 4.2.52 from the Sleepycat source tarball, installing it in /usr/local/db-4.2, and pointing my application at that. I didn't even pass a posixmutexes argument to configure, but let it sort that out. The only hitch was that db_dump185 wouldn't build.
Unless 4.2 has some magic fix for this problem, I suspect that the RPM wizardry in the db-4.1.25 spec file is bogus for some configurations. My application, by the way, is a Java app that uses DB's Java API. I haven't even attempted to point, say, rpm at the new DB install.
This is a very big problem. Cyrus imap won't work with the original db4 rpms. I tried the proposed db4.1 and it worked fine with cyrus-imap, but now I cannot compile sendmail-8.11.7: I get "incompatible type for argument 4 of indirect function call" when trying to compile it. I could downgrade to the 2.4.20 kernel, but that kernel doesn't recognize my southbridge (VIA 8237), and my hard drive won't work with DMA. I'm so disappointed. In 5 years of using Linux I have never seen something like this.
Regarding comment #42 and comment #43, I don't think rebuilding glibc patched to provide TLS support for pre-i686 CPUs is a (fully) proper solution. I did that (on an RH 9 K6 box), and everything *seemed* fine for a while, at least until I tried to use mozilla, which now crashes silently.
Rex: How does mozilla crash? Do you have any other debugging info? I'm using Mozilla right now without any crash so far, but this is the first time since I rebuilt libc. I'll do my web browsing from there for a while to see what happens, but if you could tell me how long I should run it to make it crash, I'd appreciate it. The %changelog lists several fixes to glibc's NPTL since RH 9, but I don't know whether one of those fixed this.
Comment #43: What is the DB_PRIVATE error?
> How does mozilla crash?
mozilla (1.6) dies silently on startup.
$ LD_ASSUME_KERNEL=2.4.18 mozilla
works.
$ mozilla
doesn't.
I've got a fresh download of mozilla-1.6 from mozilla.org and my recompiled glibc-2.3.2-101.4 on FC1 and there's no crash on startup. Maybe one of the fixes between rh9 glibc and 101.4 fixed it?
Recompiling glibc is not an option for me since NPTL is a kernel issue too, and I have to use a vanilla kernel.
There should be no need to argue why you need a vanilla kernel: any Linux distro _MUST_ allow the use of _AT LEAST_ a vanilla kernel!
What about kernel 2.6? Will the NPTL in vanilla 2.6, coupled with a recompiled glibc, work (for FC2)? If so, a glibc targeted for i486 would fix things for FC2. I see your point that NPTL on FC1 is problematic, though. I agree that a db4 update would be appropriate (or for Fedora Alternatives to materialize).
RPMs of non-NPTL db 4.2.52 at http://tomi.nomi.cz/download/db4-no-nptl/ (built from the SRPM obtained for subversion 1.0.1, thus for Fedora Core 1)
I used Tomi's latest RPMs (comment #52), and now I can create repositories without problems (using vanilla 2.6.3). Before that I always got "invalid argument", even with a db42 I built myself without NPTL support by removing everything from %{nptl_arches} and %{nptl_java_arches}. Thanks Tomi!
Red Hat, PLEASE STOP MESSING ABOUT WITH NPTL! It's not safe and not kind!
Cheers, Chris.
I have this problem with FC 1.92 and cyrus-imapd:
May 3 12:01:01 edwards sieve[20890]: DBERROR db4: Berkeley DB library configured to support only DB_PRIVATE environments
May 3 12:01:01 edwards sieve[20890]: DBERROR: dbenv->open '/var/lib/imap/db' failed: Invalid argument
May 3 12:01:01 edwards sieve[20890]: DBERROR: init() on berkeley
The machine is a Compaq Deskpro 4000 with an Intel Pentium 233MMX (P55C? F00F bug). CPU flags are:
flags : fpu vme de pse tsc msr mce cx8 mmx
No CMOV flag is listed. It has an i386 glibc, as expected (there are no i586 builds):
# rpm -q --qf '%{ARCH}\n' glibc-common
i386
Kernel: 2.6.3-1.96
Exporting LD_ASSUME_KERNEL=2.4.18 in /etc/sysconfig/cyrus-imapd does not fix the problem, unfortunately.
Is there any hope this will be fixed soon? I also have K6 machines that use i386 glibcs. And where exactly is the problem: in the i386-compiled glibc, in the kernel, or both? What needs to be recompiled to work around it?
Paul, please try following Tomi's instructions in Comment #52 to build a new DB4 RPM with NPTL disabled, and then install it, and see if it fixes your problem (it did for me).
I had to build db4-4.2.52-3.1 for RH 7.3, 8.0 and 9 in order to install subversion on some systems, and I encountered the same problem. The solution for me was to patch db4's configure script to add an option, --disable-pthreadsmutexes, to be used when configuring the non-NPTL version of db4.
The type of mutex actually used on non-NPTL builds, POSIX/pthreads/library/private, is not good, because it works only for applications that open the database in DB_PRIVATE mode (only one process will access the database). The locktest.c above works with the non-NPTL version if the DB_PRIVATE flag is added to the open call.
When --disable-pthreadsmutexes is used, db4 is forced to ignore all types of POSIX threads mutexes and use another type of mutex: x86/gcc-assembly on x86. x86/gcc-assembly mutexes are used, for example, on RH 7.3 and RH 8.0.
So, with the patches to the configure script and the spec file, POSIX/pthreads/library mutexes are used by the NPTL version of db4, while x86/gcc-assembly mutexes are used by the non-NPTL version (the corresponding gcc-assembly mutexes should be used on other platforms).
The resulting db4 rpm passed the tests (locktest.c above and "svnadmin create") on the following:
RH 9: PPro (distribution i686 kernel, glibc i686 -> NPTL system), AMD K6 (distribution i586 kernel, glibc i386 -> non-NPTL system), Athlon (distribution i686 kernel, glibc i386 -> non-NPTL system)
RH 8.0: PIII (non-NPTL system)
RH 7.3: PII (non-NPTL system)
I did not completely rebuild the rpm on FC1 and FC2; I only ran rpmbuild -bc to check that the correct type of mutex is selected:
FC2 test2:
$ grep db_cv_mutex rpm/BUILD/db-*/dist/dist-*/config.cache
rpm/BUILD/db-4.2.52/dist/dist-notls/config.cache:db_cv_mutex=${db_cv_mutex=x86/gcc-assembly}
rpm/BUILD/db-4.2.52/dist/dist-tls/config.cache:db_cv_mutex=${db_cv_mutex=POSIX/pthreads/library}
FC1:
$ grep db_cv_mutex rpm/BUILD/db-*/dist/dist-*/config.cache
db_cv_mutex=${db_cv_mutex=x86/gcc-assembly}
rpm/BUILD/db-4.1.25/dist/dist-notls/config.cache:db_cv_mutex=${db_cv_mutex=x86/gcc-assembly}
rpm/BUILD/db-4.1.25/dist/dist-tls/config.cache:db_cv_mutex=${db_cv_mutex=POSIX/pthreads/library}
If the patches are accepted, could Red Hat push the configure patch upstream, after it has been cleaned up/improved? Thanks!
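The grep check above can also be done programmatically, for instance in a build sanity test. A minimal C sketch, assuming each line has the shape db_cv_mutex=${db_cv_mutex=VALUE}, optionally prefixed by grep's "file:" label; parse_db_cv_mutex and is_pthread_mutex_type are hypothetical helpers, not part of the db4 build.

```c
#include <stdbool.h>
#include <string.h>

/* Extract VALUE from a config.cache line containing
 *   db_cv_mutex=${db_cv_mutex=VALUE}
 * (any prefix, such as grep's "path:", is skipped). Copies VALUE into
 * `out` and returns 0, or returns -1 if the line has another shape or
 * `out` is too small. */
static int parse_db_cv_mutex(const char *line, char *out, size_t outlen)
{
    static const char prefix[] = "db_cv_mutex=${db_cv_mutex=";
    const char *p = strstr(line, prefix);
    if (p == NULL)
        return -1;
    p += sizeof prefix - 1;          /* skip the prefix, not its NUL */
    const char *end = strchr(p, '}');
    if (end == NULL)
        return -1;
    size_t n = (size_t)(end - p);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* True for the pthread-based types (e.g. "POSIX/pthreads/library"),
 * which, per the comment above, belong only in the dist-tls build. */
static bool is_pthread_mutex_type(const char *type)
{
    return strncmp(type, "POSIX/pthreads", 14) == 0;
}
```

Applied to the dist-notls line from the FC2 output, this yields "x86/gcc-assembly", which is not a pthread type; the dist-tls line yields "POSIX/pthreads/library".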
Created attachment 99936: add --disable-pthreadsmutexes option
Created attachment 99937: rpm spec patch for db4-4.2.52-3.1
More testing revealed two problems with the resulting rpm when gcc-assembly mutexes are used for the non-NPTL version of db4. The cause is the way db4 implements and uses shared memory regions: the size and layout of these regions depend on the mutex type. The regions differ between the two versions of db4 because pthread mutexes have a different size than gcc-assembly mutexes. See file:///usr/share/doc/db4-devel-4.2.52/ref/env/region.html and db-4.2.52/dbinc/region.h for details.
When the regions are created in per-process heap memory, there are no problems, because the memory is private to the process. When the regions are created in system memory without file backing, or in system memory backed by the filesystem, we may encounter the first problem. Obviously, applications using different types of mutexes should not access the same database environment concurrently. But the same application may use different mutex implementations over time (machine booted with an NPTL kernel, then rebooted with a non-NPTL kernel), or several applications using different types of mutexes may access the same database environment one after another. When the mutex type changes, the __db.### files used to back the shared memory regions become invalid and may cause the application to stall or crash.
Example with the locktest.c program, on an NPTL system:
$ rm .dbtest/*
$ ./locktest
$ ls -l .dbtest/
total 32
-rw-rw-r-- 1 machbuild machbuild 16384 May 6 21:47 __db.001
-rw-rw-r-- 1 machbuild machbuild 270336 May 6 21:47 __db.002
-rw-rw-r-- 1 machbuild machbuild 450560 May 6 21:47 __db.003
$ LD_ASSUME_KERNEL=2.4.1 ./locktest
open: Resource temporarily unavailable
Applications can remove the region files when they shut down, with DB_ENV->remove, but probably few applications do this. So when the mutex type changes, the user has to remove the region files manually if the applications don't do it themselves.
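When the applications do not call DB_ENV->remove themselves, the manual cleanup described above amounts to deleting the region files. A minimal C sketch, assuming the region files follow the __db.NNN naming seen in the listing and that no process has the environment open; is_region_file and remove_region_files are hypothetical helpers, and DB_ENV->remove remains the library's own way to do this.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* True for names of the form "__db.NNN" (exactly three decimal digits),
 * matching the region files in the listing above. */
static int is_region_file(const char *name)
{
    if (strncmp(name, "__db.", 5) != 0)
        return 0;
    const char *d = name + 5;
    return strlen(d) == 3 && isdigit((unsigned char)d[0]) &&
           isdigit((unsigned char)d[1]) && isdigit((unsigned char)d[2]);
}

/* Delete every __db.NNN file in env_dir and return how many were removed,
 * or -1 if the directory cannot be opened. Run this only while no process
 * has the environment open; other files (the databases) are left alone. */
static int remove_region_files(const char *env_dir)
{
    DIR *dir = opendir(env_dir);
    if (dir == NULL)
        return -1;
    int removed = 0;
    char path[4096];
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (!is_region_file(ent->d_name))
            continue;
        int len = snprintf(path, sizeof path, "%s/%s", env_dir, ent->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;               /* path too long: skip rather than truncate */
        if (unlink(path) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}
```

Run against the .dbtest directory from the example, this would delete __db.001 through __db.003, after which locktest recreates the regions with whatever mutex type the loaded library uses.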
The second problem is smaller and easily worked around: db_stat, which displays environment statistics, uses db4's internal data structures describing regions and therefore depends at compile time on the mutex type used by the library. The other binaries from db4-utils do not seem to be affected. So I made a wrapper script around db_stat that detects whether the system is NPTL or not and runs the NPTL or non-NPTL db_stat accordingly.
I will attach this wrapper script and an updated rpm spec file for review. I also built rpms for RH 9.0, FC 1 and FC 2 test 3, but I have no place to host them online. If someone has the space and interest to host them, please contact me.
The long-term solution to these problems seems to be an improvement to db4 so that it detects the available mutex types on Linux at runtime: if a private database environment is not requested, it should try to use pthread mutexes on NPTL systems and gcc-assembly mutexes on non-NPTL systems. The regions should record the type of mutex used, become independent of mutex size, and allow on-the-fly conversion of regions and backing files from one mutex type to another (of course, only if no other application is using the database environment).
What is the opinion of Red Hat and others? Is this solution with gcc-assembly mutexes for non-NPTL systems good enough? Is Red Hat waiting for, or working on, a longer-term fix as envisioned above, some other type of fix, or waiting for non-NPTL systems to become extinct?
PS: the updated rpm spec file patch has two minor fixes:
- cxx_common.h and cxx_except.h don't exist anymore
- the %files sections included the %{_libdir}/libdb.so, %{_libdir}/libdb_cxx.so etc. links, but these links were never created
I have read that Ulrich Drepper said db4 should always be linked with -lpthread, so I added it to the build function, because with gcc-assembly mutexes db4 does not link with pthread by default.
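The proposal that regions record their mutex type can be illustrated with a small header check. Everything in this sketch is hypothetical (struct layout, magic value, enum names); db4's actual region format in dbinc/region.h is different. The point is only that an opener could detect a mutex mismatch up front and report it, instead of misreading the region and stalling as in the LD_ASSUME_KERNEL=2.4.1 example.

```c
#include <stdint.h>

/* Hypothetical identifiers for the two mutex implementations discussed. */
enum mutex_kind { MUTEX_POSIX_PTHREADS = 1, MUTEX_X86_GCC_ASM = 2 };

/* Hypothetical header stored at the start of every region. */
struct region_header {
    uint32_t magic;       /* marks the file as a region */
    uint32_t mutex_kind;  /* enum mutex_kind of the library that created it */
    uint32_t mutex_size;  /* sizeof that library's mutex type, in bytes */
};

#define REGION_MAGIC 0x44425247u /* arbitrary value chosen for this sketch */

enum region_check { REGION_OK, REGION_NOT_A_REGION, REGION_MUTEX_MISMATCH };

/* Compare a region's recorded mutex kind and size with the ones this build
 * of the library uses. On REGION_MUTEX_MISMATCH a caller could rebuild the
 * region, but only if no other process has the environment open. */
static enum region_check check_region(const struct region_header *h,
                                      uint32_t my_kind, uint32_t my_size)
{
    if (h->magic != REGION_MAGIC)
        return REGION_NOT_A_REGION;
    if (h->mutex_kind != my_kind || h->mutex_size != my_size)
        return REGION_MUTEX_MISMATCH;
    return REGION_OK;
}
```

Recording the size as well as the kind is a defensive choice: two builds that agree on the kind but were compiled against different thread libraries would still be caught.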
Created attachment 100059: wrapper script around db_stat
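The attached wrapper is a shell script; the same NPTL check can be sketched in C with glibc's confstr(_CS_GNU_LIBPTHREAD_VERSION), which reports the thread library the C library was built with, as strings such as "NPTL 2.3.2" or "linuxthreads-0.10" (the shell equivalent is getconf GNU_LIBPTHREAD_VERSION). This is not the attached script, and the two db_stat paths below are hypothetical placeholders.

```c
#define _GNU_SOURCE /* make sure glibc exposes the _CS_GNU_* names */
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* True if a thread-library version string names NPTL. */
static bool is_nptl_string(const char *version)
{
    return strncmp(version, "NPTL", 4) == 0;
}

/* Ask the C library which thread implementation it was built with.
 * "Unknown" (constant not defined, or confstr failing) counts as non-NPTL. */
static bool running_with_nptl(void)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    char buf[64];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf);
    if (n > 0)
        return is_nptl_string(buf);
#endif
    return false;
}

/* Hypothetical install locations for the two db_stat builds. */
static const char *pick_db_stat(bool nptl)
{
    return nptl ? "/usr/lib/db4/nptl/db_stat" : "/usr/lib/db4/non-nptl/db_stat";
}
```

A wrapper built on this would then execv the path returned by pick_db_stat(running_with_nptl()), passing its own arguments through unchanged.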
Created attachment 100060: updated rpm spec file for #59
*** Bug 124792 has been marked as a duplicate of this bug. ***
I have just upgraded a Red Hat 9 box with an AMD K6 300 processor to Fedora 2. I was using the subversion RPMs from David Summers (http://summersoft.fay.ar.us/pub/subversion/latest/), and before doing the upgrade I dumped all my repos. After the upgrade I can't create new repos:
# svnadmin create test
svn: Berkeley DB error while creating environment for filesystem test/db: Invalid argument
I did try installing the RPMs from http://www.tiki-lounge.com/~toshio/nptl/, however I get:
# rpm -Uvh /home/chris/src/db42-*
/etc/security/selinux/file_contexts: No such file or directory
error: Failed dependencies:
db4 = 4.2.52 is needed by (installed) pam-0.77-40
db4 is needed by (installed) postfix-2.1.1-5.tls.fc2
db4-devel is needed by (installed) apr-util-devel-0.9.5-0.1
So I'm going to re-read this thread and see if I can work out what to do next... :-/
Radu, is there any reason for having different types of mutexes on NPTL and non-NPTL systems? Why not use gcc-assembly on both and ditch the POSIX ones completely, as was initially suggested? BTW, I'm recompiling db4 RPMs on FC2 using your patches as I write this (kinda slow on a 200MHz Pentium MMX). Hopefully Cyrus IMAPD will start working after this...
The POSIX mutexes are based on a standard while the gcc-assembly ones are not. And I suppose that NPTL is both the present and the future.
Chris, I've rebuilt db4 RPMs for Fedora Core 2 using Radu's (current) patches. Everything seems to work fine with them (at least on my Pentium MMX). If you are interested, you can download them from: http://24.79.220.4/db4/
It's a dynamically allocated IP that changes once or twice a year at most. I hope the Red Hat / Fedora folks will have an official fix before it changes next time ;-)
Do something like "rpm -Uhv --replacepkgs --replacefiles" to install the ones you need (--replacepkgs is needed since you already have the same version installed; I'm not sure why --replacefiles is needed).
Can someone post some simple instructions on how to fix the problem? I tried installing the db42 rpm posted above, but it was pretty clear that it didn't replace the existing db4 libraries as far as svn was concerned. I tried various combinations of uninstalling db4 before installing db42, but it didn't help. I don't know what to do with an "rpm spec". Help! Thanks!
Martin, have you tried installing the db4 packages from my web site? See comment #66 for details. Those are the same as the original db4 packages, built using Radu's patches that disable NPTL on architectures that do not support it. Even the version number is the same (it might be a good idea if it were different, but oh well).
I can verify that building new db4 packages with the db_stat_wrapper and db-4.2.52-disable-pthreadsmutexes.patch allows me to build svn repositories on a K6-2 machine now.
I could not get any of the above methods/processes to work on my system. I instead used the following: http://www.svnforum.org/forum/viewtopic.php?p=350#350 See more of the overview of this stuff at: http://www.svnforum.org/forum/viewtopic.php?p=350 -Matt
Quoting mattengland:
http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-devel-4.0.14-14.i386.html
http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-4.0.14-14.i386.html
Can anyone verify (officially or not) that these rpms are legitimate? So far I have seen no problems with my subversion system that uses them.
-Matt
Oops, I should have edited this last post a little more thoroughly before posting. Here's an update:
From http://www.svnforum.org/forum/viewtopic.php?p=350#350 :
http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-devel-4.0.14-14.i386.html
http://rpmfind.rediris.es/rpm2html/redhat-8.0-i386/db4-4.0.14-14.i386.html
Can anyone verify (officially or not) that these rpms are legitimate? So far I have seen no problems with my subversion system that uses them.
-Matt
Darnit, the above links to the rpmfind website still hard-wrap. Sorry, I know of no way around this in this comment-posting engine. You'll have to copy and paste the above URLs by hand (or visit them from the svnforum.org link). Thanks for any help, -Matt
OK, it's approaching time to close this bug because it is attached to the FC4 tracking bug, and I believe it is better to make the choice early in the release cycle rather than later. The choice for RHEL and Fedora Core is going to be compiling with --enable-posixmutexes. That is what has been done for Red Hat distros since RHL 9, and so is likelier to provide a consistent, stable and predictably performing platform. So unless I hear compelling reasons otherwise, I'm going to close this WONTFIX mid-next week.
Jeff, can you please clarify exactly what will break if what you propose is done. This ticket is simply too long and has too many conflicting details in it for me to be able to figure it out. Will db4 compiled as you've proposed work with non-NPTL kernels? Will db4 compiled as you've proposed work with non-Intel processors?
Nothing will break, at least for Red Hat packages. I am basically just saying that db4 will continue to be built as it has been since RHL9, because that's the best answer imho.
The issue is that the "official" 2.4 kernel does not support NPTL. So db4 built by Red Hat with --enable-posixmutexes will break if/when a kernel that does not support NPTL is booted. The behavior of db4 within applications can also change in surprising ways if/when LD_ASSUME_KERNEL is used inappropriately.
You cannot simply bottom-line the problem and ask "work?". The issue with the reproducer has to do with a dbenv. The major applications that use a dbenv are rpm (which has an internal db4 so that it can be built with or without posix mutexes to "work") and subversion (which could internalize db4 similarly to rpm, but chooses to use the system libdb instead). Other applications, like perl/python/ruby, usually do not need dbenv locking.
Shared posix mutexes as used by db4 cannot work without NPTL (which is a set of measure 0 for Red Hat-built packages; db4 in AS2.1 is built differently). db4 compiled to use posix mutexes works fine on all platforms that support NPTL. In fact, from my experience with rpm, from private mail from Sleepycat, and from the locking-scheme rationale written up within the db4 sources, db4 works better with posix mutexes than with other possible locking schemes, on all platforms.
So the issue is whether to compile for portability and the lowest common denominator to minimize risk, or to continue with the scheme used since RHL9, which is known good on platforms that support NPTL (that is, all 2.6 kernels and all Red Hat platforms except AS2.1, where db-4.0.14 is compiled differently). No matter what, all Red Hat platforms since RHL9 have been compiled with --enable-posixmutexes, and any change is very unlikely to be backported and deployed everywhere.
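The point above that db4's shared POSIX mutexes need NPTL comes down to process-shared pthread mutexes. A minimal C probe, assuming a Linux system with mmap(MAP_ANONYMOUS): it sets PTHREAD_PROCESS_SHARED on a mutex attribute, then initializes, locks and unlocks such a mutex in shared memory. On NPTL this is expected to succeed. LinuxThreads is understood to reject PTHREAD_PROCESS_SHARED (typically with ENOSYS), which would be consistent with the "open: Function not implemented" seen with locktest earlier in this bug. The helper name probe_pshared_mutex is hypothetical, and db4's own configure test is not reproduced here.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try to create, lock and unlock a process-shared pthread mutex placed in
 * shared anonymous memory. Returns 0 on success, otherwise the error number
 * from the first step that failed. */
static int probe_pshared_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) {
        pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED) {
            rc = errno;
        } else {
            rc = pthread_mutex_init(m, &attr);
            if (rc == 0) {
                rc = pthread_mutex_lock(m);
                if (rc == 0)
                    pthread_mutex_unlock(m);
                pthread_mutex_destroy(m);
            }
            munmap(m, sizeof *m);
        }
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A caller can pass a non-zero result to strerror to see the error the thread library reported; on older glibc the program needs to be linked with -pthread.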
Changing the locking scheme at this point is hardly feasible, there's already a boatload of software that has db4 compiled with --enable-posixmutexes. So some decision has to be made, and I'm trying to make the decision early in the fc4 devel cycle rather than later so that discussion is possible. And yes, this ticket has way too many issues to sort out, the other reason to close. I encourage anyone who does have specific problems with Red Hat packaging to open a separate ticket.
db4 not working completely on non-NPTL x86 machines is the single issue in this ticket. Yes, db4 has been compiled with --enable-posixmutexes since RHL9, causing subversion, cyrus-imapd and maybe other applications not to work on non-NPTL x86 machines. Now glibc from FC3 has made NPTL available on i486 and i586 too, leaving only i386 machines without NPTL support.
If Red Hat says that non-NPTL platforms are not supported anymore, then the ticket could be closed. If not, a solution has to be found and implemented.
Jeff, I don't know if you read it, but in #56 I offered an alternative to completely disabling POSIX mutexes, as requested initially. The alternative is a single db4 rpm containing db4 with --enable-posixmutexes for NPTL platforms and db4 with gcc-assembly mutexes for non-NPTL platforms.
Yup, db4+posixmutexes not working on non-NPTL is the issue here. And what I'm saying, to make it perfectly clear, is that this bug will be closed WONTFIX because no Red Hat platform is non-NPTL.
Disabling posix mutexes is no option either imho, as that introduces yet another incompatible change variable into an already complicated puzzle. Adding Yet Another Build of db4 within the db4 package is possible, but adds another level of complexity to an already complicated problem.
There are two other possibilities that you do not mention (but I suspect you know ;-):
a) internalizing db4 in important applications like svn and httpd, so that each application can choose whatever locking scheme suits it best. This is in fact what Sleepycat has recommended for many years now.
b) building a separate package other than "db4" that compiles db4 appropriately for non-NPTL applications. Much of the content of this bug is about coordinating a build of db4 that removes NPTL, which is well known.
I'll be happy to do b) and maintain it outside of RHEL and FC if you wish. It's far easier to do that than to try to achieve consensus on how db4 should be built for all possible distros and kernels and glibc and applications and ...
Jeff, the release notes for Core 3 state that all Pentium (i586) class processors are supported. If that is the case, either NPTL should be backported to all Pentium-class processors (which would automatically fix this bug), or applications and/or libraries should be fixed so that they don't use NPTL on older Intel processors. In the latter case, IMHO, we should either have a db4 library that detects this at runtime, or two separate packages (one for i586 and another for i686).
Either that, or change the release notes for Core 4 to state that only Celeron/Pentium III (i686) class processors and newer are supported.
I have a Fedora Core 2 machine with a (currently) officially supported Intel Pentium MMX processor, and I'm hit by this bug.
Jeff, I would love to be using an official Red Hat kernel with NPTL support so that this problem would just go away for me. The problem is that I can't use the 2.4.x Red Hat kernels, either because they lock up on me or because they don't support modules that I need (to be frank, it was so long ago that I figured out I couldn't use them that I no longer remember why). I also can't use the 2.6.x Red Hat kernels, because every time I boot my system with them it locks up hard, sometimes within seconds and sometimes within days, but never more than a few days after I boot. I have a long-standing open bug about this which Red Hat has been unable to do anything about. In contrast, with recent stock 2.4.#-pac# kernels I can run for weeks without any lockups. (I do occasionally lock up because of an ide-scsi or osst issue that I haven't bothered to troubleshoot, because it happens rarely and because both the ide-scsi and osst code have changed significantly in 2.6.x, so it's not obvious that my troubleshooting would be useful.) So by saying you're not going to make it possible to use Red Hat's db4 packages on non-NPTL machines, you're essentially saying that you will no longer support my hardware, even though it is hardware that certainly should be supported (a SuperMicro S2DGU motherboard with dual 550 MHz Pentium III Katmai CPUs). I'd be happy to stop making a fuss about this ticket if Red Hat could just fix bug 126936 so that I could actually use a current 2.6.x kernel. I am willing and able to provide any information and perform any steps requested of me to help debug that problem, but so far it does not seem like anyone has devoted any effort to figuring out what information to request or what steps I should perform.
I've looked a bit more into my machine after writing my last comment, and re-read some previous comments. What I currently have is:
- Intel Pentium MMX 200 MHz processor
- CPU flags (/proc/cpuinfo): fpu vme de pse tsc msr mce cx8 mmx
- glibc-2.3.3-27.1.i386.rpm
- kernel-2.6.8-1.521.i586.rpm
When using the stock db4 packages, Cyrus doesn't work. When using Radu's patched packages, Cyrus works. From reading previous comments, I gather the problem is that the above does not give me an NPTL-enabled environment. The question is, which component is problematic here? glibc? The kernel? Both? Leonard wrote in his comment that for NPTL, tsc and cmov must be present in the CPU flags (I'm missing cmov). Why is this the case? I always thought that support is first written for the generic architecture and then optimized for the greatest and latest in the CPU world (otherwise we end up with the kind of problems we have now). Would it be too hard and/or time-consuming to implement it without using those two flags (and any others not common to all i386 or all i586 CPUs) in the i386/i586 versions of the RPMs? Radu wrote that glibc from Fedora Core 3 has NPTL support for i586 (and the currently unsupported i486), but not for generic i386. Does this mean that if I upgrade my current install to Core 3, the problem would go away? Would a backport of this support to Core 2 solve the problem? Would we also need a backport of kernel support (if it is needed)? If fixing glibc alone would work, and if the backport wouldn't take too much time, could this be done? If the problem is fixed in Core 3, then sure, mark this bug as fixed, closed, wontfix, whatever. However, if it isn't, then it should really be solved first. I'll be upgrading my MMX machine to FC3, so I'll soon find out. Probably not next week, so I guess I'll miss Jeff's deadline. Yeah, I know: lots of questions (if anybody has time to answer them), a couple of personal opinions (if anybody cares to read them), and no solutions.
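The tsc/cmov question can be checked directly against what the CPU advertises. This is a small sketch with hypothetical helper names. It looks for a whole-word flag in the "flags" line of /proc/cpuinfo, which is the x86 layout; other architectures name that line differently. It only reports CPU flags. Whether a given glibc build actually needs cmov is per Leonard's comment, not something this code verifies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if `flag` occurs as a whole word in a
 * space-separated CPU flags string, 0 otherwise. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return 0;
    for (const char *p = flags; (p = strstr(p, flag)) != NULL; p++) {
        int starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char next = p[len];
        int ends_word = next == '\0' || next == ' ' || next == '\t' || next == '\n';
        if (starts_word && ends_word)
            return 1;           /* exact token, not a prefix of e.g. "msr" */
    }
    return 0;
}

/* Hypothetical helper: copy the value of the first "flags" line of
 * /proc/cpuinfo into buf. Returns 0 on success, -1 otherwise. */
int read_cpu_flags(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    char line[4096];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        char *colon = strchr(line, ':');
        if (strncmp(line, "flags", 5) == 0 && colon != NULL) {
            snprintf(buf, size, "%s", colon + 1);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Applied to the Pentium MMX flag list quoted above, has_cpu_flag() finds tsc but not cmov.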
Re Comment #75:
Will db4 compiled as you've proposed work with non-NPTL kernels? No. However, this means vanilla 2.4 kernels only: RH 2.4 and all 2.6 kernels have NPTL. Since FC2+ is 2.6-based and FC1 is already EOL, the kernel part of the equation seems irrelevant to discussions of what to do about the bug in FC4.
Will db4 compiled as you've proposed work with non-Intel processors? Yes. NPTL will work on any machine with an i486 or greater instruction set. The reason previous versions have not worked on anything less than an i686 is that there was no i486+ glibc package, only an i386 and an i686 one. So i386, i486, and i586 machines were all lumped together with a non-NPTL glibc. In FC3, the "i386" glibc package is compiled with i486 instructions so that NPTL can be enabled. This means FC3 is fixed for everything except i386 (which is not supported according to the release notes, and was discussed at length on fedora-devel).

Re Comment #80:
I think fixing bug 126936 is the way to go. FC1 has been EOL'd, which means there isn't any supported Fedora Core that shipped with a 2.4 kernel (let alone a non-NPTL 2.4 kernel). Kernel 2.6 is the way forward, and fixing bugs there is more useful for the development of the distribution than disabling posix-mutexes in FC4, where running a 2.4 kernel is enough of a hack that it almost qualifies as a separate distro. If you have to run a 2.4 kernel from outside the distro until 2.6 runs on your hardware, you can run a non-distro db4 package with NPTL disabled as well.

Re Comment #81:
I just installed the FC3 glibc and dependents (glibc*i386.rpm libselinux* nscd* nptl-devel*... 11 packages) on my FC2 AMD-K6 (same CPU flags as your P-MMX). It resolves this bug. (Hoping nothing else bites me until I do a full upgrade :-)
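On the i486 cut-off: the CMPXCHG (compare-and-exchange) instruction first appeared on the i486, and futex-style locks are built around an atomic compare-and-swap on a word in memory. This is a minimal sketch using GCC's __sync_val_compare_and_swap builtin, which GCC emits as lock cmpxchg when targeting i486 or later. The lock type and function names are hypothetical. The code is far simpler than glibc's NPTL locks, with no futex syscall and no sleeping waiters.

```c
/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
typedef struct { int word; } cas_lock;

/* One attempt to take the lock: atomically replace 0 with 1.
 * Returns 1 on success, 0 if the lock was already held. */
int cas_trylock(cas_lock *l)
{
    return __sync_val_compare_and_swap(&l->word, 0, 1) == 0;
}

/* Release the lock by storing 0 with release semantics. */
void cas_unlock(cas_lock *l)
{
    __sync_lock_release(&l->word);
}
```

Built for a plain i386 target, GCC cannot emit cmpxchg inline. It falls back to an out-of-line helper call instead, which is the same instruction-set gap that kept NPTL off the old i386 glibc.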
Toshio, thanks for your answer. I guess the solution from FC3 would be good for almost everybody. It might be a good idea to put a big flashy warning in the glibc spec file not to change the compilation flags until NPTL is backported all the way to i386. BTW, if glibc in FC3 is compiled with the i486 instruction set, wouldn't it be more correct (consistent) if the glibc package names were *.i486.rpm? That glibc package is not going to work correctly on an i386, so shouldn't that be reflected in the package's architecture? I'm not subscribed to fedora-devel. Does the decision to compile glibc with i486 instructions mean that all other packages in the distribution will soon follow (the distro is rather unusable on i386 if glibc requires at least an i486 for the distro to function properly)?
Re: #79 From Aleksandar Milivojevic (alex)

The term "supported" by Red Hat has little to do with whether db4 continues to be compiled with --enable-posixmutexes. I am only saying that this bug is gonna be closed WONTFIX early in the FC4 release cycle so that there is sufficient time for discussion. There are too many issues here to resolve in this bug, and bugzilla is not the forum for discussing whether I, as a Red Hat employee, am telling you that Red Hat no longer supports your hardware. It's a job, mon: I fix bugs, and I attempt to supply RFEs. In the case of db4 supporting NPTL, all I can say is that with RHEL3 already deployed, RHEL4 almost deployed, and all of FC{1,2,3} already deployed with db4 compiled with --enable-posixmutexes, reverting to another form of behavior makes little sense to me. And attempting both +/- NPTL in the same package adds a level of complexity that makes it difficult (if not impossible) to meet user expectations, which are more or less: "I want to add -ldb to my build and have my application use Berkeley DB everywhere and 'work' always." I would *love* to be able to tell you that there is a solution that does that. There isn't. But I'll see if I can't buy your vote by getting #126936 expedited ;-)
Re: #81 From Aleksandar Milivojevic (alex)

Yes, it's quite a complicated mixture; it's entirely unclear whether to blame the kernel, glibc, db4, me or Red Hat, isn't it? ;-) If I were you, I would embed db4 into Cyrus and compile it for the least common denominator, i.e. non-NPTL and so without --enable-posixmutexes. That would clearly make you happy. That is the underlying issue in choosing how the system-wide db4 should be compiled. Is the glass half-full or half-empty? Everyone has a different glass. I do believe that the most featureful choice (i.e. with NPTL) is (and was) the best decision for compiling db4. Backporting NPTL all the way back to i386 is possible iirc, but the result is pig slow and painful to use. That's what I believe, anyway; there are better forums than this db4 bugzilla for getting more accurate answers ;-)
Jeff, first, thanks for your answers. I'll solve my problem (db4/Cyrus on Pentium MMX) by upgrading to FC3. Until then, I'll simply use Radu's patched db4 packages on FC2 (which will keep my glass half full). As for NPTL on i386: I'm no expert on i386 assembly (my assembly hacking stopped somewhere around the i8086 and MC68000 era), but I don't see why it wouldn't be possible to make an efficient implementation for i386. Anyhow, if NPTL is needed for a fully working (Red Hat/Fedora) system, even pig slow is better than not working. The only question I see is whether it is worth the effort to implement. Is an i486 glibc what will make everybody happy, or is there anybody who really needs a "pure" i386 glibc with NPTL support? (glibc-*.i386.rpm in FC3 is really an i486 glibc, right? Is it going to stay that way? Is glibc for FC2 going to be updated the same way?) Anyhow, if we only have i586 and i686 kernels, maybe it would make sense to only have i586 and i686 glibc (or even a pure i586 system).
The issue involves more than assembly language. The kernel provides futexes, which are a small (like 4-byte) piece of memory that is shared between processes for locking. Futexes are very lightweight and fast, unlike alternative means of implementing inter-process locks. Shared posix mutexes unify inter-process locks and inter-thread locks, permitting one "standard" locking scheme in Berkeley DB for both. In principle, there's nothing preventing glibc/i486 with NPTL from being released as an update for FC2. Meanwhile, I'm pretty sure (but have no i486 and so cannot check) that the FC3 glibc can be installed on FC2. As always, glibc upgrades should be approached carefully.
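Here is what a shared posix mutex means in practice: one pthread mutex used by two processes. This is a minimal sketch that assumes Linux and glibc, and the function name is hypothetical. The mutex lives in an anonymous MAP_SHARED mapping and is marked PTHREAD_PROCESS_SHARED. With NPTL the kernel futex makes this work across fork(). LinuxThreads does not support process-shared mutexes, and there pthread_mutexattr_setpshared() is expected to fail with ENOSYS, whose message is "Function not implemented". Berkeley DB's real mutex code is considerably more involved. This only illustrates the mechanism (compile with gcc -pthread).

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A mutex and the counter it protects, placed in memory that is
 * shared (not copied) between parent and child after fork(). */
struct shared_region {
    pthread_mutex_t lock;
    long counter;
};

/* Hypothetical demo: parent and child each add 1 to a shared counter
 * `iterations` times under one PTHREAD_PROCESS_SHARED mutex.
 * Returns the final count (2 * iterations if the lock works) or a
 * negative value on failure; a thread library without process-shared
 * mutexes shows up as -ENOSYS. */
long pshared_counter_demo(int iterations)
{
    struct shared_region *r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&r->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(r, sizeof *r);
        return -rc;             /* pthread functions return the error code */
    }
    r->counter = 0;

    pid_t pid = fork();
    if (pid < 0) {
        pthread_mutex_destroy(&r->lock);
        munmap(r, sizeof *r);
        return -1;
    }

    /* Both processes run this loop against the same mutex and counter. */
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&r->lock);
        r->counter++;
        pthread_mutex_unlock(&r->lock);
    }

    if (pid == 0)
        _exit(0);               /* child is done */
    waitpid(pid, NULL, 0);      /* parent waits for the child's increments */

    long result = r->counter;
    pthread_mutex_destroy(&r->lock);
    munmap(r, sizeof *r);
    return result;
}
```

The same pthread_mutex_lock()/pthread_mutex_unlock() calls would serve two threads in one process unchanged, which is the unification of thread and process locks described above.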
Jeff, thanks for taking the time to clarify why there's no i386 NPTL glibc. Anyhow, one thing keeps bugging me: if releasing an i486 glibc with NPTL for FC2 would fix this bug, why not do that and close it as fixed/resolved/whatever? It seems relatively simple and straightforward to me. Or am I too naive (wouldn't be the first time ;-) )?
Not too naive; NPTL is quite a complicated deploy ;-) glibc, the kernel, the application, the run-time environment (i.e. don't use LD_ASSUME_KERNEL) and the internal feature set (i.e. "i386" no longer means ix86) are all pieces of the puzzle. Now that the i486 kernel/glibc packages support NPTL in FC3, there is an acceptable solution for almost everyone, with a few gotchas:
a) still no NPTL solution for *exactly* an i386 yet.
b) still some problems that block deployment of NPTL on certain HW (that's my read of #126396).
But I trust that the remaining problems will be dealt with to everyone's satisfaction. No matter what, db4 just uses a wonderful technology, NPTL, and nothing else, so this ain't really the best bug for suggesting alternative non-NPTL solutions to, say, kernel problems. Nor do I think that db4 should be held to the lowest common denominator, i.e. built without shared posix mutexes, because shared posix mutexes unify thread and process locks, and I suspect that is going to be needed more and more by, say, java. So far, db4 has been built to prefer inter-process locks over inter-thread locks, but java is going to change the application mix imho. So my goal here is WONTFIX in a couple more days, early rather than late in the FC4 release cycle, so that discussion is possible.
As warned, WONTFIX closure. If that is not satisfactory, then by all means, discuss on fedora-devel.
This bug regressed sometime in February 2005. rpm is showing the same problem on: x86_64, 2.4.21-4.ELsmp, glibc-2.3.4-10, rpm-4.4.1-2, db4-4.3.27-1 (yes, a strange combination). I'll just leave the bug closed as WONTFIX, but thought I'd drop a note, as things did work up until this last userspace look at rawhide. Backing up to 4.3.3-8 restored functionality.
The NPTL problem is going to affect more than just pure i386 systems, e.g. Xen and UML.
With the attached patch, Fedora Core 3 works normally under a UML kernel. It was tested in a Linode: http://www.linode.com/ The diff is much smaller than the one already available in this bug, and makes three major modifications:
- it empties the three variables set at the top of the spec file; I couldn't build it with java_arches, so that was emptied too;
- it adds a line to mkdir the top-level lib/; this specfile does NOT build on a non-NPTL arch;
- it adds the configure option --with-mutex=x86/gcc-assembly to explicitly avoid NPTL mutexes.
I suspect this spec could/should be tweaked to be able to build Java, and perhaps to choose a better alternative mutex implementation than x86/gcc-assembly.
Created attachment 116576: Patch against FC3 db4.spec
This patch adds the configure option --with-mutex=x86/gcc-assembly
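For comparison, --with-mutex=x86/gcc-assembly selects Berkeley DB's test-and-set mutexes. These run entirely in user space on shared memory, so they need neither futexes nor NPTL. This is a minimal sketch of the test-and-set idea under GCC. It uses GCC's __sync_lock_test_and_set builtin (an xchg on x86), not Berkeley DB's actual assembly. The type and function names are hypothetical.

```c
#include <sched.h>

/* Hypothetical test-and-set mutex: 0 = free, 1 = held.
 * Plain memory, so it can also be placed in a shared mapping. */
typedef struct { int held; } tas_mutex;

/* One attempt: atomically store 1 and check that the old value was 0. */
int tas_trylock(tas_mutex *m)
{
    return __sync_lock_test_and_set(&m->held, 1) == 0;
}

/* Spin until acquired, giving up the CPU between attempts. */
void tas_lock(tas_mutex *m)
{
    while (!tas_trylock(m))
        sched_yield();
}

/* Release: store 0 with release semantics. */
void tas_unlock(tas_mutex *m)
{
    __sync_lock_release(&m->held);
}
```

The trade-off compared with a futex-backed mutex is that a contended waiter keeps polling instead of sleeping in the kernel until the holder wakes it.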
Hmmm... But this is not needed on FC3. Both the glibc and the kernel in FC3 have NPTL support. BTW, has anybody attempted backporting the glibc fix from FC3 to FC2?
It seems this bug is still coming up for some people. I experienced it with CentOS 4 (2.6.16) under Xen. The above patch fixed it again, though. I have a working RPM for db4-4.2.52-7.1 (current CentOS) if anyone needs it.