From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
We have a fairly unusual setup. We have several diskless PCs which mount their root file systems over NFS (using nfsroot). Each machine has a separate, complete Fedora Core installation on the NFS server (which is also an FC2 machine), cloned from a separate installation. We can run rpm/yum on the clients, and they behave like "real" installations. This worked very well with Red Hat 7.3.

Using FC2 we're seeing lockups and corruption with rpm on these client systems. If you repeatedly run "rpm -qa" on a machine, the /var/lib/rpm/__db.* files become corrupted:

[root@xs13 tmp]# /bin/rpm -qa
rpmdb: Locker is not valid
error: db4 error(22) from db->close: Invalid argument
error: cannot open Packages index using db3 - Invalid argument (22)
error: cannot open Packages database in /var/lib/rpm
no packages

Sometimes rpm hangs with 100% CPU and no disk access. Sometimes it just crashes:

[root@xs13 tmp]# /bin/rpm -qa
ethtool-1.8-3.1
libstdc++-3.3.3-7
fedora-release-2-4
pcre-4.5-2
shadow-utils-4.0.3-21
zlib-1.2.1.1-2.1
grep-2.5.1-26
procps-3.2.0-1.1
Segmentation fault (core dumped)

[root@xs13 tmp]# /bin/rpm -qa
rpmdb: Lock table is out of available locks
error: cannot open Pubkeys index using db3 - Cannot allocate memory (12)
ethtool-1.8-3.1
[have to kill with -9 here]

Is this an NFS problem, or can rpm simply not run over NFS? That would be a shame, as it's very nice to have diskless FC2 systems.
One hack is to use a wrapper script in place of rpm:

-------------------------------------------------------
#!/bin/bash
# short program to get around problems with the rpm database on NFS root
/usr/bin/lockfile /tmp/rpm-lock
/bin/rm -f /var/lib/rpm/__db.*
/bin/rpm.original "$@"
rv=$?                       # preserve rpm's exit status for callers
/bin/rm -f /tmp/rpm-lock
exit $rv
-------------------------------------------------------

Alternatively, perhaps putting the __db files on a memory disk might be a good idea, though I don't know how to do that.

Version-Release number of selected component (if applicable):
rpm-4.3.1-0.3

How reproducible:
Always

Steps to Reproduce:
1. Run FC2 over nfsroot
2. Run rpm -qa lots of times

Additional info:
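One way to realise the "memory disk" idea above is to run the whole rpmdb from tmpfs and write it back afterwards, so the Berkeley DB shared regions (the __db.* files) never live on NFS at all. This is only a hedged sketch, not a tested recipe: the /dev/shm/rpmdb path is a placeholder, and any changes made since the last write-back are lost on a crash.

```shell
# Sketch: keep the rpmdb (and its __db.* shared regions) in RAM.
mkdir -p /dev/shm/rpmdb
mount -t tmpfs tmpfs /dev/shm/rpmdb      # RAM-backed; contents lost on reboot
cp -a /var/lib/rpm/. /dev/shm/rpmdb/     # copy the database off NFS
rm -f /dev/shm/rpmdb/__db.*              # drop any stale shared regions

rpm --dbpath /dev/shm/rpmdb -qa          # queries run against the RAM copy

# After a write operation (install/upgrade/erase), copy the result
# back to NFS, excluding the shared regions:
rsync -a --exclude='__db.*' /dev/shm/rpmdb/ /var/lib/rpm/
```

Since each client has its own complete installation on the server, the write-back races only against that one client's own rpm runs, which the lockfile wrapper already serialises.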
Berkeley DB requires locks, which are quite painful to ensure are transported reliably over NFS. See the analysis at http://www.sleepycat.com. Basically, you're lucky that anything worked at all with an rpmdb across NFS, with any version of rpm that uses any version of Berkeley DB.

You have two obvious choices if you wish to pursue a remote rpmdb:

a) Make absolutely sure that fcntl is "POSIXLY compliant" across NFS (that is a whole different can-of-worms than what rpm+bdb "support"). At a minimum, you will need to recompile rpm (which has an internal Berkeley DB) to use fcntl rather than posix mutexes for locking. I can tell you what to do, but I cannot "support" fcntl locking in rpm as the default, as shared posix mutexes, which permit unified thread/process locks within rpm, are a far more important feature for rpm imho.

b) Consider using sunrpc rather than NFS to access the remote rpmdb. See the berkeley_db_svc documentation at www.sleepycat.com. I know that sunrpc "works" for me anyways, at least for O_RDONLY access, pretty reliably, and I've tried a few O_RDWR installs as well. Again, I cannot "support" this functionality in rpm, as there's a slew of known problems with sunrpc, not the least of which is that there are known security risks. But I will happily tell you what to do. I suspect that this pathway will require a custom rpm build as well, as I only look at sunrpc in rpm every 9 to 10 months, and there's always something that needs fixing.

I do not believe that you can solve the locking issues adequately with a dot-file-lock approach like sendmail uses, wrapping rpm. And rpm will almost always segfault when inconsistent data is returned (because of concurrent access by two operations, one writing, the other reading), leaving behind other baggage that needs to be cleaned up in the __db* files.
Shared memory is possible through rpm configuration instead, but it does not solve the locking problem adequately, because lock state needs to be transported to multiple clients, and only posixly correct fcntl locking, or sunrpc to a single server, are known to be sufficiently general solutions. So name your poison please ;-)
Thanks for the useful answer. Only one client has access to each rpm database - an entire installation is replicated for each client. Would that make shared memory a possibility?

I'd like to try the fcntl locking approach if it's not too difficult for you to tell me what needs modifying. Looking at the rpm source, do I have to get configure to tell Berkeley DB to use UNIX/fcntl locking? Should I modify the configure script in the rpm source to use --with-mutex=UNIX/fcntl instead of --enable-posixmutexes?

If that doesn't work I can go down the rpc approach. Is it relatively simple to tell rpm to use a particular server and directory, rather than /var/lib/rpm?
If /var/lib/rpm is per-client, then you probably have some other choices that will work as well. Here's the line from /usr/lib/rpm/macros that controls most of the easily reconfigurable rpmdb options:

%__dbi_cdb create cdb mpool mp_mmapsize=16Mb mp_size=1Mb

Put a copy of that line in /etc/rpm/macros on whatever client you want to experiment with.

The first thing to try is to disable all locking by doing

%__dbi_cdb create

in the line above. That should disable almost all the locking and baggage, and might be useful if you can check that rpm installs/upgrades/erases are always run as if the client were "single user"; shared read access on unchanging data is always possible.

Using sunrpc is perhaps a little easier to describe than how to rebuild rpm with fcntl locking (yes, basically --with-mutex=UNIX/fcntl is what is needed, but there's something else that needs doing too). Here's the config for sunrpc access on the client:

%__dbi_cdb create cdb mpool mp_mmapsize=16Mb mp_size=1Mb client server=localhost

(that should all be one line, as above). To start the server, try doing

rm -f /var/lib/rpm/__db*
/usr/lib/rpm/rpmdb_svc -v -h /var/lib/rpm

Then on the client, add the config to /etc/rpm/macros, and try a query like

rpm -q popt

The server should display something like:

# /usr/lib/rpm/rpmdb_svc -v -h /var/lib/rpm
Added home rpm in dir /var/lib
Running recovery on /var/lib/rpm
/usr/lib/rpm/rpmdb_svc: Ready to receive requests
Closing dbp id 1102034693
Closing dbp id 1102034692
Closing env id 1102034691

You might have to adjust paths a bit, as both server and client should have the same path to the database iirc (see the sleepycat doco for the options and setup; rpmdb_svc is exactly berkeley_db_svc).

Does one of those solutions "work" for you?
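Collected into one place, the sunrpc setup above looks something like the following sketch. The server hostname is whatever machine physically holds the disk (server=localhost is kept from the example above; substitute your own), and appending blindly to /etc/rpm/macros is a simplification:

```shell
# On the server side (the machine with the real disk under /var/lib/rpm):
rm -f /var/lib/rpm/__db*
/usr/lib/rpm/rpmdb_svc -v -h /var/lib/rpm &    # rpmdb_svc is berkeley_db_svc

# On the client: the macro must be one single line in /etc/rpm/macros.
cat >> /etc/rpm/macros <<'EOF'
%__dbi_cdb create cdb mpool mp_mmapsize=16Mb mp_size=1Mb client server=localhost
EOF

# Then test with a simple query:
rpm -q popt
```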
Created attachment 107810 [details] Patch to use fcntl locking rather than posix-mutexes in rpm This patch worked for rpm built from CVS a couple weeks ago. Make the equivalent change to the rpm.spec, and apply the patch, if you want to use rpm to build rpm packages.
I tried the "%__dbi_cdb create" approach, but it still corrupts the rpm database after a while. I don't know why.

The rpmdb_svc idea seems to work fine (running the server on the machine which actually has the disk with the database; running it on the client corrupts the database again). Security is the one slight problem here, as the service takes a random port number, so it's hard to firewall. However, the daemon takes a "-P password" option. Unfortunately there doesn't seem to be a way (glancing at the source) to tell rpm to pass a specific password to the server. Could this be added to rpm easily?

I haven't tried the fcntl patch yet. I did try using the "fcntl_locking" flag on __dbi_cdb, but it has no apparent effect.
"create" has no locking whatsoever. If you cannot control for "single user" installs and upgrades through other means, then don't use that.

(Guess) The password is likely for an AES-encrypted environment rather than a client/server protocol challenge. I can probably add that rather easily if that is what is needed; I haven't looked. The port used by the sunrpc service can probably be made more predictable for firewall rules too.
When I was using "create", it managed to corrupt the database with just a single user, using rpm sequentially.

It would be nice to have a port option on the server, and possibly a password option on rpm (if it's a client/server challenge). However, I've written a nasty script to grep the /usr/sbin/rpcinfo output after starting the rpc server, and construct an appropriate set of iptables rules to block access to the port from the external world. I haven't been able to break the rpm rpc setup yet :-) It all looks very nice.
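The firewalling hack can be sketched like this (not the actual script from the comment; the extract_rpc_ports helper and the eth0 interface are assumptions). `rpcinfo -p` prints one header line followed by "program vers proto port service" columns, so the port is field 4:

```shell
# Hypothetical helper: pull the unique port numbers out of `rpcinfo -p`
# output read on stdin, skipping the header line.
extract_rpc_ports() {
    awk 'NR > 1 { print $4 }' | sort -un
}

# Usage against a live portmapper (needs the rpc server running);
# exact rules are site-specific, so shown here commented out:
#   for p in $(/usr/sbin/rpcinfo -p | extract_rpc_ports); do
#       iptables -A INPUT -i eth0 -p tcp --dport "$p" -j DROP
#   done
```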
If you are happy with sunrpc, well, so am I ;-)

There are a couple of minor issues that you may notice eventually. Because the environment (i.e. the __db* files) is opened opaquely with DB_JOIN by rpm, there's no way (well, there is a way, I've just been too lazy to implement it) to tell the current values of certain flags like INIT_CDB. This shows up as a noisy (but afaict harmless) message on both client and server. Here's the server msg seen while running rpmdb_svc -v -h /var/lib/rpm:

/var/lib/rpm: illegal flag specified to DB->cursor

and something similar on the client side. I can/will perhaps attempt to support sunrpc through reconfiguration within rpm, as the fix is just a couple of lines of code in rpmdb/db3.c to not set a CDB-specific flag (WRITECURSOR? I forget ...)
I'm seeing some problems with the rpc server. Sometimes I get messages from the clients like:

/etc/cron.daily/yum.cron:
rpmdb: Berkeley DB: RPC: Timed out
error: db4 error(-30993) from dbcursor->c_get: DB_NOSERVER: Fatal error, no RPC server
error: error(-30993) getting "Ã_a_ÂÃs~I~\´Ã^_æÃ^?|o^H$$^P^H(&ãþsh ^H" records from Filemd5s index

I think the server sometimes gets overloaded and fails. I'm using one server for 32 machines, so this may be a problem.

I tried the fcntl patch, but I think there's a problem with it:

[root@xs6 rpm]# rpm -qa
rpmdb: Berkeley DB library configured to support only DB_PRIVATE environments
error: db4 error(22) from dbenv->open: Invalid argument
error: cannot open Packages index using db3 - Invalid argument (22)
error: cannot open Packages database in /var/lib/rpm
no packages

I suspect this is to do with the #if 0 commented-out part in the patch.
No, the #if 0 is absolutely critical and correct. But almost certainly there's more that needs looking at with the fcntl patch.

Yes, locking is per cursor, so a loaded server with multiple requests through sunrpc will occasionally "cross cursors" (i.e. have a deadlock), and the error message is because DB_WRITECURSOR is not the right flag to pass when opening a dbcursor against a sunrpc remote database. AFAIK the message is harmless, however. And I know the fix; the test is a bit trickier, only because a proper (imho) db client is supposed to know as little as possible about the server db configuration, and so there is no way in the BDB API to test a dbenv for how it was opened originally (see details at sleepycat, look for DB_JOIN). Meanwhile, the bits are in the dbenv structure; all that is needed is to test, and not pass DB_WRITECURSOR if sunrpc is in use.
Retooling this bug as an RFE for sunrpc access to a remote database.
Fedora Core 2 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC3 updates or in the FC4 test release, reopen and change the version to match.
Basic RPC support using db4 is in upstream RPM; Fedora does not utilise this functionality. Issues with this functionality should be discussed on rpm-list or on rpm-devel-list.