Bug 141614

Summary: RFE: support for sunrpc access to remote rpmdb
Product: [Fedora] Fedora
Component: rpm
Version: 3
Hardware: i386
OS: Linux
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Reporter: Jeremy Sanders <jss>
Assignee: Paul Nasrat <nobody+pnasrat>
CC: herrold, mattdm, nobody+pnasrat
Last Closed: 2005-09-27 21:03:59 UTC
Doc Type: Bug Fix
Attachments: Patch to use fcntl locking rather than posix-mutexes in rpm (flags: none)

Description Jeremy Sanders 2004-12-02 15:14:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
We have a fairly unusual setup. We have several diskless PCs which
mount their root file systems over NFS (using nfsroot). Each machine
has a separate complete Fedora Core installation on the NFS server
(which is also an FC2 machine), cloned from a separate installation.
We can run rpm/yum on the clients, and they behave like "real"
installations.

This worked very well with RedHat 7.3. Using FC2 we're seeing lockups
and corruptions with rpm on these client systems. If you repeatedly
run "rpm -qa" on the machine, then the /var/lib/rpm/__db.* files
become corrupted:

[root@xs13 tmp]# /bin/rpm -qa
rpmdb: Locker is not valid
error: db4 error(22) from db->close: Invalid argument
error: cannot open Packages index using db3 - Invalid argument (22)
error: cannot open Packages database in /var/lib/rpm
no packages

Sometimes rpm hangs with 100% CPU with no disk access.

Sometimes it just crashes:
[root@xs13 tmp]# /bin/rpm -qa
ethtool-1.8-3.1
libstdc++-3.3.3-7
fedora-release-2-4
pcre-4.5-2
shadow-utils-4.0.3-21
zlib-1.2.1.1-2.1
grep-2.5.1-26
procps-3.2.0-1.1
Segmentation fault (core dumped)

[root@xs13 tmp]# /bin/rpm -qa
rpmdb: Lock table is out of available locks
error: cannot open Pubkeys index using db3 - Cannot allocate memory (12)
ethtool-1.8-3.1
[have to kill with -9 here]

Is this an NFS problem, or can rpm simply not run over NFS? This is a
shame as it's very nice to have diskless FC2 systems.

One hack is to use a script to call rpm instead:

-------------------------------------------------------
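#!/bin/bash

# Short wrapper to get around problems with the rpm database on an nfs root.

# Serialize rpm runs on this machine (lockfile waits until the lock is free).
/usr/bin/lockfile /tmp/rpm-lock
# Release the lock on exit, even if rpm fails or is interrupted.
trap '/bin/rm -f /tmp/rpm-lock' EXIT
# Discard the Berkeley DB environment files left over from the previous run.
/bin/rm -f /var/lib/rpm/__db.*
/bin/rpm.original "$@"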
-------------------------------------------------------

Alternatively, perhaps putting the __db files on a memory disk might
be a good idea, though I don't know how to do that.


Version-Release number of selected component (if applicable):
rpm-4.3.1-0.3

How reproducible:
Always

Steps to Reproduce:
1. Run FC2 over nfsroot
2. Run rpm -qa lots of times

Additional info:
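
Step 2 above can be automated with a small loop. The following is a minimal sketch, not part of the original report: the function name repeat_until_failure is made up for illustration, and it assumes that rpm exits with a non-zero status when it hits one of the errors shown earlier, which the report does not confirm.

```shell
#!/bin/bash
# Sketch: run a command repeatedly until it fails or a run limit is reached.
# repeat_until_failure is a hypothetical helper name, not an rpm tool.
repeat_until_failure() {
    local max="$1" i status
    shift
    for ((i = 1; i <= max; i++)); do
        # Discard the command's output; only the exit status matters here.
        "$@" > /dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "failed on run $i with exit status $status"
            return "$status"
        fi
    done
    echo "no failure in $max runs"
}
```

For example, "repeat_until_failure 1000 /bin/rpm -qa" would stop at the first failing query. A hang (the 100% CPU case) is not detected by this loop and still has to be killed by hand.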

Comment 1 Jeff Johnson 2004-12-02 19:20:15 UTC
Berkeley DB requires locks, which are quite painful to
ensure are transported reliably over NFS. See analysis
at http://www.sleepycat.com.

Basically, you're lucky that anything worked at all
with an rpmdb across NFS, nothing else, with any version
of rpm that uses any version of Berkeley DB.

You have two obvious choices if you wish to pursue
a remote rpmdb:

a) make absolutely sure that fcntl is "POSIXLY compliant" across
NFS (that is a whole different can-of-worms than what rpm+bdb
"support"). At a minimum, you will need to recompile rpm (which
has internal Berkeley DB) to use fcntl rather than posix mutexes
for locking. I can tell you what to do, but I cannot "support"
fcntl locking in rpm as default, as shared posix mutexes to
permit unified thread/process locks within rpm are a far more
important feature for rpm imho.

b) consider using sunrpc rather than NFS to access the remote rpmdb.
See the berkeley_db_svc documentation at www.sleepycat.com. I know
that sunrpc "works" for me anyways, at least for O_RDONLY access,
pretty reliably, and I've tried a few O_RDWR installs as well.
Again, I cannot "support" this functionality in rpm as there's
a slew of known problems with sunrpc, not the least of which
is that there are known security risks. But I will happily tell
you what to do. I suspect that this pathway will require you
to do a custom rpm build as well, as I only look at sunrpc in rpm
every 9 to 10 months, and there's always something that needs
fixing.

I do not believe that you can solve the locking issues adequately
with a dot-file-lock approach like sendmail uses wrapping rpm.

And rpm will almost always segfault when inconsistent (because of
concurrent access of two operations, one writing, the other reading)
data is returned, leaving behind other baggage that needs to be
cleaned up in __db* files.

Shared memory is possible through rpm configuration instead,
but does not solve the locking problem adequately,
because lock state needs to be transported to multiple clients,
and only posixly correct fcntl locking or sunrpc to single
server are known to be sufficiently general solutions.

So name your poison please ;-)

Comment 2 Jeremy Sanders 2004-12-02 20:55:07 UTC
Thanks for the useful answer.

Only one client has access to the rpm database - an entire
installation is replicated for each client. Would that make shared
memory a possibility?

I'd like to try out the fcntl locking approach if it's not too
difficult for you to tell me what needs modifying. Looking at the rpm
source, do I have to get configure to tell Berkeley DB to use the
UNIX/fcntl locking? Should I modify the configure script in the RPM
source to use --with-mutex=UNIX/fcntl instead of --enable-posixmutexes?

If that doesn't work I can go down the rpc approach. Is it relatively
simple to tell rpm to use a particular server and directory, rather
than /var/lib/rpm?

Comment 3 Jeff Johnson 2004-12-03 00:50:27 UTC
If /var/lib/rpm is per-client, then you probably have some
other choices that will work as well.

Here's the line from /usr/lib/rpm/macros that controls
most of the easily reconfigurable rpmdb options:
    %__dbi_cdb   create cdb mpool mp_mmapsize=16Mb mp_size=1Mb

Put a copy of that line in /etc/rpm/macros on whatever client
you want to experiment with.

The first thing to try is to disable all locking by doing
    %__dbi_cdb   create
in the line above. That should disable almost all the locking
and baggage, and might be useful if you can check that rpm
installs/upgrades/erases are always run as if the client
were "single user"; shared read access on unchanging data
is always possible.

Using sunrpc is perhaps a little easier to describe than how to
rebuild rpm with fcntl locking (yes, basically
    --with-mutex=UNIX/fcntl
is what is needed, but there's something else that needs doing too)

Here's the config for sunrpc access on the client:
    %__dbi_cdb   create cdb mpool mp_mmapsize=16Mb mp_size=1Mb client server=localhost
(that should all be one line).

To start the server, try doing
    rm -f /var/lib/rpm/__db*
    /usr/lib/rpm/rpmdb_svc -v -h /var/lib/rpm

Then on the client, add the config to /etc/rpm/macros, and
try a query like
    rpm -q popt

The server should display something like:
# /usr/lib/rpm/rpmdb_svc -v -h /var/lib/rpm
Added home rpm in dir /var/lib
Running recovery on /var/lib/rpm
/usr/lib/rpm/rpmdb_svc:  Ready to receive requests
Closing dbp id 1102034693
Closing dbp id 1102034692
Closing env id 1102034691

You might have to adjust paths a bit, as both server and client
should have the same path to the database iirc (see the sleepycat
doco for the options, and setup, rpmdb_svc is exactly berkeley_db_svc)

Does one of those solutions "work" for you?
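
As a rough illustration of the three %__dbi_cdb variants described above, the sketch below prints the macro line for each one. The function name dbi_cdb_line and the mode labels (default, nolock, rpc) are hypothetical and exist only for this example; the macro text itself is copied from the lines quoted in this comment.

```shell
#!/bin/bash
# Sketch: print the %__dbi_cdb macro line for one of the configurations above.
# dbi_cdb_line and its mode names are hypothetical labels, not rpm options.
dbi_cdb_line() {
    local base='%__dbi_cdb   create cdb mpool mp_mmapsize=16Mb mp_size=1Mb'
    case "$1" in
        default) echo "$base" ;;                       # stock line from /usr/lib/rpm/macros
        nolock)  echo '%__dbi_cdb   create' ;;         # no locking at all
        rpc)     echo "$base client server=${2:-localhost}" ;;  # sunrpc client
        *)       echo "usage: dbi_cdb_line default|nolock|rpc [server]" >&2
                 return 1 ;;
    esac
}
```

Something like "dbi_cdb_line rpc localhost >> /etc/rpm/macros" on the client would then add the sunrpc configuration. Since the line is appended, any earlier %__dbi_cdb entry in that file should be checked by hand.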


Comment 4 Jeff Johnson 2004-12-03 01:03:50 UTC
Created attachment 107810 [details]
Patch to use fcntl locking rather than posix-mutexes in rpm

This patch worked for rpm built from CVS a couple
weeks ago. Make the equivalent change to the rpm.spec,
and apply the patch, if you want to use rpm
to build rpm packages.

Comment 5 Jeremy Sanders 2004-12-03 12:53:08 UTC
I tried the "%__dbi_cdb   create" approach, but it still corrupts the
rpm database after a while. I don't know why.

The rpmdb_svc idea seems to work fine, provided the server runs on the
machine which actually has the disk with the database; running it on the
client corrupts the database again. Security is the slight problem here,
as the service takes a random port number, so it's hard to firewall.
However the daemon takes a "-P password" option. Unfortunately there
doesn't seem to be a way (glancing at the source) to tell rpm to pass
a specific password to the server. Could this be added to rpm easily?

I haven't tried the fcntl patch yet. I did try using the
"fcntl_locking" flag on __dbi_cdb, but this has no apparent effect.



Comment 6 Jeff Johnson 2004-12-03 15:19:33 UTC
"create" has no locking whatsoever. If you cannot control
for "single user" installs and upgrades through other
means, then don't use that.

(guess) the password is likely for an AES encrypted environment
rather than a client/server protocol challenge. I can
probably add that rather easily if that is what is needed,
haven't looked.

The port used by the sunrpc service probably can be made more
predictable for firewall rules too.

Comment 7 Jeremy Sanders 2004-12-03 15:26:59 UTC
When I was using create, it managed to corrupt the database with just
a single user, using rpm sequentially.

It would be nice to have a port option on the server, and possibly a
password option on rpm (if it's a client/server challenge).

However, I've written a nasty script to grep the /usr/sbin/rpcinfo
output after starting the rpc server, and construct an appropriate set
of iptable rules to block access to the port from the external world.

I haven't been able to break the rpm rpc setup yet :-) It all looks
very nice.
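
The script itself is not included in this report. A rough sketch of the idea, not the actual script, might look like the following. It assumes that the service registers under RPC program number 351457 (check the rpcinfo output on the server rather than trusting this value) and that clients sit on a trusted subnet; the function names and the subnet are hypothetical. The rules are only printed, not applied.

```shell
#!/bin/bash
# Sketch: derive iptables rules for the port that an RPC service registered.
# rpc_ports and firewall_rules are hypothetical helper names.

# Read "rpcinfo -p" output on stdin and print the distinct "proto port"
# pairs registered for the RPC program number given as $1.
rpc_ports() {
    awk -v prog="$1" '$1 == prog && $4 ~ /^[0-9]+$/ { print $3, $4 }' | sort -u
}

# Read "proto port" pairs on stdin and print iptables commands that accept
# traffic from the trusted network ($1) and drop everything else.
firewall_rules() {
    local trusted="$1" proto port
    while read -r proto port; do
        echo "iptables -A INPUT -p $proto --dport $port -s $trusted -j ACCEPT"
        echo "iptables -A INPUT -p $proto --dport $port -j DROP"
    done
}
```

A possible invocation is "/usr/sbin/rpcinfo -p | rpc_ports 351457 | firewall_rules 192.168.1.0/24", with the output reviewed before it is run as root. Because the port changes each time the server starts, the rules would have to be regenerated after every restart.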


Comment 8 Jeff Johnson 2004-12-08 03:13:56 UTC
If you are happy with sunrpc, well, so am I ;-)

There's a couple minor issues that you may notice eventually.

Because the environment (i.e. the __db* files) is opened
opaquely with DB_JOIN by rpm, there's no way (well there
is a way, I've just been too lazy to implement) to tell
the current values of certain flags like INIT_CDB. This shows
up as a noisy (but afaict harmless) message on both client
and server. Here's the server msg seen while running
    rpmdb_svc -v -h /var/lib/rpm
    ...
    /var/lib/rpm: illegal flag specified to DB->cursor

and something similar on the client side.

I can/will perhaps attempt to support sunrpc through
reconfiguration within rpm, as the fix is just a couple
lines of code in rpmdb/db3.c to not set a CDB specific flag
(WRITECURSOR? I fergit ...)

Comment 9 Jeremy Sanders 2004-12-10 12:24:55 UTC
I'm seeing some problems with the rpc server. Sometimes I get messages
from the clients like:

/etc/cron.daily/yum.cron:

rpmdb: Berkeley DB: RPC: Timed out

error: db4 error(-30993) from dbcursor->c_get: DB_NOSERVER: Fatal error, no RPC server
error: error(-30993) getting "Ã_a_­ís~I~\´Ã^_æÃ^?|o^H$$^P^H(&ãþsh
^H" records from Filemd5s index

I think the server sometimes gets overloaded, and fails. I'm using one
server for 32 machines, so this may be a problem.

I tried the fcntl patch, but I think there's a problem with it:

[root@xs6 rpm]# rpm -qa
rpmdb: Berkeley DB library configured to support only DB_PRIVATE environments
error: db4 error(22) from dbenv->open: Invalid argument
error: cannot open Packages index using db3 - Invalid argument (22)
error: cannot open Packages database in /var/lib/rpm
no packages

I suspect this is to do with the #if 0 commented out part in the patch.


Comment 10 Jeff Johnson 2004-12-10 18:50:38 UTC
No, the #if 0 is absolutely critical and correct. But
almost certainly there's more that needs looking at
with the fcntl patch.

Yes, locking is per cursor, so a loaded server with
multiple requests through sunrpc will occasionally
"cross cursors" (i.e. have a deadlock) and the error
message is because DB_WRITECURSOR is not the right
flag to pass when opening a dbcursor against a sunrpc
remote database.

AFAIK, the message is harmless however. And I
know the fix, the test is a bit trickier only because
a proper (imho) db client is supposed to know as
little as possible about the server db configuration,
and so there is no way in the BDB API to test a
dbenv for how it was opened originally, see details
at sleepycat, look for DB_JOIN. Meanwhile, the bits
are in the dbenv structure, all that is needed is to test,
and not pass DB_WRITECURSOR if sunrpc is in use.

Comment 11 Jeff Johnson 2005-02-07 22:14:34 UTC
Retooling this bug as an RFE for sunrpc access to a remote database.

Comment 12 Matthew Miller 2005-04-26 15:09:54 UTC
Fedora Core 2 is now maintained by the Fedora Legacy project for
security updates only. If this problem is a security issue, please
reopen and reassign to the Fedora Legacy product. If it is not a
security issue and hasn't been resolved in the current FC3 updates or
in the FC4 test release, reopen and change the version to match.

Comment 13 Paul Nasrat 2005-09-27 21:03:59 UTC
Basic RPC support using db4 is in upstream RPM, Fedora does not utilise this
functionality.  Issues with this functionality should be discussed on rpm-list
or on rpm-devel-list.