Description of problem:
=======================
The 4.1.1-1.8x source RPM was compiled on one of our RedHat 7.2 systems, and seems to work fine until I run two copies of script1.pl and two copies of script2.pl (see "Steps to Reproduce" below). After just a few seconds of normal operation (the scripts should generate a large number of messages about contention for the rpm database), the scripts start to log lots of error messages indicating that the rpm database has been corrupted (for details see "Actual results" below). Sometimes, one of the rpm operations hangs (for a sample backtrace, see "Actual results" below).

Version-Release number of selected component (if applicable):
=============================================================
rpm-4.1.1-1.8x.src.rpm

How reproducible:
=================
Every time.

Steps to Reproduce:
===================
1. Replace SOME_SYSTEM_RPM_V1 and SOME_SYSTEM_RPM_V2 in script2.pl (see below) with the paths of any two versions of a system rpm which exist in your local filesystem (the contents of the package shouldn't matter).
2. Run the scripts defined below as follows:

    ./script1.pl >& one1.out&
    ./script1.pl >& one2.out&
    ./script2.pl >& two1.out&
    ./script2.pl >& two2.out&

script1.pl:
-----------
    #!/usr/bin/perl
    # Query the rpm database in a tight loop.

    # On SIGINT, print the output of the last query and exit.
    sub catch_zap {
        my $signame = shift;
        print $output;
        die;
    }
    $SIG{INT} = \&catch_zap;

    while (1) {
        $output = `rpm -qa`;
        $now = scalar localtime(time());
        $counter++;
        print "$counter ($now)\n";
    }

script2.pl:
-----------
    #!/usr/bin/perl
    # Downgrade and then upgrade the same package in a tight loop.

    # On SIGINT, print the output of the last two rpm runs and exit.
    sub catch_zap {
        my $signame = shift;
        print "output1=[$output1]\n";
        print "output2=[$output2]\n";
        die;
    }
    $SIG{INT} = \&catch_zap;

    while (1) {
        ($output1, $output2) = ('-', '-');
        $counter++;
        $now = scalar localtime(time());
        $output1 = `rpm -U --oldpackage SOME_SYSTEM_RPM_V1`;
        $output2 = `rpm -U SOME_SYSTEM_RPM_V2`;
        print "$counter ($now)\n";
    }

Actual results:
===============
After a few seconds (less than a minute on a 2GHz Celeron) of "normal" failure messages about contention for the rpm database, weird error messages begin to appear in the output files, e.g.:

    rpmdb: fatal region error detected; run recovery
    error: db4 error(-30982) from db->sync: DB_RUNRECOVERY: Fatal error, run database recovery

Or this (though these lines may actually have been produced by one of the earlier versions of rpm):

    rpmdb: /var/lib/rpm/Packages: unexpected file type or format
    error: cannot open Packages index using db3 - Invalid argument (22)

Sometimes (typically within a couple of minutes), one of the rpm processes hangs. In this case you will notice that the output file for the associated Perl script ceases to be updated.
The backtrace of a hung rpm process typically looks something like this:

    #0  0x08117d90 in __memp_fget_rpmdb ()
    #1  0x080f04f0 in __db_free_rpmdb ()
    #2  0x080f105a in __db_doff_rpmdb ()
    #3  0x080ffa55 in __ham_del_pair_rpmdb ()
    #4  0x080f9f7d in __ham_c_del ()
    #5  0x080e8770 in __db_c_del_rpmdb ()
    #6  0x080905df in db3cdel ()
    #7  0x0808cf61 in rpmdbRemove ()
    #8  0x0806374e in rpmpsmStage ()
    #9  0x08062f5e in rpmpsmStage ()
    #10 0x080632a4 in rpmpsmStage ()
    #11 0x0807dc92 in rpmtsRun ()
    #12 0x0806f24a in rpmInstall ()
    #13 0x08049554 in main ()
    #14 0x08155672 in __libc_start_main ()

Sometimes, both versions of the system package which is upgraded & downgraded by script2.pl end up in the rpm database at the same time.

Sometimes, the rpm database gets so mangled that the magic number on the /var/lib/rpm/Packages file becomes corrupted, and 'file' reports the Packages file as 'data' instead of 'Berkeley DB'.

Expected results:
=================
The scripts should produce error messages about contention for the rpm database, but should never freeze for more than a few seconds. It should be possible to run the scripts for a whole week (disk space for output files permitting) without any rpm processes hanging or the rpm database becoming corrupted, or two versions of the same package being added to the rpm database.

Additional information:
=======================
I've tried this same experiment with several versions of rpm. All have failed, but in different ways:

    RPM Version   Behavior of my 4 Perl scripts
    -----------   -----------------------------
    4.0.4x        rpm hangs after ~20-30 minutes
    4.0.5         rpm hangs after ~20-30 minutes (deadlock)
    4.1.1         rpm database becomes corrupted (and rpm sometimes hangs)

For details re 4.0.5 and 4.0.4x, see Bug 11480. Bug 89728 and Bug 12443 look similar, but that's based only on a superficial reading of their notes.
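Two of the symptoms described under "Actual results" (an output file that stops being updated, and a Packages file that 'file' no longer recognises as Berkeley DB) could be checked mechanically while the test runs. The following is only a rough sketch: the function names and the idea of a threshold in minutes are assumptions for illustration, and it relies on a find that supports -mmin (GNU and BSD find do).

```shell
#!/bin/sh
# Sketch only: function names and thresholds are illustrative assumptions.

# Succeed if FILE was last modified more than MINUTES minutes ago,
# i.e. the script writing to it has probably stalled.
is_stalled() {
    file_path=$1
    minutes=$2
    # find prints the path only when its modification time is old enough.
    [ -n "$(find "$file_path" -prune -mmin +"$minutes" 2>/dev/null)" ]
}

# Succeed if file(1) still identifies the given file as a Berkeley DB file.
# Defaults to the Packages file named in the report.
packages_magic_ok() {
    file "${1:-/var/lib/rpm/Packages}" 2>/dev/null | grep -q 'Berkeley DB'
}
```

For example, after sourcing the functions, a check such as `for f in one1.out one2.out two1.out two2.out; do is_stalled "$f" 2 && echo "$f looks hung"; done` followed by `packages_magic_ok || echo "Packages magic number looks wrong"` could be repeated every minute or so.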
Well, we meet again ;-) But your script is still rather unrealistic.

Your database can be recovered by doing

    cd /var/lib/rpm
    mv Packages Packages-ORIG
    db_dump Packages-ORIG | db_load Packages
    rpm --rebuilddb -vv

Use rpmdb_dump and rpmdb_load in /usr/lib/rpm if you have them.

Locks can be displayed by doing

    cd /var/lib/rpm
    db_stat -CA

Use /usr/lib/rpm/rpmdb_stat if you have that.

What is the output?
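The recovery and lock-display steps above, including the fallback between the plain Berkeley DB tools and the copies under /usr/lib/rpm, could be wrapped roughly as follows. This is a sketch under assumptions, not a tested procedure: RPM_HELPER_DIR, RPMDB_DIR, pick_tool, recover_rpmdb and show_rpmdb_locks are names invented here, and the recovery function should only be run as root with no other rpm process active.

```shell
#!/bin/sh
# Sketch only: variable and function names are illustrative assumptions.

RPM_HELPER_DIR=${RPM_HELPER_DIR:-/usr/lib/rpm}
RPMDB_DIR=/var/lib/rpm

# Print the rpm-bundled tool (rpmdb_dump, rpmdb_load, rpmdb_stat) if it
# exists in RPM_HELPER_DIR, otherwise the plain name (db_dump, db_load, ...).
pick_tool() {
    suffix=$1                       # dump, load or stat
    if [ -x "$RPM_HELPER_DIR/rpmdb_$suffix" ]; then
        printf '%s\n' "$RPM_HELPER_DIR/rpmdb_$suffix"
    else
        printf '%s\n' "db_$suffix"
    fi
}

# Reload Packages from a dump of the damaged file, then rebuild the indexes.
# Runs in a subshell so the caller's working directory is left alone.
recover_rpmdb() (
    dump=$(pick_tool dump)
    load=$(pick_tool load)
    cd "$RPMDB_DIR" || exit 1
    mv Packages Packages-ORIG || exit 1
    "$dump" Packages-ORIG | "$load" Packages || exit 1
    rpm --rebuilddb -vv
)

# Display the lock table with the -CA flags given in the comment.
show_rpmdb_locks() (
    stat_tool=$(pick_tool stat)
    cd "$RPMDB_DIR" || exit 1
    "$stat_tool" -CA
)
```

Nothing runs when the file is sourced; call `show_rpmdb_locks` to capture the lock output, or `recover_rpmdb` to attempt the rebuild.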
This bug is invalid, because my tests were never run with rpm version 4.1.1 after all. It turns out that my organization is committed to an older (patched) version of glibc, which unfortunately rules out rpm versions 4.1.* and 4.2.*. The tests that I thought were being run with rpm version 4.1.1 (also mentioned in Bug 11400) were in fact being run with version 4.0.5. Oops.

Here's my corrected table o' knowledge:

    rpm version   glibc version   behavior of my Perl scripts
    -----------   -------------   ---------------------------
    4.0.4         2.2             rpm hangs after ~20-30 minutes (deadlock)
    4.0.5         2.2             rpm database becomes corrupted (and rpm sometimes hangs)
    4.1.1         2.3             ?

Apologies for the confusion and thanks, Jeff, for your time. I would of course be interested to know what happens when my Perl scripts run for an hour with the latest & greatest version of rpm, but I won't have time to build a machine for this purpose for at least the next couple of months. Please let me know if someone else gets a chance to try this experiment.