Bug 486423

Summary: rpmdb locking broken by other-arch rpmquery
Product: [Fedora] Fedora
Component: rpm
Version: 13
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: low
Priority: low
Reporter: Jan Kratochvil <jan.kratochvil>
Assignee: Panu Matilainen <pmatilai>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ffesti, jnovy, n3npq, pmatilai, tuju
Keywords: Reopened
Doc Type: Bug Fix
Last Closed: 2011-06-27 10:06:34 EDT
Bug Depends On: 480510

Description Jan Kratochvil 2009-02-19 12:40:23 EST
Running rpmquery from rpm.i386 as root on a system with rpm.x86_64 is either refused, or rpm.i386 makes the rpmdb unusable for later runs of rpm.x86_64.  Neither outcome is right.


I tried to use rpm.i386 on x86_64 F10 and it was refused:
# /tmp/rpm-4.6.0-1.fc10.i386/usr/bin/rpmquery -q rpm
rpmdb: /var/lib/rpm/__db.001: unable to find environment
error: db4 error(2) from dbenv->open: No such file or directory
error: cannot open Packages index using db3 - No such file or directory (2)
error: cannot open Packages database in /var/lib/rpm
rpmdb: /var/lib/rpm/__db.001: unable to find environment
error: db4 error(2) from dbenv->open: No such file or directory
error: cannot open Packages database in /var/lib/rpm
package rpm is not installed

As the error was printed after rpmquery.i386 examined `/var/lib/rpm/__db.001', I tried running the normal x86_64 rpm:
# rpm --rebuilddb
# _
Everything was OK and /var/lib/rpm/__db.* files got deleted as usual.

Then I ran the i386 rpmquery again on Fedora 10 x86_64:
# /tmp/rpm-4.6.0-1.fc10.i386/usr/bin/rpmquery -q rpm
rpm-4.6.0-1.fc10.x86_64
This time it worked fine!

But now the x86_64 rpm started to fail:
# rpm -qv rpm
rpmdb: munmap: Invalid argument
rpmdb: munmap: Invalid argument
<hang>
<later:>
rpmdb: unable to join the environment
error: db4 error(11) from dbenv->open: Resource temporarily unavailable
error: cannot open Packages index using db3 - Resource temporarily unavailable (11)
error: cannot open Packages database in /var/lib/rpm
rpmdb: munmap: Invalid argument

After `rm -f /var/lib/rpm/__db.*' it started working again.

(Bug 480510 Comment #3)
> the *database* is 32/64bit portable, the db environment is not. Going through
> the environment is the means to safe concurrent access to the db,

Therefore I am asking for a fix to the biarch problem in the environment locking.

A more user-visible problem is that gdb.ppc64 using rpm-libs.ppc64 does not work, because the default rpm is rpm.ppc (32-bit).

A possible workaround for GDB is to run popen("rpmquery ...") instead of making librpm calls.
Comment 1 Panu Matilainen 2009-02-20 02:16:38 EST
Sharing the environment across biarch is not something that can be fixed as such: the environment is deep BDB internals and contains inherently architecture-dependent data, like pointers to memory IIRC. There are possibilities, like making librpm stop using the environment and force read-only mode when it spots an incompatible environment, or gdb could fiddle with the db configuration macros to disable the environment before it calls librpm, but...

For what little GDB needs from rpm, making it use popen("rpm -q --qf  ...") would not be a bad idea at all, quite the contrary. It would avoid this issue and it'd make GDB independent of rpm versions.
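
A minimal sketch of that approach, written in Python for brevity rather than GDB's C. It assumes the caller wants to know which package owns a given file and uses only the standard rpm options -q, --qf and -f; the function names and the injectable `run` parameter are illustrative, not part of GDB or rpm.

```python
import subprocess

# Query format: one "name-version-release.arch" line per matching package.
# The backslash-n is passed literally; rpm expands it itself.
QUERY_FORMAT = "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\\n"


def build_query_argv(path, rpm_binary="rpm"):
    """Build the argv that asks rpm which package owns `path`."""
    return [rpm_binary, "-q", "--qf", QUERY_FORMAT, "-f", path]


def owning_packages(path, run=subprocess.run):
    """Return the N-V-R.A strings of the packages owning `path`.

    Returns an empty list if rpm is missing or reports an error, so the
    caller never touches librpm or the BDB environment directly.
    `run` is injectable only to make the sketch testable (an assumption).
    """
    try:
        result = run(build_query_argv(path), capture_output=True,
                     text=True, check=False)
    except OSError:
        return []  # rpm binary missing or not executable
    if result.returncode != 0:
        return []  # e.g. the file is not owned by any package
    return [line for line in result.stdout.splitlines() if line]
```

Because the query runs in whichever rpm binary is on the PATH, the child process always matches the architecture of the existing environment, whatever the architecture of the caller.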
Comment 2 Jan Kratochvil 2009-02-20 05:32:44 EST
Why do the /var/lib/rpm/__db.* files still exist after rpm finishes successfully?

For example `rpm --rebuilddb' does not leave any /var/lib/rpm/__db.* files there and it still works.  Also when there was some futex problem in the past the universal solution was to `rm -f /var/lib/rpm/__db.*' - and no data were lost.

It seems to me:
(1) Arch incompatibilities exist only in the /var/lib/rpm/__db.* files.
(2) /var/lib/rpm/__db.* files are some lock files which have no use between
    different rpm sessions.

Or was it just luck that rpmquery.i386 worked in my specific case against the x86_64 /var/lib/rpm/ after the /var/lib/rpm/__db.* files had been deleted (by --rebuilddb)?
Comment 3 Panu Matilainen 2009-02-21 03:59:14 EST
Yes, the db itself is arch-independent, only the environment (the __db.* files) is not.

Why rpm leaves the environment around after use is a somewhat more complicated issue (it wasn't my decision, this is just the way it's "always" been):

Not removing is ok and almost recommended according to BDB documentation: "Calling DB_ENV->remove should not be necessary for most applications because the Berkeley DB environment is cleaned up as part of normal database recovery procedures." I suppose there's a non-zero cost to setting up the environment instead of just using an existing one, but I haven't measured it.

Perhaps the more practical issue is that the code to figure out the correct flags on opening the rpmdb is racy when the environment is not present. So leaving the environment around sweeps this issue under the carpet much of the time.

I actually do think that removing the environment after use is the sanest way to deal with bug 455836 and its variants; it just requires the races to be eliminated, either by a) rewriting the somewhat twisty opening logic in a non-racy way, or b) serializing the open+close paths with an extra lock. b) is probably the easier short-term fix.

In any case, I still think calling rpmquery instead of using librpm API would be beneficial for GDB: the same code would work on practically any rpm version you can dig up, it'd be one less library dependency and it would avoid the biarch issue.
Comment 4 Jan Kratochvil 2009-02-21 04:37:32 EST
(In reply to comment #3)
> I still think calling rpmquery instead of using librpm API would be beneficial for GDB:

The era of shell scripting with pipes has hopefully been gone for 20+ years.

> the same code would work on practically any rpm version you can dig up,

What is missing is a configure.ac check for librpm, which could also check whether the API is compatible for the functions in use.  There is already a pending patch for yet another API, that of rpm5.org.

> it'd be one less library dependency

I was already considering using dlopen() there for this purpose.

> it would avoid the biarch issue.

This is an rpm problem.


Another possibility, if the librpm API is not considered stable, is to provide a stable DBUS rpm API.  The system applications database should IMO have some public API.  It would hopefully also solve the biarch problems.
Comment 5 Panu Matilainen 2009-02-24 01:22:12 EST
Heh, I'm quite fond of scripting with pipes. But sure, from C an API is nicer... it's just that librpm API access to rpmdb gets you involved with BDB peculiarities more than most API consumers would want to - this is a good example of that.

For consumers like gdb and net-snmp who only want to perform some fairly basic queries, a DBUS query API to the rpmdb would indeed provide a nice insulation layer from the BDB imposed quirks that librpm cannot fully encapsulate.
Comment 6 Jeff Johnson 2009-02-25 01:19:13 EST
The best way to fix this problem is to have rpm create a store
of file path -> N-V-R.A information so that detached -debuginfo
symbols can be found by gdb when needed. It's easier for rpm
to push specific information when installing -debuginfo packages.

Any other approach
   0) linking -lrpm (as done currently)
   1) piping to rpm --query
   2) dlopen'ing multi-arch libraries as needed
   3) removing the dbenv
so that gdb can pull information from an rpmdb forces
gdb to participate in concurrent locking schemes, and will
have failure modes.

Been there, done that, with all of the approaches above:
   0) net-snmp currently does -lrpm
   1) net-snmp used to run rpm --query
   2) Red Carpet tried dlopen with rpm-4.0.2, rather a disaster
   3) removing the dbenv opens races for every use of rpm.

The HR-MIB in net-snmp needs to know N-V-R.A and install time,
and that information is now pushed into /var/cache/hrmib at the same
time that a package is registered into an rpmdb. A similar approach with
gdb is better than any other option.
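
As an illustration of the push model described above, here is a minimal Python sketch. The store layout (one JSON file per package in a directory) and all function names are assumptions made for this example; they do not describe how /var/cache/hrmib is actually laid out.

```python
import json
import os
import tempfile


def record_package(store_dir, nvra, paths):
    """Install-time side: record which file paths the package `nvra` owns."""
    os.makedirs(store_dir, exist_ok=True)
    # Write to a temporary file and rename it, so readers never see a
    # half-written entry and need no locking of their own.
    fd, tmp = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"nvra": nvra, "paths": sorted(paths)}, f)
    os.replace(tmp, os.path.join(store_dir, nvra + ".json"))


def forget_package(store_dir, nvra):
    """Erase-time side: drop the entry for `nvra` if it exists."""
    try:
        os.remove(os.path.join(store_dir, nvra + ".json"))
    except FileNotFoundError:
        pass


def lookup(store_dir, path):
    """Reader side: return the N-V-R.A owning `path`, or None.

    The reader only reads plain files and never opens the rpmdb.
    """
    try:
        names = sorted(os.listdir(store_dir))
    except FileNotFoundError:
        return None
    for name in names:
        if not name.endswith(".json"):
            continue  # skip temporary files still being written
        try:
            with open(os.path.join(store_dir, name)) as f:
                entry = json.load(f)
        except FileNotFoundError:
            continue  # package was erased between listdir and open
        if path in entry["paths"]:
            return entry["nvra"]
    return None
```

The linear scan keeps the sketch short; a real store would presumably index by path.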
Comment 7 Bug Zapper 2009-11-18 07:46:10 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 8 Bug Zapper 2009-12-18 02:58:32 EST
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
Comment 9 Jan Kratochvil 2010-01-03 12:07:24 EST
/tmp/rpm-4.7.2-1.fc12.i686/usr/bin/rpmquery -q rpm
rpmdb: Build signature doesn't match environment
error: db4 error(-30971) from dbenv->open: DB_VERSION_MISMATCH: Database environment version mismatch
error: cannot open Packages index using db3 -  (-30971)
error: cannot open Packages database in /var/lib/rpm
rpmdb: Build signature doesn't match environment
error: db4 error(-30971) from dbenv->open: DB_VERSION_MISMATCH: Database environment version mismatch
error: cannot open Packages database in /var/lib/rpm
package rpm is not installed
Comment 10 Bug Zapper 2010-11-04 07:29:35 EDT
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.
Comment 11 Bug Zapper 2010-12-05 02:00:27 EST
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.
Comment 12 Jan Kratochvil 2010-12-05 19:25:56 EST
Re-verified on rpm-4.8.1-2.fc13.x86_64 vs. rpm-4.8.1-2.fc13.i686.
Comment 13 Bug Zapper 2011-06-02 14:15:15 EDT
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.
Comment 14 Bug Zapper 2011-06-27 10:06:34 EDT
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.