Bug 479915 - rpmdb environment creation is racy
Status: CLOSED DEFERRED
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rpm
Version: 5.2
Hardware: ia64 Linux
Priority: high  Severity: high
Target Milestone: rc
Assigned To: packaging-team-maint
QA Contact: BaseOS QE Security Team
Keywords: Reopened
Depends On:
Blocks: 502912 590060 668957
Reported: 2009-01-13 19:11 EST by Bryan Mason
Modified: 2013-04-08 08:59 EDT

Doc Type: Bug Fix
Last Closed: 2013-03-18 08:16:11 EDT

Description Bryan Mason 2009-01-13 19:11:09 EST
Description of problem:

    When running the commands:

        rm -f /var/lib/rpm/__db.*; 
        rpm -q kernel-debuginfo > /dev/null & 
        rpm -q kernel-debuginfo > /dev/null &

    The following error message will occasionally be displayed:

        rpmdb: Program version 4.3 doesn't match environment version
        error: db4 error(-30974) from dbenv->open: DB_VERSION_MISMATCH: 
        Database environment version mismatch
        error: cannot open Packages index using db3 -  (-30974)
        error: cannot open Packages database in /var/lib/rpm

    or

        error: db4 error(2) from dbenv->open: No such file or directory
        error: cannot open Packages index using db3 - No such file or
        directory (2)
        error: cannot open Packages database in /var/lib/rpm

    Since the error "cannot open packages database in %s" comes from
    rpmtsOpenDB, maybe the db lock race mentioned in that function is
    being hit?

Version-Release number of selected component (if applicable):

    rpm-4.4.2.3-7.el5

How reproducible:

    2-3 out of 100 iterations of the script above.

Steps to Reproduce:
1.  Run 
 
    rm -f /var/lib/rpm/__db.*; 
    rpm -q kernel-debuginfo > /dev/null & 
    rpm -q kernel-debuginfo > /dev/null &

2.  Rinse, repeat.
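These steps can be wrapped in a counting loop to estimate the failure rate. The sketch below is a hypothetical stress script, not part of the original report: it assumes a POSIX shell with rpm installed, reuses the reporter's commands verbatim, and the /tmp/race*.err paths and the 20-iteration count are arbitrary placeholders. Like the reproducer itself, it deletes the transient __db.* environment files on every pass.

```shell
errors=0
i=1
while [ "$i" -le 20 ]; do
    rm -f /var/lib/rpm/__db.*
    rpm -q kernel-debuginfo > /dev/null 2> /tmp/race1.err &
    rpm -q kernel-debuginfo > /dev/null 2> /tmp/race2.err &
    wait    # let both concurrent queries finish
    # Count the iteration as failed if either query wrote to stderr.
    if [ -s /tmp/race1.err ] || [ -s /tmp/race2.err ]; then
        errors=$((errors + 1))
    fi
    i=$((i + 1))
done
echo "failed iterations: $errors / 20"
```

On an affected ia64 machine this should report a nonzero count every few runs, matching the 2-3% rate described above.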
  
Actual results:

    Error messages shown above.

Expected results:

    No errors.

Additional info:

    Only reproducible on the ia64 architecture; not reproducible on
    x86_64.
Comment 15 RHEL Product and Program Management 2009-03-24 14:24:47 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
Comment 16 Issue Tracker 2009-03-26 21:45:46 EDT
Dear Watanabe-san,

----
Judging from the latest BZ comment, this problem does not appear to be
fixed yet. What is the status of this ticket?
Please let us know.
----

Engineering is now considering your situation and the race condition in
the Berkeley DB used by rpm, and wants more information about the
symptoms in order to decide whether we could provide a fix in
initscripts.

-----
   During system boot, the rpmdb files are erased in rc.sysinit.
   Our MW executes "rpm -q kernel-debuginfo" only once during system boot.
   Also, the rpm command is executed by another MW.
   However, because the problem occurs only rarely, we could not collect
the data.

Can you explain this?  What is "MW"?  What causes two rpm commands to
run simultaneously in the customer's boot scripts?

FJ also said, "it is very difficult for the user to fix their script."
Can you explain why it is very difficult?
-----

Sorry to bother you, but please let me know:

1. What are the MWs, and how do they actually run on the customer's
system?
  The issue you provided states "Related Middleware / Application:
None", so please let me know which middleware is involved and how the
race condition between them triggers this problem.

  To make things clearer, please provide the situation/commands, the
names of the middleware if possible, and an example of *HOW* they
affect the system at boot.

2. Why is this difficult on the customer's side?
  Could you explain why it is very difficult to avoid this?
  Perhaps you mean that middleware vendors do not support users
changing startup scripts, or something similar?
  Please provide a more detailed explanation, since we want to
understand your customer's position.

Thanks in advance.

Regards,
Masaki Furuta


Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by mfuruta@redhat.com 
 issue 249003
Comment 24 Issue Tracker 2009-05-22 03:54:58 EDT
Hi Watanabe-san,

We can provide a hotfix package for you, and we will request it right now!
Please let me know if you have any concerns about this!

Here are the details from our engineering:
----
SEG and Engineering Management have agreed to provide a one-off Hotfix for
the customer.  This means that we will provide them with a supported
package that fixes the problem by adding the 'rpm -q' command to
rc.sysinit, but we will not provide this fix to other customers.

There are still some details that need to be worked out, so I'm not sure
how much we can tell Fujitsu just yet, other than that we're working to
come up with a solution that will be acceptable to everybody.
---

Thanks in advance.

Regards,
Masaki Furuta



Comment 25 Issue Tracker 2009-05-22 09:26:42 EDT
Hi Watanabe-san,

OK, I believe I have understood and share your view. And yes, I agree
with you: it would be best for all of us if the fix were released as an
erratum in RHEL 5.4.

However, the situation is very difficult for us, because this is
ultimately an rpm issue, and we cannot easily change behaviour such as
serializing rpmdb open/close, since that needs further work and wider
testing somewhere other than RHEL first. So an rpm-level fix for this
is not going to happen for 5.4.

In addition, anything we put in initscripts would just be working
around the problem. I believe such workarounds might also cause more
trouble and confusion for you and your customers once rpm(db) itself is
fixed and the workaround moves out of the initscripts package into the
rpm package.
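The serialization idea mentioned above can be sketched in shell. This is only an illustration of the concept, not the fix discussed in this bug: the lock and output paths and the run_serialized helper are hypothetical, the echo commands stand in for real rpm invocations, and flock(1) from util-linux is assumed to be available.

```shell
# Illustrative only: serialize concurrent invocations with flock(1) so
# that only one process at a time touches a shared resource (here, an
# output file; in the bug's scenario it would be the rpmdb environment).
LOCK=/tmp/rpmdb-demo.lock    # hypothetical lock file
OUT=/tmp/rpmdb-demo.out      # hypothetical output file
rm -f "$OUT"
run_serialized() {
    # flock blocks until it holds an exclusive lock on $LOCK, then runs
    # the given command; concurrent callers queue up instead of racing.
    flock "$LOCK" sh -c "$1"
}
run_serialized "echo first >> $OUT" &
run_serialized "echo second >> $OUT" &
wait
```

With the wrapper, the two background jobs can no longer enter the critical section at the same time, which is the property the rpmdb environment creation lacks here.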

So, please let me know whether there is any *CHANCE* to release this as
a hotfix at the moment.
You said this problem has hit multiple customers; please let us know
how many there are, and how we could support those customers with a
hotfix.

Thanks in advance.

Regards,
Masaki Furuta



Comment 30 Issue Tracker 2009-06-05 04:15:31 EDT
Hi Watanabe-san,

Sorry for the delay; engineering is still investigating the best way to
solve this, but it really is considered a must-fix issue.

---- Here's comment from engineering: ----
This is a rather important thing to fix, as the same thing that cures the
races allows curing several other annoyances (see comment #4) too. So the
answer to the "will this be fixed" is certainly "yes, this is a
must-fix issue", just the when part is open: I'm still investigating how
to best fix the thing upstream, there's a whole tangle of locks and
several access modes with funny twists and turns to deal with.
----

I'll keep you posted. Also, could you give me more detail on your
request to document this problem in rpm's man page and the knowledge
base? Please be aware that we cannot promise these things, but I
believe it is good for us to know your concerns:

  * What would you like explained to your customer, and how?
    e.g. the symptoms, workarounds, and/or known limitations, or
whatever else you would like to suggest.
  * Are there specific customers with deadlines or special concerns?
    e.g. should it be described on the man page rather than in the
kbase?

Please let me know your thoughts; I will discuss them with engineering.
And feel free to ask me if you need anything else.

Thanks in advance.

Regards,
Masaki Furuta

Comment 35 RHEL Product and Program Management 2009-11-06 13:55:09 EST
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 37 Bill Nottingham 2009-11-11 12:08:35 EST
We are not going to add 30+ seconds (at a minimum!) to every boot to run --rebuilddb.
Comment 53 RHEL Product and Program Management 2011-05-31 10:08:19 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 57 RHEL Product and Program Management 2012-04-02 06:29:18 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.
Comment 61 Florian Festi 2013-03-18 08:16:11 EDT
Fixing this requires reimplementing the locking completely. This would be much too invasive for an RHEL5 update and this rather special use case does not justify the risk of other regressions. Closing. Sorry!
