Description of problem:
When running the commands:

    rm -f /var/lib/rpm/__db.*; rpm -q kernel-debuginfo > /dev/null & rpm -q kernel-debuginfo > /dev/null &

The following error message will occasionally be displayed:

    rpmdb: Program version 4.3 doesn't match environment version
    error: db4 error(-30974) from dbenv->open: DB_VERSION_MISMATCH: Database environment version mismatch
    error: cannot open Packages index using db3 - (-30974)
    error: cannot open Packages database in /var/lib/rpm

or

    error: db4 error(2) from dbenv->open: No such file or directory
    error: cannot open Packages index using db3 - No such file or directory (2)
    error: cannot open Packages database in /var/lib/rpm

Since the error "cannot open packages database in %s" comes from rpmtsOpenDB, maybe the db lock race mentioned in that function is being hit?

Version-Release number of selected component (if applicable):
rpm-4.4.2.3-7.el5

How reproducible:
2-3 out of 100 iterations of the script above.

Steps to Reproduce:
1. Run rm -f /var/lib/rpm/__db.*; rpm -q kernel-debuginfo > /dev/null & rpm -q kernel-debuginfo > /dev/null &
2. Rinse, repeat.

Actual results:
Error messages shown above.

Expected results:
No errors.

Additional info:
Only reproducible in the ia64 architecture. Not reproducible on x86_64.
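The reproduction steps above can be scripted so that failing iterations are counted automatically. The following is only a minimal sketch, not the script used by the reporter: the function names race_once and count_failures and the variables RPM_DB_DIR and RPM_QUERY are hypothetical, and a failure is assumed to be any iteration in which either query prints "cannot open Packages database" on stderr.

```shell
#!/bin/sh
# Illustrative variables (assumptions, not part of rpm itself).
RPM_DB_DIR="${RPM_DB_DIR:-/var/lib/rpm}"
RPM_QUERY="${RPM_QUERY:-rpm -q kernel-debuginfo}"

# Run one iteration: remove the Berkeley DB environment files, then start
# two queries in parallel. Returns 1 if either query reported that the
# Packages database could not be opened, 0 otherwise.
race_once() {
    rm -f "$RPM_DB_DIR"/__db.*
    err1=$(mktemp)
    err2=$(mktemp)
    $RPM_QUERY >/dev/null 2>"$err1" &
    pid1=$!
    $RPM_QUERY >/dev/null 2>"$err2" &
    pid2=$!
    # The exit status is ignored on purpose: rpm -q also exits non-zero
    # when the package is simply not installed.
    wait "$pid1" || true
    wait "$pid2" || true
    failed=0
    if grep -q 'cannot open Packages database' "$err1" "$err2"; then
        failed=1
    fi
    rm -f "$err1" "$err2"
    return "$failed"
}

# Repeat the race the given number of times and print the failure count.
count_failures() {
    iterations=$1
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        race_once || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

On a disposable ia64 test machine, running `count_failures 100` as root would be expected to report a small non-zero count if the race is present. The script deletes the __db.* files on every iteration, so it should not be run on a production system.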
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
Dear Watanabe-san,

----
Judging from the latest BZ comment, this problem does not seem to be fixed. What is the status of this ticket? Please let us know.
----

Engineering is now looking at your situation and at the race condition in the Berkeley DB used by rpm, and wants more information about the symptoms in order to decide whether we can provide a fix in initscripts.

-----
During system boot, the rpmdb file is erased in rc.sysinit. Our MW executes rpm -q kernel-debuginfo only once during system boot. Another MW also executes the rpm command. However, because the problem occurs only rarely, we could not get the data.

Can you explain this? What is "MW"? What occurs to make two rpm commands run simultaneously in the customer's boot scripts?

FJ also said, "it is very difficult for the user to fix their script." Can you explain why it is very difficult?
-----

I am sorry to bother you, but please let me know about the following:

1. What are the MWs, and how do they actually work on the customer's system?
   The issue you provided states "Related Middleware / Application: None." So please let me know which MW (middleware) is involved and how the race between the middleware components leads to this problem. To make things clearer, please provide, if possible, the situation and commands, the names of the middleware, and an example of *HOW* they affect the system at boot.

2. Why is this difficult on the customer's side?
   Could you explain why it is very difficult to avoid this? Did you perhaps mean that the middleware vendors do not support users changing startup scripts, or something similar? Please provide a more detailed explanation, since we want to understand where your customer stands.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client
This event sent from IssueTracker by mfuruta
 issue 249003
Hi Watanabe-san,

We can provide a Hotfix package for you, and I will request it right away. Please let me know if you have any concerns about this.

Here are the details from our engineering team:

----
SEG and Engineering Management have agreed to provide a one-off Hotfix for the customer. This means that we will provide them with a supported package that fixes the problem by adding the 'rpm -q' command to rc.sysinit, but we will not provide this fix to other customers.

There are still some details that need to be worked out, so I'm not sure how much we can tell Fujitsu just yet, other than that we're working to come up with a solution that will be acceptable to everybody.
---

Thanks in advance.

Regards,
Masaki Furuta

This event sent from IssueTracker by mfuruta
 issue 249003
Hi Watanabe-san,

OK, I believe I understand and share your view. Yes, I agree with you, and we know it would be best for all of us if the fix were released as an erratum for RHEL5.4.

However, the situation is very difficult for us. This is ultimately an rpm issue, and we cannot easily change behaviour such as serializing rpmdb open/close, since that needs further work and wider testing somewhere other than RHEL first. So an rpm-level fix for this is not going to happen for 5.4.

In addition, even if we could include something, anything in initscripts would only be a workaround. I believe such workarounds might also cause more trouble and confusion for you and your customers once rpm(db) is fixed and the fix moves out of the initscripts package into the rpm package.

So, please let me know whether there is any *CHANCE* that releasing this as a Hotfix would be acceptable at this moment. If, as you said, this problem hits multiple customers, please let us know how many there are. And how could we support those customers with a Hotfix?

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client
This event sent from IssueTracker by mfuruta
 issue 249003
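For illustration only, serialization can also be approximated outside rpm, in the scripts that call it. The sketch below is an assumption and not the Hotfix or any fix discussed in this ticket: run_serialized and RPM_LOCK_DIR are hypothetical names, and the approach only helps if every boot-time caller of rpm goes through the same wrapper. It relies on mkdir being atomic, so only one caller at a time can hold the lock.

```shell
#!/bin/sh
# Hypothetical lock directory; name and location are assumptions.
RPM_LOCK_DIR="${RPM_LOCK_DIR:-/var/lock/rpm-serialize.lock}"

# Run the given command while holding a directory-based lock.
# mkdir either creates the directory or fails, atomically, so exactly
# one caller proceeds and the others poll until the lock is released.
# Note: if the holder is killed, the stale directory must be removed
# by hand; this sketch does not handle that case.
run_serialized() {
    until mkdir "$RPM_LOCK_DIR" 2>/dev/null; do
        sleep 1
    done
    status=0
    "$@" || status=$?
    rmdir "$RPM_LOCK_DIR"
    return "$status"
}
```

A caller would then use `run_serialized rpm -q kernel-debuginfo` instead of invoking rpm directly, so that the first invocation finishes recreating the __db.* files before the second one opens the database.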
Hi Watanabe-san,

Sorry for the delay. Engineering is still investigating the best way to solve this, but it really is considered a must-fix issue.

----
Here's comment from engineering:
----
This is a rather important thing to fix, as the same thing that cures the races allows curing several other annoyances (see comment #4) too. So the answer to the "will this be fixed" is certainly "yes, this is a must-fix issue", just the when part is open: I'm still investigating how to best fix the thing upstream, there's a whole tangle of locks and several access modes with funny twists and turns to deal with.
----

I'll keep you posted. Also, could you tell me in more detail what you would like regarding adding a description of this problem to rpm's man page and the kbase? Please be aware that we cannot promise these things, but I believe it is useful for us to know your concerns:

* What would you like explained to your customers, and how? E.g. the symptoms and workarounds, and/or a statement of the limitation? Do you have a suggestion?

* Are there specific customers with deadlines or special concerns for this? E.g. that it should be described in the man page rather than in the kbase?

Please let me know your thoughts, and I will discuss them with engineering. Feel free to ask me if there is anything else.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client
This event sent from IssueTracker by mfuruta
 issue 249003
This request was evaluated by Red Hat Product Management for inclusion, but this component is not scheduled to be updated in the current Red Hat Enterprise Linux release. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?".
We are not going to add 30+ seconds (at a minimum!) to every boot to run --rebuilddb.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Fixing this requires reimplementing the locking completely. This would be much too invasive for an RHEL5 update and this rather special use case does not justify the risk of other regressions. Closing. Sorry!