Bug 523698
Summary: | Needless incompatibility across distros by DB_HASH | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jan Kratochvil <jan.kratochvil> |
Component: | rpm | Assignee: | Panu Matilainen <pmatilai> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | ffesti, herrold, jnovy, msalter, n3npq, pmatilai, yersinia.spiros, zeekec |
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2009-09-17 12:27:59 UTC | Type: | --- |
Description
Jan Kratochvil
2009-09-16 13:29:54 UTC
For RPM 4.4.2 the backport was already rejected upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=464752

(In reply to comment #1)
> For RPM 4.4.2 the backport was already rejected upstream
>
> https://bugzilla.redhat.com/show_bug.cgi?id=464752

Bug 464752 was about the __db.* files. This bug is about the Packages file. Different problem.

Bug 464752 is more easily worked around, which I do in cron mock updates with:
rpm -r $i/root --rebuilddb
(`rm -f $i/root/var/lib/rpm/__db.*` could possibly be enough, I do not know.)

I was going to submit it next, but thanks for the notice that it was already WONTFIXed.

The symptoms are different but it is not a different problem.

The whole issue is that Berkeley DB provides backward but not forward compatibility. So when you have a mixture of Berkeley DB versions, incompatibilities arise.

The problem in #464752 is quite straightforward: there's a version stamp in a file which causes an error return (EINVAL in older releases, DB_VERSION_MISMATCH in newer). The patch automates the corrective action, removing the file that has the wrong version stamp within.

The problem here is that the hash version changed, and switching to DB_BTREE instead of DB_HASH avoids the problem (note that there is a non-trivial one-time cost switching from DB_HASH -> DB_BTREE everywhere). There are a few more details ensuring that rpm itself can open an rpmdb transparently, dealing with both DB_BTREE and DB_HASH as found, not as configured.

But the fundamental problem here and in #464752 is the same: ensuring transparent interoperation when there are multiple versions of Berkeley DB accessing a single rpmdb.

That btree happens to work here is just getting lucky with the format not changing across these particular versions, not because it's somehow inherently "more compatible" than hashes. Btree is versioned just like hash is and can change incompatibly in any new BDB version.
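Both access methods record a magic number and a format version in the first (metadata) page of the database file, which is what makes "as found, not as configured" detection possible and also why either format can change between releases. A minimal Python sketch of such a probe follows. It assumes the Berkeley DB 4.x metadata layout (32-bit magic at byte offset 12, 32-bit version at offset 16) and the magic constants from db.h; the function names are illustrative and this is not rpm's actual code.

```python
import struct

# Magic numbers as defined in Berkeley DB's db.h.
DB_BTREEMAGIC = 0x053162
DB_HASHMAGIC = 0x061561
_MAGIC_NAMES = {DB_BTREEMAGIC: "btree", DB_HASHMAGIC: "hash"}


def probe_bdb_header(header: bytes):
    """Return (access_method, format_version), or None if not recognised.

    Assumption: the metadata page starts with an 8-byte LSN and a 4-byte
    page number, followed by the 32-bit magic and version fields.
    """
    if len(header) < 20:
        return None
    # By default the file is written in the creating host's byte order,
    # so try little-endian first and then big-endian.
    for endian in ("<", ">"):
        magic, version = struct.unpack_from(endian + "II", header, 12)
        if magic in _MAGIC_NAMES:
            return _MAGIC_NAMES[magic], version
    return None


def probe_bdb_file(path):
    """Probe the first 20 bytes of a database file such as Packages."""
    with open(path, "rb") as fh:
        return probe_bdb_header(fh.read(20))
```

A caller would compare the returned version against what the locally linked Berkeley DB supports; the probe only reports what is on disk and says nothing about whether the file can be opened.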
WONTFIX - rpm might switch to btree by default for other reasons (such as potentially better performance) at some point, but not because of false hopes of better compatibility. As Jeff points out, there are numerous things to take care of besides just changing the default configuration, and while a future rpm version might be able to deal with on-the-fly btree/ht detection/conversion, there's little chance that such code would end up in existing RHEL and even less chance for EOL Fedora versions.

Of course you're free to configure your own systems and chroots to use btree instead of hash while the luck with compatibility lasts.

Lucky? Not using DB_HASH because it has a known incompatibility is "lucky"?

Sure, all the formats are versioned and can change whenever necessary. That's also true for RPM: surely you should have changed the version format when you decided to use SHA256 rather than MD5 in *.rpm packages. But perhaps you just got "lucky".

I pointed out that there is a one-time cost in converting. Well duh. I also pointed out another nicety that is "optional".

But go ahead, cite me to claim WONTFIX for a known-to-work change that avoids a luser incompatibility. Have fun!

(In reply to comment #4)
> WONTFIX -
> rpm might switch to btree by default for other reasons (such as
> potentially better performance)

I expect hash was chosen because rpm does not need to traverse the entries in sorted order. In that case btree is slower (O(log(n))) than hash (O(1)). It is just the current luck of better compatibility that may be worth the change (while the performance degradation may not be measurable).

> while a future rpm version might be able to deal with on-the-fly btree/ht
> detection/conversion, there's little chance that such code would end up in
> existing RHEL and even less chance for EOL Fedora version.

This is an invalid argument. Current (F12) db4 btree is still compatible with the existing epel-4 btree format.
I filed this bug for Rawhide, not for F9 or RHEL4. An rpm change for F13 was the intended target of this bug, which would already ease epel-4 maintenance in several months.

> Of course you're free to configure your own systems and chroots to use btree
> instead of hash while the luck with compatibility lasts.

I already work around rpm4 with regular --rebuilddb (Bug 464752) and occasional db*_{dump,load} (this bug). Suggesting workarounds is not the goal of a package maintainer assignment.

Re comment #6:

Actually, the reason for DB_HASH is hysterical, not O(1) performance. db-1.85 did not have a btree implementation a decade ago.

Citing O(1) or O(log(n)), while true, misses real-world issues. E.g. RPM must look up file paths in two indices because the data is not rationally indexed, and a string beginning with '/' might be in either the Providename or the Basenames table. RPM is often forced to do sequential access (true for rpm -qa, e.g.), and does too many redundant accesses.

The above issues largely obliterate any performance benefit from using DB_HASH or DB_BTREE.

It's rather easy to do the benchmarks: just add --stats to any RPM command and compare using DB_BTREE and DB_HASH. I did due diligence when I switched from DB_HASH to DB_BTREE @rpm5.org and there was no measurable performance gain from using either DB_BTREE or DB_HASH.

There are other performance gains from improved access on certain paths. E.g. rpm-5.2.0 @rpm5.org has a measured (with callgrind and --stats) 14.6x speed-up from changing perhaps 50 lines of code on path lookups.

But clearly I got "lucky" and just guessed which lines of code to change.
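The db*_{dump,load} workaround mentioned earlier in this thread amounts to dumping the database with the utilities of the Berkeley DB release that wrote it and reloading the dump with the utilities of the release that has to read it. A rough Python sketch of that pipeline follows. It assumes versioned dump/load utilities are installed; the tool names and paths in the example invocation are hypothetical placeholders, not taken from the thread.

```python
import subprocess


def dump_load_commands(dump_tool, load_tool, src, dst):
    """Build the argv lists for `<dump_tool> src | <load_tool> dst`."""
    return [dump_tool, src], [load_tool, dst]


def reload_database(dump_tool, load_tool, src, dst):
    """Pipe a dump of src into a newly loaded dst; raise on failure."""
    dump_cmd, load_cmd = dump_load_commands(dump_tool, load_tool, src, dst)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    load = subprocess.Popen(load_cmd, stdin=dump.stdout)
    # Close our copy of the pipe so the dumper gets SIGPIPE if the
    # loader exits early.
    dump.stdout.close()
    load_rc = load.wait()
    dump_rc = dump.wait()
    if dump_rc != 0 or load_rc != 0:
        raise RuntimeError(f"dump exited {dump_rc}, load exited {load_rc}")


# Hypothetical invocation (tool names and paths are assumptions):
# reload_database("db47_dump", "db42_load", "Packages", "Packages.new")
```

Writing to a separate output file and swapping it in only after both tools exit cleanly keeps a failed conversion from clobbering the original Packages file.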