Suppose package A contains /usr/share/doc/foo. Now A is split into A and B, and /usr/share/doc/foo moves to B. If you install B and afterwards update A, /usr/share/doc/foo will be gone: because of the skipDirs() call, the update of A doesn't check whether foo belongs to another package. skipDirs() should not be called on package removal. (I propose that rpmdbFindFpList gets another parameter, char **skipdirs. That would be much cleaner than hardcoding a list of directories in a database function...)
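For readers following along, the failure mode described above can be modelled in a few lines of Python. This is only a toy sketch, not rpm code: `SKIP_DIRS`, `in_skipped_dir` and `files_to_erase` are hypothetical names, and the "database" is a plain dict from path to the other packages that own it.

```python
# Hypothetical stand-in for the hardcoded skip list; not rpm's actual list.
SKIP_DIRS = ("/usr/share/doc",)


def in_skipped_dir(path, skip_dirs=SKIP_DIRS):
    """Return True if path lies under one of the skipped directories."""
    return any(path.startswith(d + "/") for d in skip_dirs)


def files_to_erase(old_only_files, other_owners, apply_skip):
    """Return the paths that would be deleted when the old package goes away.

    old_only_files: paths owned by old A that the new A no longer ships
    other_owners:   dict mapping path -> set of other installed owners
    apply_skip:     when True, paths in skipped directories are never looked
                    up, so shared ownership goes unnoticed (the reported bug)
    """
    doomed = []
    for path in old_only_files:
        if apply_skip and in_skipped_dir(path):
            owners = set()  # lookup skipped: no other owner is ever seen
        else:
            owners = other_owners.get(path, set())
        if not owners:
            doomed.append(path)
    return doomed


db = {"/usr/share/doc/foo": {"B"}}       # B was installed first
old_a_only = ["/usr/share/doc/foo"]      # moved out of A by the split

print(files_to_erase(old_a_only, db, apply_skip=True))   # B's file is deleted
print(files_to_erase(old_a_only, db, apply_skip=False))  # B's file survives
```

With the skip applied on erase the file owned by B is scheduled for deletion; without it the ownership by B is seen and nothing is removed.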
Oooh, nice catch, thank you. Yes, hardwiring compiled-in paths in a database function is dead wrong. However, the solution to the problem runs much, much deeper than adding an argument to rpmdbFindFpList. E.g. disk accounting for all the directories in skipDirs[] is wrong. And even that isn't the real problem. Try the following: 1) Delete the line _skip("/lib/modules"). 2) Grab 16 or so different kernel packages. 3) Install the kernels one by one, monitoring the memory footprint using top. I've caught rpm at a ~1GB memory footprint with ~20 kernels installed. The memory footprint scales like n**3 (my guess) in the number of duplicate base names which, in the case of kernel packages with approx. 18K identically named files, gets quite large in a hurry. So the real fix is more than passing char **skipDirs to rpmdbFindFpList(); the actual algorithm in rpmdbFindFpList() is busted when packages contain large numbers of duplicate base names.
Yes, that's why SuSE has the tagged fileindex patch. It deals nicely with duplicate basenames but breaks fingerprint handling a bit (if directories are symlinked). Btw, as I wrote in #103204, you really should add the rpmdbFindFpListExclude function; it doesn't change semantics and makes updates a bit faster.
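As a rough illustration of why duplicate base names hurt, here is a toy Python model that assumes an index keyed by base name only and counts how many index entries each new kernel install has to examine. The function name and the path layout are made up for the example. It does not model rpm's real data structures and does not try to reproduce the n**3 guess above; it only shows that the candidate set grows with every kernel already installed.

```python
from collections import defaultdict
import posixpath


def candidate_lookups(num_kernels, files_per_kernel):
    """Count index entries returned while installing kernels one by one.

    Every kernel k ships files_per_kernel files whose base names repeat
    across kernels while the directories differ. The index is keyed by
    base name only, so each lookup returns one entry per installed kernel.
    """
    index = defaultdict(list)
    per_install = []
    for k in range(num_kernels):
        paths = [f"/lib/modules/{k}/mod{i}.o" for i in range(files_per_kernel)]
        # Candidates that must be examined before this kernel is added.
        hits = sum(len(index[posixpath.basename(p)]) for p in paths)
        per_install.append(hits)
        for p in paths:
            index[posixpath.basename(p)].append((k, p))
    return per_install


print(candidate_lookups(4, 3))  # [0, 3, 6, 9]
```

In this model the candidates per install grow linearly with the number of kernels already present, so the total work over n installs is already quadratic before any per-candidate cost is counted.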
Partial solutions are well known. I await a complete and final solution.
*** Bug 164883 has been marked as a duplicate of this bug. ***
*** Bug 164518 has been marked as a duplicate of this bug. ***
*** Bug 119372 has been marked as a duplicate of this bug. ***
I've put the excludes patch into rawhide for testing.
This patch involves a schema change in the rpmdb join key and is insufficiently general. You've been warned.
Created attachment 121649: patch from rawhide. I don't see how this would change the rpmdb join key. It's purely the exclude method, which is wrapped by the existing method, so existing callers should be unaffected. It's not the full taggeddirs patch.
The tagged file index patch mentioned in comment #2 adds the file element index to the most significant 16 bits of the join key (from 2-year-old memory). Meanwhile, as I pointed out in comment #1, a real fix needs more than the ability to configure a list of paths. Disk accounting on all paths mentioned in the skip list is broken no matter how the list is specified, and specifying the list of directory paths on which disk accounting should be broken solves no real problem.
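To make the bit budget concrete, here is a minimal Python sketch of packing a file index into the top 16 bits of a key. The 16-bit width comes from the comment above; everything else is an assumption for illustration, in particular the 32-bit key with the header number in the low 16 bits and the names `pack_key` and `unpack_key`. The actual patch may lay the key out differently.

```python
FILE_INDEX_BITS = 16   # width named in the comment above
HEADER_BITS = 16       # ASSUMPTION: the rest of a hypothetical 32-bit key
MAX_FILE_INDEX = (1 << FILE_INDEX_BITS) - 1   # 65535
MAX_HEADER_NUM = (1 << HEADER_BITS) - 1


def pack_key(header_num, file_index):
    """Pack file_index into the most significant bits of the key."""
    if not 0 <= file_index <= MAX_FILE_INDEX:
        raise OverflowError(f"file index {file_index} needs more than "
                            f"{FILE_INDEX_BITS} bits")
    if not 0 <= header_num <= MAX_HEADER_NUM:
        raise OverflowError(f"header number {header_num} does not fit")
    return (file_index << HEADER_BITS) | header_num


def unpack_key(key):
    """Return (header_num, file_index) from a packed key."""
    return key & MAX_HEADER_NUM, key >> HEADER_BITS


print(unpack_key(pack_key(42, 65535)))  # (42, 65535)
```

Any file index above 65535 cannot be represented in such a field, so `pack_key(42, 65536)` raises `OverflowError` in this sketch.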
The core issue is that fingerprints are *ALWAYS* supposed to match when files are identical (and they do). Adding additional logic, like the exclude arg, to try and regain functionality that is broken by deliberately introducing skiplists to reduce the memory footprint of a broken algorithm is a hack to a hack. Hiding additional information in the join key is limited by the available no. of bits. There are packages that have >65K files, and there are known cases with even that number of duplicate basenames. And the comment " but breaks fingerprint handling a bit (if directories are symlinked)" indicates that the approach is flawed, as the raison d'etre for fingerprints is to identify and manage identical files *IN SPITE OF ALTERNATIVE SYMLINKED PATHS*. The right fixes are 1) scrap fingerprints entirely, the functionality is hardly needed anymore with large disks. 2) fix the broken algorithm. Good luck! Hacking around trying to regain functionality in spite of deliberately introduced breakage serves no purpose IMHO.
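The point about symlinked paths can be shown with a small Python sketch. It assumes a fingerprint is the containing directory's device and inode plus the base name, which is a simplification and may not match what rpm stores; `fingerprint` and `demo` are illustrative names only, and the demo needs a platform where creating symlinks is permitted.

```python
import os
import tempfile


def fingerprint(path):
    """Return (device, inode, basename) for path.

    The device and inode are those of the containing directory. os.stat()
    follows symlinks, so two different spellings of the same directory
    yield the same identity.
    """
    dirname, base = os.path.split(path)
    st = os.stat(dirname)
    return (st.st_dev, st.st_ino, base)


def demo():
    """Compare fingerprints of one file reached via a real and a symlinked dir."""
    with tempfile.TemporaryDirectory() as root:
        real = os.path.join(root, "real")
        alias = os.path.join(root, "alias")
        os.mkdir(real)
        os.symlink(real, alias)  # alias -> real
        via_real = fingerprint(os.path.join(real, "foo"))
        via_alias = fingerprint(os.path.join(alias, "foo"))
        return via_real == via_alias


print(demo())
```

A plain string comparison of the two paths would treat them as different files; comparing by directory identity is what lets them be recognised as the same one.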
You didn't look at the "exclude" part, did you? It is about 10 lines of code change. It is used so that rpm doesn't try to intersect a header with itself (as this will always match) and filter out the matches later. It makes erase operations quite a bit faster. I misuse the exclude parameter in suse's rpm to identify erases and skip the skiplist in that case, but that's just an ugly workaround and not a clean solution.
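A minimal Python sketch of the exclude idea, assuming an index that maps base names to the header numbers containing them. `find_matches` and its `exclude` argument are illustrative only and are not the signature of rpmdbFindFpListExclude.

```python
def find_matches(index, basenames, exclude=None):
    """Return (basename, header_num) pairs for every index hit.

    index:    dict mapping basename -> list of header numbers
    exclude:  header number to leave out of the result, typically the
              header being erased, which would otherwise match itself
    """
    matches = []
    for name in basenames:
        for hdr in index.get(name, ()):
            if hdr != exclude:
                matches.append((name, hdr))
    return matches


index = {"foo": [1, 2], "bar": [1]}
print(find_matches(index, ["foo", "bar"]))             # includes self-matches of header 1
print(find_matches(index, ["foo", "bar"], exclude=1))  # [('foo', 2)]
```

Dropping the self-matches during the lookup avoids building them only to filter them out afterwards, which is the speedup described above.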
So this is still happening in (soon to be) FC6. The most problematic part isn't the docs but all the translations. As was mentioned in a bug marked as a duplicate of this one, simply doing "yum remove glibc.i686" on a default x86_64 installation results in thousands of missing files, including docs and translations for quite important packages such as many parts of GNOME and system tools.
FWIW, I believe this problem is fixed in rpm-4.4.7 (perhaps 4.4.6, I forget) by removing the skip list entirely (not verified; removing glibc.i686 is not possible on any machine I have access to).
Could this bug be fixed in the FC-5 branch too?
And FC6... This should be reopened with (at least one of the) fc5/fc6/devel targets, because in "MODIFIED" state, it risks getting forgotten. (The problem of the same bugzilla being used for upstream and downstream RPM strikes again.)
FYI, "MODIFIED" is the state when downstream, not upstream, has "fixed" a problem. If you want to reopen, it means that the downstream "fix" does not work.
Possible dup - bug 209306.
About to be fixed in rpm cvs, will be in rpm-4.4.9-0.3 when built. UPSTREAM
CLOSED
*** Bug 223639 has been marked as a duplicate of this bug. ***
User pnasrat's account has been closed
Reassigning to owner after bugzilla made a mess, sorry about the noise...
This has been fixed in rpm-4.4.2.1-1.fc6 which has now been pushed to updates.