140055 – skipDirs handling in rpmdbFindFpList is evil and breaks updates

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rpm
Sub Component:
Version:	6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Panu Matilainen
QA Contact:	Mike McLean
Docs Contact:
URL:
Whiteboard:
Duplicates (4):	119372 164518 164883 223639
Depends On:
Blocks:	119372

Reported:	2004-11-19 15:22 UTC by Michael Schröder
Modified:	2007-11-30 22:10 UTC
CC List:	16 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-08-27 17:52:48 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
patch from rawhide (2.25 KB, patch) 2005-11-30 19:48 UTC, Paul Nasrat

Description Michael Schröder 2004-11-19 15:22:27 UTC

Suppose a package A contains /usr/share/doc/foo. Now A is split
into A and B, where /usr/share/doc/foo is moved to B.
 
If you now install B and afterwards update A, /usr/share/doc/foo
will be gone, because the update of A doesn't check whether file foo
belongs to another package, due to the skipDirs() call.
 
SkipDirs should not be called on package removal.
 
(I propose that rpmdbFindFpList gets another parameter,
char **skipdirs. That would be much cleaner than hardcoding a list
of directories in a database function...)
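
[Editorial illustration] The proposal above, passing the skip list in from the caller instead of compiling it into the database code, might look roughly like the sketch below. This is not rpm's code: dir_is_skipped and its signature are hypothetical, and the list entries used with it are only examples. The point is that an erase could pass NULL so that no directory is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of rpm: return 1 if directory name dn lies
 * at or below any entry of the NULL-terminated, caller-supplied skipdirs
 * list, else 0. Entries are assumed to have no trailing slash.
 * A NULL list disables skipping entirely. */
static int dir_is_skipped(const char *dn, const char *const *skipdirs)
{
    assert(dn != NULL);
    if (skipdirs == NULL)
        return 0;
    for (const char *const *sp = skipdirs; *sp != NULL; sp++) {
        size_t len = strlen(*sp);
        /* Match only on a path-component boundary, so "/usr/share/doc"
         * does not swallow "/usr/share/docs". */
        if (strncmp(dn, *sp, len) == 0 && (dn[len] == '\0' || dn[len] == '/'))
            return 1;
    }
    return 0;
}
```

With a list such as { "/usr/share/doc", "/lib/modules", NULL }, an install or upgrade would skip /usr/share/doc/foo/, while a removal that passes NULL would still check ownership of every file.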

Comment 1 Jeff Johnson 2004-11-20 04:08:44 UTC

Oooh, nice catch, thank you.

Yes, hardwiring compiled in paths in a database function
is dead wrong. However, the solution to the problem runs
much much deeper than adding an argument to rpmdbFindFpList.

E.g. disk accounting for all the directories in skipDirs[]
is wrong. 

And even that isn't the real problem. Try the following:

1) Delete the line
    _skip("/lib/modules"),

2) Grab the odd 16 different kernel packages.

3) Install kernels one by one, monitoring the memory
footprint using top.

I've caught rpm at ~1GB memory footprint with ~20 kernels
installed.

The memory footprint is scaling like n**3 (my guess) in the
number of duplicate base names which, in the case of kernel
packages with approx. 18K identically named files, gets quite
large in a hurry.

So the real fix is more than passing char ** skipDirs
to rpmdbFindFpList(), the actual algorithm in rpmdbFindFpList()
is busted when packages contain large numbers of duplicate
base names.
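
[Editorial illustration] A toy count of how duplicate base names multiply the work, under a loudly stated assumption: every file in an incoming package shares its base name with exactly one file in each already-installed package, so each base-name lookup returns one candidate record per installed package. The function names are hypothetical and this counts lookup hits only. It does not model memory use and is not meant to reproduce the n**3 guess above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not a measurement of rpm: candidate records returned when
 * installing one package of files_per_pkg files while `installed`
 * packages with the same base names are already present. */
static uint64_t candidates_for_install(uint64_t files_per_pkg, uint64_t installed)
{
    /* Guard against overflow of the product. */
    assert(installed == 0 || files_per_pkg <= UINT64_MAX / installed);
    return files_per_pkg * installed;
}

/* Total candidates examined while installing n such packages one by one. */
static uint64_t candidates_cumulative(uint64_t files_per_pkg, uint64_t n)
{
    uint64_t total = 0;
    for (uint64_t k = 0; k < n; k++)
        total += candidates_for_install(files_per_pkg, k);
    return total;
}
```

Under this model, with 18000 files per kernel package, installing the 20th kernel returns 18000 * 19 = 342000 candidate records, each of which would still need its fingerprint compared.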

Comment 2 Michael Schröder 2004-11-22 10:49:05 UTC

Yes, that's why SuSE uses the tagged fileindex patch. It deals 
nicely with duplicate basenames but breaks fingerprint handling a bit 
(if directories are symlinked). 
 
Btw, as I wrote in #103204, you really should add the 
rpmdbFindFpListExclude function, it doesn't change semantics and 
makes updates a bit faster.

Comment 3 Jeff Johnson 2004-11-22 17:22:28 UTC

Partial solutions are well known. I await a complete and
final solution.

Comment 4 Paul Nasrat 2005-11-28 18:50:44 UTC

*** Bug 164883 has been marked as a duplicate of this bug. ***

Comment 5 Paul Nasrat 2005-11-28 18:57:09 UTC

*** Bug 164518 has been marked as a duplicate of this bug. ***

Comment 6 Paul Nasrat 2005-11-28 23:32:03 UTC

*** Bug 119372 has been marked as a duplicate of this bug. ***

Comment 7 Paul Nasrat 2005-11-29 17:00:45 UTC

I've put the excludes patch in to rawhide for testing.

Comment 8 Jeff Johnson 2005-11-30 19:39:49 UTC

This patch involves a schema change in the rpmdb join key and is insufficiently general. You've been 
warned.

Comment 9 Paul Nasrat 2005-11-30 19:48:13 UTC

Created attachment 121649 [details]
patch from rawhide

I don't see how this would change the rpmdb join key - it's purely the exclude
method which is wrapped by the existing method so existing callers should be
unaffected.  It's not the full taggeddirs patch.

Comment 10 Jeff Johnson 2005-11-30 20:04:23 UTC

The tagged file index patch mentioned in comment #2 adds the file element
index to the most significant 16 bits of the join key (from 2 year old memory).

Meanwhile -- as I pointed out in comment #1 -- a real fix needs more than the
ability to configure a list of paths. Disk accounting on all paths mentioned
in the skip list is broken no matter how the list is specified, and specifying the
list of directory paths on which disk accounting should be broken solves no real problem.

Comment 11 Jeff Johnson 2005-11-30 20:33:35 UTC

The core issue is that fingerprints are *ALWAYS* supposed to match
when files are identical (and they do). Adding additional logic, like the exclude arg, to try
and regain functionality that is broken by deliberately introducing skiplists
to reduce the memory footprint of a broken algorithm is a hack to a hack.

Hiding additional information in the join key is limited by the available no.
of bits. There are packages that have >65K files, and there are known cases with
even that number of duplicate basenames.

And the comment "but breaks fingerprint handling a bit (if directories are symlinked)"
indicates that the approach is flawed, as the raison d'etre for fingerprints is to
identify and manage identical files *IN SPITE OF ALTERNATIVE SYMLINKED PATHS*.

The right fixes are
   1) scrap fingerprints entirely, the functionality is hardly needed anymore with large disks.
   2) fix the broken algorithm. Good luck!

Hacking around trying to regain functionality in spite of deliberately introduced
breakage serves no purpose IMHO.
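
[Editorial illustration] The stated purpose of fingerprints, identifying the same file in spite of alternative symlinked paths, can be sketched by keying on the identity of the containing directory rather than on its path string. The struct and function names below are hypothetical and much simpler than rpm's real fingerprint code; in particular, this sketch assumes the directory already exists on disk.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified stand-in for a fingerprint: identity of the containing
 * directory (device and inode, as reported by stat) plus the base name. */
struct toy_fp {
    dev_t dev;
    ino_t ino;
    const char *base_name;   /* borrowed pointer, not copied */
};

/* Fill fp for dir_name/base_name. stat() follows symlinks, so two paths
 * that lead to the same directory yield the same dev/ino pair.
 * Returns 0 on success, -1 if the directory cannot be stat'ed. */
static int toy_fp_lookup(const char *dir_name, const char *base_name,
                         struct toy_fp *fp)
{
    struct stat sb;
    assert(dir_name != NULL && base_name != NULL && fp != NULL);
    if (stat(dir_name, &sb) != 0)
        return -1;
    fp->dev = sb.st_dev;
    fp->ino = sb.st_ino;
    fp->base_name = base_name;
    return 0;
}

/* Two fingerprints match when directory identity and base name agree. */
static int toy_fp_equal(const struct toy_fp *a, const struct toy_fp *b)
{
    return a->dev == b->dev && a->ino == b->ino &&
           strcmp(a->base_name, b->base_name) == 0;
}
```

If, say, /usr/doc were a symlink to /usr/share/doc, lookups through either path with base name foo would compare equal here, which a plain string comparison of the two paths would miss.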

Comment 12 Michael Schröder 2005-12-01 11:10:56 UTC

You didn't look at the "exclude" part, did you? It is about 10 lines of code 
change. It is used so that rpm doesn't try to intersect a header with itself 
(as this will always match) and filter out the matches later. It makes erase 
operations quite a bit faster. 
 
I misuse the exclude parameter in suse's rpm to identify erases and skip the 
skiplist in that case, but that's just an ugly workaround and not a clean 
solution.
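
[Editorial illustration] The "exclude" idea described above, not intersecting a header with itself, amounts to dropping index hits owned by one given header before any further comparison. The sketch below is not the attached patch: the record layout, the function name, and the convention that 0 means "exclude nothing" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index record: which installed header owns a matching file. */
struct toy_match {
    unsigned int hdr_num;   /* header instance number in the database */
    unsigned int file_idx;  /* index of the file within that header */
};

/* Compact matches[0..n) in place, dropping every record owned by header
 * `exclude`. exclude == 0 is taken to mean "exclude nothing" (assumption:
 * 0 is never a valid header instance). Returns the number of records kept. */
static size_t drop_excluded(struct toy_match *matches, size_t n,
                            unsigned int exclude)
{
    size_t kept = 0;
    assert(matches != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (exclude != 0 && matches[i].hdr_num == exclude)
            continue;           /* the header being erased: skip early */
        matches[kept++] = matches[i];
    }
    return kept;
}
```

Filtering at this point means the excluded header's files never reach the fingerprint comparison, rather than being matched first and filtered out afterwards.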

Comment 13 Matthias Saou 2006-09-29 11:48:21 UTC

So this is still happening in (soon to be) FC6. The most problematic part
isn't the docs but all the translations. As was mentioned in a bug marked
as a duplicate of this one, simply doing "yum remove glibc.i686" on a default
x86_64 installation results in thousands of missing files, including docs and
translations for quite important packages like many parts of GNOME and system tools.

Comment 14 Jeff Johnson 2006-10-03 16:04:28 UTC

FWIW, I believe this problem is fixed in rpm-4.4.7 (perhaps 4.4.6, I forget) by removing
the skip list entirely (not verified, removing glibc.i686 is not possible on any machine
I have access to).

Comment 15 Laurent Rineau 2006-10-26 13:41:38 UTC

Could this bug be fixed in the FC-5 branch too?

Comment 16 Kevin Kofler 2006-10-26 20:42:49 UTC

And FC6... This should be reopened with (at least one of the) fc5/fc6/devel 
targets, because in "MODIFIED" state, it risks getting forgotten. (The problem 
of the same bugzilla being used for upstream and downstream RPM strikes again.)

Comment 17 Jeff Johnson 2006-10-27 00:06:36 UTC

FYI, "MODIFIED" is the state when downstream, not upstream has "fixed" a problem.

If you want to reopen, it means that the downstream "fix" does not.

Comment 18 Aleksey Nogin 2007-02-13 22:47:23 UTC

Possible dup - bug 209306.

Comment 19 Jeff Johnson 2007-04-01 18:57:20 UTC

About to be fixed in rpm cvs, will be in rpm-4.4.9-0.3 when built.

UPSTREAM

Comment 20 Jeff Johnson 2007-06-23 11:57:53 UTC

CLOSED

Comment 21 Panu Matilainen 2007-07-03 12:21:05 UTC

*** Bug 223639 has been marked as a duplicate of this bug. ***

Comment 22 Red Hat Bugzilla 2007-08-21 05:19:14 UTC

User pnasrat's account has been closed

Comment 23 Panu Matilainen 2007-08-22 06:30:18 UTC

Reassigning to owner after bugzilla made a mess, sorry about the noise...

Comment 24 Panu Matilainen 2007-08-27 17:52:48 UTC

This has been fixed in rpm-4.4.2.1-1.fc6 which has now been pushed to updates.
