Description of problem: This bug has been around since the beginning of time. Actually not really, but since very early versions of rpm where the split of dirnames and basenames was introduced. The way rpm generates dirnames for a build is simply put like this (in pseudocode): Generate filelist with dirs Sort filelist for each file in filelist: Split dname and bname from file Search dirindex for dname in dirnames using glibc bsearch if found, use dirindex for that bname otherwise append dname to dirnames and use new dirindex for that bname append bname to basenames Now on first glance this sounds correct. But if you look closely at how the filelist is generated you'll suddenly see that dirnames afterwards isn't actually a sorted list. Here a real life example of glibc-common.i386 from Fedora-9 release (glibc-common-2.8-3.i386.rpm) Output of rpm -qpl glibc-common-2.8-3.i386.rpm: /etc/default /etc/default/nss /usr/bin/catchsegv /usr/bin/gencat /usr/bin/getconf /usr/bin/getent /usr/bin/iconv /usr/bin/ldd /usr/bin/lddlibc4 /usr/bin/locale /usr/bin/localedef /usr/bin/rpcgen /usr/bin/sprof /usr/bin/tzselect /usr/lib/locale /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl /usr/libexec/pt_chown /usr/sbin/build-locale-archive /usr/sbin/tzdata-update /usr/sbin/zdump /usr/sbin/zic /usr/share/doc/glibc-common-2.8 /usr/share/doc/glibc-common-2.8/ChangeLog.15.bz2 /usr/share/doc/glibc-common-2.8/ChangeLog.16.bz2 /usr/share/doc/glibc-common-2.8/ChangeLog.bz2 /usr/share/doc/glibc-common-2.8/README.timezone /usr/share/doc/glibc-common-2.8/README.ufc-crypt /usr/share/doc/glibc-common-2.8/gai.conf /usr/share/i18n /usr/share/i18n/charmaps /usr/share/i18n/charmaps/ANSI_X3.110-1983.gz /usr/share/i18n/charmaps/ANSI_X3.4-1968.gz ... /usr/share/i18n/charmaps/UTF-8.gz /usr/share/i18n/charmaps/VIDEOTEX-SUPPL.gz /usr/share/i18n/charmaps/VISCII.gz /usr/share/i18n/charmaps/WINDOWS-31J.gz /usr/share/i18n/locales /usr/share/i18n/locales/POSIX /usr/share/i18n/locales/aa_DJ ... /usr/share/i18n/locales/zh_SG /usr/share/i18n/locales/zh_TW /usr/share/i18n/locales/zu_ZA /usr/share/locale /usr/share/locale/be /usr/share/locale/be/LC_MESSAGES /usr/share/locale/be/LC_MESSAGES/libc.mo /usr/share/locale/bg /usr/share/locale/bg/LC_MESSAGES /usr/share/locale/bg/LC_MESSAGES/libc.mo /usr/share/locale/ca Side by side comparison of dirnames and LANG=C sorted dirnames: /etc/ /etc/ /etc/default/ /etc/default/ /usr/bin/ /usr/bin/ /usr/lib/ /usr/lib/ /usr/lib/locale/ /usr/lib/locale/ /usr/libexec/ /usr/libexec/ /usr/sbin/ /usr/sbin/ /usr/share/doc/ /usr/share/ /usr/share/doc/glibc-common-2.8/ /usr/share/ /usr/share/ /usr/share/doc/ /usr/share/i18n/ /usr/share/doc/glibc-common-2.8/ /usr/share/i18n/charmaps/ /usr/share/i18n/ /usr/share/i18n/locales/ /usr/share/i18n/charmaps/ /usr/share/ /usr/share/i18n/locales/ /usr/share/locale/ /usr/share/locale/ As you can see, /usr/share/ is listed twice in the dirnames list. This comes from the simple fact that after the splitting of dirnames and basenames for the files the resulting dirnames aren't sorted anymore: ... /usr/share/doc/glibc-common-2.8 -> /usr/share/doc/ : glibc-common-2.8 ... /usr/share/i18n -> /usr/share/ : i18n ... /usr/share/i18n/charmaps -> /usr/share/i18n/ : charmaps ... /usr/share/i18n/charmaps/ANSI_X3.110-1983.gz -> /usr/share/i18n/charmaps/ : ANSI_X3.110-1983.gz ... /usr/share/i18n/locales -> /usr/share/i18n/ : locales ... /usr/share/i18n/locales/POSIX -> /usr/share/i18n/locales/ : POSIX ... /usr/share/locale -> /usr/share/ : locale ... /usr/share/locale/be -> /usr/share/locale/ : be And as that list isn't sorted anymore it comes as no surprise that bsearch() will eventually not find an already existing dirname in dirnames, e.g. /usr/share/ in this example. This has several obvious problems: - Duplicate dirname entries in dirnames, wasting space - If the bsearch() implementation of glibc should ever change you will get different dirnames headers for idential filelists - Any tool working with rpm headers and dirnames that assumes that dirnames is sorted and/or unique will fail miserably Version-Release number of selected component (if applicable): rpm since the change from "filenames" tag to "dirnames" and "basenames". How reproducible: Happens with lots of rpms with larger filelists. A correct implementation would probably have to do use 2 pass algorithm to first collect a list of all possible dirnames and then uniquely sort it and in the 2nd pass extract the basenames and use the now correctly working bsearch() for any implementation to find the corresponding dirname from dirnames. Thanks & regards, Phil
Phil Why not just one pass, hashing each dir, and inserting into a b-tree, spotting [and avoiding] collisions (and thus dupes)? -- Russ herrold
Hm, how would you manage the dirindexes for the basenames then? I suppose instead of using direct integers you could make it a pointer to a dirindex entry in a struct of the dirname and dirindex with which you populate the b-tree. Sample: struct dnames { int dirindex; char *dirname; struct dnames *left; struct dnames *right; }; and have dirindexes: int *dirindexes; dirindexes = malloc(sizeof(int *) * len_of_filelist); This way you can at least build the b-tree and the basenames list and the dirindexes pointer list in 1 pass and just need a quick 2nd pass to generate the final dirnames list together with the correct dirindexes in the b-tree and then save the values of the pointers in dirindexes. And as the dirnames should be a lot smaller than the filelist the time overhead should be minimal.
Let's try a reality check, shall we? The savings in removing duplicates in RPMTAG_DIRNAMES is infintessimal. Do the math ... E.g. the savings is (at least) an order of magnitude smaller than the added overhead of attaching file capabilities to *every* file so that perhaps 100+ files can be installed with capabilities. The data in RPMTAG_DIRNAMES has *never* been guaranteed to be sorted. Examples of unsorted RPMTAG_DIRNAMES are everywhere. Pursuing a sort order for RPMTAG_DIRNAMES hardly changes anything. Unsorted RPMTAG_DIRNAMES will remain in *.rpm packages forevermore. So rpm wil continue to have to sort RPMTAG_DIRNAMES if bsearch is desired. Luckily, there's no access on any critical path in rpm that would benefit from pre-sorted RPMTAG_DIRNAMES. There is no benefit -- to rpm, to developers, to Red Hat, to LSB, etc -- from sorting RPMTAG_DIRNAMES. Nor are there any forseeable benefits from a Newer! Better! Bestest! implementation that I can conceive of even after smoking a lot of crack ... Using a B-tree, or a data structure, to achieve a 1-pass generation of simultaneously sorted RPMTAG_DIRNAMES and RPMTAG_FILENAMES with a RPMTAG_DIRINDEXES lookup is just bleeping overkill. At least do the sort by embedding OCAML or using an Oracle database optimized for parallel retrieval or something manly please ...
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.