Description of problem:
The installer does not remember that sqlite journaling is not in effect. These filenames are looked up much too often:

  16469 /tmp/cache/anaconda-base-200710250923.x86_64/primary.sqlite-journal
   1064 /tmp/cache/anaconda-base-200710250923.x86_64/filelists.sqlite-journal

(The number is the number of times that the lookup failed.)

Version-Release number of selected component (if applicable):
anaconda-11.3.0.45-1

How reproducible:
always

Steps to Reproduce:
1. As soon as vtty2 becomes available:
     strace -f -o '|gzip' -p <pid-of-anaconda> > strace.out &
   (You have to provide strace on a USB flash memory device before boot, and mount the device yourself.)
2. Check strace.out (63MB compressed, 1GB uncompressed) for ENOENT involving .sqlite-journal
3.

Actual results:
  16469 /tmp/cache/anaconda-base-200710250923.x86_64/primary.sqlite-journal
   1064 /tmp/cache/anaconda-base-200710250923.x86_64/filelists.sqlite-journal

Expected results:
Only two failing lookups (one per filename).

Additional info:
Is there a way we can tell sqlite that there's not going to be a journal file so it should not bother looking for one over and over again?
Checking for journal existence is part of sqlite's locking mechanism (see http://www.sqlite.org/lockingv3.html, "dealing with hot journals"). AFAICT the only way to avoid checking for the journal is to grab an exclusive lock on the db (doable at least with the "begin exclusive" SQL statement), but I suppose that would have to be done at the yum level, and that in turn would probably have some other, not necessarily wanted, implications...
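For illustration, here is a minimal sketch of the "begin exclusive" idea using Python's standard sqlite3 module. This is an assumption made for the example: the code discussed in this bug used an older "sqlite" binding, and the helper name read_under_exclusive_lock is hypothetical, not something from yum or yum-metadata-parser. The point is only that the lock is taken once and held for the whole batch of reads, rather than being re-acquired for each statement.

```python
import os
import sqlite3
import tempfile


def read_under_exclusive_lock(path, query):
    """Run one query inside an explicit BEGIN EXCLUSIVE transaction.

    Hypothetical helper for illustration only. isolation_level=None puts
    the connection in autocommit mode so the module does not open
    transactions behind our back and we control BEGIN/COMMIT ourselves.
    """
    con = sqlite3.connect(path, isolation_level=None)
    try:
        # Take the exclusive lock up front; other connections are shut
        # out until the COMMIT below.
        con.execute("BEGIN EXCLUSIVE")
        rows = con.execute(query).fetchall()
        con.execute("COMMIT")
        return rows
    finally:
        # Closing also rolls back if something failed before COMMIT.
        con.close()
```

The downside mentioned above still applies: while the transaction is open, no other process can read or write the database.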
Actually, I don't know why we wouldn't want that. At the yum level, we already grab a global lock and don't allow other (correct :-) API users to do operations at the same time. So locking the sqlite dbs also doesn't seem unfair.
Well, a quick experiment: applying the sledgehammer-approach patch below to yum-metadata-parser cuts the number of tests for journal existence during "yum --enablerepo=updates-testing update" on my box ATM from 5488 to 7 :)

--- sqlitecachec.py.orig	2007-12-22 15:58:48.000000000 +0200
+++ sqlitecachec.py	2007-12-22 16:20:44.000000000 +0200
@@ -31,6 +31,9 @@
         con = sqlite.connect(filename)
         if sqlite.version_info[0] > 1:
             con.row_factory = sqlite.Row
+        cur = con.cursor()
+        cur.execute("pragma locking_mode = EXCLUSIVE")
+        del cur
         return con
 
     def getPrimary(self, location, checksum):

With a sufficiently large number of accesses, it even starts to show up in wallclock times: in apt-rpm usage patterns, exclusive access to the sqlite db cuts something like ~7% from a full cache rebuild time. Yum's usage patterns are wildly different, but it can't hurt there either...
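For anyone wanting to try the same pragma outside of yum, here is a rough standalone sketch using Python's standard sqlite3 module. The module choice and the helper name open_exclusive are assumptions for the example; the patch itself targets the "sqlite" binding imported by sqlitecachec.py. In exclusive locking mode the connection keeps whatever file lock it acquires instead of releasing it after each statement, which is why the per-access journal check goes away.

```python
import os
import sqlite3
import tempfile


def open_exclusive(path):
    """Open a database and switch the connection to exclusive locking mode.

    Hypothetical helper for illustration only. Returns the connection
    together with the locking mode that sqlite reports back.
    """
    con = sqlite3.connect(path)
    # The pragma answers with the mode now in effect for this connection.
    mode = con.execute("PRAGMA locking_mode = EXCLUSIVE").fetchone()[0]
    return con, mode


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    con, mode = open_exclusive(demo_path)
    print("locking mode:", mode)
    con.close()
```

Running this under strace and counting ENOENT results for the -journal file, with and without the pragma, should give a comparison along the lines of the figures above, though the exact counts will depend on the sqlite version and the access pattern.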
Panu, thank you for the patch. I've applied it to y-m-p and I'm going to put it in rawhide shortly.