Bug 536818 - Inefficient database disk reading
Summary: Inefficient database disk reading
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 19
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Fedora Packaging Toolset Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-11-11 14:10 UTC by Zdenek Kabelac
Modified: 2015-06-23 13:32 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-20 11:36:26 UTC
Type: ---
Embargoed:


Attachments
callgrind of rpm -qa --nosignature --nodigest (11.11 KB, text/plain)
2009-11-12 21:29 UTC, Zdenek Kabelac
no flags Details
Demonstration C code (176.06 KB, text/plain)
2009-11-13 21:41 UTC, Zdenek Kabelac
no flags Details
rpm -qa strace (490.76 KB, text/plain)
2009-11-14 23:55 UTC, Jeff Johnson
no flags Details
db_stat -m (2.12 KB, text/plain)
2009-11-20 11:20 UTC, Zdenek Kabelac
no flags Details
db_stat -m 1st. run with bigger cache --nodigest --nosignature (2.13 KB, text/plain)
2009-11-21 00:16 UTC, Zdenek Kabelac
no flags Details
db_stat -m 2nd. run with bigger cache --nodigest --nosignature (2.13 KB, text/plain)
2009-11-21 00:17 UTC, Zdenek Kabelac
no flags Details
db_stat -m 3rd. run with bigger cache --nodigest --nosignature (2.13 KB, text/plain)
2009-11-21 00:18 UTC, Zdenek Kabelac
no flags Details
rpmdigest --alldigests /bin/bash (10.06 KB, text/plain)
2009-11-21 01:31 UTC, Jeff Johnson
no flags Details


Links
Red Hat Bugzilla 1194222 (low, CLOSED): [perf] dnf reinstall `rpm -qa` is slow - last updated 2021-02-22 00:41:40 UTC

Internal Links: 1194222

Description Zdenek Kabelac 2009-11-11 14:10:40 UTC
Description of problem:

RPM seems to be very slow at reading its database.

Here are a couple of examples from my long-used (and fragmented) disk.

Look at the various timed sequences:

# echo 3 >/proc/sys/vm/drop_caches
# time rpm -qa >/dev/null
real	0m47.770s
user	0m1.870s
sys	0m2.803s

# echo 3 >/proc/sys/vm/drop_caches 
# time  ( cat /var/lib/rpm/Packages >/dev/null ; rpm -qa >/dev/null)
real	0m8.470s

same with already cached data:
# time rpm -qa >/dev/null
real	0m2.840s


After a discussion with Jindra (jnovy) I rebuilt the database:
rpm --rebuilddb


So now, again in the same order:
# echo 3 >/proc/sys/vm/drop_caches
# time rpm  --nosignature --nodigest -qa >/dev/null
real	0m11.517s
user	0m0.480s
sys	0m0.560s

Actually quite an improvement - though I was unaware that I should rebuild the database from time to time to get better performance.

# echo 3 >/proc/sys/vm/drop_caches
# time  ( cat /var/lib/rpm/Packages >/dev/null ; rpm --nosignature --nodigest -qa )  >/dev/null
real	0m5.087s
user	0m0.253s
sys	0m0.413s

Hmm, still more than 2x faster - the database reading must have quite some overhead.

# time rpm --nosignature --nodigest -qa   >/dev/null
real	0m0.353s
user	0m0.257s
sys	0m0.090s

This looks quite fast and decent when the data are already cached and the database is freshly rebuilt.

I assume someone should track down why the disk reading is so slow.
(On non-fragmented drives the differences are much bigger.)


Version-Release number of selected component (if applicable):
rpm-4.7.1-6.fc12.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Zdenek Kabelac 2009-11-11 16:10:45 UTC
I think part of the problem could be explained by bug 534146 - my database really needed a rebuild, which significantly reduced the read time.

Comment 2 Jeff Johnson 2009-11-12 14:03:00 UTC
The benchmarks of --rebuilddb and -qa are hardly indicative of rpmdb
performance since they are both sequential accesses of all data.

You have also failed to supply any metric other than wall clock time.

And you also have failed to control for the memory pool caching
(and tuning) used by Berkeley DB, only the kernel buffer cache.

Claims of "overhead" should be backed up by using callgrind, which
just isn't that hard to do.

Comment 3 Zdenek Kabelac 2009-11-12 21:28:17 UTC
Let's go into details:

1.) I mentioned --rebuilddb purely because rpm gets much, much slower over time - I have not measured the time of --rebuilddb itself. Obviously either db4 or rpm is seriously fragmenting data over months of usage and gets slower. Is it documented anywhere that I should run --rebuilddb regularly to get decent performance?

2.) My metric is perfectly valid and ideally you should try it yourself. If there is more than a 100% speed-up just by using cat, then either it should be a configurable option for rpm (so it would do this itself - i.e. a simple wrapper script could handle it) or rpm's read path should be improved - the first option is probably simpler to implement, the second is the preferable long-term solution.

Also, I believe you are probably slightly missing what I want to point out in this bugzilla.

I've never read anything about tuning libdb to improve rpm performance, and I'm not quite sure why I should - why isn't rpm's default to go as fast as possible? Why should the user have to tune libdb to get decent rpm speed?

3.) My "overhead" term here does not mean rpm 'burns' CPU - but actually nearly sleeps and waits all the time - because of some were low performing file access pattern - it actually remainds me usage of mmap without readahead - but that's just wild guess - I do not know rpm sources (nor libdb4)

4.) I'll attach the callgrind output - but I'm not really sure it helps here. I can probably provide output from oprofile, perf, or other tools you would like - I could even trace disk access over time, though that would probably be quite a long trace. But IMHO, why don't you try it yourself - I've observed the same behavior on many RHEL and Fedora machines around me, so it's definitely not a problem specific to my personal laptop.

5.) Just a side note - when using rpm without the --nodigest option, it shows a very high load from SHA512 invocations - so again my simple 'dumb' time test reveals:

sha512sum /var/lib/rpm/Packages: - 0.46s  (when cached)
cat >/dev/null   0.04s (when cached)

but rpm with digesting takes  1.1s
without digest (--nodigest --nosignature) 0.3s 

This gives a 0.8s difference - in that time sha512sum could checksum my 80MB Packages file nearly twice - so again a simple question: what is being calculated so heavily inside rpm's nss-softokn library?

6.) --nodigest reveals a lot of time being spent in rpm-4.7.1/lib/header.c:dataLength - just from a plain look there, it appears to me that rpm spends a major amount of time scanning strings inside the binary file - why not store the string size alongside the string, or use some index for this? (See the sketch below.)

I'm not sure if that's the reason why yum is so slow - but I think it's part of the puzzle...
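To illustrate what I mean in point 6, here is a tiny hypothetical C sketch (my own illustration, not rpm code) contrasting scanning for string terminators with using stored lengths:

    /* Hypothetical illustration (not rpm code): total size of N NUL-terminated
     * strings packed back to back.  Something like dataLength() has to walk
     * every byte; with stored lengths the byte walk disappears. */
    #include <stddef.h>
    #include <string.h>

    /* O(total bytes): scan for every terminator. */
    size_t blob_size_by_scanning(const char *blob, int count)
    {
        size_t total = 0;
        for (int i = 0; i < count; i++)
            total += strlen(blob + total) + 1;   /* +1 for the NUL */
        return total;
    }

    /* O(count): trivial if each entry's length were stored in the header. */
    size_t blob_size_from_lengths(const size_t *lengths, int count)
    {
        size_t total = 0;
        for (int i = 0; i < count; i++)
            total += lengths[i];
        return total;
    }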

If you need more details let me know.

Comment 4 Zdenek Kabelac 2009-11-12 21:29:35 UTC
Created attachment 369328 [details]
callgrind of rpm -qa --nosignature --nodigest

Comment 5 Jeff Johnson 2009-11-12 22:17:21 UTC
Now you're talking ...

1) No one (certainly not me) said your reported results were invalid. I don't think
you've analyzed the results correctly (see your callgrind measurements, dataLength
is not I/O related per se). And it's premature to advertise a performance "fix" like running
--rebuilddb periodically when the performance problem is poorly characterized and understood.

2) I did not say your metric was "invalid", read what I wrote. I have tried the results myself.
I run callgrind on rpm at least weekly and already know (and have fixed) many performance
problems in RPM. I did point out that there are other issues than I/O, and suggested callgrind,
where I/O overhead is _NOT_ the issue that you have measured. I did point out that you have
another level of caching that needs to be controlled for useful I/O metrics.

And certainly having rpm run as fast as possible is my goal; I have no idea
where you got the idea that any other goal is preferred.

3) If "sleeps and waits" is the issue (its not afaik), that was not at all clear from your
wall clock benchmarks. And I most definitely know both rpm and db4 sources, in fact I have
achieved a measured (w callgrind) 14.6x performance increase @rpm5.org  by running careful
(better than wallclock) benchmarks. But that's not relevant here.

4) Stare at the numero uno piggy in the callgrind spewage. When you start to realize that
serialization and marshalling is the issue, then you will begin to understand the
performance issue.

5) I'm not sure how SHA512 is related other than through signatures, where --nosignature
is the disabler. In all cases, verifying digests on header blobs is overhead unrelated
to I/O performance and must be controlled for.

6) yum performance depends on many factors unrelated to rpm. But run benchmarks
on yum if you wish to understand yum performance problems. Without measurements,
feel free to claim anything you wish about the cause of yum's pathetic performance,
your opinion is as good or bad as anyone else's.

And certainly I have no argument with you, nor anyone willing to run callgrind to
verify issues ;-)

Comment 6 Zdenek Kabelac 2009-11-13 09:19:38 UTC
(In reply to comment #5)
> Now you're talking ...
> 
> 1) No one (certainly not me) said your reported results were invalid. I don't
> think
> you've analyzed the results correctly (see your callgrind measurements,
> dataLength
> is not I/O related per se). And its premature to advertise a performance "fix"
> like running
> --rebuilddb periodically when the performance problem is poorly characterized
> and understood.

I've not done any --rebuilddb analysis - I just wrote:

the rpm -qa time before the rebuild was 47 seconds - after the rebuild, 12 seconds.
I did not save the older dataset for analysis as I did not expect any problems. So it's simply the fact that the speed of my machine improved ~4x with --rebuilddb.


> 
> 2) I did not say your metric was "invalid", read what I wrote. I have tried the
> results myself.
> I run callgrind on rpm at least weekly and already know (and have fixed) many
> performance
> problems in RPM. I did point out that there are other issues than I/O, and
> suggested callgrind,
> where I/O overhead is _NOT_ the issue that you have measured. I did point out
> that you hav
>  another level of caching that needs to be controlled for useful I/O metrics.

Do you flush disk buffers within your tests?

The time when all the data are buffered in memory is 'almost' acceptable (though there is still some headroom - but there may be limits imposed by the DB format, which is probably nontrivial to change).

My report is mainly about the moment when there are no data in memory - when a trivial query for installed packages takes 12 seconds.

> 3) If "sleeps and waits" is the issue (its not afaik), that was not at all
> clear from your
> wall clock benchmarks. And I most definitely know both rpm and db4 sources, in
> fact I have
> achieved a measured (w callgrind) 14.6x performance increase @rpm5.org  by
> running careful
> (better than wallclock) benchmarks. But that's not relevant here.

Is this rpm 4.7 going to be replaced by rpm 5 - or is that a project unrelated to Fedora's rpm package?


> 4) Stare at the numero uno piggy in the callgrind spewage. When you start to
> realize that
> serialization and marshalling is the issue, then you will begin to understand
> the
> performance issue.

As I've said - callgrind will not show I/O stalls.

> 5) I'm not sure how SHA512 is related other than through signatures, where
> --nosignature
> is the disabler. In all cases, verifying digests on header blob's is overhead
> unrelated
> to I/O performance and must be controlled for.

Sure, it's not related to slow disk reading - it's just what callgrind shows - and I was just curious how many memory chunks need to be checksummed for every simple rpm command. Maybe it would be effective to use a short-term daemon to speed up repeated invocations (if the daemon keeps a lock on the database).

> 6) yum performance depends on many factors unrelated to rpm. But run benchmarks
> on yum if you wish to understand yum performance problems. Without
> measurements,
> feel free to claim anything you wish about the cause of yum's pathetic
> performance,
> your opinion is as good or bad as anyone else's.

Yeah - sure, python is a much bigger CPU eater in this case - but rpm is not negligible either...

Comment 7 Jeff Johnson 2009-11-13 13:28:16 UTC
I did not mean to imply you have done --rebuilddb benchmarks. My statement is
true no matter what: neither -qa _NOR_ --rebuilddb are proper measurements
of rpmdb "performance" because the access is sequential. So any claim of "inefficient"
rpmdb I/O will only apply narrowly and incompletely.

> Is this rpm 4.7 going to be replaced by rpm 5 - or is it unrelated project to
> Fedora's rpm package?

I quote myself "irrelevant". The problems were the same as what is in your callgrind
spewage because the code was largely the same. But feel free to not look for fixes
in projects that are unrelated to Fedora. The measured callgrind speed up after fixing
dataLength and other issues was 14.6x @rpm5.org.

> As I've said - callgrind will not show I/O stalls.

Yes. Which is why I use callgrind to measure I/O "overhead" claims as here.

> Do you flush disk buffers within your tests ?
Callgrind measurements are largely immune to buffer state as well. Yes I flush buffers
where needed or appropriate.

Are you claiming I/O stalls or not? You have not disclosed any measurement
that directly shows stalls, only pointed out the possibility afaict. strace tstamps
would be convincing. I have not seen stalls, or behavior indicative of I/O waits,
while benchmarking RPM.


> Yeah - sure python is much bigger CPU eater in this case - but rpm is not
> negligible either...  
LVM is not exactly svelte either. Reasoning from yum->python->lvm performance
to a claimed "inefficient database disk reading" for RPM based on the largeness
of the code base is no measurement I understand.

Comment 8 Zdenek Kabelac 2009-11-13 14:22:21 UTC
(In reply to comment #7)
> I did not mean to imply you have done --rebuilddb benchchmarks. My statement is
> true no matter what: neither -qa _NOR_ --rebuilddb are proper measurements
> of rpmdb "performance" because the access is sequential. So any claim of
> "inefficient"
> rpmdb I/O will only apply narrowly and incompletely.

Well I still don't get what you mean by this -

I've reported problem I'm experiencing as a regular daily user.

I see a very slow behavior and I report this as a problem.

It could be easily checked by anyone just by repeating steps in my first post.

If you think this bugzilla title is not correct - propose a better name.


> Are you claiming I/O stalls or not? You have not disclosed any measurement
> that directly shows stalls, only pointed out the possibility afaict. strace
> tstamps
> would be convincing. I have not seen stalls, or behavior indicative of I/O
> waits,
> while benchmarking RPM.

Statistics cached rpm

perf stat --  rpm -qa >/dev/null

 Performance counter stats for 'rpm -qa':

    1119.839601  task-clock-msecs         #      0.994 CPUs 
            137  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
           3667  page-faults              #      0.003 M/sec
     2453079208  cycles                   #   2190.563 M/sec
     4382415288  instructions             #      1.786 IPC  
       27935906  cache-references         #     24.946 M/sec
         408832  cache-misses             #      0.365 M/sec

    1.126486505  seconds time elapsed

Statistics uncached 

 Performance counter stats for 'rpm -qa':

    2073.234535  task-clock-msecs         #      0.179 CPUs 
           2443  context-switches         #      0.001 M/sec
              1  CPU-migrations           #      0.000 M/sec
           3666  page-faults              #      0.002 M/sec
     2830241244  cycles                   #   1365.133 M/sec
     4833644897  instructions             #      1.708 IPC  
       37085859  cache-references         #     17.888 M/sec
         543550  cache-misses             #      0.262 M/sec

   11.552013713  seconds time elapsed

Statistics cached without digest 

 Performance counter stats for 'rpm -qa --nodigest --nosignature':

     355.230563  task-clock-msecs         #      0.990 CPUs 
             41  context-switches         #      0.000 M/sec
              1  CPU-migrations           #      0.000 M/sec
           3547  page-faults              #      0.010 M/sec
      778136260  cycles                   #   2190.510 M/sec
      845519636  instructions             #      1.087 IPC  
       20456887  cache-references         #     57.588 M/sec
         353068  cache-misses             #      0.994 M/sec

    0.358798949  seconds time elapsed

Statistics uncached without digest

 Performance counter stats for 'rpm -qa --nodigest --nosignature':

    1160.226932  task-clock-msecs         #      0.099 CPUs 
           2322  context-switches         #      0.002 M/sec
              3  CPU-migrations           #      0.000 M/sec
           3546  page-faults              #      0.003 M/sec
     1111501957  cycles                   #    958.004 M/sec
     1267144821  instructions             #      1.140 IPC  
       28076070  cache-references         #     24.199 M/sec
         478383  cache-misses             #      0.412 M/sec

   11.668546646  seconds time elapsed

---
Short sample from strace -tt  cached - difference .003302s

15:12:49.660754 write(1, "dejavu-lgc-sans-mono-fonts-2.30-"..., 46) = 46
15:12:49.660860 pread(3, "\0\0\0\0\1\0\0\0\3142\0\0\0\0\0\0\3152\0\0\1\0\346\17\0\7\0\0\0J\0\1"..., 4096, 53264384) = 4096
...
15:12:49.663573 pread(3, "\0\0\0\0\1\0\0\0\3452\0\0\3442\0\0\0\0\0\0\1\0\252\f\0\7\0\0\0\0\0\0"..., 4096, 53366784) = 4096
15:12:49.663865 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
15:12:49.663953 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
15:12:49.664056 write(1, "gedit-2.28.0-1.fc12.x86_64\n", 27) = 27

----
Short sample from strace -tt  uncached - difference .013265s

15:13:27.990154 write(1, "dejavu-lgc-sans-mono-fonts-2.30-"..., 46) = 46
..
15:13:28.003419 write(1, "gedit-2.28.0-1.fc12.x86_64\n", 27) = 27

> 
> 
> > Yeah - sure python is much bigger CPU eater in this case - but rpm is not
> > negligible either...  
> LVM is not exactly svelte either. Reasoning from yum->python->lvm peformance
> to claimed "inefficient database disk reading" for RPM based on the largeness
> of the code base is no measurement I understand.  

Not exactly sure what you mean here by LVM...

Comment 9 Jeff Johnson 2009-11-13 14:57:52 UTC
So you are seeing additional "overhead" (including context switches
and cache misses) in what you are identifying as "uncached".

(aside)
Oddly, the --nodigest case is slower than if digests are calculated
for the "uncached" case. Likely from statistical noise ...

But I do not see any immediately apparent I/O stalls and waits anomaly
in the measured behavior. Sure there's waits in the measurement, all "uncached" I/O will
wait. But if I've misinterpreted the perf data, please point that out. What I
lack is some comparison point to calibrate my expectations so that
I can notice when some measurement is not within a typical range.

What _IS_ different about rpmdb I/O (from other I/O cases) is that RPM (and Berkeley DB)
goes to some lengths to ensure that data is written to disk.

While "rpm -qa" is largely a read operation, populating the memory pool cache
operation is a write operation. So you may wish to also control for the memory
pool cache by doing
    rm -f /var/lib/rpm/__db*
if you really want to measure the components of "rpm -qa" I/O more carefully.

You are making claims that rpm --rebuilddb is needed as part of normal maintenance
based on what you see as a normal user with a stop watch. The behavior needs
to be more carefully analyzed (imho) before that claim is justified (imho). Whether
it is true (or not) that --rebuilddb is needed I cannot yet say. But I know from a decade
of living without --rebuilddb for routine rpmdb performance maintenance that there's some other
piece of this puzzle that needs to be understood first.

Yes, the title "inefficient database disk reading" is misleading. *shrug* The behavior
needs to be better characterized and understood first.

Comment 10 Zdenek Kabelac 2009-11-13 15:16:46 UTC
Yes - the 0.1s difference in an 11s measurement is system noise - the time spent on SHA512 summing is hidden in the I/O stalls (waits for pages to be cached in).

I forgot to add these missing perf stats:

uncached cat

 Performance counter stats for 'cat /var/lib/rpm/Packages':

     123.237846  task-clock-msecs         #      0.035 CPUs 
           1131  context-switches         #      0.009 M/sec
              1  CPU-migrations           #      0.000 M/sec
            160  page-faults              #      0.001 M/sec
      211787797  cycles                   #   1718.529 M/sec
      191481355  instructions             #      0.904 IPC  
        3519474  cache-references         #     28.558 M/sec
          61659  cache-misses             #      0.500 M/sec

    3.478775266  seconds time elapsed

cached cat 

 Performance counter stats for 'cat /var/lib/rpm/Packages':

      49.098802  task-clock-msecs         #      0.848 CPUs 
              5  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
            159  page-faults              #      0.003 M/sec
      107537451  cycles                   #   2190.226 M/sec
       54117805  instructions             #      0.503 IPC  
        1520101  cache-references         #     30.960 M/sec
          62342  cache-misses             #      1.270 M/sec

    0.057930420  seconds time elapsed

Here you can see a much higher throughput.

Also please note - I'm not doing code analysis - I'm a bug reporter ;)

Comment 11 Jeff Johnson 2009-11-13 15:29:36 UTC
Cat(1) measurements help, but as I tried to point out, this is a database,
and databases have different needs and different I/O patterns than
cat(1) does. And any solution (once the problem is characterized) will likely
be rather different too.

You are hardly a typical user, most of whom have no idea
what callgrind is or does, or what an I/O stall issue is. Please
stop pretending that you are "just a user" with a stop watch. ;-)

(aside)
Note that there has been a historical issue with Berkeley DB db-4.1.x
(iirc, been years) with I/O stalls and waits. The issue was tied to
select vs poll behavior. There's a bugzilla report and a perl script
reproducer generating 5000 records in Berkeley DB on (iirc) approx a RHEL4 time frame.

Which is why I ask for details like
   Can you document I/O stalls & waits as part of a performance problem?

Note also that I have never heard of or seen an issue with rpmdb performance
degradation over time until now. Which means that something (other than
RPM and Berkeley DB, which haven't changed much at all) is likely
relevant to what you are seeing.

Comment 12 Zdenek Kabelac 2009-11-13 16:03:27 UTC
(In reply to comment #11)
> Cat(1) measurements help, but as I tried to point out, this is a database,
> and databases have different needs and different I/O patterns than
> cat(1) does. And any solution (once the problem is characterized) will likely
> be rather different too.
> 
> You are hardly a typical user, most of whom have no idea
> what callgrind is or does, or what an I/O stall issue is. Please
> stop pretending that you are "just a user" with a stop watch. ;-)

It's not about pretending - it's about having many other tasks on my hands...

> (aside)
> Note that there has been a historical issue with Berkeley DB db-4.1.x
> (iirc, been years) with I/O stalls and waits. The issue was tied to
> select vs poll behavior. There's a bugzilla report and a perl script
> reproducer generating 5000 records in Berkeley DB on (iirc) approx a RHEL4 time
> frame.
> 
> Which is why I ask for details like
>    Can you document I/O stalls & waits as part of a performance problem?

Which I could only do with a local recompilation with special debug hinting.
I simply thought that the package developers have a better idea of where the problem might be buried.

Is this behavior (from comment 1) similar with rpm5, or is this only an rpm 4.x thing?
 
> Not also that I have never heard or seen an issue with rpmdb performance
> degradation over time until now. Which means that something (other than
> RPM and Berkeley DB which haven't changed much at all) is likely
> relevant to what you are seeing.  

Maybe users do not bother to report it :)?
Maybe they consider a 12-second query time still acceptable?

But as I said - I've tested it on several machines and experienced approximately the same results.

I should probably note that my machine is an F8 installation continuously upgraded, and is usually a quite fresh Rawhide. In the past I was forced to run --rebuilddb just once, after some serious problem. So maybe during the life cycle of a single Fedora release the database stays 'fast' enough?

Comment 13 Jeff Johnson 2009-11-13 16:24:32 UTC
No time - perfectly understood. Let's not argue ;-)

I have not ever seen performance degradation over time
doing daily development (and I rarely do --rebuilddb).

But when I do measurements, I typically control for behavior
within the measurement, which may have missed I/O
degradation over time.

We both agree that after --rebuilddb (and controlling for
--nodigest and --nosignature) that rpm -qa performance is reasonable.

The issue for me is solely whether --rebuilddb is routinely
needed for maintenance (it's not, afaik, as of yesterday).

If there is performance degradation, it can (will in code that I develop) likely be avoided by
changing the implementation.

The 12 (or 18 or 48 or ...) sec behavior for rpm -qa can be lived
with no matter what. It's not like "rpm -qa" hangs or takes an hour.

There are definitely rpm -qa speed-ups available by avoiding
loading headers entirely. The entire "rpm -qa" spewage
could be cached with a tstamp and just blasted at the
user whenever needed. But that solution just band-aids
deeper problems that may need to be analyzed and solved
more carefully.

(aside)
Note that rpmdb performance @rpm5.org is now transactionally
protected to have ACID behavior and /var/lib/rpm/Packages
is going to be eliminated and replaced with an entirely different
store and schema than what is used in rpm4.

So far the I/O performance is approx (within a factor of 2x) status quo ante.
But -qa will be reworked to be faster so that I don't have to sort
out --nodigest --nosignature and other factors when analyzing problem
reports.

Comment 14 Jeff Johnson 2009-11-13 20:36:34 UTC
After looking around a bit for time degradation issues with Berkeley DB
(there are no visible reports), this thought occurs to me:

There's an optimization for DB_HASH access added to db-4.6.x.
As always, Berkeley DB provides backward compatibility but
perhaps the  older DB_HASH layout is no longer optimal.
So the (hypothesized) result of doing --rebuilddb is changed
behavior with improved performance, and the cause is not necessarily
time degradation or fragmentation.

I have no easy way to confirm the hypothesis. But RPM use
of Berkeley DB is quite straightforward and hasn't changed since forever.

And I expect -- because of wide usage -- that a performance time degradation with
Berkeley DB I/O performance would have been reported _SOMEWHERE_.

Does that sound consistent with what you are seeing?

Comment 15 Zdenek Kabelac 2009-11-13 21:40:40 UTC
Well, just from a plain look at the strace and the nearly 'random'
seek positions of the pread calls, I think the reason for this
very slow access is pretty clear.

In the attachment you will find a C program - it contains an array of offsets
taken directly from the strace pread calls.

I've spent some time writing it - so I hope it will be useful.

It demonstrates all of this.

Enjoy
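(The attached program is not inlined here; a minimal hypothetical sketch of the same idea - replay recorded offsets with pread() against Packages - might look like this:)

    /* Hypothetical sketch of the replay idea (the real attachment embeds the
     * full offset array captured from strace).  It issues 4 KiB pread()s at
     * each recorded offset - the same access pattern rpm -qa produces. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    static const off_t offsets[] = { 53264384, 53366784 /* ...thousands more from strace... */ };

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s /var/lib/rpm/Packages\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        char buf[4096];
        for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++)
            if (pread(fd, buf, sizeof(buf), offsets[i]) < 0)
                perror("pread");
        close(fd);
        return 0;
    }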

Comment 16 Zdenek Kabelac 2009-11-13 21:41:21 UTC
Created attachment 369493 [details]
Demonstration C code

Comment 17 Zdenek Kabelac 2009-11-13 21:44:47 UTC
Hmm, regarding your comment 14 - I assume there should be some testing code available that simulates heavy rpm usage over a long period of time - i.e. installing and removing packages with lots of files?

If there isn't, it would probably be a wise idea to create such a test script.

Comment 18 Jeff Johnson 2009-11-13 21:59:16 UTC
Sure, a simulated "heavy load" is the only credible test
for whether there is a degradation in rpmdb I/O behavior over time.

There are plain and simply too many other factors, like
file system type, kernel version, Berkeley DB version,
RPM implementation, etc etc to compare results meaningfully.

I've already asked privately for opinions re "time degradation"
from users who I know have long running rpm implementations
and whose opinion I trust. (I trust your report and methodology
but the analysis is a bit premature so far. Nothing personal, jmho ;-)

And a test script isn't that hard, I have several.

Still I'm going to be surprised if there is performance degradation
over time with RPM+BDB.

Degradation is a very different issue than whether  rpmdb
I/O is optimal with "rpm -qa", I already know zillions of
inefficiencies in rpmdb I/O handling, and have been actively
addressing those issues since db-4.8.24 was released in September.

Comment 19 Zdenek Kabelac 2009-11-13 22:10:49 UTC
Well - the issue doesn't actually need to be degradation - it might have been some 'buggy' version of rpm or libdb4 in the past. I wasn't looking into the issue until I got curious why a simple query takes those ~47 seconds.

I'm mostly using yum - which is slow by design - so I haven't been checking rpm in depth.

Comment 20 Jeff Johnson 2009-11-13 23:01:02 UTC
We both agree that 47sec for "rpm -qa" is too long.

What isn't clear yet is the root cause. No matter what,
we both agree the --rebuilddb leads to better performance.

But degradation is my primary concern. If Berkeley DB
degrades over time, then a "fix" should be attempted.

BTW, tnx for the block trace. I can almost see some patterns
that I can map into Berkeley DB and RPM code.

FYI, you need to control for whether "rpm -qa" is run as root (or not).
If run as root, then a dbenv is opened, and there is a memory pool
cache that is interposed. Your program lists blocks solely
for Packages afaict. There should be additional I/O occurring to the
memory pool as well.

With a sequential access like "rpm -qa", cache blow out is inevitable.

There's also a page size tunable, rpm traditionally has used
512b pages because that optimizes locking granularity,
locks are per-page, so large pages are likelier to lead to
lock contention because large pages are likelier to overlap.

There was no significant effect of changing the mempool page size on
I/O performance when I looked (not recently, but I believe ffesti
saw a similar "No effect." within the last year).

There is an issue of rpmdb fragmentation on ext4 reported by sandeen.
In empty chroot's, an rpmdb is the most fragmented file. When
one considers that an rpmdb is just about the only database
in use in chroot installs, and is certainly the most active
file(s) during a chroot install, the presence of fragmentation should
surprise no one.

Whether fallocate (or equivalent) could/should be used to address
rpmdb fragmentation is not yet clear. Certainly fallocate would reduce
rpmdb fragmentation, but there are other issues, such as (on linux) fallocate
is not  available on older systems and introduces a run time dependency
on both kernel and glibc implementations; and (on non-linux) that fallocate
may not be reliably available even if in POSIX. I have an implementation
half done in RPM, but there's no reason to rush to fallocate until there
are clear performance reasons to do so. So far it's just the presence
of fragmentation, not any reliable measurements of improved/degraded
performance, that is being reported.
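(For illustration only - not the half-done implementation mentioned above - preallocating an rpmdb file could look roughly like this hypothetical sketch:)

    /* Hypothetical sketch: reserve contiguous space for a database file up
     * front so the filesystem is less likely to fragment it as it grows. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int preallocate_db_file(const char *path, off_t size)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return -1; }
        int rc = posix_fallocate(fd, 0, size);   /* returns an errno-style code */
        if (rc != 0)
            fprintf(stderr, "posix_fallocate: %d\n", rc);
        close(fd);
        return rc == 0 ? 0 : -1;
    }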

Comment 21 Zdenek Kabelac 2009-11-14 18:11:48 UTC
(In reply to comment #20)

> What isn't clear yet is the root cause. No matter what,
> we both agree the --rebuilddb leads to better performance.
> 
> But degradation is my primary concern. If Berkeley DB
> degrades over time, then a "fix" should be attempted.

Well, I've no idea how the DB works here with my ext3 filesystem.

My system partition is around 8GB and usually has around 0.7G of free space.

But as you like plain numbers and I like my simple 'wall clock' experiments,
let's dig a bit deeper here :).

I have a 1GB 'play' partition for experiments (usually for lvm :)).
A completely fresh ext2/ext3/ext4 filesystem was created on it for the following test.

`hdparm -t`  gives 30MB/s for this test  partition.

Using a non-fragmented 80MB file (uncached / cached):

1. pread()     ~7.5s   0.06s
2. mmap()      ~5.8s   0.10s
3. mmap() ADV  ~2.8s   0.07s
4. `cat`       ~2.8s   0.06s

Timing was nearly the same for all extX.
(Note - `cat` is slightly faster as it is not doing a memcpy - thus mmap with just reading the pages has the same or better speed - I can provide an updated source file as an attachment if needed.)

So obviously pread() is by far the worst way to do this job (a sketch of case 3 follows).
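(Case 3 above, 'mmap() ADV', corresponds roughly to a pattern like this hypothetical sketch - the names are mine, not from the attachment:)

    /* Hypothetical sketch of case 3: map the file, ask the kernel for
     * aggressive read-ahead, then touch the pages sequentially. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int read_file_via_mmap(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return -1; }
        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return -1; }
        madvise(p, st.st_size, MADV_SEQUENTIAL);   /* hint: generous read-ahead */
        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)
            sum += p[i];                           /* touch one byte per page */
        munmap(p, st.st_size);
        close(fd);
        return (int)(sum & 0x7f);                  /* keep the loop from being optimized away */
    }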

Let's continue with experiments.

The weirdly fragmented DB file /var/lib/rpm/Packages on my system drive gives an actually pretty slow read speed - as a plain `cp` of Packages to a Packages.copy file reveals:

(`hdparm -t` of this system drive gives 43MB/s)

original uncached `cat` of Packages: 3.7s
copied uncached `cat` of Packages.copy: 2.0s - wow, roughly 55% of the time.

So all this suggests some possible updates to the code:

1. pread() could be replaced with mmap() - probably a pretty easy change. I think it could be made optional at first - i.e. `rpm --use-mmap` - and if there were no problems and users were happy, it could be switched on as the default. mmap should probably also result in a significantly smaller memory footprint for the application.

2. From time to time a full copy of the Packages file should probably be made - to defragment the strange file layout in the filesystem. Note that this Packages file is just one day old from a fresh --rebuilddb, and only a few packages have been modified since.

3. Hmm, just wondering whether using small plain ASCII files could be any worse than this Berkeley DB (i.e. the /var/lib/dpkg/info way).

4. Wait till all users switch to SSDs - and apply only 1.) to save memory ;)

> 
> BTW, tnx for the block trace. I can almost see some patterns
> that I can map into Berkeley DB and RPM code.
> 
> FYI, you need to control for whether "rpm -qa" is run as root (or not).
> If run as root, then a dbenv is opened, and there is a memory pool
> cache that is interposed. Your programs lists blocks solely
> for Packages afaict. There should be additional I/O occurring to the
> memory pool as well.

Yep - 4 occurrences of pread() seem to be from a different file descriptor in my strace. But they probably have a very small impact on the total time.

Just another simple strace check of pread() occurrences for a simple, small rpm package installation - it looks like there are some ~10000 pread() calls there as well - slightly fewer than for -qa, but still a pretty high number.


> 
> With a sequential access like "rpm -qa", cache blow out is inevitable.

rpm -i seems to be doing a not-so-different job after all...

> There is an issue of rpmdb fragmentation on ext4 reported by sandeen.

Yep - revealed by my plain simple experiment as well, and I'm running ext3.

> In empty chroot's, an rpmdb is the most fragmented 
> Whether fallocate (or equivalent) could/should be used to address
> rpmdb fragmentation is not yet clear. Certainly fallocate would reduce

I think large DB files should probably be split into smaller pieces according to their usage, so that for the most common tasks only a small amount of data would need to be loaded. Data for rarely used commands like rpm -q --changelog (and I'm still wondering why changelogs are part of the DB and not stored somewhere like a /usr/share/doc/pkg/changelog file) could be loaded from a separate DB.

Anyway, take these just as my ideas - nothing you necessarily need to worry about. Maybe there is a way to improve Berkeley DB so it can handle all of this while still using one file...

Comment 22 Jeff Johnson 2009-11-14 18:44:37 UTC
> But as you like plain numbers and I like my simple 'wall' clock experiments,
> lets dig a bit deeper here :).

I like KISS. The only issue with wall clock is that it is sometimes difficult to analyze and compare.
Otherwise I like wall clock a lot ...

> 1. pread() could be replaced with mmap()
Sure. But that's a deep change to using Berkeley DB, which has numerous consequences.
Likely better is just to use the existing mpool handling in BDB, with a
cache size appropriate for the data being cached, with double buffering
removed using O_DIRECT. But there's a balance that will be needed between
transactionally protected data and I/O performance, the balance can only
come with experience.

> 2. from time to time probably full copy of Packages file should be done
Easy to say but very hard to automate a ~100Mb copy with bullet-proofing.
Lusers need to design their own backups.

FYI: Packages (and the header blob within) are gonna be eliminated @rpm5.org
by the end of the year and replaced with mmap(2) onto a secondary store of /some/path/*.rpm.
The issue for transactional logging is reducing the size of the logs. But that
also means that all metadata must be stored in indices so that headerLoad()
is avoided.

> 3. hmm just wondering if using plain ASCII small files
Please do the math. Hash/btree access is _ALWAYS_ superior
to flat files for anything but toy in-memory cases. Or Oracle would not exist.

> 4. Wait till all users switch to SSD
Well NAND has its own I/O performance (and failure) issues, rather different than DASD. See
bz #529948 if you want to see s-l-o-o-w rpmdb performance.

> I think large DB files should be probably split to smaller pieces ...

Yup. Possible with db-4.8.24, not older versions.

> Anyway take this only just as my ideas - nothing you should probably worry
What me worry? ;-) All very sane suggestions, actively being implemented @rpm5.org.

E.g. signature/digest verification of header blob's (which are already PROT_READ
protected) was removed this morning.

Comment 23 Zdenek Kabelac 2009-11-14 21:47:05 UTC
(In reply to comment #22)

> > 1. pread() could be replaced with mmap()
> Sure. But that's a deep change to uisng Berkeley DB, which has numerous
> consequences.

Going from pread() -> mmap() should be fairly simple compared to switching to anything like memory pools.

An easy way could be to mmap the DB at the beginning and, instead of pread(), just call memcpy(). A second step would be to throw away those calls and access the memory directly (a sketch of the first step follows).
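(A hypothetical sketch of that first step - the names are mine, this is not a patch against rpm or db4:)

    /* Hypothetical sketch of the first step described above: keep the pread()
     * call sites, but satisfy them from a single mmap() of the Packages file. */
    #include <string.h>
    #include <sys/types.h>

    struct mapped_db {
        const unsigned char *base;   /* returned by mmap() of the whole file */
        size_t size;
    };

    /* Drop-in stand-in for pread(fd, buf, count, offset) once the file is mapped. */
    ssize_t mapped_pread(const struct mapped_db *db, void *buf, size_t count, off_t offset)
    {
        if (offset < 0 || (size_t)offset >= db->size)
            return 0;                               /* nothing readable at this offset */
        size_t avail = db->size - (size_t)offset;
        size_t n = count < avail ? count : avail;
        memcpy(buf, db->base + offset, n);          /* page faults do the actual I/O */
        return (ssize_t)n;
    }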


> Likely better is just to use the existing mpool handling in BDB, with a
> cache size appropriate for the data being cached, with double buffering
> removed using O_DIRECT. But there's a balance that will be needed between

O_DIRECT is a performance killer unless you know the access pattern much better than the Linux kernel could guess.


> > 3. hmm just wondering if using plain ASCII small files
> Please do the math. Hash/btree access is _ALWAYS_ superior
> to flat files for anything but toy in-memory cases. Or Oracle would not exist.

You have those btrees already in the filesystem - so the math still works.
And with btrfs you even have Oracle in there ;)

Comment 24 Zdenek Kabelac 2009-11-14 21:52:11 UTC
(In reply to comment #22)

> > 2. from time to time probably full copy of Packages file should be done
> Easy to say but very hard to automate a ~100Mb copy with bullet-proofing.
> Lusers need to design their own backups.

Forgot to add one piece of information:

I've just checked that --rebuilddb already creates such a 'weird', slow file.
Are files with holes being used?
I think it's actually pretty nontrivial to make a file fragment this much.

As a quick hack - maybe even a plain copy right after --rebuilddb could make things better for a lot of people...

Comment 25 Jeff Johnson 2009-11-14 22:10:48 UTC
We are tuning different aspects of I/O performance.

E.g. Create a /var/lib/rpm/DB_CONFIG file with these 2 lines:
    set_cachesize           0 67108864 4
    set_mp_mmapsize         268435456
The 1st line configures a 64Mb cache split across 4 cache regions.
The 2nd line permits up to 256Mb to be memory mapped.

Prime the cache by running as root

    rpm -qa

Show cache hits by doing
    cd /var/lib/rpm
    /usr/lib/rpm/db_stat -m

Adding -Z will rezero the counters.

The above is a rather different I/O trace than you have reported with strace.

Yes, O_DIRECT with Berkeley DB likely knows better than the kernel ;-)

And "quick hacks" tend to get forgotten. You have no idea how many "quick hacks"
there are in RPM that noone has a clue about.

I have no problem whatsoever using available I/O performance in a linux
kernel. But tuning a database != a tuna fish.
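(Equivalently - as an illustration only, not something rpm itself does - the same tuning can be set from code through the Berkeley DB environment handle:)

    /* Hypothetical sketch: the DB_CONFIG settings above expressed through the
     * Berkeley DB C API.  Error handling trimmed to the essentials. */
    #include <db.h>
    #include <stdio.h>

    int open_tuned_env(DB_ENV **envp, const char *home)
    {
        DB_ENV *env;
        int rc = db_env_create(&env, 0);
        if (rc != 0) { fprintf(stderr, "db_env_create: %s\n", db_strerror(rc)); return rc; }
        env->set_cachesize(env, 0, 64 * 1024 * 1024, 4);   /* 64Mb cache in 4 regions */
        env->set_mp_mmapsize(env, 256 * 1024 * 1024);      /* mmap files up to 256Mb */
        rc = env->open(env, home, DB_CREATE | DB_INIT_MPOOL, 0644);
        if (rc != 0) {
            fprintf(stderr, "DB_ENV->open: %s\n", db_strerror(rc));
            env->close(env, 0);
            return rc;
        }
        *envp = env;
        return 0;
    }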

Comment 26 Jeff Johnson 2009-11-14 23:55:25 UTC
Created attachment 369559 [details]
rpm -qa strace 

Note the complete absence of pread(2) or any I/O from Packages
with the DB_CONFIG lines as described.

Comment 27 Zdenek Kabelac 2009-11-16 10:49:26 UTC
(In reply to comment #26)
> Created an attachment (id=369559) [details]
> rpm -qa strace 
> 
> Note the complete absence of pread(2) or any I/O from Packages
> with the DB_CONFIG lines as described.  


Good - now I can see your trace works through mmap, while I could not get mmap working even with a 1GB mp_mmapsize.

It looks like Jindra is going to recheck the code - he has found some comments in the sources about AC_FUNC_MMAP from 2007...

I'll wait for a package rebuild.

I like that we are finally moving somewhere :)

Comment 28 Jeff Johnson 2009-11-16 12:53:12 UTC
Sadly, rpm has been deliberately crippled to avoid mmap(2)
when the mapped region is larger than a limit (that is too small imho).

(aside)
The issue way back when was to avoid the appearance of
large numbers in top(1) displays that bothered lusers who
reported "bugs". And then there's sparse /var/log/lastlog
which has all sorts of hilarious hysteria with RPM. There's
hardly any need to package /var/log/lastlog.

(another aside)
Prelinking (through a pipe to prelink --undo) has never been implemented correctly
in RPM either. The need to verify a digest on unprelinked libraries forces prelinking
detection and a prelink --undo helper rather than using mmap(2) directly to calculate
library file digests. While prelink could do the digest check for --md5/--sha1, the
recent change to SHA256 for file digests causes RPM to use I/O rather than mmap(2).

But the important rpmdb I/O questions for me are:
    1) Does Berkeley DB performance degrade over time?
    2) Is there a demonstrable/measurable need for fallocate? Sure
    unfragmented files have less overhead than fragmented files. But
    the performance gain needs to be balanced against the implementation cost.
    I have yet to hear of any credible performance gain measurements for
    "fragmented" rpmdb's, no one has bothered.

Comment 29 Bug Zapper 2009-11-16 15:26:38 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 30 Zdenek Kabelac 2009-11-16 20:40:38 UTC
(In reply to comment #28)
> Sadly, rpm has been deliberately crippled to avoid mmap(2)
> when the mapped region is larger than a limit (that is too small imho).

Well, I'm not really sure there is any 'deliberate' crippling - but there is surely a lack of testing and performance checking.

> (aside)
> The issue way back when was to avoid the appearance of
> large numbers in top(1) displays that bothered lusers who
> reported "bugs". And then there's sparse /var/log/lastlog
> which has all sorts of hilarious hysteria with RPM. There's
> hardly any need to package /var/log/lastlog.

How lastlog gets connected with rpm is probably beyond my imagination. But top should actually show much better numbers, as RSS should actually be smaller with mmap (when written the right way) and only the VIRT size gets bigger, which should not bother any user ;)

> 
> (another aside)
> Prelinking (through a pipe to prelink --undo) has never been implemented
> correctly

I have prelinking disabled, because it certainly must eat more CPU/watts to compute dependencies than it could ever save during the lifetime of those valid results between updates... ;)

> But the important rpmdb I/O questions for me are:
>     1) Does Berkeley DB performance degrade over time?
>     2) Is there a demonstrable/measurable need for fallocate? Sure
>     unfragmented files have less overhead than fragmented files. But
>     the performance gain needs to be balanced against the implementation cost.
>     I have yet to hear of any credible performance gain measurements for
>     "fragmented" rpmdb's, no one has bothered.  

Again, I assume this is something for internal testing of the rpm tool itself - simulating long-term heavy usage and checking whether performance goes down.
As mentioned in comment #2, I am probably not alone with this problem, but I did not make a copy of the rpm dir before the rebuild :( so I can hardly provide anything better than the observation that even after --rebuilddb, reading the file runs at about 50% of the speed of a 'defragmented' file.

Also, when you say BDB could know better than Linux how to access the data - that can only be true if you use a separate partition for the DB file. But when the data are stored on a filesystem like ext3, or any other advanced fs which has its own fragmentation, I doubt BDB can have any decent algorithm to handle that case.

I should probably also mention that a few upgrades were not properly finished because of 'various' rawhide faults - but usually that was handled properly via yum-complete-transaction and package-cleanup --dupes.
But eventually this might have led, over time, to the increased size, if such invalid transactions are still kept in the DB - again, just a wild guess...?

Comment 31 Jeff Johnson 2009-11-16 21:06:09 UTC
> Well not really sure if there some 'deliberately' crippling - but there is
> surely a lack of testing and performance checking. 

Surely you exclude present company. Have fun!

> I've prelinking disabled, because it certainly must eat more CPU/Watts to
> compute dependencies, that it could ever safe during the life time of those
> valid results between updates... ;)

If RPM had any choice, prelink would never have been implemented. Note that
RPM per se is hardly to blame for how it is used. Dependencies are input, if
present, they will be checked. Otherwise go tune your file system with
    rm -rf /
I guarantee "rm -rf /" tuning will be higher performing, for all kernel versions and file systems,
than any other possible "tuning".

I'm going to assume
    1) No BDB degradation over time.
    2) fallocate is not needed
until I hear otherwise. Both 1) and 2) can be fixed any time anyone chooses outside of RPM.

Comment 32 Zdenek Kabelac 2009-11-19 13:43:00 UTC
(In reply to comment #31)
> > Well not really sure if there some 'deliberately' crippling - but there is
> > surely a lack of testing and performance checking. 
> 
> Surely you exclude present company. Have fun!

Well, as for me - I'm only trying to help resolve my bugzilla...

> If RPM had any choice, prelink would never have been implemented. Note that
> RPM per se is hardly to blame for how it is used. Dependencies are input, if
> present, they will be checked. Otherwise go tune your file system with
>     rm -rf /
> I guarantee "rm -rf /" tuning will be higher performing, for all kernel

Not really sure how rm -rf / relates to my plain, easy-to-see wall-clock experiments.

> versions and file systems,
> than any other possible "tuning".
> 
> I'm going to assume
>     1) No BDB degradation over time.
>     2) fallocate is not needed
> until I hear otherwise. Both 1) and 2) can be fixed any time anyone chooses
> outside of RPM.  

Sure - I cannot give you the exact code line numbers where the problem is -
but just to show some numbers:

Before today's upgrade my 'Packages' file, after --rebuilddb, was something like 71MB. After upgrading approx. 900 packages fc12->fc13, this file is now over 97MB. Well, don't ask me what is in those extra 26MB; I have no idea what's even in the original 71MB, as that's approx. 40KB per package.

I'm only providing numbers for your statement 1.)

Comment 33 Jeff Johnson 2009-11-19 13:50:14 UTC
> Before today's upgrade my 'Packages' file after --rebuilddb was something like
> 71MB. After upgrade of aprox. 900 packages fc12->fc13 - this file is now over
> 97MB. Well don't ask me what is in those extra 26MB, I've no idea what's even
> in those 71MB as that's aprox. 40KB per package

SIZE != PERFORMANCE

Performance for hashes (in fact) requires pre-allocated free space
in buckets in order to retain performance. Similarly for btrees,
the costly operation for btree access is recursing upwards,
splitting pages.

Running "db_stat -m" would show statistics if anyone had a clue.
But sure, run "ls -l" and use wall clock as measurements all you
want and claim degradation "objectively".

Comment 34 Zdenek Kabelac 2009-11-20 11:16:20 UTC
Well, let's do more tests & numbers after today's upgrade - db_stat -m will be attached as well.

Please note - this bugzilla is about fixing rpm-4.7 - if it works for rpm5, that's good for your tool, but it will not solve the problem with my Fedora Rawhide installation.

I should also mention that my measurements are done on an unloaded machine and repeated several times - so it's not a one-time experiment but an easily repeatable thing - and as such they ARE time consuming.

I'll provide mainly wall clock time - as that's what counts for the user: he must wait that long and experiences a noticeable response delay. As has been mentioned several times in this thread, the CPU time is not all that high - it's not the best it could be, but it's in a tolerable range for now (<1s).


1.) uncached 'rpm -qa'

The original time before the yum upgrade was around 11.5 seconds under the same conditions 2 days ago.

Now it is over 16.2 seconds - the number of packages increased from 1886 to 1911 due to various fc12->fc13 deps; possibly a few of them can be removed again. Anyway, I consider this number to be nearly the same.

around 15.5s with __db.* removed (by default 4 of them are created)

2.) uncached 'rpm -qa --nodigest --nosignature'

around 14.5s  without __db*
around 15.6s  with  __db*   


3.) uncached  'cat Packages'

3.3s

4.) cached 'rpm -qa'  

0.84s  - nearly same with or without __db*

5.) cached 'rpm -qa --nodigest --nosignature'

0.37s  - nearly same with or without __db*

From the experiments above one could draw the interesting conclusion that those cache files actually make the rpm tool run slower.


Ok and now something completely different

Using DB_CONFIG from comment 25


6.)  uncached 'rpm -qa'   

~16.2s without __db* - quite similar to case 1.)  

BUT

~12 seconds if the __db* files are there (and there are 7 __db files now)


7.) uncached rpm -qa --nodigest --nosignature
~15.5s  without __db*
~12s   with __db*


8.) cached 'rpm -qa'

1.1s  without __db*
0.92s with __db*


9.) cached 'rpm -qa --nodigest --nosignature'

0.60s  without __db*
0.44s  with __db*


OK - this leads to another conclusion: using more __db* files seems to be a winning strategy only for the first-run case, when no files are in memory.
In other cases there is a noticeable performance penalty.

And now let me do final experiment:

10.) uncached rpm -qa --nodigest --nosignature   after --rebuilddb

~12 seconds - so we are back at the original number from comment 1.

There are a few more interesting things from strace to mention here:

a.) mmap is used to process the __db* files - but NOT the Packages file.

b.) with a DB_CONFIG file:
    rm -f __db*
    rpm -qa 
    rpm -qa     <--- no longer reads the Packages file with pread() and uses the __db* index files instead - this is different from the default, DB_CONFIG-less configuration. However, the speed is actually lower except for the uncached case.


For this test I ran --rebuilddb on a separate copy of the directory - so I'll continue to use the original 100MB Packages file to see whether there will be more degradation over time.

Comment 35 Zdenek Kabelac 2009-11-20 11:20:46 UTC
Created attachment 372467 [details]
db_stat -m

This is db_stat output for rpm -qa with __db* files and DB_CONFIG.

Let me know which other db_stat outputs, and under which circumstances, you'd like to see.

Comment 36 Jeff Johnson 2009-11-20 11:36:23 UTC
Not interested in seeing anything. It's your bug with rpm-4.7 and Fedora.

My sole interest was your claim that Berkeley DB I/O performance and
space degrades over time. I cannot tell that from your wallclock measurements.

Comment 37 Jeff Johnson 2009-11-20 11:54:44 UTC
One last note ...

db_stat -m claims 80M for cache, you claim 100M for Packages.

And your cache hits are ~60%, not 100%.

Increasing the cache to accommodate the working set would seem to be useful.

Comment 38 Zdenek Kabelac 2009-11-21 00:11:13 UTC
(In reply to comment #37)
> One last note ...
> 
> db_stat -m claims 80M for cache, you claim 100M for Packages.
> 
> And your cache hits are ~60%, not 100%.
> 
> Increasing cache to accomodate workset would seem to be useful.  


Well - sure, there is no problem increasing the cache size - the question is how to do it efficiently (and again - why should the RPM user actually care about the DB cache size? At least the RPM man page hides DB_CONFIG from users like me ;)

Let's get back to the point: increasing the cache size doesn't improve the initial problem - and IMHO it makes later cached processing actually slower - though only in the range of a couple of milliseconds - but it is measurable and consistent across multiple measurements.

I admit the time for --nodigest --nosignature goes down to 10 seconds for the uncached __db* case - but let me repeat that the wall-clock experiment from comment 1 clearly shows that even using cachesize 0 and doing `cat Packages` followed by the rpm command outperforms this solution by a large factor.

IMHO the key point is to enable full mmap usage, preferably with MAP_POPULATE if available - and I'm getting the feeling that I'll need to look at this myself... :(

As for BDB degradation - if my simple case is not a sign of degradation (i.e. the DB size growing by 30% and getting slower by approximately the same factor), then of course there is no problem to care about ;)

Comment 39 Zdenek Kabelac 2009-11-21 00:16:52 UTC
Created attachment 372649 [details]
db_stat -m  1st. run with bigger  cache --nodigest --nosignature

Comment 40 Zdenek Kabelac 2009-11-21 00:17:38 UTC
Created attachment 372650 [details]
db_stat -m  2nd. run with bigger  cache --nodigest --nosignature

Comment 41 Zdenek Kabelac 2009-11-21 00:18:49 UTC
Created attachment 372652 [details]
db_stat -m  3rd. run with bigger  cache --nodigest --nosignature

Comment 42 Jeff Johnson 2009-11-21 00:26:25 UTC
There are various heuristics that can be implemented
to choose a cache size based on available memory
and (a priori known) working set size. The important point is that
Berkeley DB has a resource cap that is honored. How the resource
cap is chosen involves factors other than performance.

DB_CONFIG is most definitely documented (and has
nothing whatsoever to do with RPM):
    http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/env_db_config.html

Alas, RPM started using Berkeley DB before DB_CACHE was implemented
and has never bothered to change.

Why is MAP_POPULATE preferred?

I've missed the 30% performance decrease. What is needed is some hint
about the underlying cause in order to attempt a fix. I don't argue
with wall clock. OTOH, I can't fix anything based solely on wall clock,
the underlying cause needs to be identified.

No matter what: rpm --rebuilddb "fixes" degradation. And the degradation
isn't orders of magnitude in either size or performance.

Comment 43 Jeff Johnson 2009-11-21 00:29:01 UTC
Apologies. DB_CONFIG was intended, not DB_CACHE.

Comment 44 Jeff Johnson 2009-11-21 00:36:46 UTC
Ah, fault ahead is why MAP_POPULATE is important.

Note that RPM achieved a 10% increase in upgrade performance
(measured by wallclock ;-) by changing mmap(2) flags in 3
places as well as using a larger (16Mb but 256 Kb had most of the gain)
buffer for zlib way back when.

Linux kernels have an astonishing amount of read ahead
bandwidth available and often unused.

Comment 45 Zdenek Kabelac 2009-11-21 01:06:44 UTC
(In reply to comment #42)
> There are various heuristics that can be implemented
> to choose a cache size based on available memory
> and (a priori known) working set size. The important point is that
> Berkeley DB has a resource cap that is honored. How the resource
> cap is chosen involves factors other than performance.
> 
> DB_CONFIG is most definitely documented (and has
> nothing whatsoever to do with RPM):
>    
> http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/env_db_config.html
> 

Sure, I've googled this easily - it's just that until you provided the hint in
comment 25 I had no idea I should actually care about it - that's why I'd like
either to get some hint in the rpm man page that the user should tune this
variable, or for rpm to provide reasonably good defaults.


> Alas, RPM started using Berkeley DB before DB_CACHE was implemented
> and has never bothered to change.
> 
> Why is MAP_POPULATE preferred?

mmap without this flag is actually quite slow - as there is no read-ahead - and
unless rpm used a separate read-ahead thread for this, it would lose a lot of
performance (as was shown in comment 16).

> I've missed 30% performance decrease. What is needed is some hint
> about the underlying cause in order to attempt a fix. I don't argue
> with wall clock. OTOH, I can't fix anything based solely on wall clock,
> the underlying cause needs to be identified.
> 
> No matter what: rpm --rebuilddb "fixes" degradation. And the degradation
> isn't orders of magnitude for either size of performance.  

Well - yes - the degradation isn't all that big - but it all adds up - and makes
the upgrade time quite horrible - yes, yum is the king here - but I'd like to
start somewhere and see some progress ;)

Comment 46 Zdenek Kabelac 2009-11-21 01:08:53 UTC
(In reply to comment #44)
> Ah, fault ahead is why MAP_POPULATE is important.
> 
> Note that RPM achieved a 10% increase in upgrade performance
> (measured by wallclock ;-) by changing mmap(2) flags in 3
> places as well as using a larger (16Mb but 256 Kb had most of the gain)
> buffer for zlib way back when.
> 
> Linux kernels have an astonishing amount of read ahead
> bandwidth available and often unused.  


Great, you noticed it yourself before I finished my comment ;)
The flag is relatively new, though - it's not available on older systems.

Comment 47 Jeff Johnson 2009-11-21 01:15:24 UTC
Sadly, in almost all cases for RPM, MAP_POPULATE for read-ahead
doesn't help much. Decompression and digest checking tend
to dominate CPU usage, and it's write performance (where madvise(MADV_DONTNEED)
was the win) that tends to dominate I/O performance.
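
The madvise(MADV_DONTNEED) pattern referred to here looks roughly like this (a sketch only, under the assumption of a shared, file-backed mapping that has already been written - not rpm's actual code):

#include <stddef.h>
#include <sys/mman.h>

/* After a mapped region has been written, schedule writeback and then
 * drop this process's references to the pages so the kernel is free to
 * reclaim them instead of letting them crowd out more useful cache. */
static void drop_written_pages(void *addr, size_t len)
{
    msync(addr, len, MS_ASYNC);          /* queue dirty pages for writeback */
    madvise(addr, len, MADV_DONTNEED);   /* we won't touch them again */
}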

My guess is that MAP_POPULATE's benefits would mostly be seen
as quicker startup for simple queries. But there are other factors,
such as redundant lookups, and the marshalling issues you
see in the callgrind, that likely make MAP_POPULATE unimportant.

But, by all means, show me the wall clock move faster ;-)

Comment 48 Zdenek Kabelac 2009-11-21 01:19:13 UTC
Well, that's the problem - I'd enjoy seeing rpm dominate my CPU :) but so far it takes 0.5s of CPU and the rest is iowait in the uncached case.

Once it is CPU bound (for rpm-4.7), that will be the right time to focus on the other factors :)

Comment 49 Jeff Johnson 2009-11-21 01:31:50 UTC
Created attachment 372657 [details]
rpmdigest --alldigests /bin/bash

hehe.

Hint: a single #pragma and -fopenmp build option sped rpmdigest up by 7x

That's where MAP_POPULATE will be a win. PIGZ/PBZIP2 parallel I/O is
already staged for deployment; mutexes on all RPM objects needed to be
stabilized first, and stability was achieved mid-summer.
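
For the curious, the kind of change hinted at looks roughly like this (a sketch only - the real rpmdigest code differs, and the OpenSSL one-shot MD5()/SHA1()/SHA256() calls are just stand-ins): built with -fopenmp, independent digests of the same buffer run on separate cores:

#include <stddef.h>
#include <openssl/md5.h>
#include <openssl/sha.h>

/* Compute several independent digests of one buffer in parallel;
 * build with -fopenmp so each section runs on its own thread. */
static void all_digests(const unsigned char *buf, size_t len,
                        unsigned char md5[MD5_DIGEST_LENGTH],
                        unsigned char sha1[SHA_DIGEST_LENGTH],
                        unsigned char sha256[SHA256_DIGEST_LENGTH])
{
#pragma omp parallel sections
    {
#pragma omp section
        { MD5(buf, len, md5); }
#pragma omp section
        { SHA1(buf, len, sha1); }
#pragma omp section
        { SHA256(buf, len, sha256); }
    }
}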

Comment 50 Jeff Johnson 2009-11-22 15:06:45 UTC
FYI: http://rpm5.org/community/rpm-devel/4014.html

Queries with patterns, but still loading headers. When I lose
the headerLoad(), I expect approximately an order of magnitude speedup,
and perhaps 3 orders of magnitude less data to read, for rpm -qa.

Course that won't help "rpm dominating your CPU" ;-)

Comment 51 Bug Zapper 2010-11-04 06:32:18 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 52 Zdenek Kabelac 2010-11-04 09:17:28 UTC
Still applies to rawhide - unsure why nobody cares....

Comment 53 Fedora Admin XMLRPC Client 2012-04-13 23:09:24 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 54 Fedora Admin XMLRPC Client 2012-04-13 23:12:16 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 55 Fedora End Of Life 2013-04-03 18:54:14 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 56 Panu Matilainen 2013-05-20 11:36:26 UTC
Rpm >= 4.10 (in Fedora >= 18) has significant optimizations to header loading (in particular from the rpmdb), which was one of the bigger speed bottlenecks; the rest tends to be in Berkeley DB internals, over which rpm has little say. Since there is no single actual *bug* here, I'm considering the case closed.

Comment 57 Zdenek Kabelac 2013-05-20 11:59:10 UTC
Well, I'm not convinced this has ever been fixed - since even today - still being a T61 user, but now with an SSD and ~250MB/s of throughput - I get this:


# echo 3 >/proc/sys/vm/drop_caches 
# time rpm --nosignature --nodigest -qa  | wc -l
3194

real	0m8.205s
user	0m0.697s
sys	0m2.817s


# time rpm --nosignature --nodigest -qa  | wc -l
3194

real	0m0.680s
user	0m0.333s
sys	0m0.337s


As can be seen - the second read is significantly faster.


Combine cat & rpm:

# echo 3 >/proc/sys/vm/drop_caches 
# time ( cat /var/lib/rpm/Packages >/dev/null; rpm --nosignature --nodigest -qa  | wc -l )
3194

real	0m2.080s
user	0m0.357s
sys	0m0.860s

# rpm -qa rpm
rpm-4.11.0.1-4.fc20.x86_64


But being an SSD user, I'm not really depressed by the performance of rpm/yum nowadays - I'm rather stressed by the pointless rewriting of multiple GB of essentially identical data during the frequent rawhide mass-rebuild upgrades....

