Bug 508445

Summary: cost excludes are much slower in yum 3.2.23 than 3.2.21
Product: [Fedora] Fedora Reporter: Roberto Ragusa <bugzillaredhat-56f0>
Component: yumAssignee: Seth Vidal <skvidal>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 10CC: dmach, eric.tanguy, ffesti, james.antill, pmatilai, rs, tim.lauridsen, tomek
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://www.mail-archive.com/fedora-devel-list@redhat.com/msg03366.html
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 509932 (view as bug list) Environment:
Last Closed: 2009-07-10 16:07:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 509932    

Description Roberto Ragusa 2009-06-27 09:53:28 UTC
yum 3.2.23 on F10 is a lot slower than 3.2.21.

A simple "yum -C list updates" takes 90 seconds instead of 3 seconds,
if two repositories with the same rpms and different cost are enabled
(typical scenario for local mirroring of updates).

The problem appears to be that the code managing the exclusion of
rpm packages from the expensive repo, which loops on the rpms
and deletes them very inefficiently.
"excluding for cost:" lines are printed quite slowly (option -d 5).

Suspects on this code: (is this O(N^2) ?)

  for pkg in pkgs[1:]:
      if pkg.repo.cost > lowcost:
          msg = _('excluding for cost: %s from %s') % (pkg, pkg.repo.id)
          self.verbose_logger.log(logginglevels.DEBUG_3, msg)
          pkg.repo.sack.delPackage(pkg)


Also see

  http://www.mail-archive.com/fedora-devel-list@redhat.com/msg03366.html

(reproduced by another user with similar conf).

Comment 1 James Antill 2009-07-03 14:57:45 UTC
 Can you see what happens if you undo this patch (ie. just delete the function starting at line 284):


diff --git a/yum/yumRepo.py b/yum/yumRepo.py
index afd0877..964f19d 100644
--- a/yum/yumRepo.py
+++ b/yum/yumRepo.py
@@ -281,6 +281,19 @@ class YumRepository(Repository, config.RepoConf):
         self._grabfunc = None
         self._grab = None
 
+    def __cmp__(self, other):
+        """ Sort yum repos. by cost, and then by alphanumeric on their id. """
+        if other is None:
+            return 1
+        if hasattr(other, 'cost'):
+            ocost = other.cost
+        else:
+            ocost = 1000
+        ret = cmp(self.cost, ocost)
+        if ret:
+            return 1
+        return cmp(self.id, other.id)
+
     def _getSack(self):
         # FIXME: Note that having the repo hold the sack, which holds "repos"

Comment 2 Roberto Ragusa 2009-07-05 09:17:58 UTC
It didn't help.
Successive run were as slow as before the change (90 seconds).

Much better results if I remove this line
  self.costExcludePackages()
from __init__.py; it now runs in 7 seconds.
If I undestand correctly, it is now discarding duplicates
in the list of to-be-updated packages, instead of the list
of all packages; so it is faster. A better fix is certainly
possible, I suppose.

Comment 3 James Antill 2009-07-06 18:16:56 UTC
Ok, I've confirmed it and I have a patch that "fixes" it:

Author: James Antill <james>
Date:   Mon Jul 6 14:11:58 2009 -0400

    Fix speed regression in costExcludes

diff --git a/yum/__init__.py b/yum/__init__.py
index 2f9723b..268baaf 100644
--- a/yum/__init__.py
+++ b/yum/__init__.py
@@ -1120,6 +1120,7 @@ class YumBase(depsolve.Depsolve):
                 pkgdict[po.pkgtup] = []
             pkgdict[po.pkgtup].append(po)
         
+        verb = self.verbose_logger.isEnabledFor(logginglevels.DEBUG_3)
         for pkgs in pkgdict.values():
             if len(pkgs) == 1:
                 continue
@@ -1129,8 +1130,10 @@ class YumBase(depsolve.Depsolve):
             #print '%s : %s : %s' % (pkgs[0], pkgs[0].repo, pkgs[0].repo.cost)
             for pkg in pkgs[1:]:
                 if pkg.repo.cost > lowcost:
-                    msg = _('excluding for cost: %s from %s') % (pkg, pkg.repo.
-                    self.verbose_logger.log(logginglevels.DEBUG_3, msg)
+                    if verb:
+                        msg = _('excluding for cost: %s from %s') %(pkg,
+                                                                    pkg.repo.id
+                        self.verbose_logger.log(logginglevels.DEBUG_3, msg)
                     pkg.repo.sack.delPackage(pkg)
             
 

...the big question is how/why this fixes it ... and what changed between 3.2.21 and 3.2.22 that makes the above much slower.

Comment 4 James Antill 2009-07-06 18:20:41 UTC
My data for this was to use a RHEL-5 box and have multiple EPEL repos. named et1, et2, et3, et4 with different costs. The above patch (for 13,132 delPackage() calls) takes costExcludes() time from 243.10 seconds to 7.25 seconds (which is the same as 3.2.19-18).

Comment 5 Roberto Ragusa 2009-07-06 19:49:12 UTC
Interesting.
I confirm that your fix works here.

I tried other variations of the two logging lines (without your patch).

This one is fast (7s):
  msg = _('excluding for cost: %s from %s') % (pkg.repo.id,pkg.repo.id)
  self.verbose_logger.log(logginglevels.DEBUG_3, msg)

This one is slow (101s):
  msg = _('excluding for cost: %s from %s') % (pkg,pkg.repo.id)
  self.verbose_logger.log(logginglevels.DEBUG_3, msg)

This one is slower (189s):
  msg = _('excluding for cost: %s from %s') % (pkg,pkg)
  self.verbose_logger.log(logginglevels.DEBUG_3, msg)

The slow down happens when the %s is expanded to pkg.

Comment 6 James Antill 2009-07-07 04:26:14 UTC
Thanks, I can also confirm that. But even more interesting this is a more minimal patch for the workaround:

Author: James Antill <james>
Date:   Mon Jul 6 14:11:58 2009 -0400

    Fix speed regression in costExcludes (magic)

diff --git a/yum/__init__.py b/yum/__init__.py
index 2f9723b..f2a4bd5 100644
--- a/yum/__init__.py
+++ b/yum/__init__.py
@@ -1129,7 +1129,7 @@ class YumBase(depsolve.Depsolve):
             #print '%s : %s : %s' % (pkgs[0], pkgs[0].repo, pkgs[0].repo.cost)
             for pkg in pkgs[1:]:
                 if pkg.repo.cost > lowcost:
-                    msg = _('excluding for cost: %s from %s') % (pkg, pkg.repo.id)
+                    msg = _('excluding for cost: %s from %s') % (str(pkg), pkg.repo.id)
                     self.verbose_logger.log(logginglevels.DEBUG_3, msg)
                     pkg.repo.sack.delPackage(pkg)


...which is pretty crazy, and implies that logger is doing something weird.

Comment 7 James Antill 2009-07-07 04:28:43 UTC
Of course it can't be logger, because it's happening in % above ... so it has to be core python.

Comment 8 James Antill 2009-07-07 05:23:40 UTC
 Ok, we have a winner:

% time python /usr/lib*/python2.?/timeit.py -n 1 -s 'import yum; yb = yum.YumBase()'  'map(str, yb.pkgSack.returnPackages(patterns="zi*"))'
Loaded plugins: Avahi, auto-update-debuginfo, presto
Loaded plugins: Avahi, auto-update-debuginfo, presto
Loaded plugins: Avahi, auto-update-debuginfo, presto
1 loops, best of 3: 3.03 sec per loop
python /usr/lib*/python2.?/timeit.py -n 1 -s 'import yum; yb = yum.YumBase()'  9.29s user 0.44s system 97% cpu 9.980 total

% time python /usr/lib*/python2.?/timeit.py -n 1 -s 'import yum; yb = yum.YumBase()'  'map(unicode, yb.pkgSack.returnPackages(patterns="zi*"))'
Loaded plugins: Avahi, auto-update-debuginfo, presto
Loaded plugins: Avahi, auto-update-debuginfo, presto
Loaded plugins: Avahi, auto-update-debuginfo, presto
1 loops, best of 3: 177 sec per loop
python /usr/lib*/python2.?/timeit.py -n 1 -s 'import yum; yb = yum.YumBase()'  528.68s user 0.68s system 99% cpu 8:52.93 total

...ie. a change from str(pkg) to unicode(pkg) makes it roughly 50x slower.
 Fixed upstream in commit:

http://yum.baseurl.org/gitweb?p=yum.git;a=commitdiff;h=1f45838e29ba26dac22215dd9c918fb920569fef

...still no idea how python hit this going from 3.2.21 to 3.2.22.

Comment 9 James Antill 2009-07-07 18:20:56 UTC
 git bisect shows that this is the commit that broke it:

d03389725e3779928d721370aed9e14afb5304a6

...ha, it's Seth's fault :).
 Quick test shows that if I do (without the other patches):

diff --git a/yum/sqlitesack.py b/yum/sqlitesack.py
index 5692c07..94dbc1a 100644
--- a/yum/sqlitesack.py
+++ b/yum/sqlitesack.py
@@ -192,7 +192,7 @@ class YumAvailablePackageSqlite(YumAvailablePackage, Package
                         }
 
         if varname.startswith('__') and varname.endswith('__'):
-            if not hasattr(self, varname):
+            if varname == '__unicode__' or not hasattr(self, varname):
                 raise AttributeError, varname
         
         dbname = varname

...then everything is fast again.

Comment 10 James Antill 2009-07-07 18:34:02 UTC
 After more thinking this is because we are in __getattr__(self, ...) and call hasattr(self, ...) ... which then calls getattr() which calls __getattr__() again. Until we run out of stack and it all unwinds again.
 So above fix reverted, and real fix pushed.

Comment 11 Robert Story 2009-09-01 14:01:16 UTC
Can this fix get pushed to F10? I can verify that the simple fix in comment #6 cut a simple 'yum list perl-File-Sync' from 2 minutes to 7 seconds...

Comment 12 seth vidal 2009-09-01 14:18:42 UTC
the changeset from what's in yum's git and f10 is pretty serious. I'm not sure it is a great idea to push it that far back.

Comment 13 Robert Story 2009-09-01 14:30:09 UTC
I just want the 1 line fix from comment #6 in yum/__init__.py...

Comment 14 Robert Story 2009-10-21 16:30:27 UTC
the priorities plugin has a similar issue and fix. (sure would be nice to get a yum-3.2.23-4.f10 package with these one line fixes...)

--- /usr/lib/yum-plugins/priorities.py.orig	2009-10-21 11:54:34.000000000 -0400
+++ /usr/lib/yum-plugins/priorities.py	2009-10-21 11:51:04.000000000 -0400
@@ -144,7 +144,7 @@
                 if delPackage:
                     conduit.delPackage(po)
                     cnt += 1
-                    conduit.info(3," --> %s from %s excluded (priority)" % (po,po.repoid))
+                    conduit.info(3," --> %s from %s excluded (priority)" % (str(po),po.repoid))
 
                 # If this packages obsoletes other packages, check whether
                 # one of the obsoleted packages is not available through
@@ -157,7 +157,7 @@
                             if pkg_priorities_archless.has_key(pkg_name) and pkg_priorities_archless[pkg_name] < repo.priority:
                                 conduit.delPackage(po)
                                 cnt += 1
-                                conduit.info(3," --> %s from %s excluded (priority)" % (po,po.repoid))
+                                conduit.info(3," --> %s from %s excluded (priority)" % (str(po),po.repoid))
                                 break
     conduit.info(2, '%d packages excluded due to repository priority protections' % cnt)

Comment 15 James Antill 2009-10-21 17:48:02 UTC
The new yum has already been pushed to f10, just get it from updates-testing ... there's no reason to change the priorities plugin.