Bug 509932

Summary: cost excludes are much slower in yum 3.2.23 than 3.2.21
Product: Red Hat Enterprise Linux 5 Reporter: James Antill <james.antill>
Component: yumAssignee: James Antill <james.antill>
Status: CLOSED ERRATA QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: medium Docs Contact:
Priority: low    
Version: 5.4CC: ddumas, dmach, eric.tanguy, ffesti, james.antill, jhutar, pmatilai, syeghiay, tim.lauridsen, tomek
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://www.mail-archive.com/fedora-devel-list@redhat.com/msg03366.html
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
A recent version of yum was substantially slower than previous versions when calculating cost excludes. The slowdown was caused by variable that expanded to unicode encoding rather than str. The calculation no longer uses unicode, which returns yum to its previous higher speed.
Story Points: ---
Clone Of: 508445 Environment:
Last Closed: 2009-09-02 07:34:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 508445    
Bug Blocks:    

Description James Antill 2009-07-06 22:56:21 UTC
+++ This bug was initially created as a clone of Bug #508445 +++

yum 3.2.23 on F10 is a lot slower than 3.2.21.

A simple "yum -C list updates" takes 90 seconds instead of 3 seconds,
if two repositories with the same rpms and different cost are enabled
(typical scenario for local mirroring of updates).

The problem appears to be that the code managing the exclusion of
rpm packages from the expensive repo, which loops on the rpms
and deletes them very inefficiently.
"excluding for cost:" lines are printed quite slowly (option -d 5).

Suspects on this code: (is this O(N^2) ?)

  for pkg in pkgs[1:]:
      if pkg.repo.cost > lowcost:
          msg = _('excluding for cost: %s from %s') % (pkg, pkg.repo.id)
          self.verbose_logger.log(logginglevels.DEBUG_3, msg)
          pkg.repo.sack.delPackage(pkg)


Also see

  http://www.mail-archive.com/fedora-devel-list@redhat.com/msg03366.html

(reproduced by another user with similar conf).

--- Additional comment from james.antill on 2009-07-03 10:57:45 EDT ---

 Can you see what happens if you undo this patch (ie. just delete the function starting at line 284):


diff --git a/yum/yumRepo.py b/yum/yumRepo.py
index afd0877..964f19d 100644
--- a/yum/yumRepo.py
+++ b/yum/yumRepo.py
@@ -281,6 +281,19 @@ class YumRepository(Repository, config.RepoConf):
         self._grabfunc = None
         self._grab = None
 
+    def __cmp__(self, other):
+        """ Sort yum repos. by cost, and then by alphanumeric on their id. """
+        if other is None:
+            return 1
+        if hasattr(other, 'cost'):
+            ocost = other.cost
+        else:
+            ocost = 1000
+        ret = cmp(self.cost, ocost)
+        if ret:
+            return 1
+        return cmp(self.id, other.id)
+
     def _getSack(self):
         # FIXME: Note that having the repo hold the sack, which holds "repos"

--- Additional comment from bugzillaredhat-56f0 on 2009-07-05 05:17:58 EDT ---

It didn't help.
Successive run were as slow as before the change (90 seconds).

Much better results if I remove this line
  self.costExcludePackages()
from __init__.py; it now runs in 7 seconds.
If I undestand correctly, it is now discarding duplicates
in the list of to-be-updated packages, instead of the list
of all packages; so it is faster. A better fix is certainly
possible, I suppose.

--- Additional comment from james.antill on 2009-07-06 14:16:56 EDT ---

Ok, I've confirmed it and I have a patch that "fixes" it:

Author: James Antill <james>
Date:   Mon Jul 6 14:11:58 2009 -0400

    Fix speed regression in costExcludes

diff --git a/yum/__init__.py b/yum/__init__.py
index 2f9723b..268baaf 100644
--- a/yum/__init__.py
+++ b/yum/__init__.py
@@ -1120,6 +1120,7 @@ class YumBase(depsolve.Depsolve):
                 pkgdict[po.pkgtup] = []
             pkgdict[po.pkgtup].append(po)
         
+        verb = self.verbose_logger.isEnabledFor(logginglevels.DEBUG_3)
         for pkgs in pkgdict.values():
             if len(pkgs) == 1:
                 continue
@@ -1129,8 +1130,10 @@ class YumBase(depsolve.Depsolve):
             #print '%s : %s : %s' % (pkgs[0], pkgs[0].repo, pkgs[0].repo.cost)
             for pkg in pkgs[1:]:
                 if pkg.repo.cost > lowcost:
-                    msg = _('excluding for cost: %s from %s') % (pkg, pkg.repo.
-                    self.verbose_logger.log(logginglevels.DEBUG_3, msg)
+                    if verb:
+                        msg = _('excluding for cost: %s from %s') %(pkg,
+                                                                    pkg.repo.id
+                        self.verbose_logger.log(logginglevels.DEBUG_3, msg)
                     pkg.repo.sack.delPackage(pkg)
             
 

...the big question is how/why this fixes it ... and what changed between 3.2.21 and 3.2.22 that makes the above much slower.

--- Additional comment from james.antill on 2009-07-06 14:20:41 EDT ---

My data for this was to use a RHEL-5 box and have multiple EPEL repos. named et1, et2, et3, et4 with different costs. The above patch (for 13,132 delPackage() calls) takes costExcludes() time from 243.10 seconds to 7.25 seconds (which is the same as 3.2.19-18).

--- Additional comment from bugzillaredhat-56f0 on 2009-07-06 15:49:12 EDT ---

Interesting.
I confirm that your fix works here.

I tried other variations of the two logging lines (without your patch).

This one is fast (7s):
  msg = _('excluding for cost: %s from %s') % (pkg.repo.id,pkg.repo.id)
  self.verbose_logger.log(logginglevels.DEBUG_3, msg)

This one is slow (101s):
  msg = _('excluding for cost: %s from %s') % (pkg,pkg.repo.id)
  self.verbose_logger.log(logginglevels.DEBUG_3, msg)

This one is slower (189s):
  msg = _('excluding for cost: %s from %s') % (pkg,pkg)
  self.verbose_logger.log(logginglevels.DEBUG_3, msg)

The slow down happens when the %s is expanded to pkg.

Comment 8 Ruediger Landmann 2009-09-01 22:14:54 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
A recent version of yum was substantially slower than previous versions when calculating cost excludes. The slowdown was caused by variable that expanded to unicode encoding rather than str. The calculation no longer uses unicode, which returns yum to its previous higher speed.

Comment 9 errata-xmlrpc 2009-09-02 07:34:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1419.html