Bug 105326

Summary: [PATCH] Estimation/Guestimation of package installation time
Product: [Retired] Red Hat Linux Beta Reporter: Elijah Newren <newren>
Component: anaconda Assignee: Michael Fulbright <msf>
Status: CLOSED RAWHIDE QA Contact: Mike McLean <mikem>
Severity: medium Docs Contact:
Priority: medium    
Version: beta1 CC: mitr
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
URL: N/A
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2003-10-16 18:30:53 UTC Type: ---
Attachments:
  Patch to anaconda-9.0.94 which implements the better estimator (flags: none)

Description Elijah Newren 2003-09-25 00:31:52 UTC
*** Basic reason for patch and idea behind the patch ***
The current method of time estimation is basically based on install size:
  finishTime = totalSize/sizeCompleted * elapsedTime

This ignores the number of packages.  Since there's work involved with every
package besides merely copying files (work which is independent of package
size), smaller packages take longer on average to install per kilobyte than
larger packages do.  Thus, another helpful estimator may be based on the number
of packages:
  finishTime2 = numTotal/numComplete * elapsedTime

Of course, we don't want to report two guesses of the final time, so we have to
have some way of combining the above.  Taking the geometric mean (for reasons
I'll leave out here since this email is already going to be pretty long) seems
fairly reasonable:
  realGuestimate = sqrt(finishTime * finishTime2)

This still leaves room for improvement, but it seems to be noticeably better
than the current estimator.  I've attached a simple patch which implements this
(and which also guards against dividing by 0, though I'm not sure if that's
necessary).  Here's the difference I found in a simple stock "Personal Desktop"
installation on my Dell Inspiron laptop:

*** Comparison of methods ***
Under the old method:
  The guestimated total time was 4:46 after installing the first
  package (glibc).  It then increases steadily (though very slowly)
  throughout the entire installation, with a couple of exceptions:
  really large packages (think OpenOffice) each drop the estimated
  time by 30 seconds to a minute.  The largest estimated time is
  also the final estimated time: 14:43.

Under the new method:
  The guestimated total time was 19:23 after installing the first
  package (glibc).  During the next 20-30 or so packages (all of
  which are extremely small and fly by in a second or two each), the
  time rapidly drops to about 7:00 and then somewhat quickly climbs
  to somewhere between 10 and 11 minutes.  The estimated time then
  basically climbs slowly but steadily to the final time (of
  14:59)--but no package (even OpenOffice) seems to make it jump by
  more than 7 seconds.
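
The combined estimator described above can be sketched in a few lines of
Python.  This is a minimal illustration, not the attached patch: the function
name and parameter names are hypothetical, and returning None when nothing has
been installed yet is an assumed way of handling the divide-by-zero case.

```python
import math


def estimate_total_time(total_size, size_done, num_total, num_done, elapsed):
    """Estimate total install time (same units as `elapsed`).

    Hypothetical helper: combines the size-based and count-based
    estimates with a geometric mean, as described in the report.
    Returns None until at least one package has been installed,
    which avoids dividing by zero.
    """
    if size_done <= 0 or num_done <= 0:
        return None

    # Size-based estimate: finishTime = totalSize/sizeCompleted * elapsedTime
    by_size = total_size / size_done * elapsed
    # Count-based estimate: finishTime2 = numTotal/numComplete * elapsedTime
    by_count = num_total / num_done * elapsed

    # Geometric mean of the two estimates
    return math.sqrt(by_size * by_count)
```

The geometric mean always lies between the two individual estimates, so the
result is never lower than the smaller one nor higher than the larger one.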

*** Summary of the Benefits ***
The improvements:
  1) The user basically gets a lower and upper estimate to work
     with, and the answer really is fairly close to the middle of
     the two.  (The old method gives only a lower bound, which is
     never very close and is always increasing.)
  2) After about 20-30 packages, no package can make the estimate
     jump by very much.
  3) The estimator is a little more accurate--at about the point
     where the improved method is at 10-11 minutes, I believe the
     old method would still be guestimating around 6-7 minutes.
  4) This may be an irrelevant point (and is almost identical to
     #2 anyway), but I think that if Red Hat were ever to change
     their package installation order, they might get drastically
     different results with the current estimator.  The newer
     estimator is much less sensitive to such changes, and thus
     would help reduce the surprise factor.
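
The damping effect in points 2 and 4 can be illustrated with made-up numbers.
The sketch below is not taken from the patch or from the measured install:
the totals, the "before" and "after" snapshots, and all function names are
hypothetical, chosen only to show how one large package moves the size-only
estimate compared with the combined one.

```python
import math


def by_size(total_size, size_done, elapsed):
    # Size-only estimate (the old method)
    return total_size / size_done * elapsed


def by_count(num_total, num_done, elapsed):
    # Package-count estimate
    return num_total / num_done * elapsed


def combined(total_size, size_done, num_total, num_done, elapsed):
    # Geometric mean of the two estimates (the proposed method)
    return math.sqrt(by_size(total_size, size_done, elapsed)
                     * by_count(num_total, num_done, elapsed))


# Hypothetical install: 2000 MB spread over 400 packages.
TOTAL_SIZE, NUM_TOTAL = 2000.0, 400

# Hypothetical snapshots just before and just after one 200 MB package
# that takes 20 seconds to install.
before = {"size_done": 500.0, "num_done": 200, "elapsed": 300.0}
after = {"size_done": 700.0, "num_done": 201, "elapsed": 320.0}

size_jump = (by_size(TOTAL_SIZE, after["size_done"], after["elapsed"])
             - by_size(TOTAL_SIZE, before["size_done"], before["elapsed"]))
combined_jump = (combined(TOTAL_SIZE, after["size_done"], NUM_TOTAL,
                          after["num_done"], after["elapsed"])
                 - combined(TOTAL_SIZE, before["size_done"], NUM_TOTAL,
                            before["num_done"], before["elapsed"]))

print("size-only estimate changed by %.1f s" % size_jump)
print("combined estimate changed by %.1f s" % combined_jump)
```

With these particular numbers the combined estimate moves by less than a
third of what the size-only estimate does, because the package-count term
barely changes when a single package completes.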

*** Other notes ***
It appears that the main reason the new estimator is low is that packages early
in the installation almost never have post-install scripts (or at least so I'm
guessing, since the progress bar for those packages doesn't hang at 100% for
multiple seconds before moving on), whereas post-install scripts are quite
common for packages later in the installation.  If the scripts were more evenly
distributed, I believe the estimator would be much more accurate.

Comment 1 Elijah Newren 2003-09-25 00:32:48 UTC
Created attachment 94703
Patch to anaconda-9.0.94 which implements the better estimator

Comment 2 Michael Fulbright 2003-10-09 16:19:20 UTC
I spent some time profiling the install and tried your idea. I came up with a
slightly different solution to the problem and it will be in the next test
release.  Thanks for the time you spent investigating this.