Bug 591656 - [RFE] statistical display of time to begin installing and success rate of install per system
Status: CLOSED CURRENTRELEASE
Product: Beaker
Classification: Community
Component: web UI
Version: 0.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 0.11
Target Release: ---
Assigned To: Raymond Mancy
QA Contact: Dan Callaghan
Whiteboard: Measurements
Keywords: FutureFeature
Reported: 2010-05-12 15:34 EDT by Cameron Meadors
Modified: 2014-12-07 20:02 EST
Doc Type: Enhancement
Last Closed: 2013-01-16 23:33:23 EST


Description Cameron Meadors 2010-05-12 15:34:25 EDT
If we tracked and could get a report of how long systems were taking to begin installing and how often they installed successfully, we could use that information to tune the collection of systems in Beaker.

Long times to begin install could show a lack of systems compared to the demand for them.  We could then add more machines to reduce the time.

Poor success rates for installing on a system would be a red flag for removing the systems and determining if there is a hardware problem.

This would also allow people to specify individual machines that are more likely to install and do it quickly.
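
A minimal sketch of the per-system success rate requested above, assuming install attempts are available as (system, succeeded) pairs. The record shape, the host names, and the function name are hypothetical and are not taken from Beaker's schema.

```python
from collections import defaultdict


def install_success_rates(attempts):
    """Return {system: fraction of install attempts that succeeded}.

    `attempts` is an iterable of (system_name, succeeded) pairs. This shape
    is an assumption for illustration, not Beaker's actual data model.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for system, succeeded in attempts:
        totals[system] += 1
        if succeeded:
            successes[system] += 1
    # Every system in `totals` has at least one attempt, so no division by zero.
    return {system: successes[system] / totals[system] for system in totals}


# Hypothetical sample data: two systems, two attempts each.
attempts = [
    ("host-a", True),
    ("host-a", True),
    ("host-b", False),
    ("host-b", True),
]
rates = install_success_rates(attempts)
```

A system whose rate stays low across many attempts would be the "red flag" candidate; a rate based on only a handful of attempts says little either way.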
Comment 1 Raymond Mancy 2011-01-06 00:22:27 EST
(In reply to comment #0)
> If we tracked and could get a report of how long systems were taking to begin
> installing and how successful they were to get installed, we could use that
> information to tune the collection of systems in Beaker.
> 

Having an admin page that shows system downtime/uptime by arch etc. may be handy for admins when deciding what machines the labs need, although you'd need to be careful that the stats were not interpreted with specious reasoning, e.g. "machines with NVIDIA cards had 2% downtime, therefore we need more machines with NVIDIA cards" (never mind the fact that they just happened to have 1TB RAM and were highly sought after for this).


> Long times to begin install could show a lack of systems compared to the demand
> for them.  We could then add more machines to reduce the time.
> 
> Poor success rates for installing on a system would be a red flag for removing
> the systems and determining if there is a hardware problem.
> 

We have some logic now which will flag a system if it fails to install certain distros. 

> This would also allow people to specify individual machines that are more
> likely to install and do it quickly.

If people want to get machines as quickly as possible they need to specify as few requirements in the hostRequires and distroRequires as possible. 

How often would you be willing to look up a set of individual machines that you might normally use, and then determine from their usage rate which specific one you wanted for that particular job? (this isn't a rhetorical question, I'm actually keen to know).

Cheers
Comment 2 Min Shin 2012-11-21 22:33:15 EST
ncoghlan: Install time should fall out of the FR-8 implementation, install failure stats should fall out of FR-2. Not sure this really needs to be a separate requirement. As far as graph zooming goes, I don't want to promise anything beyond what Graphite gives us by default (Jasper reports definitely won't provide live zooming).

FR-8: Machine stats (https://bugzilla.redhat.com/show_bug.cgi?id=500835)
FR-2: Machine Breakage (https://bugzilla.redhat.com/show_bug.cgi?id=741960)

mishin: Isn't there a difference between machine breakage & install failure?
Comment 3 Raymond Mancy 2012-11-26 23:29:32 EST
Min, IIRC cmeadors is talking about how long it takes for a recipe to begin installing when he says "how long systems were taking to begin installing". This is not covered by either of those FRs.

I'd guess that Started - Queued time would be sufficient.
Comment 4 Min Shin 2012-11-26 23:42:37 EST
(In reply to comment #3)
> Min, IIRC cmeadors is talking about how long it takes for a recipe
> to begin installing when he says "how long systems were taking to begin
> installing". This is not covered by either of those FRs.
> 
> I'd guess that Started - Queued time would be sufficient.

Sounds right to me.
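
As a rough illustration of the "Started - Queued" measure agreed on above, the wait for a single recipe is just the difference of two timestamps. The function name and the sample timestamps below are made up for the example; how Beaker stores these times is not stated in this bug.

```python
from datetime import datetime


def time_to_start(queued, started):
    """Return the wait between queueing and starting as a timedelta.

    Returns None when the recipe has not started yet, so callers can skip
    recipes that are still waiting for a system.
    """
    if started is None:
        return None
    return started - queued


# Hypothetical timestamps for one recipe.
queued = datetime(2012, 11, 26, 23, 0, 0)
started = datetime(2012, 11, 26, 23, 12, 30)
wait = time_to_start(queued, started)
```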
Comment 5 Raymond Mancy 2012-11-26 23:54:24 EST
So does this fit into a current FR?
Comment 6 Min Shin 2012-11-26 23:57:09 EST
(In reply to comment #5)
> So does this fit into a current FR?

Yes. But let's see if we get approval from stakeholders for the whole 0.11 release. Please work on urgent bugs until we reach a final go/no-go decision. :)
Comment 8 Raymond Mancy 2012-12-19 17:47:33 EST
http://gerrit.beaker-project.org/#/c/1583/ - resource install failure query
http://gerrit.beaker-project.org/#/c/1584/ - resource install duration query
http://gerrit.beaker-project.org/#/c/1573/ - installation tracking
Comment 9 Dan Callaghan 2013-01-01 23:57:28 EST
Setting back to ASSIGNED as the changes for tracking install time break systems with manual power (i.e. no power settings). In those cases there is no reboot command, so the rebooted time is never set, so install_start fails like this:

2013-01-02 14:34:52,210 bkr.server.xmlrpccontroller ERROR Error handling XML-RPC method
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/xmlrpccontroller.py", line 54, in RPC2
    response = self.process_rpc(method,params)
  File "/usr/lib/python2.6/site-packages/bkr/server/xmlrpccontroller.py", line 43, in process_rpc
    response = obj(*params)
  File "<string>", line 3, in install_start
  File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
    return fn(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/bkr/server/recipes.py", line 216, in install_start
    % recipe.id))
BX: u'Cannot start R:402, resource has not been rebooted'
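
One possible way to handle this case, sketched under loud assumptions: only insist on a reboot timestamp when the lab can actually power-cycle the system. The class, the function signature, and the `has_power_control` flag are hypothetical stand-ins written for this example. They are not Beaker's real code, and this is not necessarily what the eventual fix did.

```python
from datetime import datetime


class InstallTimes:
    """Hypothetical stand-in for a recipe's install timestamps."""

    def __init__(self, rebooted=None):
        self.rebooted = rebooted
        self.install_started = None


def install_start(times, now, has_power_control):
    """Record the install start time.

    A missing reboot timestamp is only treated as an error when the system
    has power control; manually powered systems never receive a reboot
    command, so they never get a reboot timestamp.
    """
    if times.rebooted is None:
        if has_power_control:
            raise ValueError("resource has not been rebooted")
        # Assumption: for manual power, use the first install_start call
        # as the best available approximation of the reboot time.
        times.rebooted = now
    times.install_started = now
    return times


now = datetime(2013, 1, 2, 14, 34, 52)
manual = install_start(InstallTimes(), now, has_power_control=False)
```

With this fallback the measured install duration for manually powered systems starts at the first `install_start` call, so it understates the real time by however long the manual reboot took.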
Comment 10 Raymond Mancy 2013-01-02 01:06:47 EST
http://gerrit.beaker-project.org/#/c/1591/1
Comment 12 Dan Callaghan 2013-01-04 02:28:11 EST
Setting back to ASSIGNED as this doesn't cover FR-9 from the 0.11 PRD, which is to provide a summary of the time from submitting a job to it starting ("speed of service").
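
A "speed of service" summary of this kind could be as simple as a few aggregate figures over the per-job waits. The sketch below assumes the waits are already available as `timedelta` values; the function name, the choice of statistics, and the sample values are illustrative and are not taken from the PRD.

```python
import statistics
from datetime import timedelta


def summarize_waits(waits):
    """Summarize submit-to-start waits, in seconds.

    `waits` is an iterable of timedelta values (assumed input shape).
    Returns None when there is nothing to summarize.
    """
    seconds = sorted(w.total_seconds() for w in waits)
    if not seconds:
        return None
    return {
        "count": len(seconds),
        "median": statistics.median(seconds),
        "max": seconds[-1],
    }


# Hypothetical waits for three jobs.
waits = [timedelta(minutes=5), timedelta(minutes=45), timedelta(minutes=10)]
summary = summarize_waits(waits)
```

The median is used rather than the mean because a few jobs that wait a very long time for a scarce system would otherwise dominate the figure.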
Comment 13 Raymond Mancy 2013-01-07 09:00:49 EST
http://gerrit.beaker-project.org/#/c/1610/
Comment 17 Dan Callaghan 2013-01-16 23:33:23 EST
Beaker 0.11.0 has been released.