Bug 591656

Summary: [RFE] statistical display of time to begin installing and success rate of install per system
Product: [Retired] Beaker
Reporter: Cameron Meadors <cmeadors>
Component: web UI
Assignee: Raymond Mancy <rmancy>
Status: CLOSED CURRENTRELEASE
QA Contact: Dan Callaghan <dcallagh>
Severity: medium
Priority: low
Version: 0.5
CC: azelinka, bpeck, dcallagh, ebaak, kbaker, mcsontos, mishin, rmancy
Target Milestone: 0.11
Keywords: FutureFeature
Hardware: All
OS: Linux
Whiteboard: Measurements
Doc Type: Enhancement
Last Closed: 2013-01-17 04:33:23 UTC

Description Cameron Meadors 2010-05-12 19:34:25 UTC
If we tracked and could get a report of how long systems were taking to begin installing and how successful they were to get installed, we could use that information to tune the collection of systems in beaker.

Long times to begin install could show a lack of systems compared to the demand for them.  We could then add more machines to reduce the time.

Poor success rates for installing on a system would be a red flag for removing the systems and determining if there is a hardware problem.

This would also allow people to specify individual machines that are more likely to install and do it quickly.
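
The per-system success rate requested above could be computed along these lines. This is only a rough sketch: the `(system_name, succeeded)` record shape, the function names, and the 0.5 threshold are assumptions made for illustration and are not taken from Beaker's schema or code.

```python
from collections import defaultdict


def install_success_rates(installs):
    """Return {system_name: fraction of installs that succeeded}.

    `installs` is an iterable of (system_name, succeeded) pairs.
    That shape is hypothetical, chosen only for this sketch.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for system, succeeded in installs:
        totals[system] += 1
        if succeeded:
            successes[system] += 1
    return {system: successes[system] / totals[system] for system in totals}


def flag_unreliable(rates, threshold=0.5):
    """List systems whose success rate is strictly below `threshold`."""
    return sorted(system for system, rate in rates.items() if rate < threshold)
```

A system with only a handful of recorded installs can look far better or worse than it really is, so a real report would probably want to show the sample count next to each rate.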

Comment 1 Raymond Mancy 2011-01-06 05:22:27 UTC
(In reply to comment #0)
> If we tracked and could get a report of how long systems were taking to begin
> installing and how successful they were to get installed, we could use that
> information to tune the collection of systems in beaker.
> 

Having an admin page that shows system downtime/uptime by arch etc. may be handy for admins when deciding what machines the labs need, although you'd need to be careful that the stats were not interpreted with specious reasoning, e.g. "machines with nvidia cards had 2% downtime, therefore we need more machines with nvidia cards" (never mind the fact that they just happened to have 1TB of RAM and were highly sought after for that).


> Long times to begin install could show a lack of systems compared to the demand
> for them.  We could then add more machines to reduce the time.
> 
> Poor success rates for installing on a system would be a red flag for removing
> the systems and determining if there is a hardware problem.
> 

We have some logic now which will flag a system if it fails to install certain distros. 

> This would also allow people to specify individual machines that are more
> likely to install and do it quickly.

If people want to get machines as quickly as possible they need to specify as few requirements in the hostRequires and distroRequires as possible. 

How often would you be willing to look up a set of individual machines that you might normally use, and then determine from their usage rate which specific one you wanted for that particular job? (this isn't a rhetorical question, I'm actually keen to know).

Cheers

Comment 2 Min Shin 2012-11-22 03:33:15 UTC
ncoghlan: Install time should fall out of the FR-8 implementation, install failure stats should fall out of FR-2. Not sure this really needs to be a separate requirement. As far as graph zooming goes, I don't want to promise anything beyond what graphite gives us by default (Jasper reports definitely won't provide live zooming).

FR-8: Machine stats (https://bugzilla.redhat.com/show_bug.cgi?id=500835)
FR-2: Machine Breakage (https://bugzilla.redhat.com/show_bug.cgi?id=741960)

mishin: Isn't there a difference between machine breakage & install failure?

Comment 3 Raymond Mancy 2012-11-27 04:29:32 UTC
Min, IIRC I think cmeadors is talking about how long it takes for a recipe to begin installing when he says "how long systems were taking to begin installing". This is not covered by either of those FRs.

I'd guess that Started - Queued time would be sufficient.
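
As a rough illustration of the "Started - Queued" measure, assuming each recipe exposes a queued timestamp and a started timestamp as `datetime` values (the function names and the tuple shape are hypothetical, not Beaker's API):

```python
from datetime import datetime, timedelta
from statistics import median


def wait_before_start(queued, started):
    """Time a recipe spent waiting: Started minus Queued."""
    if started < queued:
        raise ValueError("recipe started before it was queued")
    return started - queued


def median_wait_seconds(recipes):
    """Median wait in seconds over (queued, started) pairs.

    The median is used rather than the mean so that a few recipes
    stuck in the queue for days do not dominate the summary.
    """
    return median(wait_before_start(q, s).total_seconds() for q, s in recipes)
```

The median here is a design choice for this sketch only; nothing in this bug says which summary statistic the report should use.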

Comment 4 Min Shin 2012-11-27 04:42:37 UTC
(In reply to comment #3)
> Min, IIRC I think cmeadors is talking about how long it takes for a recipe
> to begin installing when he says "how long systems were taking to begin
> installing". This is not covered by either of those FRs.
> 
> I'd guess that Started - Queued time would be sufficient.

Sounds right to me.

Comment 5 Raymond Mancy 2012-11-27 04:54:24 UTC
So does this fit into a current FR?

Comment 6 Min Shin 2012-11-27 04:57:09 UTC
(In reply to comment #5)
> So does this fit into a current FR?

Yes. But let's see if we get approvals from stakeholders for the whole 0.11 release. Please work on urgent bugs until we reach a final go/no-go decision. :)

Comment 8 Raymond Mancy 2012-12-19 22:47:33 UTC
http://gerrit.beaker-project.org/#/c/1583/ - resource install failure query
http://gerrit.beaker-project.org/#/c/1584/ - resource install duration query
http://gerrit.beaker-project.org/#/c/1573/ - installation tracking

Comment 9 Dan Callaghan 2013-01-02 04:57:28 UTC
Setting back to ASSIGNED as the changes for tracking install time break systems with manual power (i.e. no power settings). In those cases there is no reboot command, so the rebooted time is never set, so install_start fails like this:

2013-01-02 14:34:52,210 bkr.server.xmlrpccontroller ERROR Error handling XML-RPC method
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/xmlrpccontroller.py", line 54, in RPC2
    response = self.process_rpc(method,params)
  File "/usr/lib/python2.6/site-packages/bkr/server/xmlrpccontroller.py", line 43, in process_rpc
    response = obj(*params)
  File "<string>", line 3, in install_start
  File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
    return fn(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/bkr/server/recipes.py", line 216, in install_start
    % recipe.id))
BX: u'Cannot start R:402, resource has not been rebooted'
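
One way to avoid this failure is to stop requiring a reboot time for systems that have no power control. The sketch below is illustrative only: the `Resource` class, its attribute names, and the fallback of using the check-in time are assumptions, and it is not the change that was actually submitted for review.

```python
from datetime import datetime, timezone


class Resource:
    """Hypothetical stand-in for a recipe's resource, not Beaker's model."""

    def __init__(self, has_power_control, rebooted=None):
        self.has_power_control = has_power_control
        self.rebooted = rebooted
        self.install_started = None


def install_start(resource, now=None):
    """Record the install start time, tolerating manual-power systems."""
    now = now or datetime.now(timezone.utc)
    if resource.rebooted is None:
        if resource.has_power_control:
            # A reboot command should have been issued, so this is a real error.
            raise ValueError("resource has not been rebooted")
        # Manual power: no reboot command is ever issued, so fall back to
        # the moment the installer checks in (an assumption of this sketch).
        resource.rebooted = now
    resource.install_started = now
    return resource.install_started
```

With this fallback the measured install duration for manual-power systems starts from the installer check-in, so it would not be directly comparable with systems that record a real reboot time.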

Comment 10 Raymond Mancy 2013-01-02 06:06:47 UTC
http://gerrit.beaker-project.org/#/c/1591/1

Comment 12 Dan Callaghan 2013-01-04 07:28:11 UTC
Setting back to ASSIGNED as this doesn't cover FR-9 from the 0.11 PRD, which is to provide a summary of the time from submitting a job to it starting ("speed of service").

Comment 13 Raymond Mancy 2013-01-07 14:00:49 UTC
http://gerrit.beaker-project.org/#/c/1610/

Comment 17 Dan Callaghan 2013-01-17 04:33:23 UTC
Beaker 0.11.0 has been released.