854248 – Upgrade :The penalty for failure Strategy is not penalizing the provider

Summary: Upgrade :The penalty for failure Strategy is not penalizing the provider

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	CloudForms Cloud Engine
Classification:	Retired
Component:	aeolus-conductor
Sub Component:
Version:	1.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Assignee:	Angus Thomas
QA Contact:	Rehana
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-04 13:33 UTC by Rehana
Modified:	2012-12-12 15:18 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-12-12 15:18:21 UTC
Embargoed:

Description Rehana 2012-09-04 13:33:19 UTC

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
This issue is observed on an upgraded machine.

Steps to Reproduce:
1. Build and push an image to the rhevm and mock providers
2. Add the penalty for failure criteria as
    Penalty for each failure (in percentage) * =50
    Time period over which the failures are aggregated (in minutes) * =60
    The number of failures over which the provider account should not be used if possible =1
3. Disable the mock provider
4. Launch instances until the rhevm provider becomes full
5. Enable the mock provider
6. Launch an instance
  
Actual results:
The new VM request went to the rhevm provider instead of the mock provider.

Expected results:
Once the failure criteria are met, the instance should go to the other available providers to which the images were pushed.

Additional info:
rpm -qa | grep aeolus
aeolus-conductor-0.13.0-0.20120904080008git280b413.fc16.noarch
rubygem-aeolus-image-0.6.0-0.20120904040029git902c81c.fc16.noarch
aeolus-conductor-doc-0.13.0-0.20120904080008git280b413.fc16.noarch
aeolus-configure-2.8.0-0.20120904040024git7dfddaf.fc16.noarch
aeolus-all-0.13.0-0.20120904080008git280b413.fc16.noarch
rubygem-aeolus-cli-0.7.0-0.20120904040025gitd64d64f.fc16.noarch
aeolus-conductor-daemons-0.13.0-0.20120904080008git280b413.fc16.noarch

Comment 2 Imre Farkas 2012-09-07 08:15:26 UTC

The issue applies only to RHEV-M providers, and it is not always reproducible, though I'm not sure why. The root cause is that a RHEV-M failure is not always reported to Conductor, so penalizing fails because Conductor doesn't know about any failure.

Jan's patchset which fixes RHEV-M launching also fixes this issue:
https://lists.fedorahosted.org/pipermail/aeolus-devel/2012-September/012478.html
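
To illustrate the root cause described above, here is a minimal Python sketch of a penalty-for-failure selection. It is not Conductor's actual code: the names Candidate, recent_failures, penalized_score and choose_provider, the base scores, and the "higher score wins" rule are all assumptions made for illustration. Only the three settings (50, 60, 1) come from the reproduction steps.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Settings taken from the reproduction steps in the description.
PENALTY_PERCENTAGE = 50    # penalty for each failure, in percent
TIME_PERIOD_MINUTES = 60   # window over which failures are aggregated
FAILURE_THRESHOLD = 1      # failures at which the account should be avoided if possible


@dataclass
class Candidate:
    """Hypothetical provider account with the failures Conductor knows about."""
    name: str
    base_score: float                      # assumed: higher is better
    reported_failures: list = field(default_factory=list)  # datetimes


def recent_failures(candidate, now):
    """Count reported failures inside the aggregation window."""
    cutoff = now - timedelta(minutes=TIME_PERIOD_MINUTES)
    return sum(1 for t in candidate.reported_failures if t >= cutoff)


def penalized_score(candidate, now):
    """Reduce the base score by the penalty percentage for each recent failure."""
    failures = recent_failures(candidate, now)
    factor = max(0.0, 1 - failures * PENALTY_PERCENTAGE / 100)
    return candidate.base_score * factor


def choose_provider(candidates, now):
    """Prefer accounts below the failure threshold; fall back to all otherwise."""
    preferred = [c for c in candidates if recent_failures(c, now) < FAILURE_THRESHOLD]
    pool = preferred or candidates
    return max(pool, key=lambda c: penalized_score(c, now))


now = datetime(2012, 9, 4, 13, 33)
failure_time = now - timedelta(minutes=5)

# Failure reported to Conductor: rhevm is penalized and mock is chosen.
reported = [Candidate("rhevm", 100, [failure_time]), Candidate("mock", 80)]
print(choose_provider(reported, now).name)

# Failure never reported: there is nothing to penalize, so rhevm is chosen again.
unreported = [Candidate("rhevm", 100), Candidate("mock", 80)]
print(choose_provider(unreported, now).name)
```

In this sketch the second case matches the observed behaviour: the strategy settings are correct, but with no recorded failure the rhevm account keeps its full score.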

Comment 3 Matt Wagner 2012-09-11 21:22:45 UTC

Based on Imre's comment, I'm going to move this over to ON_QA for retesting -- Jan's patchset has already been pushed and is on 1.1:



commit e69cd25fd846495221d528778e51a7ea6feba8c5
Author: Jan Provaznik <jprovazn>
Date:   Wed Sep 5 15:40:53 2012 +0200

    Changed Event.description column to text
    
    It turned out that sometimes is error message quite long (if a match is not found),
    in such case is the message saved into description column.
    (cherry picked from commit 7451414c6c89f17b80768451aeb2164b400464c2)

commit e0ef69e0df7e7cfa598d60067dd6a63f39ccaf43
Author: Jan Provaznik <jprovazn>
Date:   Wed Sep 5 15:29:53 2012 +0200

    Translated deployment event messages
    (cherry picked from commit e3f37f5dab9fda7142d329490060205054d543c4)

commit a7f8cdaef3c3239956b6fd59a054512ff4052a31
Author: Jan Provaznik <jprovazn>
Date:   Tue Aug 28 15:17:56 2012 +0200

    Improved starting of RHEVM instances (rev. 2)
    
    RHEVM instances goes to stopped state after creating and they need
    to be started explicitly. This patch improves check if explicit
    start request should be sent.
    
    It also extends stopping of RHEVM instances - it's possible that
    RHEVM isntances freeze in shutting_down state on provider side
    if development tools are not installed inside VM. In such case,
    another 'stop' request is sent to the instance.
    (cherry picked from commit 41a068fc24725015f19c86575f505830a98294a1)

Comment 4 Mike Orazi 2012-09-11 21:28:10 UTC

Note -- we might just need more details on the reproducer if it still is possible to reproduce.

Comment 5 Rehana 2012-10-03 09:47:21 UTC

I am unable to reproduce the issue.

Observed that the provider is now penalised for the failure, and new instance requests went to the other provider (mock), to which the same image is pushed

on 

rpm -qa | grep aeolus
aeolus-conductor-doc-0.13.16-1.el6cf.noarch
rubygem-aeolus-cli-0.7.3-1.el6cf.noarch
aeolus-all-0.13.16-1.el6cf.noarch
aeolus-conductor-0.13.16-1.el6cf.noarch
aeolus-configure-2.8.8-1.el6cf.noarch
aeolus-conductor-daemons-0.13.16-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch

moving to verified.

Comment 6 James Laska 2012-12-12 15:18:21 UTC

CloudForms-1.1 shipped with aeolus-conductor-0.13.24-1.el6cf, aeolus-configure-2.8.11-1.el6cf, imagefactory-1.0.2-1.el6cf, iwhd-1.5-2.el6, and oz-0.8.0-6.el6cf

Marking this bug CLOSED CURRENTRELEASE. Please reopen if the problem has not been addressed in the 1.1 product.
