Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:
This issue is observed on an upgraded machine.

Steps to Reproduce:
1. Built and pushed an image to the rhevm and mock providers.
2. Added a penalty for the failure criteria:
   Penalty for each failure (in percentage) * = 50
   Time period over which the failures are aggregated (in minutes) * = 60
   The number of failures over which the provider account should not be used if possible = 1
3. Disabled the mock provider.
4. Launched instances until the rhevm provider became full.
5. Enabled the mock provider.
6. Launched an instance.

Actual results:
The new VM request went to the rhevm provider instead of going to mock.

Expected results:
Once the failure criteria are met, the instance should go to the other available providers to which the images are pushed.

Additional info:
rpm -qa | grep aeolus
aeolus-conductor-0.13.0-0.20120904080008git280b413.fc16.noarch
rubygem-aeolus-image-0.6.0-0.20120904040029git902c81c.fc16.noarch
aeolus-conductor-doc-0.13.0-0.20120904080008git280b413.fc16.noarch
aeolus-configure-2.8.0-0.20120904040024git7dfddaf.fc16.noarch
aeolus-all-0.13.0-0.20120904080008git280b413.fc16.noarch
rubygem-aeolus-cli-0.7.0-0.20120904040025gitd64d64f.fc16.noarch
aeolus-conductor-daemons-0.13.0-0.20120904080008git280b413.fc16.noarch
The issue applies only to RHEV-M providers and is not always reproducible, though it is not clear why. The root cause is that a RHEV-M failure is not always reported to Conductor, so no penalty is applied because Conductor does not know about the failure. Jan's patchset, which fixes RHEV-M launching, also fixes this issue: https://lists.fedorahosted.org/pipermail/aeolus-devel/2012-September/012478.html
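To illustrate why an unreported failure defeats the penalty, here is a minimal sketch of the failure criteria from the reproduction steps. It is not Conductor's actual code: the names PenaltyConfig, evaluate and choose are hypothetical, and the ranking rule (avoided accounts last, then lowest penalty, ties broken by listing order) is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PenaltyConfig:
    # Values from the reproduction steps; the field names are hypothetical.
    penalty_percent: int = 50   # penalty for each failure (in percentage)
    window_minutes: int = 60    # period over which failures are aggregated
    avoid_threshold: int = 1    # failures at which the account is avoided if possible


def evaluate(failures, now, cfg):
    """Return (penalty_percent, avoid_if_possible) for one provider account.

    `failures` holds the timestamps of failures that were actually recorded.
    """
    cutoff = now - timedelta(minutes=cfg.window_minutes)
    recent = [t for t in failures if t >= cutoff]
    penalty = len(recent) * cfg.penalty_percent
    return penalty, len(recent) >= cfg.avoid_threshold


def choose(accounts, now, cfg):
    """Pick an account name: non-avoided accounts first, then lowest penalty.

    Ties keep the order in which the accounts are listed (an assumption).
    """
    def rank(name):
        penalty, avoid = evaluate(accounts[name], now, cfg)
        return (avoid, penalty)
    return min(accounts, key=rank)


now = datetime(2012, 9, 5, 12, 0)
cfg = PenaltyConfig()

# Failure reported: rhevm is penalised and the request goes to mock.
reported = {"rhevm": [now - timedelta(minutes=10)], "mock": []}
print(choose(reported, now, cfg))

# Failure never reported: nothing to penalise, so rhevm can be picked again.
unreported = {"rhevm": [], "mock": []}
print(choose(unreported, now, cfg))
```

With no recorded failure, both accounts rank equally in this sketch, so the full rhevm provider can still be selected, which matches the behaviour described in this report.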
Based on Imre's comment, I'm going to move this over to ON_QA for retesting -- Jan's patchset has already been pushed and is on 1.1:

commit e69cd25fd846495221d528778e51a7ea6feba8c5
Author: Jan Provaznik <jprovazn>
Date:   Wed Sep 5 15:40:53 2012 +0200

    Changed Event.description column to text

    It turned out that sometimes is error message quite long (if a match is
    not found), in such case is the message saved into description column.
    (cherry picked from commit 7451414c6c89f17b80768451aeb2164b400464c2)

commit e0ef69e0df7e7cfa598d60067dd6a63f39ccaf43
Author: Jan Provaznik <jprovazn>
Date:   Wed Sep 5 15:29:53 2012 +0200

    Translated deployment event messages
    (cherry picked from commit e3f37f5dab9fda7142d329490060205054d543c4)

commit a7f8cdaef3c3239956b6fd59a054512ff4052a31
Author: Jan Provaznik <jprovazn>
Date:   Tue Aug 28 15:17:56 2012 +0200

    Improved starting of RHEVM instances (rev. 2)

    RHEVM instances goes to stopped state after creating and they need
    to be started explicitly. This patch improves check if explicit start
    request should be sent.

    It also extends stopping of RHEVM instances - it's possible that RHEVM
    isntances freeze in shutting_down state on provider side if development
    tools are not installed inside VM. In such case, another 'stop' request
    is sent to the instance.
    (cherry picked from commit 41a068fc24725015f19c86575f505830a98294a1)
Note -- we might just need more details on the reproducer if it still is possible to reproduce.
I am unable to reproduce the issue. The provider is now penalised for the failure, and new instance requests went to the other provider (mock), to which the same image is pushed.

Package versions:
rpm -qa | grep aeolus
aeolus-conductor-doc-0.13.16-1.el6cf.noarch
rubygem-aeolus-cli-0.7.3-1.el6cf.noarch
aeolus-all-0.13.16-1.el6cf.noarch
aeolus-conductor-0.13.16-1.el6cf.noarch
aeolus-configure-2.8.8-1.el6cf.noarch
aeolus-conductor-daemons-0.13.16-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch

Moving to verified.
CloudForms-1.1 shipped with aeolus-conductor-0.13.24-1.el6cf, aeolus-configure-2.8.11-1.el6cf, imagefactory-1.0.2-1.el6cf, iwhd-1.5-2.el6, and oz-0.8.0-6.el6cf.

Marking this bug CLOSED CURRENTRELEASE. Please reopen if the problem has not been addressed in the 1.1 product.