Bug 1088264 - AvailabilityExecutor stops calling getAvailability() on ResourceComponent after it previously failed with exception
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHQ 4.11
Assignee: Libor Zoubek
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-04-16 10:40 UTC by Libor Zoubek
Modified: 2015-11-02 00:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-07-21 10:14:00 UTC
Embargoed:



Description Libor Zoubek 2014-04-16 10:40:27 UTC
Description of problem:

I am facing a weird issue (or feature): AvailabilityExecutor stops executing getAvailability() for a resource once getAvailability() has thrown an exception.

When getAvailability() throws an exception for some reason, the user can see it in the UI, and that's good. We expect the user to fix the managed resource or the plugin configuration.

So far I have not found a way to recover from getAvailability() no longer being called; only restarting the agent helps. Even after updating the plugin config in the UI, I can still see warning messages in the agent log. They should not appear, because I have already fixed my pluginConfig and the avail check should now pass (if it were called).


Version-Release number of selected component (if applicable):
RHQ 4.11-master

How reproducible:
Always


Steps to Reproduce:
1. Have a resource that is UP
2. Turn it DOWN and change the managed resource so that the plugin's avail check throws an exception
3. Turn the managed resource back on

The repro steps apply to the following scenario (from Bug 1015334):
1. have EAP6 domain mode UP and imported
2. stop the EAP6 domain
3. edit EAP6's host.xml and change <host name="master" to name="master1"
4. start the EAP6 domain again

Actual results:

You can see the avail error in the UI and WARN messages saying the avail check failed.
Now, when you stop EAP6, revert the change to the name attribute, and start it again, the EAP6 resource should go back UP, right? But it doesn't. You still get outdated WARN messages and the resource stays DOWN.


Expected results:
After reverting the changes in host.xml, the resource must go back UP. AvailabilityExecutor must keep calling getAvailability() on the ResourceComponent no matter whether it previously failed.

Additional info:

Comment 1 Libor Zoubek 2014-04-17 10:22:40 UTC
in master 
commit 937cb29ee5450da0bcf04d8e9952310de400e90b
Author: Libor Zoubek <lzoubek>
Date:   Thu Apr 17 11:47:43 2014 +0200

    [BZ 1088264] AvailabilityExecutor stops calling getAvailability() on
    ResourceComponent after it previously failed with exception

    The issue was in handling the exception coming from the future. When an
    availability check failed with an exception we caught it, but on the next
    run simply calling future.get() raised the very same exception. We forgot
    to mark the future to be rescheduled next time (by setting it to null).
    This commit also makes the exception message more verbose so we know more
    about what happened in the plugin.
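
The mechanism described in the commit message can be reproduced outside RHQ with a plain java.util.concurrent.Future. The following is only a minimal sketch under stated assumptions, not the actual AvailabilityExecutor code: the class AvailCheckSketch, its Avail enum, the resetOnFailure flag and the callsMade() helper are all hypothetical names made up for illustration. It shows that a Future which completed with an exception rethrows that exception on every get(), so the component is never called again unless the reference is cleared.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the agent-side wrapper around one resource's
// availability check. None of these names come from the RHQ code base.
class AvailCheckSketch {
    enum Avail { UP, DOWN }

    private final ExecutorService pool;
    private final Callable<Avail> check;
    private final boolean resetOnFailure;
    private Future<Avail> pending; // null means "submit a fresh check on the next run"

    AvailCheckSketch(ExecutorService pool, Callable<Avail> check, boolean resetOnFailure) {
        this.pool = pool;
        this.check = check;
        this.resetOnFailure = resetOnFailure;
    }

    Avail run() {
        if (pending == null) {
            pending = pool.submit(check);
        }
        try {
            Avail result = pending.get();
            pending = null; // finished normally, so schedule a fresh check next time
            return result;
        } catch (ExecutionException e) {
            if (resetOnFailure) {
                pending = null; // the fix: forget the failed future
            }
            // Without the reset, the failed future is kept and get() rethrows
            // the same exception on every later run.
            return Avail.DOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs three checks against a component that throws once and then recovers,
    // and reports how many times the component was actually invoked.
    static int callsMade(boolean resetOnFailure) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger calls = new AtomicInteger();
        Callable<Avail> component = () -> {
            if (calls.incrementAndGet() == 1) {
                throw new IllegalStateException("bad plugin config");
            }
            return Avail.UP;
        };
        try {
            AvailCheckSketch sketch = new AvailCheckSketch(pool, component, resetOnFailure);
            for (int i = 0; i < 3; i++) {
                sketch.run();
            }
            return calls.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + callsMade(false) + " call(s)");
        System.out.println("with reset:    " + callsMade(true) + " call(s)");
    }
}
```

With resetOnFailure set to false the component is invoked only once across the three runs, which matches the stuck behaviour reported above; with it set to true every run after the failure reaches the component again.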

Comment 2 Jay Shaughnessy 2014-04-21 16:08:43 UTC
I'm not sure, we may have done this on purpose originally, to prevent repeated failures. The component's getAvailability() method should not, in general, throw exceptions.  It should return DOWN if it can't connect due to poor plugin configuration.  So, I'd say the use case above indicates a bad implementation of getAvailability().

Having said that, this change is probably acceptable. It's more just an implementation decision and perhaps people will prefer it this way.
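
To illustrate the point about getAvailability() not throwing, here is a rough sketch of a component that maps connection failures to DOWN instead of letting them propagate. It is an assumption-laden example rather than code from any RHQ plugin: DefensiveAvailSketch and its Connection interface are hypothetical, and the AvailabilityType enum is a local stand-in declared only so the example compiles on its own.

```java
import java.io.IOException;

// Hypothetical component; not taken from any RHQ plugin.
class DefensiveAvailSketch {
    // Local stand-in so the sketch is self-contained.
    enum AvailabilityType { UP, DOWN }

    // Hypothetical handle to the managed resource.
    interface Connection {
        boolean ping() throws IOException;
    }

    private final Connection connection;

    DefensiveAvailSketch(Connection connection) {
        this.connection = connection;
    }

    AvailabilityType getAvailability() {
        try {
            return connection.ping() ? AvailabilityType.UP : AvailabilityType.DOWN;
        } catch (IOException | RuntimeException e) {
            // A failed connection (for example due to a wrong plugin
            // configuration) is reported as DOWN rather than rethrown.
            return AvailabilityType.DOWN;
        }
    }

    public static void main(String[] args) {
        DefensiveAvailSketch broken =
            new DefensiveAvailSketch(() -> { throw new IOException("connection refused"); });
        DefensiveAvailSketch healthy = new DefensiveAvailSketch(() -> true);
        System.out.println("broken:  " + broken.getAvailability());
        System.out.println("healthy: " + healthy.getAvailability());
    }
}
```

A component written this way never hits the failed-future path at all, which is why the behaviour in this bug only shows up for plugins whose availability check lets exceptions escape.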

Comment 3 Heiko W. Rupp 2014-07-21 10:14:00 UTC
Bulk closing of RHQ 4.11 issues, now that RHQ 4.12 is out.

If you find an issue with those, please open a new BZ, linking to the old one.

