Description of problem:
AvailabilityExecutor stops calling getAvailability() for a resource once getAvailability() has thrown an exception. When getAvailability() throws an exception, the user can see it in the UI, which is good: we expect them to fix the managed resource or the plugin configuration. However, I have not found a way to recover once getAvailability() is no longer being called; only restarting the agent helps. Even after updating the plugin configuration in the UI, I still see warning messages in the agent log. They should not appear, because the plugin configuration is already fixed and the availability check would pass if it were called.

Version-Release number of selected component (if applicable):
RHQ 4.11-master

How reproducible:
Always

Steps to Reproduce:
1. Have a resource that is UP
2. Turn it DOWN and change the managed resource so that the availability check throws an exception in the plugin
3. Turn your managed resource on

The repro steps apply to the following scenario (from Bug 1015334):
1. Have EAP6 in domain mode UP and imported
2. Stop the EAP6 domain
3. Edit EAP6's host.xml and change <host name="master" to name="master1"
4. Start the EAP6 domain again

Actual results:
You can see the availability error in the UI and WARN messages saying that the availability check failed. When you then stop EAP6, revert the change to the name attribute and start it again, the EAP6 resource should go back UP, but it does not. You still get the outdated WARN messages and the resource stays DOWN.

Expected results:
After the changes in host.xml are reverted, the resource must go back UP. AvailabilityExecutor must call getAvailability() of the ResourceComponent no matter whether it previously failed or not.

Additional info:
in master

commit 937cb29ee5450da0bcf04d8e9952310de400e90b
Author: Libor Zoubek <lzoubek>
Date:   Thu Apr 17 11:47:43 2014 +0200

    [BZ 1088264] AvailabilityExecutor stops calling getAvailability() on ResourceComponent after it previously failed with exception

    The issue was in the handling of the exception coming from the future. When an availability check failed with an exception, we caught it; on the next run, simply calling future.get() raised the very same exception. We forgot to mark the future to be rescheduled next time, i.e. to set it to null. This commit also makes the exception message more verbose, so we know more about what happened in the plugin.
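The behaviour described in the commit message can be reproduced outside RHQ with a plain java.util.concurrent.Future. The following is only a minimal sketch, not the actual AvailabilityExecutor code: the class AvailCheckSketch, its Avail enum, the resetOnFailure flag and the helper invocationsAfterTwoChecks() are all hypothetical names made up for illustration. It shows that a completed Future keeps rethrowing the same ExecutionException from get() until the cached reference is cleared and the check is submitted again.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-resource availability check; NOT the RHQ class.
class AvailCheckSketch {
    enum Avail { UP, DOWN }

    private final ExecutorService pool;
    private final Callable<Avail> component;   // plays the role of getAvailability()
    private final boolean resetOnFailure;      // true = behaviour with the fix applied
    private Future<Avail> future;              // cached future of the last submitted check

    AvailCheckSketch(ExecutorService pool, Callable<Avail> component, boolean resetOnFailure) {
        this.pool = pool;
        this.component = component;
        this.resetOnFailure = resetOnFailure;
    }

    Avail check() {
        if (future == null) {
            // Only a null future causes the component to be invoked again.
            future = pool.submit(component);
        }
        try {
            Avail result = future.get();
            future = null; // success: resubmit on the next run
            return result;
        } catch (ExecutionException e) {
            // A completed future rethrows the same failure on every get().
            // Without this reset, the component is never called again.
            if (resetOnFailure) {
                future = null;
            }
            return Avail.DOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Avail.DOWN;
        }
    }

    // Runs two checks against a component that fails only on its first call
    // and reports how many times the component was actually invoked.
    static int invocationsAfterTwoChecks(boolean resetOnFailure) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicInteger calls = new AtomicInteger();
            Callable<Avail> component = () -> {
                if (calls.incrementAndGet() == 1) {
                    throw new IllegalStateException("bad plugin configuration");
                }
                return Avail.UP;
            };
            AvailCheckSketch sketch = new AvailCheckSketch(pool, component, resetOnFailure);
            sketch.check();
            sketch.check();
            return calls.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With resetOnFailure set to false the component is invoked once, however many checks run afterwards; with it set to true the second check invokes the component again and can see the repaired configuration.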
I'm not sure; we may have done this on purpose originally, to prevent repeated failures. The component's getAvailability() method should not, in general, throw exceptions. It should return DOWN if it can't connect due to poor plugin configuration. So, I'd say the use case above indicates a bad implementation of getAvailability(). Having said that, this change is probably acceptable. It's really just an implementation decision, and perhaps people will prefer it this way.
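As an illustration of the point about returning DOWN rather than throwing, here is a minimal sketch of a defensive availability check. It does not use the RHQ plugin API: SafeAvailabilitySketch, its Avail enum and the probe parameter are hypothetical names, and the probe stands in for whatever connection attempt a real component would make.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; a real plugin would implement the RHQ component interfaces.
class SafeAvailabilitySketch {
    enum Avail { UP, DOWN }

    // 'probe' is an assumed stand-in for the component's connection attempt.
    static Avail getAvailability(Callable<Boolean> probe) {
        try {
            // A successful probe returning true means the managed resource is reachable.
            return probe.call() ? Avail.UP : Avail.DOWN;
        } catch (Exception e) {
            // Connection or configuration problems are reported as DOWN
            // instead of being propagated to the caller.
            return Avail.DOWN;
        }
    }
}
```

For example, SafeAvailabilitySketch.getAvailability(() -> true) yields UP, while a probe that throws yields DOWN. The trade-off is that the cause of the failure is no longer surfaced through the exception, so a real component would probably want to log it.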
Bulk closing of RHQ 4.11 issues, now that RHQ 4.12 is out. If you find an issue with those, please open a new BZ, linking to the old one.