Description of problem:
Auto-groups such as 'File Systems' or 'JBossAS Servers' display availability as green when their member resources are in the unknown state (that is, when the platform is down and the resources have gone unknown after agent shutdown). Please refer to the attached screenshot.

Version-Release number of selected component (if applicable):
Build#1386 (Version: 4.4.0-SNAPSHOT Build Number: 28e565c)

How reproducible:
Always

Steps to Reproduce:
1. Shut down the agent through the UI (RHQ Agent->Operations->New->Shutdown Agent).
2. The platform goes down, and servers and services go unknown immediately.
3. Navigate to the resource tree and click on auto-groups (e.g. CPUs, 'File Systems', 'JBossAS Servers').
4. Auto-groups display availability as green.

Actual results:
Auto-groups display availability as green when the platform is down (member resources in unknown state).

Expected results:
Auto-groups should display the correct availability status (unknown).

Additional info:
Created attachment 579518 [details] Screenshot
Additional info: Observed the same issue with compatible groups. I created a compatible group of resources and shut down the agent. When the platform is down (member resources in unknown state), availability is displayed as green in the Children and Descendants columns of the compatible group list view. Please refer to Screenshot_GroupListView.
Created attachment 579836 [details] Screenshot_GroupListView
There is clearly a problem for both (I verified the behavior on master), obviously due to the changes in the avail subsystem.

The question is: what should it be? Do we show red/down, or do we show (?) in the UI? That is, the big green check mark is wrong, but should it be the big red circle or a big grey "?"?

Note that we are talking about groups. In this case, ALL members are of unknown avail, which is where the issue is. I think if at least one is down, you'll see a yellow triangle. I need to confirm this. But the fact is, this BZ is reporting what happens when ALL resources are unknown. A second issue would be to confirm that the behavior is OK if only SOME but not all are unknown.
I think the issue is in:

  org.rhq.core.domain.resource.group.composite.ResourceGroupComposite.getAvailabilityType(boolean)

when the count > 0 but down and disabled are both 0, as is the case that we have here:

  ResourceGroupComposite[name=all agents, implicit[count/down/disabled=,1/0/0], explicit[count/down/disabled=,1/0/0], facets=[CONFIGURATION, PLUGIN_CONFIGURATION, MEASUREMENT, OPERATION, SUPPORT, EVENT]]

(that's the toString of the composite that I see in the debugger)
I created a compatible group of 4 CPUs (my platform is a quad-core machine, so I have 4 CPUs). I shut down my agent, so all 4 CPUs are UNKNOWN. I then used a SQL tool to query and update the RHQ_RESOURCE_AVAIL table. I just change the availability for one or two of my CPU resources, changing them from 2 (UNKNOWN) to either 0 (DOWN), 1 (UP) or 3 (DISABLED), and I refresh my browser to see how the changes look in the UI. Here's what we see:

* CPU #3 and CPU #4 remain UNKNOWN in all tests - I only change #1 or both #1 and #2.
* GroupAvailIcon is the big avail icon you see in the summary area when viewing the group (go to the group's Inventory tab, for example, and look at the top right of the page)
* GroupListAvailIcons is the set of icons you see in the table/list when viewing all compatible groups (go to the Inventory>CompatibleGroups link)

CPU #1    CPU #2    GroupAvailIcon   GroupListAvailIcons
=======================================================
UNKNOWN   UNKNOWN   GREEN            GREEN (4)
DISABLED  UNKNOWN   DISABLED         GREEN (3) DISABLED (1)
RED       UNKNOWN   YELLOW TRIANGLE  GREEN (3) RED (1)
GREEN     UNKNOWN   GREEN            GREEN (4)
GREEN     RED       YELLOW TRIANGLE  GREEN (3) RED (1)
GREEN     DISABLED  DISABLED         GREEN (3) DISABLED (1)
RED       DISABLED  YELLOW TRIANGLE  GREEN (2) RED (1) DISABLED (1)
DISABLED  DISABLED  DISABLED         GREEN (2) DISABLED (2)

So it's clear that any resource that is UNKNOWN is treated, for all intents and purposes, as GREEN, or at best is ignored when determining which icons/state to use.
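To make the observation above concrete, here is a minimal Java sketch of a decision rule that reproduces the eight GroupAvailIcon results in the table. It is reconstructed purely from the observed table, not from the actual ResourceGroupComposite source; the class name, the Icon enum and the method name are hypothetical. The point it illustrates is that when only count/down/disabled are available, UNKNOWN members cannot be told apart from UP members.

```java
public class ObservedGroupAvail {

    // Hypothetical icon names mirroring the labels used in the table above.
    enum Icon { EMPTY, GREEN, RED, DISABLED, WARNING }

    // Reconstruction of the observed behavior from three counts only.
    // This is NOT the real getAvailabilityType(boolean) implementation.
    static Icon fromThreeCounts(long count, long down, long disabled) {
        if (count == 0) {
            return Icon.EMPTY; // assumption: empty groups are handled separately
        }
        if (down > 0) {
            // some members down: mixed warning unless every member is down
            return down == count ? Icon.RED : Icon.WARNING;
        }
        if (disabled > 0) {
            return Icon.DISABLED;
        }
        // down == 0 and disabled == 0: UP and UNKNOWN members both land here
        return Icon.GREEN;
    }
}
```

For the reported case, fromThreeCounts(4, 0, 0) yields GREEN even though all four members are UNKNOWN, because no count for UNKNOWN is ever passed in.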
Just wanted to document this - this is the JPQL used to query the resource group composite:

SELECT new org.rhq.core.domain.resource.group.composite.ResourceGroupComposite(
    ( SELECT COUNT(avail)
      FROM resourcegroup.explicitResources res JOIN res.currentAvailability avail
    ) AS explicitCount,
    ( SELECT COUNT(avail)
      FROM resourcegroup.explicitResources res JOIN res.currentAvailability avail
      WHERE avail.availabilityType = 0
    ) AS explicitDown,
    ( SELECT COUNT(avail)
      FROM resourcegroup.explicitResources res JOIN res.currentAvailability avail
      WHERE avail.availabilityType = 3
    ) AS explicitDisabled,
    ( SELECT COUNT(avail)
      FROM resourcegroup.implicitResources res JOIN res.currentAvailability avail
    ) AS implicitCount,
    ( SELECT COUNT(avail)
      FROM resourcegroup.implicitResources res JOIN res.currentAvailability avail
      WHERE avail.availabilityType = 0
    ) AS implicitDown,
    ( SELECT COUNT(avail)
      FROM resourcegroup.implicitResources res JOIN res.currentAvailability avail
      WHERE avail.availabilityType = 3
    ) AS implicitDisabled,
    resourcegroup )
FROM ResourceGroup resourcegroup
WHERE ( resourcegroup.groupCategory = :groupCategory
    AND resourcegroup.visible = :visible )

(NOTE: :visible = true and :groupCategory is the "Compatible Group" entity/enum)
This is the actual SQL (not JPQL):

select
  (select count(resourceav3_.ID)
   from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre1_, RHQ_RESOURCE resource2_
   inner join RHQ_RESOURCE_AVAIL resourceav3_ on resource2_.ID=resourceav3_.RESOURCE_ID
   where resourcegr0_.ID=explicitre1_.RESOURCE_GROUP_ID
     and explicitre1_.RESOURCE_ID=resource2_.ID) as col_0_0_,
  (select count(resourceav6_.ID)
   from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre4_, RHQ_RESOURCE resource5_
   inner join RHQ_RESOURCE_AVAIL resourceav6_ on resource5_.ID=resourceav6_.RESOURCE_ID
   where resourcegr0_.ID=explicitre4_.RESOURCE_GROUP_ID
     and explicitre4_.RESOURCE_ID=resource5_.ID
     and resourceav6_.AVAILABILITY_TYPE=0) as col_1_0_,
  (select count(resourceav9_.ID)
   from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre7_, RHQ_RESOURCE resource8_
   inner join RHQ_RESOURCE_AVAIL resourceav9_ on resource8_.ID=resourceav9_.RESOURCE_ID
   where resourcegr0_.ID=explicitre7_.RESOURCE_GROUP_ID
     and explicitre7_.RESOURCE_ID=resource8_.ID
     and resourceav9_.AVAILABILITY_TYPE=3) as col_2_0_,
  (select count(resourceav12_.ID)
   from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre10_, RHQ_RESOURCE resource11_
   inner join RHQ_RESOURCE_AVAIL resourceav12_ on resource11_.ID=resourceav12_.RESOURCE_ID
   where resourcegr0_.ID=implicitre10_.RESOURCE_GROUP_ID
     and implicitre10_.RESOURCE_ID=resource11_.ID) as col_3_0_,
  (select count(resourceav15_.ID)
   from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre13_, RHQ_RESOURCE resource14_
   inner join RHQ_RESOURCE_AVAIL resourceav15_ on resource14_.ID=resourceav15_.RESOURCE_ID
   where resourcegr0_.ID=implicitre13_.RESOURCE_GROUP_ID
     and implicitre13_.RESOURCE_ID=resource14_.ID
     and resourceav15_.AVAILABILITY_TYPE=0) as col_4_0_,
  (select count(resourceav18_.ID)
   from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre16_, RHQ_RESOURCE resource17_
   inner join RHQ_RESOURCE_AVAIL resourceav18_ on resource17_.ID=resourceav18_.RESOURCE_ID
   where resourcegr0_.ID=implicitre16_.RESOURCE_GROUP_ID
     and implicitre16_.RESOURCE_ID=resource17_.ID
     and resourceav18_.AVAILABILITY_TYPE=3) as col_5_0_,
  resourcegr0_.ID as col_6_0_
from RHQ_RESOURCE_GROUP resourcegr0_
where resourcegr0_.CATEGORY='COMPATIBLE'
  and resourcegr0_.visible = true
To repeat the current behavior:

CPU #1    CPU #2    GroupAvailIcon   GroupListAvailIcons
=======================================================
UNKNOWN   UNKNOWN   GREEN            GREEN (4)
DISABLED  UNKNOWN   DISABLED         GREEN (3) DISABLED (1)
RED       UNKNOWN   YELLOW TRIANGLE  GREEN (3) RED (1)
GREEN     UNKNOWN   GREEN            GREEN (4)
GREEN     RED       YELLOW TRIANGLE  GREEN (3) RED (1)
GREEN     DISABLED  DISABLED         GREEN (3) DISABLED (1)
RED       DISABLED  YELLOW TRIANGLE  GREEN (2) RED (1) DISABLED (1)
DISABLED  DISABLED  DISABLED         GREEN (2) DISABLED (2)

Ignoring implementation details, here's what I will try to get the UI to do (remember, this group has *four* members - in all of my tests, CPU resources #3 and #4 are UNKNOWN):

CPU #1    CPU #2    GroupAvailIcon    GroupListAvailIcons
=======================================================
UNKNOWN   UNKNOWN   YELLOW TRIANGLE*  UNKNOWN (4)*
DISABLED  UNKNOWN   YELLOW TRIANGLE*  UNKNOWN (3)* DISABLED (1)
RED       UNKNOWN   YELLOW TRIANGLE   UNKNOWN (3)* RED (1)
GREEN     UNKNOWN   YELLOW TRIANGLE*  UNKNOWN (3)* GREEN (1)*
GREEN     RED       YELLOW TRIANGLE   UNKNOWN (2)* GREEN (1)* RED (1)
GREEN     DISABLED  YELLOW TRIANGLE*  UNKNOWN (2)* GREEN (1)* DISABLED (1)
RED       DISABLED  YELLOW TRIANGLE   UNKNOWN (2)* RED (1) DISABLED (1)
DISABLED  DISABLED  YELLOW TRIANGLE*  UNKNOWN (2)* DISABLED (2)

Entries marked with (*) are new behavior, different from the original behavior.

So you can see here that UNKNOWN is no longer ignored or considered GREEN. The group icon is now going to show the yellow triangle "Warning" icon, which seems more appropriate. If a resource is in an UNKNOWN state, we should visually warn the user that something seems amiss. Presumably the user will want to know the state of all his resources, so a resource whose state is unknown to the monitoring system is not something the user will consider normal (or green), and the user should be visually alerted to such a condition.

If nothing is UNKNOWN, this is the behavior:

IF....                                  THEN THE GROUP AVAIL ICON IS....
=============================================================================
All resources DISABLED              ... DISABLED
All resources RED                   ... RED
All resources GREEN                 ... GREEN
Some resources DISABLED, some RED   ... YELLOW TRIANGLE
Some resources GREEN, some RED      ... YELLOW TRIANGLE
Some resources GREEN, some DISABLED ... YELLOW TRIANGLE
Some resources GREEN, RED, DISABLED ... YELLOW TRIANGLE

If all are UNKNOWN, the group avail icon should be UNKNOWN (today it shows as GREEN; this is something we should change).
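The rules proposed in this comment can be sketched as a single function over four per-type counts. This is only an illustration of the rules as stated, not the committed fix; the class, enum and method names are hypothetical. Note that the comment describes the all-UNKNOWN case two ways (the proposed table lists a yellow triangle, the closing sentence says UNKNOWN); the sketch follows the closing sentence, and that choice is an assumption.

```java
public class ProposedGroupAvail {

    // Hypothetical icon names mirroring the labels used in this comment.
    enum Icon { EMPTY, GREEN, RED, DISABLED, UNKNOWN, WARNING }

    // Sketch of the proposed rules: a uniform group shows its members' icon,
    // any mixture (including any partial UNKNOWN) shows the warning triangle.
    static Icon fromCounts(long up, long down, long disabled, long unknown) {
        long total = up + down + disabled + unknown;
        if (total == 0) {
            return Icon.EMPTY; // assumption: empty groups are handled separately
        }
        if (unknown == total) {
            return Icon.UNKNOWN; // assumption: follows the closing sentence, not the table row
        }
        if (up == total) {
            return Icon.GREEN;
        }
        if (down == total) {
            return Icon.RED;
        }
        if (disabled == total) {
            return Icon.DISABLED;
        }
        return Icon.WARNING; // any mix of types, e.g. 1 GREEN + 3 UNKNOWN
    }
}
```

Compared with a derivation from count/down/disabled alone, this needs the UNKNOWN (or UP) count as an extra input, which is why the fix adds subselects to the composite query.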
I just want to remember this JPQL, so I will put it here. I fiddled with group by queries to see if there is a better way, and though I didn't figure out a better way, here's a group by query that we can build on later if we want to try again to get a faster/better set of queries for group avail:

SELECT g.id, ra.availabilityType, count(ra.availabilityType)
FROM ResourceGroup g
LEFT JOIN g.implicitResources res
LEFT JOIN res.currentAvailability ra
WHERE g.groupCategory = 'COMPATIBLE'
  AND g.visible = true
GROUP BY g.id, ra.availabilityType
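If someone picks the group by query up later, its result would be one row per (group id, availability type) pair, which then has to be folded into per-group counts on the Java side. A minimal sketch of that folding step follows, assuming each row arrives as an Object[] of {Integer groupId, AvailType type, Long count}; the class name and the AvailType enum are stand-ins (ordered to match the 0/1/2/3 codes mentioned earlier in this BZ), not RHQ classes.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupAvailFold {

    // Stand-in enum; ordinals follow the codes quoted in this BZ:
    // 0 = DOWN, 1 = UP, 2 = UNKNOWN, 3 = DISABLED.
    enum AvailType { DOWN, UP, UNKNOWN, DISABLED }

    // Folds rows of {groupId, availabilityType, count} into per-group counts.
    static Map<Integer, Map<AvailType, Long>> fold(List<Object[]> rows) {
        Map<Integer, Map<AvailType, Long>> byGroup = new HashMap<>();
        for (Object[] row : rows) {
            Integer groupId = (Integer) row[0];
            AvailType type = (AvailType) row[1];
            Long count = (Long) row[2];
            Map<AvailType, Long> counts =
                byGroup.computeIfAbsent(groupId, id -> new EnumMap<>(AvailType.class));
            // With LEFT JOINs, a group with no members yields a row with a null
            // type; keep the group in the result but record no counts for it.
            if (type != null) {
                counts.merge(type, count, Long::sum);
            }
        }
        return byGroup;
    }
}
```

Types with no members are simply absent from a group's map, so callers should read counts with getOrDefault(type, 0L).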
If we do change the avail icons that are displayed, make sure we update the information about them here:

http://rhq-project.org/display/RHQ/Design-Availability+Checking#Design-AvailabilityChecking-GroupAvailability
http://www.rhq-project.org/display/RHQ/Release+Notes+4.4.0
Making this block the 4.4 release
Created attachment 581415 [details] db performance tests before and after fix

I attached a LibreOffice spreadsheet containing data from a series of database performance tests I ran. These were based on a unit test that will be going in with the fix. I just tweaked the unit test to vary the sizes/numbers of groups and the users used to query the data.

There is a lot of data in that spreadsheet, but the sheets of interest are those whose names are prefixed by "BOTH_". These contain charts of data for testing both with and without the fix for this BZ.

With the fix, it was expected that database performance would suffer (since we are adding new subselects to the queries). This data shows how much of a performance hit there is. I ran the tests on both Oracle and Postgres, and the database vendor doesn't make a difference (which is good and to be expected).

Ignore the absolute values of the Oracle tests. My Oracle instance is not tuned, and the tests ran from my laptop over a wireless network, so the absolute numbers will be slow. The important thing is to look at the relative numbers between testing with and without the fix. This applies to the Postgres data as well, though my Postgres was running on a more powerful machine, the same machine as the unit test JVM, so no network latency is involved in the Postgres data. But again, just look at the relative numbers to get a feel for the impact of this fix.

In short, the fix introduces some additional slowdown, but nothing that I would consider bad enough for this fix to be rejected. The spreadsheet with the full data is attached for you to examine yourself.
Mazz, I agree, I think for the more intelligent behavior the perf hit is acceptable. It does slow things down, sometimes a little more than I would have guessed, but overall performance is still reasonable, I think.
master commits:
0c1f0a03e4709b89890bedd85b75af9a13dd9142
2567366ed179b4b896b1b501d8d0d50007000ba8
eef406b0d827e8027a81f2ec20ad5e4eba5dfaca
Verified on Version: 3.1.0.BETA1 Build Number: 95ef567:68f5518 Verified that when member resources are in unknown state, the GroupAvailIcon shows the yellow triangle "Warning" icon and the GroupListAvailIcons in the compatible groups list view displays 'unknown' icon.