Created attachment 1111903 [details]
attached screen shots

Description of problem:
In a JBoss ON environment with a large number of resources, the number of child and/or descendant resources shown is not always correct.

Version-Release number of selected component (if applicable):
JBoss ON 3.3.4

How reproducible:
Always, with a large number of imported resources and a large number of groups.

Steps to Reproduce:
1. Install JBoss ON 3.3.4 and at least 3 agents.
2. Import a large number of resources (I used rhq-perftest-plugin-4.9.0.jar, so in my case I started all agents with "./rhq-agent.sh -Drhq.perftest.scenario=configurable-1 -Drhq.perftest.server-a-count=200 -Drhq.perftest.service-a-count=100").
3. Create two dynagroups:
3.1.) name: Server Type
      description: Group by Server Type
      expression:
          resource.type.category = SERVER
          groupby resource.type.name
      recursive: yes
      calculate: 5 minutes
3.2.) name: Server Name
      description: Group by Server Name
      expression:
          resource.type.category = SERVER
          groupby resource.name
      recursive: yes
      calculate: 5 minutes
4. Navigate to the All Groups page and check the content of the page.

Actual results:
The attached screen shots show the issue. Initially, as servers and services were not discovered at the same time, the numbers of servers/services were different. But after a few recalculations of the dynagroups, the All Groups page looked as in incorrectNoResources_1.png (see attached images). In this screen shot, most of the dynagroups show the correct number of children (3) and descendants (303), except for server-a-152, which for some reason shows 3 available and 1 unknown in the children column (this cannot be true). Then IncorrectNoResources_2.png shows the same server-a-152 server but also server-a-94 with only 1 unknown child resource and 303 descendants (it should be 3 available child resources, as seen in IncorrectNoResources_6.png).
When the "Refresh" button was pressed for the first time (see IncorrectNoResources_3.png), server-a-115 showed 600 children and 303 descendants (the first number is clearly wrong). After the next refresh (IncorrectNoResources_4.png), server-a-115 showed 3 children and 303 descendants. In the same IncorrectNoResources_4.png, the numbers for server-a-161 look correct: 3 children and 303 descendants. After the Refresh button was pressed again, I got the situation in IncorrectNoResources_5.png, where the number of resources for server-a-115 is correct but the number for server-a-161 is not (this server now shows 600 child resources and 303 descendants).

Expected results:
Every dynagroup/compatible group on the All Groups/Compatible Groups page shows the correct number of child/descendant resources.

Additional info:
This may be caused by the work done for Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1244941.
Additionally, see attached DynaGroup_-_Server_Type__RHQ_Storage_Node_.png: it shows 3 child resources, but I only have one storage node. Interestingly, when I press the "Refresh" button, for a few milliseconds I can see 1 child resource, but then it changes to 3. :-(
Created attachment 1111906 [details] DynaGroup_-_Server_Type__RHQ_Storage_Node_.png
One more thing - I used the same Server Name dynagroup but without "groupby resource.name" (see attached ServerName_def.png). When group was calculated, there was only one DynaGroup - Server Name created (as expected) (same screen shot - ServerName_def.png). However, there were differences between values shown on "Mixed Groups" and "All Groups" page (see ServerName_1.png and ServerName_2.png).
Created attachment 1112470 [details] ServerName screen shots
Hi Biljana, do you still have the agent logs? I was trying to reproduce the issue and the agents failed with java.lang.OutOfMemoryError: Java heap space during the discovery scan. I'm just guessing, but this weird behavior might be caused by an overloaded agent and failures during the discovery scan. I tried with fewer resources (200 servers with 10 services) and the groups were working as expected. The OutOfMemoryError exception on the agent is visible in JON 3.3.3 as well.
(In reply to Filip Brychta from comment #6)
> Hi Biljana, do you still have agent logs? I was trying to reproduce the
> issue and agents failed with java.lang.OutOfMemoryError: Java heap space
> during discovery scan. I'm just guessing but this weird behavior could be
> maybe caused by overloaded agent and failures during discovery scan.
> I tried to use less resources (200 servers with 10 services) and groups were
> working as expected.
> The OutOfMemoryError exception on agent is visible in JON 3.3.3 as well.

Yes, you will have to increase Xms and Xmx to get the agent working without OOM errors. I was generous and set the heap of my agent to 2G. The heap for the JON Server should be increased as well, although I didn't change this (the default is 1G, I think) and I didn't have any problem.
I increased the heap but I'm still not able to reproduce it even with 200 servers with 100 services. All groups contain correct count of children and descendants. Targeting to 3.3.6 as this will require more investigation.
After discussion with Biljana I was able to reproduce the issue. The issue shows up after uninventory/inventory of resources (I uninventoried whole platform). Until then the numbers of child/descendants were correct.
The behavior may be expected when an uninventory is performed. The uninventory operation is done asynchronously meaning that resources are not removed immediately. With a large number of resources it could take several minutes for an uninventory to complete. In some cases, 20 or 30 minutes depending on what history is associated with each resource.
That could explain incorrect count of children vs. descendants, but another thing which should be explained is different count of children shown on All groups page vs. real count of children when you open the group and navigate to Inventory->Members tab. e.g. All groups page shows that given group has 1 child but when you navigate to group's Members tab you see there 3 resources.
I don't think it's as simple as a delay in the uninventory of resources. The errors are inconsistent: some groups show as having a lot more members than they actually have, and some a lot fewer. Others, like Biljana's examples, show hundreds of members in a group that should contain only a few. We have a relatively large environment with 450 platforms, each running one or more EAP instances. There are automated jobs to import new resources and, since this is a test environment, a job to remove unavailable platforms after some time. This could have explained some inconsistencies, but not at the scale we are seeing.
Hello, has there been any update on this case? Since this affects the Groups view in JON later than 3.3.3 we are unable to go to production with these versions. If there is a need for more information to be able to reproduce the issue I am very willing to help.

Also as information: for the support case I created a simple CLI script to list groups and the member count, and this always returns the correct number of servers, so the error must lie in how the UI calculates the same value.

Code:

    // Fetch every group together with its explicit members.
    var criteria = new ResourceGroupCriteria();
    criteria.fetchExplicitResources(true);
    var groupList = ResourceGroupManager.findResourceGroupsByCriteria(criteria);

    // Print one "<group name>, <n> members" line per group.
    for (var i = 0; i < groupList.size(); i++) {
        var group = groupList.get(i);
        var groupName = group.name;
        var groupSize = group.explicitResources.size();
        println(groupName + ", " + groupSize + " members");
    }
The UI uses a non-public API for this operation. That operation fetches a list of explicit results and a list of implicit results and merges them together. It assumes that the order is the same in both lists. In my tests, the lists weren't always in the same order, so the resource counts were mixed up (i.e. if I had 2 JVMs and 1 Agent, I could end up with a view showing 1 JVM and 2 Agents). Treating the order as unknown fixed the problem in my environment.
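To illustrate the kind of mix-up described above, here is a minimal sketch in plain JavaScript. It is not the RHQ code and does not use the RHQ API: the row shape ({ groupId, count }), the function names mergeByIndex and mergeByGroupId, and the sample numbers are all hypothetical, chosen only to show why pairing two result lists by position goes wrong when their order differs, and how matching on a key avoids it.

```javascript
// Hypothetical row shape: { groupId, count }. Not the real RHQ data model.

// Pairs rows by position. Only correct if both lists are in the same order.
function mergeByIndex(explicitRows, implicitRows) {
  return explicitRows.map(function (row, i) {
    return {
      groupId: row.groupId,
      children: row.count,
      descendants: implicitRows[i].count
    };
  });
}

// Pairs rows by group id, so the order of either list does not matter.
function mergeByGroupId(explicitRows, implicitRows) {
  var implicitById = new Map();
  implicitRows.forEach(function (row) {
    implicitById.set(row.groupId, row.count);
  });
  return explicitRows.map(function (row) {
    return {
      groupId: row.groupId,
      children: row.count,
      // A group missing from the implicit list is treated as 0 (an assumption).
      descendants: implicitById.has(row.groupId) ? implicitById.get(row.groupId) : 0
    };
  });
}

// Made-up sample data: the two lists describe the same groups in different order.
var explicitRows = [{ groupId: "A", count: 3 }, { groupId: "B", count: 600 }];
var implicitRows = [{ groupId: "B", count: 900 }, { groupId: "A", count: 303 }];

console.log(mergeByIndex(explicitRows, implicitRows));   // counts of A and B get swapped
console.log(mergeByGroupId(explicitRows, implicitRows)); // each group keeps its own counts
```

With the sample data, the position-based merge attaches group B's descendant count to group A and vice versa, while the key-based merge keeps each group's numbers together regardless of how the two lists were ordered.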
commit 1bef38230893da8d940f5f545b99eb20ad7fdd19
Merge: 4e8ddcb f39302f
Author: Michael Burman <yak>
Date:   Thu May 12 14:47:22 2016 +0300

    Merge pull request #223 from josejulio/bugs/1295863

    Bug 1295863 - The number of resources in All Groups/Compatible Groups...

commit f39302f62c15a0635571b0c778250974b7a9349f
Author: Josejulio Martínez <jmartine>
Date:   Wed Mar 16 18:45:31 2016 -0600

    Bug 1295863 - The number of resources in All Groups/Compatible Groups page is not correct all the time

    We can't assume that the order is the same.
Moving to ON_QA as available to test with JON 3.3.6 DR01 brew build: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=499890
Created attachment 1171627 [details]
Sample of 'All Groups' page in JON 3.3.6 GUI

Sample of 'All Groups' page in JON 3.3.6 GUI - for comparison with CLI script output
Created attachment 1171629 [details] Sample of 'All Groups, Compatible' page in JON 3.3.6 GUI Sample of 'All Groups, Compatible' page in JON 3.3.6 GUI
Created a testing environment with JON and 6 instances of EAP (6/7, domain/standalone). Created 2 dynamic groups as described.

Output of CLI script (for comparing with attached screenshots):
=====================================================
rhqadmin@localhost:7080$ exec -f dump.cli
DynaGroup - Groups by platform ( Linux ), 7 members
DynaGroup - All resources currently down, 5 members
DynaGroup - All RHQ Agent resources in inventory, 7 members
DynaGroup - Managed EAP7 Servers in server-group ( eap7 domain,main-server-group ), 2 members
DynaGroup - Server Name ( EAP (0.0.0.0:10090) ), 2 members
DynaGroup - Server Name ( RHQ Agent ), 7 members
DynaGroup - Server Name ( EAP 7 (0.0.0.0:10090) ), 3 members
DynaGroup - Server Type ( RHQ Storage Node ), 1 members
DynaGroup - Server Name ( EAP server-two ), 1 members
DynaGroup - Server Name ( RHQ Storage Node(vso-jon-latest.bc.jonqe.lab.eng.bos.redhat.com) ), 1 members
DynaGroup - Server Name ( rhq ), 3 members
DynaGroup - Server Name ( EAP (127.0.0.1:6990) RHQ Server ), 1 members
DynaGroup - Server Type ( RHQ Agent ), 7 members
DynaGroup - Server Type ( RHQ Agent JVM ), 7 members
DynaGroup - Server Type ( EAP7 Standalone Server ), 3 members
DynaGroup - Server Type ( Managed Server ), 3 members
DynaGroup - Server Name ( JVM ), 8 members
DynaGroup - Server Name ( EAP server-one ), 1 members
DynaGroup - Server Name ( EAP server-three ), 1 members
DynaGroup - Server Name ( EAP 7 Domain Controller (master 0.0.0.0:8990) ), 1 members
DynaGroup - Server Type ( JBossAS7 Standalone Server ), 3 members
DynaGroup - Managed EAP7 Servers in domain ( eap7 domain ), 3 members
DynaGroup - Managed EAP7 Servers in server-group ( eap7 domain,other-server-group ), 1 members
DynaGroup - Server Type ( EAP7 Host Controller ), 1 members
DynaGroup - Server Type ( Cassandra Server JVM ), 1 members
DynaGroup - Server Type ( Postgres Server ), 3 members
================================================================

The numbers of children were tested by clicking on <item>, then on <item>/Inventory, and they look correct. Screenshots attached.

The number of resources on the All Groups/Compatible Groups page was correct at the time of testing.
BZ -> verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-1519.html