Created attachment 433952 [details]
Patch

Described problems:
1) Uninventorying a resource first loads the whole resource ancestor tree, including config objects, just to get the resource for a resource id.
2) Uninventory queries the database for all children in an expensive query, only to rerun the expensive part again for the update.

The attached patch brings uninventory time for 1 resource (service) down from ~12-20 sec to ~6 sec on 1 server with > 1000 children. This has been tested in the perf environment and locally on Postgres with an inventory of 1700 resources.
Needs triage
Positive side effect: no more Tx deadlock, even when uninventorying 90 servers with > 1000 children each. Before (not reliably reproducible), this ended up in ORA-0600 (or -0060 ?) deadlock exceptions.
The following query accounted for > 50% of the activity on the database for quite some time during the uninventory of each resource. It is gone with this patch.

update RHQ_RESOURCE set INVENTORY_STATUS=:1 , AGENT_ID=null, PARENT_RESOURCE_ID=null, RESOURCE_KEY='deleted'
where ID=:2
or ID in (select resource1_.ID from RHQ_RESOURCE resource1_ inner join RHQ_RESOURCE resource2_ on resource1_.PARENT_RESOURCE_ID=resource2_.ID where resource2_.ID=:3 )
or ID in (select resource3_.ID from RHQ_RESOURCE resource3_ inner join RHQ_RESOURCE resource4_ on resource3_.PARENT_RESOURCE_ID=resource4_.ID inner join RHQ_RESOURCE resource5_ on resource4_.PARENT_RESOURCE_ID=resource5_.ID where resource5_.ID=:4 )
or ID in (select resource6_.ID from RHQ_RESOURCE resource6_ inner join RHQ_RESOURCE resource7_ on resource6_.PARENT_RESOURCE_ID=resource7_.ID inner join RHQ_RESOURCE resource8_ on resource7_.PARENT_RESOURCE_ID=resource8_.ID inner join RHQ_RESOURCE resource9_ on resource8_.PARENT_RESOURCE_ID=resource9_.ID where resource9_.ID=:5 )
or ID in (select resource10_.ID from RHQ_RESOURCE resource10_ inner join RHQ_RESOURCE resource11_ on resource10_.PARENT_RESOURCE_ID=resource11_.ID inner join RHQ_RESOURCE resource12_ on resource11_.PARENT_RESOURCE_ID=resource12_.ID inner join RHQ_RESOURCE resource13_ on resource12_.PARENT_RESOURCE_ID=resource13_.ID inner join RHQ_RESOURCE resource14_ on resource13_.PARENT_RESOURCE_ID=resource14_.ID where resource14_.ID=:6 )
or ID in (select resource15_.ID from RHQ_RESOURCE resource15_ inner join RHQ_RESOURCE resource16_ on resource15_.PARENT_RESOURCE_ID=resource16_.ID inner join RHQ_RESOURCE resource17_ on resource16_.PARENT_RESOURCE_ID=resource17_.ID inner join RHQ_RESOURCE resource18_ on resource17_.PARENT_RESOURCE_ID=resource18_.ID inner join RHQ_RESOURCE resource19_ on resource18_.PARENT_RESOURCE_ID=resource19_.ID inner join RHQ_RESOURCE resource20_ on resource19_.PARENT_RESOURCE_ID=resource20_.ID where resource20_.ID=:7 )
or ID in (select resource21_.ID from RHQ_RESOURCE resource21_ inner join RHQ_RESOURCE resource22_ on resource21_.PARENT_RESOURCE_ID=resource22_.ID inner join RHQ_RESOURCE resource23_ on resource22_.PARENT_RESOURCE_ID=resource23_.ID inner join RHQ_RESOURCE resource24_ on resource23_.PARENT_RESOURCE_ID=resource24_.ID inner join RHQ_RESOURCE resource25_ on resource24_.PARENT_RESOURCE_ID=resource25_.ID inner join RHQ_RESOURCE resource26_ on resource25_.PARENT_RESOURCE_ID=resource26_.ID inner join RHQ_RESOURCE resource27_ on resource26_.PARENT_RESOURCE_ID=resource27_.ID where resource27_.ID=:8 )
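For illustration only: the statement above carries one subquery per tree depth (1 to 6 self-joins), all evaluated inside a single update. One way to avoid that shape is to collect the descendant ids once and then update by primary key in batches. The sketch below is not the attached patch; it is a minimal Python/sqlite3 model of the idea. The lowercase table layout, the helper names (`chunks`, `collect_descendant_ids`, `mark_uninventoried`, `make_demo_db`), the batch size, and the status value `'UNINVENTORIED'` are assumptions made for the example.

```python
import sqlite3


def chunks(seq, size):
    """Yield consecutive slices of seq with at most `size` items each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def collect_descendant_ids(conn, root_id, batch_size=500):
    """Return root_id plus the ids of all descendants, walking one tree level at a time."""
    all_ids = [root_id]
    frontier = [root_id]
    while frontier:
        next_frontier = []
        for batch in chunks(frontier, batch_size):
            placeholders = ",".join("?" * len(batch))
            rows = conn.execute(
                "SELECT id FROM rhq_resource "
                f"WHERE parent_resource_id IN ({placeholders})",
                batch,
            ).fetchall()
            next_frontier.extend(row[0] for row in rows)
        all_ids.extend(next_frontier)
        frontier = next_frontier
    return all_ids


def mark_uninventoried(conn, ids, status="UNINVENTORIED", batch_size=500):
    """Update the given rows by primary key only; returns the number of rows changed."""
    updated = 0
    for batch in chunks(ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        cur = conn.execute(
            "UPDATE rhq_resource SET inventory_status = ?, agent_id = NULL, "
            "parent_resource_id = NULL, resource_key = 'deleted' "
            f"WHERE id IN ({placeholders})",
            [status, *batch],
        )
        updated += cur.rowcount
    return updated


def make_demo_db():
    """Build a tiny in-memory inventory: platform 1 with 4 descendants, platform 6 with 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rhq_resource (id INTEGER PRIMARY KEY, parent_resource_id INTEGER, "
        "agent_id INTEGER, inventory_status TEXT, resource_key TEXT)"
    )
    rows = [
        (1, None), (2, 1), (3, 1), (4, 2), (5, 2),  # platform 1 and its subtree
        (6, None), (7, 6),                          # unrelated platform 6
    ]
    conn.executemany(
        "INSERT INTO rhq_resource VALUES (?, ?, 1, 'COMMITTED', 'key-' || ?)",
        [(rid, parent, rid) for rid, parent in rows],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    ids = collect_descendant_ids(conn, 1)
    print(sorted(ids), mark_uninventoried(conn, ids))
```

The id list is computed once and reused, so the tree walk is not repeated inside the update. On Postgres or Oracle a recursive query could replace the level-by-level loop; that variant is not shown here.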
This is an optimization, push into master only.
Created attachment 433981 [details]
jProfiler output before/after patch + comparison
(In reply to comment #5)
> Created an attachment (id=433981) [details]
> jProfiler output before/after patch + comparison

Forgot to mention that this is on my local box, uninventorying 1750 resources on 1 platform (so uninventorying the whole platform).
d3ecc7e in master (RHQ 4)
Setting to ON_QA, as this is already in master (see previous comment).
Heiko, can you please give steps to reproduce/test the bug and confirm that the fix is in JON 2.4 GA?
Rajan, this is master only, so it will be in RHQ 4, but it is not in JON 2.4 GA.
To try to reproduce:
- Take a platform with multiple servers and multiple services per server into inventory. Best are > 1000 services. E.g. take a Postgres server, create some logical databases and for each create 1100 tables (via a script). Then take this into inventory.
- When all is settled, uninventory the whole platform.
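The table-creation script mentioned above is not attached to this bug. A minimal sketch of what such a generator could look like follows; it only prints a script that can be fed to psql. The database names, the table name pattern, and the function name `generate_table_ddl` are made up for the example.

```python
def generate_table_ddl(databases, tables_per_db=1100):
    """Yield psql script lines creating each database and `tables_per_db` empty tables in it."""
    for db in databases:
        yield f"CREATE DATABASE {db};"
        # \connect is a psql meta-command that switches to the new database.
        yield f"\\connect {db}"
        for i in range(1, tables_per_db + 1):
            yield f"CREATE TABLE perf_table_{i:04d} (id integer PRIMARY KEY);"


if __name__ == "__main__":
    # Hypothetical database names; adjust the count to reach the desired number of services.
    for line in generate_table_ddl(["perfdb1", "perfdb2"]):
        print(line)
```

Redirecting the output to a file and running it with `psql -f` against the test server should produce the tables; the agent then needs a discovery run before they show up in inventory.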
Verified on RHQ-Master build #423 (http://hudson-qe.rhq.rdu.redhat.com:8080/view/RHQ/job/ci-rhq-master/423/)

Steps:
1) Installed RHQ server and agent
2) The agent box has multiple servers configured (JBoss-eap 5.0 has 1048 services and Postgres has 2008 services)
3) Made sure that all are in inventory and settled
4) Uninventoried the whole platform

Observation: The platform was uninventoried with all its resources within 3-4 sec.

2010-10-19 16:44:27,372 INFO [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] 10.65.193.1 took [37]ms to reload cache for 1 agents
2010-10-19 16:44:31,480 INFO [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[rajantest][329][full] - need full=[false] in (233)ms
2010-10-19 16:44:35,918 INFO [org.rhq.enterprise.server.resource.ResourceManagerBean] User [org.rhq.core.domain.auth.Subject[id=2,name=rhqadmin]] is marking resource [Resource[id=10331, type=Linux, key=rajantest, name=rajantest, parent=<null>, version=Linux 2.6.18-164.el5]] for asynchronous uninventory
Bulk closing of issues that were VERIFIED, had no target release and where the status changed more than a year ago.