Bug 720794

Summary: it takes a long time to import a large number of Resources
Product: [Other] RHQ Project
Reporter: Ian Springer <ian.springer>
Component: Core Server
Assignee: Robert Buck <rbuck>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: medium
Priority: high
Version: 4.1
CC: ccrouch, hrupp, mfoley
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2012-02-07 19:21:20 UTC
Bug Blocks: 717358, 722548    
Attachments:
Description: Diffs to offload server to client comm to background quartz job; reduces user perceived latency from 15s+ to 3s.
Flags: none

Description Ian Springer 2011-07-12 19:08:01 UTC
It took me about 10 minutes to import 1500 Resources (300 platforms and 1200 top-level servers), which is roughly 0.4 seconds per Resource. Assuming the import time scales linearly, importing more than 25 Resources would take more than 10 seconds. Since we are aiming to have all GUI pages load in less than 10 seconds, and 25 is a fairly small number of Resources, we may want to try to improve the performance here.

Comment 1 Ian Springer 2011-08-18 20:11:13 UTC
*** Bug 717257 has been marked as a duplicate of this bug. ***

Comment 2 Ian Springer 2011-08-18 20:15:12 UTC
Note, Heiko reports that importing an AS7 domain-controller Resource takes much longer than 10 seconds (I'm presuming because it has a lot of descendant services). That is a very basic use case that demonstrates this issue.

Comment 3 Ian Springer 2011-09-06 21:09:18 UTC
A solution for this would be to split Resource import into two parts:

1) the call to importResources() would flip all of the NEW Resources to a new COMMITTING inventory status and then return.
2) a background job would periodically scan for COMMITTING Resources and do the real work necessary to commit them to inventory (syncing to Agents, etc.) and flip them to COMMITTED status.

This would allow the GUI to return very quickly after the user clicks the Import button to import a set of Resources. It could then display an "Import of 207 Resources initiated." message, and the Resources would no longer be listed on the autodiscovery queue view, since they would no longer be NEW. The downside is that the GUI would not know when the import had fully completed, so it could not display a second message telling the user that the import is done.
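
For illustration, the two-part split proposed in this comment could look roughly like the sketch below. It is only a sketch under assumptions: the Resource class, the in-memory inventory list, and the syncToAgent() and finishCommits() methods are hypothetical stand-ins rather than RHQ code, and only importResources() and the NEW/COMMITTING/COMMITTED statuses come from the proposal itself.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseImportSketch {
    enum InventoryStatus { NEW, COMMITTING, COMMITTED }

    // Hypothetical stand-in for an inventory Resource; not the RHQ entity.
    static final class Resource {
        final int id;
        InventoryStatus status = InventoryStatus.NEW;
        Resource(int id) { this.id = id; }
    }

    // In-memory inventory used only for this sketch (assumption).
    final List<Resource> inventory = new ArrayList<>();

    // Part 1: runs inside the GUI request. It only flips statuses and returns.
    int importResources(List<Resource> selected) {
        int initiated = 0;
        for (Resource r : selected) {
            if (r.status == InventoryStatus.NEW) {
                r.status = InventoryStatus.COMMITTING;
                initiated++;
            }
        }
        return initiated; // enough to build an "Import of N Resources initiated." message
    }

    // Part 2: one pass of the periodic background job.
    int finishCommits() {
        int committed = 0;
        for (Resource r : inventory) {
            if (r.status == InventoryStatus.COMMITTING) {
                syncToAgent(r); // the slow part, now outside the GUI request
                r.status = InventoryStatus.COMMITTED;
                committed++;
            }
        }
        return committed;
    }

    // Placeholder for the real work (syncing to Agents, etc.).
    void syncToAgent(Resource r) { }

    public static void main(String[] args) {
        TwoPhaseImportSketch sketch = new TwoPhaseImportSketch();
        for (int i = 0; i < 207; i++) {
            sketch.inventory.add(new Resource(i));
        }
        int initiated = sketch.importResources(sketch.inventory);
        System.out.println("Import of " + initiated + " Resources initiated.");
        System.out.println("Committed by background pass: " + sketch.finishCommits());
    }
}
```

The sketch ignores concurrency and persistence; a real implementation would have to make the status flip transactional and guard against two background passes picking up the same Resource.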

Comment 4 Ian Springer 2011-09-16 15:25:50 UTC
Note, we already support importing Resources from an Agent that is currently down.

Comment 5 Ian Springer 2011-09-16 15:33:27 UTC
Rather than introducing a new COMMITTING inventory status, the finishCommit background job could probably use an existing field to determine if a COMMITTED Resource has not been fully committed (i.e. synced to its Agent) yet:

1) if (!resource.isConnected())
2) if (resource.getUuid() == null)
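
As a rough sketch of how a finishCommit job might use either check to pick out Resources that still need syncing: the Resource class below is a hypothetical stand-in that exposes only isConnected() and getUuid(), and whether either field is actually unset before the Agent sync is the open assumption this comment raises, not something the sketch confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PendingCommitScanSketch {
    // Hypothetical stand-in exposing only the two accessors named above.
    static final class Resource {
        private final String name;
        private final String uuid;       // assumed to stay null until the Agent sync
        private final boolean connected; // assumed to stay false until the Agent sync

        Resource(String name, String uuid, boolean connected) {
            this.name = name;
            this.uuid = uuid;
            this.connected = connected;
        }

        String getName() { return name; }
        String getUuid() { return uuid; }
        boolean isConnected() { return connected; }
    }

    // Option 1: treat a Resource as pending while it is not connected.
    static boolean pendingByConnection(Resource resource) {
        return !resource.isConnected();
    }

    // Option 2: treat a Resource as pending while it has no UUID.
    static boolean pendingByUuid(Resource resource) {
        return resource.getUuid() == null;
    }

    // Returns the COMMITTED Resources that the chosen check flags as not fully committed.
    static List<Resource> findPending(List<Resource> committed, Predicate<Resource> isPending) {
        List<Resource> pending = new ArrayList<>();
        for (Resource resource : committed) {
            if (isPending.test(resource)) {
                pending.add(resource);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<Resource> committed = new ArrayList<>();
        committed.add(new Resource("synced-server", "uuid-1", true));
        committed.add(new Resource("unsynced-server", null, false));
        for (Resource r : findPending(committed, PendingCommitScanSketch::pendingByUuid)) {
            System.out.println("Needs finishCommit: " + r.getName());
        }
    }
}
```

Passing the check in as a Predicate keeps the scan independent of which of the two fields turns out to be the reliable marker.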

Comment 6 Ian Springer 2011-09-16 15:38:08 UTC
I just noticed when I imported Resources from an Agent that was down, I got an ugly stack trace in the Server log:

11:21:02,019 ERROR [ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[synchronizeInventory], targetInterfaceName=org.rhq.core.clientapi.agent.discovery.DiscoveryAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://127.0.0.1:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://127.0.0.1:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
11:21:02,021 WARN  [DiscoveryBossBean] Could not perform commit synchronization with agent for platform [jetengine]
org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://127.0.0.1:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
	at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:579)
	at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:122)
	at org.jboss.remoting.Client.invoke(Client.java:1634)
	at org.jboss.remoting.Client.invoke(Client.java:548)
	at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514)
	at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutCallbacks(JBossRemotingRemoteCommunicator.java:456)
	at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:475)
	at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
	at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
	at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1087)
	at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.send(ClientCommandSenderTask.java:229)
	at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:107)
	at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
	at java.net.Socket.connect(Socket.java:529)
	at org.jboss.remoting.transport.socket.SocketClientInvoker.createSocket(SocketClientInvoker.java:192)
	at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.getConnection(MicroSocketClientInvoker.java:827)
	at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:569)
	... 17 more


Since this is a known (and handled) condition, we should not be logging an error or a stack trace. Instead we should just log a warning.
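
A minimal sketch of the logging behavior suggested here, with several assumptions: it uses java.util.logging rather than whatever logging API the Server uses, java.net.ConnectException stands in for org.jboss.remoting.CannotConnectException, and the AgentSync interface and trySync() method are hypothetical names invented for the example.

```java
import java.net.ConnectException;
import java.util.logging.Level;
import java.util.logging.Logger;

class AgentSyncLoggingSketch {
    static final Logger LOG = Logger.getLogger(AgentSyncLoggingSketch.class.getName());

    // Hypothetical wrapper around the remote call to the Agent.
    interface AgentSync {
        void synchronizeInventory() throws Exception;
    }

    // Returns true if the sync succeeded, false if it has to be retried later.
    static boolean trySync(String platformName, AgentSync sync) {
        try {
            sync.synchronizeInventory();
            return true;
        } catch (ConnectException e) {
            // Known, handled condition (Agent is down): one-line warning, no stack trace.
            LOG.log(Level.WARNING,
                "Could not perform commit synchronization with agent for platform [{0}]: {1}",
                new Object[] { platformName, e.getMessage() });
            return false;
        } catch (Exception e) {
            // Anything else is unexpected, so keep the full stack trace.
            LOG.log(Level.SEVERE,
                "Unexpected failure during commit synchronization for platform [" + platformName + "]", e);
            return false;
        }
    }

    public static void main(String[] args) {
        boolean synced = trySync("jetengine", () -> {
            throw new ConnectException("Connection refused");
        });
        System.out.println("synced=" + synced);
    }
}
```

Separating the expected connection failure from other exceptions keeps the log quiet when an Agent is simply down while preserving full detail for real errors.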

Comment 7 Robert Buck 2011-10-18 21:21:27 UTC
Created attachment 528896
Diffs to offload server to client comm to background quartz job; reduces user perceived latency from 15s+ to 3s.

Comment 8 Robert Buck 2011-10-19 14:10:02 UTC
QA: Please make sure to test this in HA mode. Thanks.

Comment 9 Robert Buck 2011-10-19 14:20:34 UTC
commit fe75f0f04101c110a722317515043cf063099bd8
Author: Robert Buck <rbuck>
Date:   2011-10-19 10:08:59 -0400

[BZ 720794] Decrease user perceived latency when importing lots of resources by scheduling all server-agent communication as a background quartz task.
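
The general idea behind this commit, moving server-agent communication out of the user's request, can be illustrated with the sketch below. It is not the committed code: it substitutes a plain java.util.concurrent executor for the Quartz job, and the class, the importPlatform() and synchronizeWithAgent() methods, and the syncedPlatforms list are hypothetical names used only for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BackgroundAgentSyncSketch {
    // Single daemon worker standing in for the background scheduler (assumption).
    private final ExecutorService background = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "agent-sync");
        thread.setDaemon(true);
        return thread;
    });

    // Records which platforms have been synced, for demonstration only.
    final List<String> syncedPlatforms = new CopyOnWriteArrayList<>();

    // Called from the user's request: queues the agent sync and returns immediately.
    Future<?> importPlatform(String platformName) {
        // Inventory status changes would happen here, in the caller's thread.
        return background.submit(() -> synchronizeWithAgent(platformName));
    }

    // Placeholder for the slow remote call to the Agent.
    private void synchronizeWithAgent(String platformName) {
        syncedPlatforms.add(platformName);
    }

    void shutdown() {
        background.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BackgroundAgentSyncSketch sketch = new BackgroundAgentSyncSketch();
        Future<?> pending = sketch.importPlatform("jetengine");
        System.out.println("Import request returned; sync is running in the background.");
        pending.get(5, TimeUnit.SECONDS); // wait here only so the demo can print the result
        System.out.println("Synced platforms: " + sketch.syncedPlatforms);
        sketch.shutdown();
    }
}
```

An in-process executor like this does not survive a Server restart and is not shared across Servers, which is part of why a persistent scheduler is a more natural fit for the HA testing requested in Comment 8.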

Comment 10 Mike Foley 2011-10-26 15:07:25 UTC
Observed no functional or performance issues when importing Resources.

Comment 11 Mike Foley 2012-02-07 19:21:20 UTC
Changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE.