Bug 608057
Summary: Perf: Update of measurement schedule for comp/auto group very slow

Product: [Other] RHQ Project
Component: Core Server
Reporter: Heiko W. Rupp <hrupp>
Assignee: Joseph Marques <jmarques>
QA Contact: Corey Welton <cwelton>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: 3.0.0
CC: jmarques
Hardware: All
OS: All
Fixed In Version: 2.4
Doc Type: Bug Fix
Last Closed: 2010-08-12 16:59:52 UTC
Description
Heiko W. Rupp, 2010-06-25 14:31:53 UTC
Created attachment 426904 [details]
Profiler output for comp group with 1200 resources
Created attachment 426905 [details]
Profiler output for auto group with 1200 resources (timed out after 1074 res)
The profiler output was for changing one schedule on 1200 resources, all on one agent. The profiler output shows we need to:

- combine the update queries into larger batches of work
- not ping the agent for each schedule, but batch the schedules per agent and send one batch per agent

----

Joseph Marques: Heiko, that's strange. I recall making updates to this subsystem a few years ago to do just that: batch all of the updates so that we only had to call out to each agent once. I'm going to investigate this and see when those additions were added, and why they aren't kicking in here.

OK, so I *did* correctly recall that I put logic in the SLSB to batch the updates to the agent, and I'm glad to see it's still in the MeasurementScheduleManager today. However, even though the raw ability to batch is there, it's not being used as well as it could be. Right now, the circle is artificially drawn around each resource. So even if you update ALL of the schedules for a single resource, they will all reach the agent in a single request. But if you try to update multiple resources at a time, each of those updates is a separate call-out to the agent. Luckily, the way the API is written, this will only require minimal tweaks to correct and to use the batching mechanism to its fullest extent.

Heiko W. Rupp: I did not measure with many agents, but in the one-agent case, the repeated round trips to the DB seem to kill performance. The two attached screenshots from a profiling session show the weak spots.

Joseph Marques: Yes, these are two birds that will be killed with one stone. If we change what is getting batched, then the number of calls out to the batch API will be reduced by that magnitude. This will cause the number of roundtrips to the DB to decrease by the same magnitude as the number of roundtrips to the agent.
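The per-agent batching discussed above can be sketched as follows. This is a minimal illustration, not the actual RHQ API: the `Resource` record and the method names are hypothetical, and the "agent call" is reduced to a counter so the roundtrip arithmetic is visible.

```java
import java.util.*;
import java.util.stream.*;

// Sketch: instead of one agent round trip per resource, group the
// resources by their owning agent and send one batched update per agent.
// Types and names here are illustrative, not the real RHQ classes.
public class BatchedScheduleUpdate {

    record Resource(int id, String agentName) {}

    /** Naive approach: one call-out per resource. */
    static int perResourceRoundTrips(List<Resource> resources) {
        return resources.size();
    }

    /** Fixed approach: group resources by agent, one batch per agent. */
    static Map<String, List<Integer>> batchPerAgent(List<Resource> resources) {
        return resources.stream().collect(Collectors.groupingBy(
                Resource::agentName,
                Collectors.mapping(Resource::id, Collectors.toList())));
    }

    public static void main(String[] args) {
        // 1200 resources, all on one agent, as in the profiled scenario.
        List<Resource> resources = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            resources.add(new Resource(i, "agent-1"));
        }
        int before = perResourceRoundTrips(resources);
        int after = batchPerAgent(resources).size(); // one batch per agent
        // With X resources across Y agents, the fix saves X - Y round trips:
        System.out.println(before + " -> " + after);   // 1200 -> 1
        System.out.println("saved " + (before - after)); // saved 1199
    }
}
```

In the profiled case (X = 1200 resources, Y = 1 agent) the grouping turns 1200 agent round trips into a single one, matching the (X-Y) savings claimed in the fix.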
commit 2d3f784933a13f1049b78186adacf3a88ae60e58
Author: Joseph Marques <joseph>
Date: Sun Jun 27 18:15:39 2010 -0400

BZ-608057: improve performance of measurement schedules updates
* refactor workflow to batch updates per agent, not per resource
** if X resources across Y agents, fix yields (X-Y) LESS roundtrips to DB
** if X resources across Y agents, fix yields (X-Y) LESS roundtrips between server and agent

This bug has the exact same reproduction procedure as was detailed by Ian Springer here: https://bugzilla.redhat.com/show_bug.cgi?id=535283#c3

This bug was fixed in conjunction with functional defects in this subsystem, detailed here: https://bugzilla.redhat.com/show_bug.cgi?id=608487. Thus, verifying either of these bugs actually verifies both of them.

----

Heiko, aside from correctness, which can be verified by QA, I'd like to see what kind of improvement this yields in the performance environment.

QA: Closing/Verified. Mass-closure of verified bugs against JON.