Bug 649057
Summary: | JON241: Agent availability reports to server grossly oversized | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Charles Crouch <ccrouch> |
Component: | Agent | Assignee: | Charles Crouch <ccrouch> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Corey Welton <cwelton> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.0.0 | CC: | greghinkle, hbrock, rtimaniy, sdharane |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 645502 | Environment: | |
Last Closed: | 2011-05-24 01:13:51 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 616081 |
Comment 1
Charles Crouch
2010-11-02 20:36:54 UTC
commit 42c2d3ed8b17b97aee15d51539619d433610a217
Author: Joseph Marques <joseph>
Date: Thu Nov 11 18:51:45 2010 -0500

    BZ-649057: re-introduce AvailabilityReport customized serialization for performance

    * replace payload List<Availability> with List<AvailabilityReport.Datum>
    * agent-side calls to getResourceAvailability() need to perform lookups for the corresponding ResourceContainer to print additional data
    * server-side calls to getResourceAvailability() need to translate back to List<Availability> with an attached fly-weight resource to mirror the previously existing method semantics

    misc:
    * remove no-arg constructor for AvailabilityReport, which used to be needed to satisfy the Externalizable interface
    * remove commented-out readExternal/writeExternal methods
    * add toString() method for AvailabilityReport.Datum, which was needed as part of the toString() impl for AvailabilityReport itself

    note:
    * specifically did not add override for equals(Object) in Datum because it's only needed in InventoryManager.handleReport(AvailabilityReport) where Collection.remove() is called; the default reference-equals should suffice

Joseph, can you please provide steps to test the bug?

There are a few things to test:

1) while the system is in a steady state, check that there aren't any exceptions in either the agent log or the server log that indicate serialization issues when dealing with availability data

2) start the agent with the interactive console, and force an availability report to be sent up to the server. first test by sending a partial report, then test by sending a full report. both of these should complete successfully without any exceptions being printed to the agent log or server log that indicate serialization issues for availability data

3) take the agent down and wait 5-10 minutes for the suspect agent job to trigger. this will come along and mark all resources managed by that agent as down.
all resources managed by that agent should be marked down/red in the web UI, and there should be no exceptions in the server log concerning execution of this job.

Per comment #4, below are the results for jon-server-2.4.1-SNAPSHOT build #50f4c45:

1. There were no exceptions in either the agent or the server log when dealing with availability data.

2. Below are the log snippets.

agent log:
2010-11-23 19:22:34,614 INFO [RHQ Agent Prompt Input Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [avail, --changed]
2010-11-23 19:22:34,775 INFO [RHQ Agent Prompt Input Thread] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...
2010-11-23 19:41:42,257 INFO [RHQ Agent Prompt Input Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [avail]
2010-11-23 19:41:42,384 INFO [RHQ Agent Prompt Input Thread] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...

server log:
2010-11-23 19:22:34,853 INFO [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[dhcp6-150][302][full] - need full=[false] in (76)ms

3. There were no errors in the log file after the suspect agent job was triggered.

2010-11-23 19:59:16,411 INFO [org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [dhcp6-150] since [Tue Nov 23 19:43:20 IST 2010]. Will be backfilled since we suspect it is down
2010-11-23 20:00:00,007 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job STARTING
2010-11-23 20:00:00,008 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Measurement data compression starting at Tue Nov 23 20:00:00 IST 2010
2010-11-23 20:00:00,019 INFO [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Begin compression from [RHQ_MEAS_DATA_NUM_R08] to [RHQ_MEASUREMENT_DATA_NUM_1H]
2010-11-23 20:00:00,020 INFO [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Begin compressing data from table [RHQ_MEAS_DATA_NUM_R08] to table [RHQ_MEASUREMENT_DATA_NUM_1H] between [11/23/10 6:30:00 PM] and [11/23/10 7:30:00 PM]
2010-11-23 20:00:00,052 INFO [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Finished compressing data from table [RHQ_MEAS_DATA_NUM_R08] to table [RHQ_MEASUREMENT_DATA_NUM_1H] between [11/23/10 6:30:00 PM] and [11/23/10 7:30:00 PM], [937] compressed rows in [0] seconds
2010-11-23 20:00:00,074 INFO [org.rhq
: : :

Marking the bug verified.

Bookkeeping - closing bug - fixed in recent release.
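The payload change in the BZ-649057 commit message (send only a small AvailabilityReport.Datum per resource instead of a full Availability, then rebuild Availability objects around a fly-weight resource on the server) can be illustrated with a rough sketch. This is not RHQ code: it is written in Python rather than Java, JSON stands in for Java serialization purely as a size proxy, and the class fields and the helpers `to_data`, `to_availabilities` and `payload_size` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    # Hypothetical stand-in for a fully loaded resource entity (not RHQ's class).
    id: int
    name: str = ""
    description: str = ""
    plugin_config: Dict[str, str] = field(default_factory=dict)


@dataclass
class Availability:
    # Old payload element: drags the whole Resource object along with it.
    resource: Resource
    avail_type: str  # hypothetical encoding: "UP" or "DOWN"


@dataclass(eq=False)  # eq=False keeps identity equality, like not overriding equals(Object)
class Datum:
    # New payload element: only the resource id and the availability type.
    resource_id: int
    avail_type: str


def to_data(avails: List[Availability]) -> List[Datum]:
    """Agent side (hypothetical helper): shrink each Availability to a Datum."""
    return [Datum(a.resource.id, a.avail_type) for a in avails]


def to_availabilities(data: List[Datum]) -> List[Availability]:
    """Server side (hypothetical helper): rebuild Availability objects around a
    fly-weight Resource that carries only the id, keeping the old list-of-Availability view."""
    return [Availability(Resource(id=d.resource_id), d.avail_type) for d in data]


def payload_size(items) -> int:
    """Bytes in a JSON encoding; only a rough proxy for the serialized size."""
    return len(json.dumps([asdict(i) for i in items]).encode("utf-8"))


if __name__ == "__main__":
    avails = [
        Availability(
            Resource(id=i, name=f"resource-{i}", description="x" * 200,
                     plugin_config={"host": "example-host", "port": "7080"}),
            "UP" if i % 2 else "DOWN",
        )
        for i in range(300)
    ]
    data = to_data(avails)
    print("old payload bytes:", payload_size(avails))
    print("new payload bytes:", payload_size(data))
    restored = to_availabilities(data)
    print("restored entries:", len(restored))
```

Leaving `Datum` with identity equality (`eq=False`) mirrors the commit note that the default reference-equals is enough for the `Collection.remove()` call in `InventoryManager.handleReport(AvailabilityReport)`, since the element being removed is the same object that was taken from the list.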