Bug 649057

Summary: JON241: Agent availability reports to server grossly oversized
Product: [Other] RHQ Project
Reporter: Charles Crouch <ccrouch>
Component: Agent
Assignee: Charles Crouch <ccrouch>
Status: CLOSED CURRENTRELEASE
QA Contact: Corey Welton <cwelton>
Severity: high
Docs Contact:
Priority: urgent
Version: 4.0.0
CC: greghinkle, hbrock, rtimaniy, sdharane
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 645502
Environment:
Last Closed: 2011-05-24 01:13:51 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 616081    

Comment 1 Charles Crouch 2010-11-02 20:36:54 UTC
Assigning to Joseph for backporting

Comment 2 Joseph Marques 2010-11-11 23:53:19 UTC
commit 42c2d3ed8b17b97aee15d51539619d433610a217
Author: Joseph Marques <joseph>
Date:   Thu Nov 11 18:51:45 2010 -0500

BZ-649057: re-introduce AvailabilityReport customized serialization for performance
    
* replace payload List<Availability> with List<AvailabilityReport.Datum>
* agent-side calls to getResourceAvailability() need to perform lookups
  for the corresponding ResourceContainer to print additional data
* server-side calls to getResourceAvailability() need to translate back to
  List<Availability> with an attached fly-weight resource to mirror the
  previously existing method semantics
    
misc:
    
* remove no-arg constructor for AvailabilityReport, which used to be needed
  to satisfy the Externalizable interface
* remove commented out readExternal/writeExternal methods
* add toString() method for AvailabilityReport.Datum, which was needed as
  part of the toString() impl for AvailabilityReport itself
    
note:
   
* specifically did not add override for equals(Object) in Datum because it's
  only needed in InventoryManager.handleReport(AvailabilityReport) where
  Collection.remove() is called; the default reference-equals should suffice
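
For readers unfamiliar with the change, the idea in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the actual RHQ code: the Datum fields (resourceId, startTime, up), the write/read helpers, and the class name AvailabilityDatumSketch are all assumptions made for the example. It shows a compact per-resource record being written as a few primitives instead of a full object graph, and why the inherited reference-equals is enough when Collection.remove() is handed the very same instance. The server-side translation back to List<Availability> is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only the name "Datum" comes from the commit message.
final class AvailabilityDatumSketch {

    /** Compact per-resource record (fields are assumed for illustration). */
    static final class Datum {
        final int resourceId;
        final long startTime;
        final boolean up;

        Datum(int resourceId, long startTime, boolean up) {
            this.resourceId = resourceId;
            this.startTime = startTime;
            this.up = up;
        }

        // equals()/hashCode() are deliberately not overridden, so
        // Collection.remove() falls back to reference equality.
        @Override
        public String toString() {
            return "Datum[resourceId=" + resourceId + ", startTime=" + startTime + ", up=" + up + "]";
        }
    }

    /** Writes an int count, then an int, a long, and a boolean per datum. */
    static byte[] write(List<Datum> data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(data.size());
            for (Datum d : data) {
                out.writeInt(d.resourceId);
                out.writeLong(d.startTime);
                out.writeBoolean(d.up);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back the layout produced by write(). */
    static List<Datum> read(byte[] payload) {
        List<Datum> data = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int resourceId = in.readInt();
                long startTime = in.readLong();
                boolean up = in.readBoolean();
                data.add(new Datum(resourceId, startTime, up));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }

    public static void main(String[] args) {
        List<Datum> report = new ArrayList<>();
        Datum first = new Datum(10001, 1289519505000L, true);
        report.add(first);
        report.add(new Datum(10002, 1289519505000L, false));

        byte[] payload = write(report);
        System.out.println(payload.length + " bytes for " + read(payload).size() + " entries");

        // An equal-looking copy is not removed; the original instance is.
        System.out.println(report.remove(new Datum(10001, 1289519505000L, true)));
        System.out.println(report.remove(first));
    }
}
```

The trade-off in the sketch is the one the note describes: skipping equals(Object) is safe only as long as callers remove the same instance they obtained from the list.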

Comment 3 Rajan Timaniya 2010-11-15 08:16:15 UTC
Joseph,

Can you please provide steps to test the bug?

Comment 4 Joseph Marques 2010-11-19 19:20:09 UTC
There are a couple things to test:

1) while the system is in steady state, verify that there aren't any exceptions in either the agent log or server log that indicate serialization issues when dealing with availability data
2) start the agent with the interactive console, and force an availability report to be sent up to the server.  first test by sending a partial report, then test by sending a full report.  both of these should complete successfully without any exceptions being printed to the agent log or server log that indicate serialization issues for availability data
3) take the agent down and wait 5-10 minutes for the suspect agent job to trigger.  this will come along and mark all resources managed by that agent as down.  all resources managed by that agent should be marked down/red in the web UI, and there should be no exceptions in the server log concerning execution of this job.
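
The log checks in the steps above could be partly automated with something like the sketch below. It is only a rough aid under stated assumptions: the class name SerializationLogCheck and the helper findSuspectLines are made up for this example, and the marker list is simply the standard java.io exception types that usually accompany Java serialization failures, not a list taken from the RHQ code. The log file path is supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of RHQ.
final class SerializationLogCheck {

    // Standard java.io exception types commonly seen with serialization problems.
    static final String[] MARKERS = {
        "NotSerializableException",
        "InvalidClassException",
        "StreamCorruptedException",
        "OptionalDataException",
        "WriteAbortedException"
    };

    /** Returns every log line that mentions one of the marker exception names. */
    static List<String> findSuspectLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // one hit per line is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java SerializationLogCheck <log-file>");
            return;
        }
        List<String> hits = findSuspectLines(Files.readAllLines(Paths.get(args[0])));
        if (hits.isEmpty()) {
            System.out.println("no serialization-related exceptions found");
        } else {
            for (String hit : hits) {
                System.out.println(hit);
            }
        }
    }
}
```

A clean result from a scan like this does not replace reading the logs around the time each report was sent; it only catches the obvious exception names.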

Comment 5 Sudhir D 2010-11-23 14:31:20 UTC
Per comment #4, below are the results for jon-server-2.4.1-SNAPSHOT build #50f4c45:

1. There were no exceptions in either the agent or server log when dealing with availability data.

2. Below are the log snippets:
agent log:
2010-11-23 19:22:34,614 INFO  [RHQ Agent Prompt Input Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [avail, --changed]
2010-11-23 19:22:34,775 INFO  [RHQ Agent Prompt Input Thread] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...

2010-11-23 19:41:42,257 INFO  [RHQ Agent Prompt Input Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [avail]
2010-11-23 19:41:42,384 INFO  [RHQ Agent Prompt Input Thread] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...

server log:
2010-11-23 19:22:34,853 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[dhcp6-150][302][full] - need full=[false] in (76)ms

3. There were no errors in the log file after the suspect agent job was triggered:
2010-11-23 19:59:16,411 INFO  [org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [dhcp6-150] since [Tue Nov 23 19:43:20 IST 2010]. Will be backfilled since we suspect it is down
2010-11-23 20:00:00,007 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job STARTING
2010-11-23 20:00:00,008 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Measurement data compression starting at Tue Nov 23 20:00:00 IST 2010
2010-11-23 20:00:00,019 INFO  [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Begin compression from [RHQ_MEAS_DATA_NUM_R08] to [RHQ_MEASUREMENT_DATA_NUM_1H]
2010-11-23 20:00:00,020 INFO  [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Begin compressing data from table [RHQ_MEAS_DATA_NUM_R08] to table [RHQ_MEASUREMENT_DATA_NUM_1H] between [11/23/10 6:30:00 PM] and [11/23/10 7:30:00 PM]
2010-11-23 20:00:00,052 INFO  [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Finished compressing data from table [RHQ_MEAS_DATA_NUM_R08] to table [RHQ_MEASUREMENT_DATA_NUM_1H] between [11/23/10 6:30:00 PM] and [11/23/10 7:30:00 PM], [937] compressed rows in [0] seconds
2010-11-23 20:00:00,074 INFO  [org.rhq
  :
  :
  :

Marking the bug verified.

Comment 9 Corey Welton 2011-05-24 01:13:51 UTC
Bookkeeping - closing bug - fixed in recent release.
