Bug 649057 - JON241: Agent availability reports to server grossly oversized
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Assigned To: Charles Crouch
QA Contact: Corey Welton
Depends On:
Blocks: jon241-bugs
Reported: 2010-11-02 16:36 EDT by Charles Crouch
Modified: 2015-02-01 18:26 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 645502
Environment:
Last Closed: 2011-05-23 21:13:51 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 1 Charles Crouch 2010-11-02 16:36:54 EDT
Assigning to Joseph for backporting
Comment 2 Joseph Marques 2010-11-11 18:53:19 EST
commit 42c2d3ed8b17b97aee15d51539619d433610a217
Author: Joseph Marques <joseph@redhat.com>
Date:   Thu Nov 11 18:51:45 2010 -0500

BZ-649057: re-introduce AvailabilityReport customized serialization for performance
    
* replace payload List<Availability> with List<AvailabilityReport.Datum>
* agent-side calls to getResourceAvailability() need to perform lookups
  for the corresponding ResourceContainer to print additional data
* server-side calls to getResourceAvailability() need to translate back to
  List<Availability> with an attached fly-weight resource to mirror the
  previously existing method semantics
    
misc:
    
* remove no-arg constructor for AvailabilityReport, which used to be needed
  to satisfy the Externalizable interface
* remove commented out readExternal/writeExternal methods
* add toString() method for AvailabilityReport.Datum, which was needed as
  part of the toString() impl for AvailabilityReport itself
    
note:
   
* specifically did not add override for equals(Object) in Datum because it's
  only needed in InventoryManager.handleReport(AvailabilityReport) where
  Collection.remove() is called; the default reference-equals should suffice
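
For illustration, the payload change described in this commit might look roughly like the sketch below. It is not the RHQ source: the class name AvailabilityReportSketch, the Datum fields (resourceId, up, startTime), and the toBytes/fromBytes helpers are all assumptions made for the example. The idea shown is that each resource contributes a few primitives to the stream rather than a full Availability object graph, and that Datum keeps the default reference-based equals.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AvailabilityReport; not the real RHQ class.
class AvailabilityReportSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Compact per-resource record (field names are assumptions).
    // equals(Object) is intentionally not overridden, so
    // List.remove(Object) matches by reference only.
    static final class Datum {
        final int resourceId;
        final boolean up;
        final long startTime;

        Datum(int resourceId, boolean up, long startTime) {
            this.resourceId = resourceId;
            this.up = up;
            this.startTime = startTime;
        }

        @Override
        public String toString() {
            return "Datum[resourceId=" + resourceId + ", up=" + up
                + ", startTime=" + startTime + "]";
        }
    }

    // transient: these fields are written by hand in writeObject below.
    private transient String agentName;
    private transient List<Datum> data;

    AvailabilityReportSketch(String agentName) {
        this.agentName = agentName;
        this.data = new ArrayList<>();
    }

    String getAgentName() { return agentName; }
    List<Datum> getData() { return data; }

    // Write only primitives per resource instead of a full object graph.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeUTF(agentName);
        out.writeInt(data.size());
        for (Datum d : data) {
            out.writeInt(d.resourceId);
            out.writeBoolean(d.up);
            out.writeLong(d.startTime);
        }
    }

    // Rebuild the list in the same order the fields were written.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        agentName = in.readUTF();
        int size = in.readInt();
        data = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            data.add(new Datum(in.readInt(), in.readBoolean(), in.readLong()));
        }
    }

    // Helpers for experiments: serialize to bytes and read back.
    static byte[] toBytes(AvailabilityReportSketch report) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(report);
        }
        return bytes.toByteArray();
    }

    static AvailabilityReportSketch fromBytes(byte[] raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (AvailabilityReportSketch) in.readObject();
        }
    }
}
```

With this shape, two Datum instances holding identical values are still distinct for List.remove(Object), which is consistent with the note above about relying on the default reference-equals.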
Comment 3 Rajan Timaniya 2010-11-15 03:16:15 EST
Joseph,

Can you please provide steps to test the bug?
Comment 4 Joseph Marques 2010-11-19 14:20:09 EST
There are a couple things to test:

1) while the system is in steady state, verify that there aren't any exceptions in either the agent log or server log that indicate serialization issues when dealing with availability data
2) start the agent with the interactive console, and force an availability report to be sent up to the server.  first test by sending a partial report, then test by sending a full report.  both of these should complete successfully without any exceptions being printed to the agent log or server log that indicate serialization issues for availability data
3) take the agent down and wait 5-10 minutes for the suspect agent job to trigger.  this job will come along and mark all resources managed by that agent as down.  all resources managed by that agent should be marked down/red in the web UI, and there should be no exceptions in the server log concerning execution of this job.
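
The log checks in the steps above could be partly automated. The sketch below is only one way to do it, under assumptions: the class and method names are made up for the example, and the marker list is simply the standard java.io serialization exception class names, not a list taken from RHQ. It returns any log lines that mention one of those exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the RHQ agent or server.
class SerializationLogCheck {
    // Standard java.io exceptions that signal Java serialization problems.
    private static final List<String> MARKERS = Arrays.asList(
        "java.io.NotSerializableException",
        "java.io.InvalidClassException",
        "java.io.StreamCorruptedException",
        "java.io.OptionalDataException");

    // Return every line that mentions at least one marker, in input order.
    static List<String> findSuspectLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // one hit per line is enough
                }
            }
        }
        return hits;
    }
}
```

An empty result for both the agent log and the server log is what steps 1 and 2 expect; the lines themselves could be loaded with java.nio.file.Files.readAllLines, assuming the log encoding is known.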
Comment 5 Sudhir D 2010-11-23 09:31:20 EST
Per comment #4, below are the results for jon-server-2.4.1-SNAPSHOT build# 50f4c45

1. There were no exceptions in either the agent or server log when dealing with availability data.

2. Below are the log snippets:
agent log:
2010-11-23 19:22:34,614 INFO  [RHQ Agent Prompt Input Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [avail, --changed]
2010-11-23 19:22:34,775 INFO  [RHQ Agent Prompt Input Thread] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...

2010-11-23 19:41:42,257 INFO  [RHQ Agent Prompt Input Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [avail]
2010-11-23 19:41:42,384 INFO  [RHQ Agent Prompt Input Thread] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...

server log:
2010-11-23 19:22:34,853 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[dhcp6-150][302][full] - need full=[false] in (76)ms

3. There were no errors in the log file after the suspect agent job was triggered.
2010-11-23 19:59:16,411 INFO  [org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [dhcp6-150] since [Tue Nov 23 19:43:20 IST 2010]. Will be backfilled since we suspect it is down
2010-11-23 20:00:00,007 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job STARTING
2010-11-23 20:00:00,008 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Measurement data compression starting at Tue Nov 23 20:00:00 IST 2010
2010-11-23 20:00:00,019 INFO  [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Begin compression from [RHQ_MEAS_DATA_NUM_R08] to [RHQ_MEASUREMENT_DATA_NUM_1H]
2010-11-23 20:00:00,020 INFO  [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Begin compressing data from table [RHQ_MEAS_DATA_NUM_R08] to table [RHQ_MEASUREMENT_DATA_NUM_1H] between [11/23/10 6:30:00 PM] and [11/23/10 7:30:00 PM]
2010-11-23 20:00:00,052 INFO  [org.rhq.enterprise.server.measurement.MeasurementCompressionManagerBean] Finished compressing data from table [RHQ_MEAS_DATA_NUM_R08] to table [RHQ_MEASUREMENT_DATA_NUM_1H] between [11/23/10 6:30:00 PM] and [11/23/10 7:30:00 PM], [937] compressed rows in [0] seconds
2010-11-23 20:00:00,074 INFO  [org.rhq
  :
  :
  :

Marking the bug verified.
Comment 9 Corey Welton 2011-05-23 21:13:51 EDT
Bookkeeping - closing bug - fixed in recent release.