Bug 741971

Summary: Agent measurement schedules list becomes broken after a change on the UI
Product: [Other] RHQ Project
Reporter: Costel C <mulderika>
Component: Core Server
Assignee: Robert Buck <rbuck>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: urgent
Version: 4.1
CC: hrupp, rbuck
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-02-07 19:23:11 UTC
Bug Blocks: 734807

Description Costel C 2011-09-28 16:14:18 UTC
Description of problem:
When you update the measurement schedules for a resource (or a group of resources) in the UI, by changing the collection interval or by enabling or disabling schedules, the list of measurement schedules for that resource on the agent side becomes broken: the new list contains only the schedules that were modified. Even after the agent is restarted, its schedule list is still out of sync with the server.


Version-Release number of selected component:
4.0.1

Steps to Reproduce:
1. Go to RHQ Server UI->Inventory and select a Resource <R>
2. Go to the Monitoring->Schedules panel and select a resource metric <M>
3. Change the collection interval, then apply it with Set
4. Start the RHQ agent command line and check the schedules with the command "inventory --xml"
  
Actual results:
For resource <R>, on the agent side, the list of schedules contains only those corresponding to metric <M>. The other schedules are missing.

Expected results:
The agent side should have the same list of schedules as the server side.

Additional info:
I think the problem is in the MeasurementScheduleManagerBean class, in the method named "notifyAgentsOfScheduleUpdates".
The following line:
   agentClient.getMeasurementAgentService().scheduleCollection(requestsToSend);

should be replaced with the following line, so that the updateCollection method is used instead:
   agentClient.getMeasurementAgentService().updateCollection(requestsToSend);

A possible way to re-sync the agent schedules is using the command "inventory --sync" on the agent command line.
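
The difference between replacing and updating a schedule collection can be sketched as follows. This is a minimal illustration, not RHQ code: the Schedule class and the replaceWith and update helpers are hypothetical names chosen for the example, and the real scheduleCollection and updateCollection methods may differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScheduleSyncSketch {

    /** Hypothetical stand-in for a schedule entry; not an RHQ class. */
    static final class Schedule {
        final int id;
        final String name;
        final long intervalMillis;

        Schedule(int id, String name, long intervalMillis) {
            this.id = id;
            this.name = name;
            this.intervalMillis = intervalMillis;
        }
    }

    /** Replace semantics: the result holds only the incoming schedules. */
    static Map<Integer, Schedule> replaceWith(List<Schedule> incoming) {
        Map<Integer, Schedule> result = new LinkedHashMap<>();
        for (Schedule s : incoming) {
            result.put(s.id, s);
        }
        return result;
    }

    /** Update semantics: existing schedules are kept, incoming ones overwrite by id. */
    static Map<Integer, Schedule> update(Map<Integer, Schedule> current, List<Schedule> incoming) {
        Map<Integer, Schedule> result = new LinkedHashMap<>(current);
        for (Schedule s : incoming) {
            result.put(s.id, s);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three schedules known to the agent before the change (made-up values).
        Map<Integer, Schedule> agentSide = new LinkedHashMap<>();
        agentSide.put(1, new Schedule(1, "metricA", 600_000L));
        agentSide.put(2, new Schedule(2, "metricB", 600_000L));
        agentSide.put(3, new Schedule(3, "metricC", 1_200_000L));

        // The server sends only the schedule that was modified in the UI.
        List<Schedule> changed = new ArrayList<>();
        changed.add(new Schedule(2, "metricB", 30_000L));

        System.out.println("replace keeps " + replaceWith(changed).size() + " schedule(s)");
        System.out.println("update keeps " + update(agentSide, changed).size() + " schedule(s)");
    }
}
```

With three existing schedules and one modified schedule, the replace variant ends up with a single entry, which matches the symptom described above, while the update variant keeps all three and only changes the interval of the modified one.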

Comment 1 Charles Crouch 2011-09-30 06:32:47 UTC
Bob, can you validate that this issue is not a problem with the current code in master?

Comment 2 Robert Buck 2011-09-30 16:11:29 UTC
I just tested this in the latest code in master; if this was a bug in a prior release, it isn't anymore. Here is how I tested this:

(1) create a new compat group, name it "test"
(2) select available resource "RHQ Agent"
(3) after clicking finish, select item, select monitoring, schedules
(4) then choose 1 metric from the list, and set the collection interval
    e.g. "number of commands successfully sent", change 10 minutes to 30 seconds

The results before and after were identical except for the metric that was changed. Furthermore, I got the correct number of requests too; NumberTotalCommandsReceived increased from 1 to 3.

Agent Metrics:
              AgentHomeDirectory: /some/path/to/agents/local/rhq-agent
      AgentServerClockDifference: 3
    AverageExecutionTimeReceived: 0
        AverageExecutionTimeSent: 21
                     CurrentTime: Fri Sep 30 12:09:29 EDT 2011
                JVMActiveThreads: 34
                   Memory - Heap: Used: 43.37 MB, Committed: 80.81 MB, Max: 119.34 MB
               Memory - Non Heap: Used: 25.44 MB, Committed: 40.60 MB, Max: 117.44 MB
             NumberAgentRestarts: 1
        NumberCommandsActiveSent: 0
           NumberCommandsInQueue: 0
           NumberCommandsSpooled: 0
    NumberFailedCommandsReceived: 0
        NumberFailedCommandsSent: 0
NumberSuccessfulCommandsReceived: 3
    NumberSuccessfulCommandsSent: 108
     NumberTotalCommandsReceived: 3
         NumberTotalCommandsSent: 108
            ReasonForLastRestart: PROCESS_START
                         Sending: true
                          Uptime: 26.2 minutes (1573)
                         Version: 4.1.0-SNAPSHOT

Glad to have tested this one. I learned a bit about the agent command line. Thanks.

Comment 3 Costel C 2011-10-03 15:42:13 UTC
Hi,

I see that you used the "metrics" command, which returns the Agent Metrics. That output does not show whether the measurement schedules are OK.

Have you tried the "inventory --xml" command? It shows the measurement schedules that are currently running.

I can also reproduce this bug on version 4.1.0.
I ran the same test as you and checked the schedules using "inventory --xml".
The result is:

[...]
 <resource>
         <id>10642</id>
         <key>test RHQ Agent</key>
         <name>RHQ Agent</name>
         <version>4.1.0</version>
         [....]
         <description>RHQ Management Agent</description>
         <inventory-status>COMMITTED</inventory-status>
         <type>RHQ Agent</type>
         <availabilityType>UP</availabilityType>
         <category>Server</category>
         <container>
            <availability>Availability[id=0,type=UP,start-time=Mon Oct 03 18:15:16 EEST 2011,end-time=null]</availability>
            <state>STARTED</state>
            <installedPackageCount>0</installedPackageCount>
            <schedules>
               <schedule>
                  <schedule-id>14458</schedule-id>
                  <name>NumberSuccessfulCommandsSent</name>
                  <enabled>true</enabled>
                  <interval>30000</interval>
               </schedule>
            </schedules>
         </container>

[...]

As you can see, there is only one schedule, the one that was modified, although there were many more before the modification. This means the agent will send the server measurements only for the metric "NumberSuccessfulCommandsSent". The others are no longer sent.
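
Checking a long "inventory --xml" dump by eye is error-prone, so the schedules can also be listed programmatically. The following is a rough sketch that assumes the output has been saved as well-formed XML and uses the element names visible in the fragment above (schedule, schedule-id, name, enabled, interval). The InventoryScheduleLister class and its listSchedules method are hypothetical helpers, not part of RHQ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class InventoryScheduleLister {

    /** Returns one summary line per schedule element found in the given XML. */
    static List<String> listSchedules(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Matches only elements named exactly "schedule", not "schedules" or "schedule-id".
            NodeList nodes = doc.getElementsByTagName("schedule");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element schedule = (Element) nodes.item(i);
                result.add(text(schedule, "name")
                        + " (id=" + text(schedule, "schedule-id")
                        + ", enabled=" + text(schedule, "enabled")
                        + ", interval=" + text(schedule, "interval") + " ms)");
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse inventory XML", e);
        }
    }

    /** Reads the trimmed text of the first child element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Small well-formed sample modeled on the fragment in this comment.
        String sample =
                "<container><schedules><schedule>"
              + "<schedule-id>14458</schedule-id>"
              + "<name>NumberSuccessfulCommandsSent</name>"
              + "<enabled>true</enabled>"
              + "<interval>30000</interval>"
              + "</schedule></schedules></container>";
        for (String line : listSchedules(sample)) {
            System.out.println(line);
        }
    }
}
```

Comparing the number of lines printed before and after a change in the UI makes a dropped schedule easy to spot.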

Comment 4 Robert Buck 2011-10-04 18:28:35 UTC
I retested this. Yes, I see the issue now. Thanks.

Comment 5 Robert Buck 2011-10-05 13:44:11 UTC
commit e10381457c043f083de542e7cae4c210dfefd658
Author: Robert Buck <rbuck>
Date:   2011-10-05 09:25:08 -0400

Remove the unnecessary workaround for jdk 1.5 as we no longer support that and later jdks have the patch that resolves the underlying issues in priority queue remove methods.


commit 171ac69f6a524b6c262246bf5853a5c296c611f4
Author: Robert Buck <rbuck>
Date:   2011-10-04 15:58:46 -0400

[BZ 741971] Agent measurement schedules list becomes broken after a change on the UI; the resource container code replaced the prior collection with the subset. Instead, we simply need to update (always).

The PC code should probably be refactored or cleaned up sometime. I think several of us are in agreement the code is weak.

Comment 6 Robert Buck 2011-10-05 14:33:09 UTC
commit b7293451bbabb825092a5d3ccfb1699850ad82b0
commit e50eb33156ebfe29436690c751932b79f6476991

Comment 7 Mike Foley 2011-10-05 20:30:58 UTC
Verified in build #476 by following the reproduction steps. To document the verification, here is the output of inventory --xml for the Network Adapter resource (the resource whose measurement schedule I changed):


      <resource>
         <id>10015</id>
         <key>eth0</key>
         <name>eth0</name>
         <version></version>
         <uuid>9ae17e8f-c9f1-4902-9d44-3f7b2ea7958d</uuid>
         <mtime>1317844898804</mtime>
         <mtime-date>Wed Oct 05 16:01:38 EDT 2011</mtime-date>
         <description>BC:30:5B:BB:4E:9A</description>
         <inventory-status>COMMITTED</inventory-status>
         <type>Network Adapter</type>
         <availabilityType>UP</availabilityType>
         <category>Service</category>
         <container>
            <availability>Availability[id=0,type=UP,start-time=Wed Oct 05 16:21:22 EDT 2011,end-time=null]</availability>
            <state>STARTED</state>
            <installedPackageCount>0</installedPackageCount>
            <schedules>
               <schedule>
                  <schedule-id>10131</schedule-id>
                  <name>rxPackets</name>
                  <enabled>false</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10187</schedule-id>
                  <name>txErrors</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10203</schedule-id>
                  <name>txOverruns</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10115</schedule-id>
                  <name>Trait.net4.address</name>
                  <enabled>true</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10123</schedule-id>
                  <name>rxBytes</name>
                  <enabled>false</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10163</schedule-id>
                  <name>rxDropped</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10199</schedule-id>
                  <name>txDropped</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10183</schedule-id>
                  <name>rxFrame</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10167</schedule-id>
                  <name>rxDropped</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10195</schedule-id>
                  <name>txDropped</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10139</schedule-id>
                  <name>txBytes</name>
                  <enabled>false</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10211</schedule-id>
                  <name>txCollisions</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10119</schedule-id>
                  <name>Trait.interfaceFlags</name>
                  <enabled>true</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10147</schedule-id>
                  <name>txPackets</name>
                  <enabled>false</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10143</schedule-id>
                  <name>txBytes</name>
                  <enabled>true</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10127</schedule-id>
                  <name>rxBytes</name>
                  <enabled>true</enabled>
                  <interval>420000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10135</schedule-id>
                  <name>rxPackets</name>
                  <enabled>true</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10215</schedule-id>
                  <name>txCollisions</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10175</schedule-id>
                  <name>rxOverruns</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10159</schedule-id>
                  <name>rxErrors</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10151</schedule-id>
                  <name>txPackets</name>
                  <enabled>true</enabled>
                  <interval>600000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10179</schedule-id>
                  <name>rxFrame</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10171</schedule-id>
                  <name>rxOverruns</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10155</schedule-id>
                  <name>rxErrors</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10207</schedule-id>
                  <name>txOverruns</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10219</schedule-id>
                  <name>txCarrier</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10191</schedule-id>
                  <name>txErrors</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
               <schedule>
                  <schedule-id>10223</schedule-id>
                  <name>txCarrier</name>
                  <enabled>false</enabled>
                  <interval>1200000</interval>
               </schedule>
            </schedules>
         </container>
         <children>
         </children>
      </resource>
   </children>
</resource>
</inventory>

Comment 8 Mike Foley 2012-02-07 19:23:11 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE