Description of problem:
When you update the measurement schedules for a resource (or a group of resources) in the UI, by changing the collection interval or enabling/disabling schedules, the list of measurement schedules for that resource on the agent side becomes broken. The new list contains only the schedules that were modified. Even after restarting the agent, the schedule list is still not in sync with the server.

Version-Release number of selected component: 4.0.1

Steps to Reproduce:
1. Go to RHQ Server UI->Inventory, select a Resource <R>
2. Go to the Monitoring->Schedules panel, select a resource metric <M>
3. Change the collection interval, then apply Set
4. Start the RHQ agent command line and check the schedules with the command "inventory --xml"

Actual results:
For resource <R>, the list of schedules on the agent side contains only those corresponding to metric <M>. The other schedules are missing.

Expected results:
The agent side has the same list of schedules as the server side.

Additional info:
I think the problem is in the MeasurementScheduleManagerBean class, in the method named "notifyAgentsOfScheduleUpdates". The following line:
    agentClient.getMeasurementAgentService().scheduleCollection(requestsToSend);
should be changed to use the updateCollection method instead:
    agentClient.getMeasurementAgentService().updateCollection(requestsToSend);
A possible way to re-sync the agent schedules is to run the command "inventory --sync" on the agent command line.
Bob, can you validate that this issue is not a problem with the current code in master?
I just tested this in the latest code in master; if this was a bug in a prior release, it isn't anymore. Here is how I tested this:
(1) create a new compat group, name it "test"
(2) select available resource "RHQ Agent"
(3) after clicking finish, select item, select monitoring, schedules
(4) then choose 1 metric from the list, e.g. "number of commands successfully sent", and change its collection interval from 10 minutes to 30 seconds

The results before and after were identical except for the metric that was changed. Furthermore, I got the correct number of requests too; the value of NumberTotalCommandsReceived increased from 1 to 3.

Agent Metrics:
AgentHomeDirectory: /some/path/to/agents/local/rhq-agent
AgentServerClockDifference: 3
AverageExecutionTimeReceived: 0
AverageExecutionTimeSent: 21
CurrentTime: Fri Sep 30 12:09:29 EDT 2011
JVMActiveThreads: 34
Memory - Heap: Used: 43.37 MB, Committed: 80.81 MB, Max: 119.34 MB
Memory - Non Heap: Used: 25.44 MB, Committed: 40.60 MB, Max: 117.44 MB
NumberAgentRestarts: 1
NumberCommandsActiveSent: 0
NumberCommandsInQueue: 0
NumberCommandsSpooled: 0
NumberFailedCommandsReceived: 0
NumberFailedCommandsSent: 0
NumberSuccessfulCommandsReceived: 3
NumberSuccessfulCommandsSent: 108
NumberTotalCommandsReceived: 3
NumberTotalCommandsSent: 108
ReasonForLastRestart: PROCESS_START
Sending: true
Uptime: 26.2 minutes (1573)
Version: 4.1.0-SNAPSHOT

Glad to have tested this one. I learned a bit about the agent command line. Thanks.
Hi, I see that you used the "metrics" command, which returns the Agent Metrics. That output does not tell you whether the measurement schedules are OK. Have you tried the "inventory --xml" command? It shows the measurement schedules that are currently running. I can also reproduce this bug on version 4.1.0. I ran the same test as you did and checked the schedules using "inventory --xml". The result is:

[...]
<resource>
  <id>10642</id>
  <key>test RHQ Agent</key>
  <name>RHQ Agent</name>
  <version>4.1.0</version>
  [....]
  <description>RHQ Management Agent</description>
  <inventory-status>COMMITTED</inventory-status>
  <type>RHQ Agent</type>
  <availabilityType>UP</availabilityType>
  <category>Server</category>
  <container>
    <availability>Availability[id=0,type=UP,start-time=Mon Oct 03 18:15:16 EEST 2011,end-time=null]</availability>
    <state>STARTED</state>
    <installedPackageCount>0</installedPackageCount>
    <schedules>
      <schedule>
        <schedule-id>14458</schedule-id>
        <name>NumberSuccessfulCommandsSent</name>
        <enabled>true</enabled>
        <interval>30000</interval>
      </schedule>
    </schedules>
  </container>
[...]

As you can see, there is only one schedule, the one that was modified, but before the modification there were many more. This means the agent will send the server only the measurements for the metric "NumberSuccessfulCommandsSent". The others are no longer sent.
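For anyone repeating this check, counting the <schedule> elements by eye gets tedious on larger resources. The following is a minimal sketch, not part of RHQ, that lists the schedules found in a well-formed XML fragment shaped like the output above. The class name ScheduleLister and the idea of feeding it the saved "inventory --xml" output are assumptions for illustration; it uses only the JDK's built-in DOM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not an RHQ class: lists the schedules in an inventory XML fragment.
public class ScheduleLister {

    /** Returns "name@interval" for every <schedule> element in the given well-formed XML. */
    public static List<String> listSchedules(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Matches only elements named exactly "schedule", not "schedules" or "schedule-id".
        NodeList nodes = doc.getElementsByTagName("schedule");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element schedule = (Element) nodes.item(i);
            String name = schedule.getElementsByTagName("name").item(0).getTextContent();
            String interval = schedule.getElementsByTagName("interval").item(0).getTextContent();
            result.add(name + "@" + interval);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The <schedules> fragment reported in this comment.
        String xml = "<schedules><schedule>"
                + "<schedule-id>14458</schedule-id>"
                + "<name>NumberSuccessfulCommandsSent</name>"
                + "<enabled>true</enabled>"
                + "<interval>30000</interval>"
                + "</schedule></schedules>";
        List<String> schedules = listSchedules(xml);
        System.out.println(schedules.size() + " schedule(s): " + schedules);
        // prints: 1 schedule(s): [NumberSuccessfulCommandsSent@30000]
    }
}
```

Running it on the output captured before and after a UI change would make a dropped schedule show up as a smaller count.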
I retested this. Yes, I see the issue now. Thanks.
commit e10381457c043f083de542e7cae4c210dfefd658
Author: Robert Buck <rbuck>
Date: 2011-10-05 09:25:08 -0400

    Remove the unnecessary workaround for jdk 1.5 as we no longer support that and later jdks have the patch that resolves the underlying issues in priority queue remove methods.

commit 171ac69f6a524b6c262246bf5853a5c296c611f4
Author: Robert Buck <rbuck>
Date: 2011-10-04 15:58:46 -0400

    [BZ 741971] Agent measurement schedules list becomes broken after a change on the UI; the resource container code replaced the prior collection with the subset. Instead, we simply need to update (always). The PC code should probably be refactored or cleaned up sometime. I think several of us are in agreement the code is weak.
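The difference between replacing the collection with the subset and updating it can be illustrated with a small sketch. This is not the RHQ resource container code: the class ScheduleMergeSketch, its method names, and the map of schedule id to interval are simplifications invented here, and the schedule ids other than 14458 are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; it does not reproduce the actual RHQ classes.
public class ScheduleMergeSketch {

    /** Buggy behaviour: the incoming subset replaces everything the agent knew about. */
    public static Map<Integer, Long> replaceAll(Map<Integer, Long> current, Map<Integer, Long> updates) {
        return new LinkedHashMap<>(updates);
    }

    /** Fixed behaviour: incoming entries overwrite matching ids, all others are kept. */
    public static Map<Integer, Long> update(Map<Integer, Long> current, Map<Integer, Long> updates) {
        Map<Integer, Long> merged = new LinkedHashMap<>(current);
        merged.putAll(updates);
        return merged;
    }

    public static void main(String[] args) {
        // Schedule id -> collection interval in ms. Ids 14457 and 14459 are made up.
        Map<Integer, Long> current = new LinkedHashMap<>();
        current.put(14457, 600000L);
        current.put(14458, 600000L);
        current.put(14459, 600000L);

        // The server sends only the schedule that was changed in the UI.
        Map<Integer, Long> updates = new LinkedHashMap<>();
        updates.put(14458, 30000L);

        System.out.println(replaceAll(current, updates));
        // prints: {14458=30000}
        System.out.println(update(current, updates));
        // prints: {14457=600000, 14458=30000, 14459=600000}
    }
}
```

With replace semantics the agent is left with only the modified schedule, which matches the "inventory --xml" output reported above. With update semantics the untouched schedules survive.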
commit b7293451bbabb825092a5d3ccfb1699850ad82b0 commit e50eb33156ebfe29436690c751932b79f6476991
Verified on build #476 by following the reproduction steps. Documenting the verification with the output of inventory --xml for the Network Adapter resource (the resource whose measurement schedule I changed):

<resource>
  <id>10015</id>
  <key>eth0</key>
  <name>eth0</name>
  <version></version>
  <uuid>9ae17e8f-c9f1-4902-9d44-3f7b2ea7958d</uuid>
  <mtime>1317844898804</mtime>
  <mtime-date>Wed Oct 05 16:01:38 EDT 2011</mtime-date>
  <description>BC:30:5B:BB:4E:9A</description>
  <inventory-status>COMMITTED</inventory-status>
  <type>Network Adapter</type>
  <availabilityType>UP</availabilityType>
  <category>Service</category>
  <container>
    <availability>Availability[id=0,type=UP,start-time=Wed Oct 05 16:21:22 EDT 2011,end-time=null]</availability>
    <state>STARTED</state>
    <installedPackageCount>0</installedPackageCount>
    <schedules>
      <schedule> <schedule-id>10131</schedule-id> <name>rxPackets</name> <enabled>false</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10187</schedule-id> <name>txErrors</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10203</schedule-id> <name>txOverruns</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10115</schedule-id> <name>Trait.net4.address</name> <enabled>true</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10123</schedule-id> <name>rxBytes</name> <enabled>false</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10163</schedule-id> <name>rxDropped</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10199</schedule-id> <name>txDropped</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10183</schedule-id> <name>rxFrame</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10167</schedule-id> <name>rxDropped</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10195</schedule-id> <name>txDropped</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10139</schedule-id> <name>txBytes</name> <enabled>false</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10211</schedule-id> <name>txCollisions</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10119</schedule-id> <name>Trait.interfaceFlags</name> <enabled>true</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10147</schedule-id> <name>txPackets</name> <enabled>false</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10143</schedule-id> <name>txBytes</name> <enabled>true</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10127</schedule-id> <name>rxBytes</name> <enabled>true</enabled> <interval>420000</interval> </schedule>
      <schedule> <schedule-id>10135</schedule-id> <name>rxPackets</name> <enabled>true</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10215</schedule-id> <name>txCollisions</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10175</schedule-id> <name>rxOverruns</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10159</schedule-id> <name>rxErrors</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10151</schedule-id> <name>txPackets</name> <enabled>true</enabled> <interval>600000</interval> </schedule>
      <schedule> <schedule-id>10179</schedule-id> <name>rxFrame</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10171</schedule-id> <name>rxOverruns</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10155</schedule-id> <name>rxErrors</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10207</schedule-id> <name>txOverruns</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10219</schedule-id> <name>txCarrier</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10191</schedule-id> <name>txErrors</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
      <schedule> <schedule-id>10223</schedule-id> <name>txCarrier</name> <enabled>false</enabled> <interval>1200000</interval> </schedule>
    </schedules>
  </container>
  <children> </children>
</resource>
</children>
</resource>
</inventory>
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE