Bug 1544424

Summary: NPE - Error updating MeasurementSchedules for Agent
Product: [JBoss] JBoss Operations Network
Reporter: Filip Brychta <fbrychta>
Component: Core Server, Agent
Assignee: Michael Burman <miburman>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: medium
Docs Contact:
Priority: high
Version: JON 3.3.10
CC: loleary, spinder
Target Milestone: ER01
Keywords: Regression, Triaged
Target Release: JON 3.3.11
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-16 17:06:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
complete server log none

Description Filip Brychta 2018-02-12 12:34:02 UTC
Created attachment 1394891
complete server log

Description of problem:
The following error is visible in server.log after updating to 3.3.10:
07:24:57,002 ERROR [org.rhq.enterprise.server.measurement.MeasurementScheduleManagerBean] (pool-6-thread-1) Error updating MeasurementSchedules for Agent[id=10001]: : java.lang.NullPointerException

Version-Release number of selected component (if applicable):
JON3.3.10

How reproducible:
3/3

Steps to Reproduce:
1. Install JON 3.3.0.GA
2. Update to JON 3.3.10 (jon-server-3.3.0.GA-update-10/apply-updates.sh jon-server-3.3.0.GA)
3. grep "ERROR" jon-server-3.3.0.GA/logs/server.log*

Actual results:
Error in server.log:
07:24:57,002 ERROR [org.rhq.enterprise.server.measurement.MeasurementScheduleManagerBean] (pool-6-thread-1) Error updating MeasurementSchedules for Agent[id=10001]: : java.lang.NullPointerException
        at org.rhq.enterprise.server.measurement.MeasurementScheduleManagerBean.sendUpdatedSchedulesToAgent(MeasurementScheduleManagerBean.java:976) [rhq-server.jar:4.12.0.JON330GA-redhat-9]
        at org.rhq.enterprise.server.measurement.MeasurementScheduleManagerBean.modifyDefaultCollectionIntervalForMeasurementDefinitions(MeasurementScheduleManagerBean.java:559) [rhq-server.jar:4.12.0.JON330GA-redhat-9]
        at org.rhq.enterprise.server.measurement.MeasurementScheduleManagerBean.updateDefaultCollectionIntervalAndEnablementForMeasurementDefinitions(MeasurementScheduleManagerBean.java:374) [rhq-server.jar:4.12.0.JON330GA-redhat-9]


Expected results:
No errors

Additional info:
The issue is not visible in 3.3.9.
The exception is thrown again even after the next restart of the JON server.

Full server.log attached

Comment 1 Filip Brychta 2018-02-12 12:36:35 UTC
Agent[id=10001] is confusing: the agent does not have this id; 10001 is the id of the platform.

Comment 2 Michael Burman 2018-02-12 12:37:52 UTC
I've seen this happen on the master also, so it's not just 3.3.10

Comment 3 Michael Burman 2018-02-12 14:42:43 UTC
Regarding the agentId question: the agentId is correct. 10001 for the platform is a resource_id, not an agent_id.

This bug happens because the fix for 1488179 forces an update of the schedules. When the RHQ server starts, the agent connection isn't available yet in the AgentClient instance (our LookupManager can't find it).

In that case we put the update information into the backlog and mark it as something that needs to be updated on the agent side. So the operation does happen, but it does not happen early enough.

So it is not dangerous, but the NPE is of course a little annoying. We could change the process to direct the update to the backlog immediately if the connection isn't available (the fix is quite easy).
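
A minimal sketch of the check proposed above, not the actual RHQ code. Every name here (ScheduleUpdateSketch, Dispatcher, the AgentClient interface, the lookup function and the in-memory backlog map) is hypothetical and only illustrates the idea: if the lookup returns no client for the agent, queue the update right away instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch; none of these types are the real RHQ classes. */
final class ScheduleUpdateSketch {

    /** Stand-in for a connection to a running agent (assumed shape). */
    interface AgentClient {
        void updateCollection(List<String> schedules);
    }

    static final class Dispatcher {
        // Assumption: the lookup returns null while the agent is not connected.
        private final IntFunction<AgentClient> lookup;
        // Assumption: pending updates are kept per agent id until it connects.
        private final Map<Integer, List<String>> backlog = new HashMap<>();

        Dispatcher(IntFunction<AgentClient> lookup) {
            this.lookup = lookup;
        }

        /** Returns true if the update was sent immediately, false if it was queued. */
        boolean sendUpdatedSchedules(int agentId, List<String> schedules) {
            AgentClient client = lookup.apply(agentId);
            if (client == null) {
                // Agent not online yet: go straight to the backlog, no NPE.
                backlog.computeIfAbsent(agentId, k -> new ArrayList<>()).addAll(schedules);
                return false;
            }
            client.updateCollection(schedules);
            return true;
        }

        /** Number of schedule updates still waiting for the given agent. */
        int backlogSize(int agentId) {
            List<String> pending = backlog.get(agentId);
            return pending == null ? 0 : pending.size();
        }
    }

    public static void main(String[] args) {
        // Simulate server startup: no agent connection is available yet.
        Dispatcher offline = new Dispatcher(id -> null);
        boolean sent = offline.sendUpdatedSchedules(10001, List.of("cpu.load", "mem.free"));
        System.out.println("sent=" + sent + ", queued=" + offline.backlogSize(10001));
    }
}
```

With the lookup returning null, the call returns false and both schedule names end up in the backlog for agent 10001; how the backlog is later flushed to the agent is outside this sketch.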

Comment 4 Michael Burman 2018-02-12 14:54:35 UTC
Fixed in the master:

commit f6078a1fa47339503ad8f3de185477646b880a1a (HEAD -> master, upstream/master, origin/master, origin/HEAD)
Author: Michael Burman <miburman>
Date:   Mon Feb 12 16:52:50 2018 +0200

    [BZ 1544424] Add check for null agentClient which means it is not online

Comment 8 errata-xmlrpc 2018-10-16 17:06:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2930