Bug 1025819
Summary: NullPointerException when trying to add metric graph in Dashboard

Product: [JBoss] JBoss Operations Network
Component: UI
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: JON 3.2
Target Milestone: ER03
Target Release: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Reporter: bkramer <bkramer>
Assignee: John Sanda <jsanda>
QA Contact: Mike Foley <mfoley>
CC: bkramer, jsanda, jshaughn, loleary, mfoley, mithomps, myarboro, theute
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-12-11 14:02:22 UTC
Description (bkramer, 2013-11-01 16:41:35 UTC)
John Sanda (comment 2):
This is a backend issue. bkramer, can you please provide the server.log file? How many storage nodes are you running? I first need to understand why the node has a cluster status of DOWN. Based on the NPE, it looks like the server cannot talk to the storage node for reading/writing metric data.

bkramer created attachment 819022 [details]: Requested server.log file
bkramer:
(In reply to John Sanda from comment #2)
> This is a backend issue. bkramer, can you please provide the server.log
> file? How many storage nodes are you running? I first need to understand why
> the node has a cluster status of down. Based on the NPE, it looks like the
> server cannot talk to the storage node for reading/writing metric data.

I have attached my server.log file (server.zip). Yes, I think that something is wrong with my installation. Initially, I had a problem getting JON 3.2 working, so I had to re-install it using a different IP bind address. As a result, when I checked my storage nodes in Administration, I had two storage nodes: one was down but had cluster status UP, and the other one (the additionally installed one) was UP but had cluster status DOWN.

John Sanda (comment 5):
bkramer, can you describe your initial installation steps as well as the steps you took after using a different bind address? And was this all done on the same machine?

bkramer created attachment 819141 [details]: Requested rhq-storage-installation.log and rhq-storage.log
bkramer:
(In reply to John Sanda from comment #5)
> Can you describe your initial installation steps as well as the steps you
> took after using a different bind address? And was this all done on the same
> machine?

I followed the steps given in the Installation Guide: https://access.redhat.com/site/documentation/en-US/Red_Hat_JBoss_Operations_Network/3.2/html-single/Installation_Guide/index.html#install-script

I ran the script as the root user. Before that, I made a few changes in the rhq-server.properties file:

    # PostgreSQL database
    rhq.server.database.connection-url=jdbc:postgresql://127.0.0.1:5432/jon320
    rhq.server.database.db-name=jon320
    ...
    rhq.server.high-availability.name=10.33.63.231
    ...
    jboss.bind.address=10.33.63.231
    ...
    rhq.autoinstall.public-endpoint-address=10.33.63.231

The installation completed with no problem, but when I tried to start all three components (storage node, server, and agent), I got this message:

    Unable to bind to address /10.36.6.85:7100. Set listen_address in
    cassandra.yaml to an interface you can bind to, e.g., your private IP
    address on EC2

where 10.36.6.85 is not an address that I specified in my rhq-server.properties file. After that, I removed the installed jon-server-3.2.0.ER3 and did the installation again, this time using bkramerlt.usersys.redhat.com (which still resolves to 10.33.63.231). This time I managed to get both the JON server and the storage node connected and UP, but my storage node had cluster status DOWN. There was another node there with cluster status NORMAL, but that storage node had DOWN availability (it was the previously installed storage node, so my reinstall was not clean, at least for storage).

After I reinstalled everything (deleted and recreated the database, removed the jon-server-3.2.0.ER3, rhq-agent, and rhq-data folders) and installed again using the rhqctl script, everything worked fine. The storage node was up with the cluster status "sufficient".
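The bind failure above occurs when the storage node's listen_address in cassandra.yaml does not match the address the server was configured with. A consistent configuration pair might look like the following sketch; the address value is illustrative, taken from the reporter's setup:

```properties
# rhq-server.properties (values illustrative; must be an interface this host can bind)
jboss.bind.address=10.33.63.231
rhq.autoinstall.public-endpoint-address=10.33.63.231

# The storage node's cassandra.yaml must bind the same reachable interface,
# i.e. its listen_address entry should also resolve to 10.33.63.231:
#   listen_address: 10.33.63.231
```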
No exceptions were logged, and I was able to add a metric chart to the dashboard with both X and Y axes defined.

Should we close this, then?

Not sure if we should close. Although the environment was in a bad state, the fact that the NPE can occur leads me to believe that there is a bug here. The system should be tolerant of errors and attempt to recover from them if possible. In the event recovery is not possible, a detailed error message should be provided to the user to suggest a fix.

Thomas Heute:
If we have no steps to reproduce, this won't help. At least it shouldn't hold the 3.2.0 release.

(In reply to Thomas Heute from comment #11)
> If we have no steps to reproduce this won't help.

Although steps to reproduce would be nice, an NPE is pretty straightforward: it should never happen. However, from talking to jsanda, it sounds like this may have been due to a storage node getting added to inventory that no longer existed.

> At least it shouldn't hold 3.2.0 release

Agreed. I suggest we re-triage this for 3.2.x.

This certainly works, I do it all the time. Setting to ON_QA for a sanity check.
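The fail-fast behavior discussed above (reject the request with a clear message instead of letting a null propagate into an NPE) can be sketched as follows. This is a minimal illustration, not the actual RHQ/JON code; the names MetricGraphGuard, ClusterStatus, and fetchMetricData are hypothetical:

```java
// Hypothetical sketch of defensive metric-data access: check storage cluster
// status up front and fail with a descriptive error instead of a later NPE.
public class MetricGraphGuard {

    enum ClusterStatus { UP, DOWN }

    static String fetchMetricData(ClusterStatus status) {
        if (status != ClusterStatus.UP) {
            // Fail fast with an actionable message rather than returning null
            // and letting the dashboard code hit a NullPointerException.
            throw new IllegalStateException(
                "Storage cluster is " + status
                + "; cannot read metric data. Check that the storage node's "
                + "listen_address matches the server's bind address.");
        }
        return "metric-data"; // placeholder for the real query result
    }

    public static void main(String[] args) {
        System.out.println(fetchMetricData(ClusterStatus.UP));
        try {
            fetchMetricData(ClusterStatus.DOWN);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design point is simply that a status check at the boundary turns a confusing NPE deep in the UI into an error the user can act on.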