Bug 1002210 - Too many exceptions in server.log
Status: ON_QA
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.9
Hardware: All Linux
Priority: unspecified
Severity: high
Assigned To: John Sanda
QA Contact: Mike Foley
Depends On: 1019807 1019841
Blocks: 951619
Reported: 2013-08-28 11:47 EDT by Armine Hovsepyan
Modified: 2015-09-02 20:03 EDT
Doc Type: Bug Fix
Type: Bug
Attachments: None

Description Armine Hovsepyan 2013-08-28 11:47:09 EDT
Description of problem:
Too many exceptions in server.log

Version-Release number of selected component (if applicable):
e2a1811

How reproducible:
very frequently 

Steps to Reproduce:
1. Install and start the RHQ server, storage node, and agent on ip1.
2. Stop and start the storage node on ip1.
3. Install and start a storage node and agent on ip2.
4. Stop and start (or restart) the storage node on ip1.
5. Undeploy the storage node on ip2.
6. Deploy the storage node on ip2 again.
7. Run the prepare for bootstrap operation.
8. Cancel the prepare for bootstrap operation.
9. Run the prepare for bootstrap operation again.
10. Run the add node maintenance operation.

Additional info:

*** "EJB Invocation failed on component MeasurementDataManagerBean", with a stack trace, when no host is available to connect to. I would expect this exception to be handled when the server is in maintenance mode.

*** "Failed to get live availability.: java.lang.IllegalStateException", with a stack trace, when one of the agents is not available to connect to. I would expect this exception to be handled.

*** "Sending exception to client: [1377693404302] : org.rhq.enterprise.server.resource.ResourceNotFoundException: A Resource with id 10591 does not exist in inventory", with a stack trace. I could not find which earlier action triggered it. I would expect this exception to be handled.

*** "EJB Invocation failed on component OperationManagerBean for method public abstract void", with a stack trace, when a bootstrap operation is cancelled. I would expect this exception to be handled.


server.log uploaded here for detailed investigation: http://d.pr/f/iqZj
Comment 1 John Sanda 2013-08-28 13:23:16 EDT
Some of the exceptions in the server log are due to bug 1002238. Other exceptions, such as com.datastax.driver.core.exceptions.UnavailableException, can occur when we try to read or write metrics while a node is being added to or removed from the cluster and the cluster is being rebalanced. com.datastax.driver.core.exceptions.NoHostAvailableException is thrown when we try to read or write metrics while the storage cluster is down. Both are RuntimeExceptions, and they are getting wrapped in an EJBException, which results in a much larger stack trace than necessary.

The following will help clean things up a bit. I will add a new StorageException class that wraps those C* (Cassandra) exceptions and make it an application exception. Then we will get a stack trace that does not include all of the internal container calls. This will help a lot with debugging.
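
As a rough illustration of the wrapping idea, here is a minimal sketch and not the actual RHQ code. DriverUnavailableException is a hypothetical stand-in for driver exceptions such as NoHostAvailableException, and MetricsReader and readRaw are made-up names. In a real EJB module the wrapper would presumably also carry the @javax.ejb.ApplicationException annotation, which is left out here so the sketch compiles without the Java EE API.

```java
// Hypothetical stand-in for a Cassandra driver exception such as
// com.datastax.driver.core.exceptions.NoHostAvailableException.
class DriverUnavailableException extends RuntimeException {
    DriverUnavailableException(String message) {
        super(message);
    }
}

// Wrapper for storage-layer failures. Assumption: in the container this class
// would be annotated with @javax.ejb.ApplicationException so that it is not
// wrapped in an EJBException when it crosses a bean boundary.
class StorageException extends RuntimeException {
    StorageException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical caller that translates driver failures into StorageException.
class MetricsReader {
    static double readRaw(int scheduleId, boolean clusterUp) {
        try {
            if (!clusterUp) {
                // Simulates the driver failing because no storage node is reachable.
                throw new DriverUnavailableException("no host available");
            }
            return 42.0; // placeholder metric value
        } catch (DriverUnavailableException e) {
            // Keep the original exception as the cause for debugging.
            throw new StorageException("Failed to read metrics for schedule " + scheduleId, e);
        }
    }
}
```

Callers can then catch StorageException in one place and still reach the driver exception through getCause().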
Comment 2 John Sanda 2013-08-29 16:00:56 EDT
I have made some changes to reduce the noise in server.log. From my commit message:

There were some methods in MeasurementDataManagerBean with default transaction
support, but they should be NOT_SUPPORTED since they read/write to and from
Cassandra. This will help reduce stacktraces because when exceptions bubble up
from those methods they will no longer get wrapped in EJBExceptions.

When an error occurs while inserting raw data, we are no longer logging the
full exception. There is a better than likely chance that if an exception
occurs for one write, it will occur for several. Logging each of the exceptions
resulted in a lot of noise in the logs. Now only the error message is logged.
The full exception will be logged with DEBUG logging.
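
The logging change described above might look roughly like the following sketch. It is an illustration only, not the code from the commit: it uses java.util.logging so that it is self-contained, treats Level.FINE as the DEBUG equivalent, and the class and method names (RawDataInsertLogger, logInsertFailure) are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class RawDataInsertLogger {
    private static final Logger LOG = Logger.getLogger(RawDataInsertLogger.class.getName());

    // Logs a failed raw data insert and returns the one-line summary that was logged.
    static String logInsertFailure(Throwable t) {
        String summary = "Failed to insert raw data: " + t.getClass().getName() + ": " + t.getMessage();
        if (LOG.isLoggable(Level.FINE)) {
            // Debug-level logging is enabled: include the full stack trace.
            LOG.log(Level.WARNING, summary, t);
        } else {
            // Default: log only the message to keep the log readable.
            LOG.warning(summary);
        }
        return summary;
    }
}
```

With the default configuration a burst of failed writes then produces one line per failure instead of one stack trace per failure.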

master commit hash: 98c76cebf

These changes should be in build 2596 of the rhq-master job.
Comment 3 Armine Hovsepyan 2013-08-30 16:05:01 EDT
Update: there are new time-out exceptions in server.log (http://pastebin.test.redhat.com/161393). I will update the bug as soon as they are reproduced.
Comment 4 John Sanda 2013-08-31 12:52:21 EDT
The description lists a few different exceptions. As I mentioned in comment 1, one of the exceptions is related to bug 1002238. The other exceptions are addressed by commit 98c76cebf, cited in comment 2. The error cited in comment 3 is unrelated, and I would rather track it, if necessary, in a separate BZ. I do not want this BZ to become a catch-all bucket for errors that appear in the server log.
Comment 5 John Sanda 2013-08-31 13:01:19 EDT
I have opened bug 1003191 to track the issue cited in comment 3.
