1002210 – Too many exceptions in server.log

Summary: Too many exceptions in server.log

Keywords:
Status:	ON_QA
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Core Server
Sub Component:
Version:	4.9
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Nobody
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1019807 1019841
Blocks:	951619

Reported:	2013-08-28 15:47 UTC by Armine Hovsepyan
Modified:	2022-03-31 04:28 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:

Description Armine Hovsepyan 2013-08-28 15:47:09 UTC

Description of problem:
Too many exceptions in server.log

Version-Release number of selected component (if applicable):
e2a1811

How reproducible:
very frequently 

Steps to Reproduce:
1. Install and start the RHQ server, storage, and agent on ip1.
2. Stop and then start storage on ip1.
3. Install and start storage and agent on ip2.
4. Stop and start (or restart) storage on ip1.
5. Undeploy storage on ip2.
6. Deploy storage on ip2 again.
7. Run the "prepare for bootstrap" operation.
8. Cancel the "prepare for bootstrap" operation.
9. Run the "prepare for bootstrap" operation again.
10. Run the "add node maintenance" operation.

Additional info:

***   "EJB Invocation failed on component MeasurementDataManagerBean", with a stack trace. Logged when no host is available to connect to. I would expect the exception to be handled here when the server is in maintenance mode.

***   "Failed to get live availability.: java.lang.IllegalStateException", with a stack trace. Logged when one of the agents is not available to connect to. I would expect the exception to be handled.

***   "Sending exception to client: [1377693404302] : org.rhq.enterprise.server.resource.ResourceNotFoundException: A Resource with id 10591 does not exist in inventory", with a stack trace. I could not find which earlier action triggered this. I would expect the exception to be handled.

***   "EJB Invocation failed on component OperationManagerBean for method public abstract void", with a stack trace. Logged when a bootstrap operation is cancelled. I would expect the exception to be handled here.


The server.log is uploaded here for detailed investigation: http://d.pr/f/iqZj

Comment 1 John Sanda 2013-08-28 17:23:16 UTC

Some of the exceptions in the server log are due to bug 1002238. Other exceptions, like com.datastax.driver.core.exceptions.UnavailableException, can occur while trying to read/write metrics when a node is being added to or removed from the cluster and the cluster is being rebalanced. com.datastax.driver.core.exceptions.NoHostAvailableException is thrown when we try to read/write metrics while the storage cluster is down. These are both RuntimeExceptions, and they are getting wrapped in an EJBException, which results in a much larger stack trace than necessary.

The following will help clean things up a bit. I will add a new StorageException class that wraps those C* exceptions and make it an application exception. Then we will get a stack trace that does not include all of the internal, container calls. This will help a lot with debugging.
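As a rough illustration of the wrapping idea described above, here is a minimal sketch in plain Java. Only the name StorageException comes from this comment; MetricsStore, DriverCall, and execute are hypothetical names invented for the example, and the sketch is not the code that was committed. In an EJB container the class would additionally be marked as an application exception (for example with the javax.ejb.ApplicationException annotation); that is left as a comment here so the sketch compiles without the EJB API on the classpath.

```java
// Sketch only: in a container this class would also be annotated with
// @javax.ejb.ApplicationException so that it is not wrapped in an EJBException.
class StorageException extends RuntimeException {
    StorageException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper: funnels every storage call through one place so that
// driver-level runtime exceptions surface as a single exception type.
class MetricsStore {
    // Stand-in for a call into the storage driver.
    interface DriverCall<T> {
        T run();
    }

    static <T> T execute(String description, DriverCall<T> call) {
        try {
            return call.run();
        } catch (StorageException e) {
            throw e; // already wrapped, do not wrap twice
        } catch (RuntimeException e) {
            // Keep the original exception as the cause for debugging.
            throw new StorageException(
                "Storage call failed (" + description + "): " + e.getMessage(), e);
        }
    }
}

class StorageExceptionDemo {
    public static void main(String[] args) {
        try {
            MetricsStore.execute("insert raw data", () -> {
                // Simulates the driver failing because no host is reachable.
                throw new IllegalStateException("no host available");
            });
        } catch (StorageException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Callers then only need to catch one exception type, and the original driver exception stays available through getCause().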

Comment 2 John Sanda 2013-08-29 20:00:56 UTC

I have made some changes to reduce the noise in server.log. From my commit message:

There were some methods in MeasurementDataManagerBean with default transaction
support, but they should be NOT_SUPPORTED since they read/write to and from
Cassandra. This will help reduce stacktraces because when exceptions bubble up
from those methods they will no longer get wrapped in EJBExceptions.

When an error occurs while inserting raw data, we are no longer logging the
full exception. There is a better than likely chance that if an exception
occurs for one write, it will occur for several. Logging each of the exceptions
resulted in a lot of noise in the logs. Now only the error message is logged.
The full exception will be logged with DEBUG logging.

master commit hash: 98c76cebf

These changes should be in build 2596 of the rhq-master job.
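The logging change described in the commit message could look roughly like the sketch below. It uses java.util.logging so that it runs without extra libraries; the server's actual logging framework, the class name RawDataInsertLogger, and the method names are assumptions made for illustration, and Level.FINE stands in for DEBUG.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not the committed code.
class RawDataInsertLogger {

    // Builds a one-line summary: exception class plus message, no stack trace.
    static String describe(Throwable t) {
        String message = t.getMessage();
        return t.getClass().getName() + (message == null ? "" : ": " + message);
    }

    static void logInsertFailure(Logger log, Throwable t) {
        // Always emitted at the default level: one short line per failure.
        log.warning("Failed to insert raw data: " + describe(t));
        // Emitted only when FINE (debug) logging is enabled: full stack trace.
        log.log(Level.FINE, "Full exception for failed raw data insert", t);
    }
}

class RawDataInsertLoggerDemo {
    public static void main(String[] args) {
        Logger log = Logger.getLogger("rawdata.demo");
        RawDataInsertLogger.logInsertFailure(log, new RuntimeException("write timed out"));
    }
}
```

With the default level, a burst of failed writes produces one line each instead of one stack trace each; raising the logger to FINE restores the full traces when they are needed.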

Comment 3 Armine Hovsepyan 2013-08-30 20:05:01 UTC

Update: there are new timeout exceptions in server.log (http://pastebin.test.redhat.com/161393). I will update the bug as soon as I can reproduce them.

Comment 4 John Sanda 2013-08-31 16:52:21 UTC

The description lists a few different exceptions. As I mentioned in comment 1, one of the exceptions is related to bug 1002238. The other exceptions are addressed by commit 98c76cebf, cited in comment 2. The error cited in comment 3 is unrelated, and if necessary I would rather call that out in a separate BZ. I do not want this BZ to become a catch-all bucket for errors that appear in the server log.

Comment 5 John Sanda 2013-08-31 17:01:19 UTC

I have opened bug 1003191 to track the issue cited in comment 3.
