Bug 1405092 - Hawkular Server fails to start due to low disk space but not giving descriptive error message in logs.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Middleware Manager
Classification: JBoss
Component: Metrics
Version: 7.0.0 TP2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: John Sanda
QA Contact: Prachi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-15 14:58 UTC by Prachi
Modified: 2018-01-04 15:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-04 15:35:41 UTC
Embargoed:


Attachments
HS logs (373.86 KB, image/png)
2016-12-15 15:03 UTC, Prachi
Casandra logs (199.83 KB, image/png)
2016-12-15 15:04 UTC, Prachi
Casandra Latest Log (209.54 KB, text/plain)
2016-12-20 10:44 UTC, Prachi
Hawkular latest log (88.00 KB, text/plain)
2016-12-20 10:48 UTC, Prachi

Description Prachi 2016-12-15 14:58:44 UTC
Description of problem: Hawkular Server fails to start because of low disk space, but the logs do not give a descriptive error message.

Hawkular Services does not start, but it does not report a low disk space error in its logs, because Cassandra had already stopped with a fatal error due to low disk space. Hawkular reports only a Cassandra-related error.


Version-Release number of selected component (if applicable): CFME CR1


How reproducible:


Steps to Reproduce:
1. Start the Cassandra container and the Hawkular Services container using the Quickstart Guide:
https://docs.engineering.redhat.com/display/JP/CloudForms+Middleware+-+Quickstart+Guide#CloudFormsMiddleware-QuickstartGuide-HawkularServices

2. Keep the Cassandra container running and stop the Hawkular Services container.

3. Check disk space availability with "df -h".
(Use a combination of "fallocate -l 5G 5GB.img" and "fallocate -l 500M 5M.img" to create large files so that very little space is left.)

4. Start a new Hawkular Services container (docker run <imageid of hawkular-services>) configured to store Inventory data in Postgres (HAWKULAR_INVENTORY_JDBC_URL, HAWKULAR_INVENTORY_JDBC_USERNAME, HAWKULAR_INVENTORY_JDBC_PASSWORD).


Actual results: HS logs the following error:

********************************
  FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred trying to connect to the Cassandra cluster: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency SERIAL (1 replica were required but only 0 acknowledged the write)
  at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:111)
  at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:95)
  at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.initJobsService(MetricsServiceLifecycle.java:499)
  at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:337)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency SERIAL (1 replica were required but only 0 acknowledged the write)
********************************




Expected results: When the Hawkular Services upgrade fails because of low disk space, it should fail gracefully with a descriptive error message in the logs.


Additional info: The Cassandra logs also do not contain any ERROR related to low disk space.

Comment 2 Prachi 2016-12-15 15:03:58 UTC
Created attachment 1232207 [details]
HS logs

Comment 3 Prachi 2016-12-15 15:04:37 UTC
Created attachment 1232208 [details]
Casandra logs

Comment 4 Prachi 2016-12-15 15:05:29 UTC
Attached log images: HS_logs.png and Casandra_logs.png.

Comment 5 John Sanda 2016-12-16 15:24:39 UTC
Can we get the actual log files please?

Comment 6 John Sanda 2016-12-16 15:40:34 UTC
If the only client-side error we get is a WriteTimeoutException, I am not sure there is a whole lot we can do to provide a more descriptive error message. Without additional information, there is no way to know that the write timeout is caused by lack of disk space.
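To illustrate the point above, here is a minimal sketch of what the client side could still do: walk the exception's cause chain, and if a WriteTimeoutException is found, log a hint that points the operator at the Cassandra node. The class name StartupErrorHints, its methods, and the hint wording are hypothetical and not part of Hawkular. The exception is matched by the class name shown in the stack trace so the sketch compiles without the DataStax driver. The hint can only suggest disk space as one possible cause, since the driver does not report why the write timed out.

```java
import java.util.Optional;

final class StartupErrorHints {
    // Class name copied from the stack trace in this report; matching by name
    // keeps the sketch free of a compile-time dependency on the DataStax driver.
    static final String WRITE_TIMEOUT =
            "com.datastax.driver.core.exceptions.WriteTimeoutException";

    // Upper bound on how many causes are visited, in case the chain is cyclic.
    private static final int MAX_DEPTH = 32;

    /** Returns the first throwable in the cause chain with the given class name. */
    static Optional<Throwable> findCause(Throwable error, String className) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (current.getClass().getName().equals(className)) {
                return Optional.of(current);
            }
            current = current.getCause();
        }
        return Optional.empty();
    }

    /** Builds a log hint; the wording is illustrative, not Hawkular's. */
    static String hintFor(Throwable error) {
        if (findCause(error, WRITE_TIMEOUT).isPresent()) {
            return "Cassandra write timed out. The client cannot see why; "
                    + "check the Cassandra node's logs and free disk space.";
        }
        return "No specific hint for: " + error;
    }
}
```

A startup handler could call StartupErrorHints.hintFor(e) and log the result next to the original stack trace.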

Comment 7 Prachi 2016-12-20 10:44:00 UTC
Created attachment 1233777 [details]
Casandra Latest Log

Comment 8 Prachi 2016-12-20 10:48:16 UTC
Created attachment 1233778 [details]
Hawkular latest log

Comment 9 Prachi 2016-12-20 10:49:41 UTC
Added logs after reproducing the issue.

Comment 10 John Sanda 2016-12-20 14:56:37 UTC
Why in a production environment would a user try to deploy a database with such limited disk space? Shouldn't this be a matter of documentation? Installation fails because Cassandra crashes due to lack of disk space. I am not sure how gracefully we can handle that. Now, if Hawkular has been up and running for some time and we then run out of disk space, that is an entirely different matter, which should be handled.
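For the runtime case mentioned above, one possible approach is a free-space check against the volume that holds the data, run at startup or on a schedule. The sketch below uses the standard java.nio.file API. The class name DiskSpacePreflight, the threshold parameter, and the message wording are assumptions for illustration only. It would also have to run where the Cassandra data directory is mounted, which this report does not specify.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DiskSpacePreflight {
    /** Pure comparison, kept separate so it can be tested without a file system. */
    static boolean hasEnoughSpace(long usableBytes, long minFreeBytes) {
        return usableBytes >= minFreeBytes;
    }

    /** Bytes available to this JVM on the file store that contains dir. */
    static long usableBytes(Path dir) {
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a message suitable for logging; minFreeBytes is a hypothetical setting. */
    static String check(Path dir, long minFreeBytes) {
        long usable = usableBytes(dir);
        if (hasEnoughSpace(usable, minFreeBytes)) {
            return String.format("OK: %d bytes usable under %s", usable, dir);
        }
        return String.format(
                "LOW DISK SPACE: %d bytes usable under %s, at least %d required",
                usable, dir, minFreeBytes);
    }
}
```

A caller might log the result of DiskSpacePreflight.check(dataDir, threshold) before connecting to Cassandra, so that a later write timeout can be read alongside an explicit disk space message.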

