Description of problem:
Hawkular Services fails to start due to low disk space, but does not give a descriptive error message in its logs. Cassandra had already stopped with a fatal error because of low disk space, so Hawkular Services could not start; its logs only show a generic Cassandra-related error.

Version-Release number of selected component (if applicable):
CFME CR1

How reproducible:

Steps to Reproduce:
1. Start the Cassandra container and the Hawkular Services container using the quick start guide:
https://docs.engineering.redhat.com/display/JP/CloudForms+Middleware+-+Quickstart+Guide#CloudFormsMiddleware-QuickstartGuide-HawkularServices
2. Keep the Cassandra container running and stop the Hawkular Services container.
3. Check available disk space with "df -h". Use a combination of "fallocate -l 5G 5GB.img" and "fallocate -l 500M 5M.img" to create large files so that very little space is left.
4. Start a new Hawkular Services container (docker run <imageid of hawkular-services>) configured to store inventory data in Postgres (HAWKULAR_INVENTORY_JDBC_URL, HAWKULAR_INVENTORY_JDBC_USERNAME, HAWKULAR_INVENTORY_JDBC_PASSWORD).

Actual results:
Hawkular Services logs the following error:

********************************
FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred trying to connect to the Cassandra cluster: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency SERIAL (1 replica were required but only 0 acknowledged the write)
	at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:111)
	at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:95)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.initJobsService(MetricsServiceLifecycle.java:499)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:337)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency SERIAL (1 replica were required but only 0 acknowledged the write)
********************************

Expected results:
When Hawkular Services fails to start because of insufficient disk space, it should fail gracefully with a descriptive error message in the logs.

Additional info:
The Cassandra logs also contain no low-disk-space ERROR entries.
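Step 3 above can be sketched as a small helper. This is a minimal sketch based on the `fallocate`/`df` commands quoted in the steps; the `fill_disk` function name and the sizes passed to it are illustrative, and the sizes should be adjusted to the free space `df -h` reports:

```shell
# Sketch of step 3: exhaust free space so Cassandra hits the low-disk condition.
# fill_disk SIZE FILE allocates SIZE bytes at FILE and shows the remaining space.
# In the reproduction this was run with 5G and 500M filler files.
fill_disk() {
  fallocate -l "$1" "$2" && df -h "$(dirname "$2")"
}
```

For example, `fill_disk 5G /var/lib/cassandra/5GB.img` followed by smaller files until `df -h` shows almost nothing left.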
Created attachment 1232207 [details] HS logs
Created attachment 1232208 [details] Casandra logs
Attached log images, HS_logs.png and Casandra_logs.png.
Can we get the actual log files please?
If the only client-side error we get is a WriteTimeoutException, I am not sure there is a whole lot we can do to provide a more descriptive error message. Without additional information, there is no way to know that the write timeout is caused by lack of disk space.
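One knob that could at least make the Cassandra side fail loudly (an assumption on my part, not something the reporter configured) is Cassandra's `disk_failure_policy` setting in cassandra.yaml. With it set to `die`, a full disk produces an explicit fatal log entry and a JVM shutdown, so clients see a clear node-down condition rather than only a bare WriteTimeoutException. The config file path below is an assumed default and varies by install:

```shell
# Hedged sketch: switch Cassandra's disk failure handling to "die".
# CASSANDRA_YAML is an assumed default path; override it for your install.
CASSANDRA_YAML=${CASSANDRA_YAML:-/etc/cassandra/cassandra.yaml}
set_disk_failure_policy() {
  # Rewrite the existing disk_failure_policy line, then echo it back
  # so the caller can confirm the change took effect.
  sed -i 's/^disk_failure_policy:.*/disk_failure_policy: die/' "$1"
  grep '^disk_failure_policy' "$1"
}
```

This would not make the Hawkular-side message more descriptive, but it would leave an unambiguous trace in the Cassandra logs, which the "Additional info" above says is currently missing.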
Created attachment 1233777 [details] Casandra Latest Log
Created attachment 1233778 [details] Hawkular latest log
Added logs after replicating issue.
Why, in a production environment, would a user try to deploy a database with such limited disk space? Shouldn't this be a matter of documentation? The installation fails because Cassandra crashes due to lack of disk space, and I am not sure how gracefully we can handle that. Now, if Hawkular has been up and running for some time and we then run out of disk space, that is an entirely different matter, and one that should be handled.
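If this ends up being handled as documentation, a preflight free-space check before starting the container is one concrete thing the docs could recommend. A minimal sketch, with a hypothetical `require_free_space` helper and an illustrative threshold:

```shell
# Hedged sketch: refuse to start when available space is below a threshold.
require_free_space() {
  # $1 = path to check, $2 = minimum free space in MB (illustrative value)
  avail_mb=$(df -Pm "$1" | awk 'NR==2 {print $4}')
  if [ "$avail_mb" -lt "$2" ]; then
    echo "ERROR: only ${avail_mb}MB free on $1; need at least $2MB" >&2
    return 1
  fi
}
```

Usage would be something like `require_free_space /var/lib/cassandra 1024 && docker run <imageid of hawkular-services>`, turning the silent write timeout into an explicit, descriptive failure before startup.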