Description of problem:
free-int cluster, one ES pod is in CrashLoopBackOff status; checked logs, the exception is "Caused by: java.io.IOException: No space left on device".

logging-es-data-master-f26an8nv-17-v24hw   2/2   Running            2     11d
logging-es-data-master-t7rrl3te-10-5z8cx   2/2   Running            2     11d
logging-es-data-master-w6l9n07t-10-lvvc4   1/2   CrashLoopBackOff   151   11d

[2018-08-15 03:24:41,596][INFO ][container.run ] Checking if Elasticsearch is ready on https://localhost:9200
2018-08-15 03:24:45,111 main ERROR Unable to write to stream /elasticsearch/persistent/logging-es/logs/logging-es.log for appender rolling: org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream /elasticsearch/persistent/logging-es/logs/logging-es.log
2018-08-15 03:24:45,113 main ERROR An exception occurred processing Appender rolling
org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream /elasticsearch/persistent/logging-es/logs/logging-es.log
	at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:264)
	at org.apache.logging.log4j.core.appender.FileManager.writeToDestination(FileManager.java:261)
	at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.writeToDestination(RollingFileManager.java:219)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.flushBuffer(OutputStreamManager.java:294)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:303)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:179)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:170)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:161)
	at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:308)
	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
	at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
	at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
	at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:217)
	at org.elasticsearch.common.logging.PrefixLogger.logMessage(PrefixLogger.java:102)
	at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2116)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2100)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1994)
	at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1966)
	at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1303)
	at org.elasticsearch.node.Node.<init>(Node.java:254)
	at org.elasticsearch.node.Node.<init>(Node.java:245)
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:233)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:233)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:342)
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:132)
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:123)
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
	at org.elasticsearch.cli.Command.main(Command.java:90)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:262)
	... 37 more

Version-Release number of selected component (if applicable):
logging images version: v3.11.0-0.10.0

How reproducible:
Always

Steps to Reproduce:
1. Check ES pod logs.
2.
3.

Actual results:
"java.io.IOException: No space left on device" for one ES pod

Expected results:

Additional info:
Created attachment 1476062 [details] IOException: No space left on device
There is an issue with the available node and the ES pod in question: 1 node(s) were not ready, 1 node(s) were out of disk space, 15 Insufficient memory, 16 node(s) didn't match node selector, 2 node(s) were unschedulable, 8 Insufficient cpu.
I managed to get access to the ES PV and wiped it clean. The pod is now running:

[root@free-int-master-3c664 ~]# oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
logging-es-data-master-f26an8nv-17-v24hw   2/2     Running   6          13d
logging-es-data-master-t7rrl3te-10-5z8cx   2/2     Running   6          12d
logging-es-data-master-w6l9n07t-10-wqnxj   2/2     Running   0          37m
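Before wiping a PV like this, it is worth confirming that the volume is actually full and seeing which files are consuming it. A minimal sketch, using the pod and container names from this report (the `openshift-logging` namespace is an assumption; older installs use `logging`):

```shell
#!/bin/sh
# Sketch: inspect disk usage on the ES persistent volume before cleaning it.
# Pod name is the failing pod from this report; adjust for your cluster.
POD=logging-es-data-master-w6l9n07t-10-lvvc4
NS=openshift-logging   # assumption: "logging" on pre-3.10 installs

# Guarded so the sketch is a no-op without cluster access.
if command -v oc >/dev/null 2>&1; then
  # How full is the persistent volume?
  oc -n "$NS" rsh -c elasticsearch "$POD" df -h /elasticsearch/persistent
  # Largest files in the ES log directory (the path from the exception above).
  oc -n "$NS" rsh -c elasticsearch "$POD" \
    ls -lhS /elasticsearch/persistent/logging-es/logs
fi
```

If `ls -lhS` shows a single unrotated `logging-es.log` dominating the volume, that points at the missing rollover configuration rather than index data growth.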
https://github.com/openshift/openshift-ansible/pull/9663
Moving component and updating title to reflect missing rollover params in the deployment.
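For reference, the rollover parameters in question (quoted verbatim later in this thread from the linked PR) look like this in the ES `log4j2.properties`. This fragment only adds the size trigger and retention strategy; it assumes the `rolling` file appender itself is already defined, as it is in the shipped config:

```properties
# Roll the server log when it reaches 100MB, keep at most 5 rolled files.
appender.rolling.policies.size.type=SizeBasedTriggeringPolicy
appender.rolling.policies.size.size=100MB
appender.rolling.strategy.type=DefaultRolloverStrategy
appender.rolling.strategy.max=5
```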
Moving to VERIFIED.

1) The logs are rolling as expected when log4j2.properties is configured.

-rw-r--r--. 1 1000120000 1000120000 8.4M Aug 29 10:53 anlitest.log
-rw-r--r--. 1 1000120000 1000120000  11M Aug 29 10:53 logging-es-2018-08-29.log
-rw-r--r--. 1 1000120000 1000120000 2.9K Aug 29 10:48 logging-es_deprecation.log
-rw-r--r--. 1 1000120000 1000120000    0 Aug 29 10:47 logging-es_index_indexing_slowlog.log
-rw-r--r--. 1 1000120000 1000120000    0 Aug 29 10:47 logging-es_index_search_slowlog.log

[anli@upg_slave_qeos10 311rsyslog]$ oc rsh logging-es-data-master-4y5wzs8t-1-4pvvp ls -lh /elasticsearch/persistent/logging-es/logs
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-4y5wzs8t-1-4pvvp -n openshift-logging' to see all of the containers in this pod.
total 11M
-rw-r--r--. 1 1000120000 1000120000 336K Aug 29 10:53 anlitest.log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652
(In reply to Aditya Deshpande from comment #14)
> Hello,
>
> IHAC using OCP 3.9 and facing a similar issue.
> Do we have to backport this fix?

This was introduced into 3.9 as part of: https://bugzilla.redhat.com/show_bug.cgi?id=1568361

> Also, the customer is asking: as per the solution described here
> https://github.com/openshift/openshift-ansible/pull/9663/files#diff-60a291fe55d2965aefe7aa6e5018f658
> do they need to append the following params to the ES config map, section
> logging.yml, and re-deploy the pod?
> appender.rolling.policies.size.type=SizeBasedTriggeringPolicy
> appender.rolling.policies.size.size=100MB
> appender.rolling.strategy.type=DefaultRolloverStrategy
> appender.rolling.strategy.max=5
>
> Or is there any workaround possible, as the customer does not want to upgrade
> the cluster?

You would have to either manually modify the logging-elasticsearch configmap to include the proper configuration block [1] and restart, or set environment variables and restart per [2].

[1] https://github.com/jcantrill/openshift-log4jextras
[2] https://github.com/openshift/origin-aggregated-logging/pull/1127/files#diff-05261c9e7776c31e9b2fed5a68db6a3aR43
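The configmap route could look roughly like the sketch below. The `logging` namespace and the `component=es` label are the openshift_logging defaults for 3.9 (assumptions; verify the actual names in your cluster), and the edit step is interactive:

```shell
#!/bin/sh
# Sketch of the manual 3.9 workaround: add the rollover block to the ES
# configmap, then restart the ES deployments so the new config is loaded.
NS=logging   # assumption: default openshift_logging namespace on 3.9

if command -v oc >/dev/null 2>&1; then
  # Interactively add the appender.rolling.* lines to the ES configuration.
  oc -n "$NS" edit configmap/logging-elasticsearch

  # Restart every ES DeploymentConfig (label is the installer default).
  for dc in $(oc -n "$NS" get dc -l component=es -o name); do
    oc -n "$NS" rollout latest "$dc"
  done
fi
```

Restarting is required because the ES process only reads log4j2 configuration at startup.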