Bug 1616169 - Elasticsearch logging missing rollover and max size params which caused out of disk error.
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assigned To: ewolinet
QA Contact: Anping Li
Docs Contact:
Depends On:
Blocks:
 
Reported: 2018-08-15 03:30 EDT by Junqi Zhao
Modified: 2018-10-11 03:25 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Elasticsearch 5's log4j2.properties file did not contain a size-based rollover policy or a maximum file count.
Consequence: ES logs kept rolling over and every rolled file was retained, so the pod eventually ran out of local storage.
Fix: Add a rollover policy based on file size and define a maximum file count.
Result: Log files are rolled over based on size and date, and removed once the maximum count is reached.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 03:24:57 EDT
Type: Bug
Regression: ---


Attachments
IOException: No space left on device (43.47 KB, text/plain)
2018-08-15 03:32 EDT, Junqi Zhao


External Trackers
Tracker                 ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:2652  None      None    None     2018-10-11 03:25 EDT

Description Junqi Zhao 2018-08-15 03:30:57 EDT
Description of problem:
On the free-int cluster, one ES pod is in CrashLoopBackOff status. Its logs show the exception "Caused by: java.io.IOException: No space left on device".

logging-es-data-master-f26an8nv-17-v24hw   2/2       Running            2          11d
logging-es-data-master-t7rrl3te-10-5z8cx   2/2       Running            2          11d
logging-es-data-master-w6l9n07t-10-lvvc4   1/2       CrashLoopBackOff   151        11d
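
For anyone triaging a listing like the one above, a minimal sketch of how the failing pod could be picked out programmatically. It assumes the five-column layout shown here (NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` and the `SAMPLE` constant are illustrative only and not part of any OpenShift tooling.

```python
def unhealthy_pods(listing):
    """Return (name, status, restarts) for pods that are not Running or not fully ready."""
    flagged = []
    for line in listing.strip().splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "NAME":
            continue  # skip the header row and anything that is not a pod row
        name, ready, status, restarts, _age = fields
        ready_now, ready_total = ready.split("/")
        if status != "Running" or ready_now != ready_total:
            flagged.append((name, status, int(restarts)))
    return flagged


# Pod rows copied from the report above.
SAMPLE = """
logging-es-data-master-f26an8nv-17-v24hw   2/2       Running            2          11d
logging-es-data-master-t7rrl3te-10-5z8cx   2/2       Running            2          11d
logging-es-data-master-w6l9n07t-10-lvvc4   1/2       CrashLoopBackOff   151        11d
"""

if __name__ == "__main__":
    for pod in unhealthy_pods(SAMPLE):
        print(pod)
```

On the sample rows this flags only the third pod, which is the one with 151 restarts.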

[2018-08-15 03:24:41,596][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
2018-08-15 03:24:45,111 main ERROR Unable to write to stream /elasticsearch/persistent/logging-es/logs/logging-es.log for appender rolling: org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream /elasticsearch/persistent/logging-es/logs/logging-es.log
2018-08-15 03:24:45,113 main ERROR An exception occurred processing Appender rolling org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream /elasticsearch/persistent/logging-es/logs/logging-es.log
	at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:264)
	at org.apache.logging.log4j.core.appender.FileManager.writeToDestination(FileManager.java:261)
	at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.writeToDestination(RollingFileManager.java:219)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.flushBuffer(OutputStreamManager.java:294)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.flush(OutputStreamManager.java:303)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:179)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:170)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:161)
	at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:308)
	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
	at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
	at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
	at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:217)
	at org.elasticsearch.common.logging.PrefixLogger.logMessage(PrefixLogger.java:102)
	at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2116)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2100)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1994)
	at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1966)
	at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1303)
	at org.elasticsearch.node.Node.<init>(Node.java:254)
	at org.elasticsearch.node.Node.<init>(Node.java:245)
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:233)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:233)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:342)
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:132)
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:123)
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
	at org.elasticsearch.cli.Command.main(Command.java:90)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:262)
	... 37 more
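
The trace bottoms out in a full volume, so a free-space check on the log path is a reasonable first diagnostic. The sketch below uses Python's standard `shutil.disk_usage`; the 5% threshold and the helper names `free_fraction` and `is_low_on_space` are assumptions chosen for illustration, not values taken from this report.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # treat a zero-sized filesystem as having no free space
    return usage.free / usage.total


def is_low_on_space(path, min_free_fraction=0.05):
    """True when free space on the volume is below the given fraction."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; on the affected pod the path of interest would be
    # the volume mounted under /elasticsearch/persistent.
    print(f"free: {free_fraction('.'):.1%}, low: {is_low_on_space('.')}")
```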
Version-Release number of selected component (if applicable):
logging images version: v3.11.0-0.10.0

How reproducible:
Always

Steps to Reproduce:
1. Check ES pod logs.

Actual results:
"java.io.IOException: No space left on device" for one es pod

Expected results:


Additional info:
Comment 1 Junqi Zhao 2018-08-15 03:32 EDT
Created attachment 1476062
IOException: No space left on device
Comment 2 Jeff Cantrill 2018-08-16 16:02:33 EDT
There is an issue with the available node and the ES pod in question:

1 node(s) were not ready, 1 node(s) were out of disk space, 15 Insufficient memory, 16 node(s) didn't match node selector, 2 node(s) were unschedulable, 8 Insufficient cpu.
Comment 3 Justin Pierce 2018-08-16 17:26:58 EDT
I managed to get access to the ES PV and wiped it clean. The pod is now running:

[root@free-int-master-3c664 ~]# oc get pods
NAME                                       READY     STATUS             RESTARTS   AGE
logging-es-data-master-f26an8nv-17-v24hw   2/2       Running            6          13d
logging-es-data-master-t7rrl3te-10-5z8cx   2/2       Running            6          12d
logging-es-data-master-w6l9n07t-10-wqnxj   2/2       Running            0          37m
Comment 9 Jeff Cantrill 2018-08-21 12:28:10 EDT
Moving component and updating title to reflect missing rollover params in the deployment.
Comment 11 Anping Li 2018-08-29 06:56:11 EDT
Moving to VERIFIED.

1) The logs roll over as expected when log4j2.properties is configured.


-rw-r--r--. 1 1000120000 1000120000 8.4M Aug 29 10:53 anlitest.log
-rw-r--r--. 1 1000120000 1000120000  11M Aug 29 10:53 logging-es-2018-08-29.log
-rw-r--r--. 1 1000120000 1000120000 2.9K Aug 29 10:48 logging-es_deprecation.log
-rw-r--r--. 1 1000120000 1000120000    0 Aug 29 10:47 logging-es_index_indexing_slowlog.log
-rw-r--r--. 1 1000120000 1000120000    0 Aug 29 10:47 logging-es_index_search_slowlog.log
[anli@upg_slave_qeos10 311rsyslog]$ oc rsh logging-es-data-master-4y5wzs8t-1-4pvvp ls -lh /elasticsearch/persistent/logging-es/logs
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-4y5wzs8t-1-4pvvp -n openshift-logging' to see all of the containers in this pod.
total 11M
-rw-r--r--. 1 1000120000 1000120000 336K Aug 29 10:53 anlitest.log
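
The behaviour being verified here, rolling a log over by size and keeping only a bounded number of files, can be illustrated outside Elasticsearch. The sketch below is an analogy using Python's standard `logging.handlers.RotatingFileHandler`, not the Log4j2 configuration shipped in the fix; the function name `write_with_rollover`, the file name `demo.log`, and the limits are all made up for the example.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def write_with_rollover(log_dir, max_bytes=1024, backup_count=3, n_messages=500):
    """Log n_messages through a size-capped handler and return the files left on disk."""
    path = os.path.join(log_dir, "demo.log")
    # Roll over before a write would push the file past max_bytes, and keep
    # at most backup_count rolled files (demo.log.1 .. demo.log.N).
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    logger = logging.getLogger("rollover-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        for i in range(n_messages):
            logger.info("message %d: %s", i, "x" * 40)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(os.listdir(log_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_with_rollover(tmp)
        total = sum(os.path.getsize(os.path.join(tmp, f)) for f in files)
        print(files, total)
```

With both limits set, disk usage stays at or below `(backup_count + 1) * max_bytes` no matter how many messages are written. Without a file-count limit, rolled files accumulate, which is the failure mode this bug describes.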
Comment 13 errata-xmlrpc 2018-10-11 03:24:57 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
