Bug 1616169
Summary: | Elasticsearch logging missing rollover and max size params which caused out of disk error. | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> | ||||
Component: | Logging | Assignee: | ewolinet | ||||
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.11.0 | CC: | adeshpan, aos-bugs, ewolinet, jcantril, jupierce, rmeggins | ||||
Target Milestone: | --- | ||||||
Target Release: | 3.11.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Cause: Elasticsearch 5's log4j2.properties file did not contain a size-based rollover policy or a maximum file count.
Consequence: ES logs kept rolling over and old files were never removed, causing the pod to run out of local storage.
Fix: Add a rollover policy based on file size and define a maximum file count.
Result: Files are rolled over based on size and date, and are removed once the maximum file count has been reached.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2018-10-11 07:24:57 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Junqi Zhao
2018-08-15 07:30:57 UTC
Created attachment 1476062
IOException: No space left on device
There is an issue with the available node and the ES pod in question:

1 node(s) were not ready, 1 node(s) were out of disk space, 15 Insufficient memory, 16 node(s) didn't match node selector, 2 node(s) were unschedulable, 8 Insufficient cpu.

I managed to get access to the ES PV and wiped it clean. The pod is now running:

[root@free-int-master-3c664 ~]# oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
logging-es-data-master-f26an8nv-17-v24hw   2/2     Running   6          13d
logging-es-data-master-t7rrl3te-10-5z8cx   2/2     Running   6          12d
logging-es-data-master-w6l9n07t-10-wqnxj   2/2     Running   0          37m

Moving component and updating title to reflect missing rollover params in the deployment.

Move to verified.
1) The logs are rolling as expected when log4j2.properties is configured.

-rw-r--r--. 1 1000120000 1000120000 8.4M Aug 29 10:53 anlitest.log
-rw-r--r--. 1 1000120000 1000120000  11M Aug 29 10:53 logging-es-2018-08-29.log
-rw-r--r--. 1 1000120000 1000120000 2.9K Aug 29 10:48 logging-es_deprecation.log
-rw-r--r--. 1 1000120000 1000120000    0 Aug 29 10:47 logging-es_index_indexing_slowlog.log
-rw-r--r--. 1 1000120000 1000120000    0 Aug 29 10:47 logging-es_index_search_slowlog.log

[anli@upg_slave_qeos10 311rsyslog]$ oc rsh logging-es-data-master-4y5wzs8t-1-4pvvp ls -lh /elasticsearch/persistent/logging-es/logs
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-4y5wzs8t-1-4pvvp -n openshift-logging' to see all of the containers in this pod.
total 11M
-rw-r--r--. 1 1000120000 1000120000 336K Aug 29 10:53 anlitest.log

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

(In reply to Aditya Deshpande from comment #14)
> Hello,
>
> IHAC using OCP 3.9 and facing a similar issue.
> Do we have to backport of this fix?

This was introduced into 3.9 as part of: https://bugzilla.redhat.com/show_bug.cgi?id=1568361

>
> Also, the customer is asking that as per solution described here
> https://github.com/openshift/openshift-ansible/pull/9663/files#diff-60a291fe55d2965aefe7aa6e5018f658
> Do they need to append the following params on ES config-map, section
> logging.yml and re-deploy the POD?
> appender.rolling.policies.size.type=SizeBasedTriggeringPolicy
> appender.rolling.policies.size.size=100MB
> appender.rolling.strategy.type=DefaultRolloverStrategy
> appender.rolling.strategy.max=5
>
> Or is there any workaround possible as the customer does not want to upgrade
> the cluster?

One would have to either manually modify the logging-elasticsearch configmap to include the proper configuration block [1] and restart, or set environment variables and restart per [2].

[1] https://github.com/jcantrill/openshift-log4jextras
[2] https://github.com/openshift/origin-aggregated-logging/pull/1127/files#diff-05261c9e7776c31e9b2fed5a68db6a3aR43
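
For the manual configmap route, the edit can be sketched as a small script. This is only an illustration, not part of any OpenShift tooling: the function names `parse_keys` and `ensure_rollover_params` are hypothetical, and the sketch assumes the Log4j 2 properties text has already been extracted from the logging-elasticsearch configmap as a plain string. It appends the four rollover parameters quoted above when they are absent and leaves any existing values alone.

```python
# Hypothetical helper names; the four keys and values are the ones quoted in this bug.
ROLLOVER_PARAMS = {
    "appender.rolling.policies.size.type": "SizeBasedTriggeringPolicy",
    "appender.rolling.policies.size.size": "100MB",
    "appender.rolling.strategy.type": "DefaultRolloverStrategy",
    "appender.rolling.strategy.max": "5",
}


def parse_keys(properties_text):
    """Return the set of keys defined in a .properties document.

    Simplification (assumption): only 'key=value' lines are recognised;
    ':' separators and line continuations are not handled.
    """
    keys = set()
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not stripped or stripped.startswith(("#", "!")):
            continue
        keys.add(stripped.split("=", 1)[0].strip())
    return keys


def ensure_rollover_params(properties_text, params=ROLLOVER_PARAMS):
    """Append rollover keys that are missing; keep keys that already exist."""
    existing = parse_keys(properties_text)
    missing = [f"{key}={value}" for key, value in params.items() if key not in existing]
    if not missing:
        return properties_text
    body = properties_text.rstrip("\n")
    lines = ([body] if body else []) + missing
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "appender.rolling.type=RollingFile\nappender.rolling.strategy.max=10\n"
    print(ensure_rollover_params(sample), end="")
```

The function is idempotent, so running it twice on the same text adds nothing the second time. The ES pods still need to be restarted after the configmap is updated, as noted in the reply above. In Log4j 2 the `max` setting of `DefaultRolloverStrategy` applies to the `%i` counter in the appender's `filePattern`, so it may be worth checking that pattern as well; the sketch does not touch it.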