Since 3.4 we have directed Elasticsearch to log only to a file local to the ES pod. This behavior was implemented to avoid a feedback loop in which logs from the Elasticsearch pod itself caused additional logs to be generated. However, we recently observed a problem that results from this state: if an ES pod runs out of disk space for its PV, the Elasticsearch Java process can generate a large volume of logs for every operation that fails because of the lack of disk space. This filled the local disk holding the Docker volumes on the infrastructure node where the ES pod was running. We propose three steps to address this problem:
1. Log to a location on the persistent volume.
2. Use a log4j file handler that:
   - rotates the file by size instead of by date,
   - compresses the rotated local file,
   - caps the total number of log files kept.
3. Provide defaults of a 10 MB cap per log file, keeping at most 10 files.
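The rotation behavior in steps 2 and 3 could be expressed with a Log4j 2 RollingFileAppender roughly as follows. This is a sketch only: the appender name and the file paths under /elasticsearch/persistent/ are illustrative, and the exact configuration syntax depends on which Log4j version the ES image ships.

```xml
<!-- Hypothetical sketch of steps 2-3: size-based rotation, gzip
     compression of rotated files, and a cap on files kept.
     Paths and the appender name are illustrative, not actual values. -->
<RollingFile name="file"
             fileName="/elasticsearch/persistent/logs/elasticsearch.log"
             filePattern="/elasticsearch/persistent/logs/elasticsearch-%i.log.gz">
  <PatternLayout pattern="[%d{ISO8601}][%-5p][%c{1.}] %m%n"/>
  <Policies>
    <!-- Rotate by size (10 MB default) rather than by date -->
    <SizeBasedTriggeringPolicy size="10 MB"/>
  </Policies>
  <!-- Keep at most 10 rotated files; the .gz suffix in filePattern
       causes rotated files to be gzip-compressed -->
  <DefaultRolloverStrategy max="10"/>
</RollingFile>
```

Writing fileName to the persistent volume rather than the pod-local disk covers step 1 at the same time.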
Captured in card https://trello.com/c/KqWiDOHT/ as an RFE
Targeting 3.8, as this can be manually worked around by editing the ConfigMap and pointing the log location to the PV
*** This bug has been marked as a duplicate of bug 1568361 ***