At UNC, it appears that with OpenShift 3.4.1-7 installed we do not have a local-to-the-pod log file for Elasticsearch; only "console" logging is configured. We should actually disable console logging for Elasticsearch and log locally to a file, like:

# you can override this by setting a system property, for example -Des.logger.level=DEBUG
es.logger.level: INFO
rootLogger: ${es.logger.level}, file

logger:
  # log action execution errors for easier debugging
  action: DEBUG

  # deprecation logging, turn to DEBUG to see them
  deprecation: INFO, deprecation_log_file

  # reduce the logging for aws, too much is logged under the default INFO
  com.amazonaws: WARN
  # aws will try to do some sketchy JMX stuff, but it's not needed.
  com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
  com.amazonaws.metrics.AwsSdkMetrics: ERROR
  org.apache.http: INFO

  index.search.slowlog: TRACE, file
  index.indexing.slowlog: TRACE, file

additivity:
  index.search.slowlog: false
  index.indexing.slowlog: false
  deprecation: false

appender:
  # Use the following log4j-extras RollingFileAppender to enable gzip compression of log files.
  # For more information see https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/RollingFileAppender.html
  file:
    type: extrasRollingFile
    file: ${path.logs}/${cluster.name}.log
    rollingPolicy: timeBased
    rollingPolicy.FileNamePattern: ${path.logs}/${cluster.name}.log.%d{yyyy-MM-dd}.gz
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  deprecation_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_deprecation.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
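With the rollingPolicy above, the current log lives at ${path.logs}/${cluster.name}.log and rolls daily into gzipped files. Assuming the path and cluster name that show up later in this thread (logging-es, logs under /elasticsearch/logging-es/logs), that would look something like:

/elasticsearch/logging-es/logs/logging-es.log
/elasticsearch/logging-es/logs/logging-es.log.2017-05-01.gz   (hypothetical rolled file)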
We need to weigh the pros and cons of doing this. This will essentially disable using 'oc logs' for the ES pods. If we do this, we should:
* Add docs to say "get ES logs like: oc exec COMMAND HERE" (see the sketch below)
* Modify fluentd to ignore ES logs
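A minimal sketch of what that doc snippet could look like, assuming the log path seen later in this thread (the pod name is a placeholder):

# Hypothetical: read the local ES log file directly from a pod
oc exec $ES_POD -- cat /elasticsearch/logging-es/logs/logging-es.log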
Shouldn't it at least be an optional thing that can be toggled with a configmap option?
We can at least provide some documentation for when to do this (advanced debugging of Elasticsearch, problems with feedback loops, too many ES log messages, too many ES Java stacktraces, etc.), and how to do this (oc edit configmap logging-elasticsearch, change the logging.yml, oc rollout latest dc/logging-es-xxx); see the sketch below.
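For example, the manual toggle would look roughly like this (the dc name is cluster-specific and left as in the comment above):

oc edit configmap logging-elasticsearch    # edit the logging.yml key to enable/disable the file appender
oc rollout latest dc/logging-es-xxx        # redeploy so the pod picks up the new config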
So the problem is that if you have logs from Elasticsearch itself being fed back into Elasticsearch, you leave open a nasty feedback loop. If Elasticsearch starts behaving badly and we need to debug it, are we going to rely on logs flowing through fluentd, where we have the potential to miss logs that might get dropped during that bad-behavior period? Won't that hamper debugging efforts? It has already been shown that not having the local log file prevented us from seeing current bad behaviors.

Isn't it the case that when everything is working well, the logs for Elasticsearch are not needed? Aren't they needed when things are bad? So if we don't have a local log file for Elasticsearch logs, and only the console output, aren't we opening up the feedback loop and depriving ourselves of the information needed to debug its behavior? If we make it a configmap option, by the time we need the local log file won't it potentially be too late, since we'll have to restart the pod and miss capturing the behavior in that state?

Let's drop console logging altogether and log locally only for Elasticsearch pods. Arguably we should prevent stdout/stderr from flowing through to the journal from the Elasticsearch instances as well. The only other consideration, it would seem, is to ensure those pod logs are persisted and rotated appropriately.

Further, if Elasticsearch pods are not logging to stdout/stderr and only logging to a local file, then no modification would be needed for fluentd to ignore Elasticsearch logs, right? Then the documentation just notes that there will be no logs from Elasticsearch pods in "oc logs", and that the admin should pick the log rollover period for the pods.

And again, this is for the single Elasticsearch cluster. When an ops Elasticsearch cluster is used, you CAN have logs from the app Elasticsearch cluster flow to the ops Elasticsearch cluster, but then the ops Elasticsearch cluster would only be logging locally. Does that make sense?
Doing some research; we need to level-set on what we want from this issue. We already put the ES logging config in a configmap. This is 3.4 and is done via the deployer: https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/scripts/install.sh#L272-L275 And it is mounted into the pod: https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/scripts/install.sh#L272-L275 So if we need to modify just what's on and what's off, then that will be the defaults for 3.6 in the ansible code.
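To see what's actually deployed on a given cluster, inspecting the configmap works (names as used elsewhere in this thread):

oc get configmap logging-elasticsearch -o yaml    # shows the logging.yml that gets mounted into the ES pod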
I think we just need to modify the base logging file, see PR https://github.com/openshift/origin-aggregated-logging/pull/349
My only concern is whether openshift admins expect `oc logs` and the openshift console to display the Elasticsearch logs. If this is a requirement, then we will have to change fluentd to "throw away" Elasticsearch pod logs, or send them to some alternate sink.
The installer fix: https://github.com/openshift/openshift-ansible/pull/3777 Pending the docs fix on the 3.5 docs being merged so we can make a PR against the ansible docs. It would be nice to move this into 3.5, but it's not a blocker bug for 3.5 since there is a workaround: manually edit the configmap.
Commit pushed to master at https://github.com/openshift/openshift-ansible https://github.com/openshift/openshift-ansible/commit/a222702ed4316f99eac497ac755c5349804c66d2 bug 1432607. Allow configuration of ES log destination
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1439451
Tested with both values of the parameter `openshift_logging_es_log_appenders` (the default 'file' and the non-default 'console') and didn't find any difference; I get the same test results in both situations:

1) full ES logs were printed out on the CLI console when cluster-admin ran "$ oc logs ${es-master-pod-name}" on the openshift master
2) ES logs are visible at /elasticsearch/logging-es/logs/logging-es.log after a remote shell into the ES master container: "# oc rsh logging-es-data-master"
3) the logs seen in the "file" (as described in 2)) are identical to those seen from the "console" (as described in 1))

Could you please let us know what the difference is between these 2 settings? Also, I have to reopen the bz since "es is still logging to a local file in a pod", which is the opposite of this bug's title. Please correct me if I'm wrong, thanks!

Test env:
# openshift version
openshift v3.6.96
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

Images tested with: brew registry with tag v3.6 -- currently can't get the exact image ids due to https://bugzilla.redhat.com/show_bug.cgi?id=1455472
Created attachment 1285729 [details] file logs when openshift_logging_es_log_appenders is set to "console"
Created attachment 1285730 [details] console logs when openshift_logging_es_log_appenders is set to "console"
Created attachment 1285731 [details] console logs when openshift_logging_es_log_appenders is set to "file"
Created attachment 1285732 [details] file logs when openshift_logging_es_log_appenders is default to "file"
These values are not mutually exclusive. The ansible variable is a list and can be set in the inventory file like: openshift_logging_es_log_appenders=['console','file'] This means 'send logs to both the console and file appenders'. The default was set to file because @peter realized, while working with customers, that ES can get into a very bad state because it is possible for it to index its own logs. This setting allows 'oc logs' to still function as expected if console logging is specified. Your comment #12 seems to describe exactly how I would expect the ES logs to function.
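For example, in the inventory file (assuming the usual [OSEv3:vars] section of an openshift-ansible inventory):

[OSEv3:vars]
# send ES logs to both appenders; use ['file'] (the default) to drop console logging
openshift_logging_es_log_appenders=['console','file']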
@Jeff, thanks for the explanations. Now I realize I should set openshift_logging_es_log_appenders=['console'] in my inventory file, instead of openshift_logging_es_log_appenders=console as I did in comment #12. It seems that way I was actually always using the default value ['file']. Will give this another try today, to see if ES still logs to the local file /elasticsearch/logging-es/logs/logging-es.log in the ['console'] case.
Verified on 3.6.99 with ansible version openshift-ansible-playbooks-3.6.89.3-1.git.0.6e50d9a.el7.noarch. No matter whether the console or file log appender was set in the inventory, like:

openshift_logging_es_log_appenders=['console']
or
openshift_logging_es_log_appenders=['file']
or the default value,

the following ES-specific logs are only printed on the console when running the "oc logs" CLI; they no longer exist in the local file inside the ES pod at /elasticsearch/logging-es/logs/logging-es.log:

* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/elasticsearch/secret/admin-ca
  CApath: none
* NSS: client certificate from file
*   subject: CN=system.admin,OU=OpenShift,O=Logging
*   start date: Jun 07 17:01:42 2017 GMT
*   expire date: Jun 07 17:01:42 2019 GMT
*   common name: system.admin
*   issuer: CN=logging-signer-test
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*   subject: CN=logging-es,OU=OpenShift,O=Logging
*   start date: Jun 07 17:02:04 2017 GMT
*   expire date: Jun 07 17:02:04 2019 GMT
*   common name: logging-es
*   issuer: CN=logging-signer-test
> PUT /_template/com.redhat.viaq-openshift-operations.template.json HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 16125
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
} [data not shown]
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 21
<
{ [data not shown]
* Connection #0 to host localhost left intact
{"acknowledged":true}Create index template 'com.redhat.viaq-openshift-project.template.json'
* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/elasticsearch/secret/admin-ca
  CApath: none
* NSS: client certificate from file
*   subject: CN=system.admin,OU=OpenShift,O=Logging
*   start date: Jun 07 17:01:42 2017 GMT
*   expire date: Jun 07 17:01:42 2019 GMT
*   common name: system.admin
*   issuer: CN=logging-signer-test
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*   subject: CN=logging-es,OU=OpenShift,O=Logging
*   start date: Jun 07 17:02:04 2017 GMT
*   expire date: Jun 07 17:02:04 2019 GMT
*   common name: logging-es
*   issuer: CN=logging-signer-test
> PUT /_template/com.redhat.viaq-openshift-project.template.json HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 16121
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
} [data not shown]
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 21
<
{ [data not shown]
* Connection #0 to host localhost left intact
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188