| Summary: | Need to doc that logging 3.4.0 no longer supports rolled-over log entries with the json-file log driver | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xia Zhao <xiazhao> |
| Component: | Documentation | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xia Zhao <xiazhao> |
| Severity: | medium | Docs Contact: | Vikram Goyal <vigoyal> |
| Priority: | low | | |
| Version: | 3.4.0 | CC: | aos-bugs, ewolinet, jcantril, jokerman, mmccomas, rmeggins, tdawson, xiazhao |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-12-13 21:11:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description (Xia Zhao, 2016-11-17 07:28:07 UTC)

Images tested with:

    openshift3/logging-deployer      84594e8a4a8d
    openshift3/logging-auth-proxy    ec334b0c2669
    openshift3/logging-fluentd       125ed17f96bc
    openshift3/logging-elasticsearch 9b9452c0f8c2
    openshift3/logging-kibana        7fc9916eea4d
    openshift3/logging-curator       9af78fc06248

---

ewolinet (comment 2):

We no longer support rolled-over log entries. They caused a skewed view of the log ingestion rate and required special cases in the code.

If the older projects had not created container log entries since EFK was deployed, we would expect to see no entries for them. Given that you are testing with the json-file log driver, it is possible that the container logs for those projects are no longer on the file system. Can you confirm whether this is the case?

---

https://github.com/openshift/origin-aggregated-logging/tree/master/deployer#have-fluentd-use-the-systemd-journal-as-the-log-source addresses the fact that rolled-over logs are not supported. Will provide a docs PR to make this more explicit.

The official documentation explicitly defines the log patterns picked up by Fluentd: https://docs.openshift.org/latest/install_config/aggregate_logging.html#aggregated-fluentd

Added a PR to address this in the origin repo: https://github.com/openshift/origin-aggregated-logging/pull/285

---

(In reply to ewolinet from comment #2)
> We no longer support rolled over log entries. It caused a skewed view of log
> ingestion rate and caused special cases to code for.
>
> If the older projects had not created container log entries since EFK was
> created, we would expect that we would not see entries for them. Given that
> you are testing with the json-file log driver, it is possible that the
> container logs for those projects are no longer on the file system. Can you
> confirm if this is the case?

Hi Eric,

I never deleted any projects/containers that are older than EFK, so I believe they exist on the node machine.
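Eric's question above (whether the json-file container logs are still on the file system) can be checked directly on the node. The sketch below is illustrative rather than part of the product: the helper name `check_project_logs` is made up here. It relies only on the fact, visible later in this bug's `ls /var/log/containers` output, that kubelet names these log files `<pod>_<namespace>_<container>-<id>.log`, so a project's files can be matched with `_<project>_`.

```shell
# Hypothetical helper: count json-file container log files for one project.
# Log file names follow <pod>_<namespace>_<container>-<id>.log, so the
# namespace (project) can be matched as "_<project>_".
check_project_logs() {
    logdir="$1"    # /var/log/containers on a real node
    project="$2"   # e.g. "chunchen"
    count=$(ls "$logdir" 2>/dev/null | grep -c "_${project}_")
    if [ "$count" -eq 0 ]; then
        echo "no log files for project $project; Fluentd has nothing to read"
    else
        echo "$count log file(s) for project $project still present"
    fi
}
```

If Docker has rotated or removed the files, the count is zero and Fluentd cannot collect anything for that project, whether or not the pods still exist in OCP.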
I also tried deploying logging at the 3.2.0 and 3.3.1 levels on the same machine. They managed to collect and display log entries on the Kibana UI; only the 3.4.0 stack does not work.

Thanks,
Xia

---

This should be fixed in logging-deployer v3.4.0.28 or newer.

---

My mistake. That was just a documentation update. Moving this back to ASSIGNED.

---

I'm not able to recreate this on my system:

1) Switch Docker to use the json-file log driver.
2) Create a new project and generate logs there.
3) Install logging (3.4).
4) Start Fluentd.
5) Tail the ES logs and observe that log entries for .operations, project.logging, and project.test show up.

Also, `ls /var/log/*.pos` shows the es-containers.log.pos and node.log.pos files, as expected.

I also see log entries for previous days showing up in the ES logs:

    [INFO ][cluster.metadata ] [Native] [.operations.2016.07.06] create_mapping [com.redhat.viaq.common]
    [cluster.metadata ] [Native] [project.test.052a158b-adb2-11e6-a420-0eb88ea33ae4.2016.11.18] update_mapping [com.redhat.viaq.common]

Xia, can you please confirm this again? Please note that if you are testing by reinstalling different versions, you should delete the two .pos files above between installs. Fluentd uses these files to record its last read position in the log files, and they persist from install to install.

If there are no log files in /var/log/containers/, then Fluentd will not be able to read in logs for that pod, regardless of whether the projects/pods have been deleted in OCP. If Docker has removed the files, Fluentd cannot read them.

---

Hi Eric,

This issue reproduces for me with a cleanly installed OCP 3.4.0 plus logging, using the json-file log driver. Here are my steps:

1. Install OCP 3.4.0.
2. Log in as user chunchen, create project chunchen, and populate logs.
3. Check that the logs exist in /var/log/containers:

        # ls /var/log/containers | grep chunchen
        ruby-ex-1-build_chunchen_POD-d3dd99787734e1294313f8bb6a256f0f9cb2aca01013b655f8ec7e1a4b201237.log
        ruby-ex-1-build_chunchen_sti-build-6136a33201607e39720ca5aeb0829566d105817ba44b1b2516a8c91ac6a30a14.log
        ruby-ex-1-e1qzf_chunchen_POD-67ae0ec1d7e9c67fdcfc55f0ecbb32263913887fc72e960bb4f7600775db453d.log
        ruby-ex-1-e1qzf_chunchen_ruby-ex-c6d19671e0154c652fcac5a3a5a07708fa6e589bf105533d3a34a5649dad7e41.log

4. Deploy logging 3.4.0 (clean install, with use-journal=false specified).
5. Wait until the EFK stacks are up.
6. Query ES for index *chunchen* and see log entries there:

        # oc exec logging-es-c6sq7po3-1-y5rnr -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://logging-es:9200/*chunchen*/_search | python -mjson.tool | more
        {
            "_shards": {
                "failed": 0,
                "successful": 1,
                "total": 1
            },
            "hits": {
                "hits": [
                    {
                        "_id": "AViGHE6bS07AIfNc9o31",
                        "_index": "project.chunchen.c7af9219-afc0-11e6-85f5-42010af00025.2016.11.21",
                        "_score": 1.0,
        ...

7. Log in to the Kibana UI and go to index project.chunchen.*, working around bug #1388031: no log entries are shown. Attached the screenshot.

Also checked on the node that the .pos files exist:

    # ls /var/log/*.pos
    /var/log/es-containers.log.pos  /var/log/node.log.pos

I didn't remove them, since my scenario is a clean install, but thanks for the info; it's good to know.

Created attachment 1222350 [details]
project.chunchen* not workaroundable
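Eric's earlier advice about deleting the two .pos files between reinstalls can be sketched as follows. This is an illustrative snippet, not product code: the helper name `reset_fluentd_pos` is made up, and the directory is parameterized only so the snippet can be exercised off a node (on a real node it is /var/log). Fluentd's in_tail plugin persists the last read position of each tailed file in these .pos files, so if they survive an EFK reinstall, a fresh Fluentd resumes from the stale offsets instead of re-reading.

```shell
# Hypothetical helper: remove Fluentd tail-position files between
# logging reinstalls, per the advice above. Fluentd otherwise resumes
# from the offsets recorded in these files.
reset_fluentd_pos() {
    posdir="$1"   # /var/log on a real node; parameterized for illustration
    for f in "$posdir/es-containers.log.pos" "$posdir/node.log.pos"; do
        if [ -f "$f" ]; then
            rm -f "$f"
            echo "removed $f"
        fi
    done
}

# On a real node, between reinstalls:
#   reset_fluentd_pos /var/log
```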
The issue reproduced. And confirmed with dev that this is an intended new change for logging 3.4.0 in Fluentd: older log entries (populated before EFK was deployed) in /var/log/containers are no longer collected by Fluentd. This is to save the time spent collecting old logs for customers with a large number of log entries in /var/log/containers.

Can we update the doc PR to explicitly mention this new change for 3.4.0, compared with what we did in logging 3.2 and 3.3? Thanks:

https://github.com/openshift/origin-aggregated-logging/pull/285/files#diff-baa04b5b463b472c2c2ccbf89a122c15R534

https://docs.openshift.org/latest/install_config/aggregate_logging.html#aggregated-fluentd

---

(In reply to Xia Zhao from comment #13)
> The issue reproduced. And confirmed with dev that this is an intended new
> change for logging 3.4.0 in fluentd: the older log entries (populated before
> EFK) in /var/log/containers are no longer collected by fluentd.

And I realized that the above info only applies to the json-file log driver.

For the journald log driver, logging 3.4.0 by default retains the ability to collect older log entries from before the EFK stacks were deployed, unless journal-read-from-head is set to "false".

Thanks Rich for pointing this out.

---

(In reply to Xia Zhao from comment #14)
> For journald log drivers, logging 3.4.0 persists the ability to collect
> older log entries prior to EFK stacks by default, until
> journal-read-from-head was specified to be "false".

That is, when using the journald log driver, you can easily change the behavior to read from head or not by setting the JOURNAL_READ_FROM_HEAD env var to "true" or "false". The default when using journald is "false", which means that by default you will _not_ read old logs.

---

(In reply to Rich Megginson from comment #15)
> The default when using journald is "false", which
> means by default you will _not_ read old logs.

Hmm... the fact is that I see old logs actually read and presented in Kibana when JOURNAL_READ_FROM_HEAD defaults to "false" on my machines. I will double-check this.

---

https://github.com/openshift/openshift-docs/pull/3291 to explicitly identify not reading old logs.

Also updated in https://github.com/openshift/origin-aggregated-logging/pull/293.

---

The updated document looks good to me. Will set to VERIFIED after the PR is merged. Thanks.

---

PR #293 is merged, but #3291 is still open.

---

Docs changes have already been published: https://docs.openshift.com/container-platform/3.3/welcome/revhistory_full.html#mon-dec-05-2016
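To make Rich's point concrete: with the journald log driver, whether Fluentd reads the journal from the beginning is controlled by the JOURNAL_READ_FROM_HEAD environment variable on the Fluentd daemonset, and it defaults to "false" (old logs are skipped). The following is a sketch of flipping it, assuming the default daemonset name `logging-fluentd` and pod label `component=fluentd` from the standard deployment; verify both against your own install.

```shell
#!/bin/sh
# Sketch: opt in to reading old journal entries. Assumes the daemonset is
# named "logging-fluentd" and its pods carry the label component=fluentd.
command -v oc >/dev/null 2>&1 || { echo "oc not found; run on a cluster host"; exit 0; }

# Set the env var on the daemonset (default is "false", i.e. skip old logs).
oc set env daemonset/logging-fluentd JOURNAL_READ_FROM_HEAD=true

# Fluentd pods pick up the new value once they are restarted.
oc delete pod -l component=fluentd
```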