Bug 1395986 - Need to doc that logging 3.4.0 no longer supports rolled over log entries with json-file log driver
Summary: Need to doc that logging 3.4.0 no longer supports rolled over log entries with...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
low
medium
Target Milestone: ---
: ---
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-17 07:28 UTC by Xia Zhao
Modified: 2017-03-08 18:43 UTC
8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
Environment:
Last Closed: 2016-12-13 21:11:44 UTC
Target Upstream Version:


Attachments
project.chunchen* not workaroundable (105.56 KB, image/png)
2016-11-21 09:51 UTC, Xia Zhao

Description Xia Zhao 2016-11-17 07:28:07 UTC
Description of problem:
For clean installed logging stacks (with the json-file log driver), Kibana can only present log entries that are younger than the EFK stack itself; it cannot show logs for older projects. The field count is 0 for projects that are younger than EFK when going to the Settings tab, selecting the index to view, then going back to the Discover tab and refreshing the page (to work around bug #1388031).

Version-Release number of selected component (if applicable):
the latest logging images on the brew registry

How reproducible:
Always

Steps to Reproduce:
1.Install OCP 3.4.0
2.Create some user projects that populate sufficient log data --> these are older than the EFK stack
3.Deploy logging 3.4.0, wait until the EFK stack is all up
4.Create some user projects that populate sufficient log data --> these are younger than the EFK stack
5.View logs from Kibana: go to the Settings tab and select the index you want to view, then go back to the Discover tab and refresh the page.


Actual results:
5. Kibana can only present log entries that are younger than the EFK stack itself; it cannot show logs for older projects. The field count is 0 for projects that are younger than EFK when going to the Settings tab, selecting the index to view, then going back to the Discover tab and refreshing the page.

Expected results:
5. Kibana should be able to present logs for all existing user-projects

Additional info:

Comment 1 Xia Zhao 2016-11-17 07:45:44 UTC
Images tested with:
openshift3/logging-deployer    84594e8a4a8d
openshift3/logging-auth-proxy    ec334b0c2669
openshift3/logging-fluentd    125ed17f96bc
openshift3/logging-elasticsearch    9b9452c0f8c2
openshift3/logging-kibana    7fc9916eea4d
openshift3/logging-curator    9af78fc06248

Comment 2 ewolinet 2016-11-17 14:59:50 UTC
We no longer support rolled over log entries. They caused a skewed view of the log ingestion rate and required special cases in the code.

If the older projects had not created container log entries since EFK was created, we would expect that we would not see entries for them. Given that you are testing with the json-file log driver, it is possible that the container logs for those projects are no longer on the file system. Can you confirm if this is the case?

Comment 3 Jeff Cantrill 2016-11-17 19:54:23 UTC
https://github.com/openshift/origin-aggregated-logging/tree/master/deployer#have-fluentd-use-the-systemd-journal-as-the-log-source addresses the fact that rolled-over logs are not supported. Will provide a docs PR to make this more explicit.

Comment 4 Jeff Cantrill 2016-11-17 20:24:07 UTC
Official documentation explicitly defines the log patterns picked up by fluentd: https://docs.openshift.org/latest/install_config/aggregate_logging.html#aggregated-fluentd   Added PR to address in origin repo: https://github.com/openshift/origin-aggregated-logging/pull/285

Comment 5 Xia Zhao 2016-11-18 10:30:18 UTC
(In reply to ewolinet from comment #2)
> We no longer support rolled over log entries. It caused a skewed view of log
> ingestion rate and caused special cases to code for.
> 
> If the older projects had not created container log entries since EFK was
> created, we would expect that we would not see entries for them. Given that
> you are testing with the json-file log driver, it is possible that the
> container logs for those projects are no longer on the file system. Can you
> confirm if this is the case?

Hi Eric,

I never deleted any projects/containers that are older than EFK, so I believe they exist on the node machine. I also tried deploying logging 3.2.0 and 3.3.1 on the same machine; both managed to collect and display log entries in the Kibana UI. Only the 3.4.0 stack does not work.

Thanks,
Xia

Comment 6 Troy Dawson 2016-11-18 17:43:42 UTC
This should be fixed in logging-deployer v3.4.0.28 or newer.

Comment 7 Troy Dawson 2016-11-18 18:39:25 UTC
My mistake.  That was just a documentation update.  Moving this back to Assigned.

Comment 8 ewolinet 2016-11-18 22:09:36 UTC
I'm not able to recreate this on my system.

1) Switch docker to use json-file log driver
2) Create a new project and generate logs there
3) Install logging (3.4)
4) Start Fluentd
5) tail ES logs, observe that log entries for .operations, project.logging and project.test show up

Also, doing ls /var/log/*.pos shows the es-containers.log.pos and node.log.pos files as expected.

I also see log entries for previous days showing up in the ES logs:

[INFO ][cluster.metadata         ] [Native] [.operations.2016.07.06] create_mapping [com.redhat.viaq.common]
[cluster.metadata         ] [Native] [project.test.052a158b-adb2-11e6-a420-0eb88ea33ae4.2016.11.18] update_mapping [com.redhat.viaq.common]


Xia,

Can you please confirm this again? Please note that if you are testing by reinstalling different versions, you should delete the two .pos files above between installs. Fluentd uses these files to record its last read position in the log files, and they persist from install to install.
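A minimal sketch of the cleanup described above, assuming the two position files live in /var/log as shown by `ls /var/log/*.pos`. The function name `clear_fluentd_pos` is hypothetical, and the optional directory argument exists only so the sketch can be tried against a scratch directory. It is meant to be run between installs, as suggested, not while Fluentd is running.

```shell
#!/bin/sh
# Hypothetical helper (not part of the logging deployer): delete the Fluentd
# position files so the next install does not resume from old read offsets.
# Usage: clear_fluentd_pos [dir]   (dir defaults to /var/log)
clear_fluentd_pos() {
  pos_dir=${1:-/var/log}
  for name in es-containers.log.pos node.log.pos; do
    if [ -f "$pos_dir/$name" ]; then
      echo "removing $pos_dir/$name"
      rm -f -- "$pos_dir/$name"
    fi
  done
}
```

Called with no argument on a node, it acts on /var/log; other files in the directory are left alone.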

If there are no log files in /var/log/containers/, Fluentd will not be able to read in logs for that pod, whether or not the projects/pods have been deleted in OCP. If Docker has removed the files, Fluentd will not be able to read them in.
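Before concluding that Fluentd skipped a project, one can check whether any files for it are still on disk. This sketch assumes the `<pod>_<namespace>_<container>-<id>.log` naming visible in the `ls /var/log/containers` output in comment 9; the function name `container_logs_for` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list container log files for one project (namespace).
# Assumes names such as ruby-ex-1-e1qzf_chunchen_ruby-ex-<container id>.log,
# i.e. <pod>_<namespace>_<container>-<id>.log
# Usage: container_logs_for <namespace> [dir]   (dir defaults to /var/log/containers)
container_logs_for() {
  ns=$1
  log_dir=${2:-/var/log/containers}
  for f in "$log_dir"/*_"$ns"_*.log; do
    # if nothing matches, the glob is left unexpanded; skip it
    [ -e "$f" ] || continue
    basename "$f"
  done
}
```

Empty output for a project means there is nothing left on disk for Fluentd to read for it.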

Comment 9 Xia Zhao 2016-11-21 09:50:29 UTC
Hi Eric, 

This issue reproduces for me with a clean installed OCP 3.4.0 + logging with the json-file log driver. Here are my steps:

1. Install OCP 3.4.0
2. Logged in as user chunchen, created project chunchen, and populated logs
3. Checked logs exist on /var/log/containers:

# ls /var/log/containers | grep chunchen
ruby-ex-1-build_chunchen_POD-d3dd99787734e1294313f8bb6a256f0f9cb2aca01013b655f8ec7e1a4b201237.log
ruby-ex-1-build_chunchen_sti-build-6136a33201607e39720ca5aeb0829566d105817ba44b1b2516a8c91ac6a30a14.log
ruby-ex-1-e1qzf_chunchen_POD-67ae0ec1d7e9c67fdcfc55f0ecbb32263913887fc72e960bb4f7600775db453d.log
ruby-ex-1-e1qzf_chunchen_ruby-ex-c6d19671e0154c652fcac5a3a5a07708fa6e589bf105533d3a34a5649dad7e41.log

4. Deploy logging 3.4.0 (clean install with use-journal=false specified)
5. Wait for a while, until EFK stacks are up
6. Query from ES for index *chunchen*, see log entries there:
# oc exec logging-es-c6sq7po3-1-y5rnr -- curl -s -k --cert  /etc/elasticsearch/secret/admin-cert --key  /etc/elasticsearch/secret/admin-key https://logging-es:9200/*chunchen*/_search | python -mjson.tool | more
{
    "_shards": {
        "failed": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "AViGHE6bS07AIfNc9o31",
                "_index": "project.chunchen.c7af9219-afc0-11e6-85f5-42010af00025.2016.11.21",
                "_score": 1.0,
...
}

7. Log in to the Kibana UI, go to index project.chunchen.*, and work around bug #1388031: no log entries are shown. Attached the screenshot.
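To compare what Elasticsearch holds with what Kibana shows, the `_search` call in step 6 can be pointed at the Elasticsearch `_count` API, which returns only the number of matching documents. This sketch reuses the certificate paths and `https://logging-es:9200` URL from step 6; the function name `es_count` is hypothetical, and the Elasticsearch pod name must be taken from your own cluster.

```shell
#!/bin/sh
# Hypothetical helper: count documents in the indices matching a pattern,
# reusing the admin cert/key paths and ES URL from the _search call above.
# Usage: es_count <es-pod> <index-pattern>
es_count() {
  es_pod=$1
  pattern=$2
  oc exec "$es_pod" -- curl -s -k \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    "https://logging-es:9200/${pattern}/_count"
}
```

For example, `es_count logging-es-c6sq7po3-1-y5rnr '*chunchen*'` should return a small JSON document with a `count` field; quoting the pattern keeps the local shell from expanding it.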

#############################################################
Also checked on node that the .pos files exist:

# ls /var/log/*.pos
/var/log/es-containers.log.pos  /var/log/node.log.pos

I didn't remove them since my scenario is a clean install, but thanks for the info, it's important.

Comment 10 Xia Zhao 2016-11-21 09:51:24 UTC
Created attachment 1222350
project.chunchen*  not workaroundable

Comment 13 Xia Zhao 2016-11-22 03:54:25 UTC
The issue reproduced, and dev confirmed that this is an intended new change in Fluentd for logging 3.4.0: older log entries (written before EFK was deployed) in /var/log/containers are no longer collected by Fluentd. This saves time for customers with a large amount of log entries in /var/log/containers, who would otherwise wait for all of those older logs to be collected.

Can we update the doc PR to explicitly mention this new change in 3.4.0, compared with the behavior of logging 3.2 and 3.3? Thanks:

https://github.com/openshift/origin-aggregated-logging/pull/285/files#diff-baa04b5b463b472c2c2ccbf89a122c15R534 

https://docs.openshift.org/latest/install_config/aggregate_logging.html#aggregated-fluentd
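A simplified model of the behavior confirmed in this comment, assuming Fluentd's tail input semantics in which `read_from_head` is off by default: a file seen for the first time with no saved position is read from its end, and afterwards only bytes past the saved offset are emitted. The function `tail_from_pos` and its one-number position file are hypothetical; Fluentd's real position file also records the path and inode.

```shell
#!/bin/sh
# Simplified, hypothetical model of tailing a log file with a position file.
# Not Fluentd code: the real position file also stores the path and inode.
# Usage: tail_from_pos <log-file> <pos-file> [read_from_head: true|false]
tail_from_pos() {
  log_file=$1
  pos_file=$2
  read_from_head=${3:-false}
  # current size of the log in bytes (tr strips padding some wc versions add)
  size=$(wc -c < "$log_file" | tr -d ' ')
  if [ -f "$pos_file" ]; then
    # resume from the offset saved on the previous run
    pos=$(cat "$pos_file")
  elif [ "$read_from_head" = "true" ]; then
    # first run, reading from the head: start at byte 0
    pos=0
  else
    # first run, default: skip everything already in the file
    pos=$size
  fi
  # print the bytes after the saved offset (tail -c +N counts from 1)
  tail -c +"$((pos + 1))" "$log_file"
  # remember where we stopped
  echo "$size" > "$pos_file"
}
```

Run against a log that already holds entries, the first call prints nothing and later calls print only newly appended lines, which mirrors why entries written before the EFK stack was deployed are not picked up with the json-file driver.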

Comment 14 Xia Zhao 2016-11-22 04:07:25 UTC
(In reply to Xia Zhao from comment #13)

And I realized that the above info only applies to json-file log driver. 

For the journald log driver, logging 3.4.0 retains the ability to collect older log entries from before the EFK stack by default, unless journal-read-from-head is set to "false".

Thanks Rich for pointing this out.

Comment 15 Rich Megginson 2016-11-22 15:21:00 UTC
(In reply to Xia Zhao from comment #14)
> And I realized that the above info only applies to json-file log driver. 
> 
> For journald log drivers, logging 3.4.0 persists the ability to collect
> older log entries prior to EFK stacks by default, until
> journal-read-from-head was specified to be "false".

That is - when using the journald log driver, you can easily change the behavior to read from head or not by setting the JOURNAL_READ_FROM_HEAD env var to "true" or "false".  The default when using journald is "false", which means by default you will _not_ read old logs.

> 
> Thanks Rich for pointing this out.
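A sketch of changing the setting described above, assuming Fluentd runs as a daemonset named `logging-fluentd` in the current project and that the `oc` client provides `oc set env`; both are assumptions, not taken from this thread. The function name is hypothetical, and it rejects values other than "true" or "false".

```shell
#!/bin/sh
# Hypothetical helper: set JOURNAL_READ_FROM_HEAD on the Fluentd daemonset.
# Assumes the daemonset is named logging-fluentd (override with $2).
# Usage: set_journal_read_from_head true|false [daemonset-name]
set_journal_read_from_head() {
  value=$1
  ds=${2:-logging-fluentd}
  case "$value" in
    true|false) ;;
    *) echo "expected true or false, got '$value'" >&2; return 1 ;;
  esac
  oc set env "daemonset/$ds" "JOURNAL_READ_FROM_HEAD=$value"
}
```

Per the comment above, this only matters when Fluentd reads from the journal. Existing Fluentd pods may need to be deleted so they are recreated with the new value.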

Comment 16 Xia Zhao 2016-11-24 06:01:29 UTC
(In reply to Rich Megginson from comment #15)
> That is - when using the journald log driver, you can easily change the
> behavior to read from head or not by setting the JOURNAL_READ_FROM_HEAD env
> var to "true" or "false".  The default when using journald is "false", which
> means by default you will _not_ read old logs.

Hmm... in fact I see that old logs were read and presented in Kibana when JOURNAL_READ_FROM_HEAD defaults to false on my machines. I will double-check this.


Comment 17 Jeff Cantrill 2016-11-28 20:54:09 UTC
https://github.com/openshift/openshift-docs/pull/3291 to explicitly identify not reading old logs

Comment 18 Jeff Cantrill 2016-11-28 21:16:29 UTC
Also updated in https://github.com/openshift/origin-aggregated-logging/pull/293

Comment 19 Xia Zhao 2016-11-29 05:14:08 UTC
The updated document looks good to me. Will set to verified after PR is merged. Thanks.

Comment 20 Xia Zhao 2016-12-02 03:24:46 UTC
PR #293 is merged, but #3291 is still open.

Comment 21 Scott Dodson 2016-12-13 21:12:36 UTC
Docs changes already published https://docs.openshift.com/container-platform/3.3/welcome/revhistory_full.html#mon-dec-05-2016

