Bug 1466152 - Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" nor omitted collects the log entries
Summary: Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" n...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 3.7.0
Assignee: Rich Megginson
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-06-29 07:56 UTC by Xia Zhao
Modified: 2017-11-28 21:59 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The ansible parameter name is confusing to use and does not properly reflect how it is consumed by fluentd Consequence: Fix: Remove the parameter Result: Fluentd consistently collects logs based on the source it detects
Clone Of:
Environment:
Last Closed: 2017-11-28 21:59:33 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
fluentd log (1.63 KB, text/plain)
2017-06-29 07:56 UTC, Xia Zhao
no flags Details
es log (14.53 KB, text/plain)
2017-06-29 07:57 UTC, Xia Zhao
no flags Details
inventory file used for logging deployment (1.08 KB, text/plain)
2017-06-29 07:59 UTC, Xia Zhao
no flags Details
screenshot of logging UI (254.77 KB, image/png)
2017-06-29 07:59 UTC, Xia Zhao
no flags Details
kibana UI, time range is within 30 minutes (113.24 KB, image/png)
2017-06-29 10:15 UTC, Junqi Zhao
no flags Details
kibna UI, select time range as today (200.64 KB, image/png)
2017-06-29 10:16 UTC, Junqi Zhao
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 4696 0 None closed Bug 1466152 - Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" nor omitted collects the log e... 2020-11-17 12:41:07 UTC
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Xia Zhao 2017-06-29 07:56:46 UTC
Created attachment 1292803 [details]
fluentd log

Description of problem:
This issue only happen for json-file log driver:
Deploy logging without ops cluster enabled, wait enough time until es is in GREEN status, the log entries from user indices are not presented on logging UI, only those within indices .all and .operations presented on logging UI. Please refer to attached screenshot.

Version-Release number of selected component (if applicable):
openshift3/logging-kibana    ba0b9829ea3d
openshift3/logging-kibana    c77868ea2334
openshift3/logging-elasticsearch    b941ca06dd0d
openshift3/logging-fluentd    81853532d392
openshift3/logging-auth-proxy    fb9141c52db0
openshift3/logging-curator    757cf4948069

ansible version:
openshift-ansible-playbooks-3.6.126.1-1.git.0.41d2313.el7.noarch

# openshift version
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1.Install OCP 3.6 with json-file log driver enabled on master and nodes
2.Deploy logging without ops cluster enabled, no matter if mux service is deployed
3.Wait enough time until EFK stacks are all on, and see es is in GREEN status
4.Check if log entries from user indices are not presented on logging UI

Actual results:
The log entries from user indices are not presented on logging UI, only those within indices .all and .operations presented on logging UI. Please refer to attached screenshot.

Expected results:
The log entries from user indices are not presented on logging UI

Additional info:
es and fluentd logs are attached
test env provided

Comment 1 Xia Zhao 2017-06-29 07:57:37 UTC
Created attachment 1292804 [details]
es log

Comment 2 Xia Zhao 2017-06-29 07:59:03 UTC
Created attachment 1292805 [details]
inventory file used for logging deployment

Comment 3 Xia Zhao 2017-06-29 07:59:35 UTC
Created attachment 1292806 [details]
screenshot of logging UI

Comment 5 Xia Zhao 2017-06-29 10:09:07 UTC
Please ignore the image versions in comment #0, they should be:

openshift3/logging-elasticsearch    0d16be758fca
openshift3/logging-kibana    ba0b9829ea3d
openshift3/logging-fluentd    05db4f17fc26
openshift3/logging-auth-proxy    22560dbd7008
openshift3/logging-curator    757cf4948069

Comment 6 Junqi Zhao 2017-06-29 10:14:53 UTC
I found the same issue:

Description:
Set log-driver as json-file in /etc/docker/daemon.json, and not enabled ops cluster, only ".operations.*" and ".all" could be viewd in Kibana UI, no logs could be shown in Kibana within 30 minutes, if select the time range as "Today",  there are ".operations.*" and ".all" logs could be shown. See the attached pictures.


Steps to Reproduce:
1. Set log driver as json-file in /etc/docker/daemon.json
# cat /etc/docker/daemon.json 
{
  "log-driver": "json-file"
}

Comment out the log driver settings in /etc/sysconfig/docker
# cat /etc/sysconfig/docker
# /etc/sysconfig/docker

# Modify these options if you want to change the way the docker daemon runs
# OPTIONS=' --selinux-enabled  --log-driver=journald'

2. Restart docker: systemctl restart docker and check log dirver by 'docker info | grep "Logging Driver"'
3. Deploy logging 3.6.0 and check Kibana UI

Actual results:
2. # docker info | grep "Logging Driver"
Logging Driver: json-file

3. only ".operations.*" could be viewd in Kibana UI

Expected results:
3. log entries could be found in Kibana UI

Additional info:
Inventory file
[OSEv3:children]
masters

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Logging
openshift_logging_install_logging=true
openshift_logging_kibana_hostname=kibana.${SUB_DOMAIN}
openshift_logging_kibana_ops_hostname=kibana-ops.${SUB_DOMAIN}
public_master_url=https://${MASTER}:${PORT}
openshift_logging_image_prefix=${IMAGE_PREFIX}
openshift_logging_image_version=v3.6
openshift_logging_namespace=logging

Comment 7 Junqi Zhao 2017-06-29 10:15:58 UTC
Created attachment 1292834 [details]
kibana UI, time range is within 30 minutes

Comment 8 Junqi Zhao 2017-06-29 10:16:53 UTC
Created attachment 1292836 [details]
kibna UI, select time range as today

Comment 9 Jeff Cantrill 2017-06-29 13:30:46 UTC
* Is it possible for you to provid the image versions (e.g. 3.6.1-4) instead of the SHA?  That would help us further understand which version you are dealing with since I have no way to ref the sha.

* What logs are you expecting from which projects?  Fluentd will only read from the end of a file unless you specify otherwise.  Also, this change https://github.com/openshift/origin-aggregated-logging/pull/506 modifies the kibana mappings presented to a user.  You will only see mappings for project that have data?  Is your pod producing log messages?

Comment 10 Junqi Zhao 2017-06-30 04:19:54 UTC
Tested today, same phenomenon as this defect reported.

1. Attached specified image id
logging-elasticsearch            v3.6.126.4-1        7969b6dcb11e        4 hours ago         404.2 MB
logging-kibana                   v3.6.126.4-1        5349edde24e5        4 hours ago         342.4 MB
logging-auth-proxy               v3.6.126.4-1        373f69a74465        4 hours ago         214.8 MB
logging-fluentd                  v3.6.126.4-1        218c2e3246e4        4 hours ago         232.5 MB
logging-curator                  3.6.99-3            757cf4948069        2 days ago          221.5 MB

2. only .all and .operation indices were presented on logging UI, created user project, user project log index and logging project index could not be found on Kibana UI. See the attached picture "screenshot of logging UI"

# oc rsh ${ES_POD}
sh-4.2$ env | grep -i ver
OSE_ES_VER=2.4.4.9
RECOVER_EXPECTED_NODES=1
ES_VER=2.4.4
RECOVER_AFTER_NODES=1
ES_CLOUD_K8S_VER=2.4.4
RECOVER_AFTER_TIME=5m
JAVA_VER=1.8.0

Comment 11 Rich Megginson 2017-06-30 17:27:12 UTC
Once again, I regret ever adding these settings to be exposed to end users . . .

Note: DO NOT SET USE_JOURNAL or openshift_logging_fluentd_use_journal.  Leave them out of the inventory.  Do not set USE_JOURNAL in the fluentd daemonset env.

This setting only works if it is the same as the docker log driver setting.

If you set openshift_logging_fluentd_use_journal=true and/or USE_JOURNAL=true and docker --log-driver=json-file you will not get logs.

If you set openshift_logging_fluentd_use_journal=false and/or USE_JOURNAL=false and docker --log-driver=journald you will not get logs.

If you do not set openshift_logging_fluentd_use_journal, and do not set USE_JOURNAL, then fluentd should figure out which log driver is being used.

This is what I did:

Install the latest 3.6 with logging - NOTE: docker uses --log-driver=journald by default.

I edited /etc/sysconfig/docker to look like this:

OPTIONS=' --selinux-enabled  '
#OPTIONS=' --selinux-enabled  --log-driver=journald'

NOTE: You have to leave the '--selinux-enabled' in there - you can't comment that part out.

I created /etc/docker/daemon.json to look like this:

{
  "log-driver": "json-file"
}

I restarted docker: systemctl restart docker

I edited the fluentd daemonset to make sure that USE_JOURNAL did not set a value.  This restarted the fluentd pods.

I am able to see the latest container logs:

# oc exec $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/project.*/_search?sort=@timestamp:desc | python -mjson.tool | more
{
    "_shards": {
...
                "_index": "project.logging.b04b2f5a-5ce3-11e7-91f1-02ea503f0558.
2017.06.30",
                "_score": null,
                "_source": {
                    "@timestamp": "2017-06-30T17:20:42.000000+00:00",
...

# date -u
Fri Jun 30 17:21:33 UTC 2017

As you can see, Elasticsearch is pretty much up-to-date with the latest container logs.

Comment 12 Rich Megginson 2017-06-30 17:40:32 UTC
Also - openshift-ansible will set USE_JOURNAL in the fluentd daemonset like this:

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging/defaults/main.yml#L75

That is, it will use the setting from the following ansible parameters, in order, if set: openshift_hosted_logging_use_journal docker_log_driver openshift.docker.log_driver openshift.docker.options

If any of these are set to use journald or json-file, then the fluentd daemonset USE_JOURNAL will be set to "true" or "false".

So, if you want to test changing the log driver to journald or json-file, YOU MUST ALSO EDIT daemonset logging-fluentd AND REMOVE THE VALUE from USE_JOURNAL.

Comment 13 Junqi Zhao 2017-07-04 07:33:58 UTC
@Rich, log entries could be found on Kibana UI now, but I want to write the followings:

I edited /etc/sysconfig/docker to look like this:

OPTIONS=' --selinux-enabled',

I created /etc/docker/daemon.json to look like this:

{
  "log-driver": "json-file"
}

restarted docker: systemctl restart docker,log driver is json-file now, then I deployed logging 3.6.0(did not set openshift_hosted_logging_use_journal in inventory file). and found the USE_JOURNAL value is true
# oc get daemonset logging-fluentd -o yaml | grep -i USE_JOURNAL -C 2
        - name: ES_COPY
          value: "false"
        - name: USE_JOURNAL
          value: "true"
        - name: JOURNAL_SOURCE
 
I don't think it is a good idea to let customer do the manual change to make sure that USE_JOURNAL did not set a value and restart flunted pods again, and I think our program should not set USE_JOURNAL=true or USE_JOURNAL=false, it should leave this parameter's value to empty

Comment 14 Xia Zhao 2017-07-05 08:43:17 UTC
Removed openshift_logging_fluentd_use_journal in inventory file, the original issue in comment #0 still repro. 

Looked into fluentd pod, seems the root cause is the auto detect log driver function is broken, fluentd is using journald log driver even openshift_logging_fluentd_use_journal is not specified in inventory file:

# oc rsh logging-fluentd-2v1dm
sh-4.2# env | grep JOUR
USE_JOURNAL=true

Images tested with:
logging-fluentd            v3.6                ba4e34d67dbe        24 hours ago        231.5 MB

# rpm -qa | grep ansible
openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch

# openshift version
openshift v3.6.133
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

Comment 15 Xia Zhao 2017-07-05 08:45:05 UTC
Please ignore the repro steps in comment #6 since it's different than comment #0, and the test configuration issue was fixed there.

Comment 16 Rich Megginson 2017-07-05 14:43:14 UTC
(In reply to Xia Zhao from comment #14)
> Removed openshift_logging_fluentd_use_journal in inventory file, the
> original issue in comment #0 still repro. 
> 
> Looked into fluentd pod, seems the root cause is the auto detect log driver
> function is broken, fluentd is using journald log driver even
> openshift_logging_fluentd_use_journal is not specified in inventory file:

Are you saying that you were able to reproduce the problem, by using the same inventory file from https://bugzilla.redhat.com/attachment.cgi?id=1292805, except that you removed openshift_logging_fluentd_use_journal=false from the inventory file?

If so . . .

> 
> # oc rsh logging-fluentd-2v1dm
> sh-4.2# env | grep JOUR
> USE_JOURNAL=true

. . . then this is correct.  The default behavior for openshift is to use `--log-driver=journald`.  And it appears that fluentd is detecting this and setting USE_JOURNAL=true correctly.

What does:

docker info | grep -i log

say?  If it says "json-file" then yes, this bug is somehow due to the inability of the logging code in openshift-ansible or fluentd to detect the log driver.

If it says "journald", then we need to figure out why the logs are not showing up in Kibana.  Please provide the following additional information:

cat /var/log/journal.pos

journalctl -o export | grep CONTAINER_NAME | tail

oc exec $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/project.*/_search?sort=@timestamp:desc | python -mjson.tool | more

date -u


> 
> Images tested with:
> logging-fluentd            v3.6                ba4e34d67dbe        24 hours
> ago        231.5 MB
> 
> # rpm -qa | grep ansible
> openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch
> openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch
> openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch
> openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch
> openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch
> ansible-2.2.3.0-1.el7.noarch
> openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch
> openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch
> 
> # openshift version
> openshift v3.6.133
> kubernetes v1.6.1+5115d708d7
> etcd 3.2.1

Comment 17 Xia Zhao 2017-07-06 02:10:58 UTC
(In reply to Rich Megginson from comment #16)
> (In reply to Xia Zhao from comment #14)

> Are you saying that you were able to reproduce the problem, by using the
> same inventory file from
> https://bugzilla.redhat.com/attachment.cgi?id=1292805, except that you
> removed openshift_logging_fluentd_use_journal=false from the inventory file?
Yes, exactly.

Please note that I'm testing with json-file log driver on my scenario: On the  openshift master and node machine, I set "OPTIONS=' --log-driver=json-file' " in /etc/sysconfig/docker.

I don't have such a file touched in my systems /etc/docker/daemon.json, so when I reopen this bz, please consider it no longer related with comment #6 here.

BTW, journald log driver worked fine in my testing these days.

Comment 18 Xia Zhao 2017-07-06 02:17:30 UTC
bug title changed to reflecting the problem more accurately.

Comment 19 Rich Megginson 2017-07-06 02:23:32 UTC
(In reply to Xia Zhao from comment #17)
> (In reply to Rich Megginson from comment #16)
> > (In reply to Xia Zhao from comment #14)
> 
> > Are you saying that you were able to reproduce the problem, by using the
> > same inventory file from
> > https://bugzilla.redhat.com/attachment.cgi?id=1292805, except that you
> > removed openshift_logging_fluentd_use_journal=false from the inventory file?
> Yes, exactly.
> 
> Please note that I'm testing with json-file log driver on my scenario: On
> the  openshift master and node machine, I set "OPTIONS='
> --log-driver=json-file' " in /etc/sysconfig/docker.

When?  After install?  I'm still trying to understand the exact steps I need to reproduce this problem.

0) start with a clean machine

1) Install OpenShift + logging - with the above inventory file, with openshift_logging_fluentd_use_journal removed?

Note that OpenShift will configure docker to use --log-driver=journald by default.  I'm not sure if it uses /etc/docker/daemon.json or /etc/sysconfig/docker to do this.  So after this step, what does `docker info|grep -i log` say?  Also note that this will set USE_JOURNAL=true in the fluentd daemonset.  So if you change docker to use a different log driver at this point, you will either have to edit the fluentd daemonset to change the USE_JOURNAL setting, or remove it altogether from the daemonset.

Or, were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file?

2) edit /etc/sysconfig/docker on the master and node machines to have OPTIONS='--log-driver=json-file'

See above.  Since OpenShift was deployed with journald, the ansible logging code detected this, and set USE_JOURNAL=true in the fluentd daemonset.  So if you change the docker logging driver at this point, you will either have to edit the fluentd daemonset to change the USE_JOURNAL setting, or remove it altogether from the daemonset.


> 
> I don't have such a file touched in my systems /etc/docker/daemon.json, so
> when I reopen this bz, please consider it no longer related with comment #6
> here.
> 
> BTW, journald log driver worked fine in my testing these days.

Probably because that is the default when you deploy OpenShift.

Comment 20 Rich Megginson 2017-07-06 02:24:53 UTC
Note that if you deploy OpenShift + logging with openshift_logging_fluentd_use_journal=false, _but you do not tell OpenShift ansible to deploy docker using --log-driver=json-file, you will just confuse fluentd, and it will not be able to read the logs_.

Comment 21 Xia Zhao 2017-07-06 03:05:11 UTC
@Rich,

Yeah, I payed attention to your warning during test, and am clear with the logic where I'm testing with -- the json-file log driver scenario used to work for me in the past. 

In my scenario I changed docker log driver right after openshift is installed, and before logging was deployed. So please consider that I'm eventually working with a non-default json-file log driver openshift system, instead of the default journald log driver. 

Here is what I get when querying the docker log driver on both master/node machines:

# docker info|grep -i log
Logging Driver: json-file

As a clarification, I'm not aiming at confuse fluentd by setting conflict value to openshift_logging_fluentd_use_journal than the real log dirver of openshift, that's not my case.

About this question:
> were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file?
The answer is yes, let me try to find it out from the openshift cluster installation's inventory file, will update you later.

Comment 22 Xia Zhao 2017-07-06 06:49:03 UTC
(In reply to Xia Zhao from comment #21)
> @Rich,

> About this question:
> > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file?
> The answer is yes, let me try to find it out from the openshift cluster
> installation's inventory file, will update you later.

According to https://github.com/openshift/openshift-ansible/pull/2681/,
we could launch the jenkins job adding "openshift_docker_options: --log-driver=journald" to overwrite the installer's default parameter

Comment 23 Rich Megginson 2017-07-06 13:45:57 UTC
(In reply to Xia Zhao from comment #22)
> (In reply to Xia Zhao from comment #21)
> > @Rich,
> 
> > About this question:
> > > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file?
> > The answer is yes, let me try to find it out from the openshift cluster
> > installation's inventory file, will update you later.
> 
> According to https://github.com/openshift/openshift-ansible/pull/2681/,
> we could launch the jenkins job adding "openshift_docker_options:
> --log-driver=journald" to overwrite the installer's default parameter

But is that what you did?

Comment 25 openshift-github-bot 2017-07-06 19:10:16 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/c9e92b47333d22a3cf47e76ac7248deb23263500
Bug 1466152 - Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" nor omitted collects the log entries

https://bugzilla.redhat.com/show_bug.cgi?id=1466152
Do not set openshift_logging_fluentd_use_journal or USE_JOURNAL at
all unless it is explicitly set as an ansible param.  It is almost
always better to let fluentd figure out which log driver docker
is using.

https://github.com/openshift/openshift-ansible/commit/99ff4fdca9a24a1d5e3953fc7836aee898e62101
Merge pull request #4696 from richm/bug-1466152

Bug 1466152 - Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" nor omitted collects the log entries

Comment 26 Xia Zhao 2017-07-07 01:59:08 UTC
(In reply to Rich Megginson from comment #23)
> (In reply to Xia Zhao from comment #22)
> > (In reply to Xia Zhao from comment #21)
> > > @Rich,
> > 
> > > About this question:
> > > > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file?
> > > The answer is yes, let me try to find it out from the openshift cluster
> > > installation's inventory file, will update you later.
> > 
> > According to https://github.com/openshift/openshift-ansible/pull/2681/,
> > we could launch the jenkins job adding "openshift_docker_options:
> > --log-driver=journald" to overwrite the installer's default parameter
> 
> But is that what you did?

My test env is exactly the same.

Comment 27 Rich Megginson 2017-07-07 18:46:42 UTC
fix is in openshift-ansible-3.6.137-1.git.0.12654fb.el7

Comment 28 Xia Zhao 2017-07-10 07:21:48 UTC
Verified with openshift-ansible-playbooks-3.6.139-1.git.0.4ff49c6.el7.noarch, this issue was fixed well. Can get log entries from kibana UI for mgmt/user indices when openshift_logging_fluentd_use_journal is not specified in inventory file.

Comment 32 errata-xmlrpc 2017-11-28 21:59:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188


Note You need to log in before you can comment on or make changes to this bug.