Created attachment 1292803 [details] fluentd log Description of problem: This issue only happen for json-file log driver: Deploy logging without ops cluster enabled, wait enough time until es is in GREEN status, the log entries from user indices are not presented on logging UI, only those within indices .all and .operations presented on logging UI. Please refer to attached screenshot. Version-Release number of selected component (if applicable): openshift3/logging-kibana ba0b9829ea3d openshift3/logging-kibana c77868ea2334 openshift3/logging-elasticsearch b941ca06dd0d openshift3/logging-fluentd 81853532d392 openshift3/logging-auth-proxy fb9141c52db0 openshift3/logging-curator 757cf4948069 ansible version: openshift-ansible-playbooks-3.6.126.1-1.git.0.41d2313.el7.noarch # openshift version openshift v3.6.126.1 kubernetes v1.6.1+5115d708d7 etcd 3.2.0 How reproducible: Always Steps to Reproduce: 1.Install OCP 3.6 with json-file log driver enabled on master and nodes 2.Deploy logging without ops cluster enabled, no matter if mux service is deployed 3.Wait enough time until EFK stacks are all on, and see es is in GREEN status 4.Check if log entries from user indices are not presented on logging UI Actual results: The log entries from user indices are not presented on logging UI, only those within indices .all and .operations presented on logging UI. Please refer to attached screenshot. Expected results: The log entries from user indices are not presented on logging UI Additional info: es and fluentd logs are attached test env provided
Created attachment 1292804 [details] es log
Created attachment 1292805 [details] inventory file used for logging deployment
Created attachment 1292806 [details] screenshot of logging UI
Please ignore the image versions in comment #0, they should be: openshift3/logging-elasticsearch 0d16be758fca openshift3/logging-kibana ba0b9829ea3d openshift3/logging-fluentd 05db4f17fc26 openshift3/logging-auth-proxy 22560dbd7008 openshift3/logging-curator 757cf4948069
I found the same issue: Description: Set log-driver as json-file in /etc/docker/daemon.json, and not enabled ops cluster, only ".operations.*" and ".all" could be viewd in Kibana UI, no logs could be shown in Kibana within 30 minutes, if select the time range as "Today", there are ".operations.*" and ".all" logs could be shown. See the attached pictures. Steps to Reproduce: 1. Set log driver as json-file in /etc/docker/daemon.json # cat /etc/docker/daemon.json { "log-driver": "json-file" } Comment out the log driver settings in /etc/sysconfig/docker # cat /etc/sysconfig/docker # /etc/sysconfig/docker # Modify these options if you want to change the way the docker daemon runs # OPTIONS=' --selinux-enabled --log-driver=journald' 2. Restart docker: systemctl restart docker and check log dirver by 'docker info | grep "Logging Driver"' 3. Deploy logging 3.6.0 and check Kibana UI Actual results: 2. # docker info | grep "Logging Driver" Logging Driver: json-file 3. only ".operations.*" could be viewd in Kibana UI Expected results: 3. log entries could be found in Kibana UI Additional info: Inventory file [OSEv3:children] masters [masters] ${MASTER} openshift_public_hostname=${MASTER} [OSEv3:vars] ansible_ssh_user=root ansible_ssh_private_key_file="~/libra.pem" deployment_type=openshift-enterprise # Logging openshift_logging_install_logging=true openshift_logging_kibana_hostname=kibana.${SUB_DOMAIN} openshift_logging_kibana_ops_hostname=kibana-ops.${SUB_DOMAIN} public_master_url=https://${MASTER}:${PORT} openshift_logging_image_prefix=${IMAGE_PREFIX} openshift_logging_image_version=v3.6 openshift_logging_namespace=logging
Created attachment 1292834 [details] kibana UI, time range is within 30 minutes
Created attachment 1292836 [details] kibna UI, select time range as today
* Is it possible for you to provid the image versions (e.g. 3.6.1-4) instead of the SHA? That would help us further understand which version you are dealing with since I have no way to ref the sha. * What logs are you expecting from which projects? Fluentd will only read from the end of a file unless you specify otherwise. Also, this change https://github.com/openshift/origin-aggregated-logging/pull/506 modifies the kibana mappings presented to a user. You will only see mappings for project that have data? Is your pod producing log messages?
Tested today, same phenomenon as this defect reported. 1. Attached specified image id logging-elasticsearch v3.6.126.4-1 7969b6dcb11e 4 hours ago 404.2 MB logging-kibana v3.6.126.4-1 5349edde24e5 4 hours ago 342.4 MB logging-auth-proxy v3.6.126.4-1 373f69a74465 4 hours ago 214.8 MB logging-fluentd v3.6.126.4-1 218c2e3246e4 4 hours ago 232.5 MB logging-curator 3.6.99-3 757cf4948069 2 days ago 221.5 MB 2. only .all and .operation indices were presented on logging UI, created user project, user project log index and logging project index could not be found on Kibana UI. See the attached picture "screenshot of logging UI" # oc rsh ${ES_POD} sh-4.2$ env | grep -i ver OSE_ES_VER=2.4.4.9 RECOVER_EXPECTED_NODES=1 ES_VER=2.4.4 RECOVER_AFTER_NODES=1 ES_CLOUD_K8S_VER=2.4.4 RECOVER_AFTER_TIME=5m JAVA_VER=1.8.0
Once again, I regret ever adding these settings to be exposed to end users . . . Note: DO NOT SET USE_JOURNAL or openshift_logging_fluentd_use_journal. Leave them out of the inventory. Do not set USE_JOURNAL in the fluentd daemonset env. This setting only works if it is the same as the docker log driver setting. If you set openshift_logging_fluentd_use_journal=true and/or USE_JOURNAL=true and docker --log-driver=json-file you will not get logs. If you set openshift_logging_fluentd_use_journal=false and/or USE_JOURNAL=false and docker --log-driver=journald you will not get logs. If you do not set openshift_logging_fluentd_use_journal, and do not set USE_JOURNAL, then fluentd should figure out which log driver is being used. This is what I did: Install the latest 3.6 with logging - NOTE: docker uses --log-driver=journald by default. I edited /etc/sysconfig/docker to look like this: OPTIONS=' --selinux-enabled ' #OPTIONS=' --selinux-enabled --log-driver=journald' NOTE: You have to leave the '--selinux-enabled' in there - you can't comment that part out. I created /etc/docker/daemon.json to look like this: { "log-driver": "json-file" } I restarted docker: systemctl restart docker I edited the fluentd daemonset to make sure that USE_JOURNAL did not set a value. This restarted the fluentd pods. I am able to see the latest container logs: # oc exec $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/project.*/_search?sort=@timestamp:desc | python -mjson.tool | more { "_shards": { ... "_index": "project.logging.b04b2f5a-5ce3-11e7-91f1-02ea503f0558. 2017.06.30", "_score": null, "_source": { "@timestamp": "2017-06-30T17:20:42.000000+00:00", ... # date -u Fri Jun 30 17:21:33 UTC 2017 As you can see, Elasticsearch is pretty much up-to-date with the latest container logs.
Also - openshift-ansible will set USE_JOURNAL in the fluentd daemonset like this: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging/defaults/main.yml#L75 That is, it will use the setting from the following ansible parameters, in order, if set: openshift_hosted_logging_use_journal docker_log_driver openshift.docker.log_driver openshift.docker.options If any of these are set to use journald or json-file, then the fluentd daemonset USE_JOURNAL will be set to "true" or "false". So, if you want to test changing the log driver to journald or json-file, YOU MUST ALSO EDIT daemonset logging-fluentd AND REMOVE THE VALUE from USE_JOURNAL.
@Rich, log entries could be found on Kibana UI now, but I want to write the followings: I edited /etc/sysconfig/docker to look like this: OPTIONS=' --selinux-enabled', I created /etc/docker/daemon.json to look like this: { "log-driver": "json-file" } restarted docker: systemctl restart docker,log driver is json-file now, then I deployed logging 3.6.0(did not set openshift_hosted_logging_use_journal in inventory file). and found the USE_JOURNAL value is true # oc get daemonset logging-fluentd -o yaml | grep -i USE_JOURNAL -C 2 - name: ES_COPY value: "false" - name: USE_JOURNAL value: "true" - name: JOURNAL_SOURCE I don't think it is a good idea to let customer do the manual change to make sure that USE_JOURNAL did not set a value and restart flunted pods again, and I think our program should not set USE_JOURNAL=true or USE_JOURNAL=false, it should leave this parameter's value to empty
Removed openshift_logging_fluentd_use_journal in inventory file, the original issue in comment #0 still repro. Looked into fluentd pod, seems the root cause is the auto detect log driver function is broken, fluentd is using journald log driver even openshift_logging_fluentd_use_journal is not specified in inventory file: # oc rsh logging-fluentd-2v1dm sh-4.2# env | grep JOUR USE_JOURNAL=true Images tested with: logging-fluentd v3.6 ba4e34d67dbe 24 hours ago 231.5 MB # rpm -qa | grep ansible openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch ansible-2.2.3.0-1.el7.noarch openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch # openshift version openshift v3.6.133 kubernetes v1.6.1+5115d708d7 etcd 3.2.1
Please ignore the repro steps in comment #6 since it's different than comment #0, and the test configuration issue was fixed there.
(In reply to Xia Zhao from comment #14) > Removed openshift_logging_fluentd_use_journal in inventory file, the > original issue in comment #0 still repro. > > Looked into fluentd pod, seems the root cause is the auto detect log driver > function is broken, fluentd is using journald log driver even > openshift_logging_fluentd_use_journal is not specified in inventory file: Are you saying that you were able to reproduce the problem, by using the same inventory file from https://bugzilla.redhat.com/attachment.cgi?id=1292805, except that you removed openshift_logging_fluentd_use_journal=false from the inventory file? If so . . . > > # oc rsh logging-fluentd-2v1dm > sh-4.2# env | grep JOUR > USE_JOURNAL=true . . . then this is correct. The default behavior for openshift is to use `--log-driver=journald`. And it appears that fluentd is detecting this and setting USE_JOURNAL=true correctly. What does: docker info | grep -i log say? If it says "json-file" then yes, this bug is somehow due to the inability of the logging code in openshift-ansible or fluentd to detect the log driver. If it says "journald", then we need to figure out why the logs are not showing up in Kibana. Please provide the following additional information: cat /var/log/journal.pos journalctl -o export | grep CONTAINER_NAME | tail oc exec $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/project.*/_search?sort=@timestamp:desc | python -mjson.tool | more date -u > > Images tested with: > logging-fluentd v3.6 ba4e34d67dbe 24 hours > ago 231.5 MB > > # rpm -qa | grep ansible > openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch > openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch > openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch > openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch > openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch > ansible-2.2.3.0-1.el7.noarch > openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch > openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch > > # openshift version > openshift v3.6.133 > kubernetes v1.6.1+5115d708d7 > etcd 3.2.1
(In reply to Rich Megginson from comment #16) > (In reply to Xia Zhao from comment #14) > Are you saying that you were able to reproduce the problem, by using the > same inventory file from > https://bugzilla.redhat.com/attachment.cgi?id=1292805, except that you > removed openshift_logging_fluentd_use_journal=false from the inventory file? Yes, exactly. Please note that I'm testing with json-file log driver on my scenario: On the openshift master and node machine, I set "OPTIONS=' --log-driver=json-file' " in /etc/sysconfig/docker. I don't have such a file touched in my systems /etc/docker/daemon.json, so when I reopen this bz, please consider it no longer related with comment #6 here. BTW, journald log driver worked fine in my testing these days.
bug title changed to reflecting the problem more accurately.
(In reply to Xia Zhao from comment #17) > (In reply to Rich Megginson from comment #16) > > (In reply to Xia Zhao from comment #14) > > > Are you saying that you were able to reproduce the problem, by using the > > same inventory file from > > https://bugzilla.redhat.com/attachment.cgi?id=1292805, except that you > > removed openshift_logging_fluentd_use_journal=false from the inventory file? > Yes, exactly. > > Please note that I'm testing with json-file log driver on my scenario: On > the openshift master and node machine, I set "OPTIONS=' > --log-driver=json-file' " in /etc/sysconfig/docker. When? After install? I'm still trying to understand the exact steps I need to reproduce this problem. 0) start with a clean machine 1) Install OpenShift + logging - with the above inventory file, with openshift_logging_fluentd_use_journal removed? Note that OpenShift will configure docker to use --log-driver=journald by default. I'm not sure if it uses /etc/docker/daemon.json or /etc/sysconfig/docker to do this. So after this step, what does `docker info|grep -i log` say? Also note that this will set USE_JOURNAL=true in the fluentd daemonset. So if you change docker to use a different log driver at this point, you will either have to edit the fluentd daemonset to change the USE_JOURNAL setting, or remove it altogether from the daemonset. Or, were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file? 2) edit /etc/sysconfig/docker on the master and node machines to have OPTIONS='--log-driver=json-file' See above. Since OpenShift was deployed with journald, the ansible logging code detected this, and set USE_JOURNAL=true in the fluentd daemonset. So if you change the docker logging driver at this point, you will either have to edit the fluentd daemonset to change the USE_JOURNAL setting, or remove it altogether from the daemonset. > > I don't have such a file touched in my systems /etc/docker/daemon.json, so > when I reopen this bz, please consider it no longer related with comment #6 > here. > > BTW, journald log driver worked fine in my testing these days. Probably because that is the default when you deploy OpenShift.
Note that if you deploy OpenShift + logging with openshift_logging_fluentd_use_journal=false, _but you do not tell OpenShift ansible to deploy docker using --log-driver=json-file, you will just confuse fluentd, and it will not be able to read the logs_.
@Rich, Yeah, I payed attention to your warning during test, and am clear with the logic where I'm testing with -- the json-file log driver scenario used to work for me in the past. In my scenario I changed docker log driver right after openshift is installed, and before logging was deployed. So please consider that I'm eventually working with a non-default json-file log driver openshift system, instead of the default journald log driver. Here is what I get when querying the docker log driver on both master/node machines: # docker info|grep -i log Logging Driver: json-file As a clarification, I'm not aiming at confuse fluentd by setting conflict value to openshift_logging_fluentd_use_journal than the real log dirver of openshift, that's not my case. About this question: > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file? The answer is yes, let me try to find it out from the openshift cluster installation's inventory file, will update you later.
(In reply to Xia Zhao from comment #21) > @Rich, > About this question: > > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file? > The answer is yes, let me try to find it out from the openshift cluster > installation's inventory file, will update you later. According to https://github.com/openshift/openshift-ansible/pull/2681/, we could launch the jenkins job adding "openshift_docker_options: --log-driver=journald" to overwrite the installer's default parameter
(In reply to Xia Zhao from comment #22) > (In reply to Xia Zhao from comment #21) > > @Rich, > > > About this question: > > > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file? > > The answer is yes, let me try to find it out from the openshift cluster > > installation's inventory file, will update you later. > > According to https://github.com/openshift/openshift-ansible/pull/2681/, > we could launch the jenkins job adding "openshift_docker_options: > --log-driver=journald" to overwrite the installer's default parameter But is that what you did?
https://github.com/openshift/openshift-ansible/pull/4696
Commits pushed to master at https://github.com/openshift/openshift-ansible https://github.com/openshift/openshift-ansible/commit/c9e92b47333d22a3cf47e76ac7248deb23263500 Bug 1466152 - Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" nor omitted collects the log entries https://bugzilla.redhat.com/show_bug.cgi?id=1466152 Do not set openshift_logging_fluentd_use_journal or USE_JOURNAL at all unless it is explicitly set as an ansible param. It is almost always better to let fluentd figure out which log driver docker is using. https://github.com/openshift/openshift-ansible/commit/99ff4fdca9a24a1d5e3953fc7836aee898e62101 Merge pull request #4696 from richm/bug-1466152 Bug 1466152 - Json-file log driver: Neither "openshift_logging_fluentd_use_journal=false" nor omitted collects the log entries
(In reply to Rich Megginson from comment #23) > (In reply to Xia Zhao from comment #22) > > (In reply to Xia Zhao from comment #21) > > > @Rich, > > > > > About this question: > > > > were you somehow able to install OpenShift and tell ansible to deploy docker with --log-driver=json-file? > > > The answer is yes, let me try to find it out from the openshift cluster > > > installation's inventory file, will update you later. > > > > According to https://github.com/openshift/openshift-ansible/pull/2681/, > > we could launch the jenkins job adding "openshift_docker_options: > > --log-driver=journald" to overwrite the installer's default parameter > > But is that what you did? My test env is exactly the same.
fix is in openshift-ansible-3.6.137-1.git.0.12654fb.el7
Verified with openshift-ansible-playbooks-3.6.139-1.git.0.4ff49c6.el7.noarch, this issue was fixed well. Can get log entries from kibana UI for mgmt/user indices when openshift_logging_fluentd_use_journal is not specified in inventory file.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188