Description of problem:
There are no projects.* indices in Elasticsearch. The fluentd logs show many "orphaned" record errors, and the ES stack fills the .orphaned index with these documents.

Version-Release number of selected component (if applicable):
ocp: v3.9.0-0.16.0
openshift3/logging-fluentd/images/v3.9.0-0.16.0.2

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging.
2. Create a project and an application:
   oc new-project anlitest
   oc new-app httpd-example
3. Check the indices in Elasticsearch:
   # oc exec -c elasticsearch logging-es-data-master-y8m32zuh-2-6dfwl -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v
4. Check the fluentd logs.

Actual results:
Step 3) There is no projects.xxx index:

# oc exec -c elasticsearch logging-es-data-master-y8m32zuh-2-6dfwl -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v
health status index                                         pri rep docs.count docs.deleted store.size pri.store.size
green  open   .searchguard.logging-es-data-master-y8m32zuh  1   0   5          0            33.5kb     33.5kb
green  open   .operations.2018.01.10                        1   0   5688       0            3.3mb      3.3mb
green  open   .kibana                                       1   0   1          0            3.1kb      3.1kb
green  open   .orphaned.2018.01.10                          1   0   319114     0            502.2mb    502.2mb

Step 4) Many errors like the following:

2018-01-10 00:02:16 -0500 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"ea71e08fceefe10b4499b6deba0d64a20bc9163dcb931b29dfd65b7b96704966"}, "kubernetes"=>{"container_name"=>"fluentd-elasticsearch", "namespace_name"=>"logging", "pod_name"=>"logging-fluentd-lc6qm", "pod_id"=>"64a74cb5-f5c3-11e7-9e19-fa163e78c39e", "labels"=>{"component"=>"fluentd", "controller-revision-hash"=>"254787015", "logging-infra"=>"fluentd", "pod-template-generation"=>"3", "provider"=>"openshift"}, "host"=>"192.168.1.223", "master_url"=>"https://kubernetes.default.svc.cluster.local"}, "message"=>"umount: /var/lib/docker/containers/fc901a4e983f1639b4c09ca045fae84cf1aa879c0342be5004429e27ac45ce74/shm: not mounted\n", "level"=>"err", "hostname"=>"192.168.1.223", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.128.0.17", "ipaddr6"=>"fe80::403:aeff:fe5c:886b", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-01-10T05:02:16.571670+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-01-10T05:02:10.186175+00:00"}
2018-01-10 00:02:16 -0500 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"ea71e08fceefe10b4499b6deba0d64a20bc9163dcb931b29dfd65b7b96704966"}, "kubernetes"=>{"container_name"=>"fluentd-elasticsearch", "namespace_name"=>"logging", "pod_name"=>"logging-fluentd-lc6qm", "pod_id"=>"64a74cb5-f5c3-11e7-9e19-fa163e78c39e", "labels"=>{"component"=>"fluentd", "controller-revision-hash"=>"254787015", "logging-infra"=>"fluentd", "pod-template-generation"=>"3", "provider"=>"openshift"}, "host"=>"192.168.1.223", "master_url"=>"https://kubernetes.default.svc.cluster.local"}, "message"=>"umount: /var/lib/docker/containers/ff72869a18e86f380e7b5b6b31928372a9f71e633ae4b66d57007cfa5285ca4c/shm: not mounted\n", "level"=>"err", "hostname"=>"192.168.1.223", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.128.0.17", "ipaddr6"=>"fe80

Expected results:
There should be indices named like projects.logging.**.2018.01.10 and projects.anlitest.**.2018.01.10, and there should not be so many .orphaned documents.

Additional info:
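A quick way to quantify the problem from the collector side is sketched below; the fluentd and Elasticsearch pod names are the ones from this cluster and will differ in other environments.

Count the "missing kubernetes.namespace_id" errors reported by one fluentd pod:
# oc logs logging-fluentd-lc6qm | grep -c 'missing kubernetes.namespace_id'

Compare the .orphaned index against any projects.* indices:
# oc exec -c elasticsearch logging-es-data-master-y8m32zuh-2-6dfwl -- curl -s --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key 'https://localhost:9200/_cat/indices?v' | egrep 'orphaned|project'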
Created attachment 1379359 [details] Log dump files
Created attachment 1379361 [details]
fluentd log

There are too many fluentd logs, so I only attached the one from the OpenShift master.
Container log collection testing is blocked.
The container logs can be found in OpenShift v3.9.0-0.19.0 when using the RPM installation, so moving the bug to medium severity. I will continue to check whether the issue exists in the containerized installation.
Still no project indices in ES when using logging-fluentd/v3.9.0-0.20.0.0 in one cluster. There are too many fluentd logs. Will try again when https://bugzilla.redhat.com/show_bug.cgi?id=1531157 is fixed.
After running for a few minutes, many ES pods are in Evicted status. Describing one of the ES pods shows the events:

Events:
  Type     Reason   Age  From                               Message
  ----     ------   ---  ----                               -------
  Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].

# oc get po | grep logging-es
logging-es-data-master-ea1bmc86-1-5tnqs   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-78rvl   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-8v22l   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-b57vf   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-djdhq   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-fd8bb   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-gx48r   0/2   Evicted   0   44m
logging-es-data-master-ea1bmc86-1-h9ksk   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-jbvn2   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-jfddt   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-jfxs7   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-lm25j   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-m7fds   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-mvrwz   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-pmmfs   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-pt78j   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-sc66g   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-tdm8w   0/2   Pending   0   4m
logging-es-data-master-ea1bmc86-1-zdbpv   0/2   Evicted   0   4m
logging-es-data-master-ea1bmc86-1-zwhq4   0/2   Evicted   0   4m
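A quick way to confirm the DiskPressure condition behind the evictions (a sketch; the node name is taken from the event above and will differ in other environments):

Check the node conditions reported by the kubelet:
# oc describe node qe-juzhao-39-gcs-1-nrr-1 | grep -A 10 'Conditions:'

Check free space on the node itself (run on the node):
# df -h /var /var/lib/docker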
The project indices can be found with the following workaround (a command-level sketch follows the list):
1. Delete /var/log/es-containers.log.pos.
2. Modify the daemonset to the v3.7 fluentd image.
3. Modify the daemonset back to the v3.9 fluentd image.
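A minimal sketch of those steps, assuming the daemonset is named logging-fluentd and the container is named fluentd-elasticsearch (both as seen in the error records above); the registry path is a placeholder that must be adjusted for the environment:

1) On each node, remove the fluentd position file for container logs:
# rm /var/log/es-containers.log.pos

2) Point the daemonset at the v3.7 fluentd image:
# oc set image daemonset/logging-fluentd fluentd-elasticsearch=registry.example.com/openshift3/logging-fluentd:v3.7

3) Point it back at the v3.9 image:
# oc set image daemonset/logging-fluentd fluentd-elasticsearch=registry.example.com/openshift3/logging-fluentd:v3.9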
(In reply to Junqi Zhao from comment #6)
> After running for a few minutes, there are so many es pods which status is
> Evicted, describe one es pod, and find the events:
> Events:
>   Type     Reason   Age  From                               Message
>   ----     ------   ---  ----                               -------
>   Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].

If you deploy logging on a system with enough disk space, does it fix this bug?

> # oc get po | grep logging-es
> logging-es-data-master-ea1bmc86-1-5tnqs   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-78rvl   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-8v22l   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-b57vf   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-djdhq   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-fd8bb   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-gx48r   0/2   Evicted   0   44m
> logging-es-data-master-ea1bmc86-1-h9ksk   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-jbvn2   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-jfddt   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-jfxs7   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-lm25j   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-m7fds   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-mvrwz   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-pmmfs   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-pt78j   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-sc66g   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-tdm8w   0/2   Pending   0   4m
> logging-es-data-master-ea1bmc86-1-zdbpv   0/2   Evicted   0   4m
> logging-es-data-master-ea1bmc86-1-zwhq4   0/2   Evicted   0   4m
(In reply to Rich Megginson from comment #8)
> (In reply to Junqi Zhao from comment #6)
> > After running for a few minutes, there are so many es pods which status is
> > Evicted, describe one es pod, and find the events:
> > Events:
> >   Type     Reason   Age  From                               Message
> >   ----     ------   ---  ----                               -------
> >   Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].
>
> If you deploy logging on a system with enough disk space, does it fix this
> bug?

It is not reproduced every time; I did not see this issue today, so I am not sure about your question.
Moving to Modified with the merge of https://github.com/openshift/origin-aggregated-logging/pull/898
The fix isn't in 3.9.0-0.22.0.0; waiting for the next build.
The fix wasn't merged into logging-fluentd/images/v3.9.0-0.23.0.0:

sh-4.2# gem list | grep fluent-plugin-kubernetes_metadata_filter
fluent-plugin-kubernetes_metadata_filter (0.33.0)
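One way to check the installed plugin version without entering the image manually (a sketch; the label selector and pod name follow the component=fluentd label and pod shown in the error records above):

# oc get pods -l component=fluentd
# oc exec logging-fluentd-lc6qm -- gem list fluent-plugin-kubernetes_metadata_filter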
(In reply to Rich Megginson from comment #8)
> (In reply to Junqi Zhao from comment #6)
> > After running for a few minutes, there are so many es pods which status is
> > Evicted, describe one es pod, and find the events:
> > Events:
> >   Type     Reason   Age  From                               Message
> >   ----     ------   ---  ----                               -------
> >   Warning  Evicted  4m   kubelet, qe-juzhao-39-gcs-1-nrr-1  The node was low on resource: [DiskPressure].
>
> If you deploy logging on a system with enough disk space, does it fix this
> bug?

Reproduced today, maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1531157.
The bug is fixed in logging-fluentd:v3.9.0-0.24.0.0.
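For reference, verification amounts to repeating the index check from the description and confirming that project indices now appear instead of only .orphaned (a sketch; the ES pod name mirrors the one used above):

# oc exec -c elasticsearch logging-es-data-master-y8m32zuh-2-6dfwl -- curl -s --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key 'https://localhost:9200/_cat/indices?v' | grep project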
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489