Bug 1408397 - Sometimes, the ES logs are not completely displayed
Summary: Sometimes, the ES logs are not completely displayed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 3.4.z
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-23 09:31 UTC by Xia Zhao
Modified: 2020-07-16 09:05 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-15 18:36:38 UTC
Target Upstream Version:


Attachments
es log #2 (7.64 KB, text/plain)
2016-12-23 09:32 UTC, Xia Zhao
es_ops log #1 (464 bytes, text/plain)
2016-12-23 09:33 UTC, Xia Zhao
es_ops log #2 (4.76 KB, text/plain)
2016-12-23 09:33 UTC, Xia Zhao
kibana log (84.13 KB, text/plain)
2016-12-23 09:34 UTC, Xia Zhao
kibana_ops log (90.00 KB, text/plain)
2016-12-23 09:39 UTC, Xia Zhao
Attached ES two pods log (28.28 KB, text/plain)
2017-01-04 09:23 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1425 0 normal SHIPPED_LIVE OpenShift Container Platform 3.5, 3.4, 3.3, and 3.2 bug fix update 2017-06-15 22:35:53 UTC

Comment 1 Xia Zhao 2016-12-23 09:32:44 UTC
Created attachment 1234970 [details]
es log #2

Comment 2 Xia Zhao 2016-12-23 09:33:15 UTC
Created attachment 1234971 [details]
es_ops log #1

Comment 3 Xia Zhao 2016-12-23 09:33:38 UTC
Created attachment 1234972 [details]
es_ops log #2

Comment 4 Xia Zhao 2016-12-23 09:34:00 UTC
Created attachment 1234973 [details]
kibana log

Comment 5 Xia Zhao 2016-12-23 09:39:10 UTC
Created attachment 1234977 [details]
kibana_ops log

Comment 7 Junqi Zhao 2017-01-04 09:20:39 UTC
Tested on OCP 3.4.0 V38 + Logging 3.4.0; this issue was not reproduced.
I guess this issue is related to memory size. I used "vm_type: n1-standard-2" for GCE, and it took about one hour to show the logs on Kibana, much faster than xiazhao's two hours.

oc get configmap logging-deployer -o yaml
apiVersion: v1
data:
  enable-ops-cluster: "true"
  es-cluster-size: "2"
  es-instance-ram: 1G
  es-pvc-dynamic: "true"
  es-pvc-prefix: ""
  es-pvc-size: ""
  kibana-hostname: kibana.0104-03q.qe.rhcloud.com
  kibana-ops-hostname: kibana-ops.0104-03q.qe.rhcloud.com
  public-master-url: https://104.155.144.193:8443
  storage-group: ""
  use-journal: "true"
kind: ConfigMap
metadata:
  creationTimestamp: 2017-01-04T07:06:20Z
  name: logging-deployer
  namespace: logging
  resourceVersion: "3608"
  selfLink: /api/v1/namespaces/logging/configmaps/logging-deployer
  uid: 4ba48527-d24c-11e6-a995-42010af0000e


oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-fghq5           1/1       Running     0          2h
logging-curator-ops-1-z6jwi       1/1       Running     0          2h
logging-deployer-scs62            0/1       Completed   0          2h
logging-es-0sn7afy4-1-onp0i       1/1       Running     0          2h
logging-es-jqsekt6y-1-h3ogz       1/1       Running     0          2h
logging-es-ops-0dusq0sn-1-avulp   1/1       Running     0          2h
logging-es-ops-jrk7pk5s-1-2oz7i   1/1       Running     0          2h
logging-fluentd-faq7m             1/1       Running     0          2h
logging-kibana-1-oajon            2/2       Running     0          2h
logging-kibana-ops-1-216y9        2/2       Running     0          2h

Image ID
openshift3/logging-auth-proxy    e96b37a99960
openshift3/logging-kibana    27f978fc2946
openshift3/logging-fluentd    c493f8b4553b
openshift3/logging-elasticsearch    3ca95a8a9433
openshift3/logging-curator    e39988877cd9
openshift3/logging-deployer    1033ccb0557b

Comment 8 Junqi Zhao 2017-01-04 09:23:58 UTC
Created attachment 1237079 [details]
Attached ES two pods log

Comment 9 Rich Megginson 2017-02-23 17:34:58 UTC
@junqi - Is the problem that elasticsearch is missing data that is found in oc logs?  Is it missing permanently, or does it take too long to show up?

Is this a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1426137 and https://bugzilla.redhat.com/show_bug.cgi?id=1397293 or is this a different issue?

Comment 10 Junqi Zhao 2017-02-24 00:58:09 UTC
(In reply to Rich Megginson from comment #9)
> @junqi - Is the problem that elasticsearch is missing data that is found in
> oc logs?  Is it missing permanently, or does it take too long to show up?
> 
> Is this a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1426137
> and https://bugzilla.redhat.com/show_bug.cgi?id=1397293 or is this a
> different issue?

Preconditions when this issue was found:
use_journal = true, enable_kibana_ops = true, and elastic_cluster_size = 2

issue description:
ES pod logs were not shown completely when checking them with 'oc logs $es-pod'; for example, an ES pod only had the following log entries (you can also see attachments es log #1 and es log #2):
*******************************************************************************
[2016-12-23 07:36:57,142][WARN ][com.floragunn.searchguard.configuration.PrivilegesEvaluator] project.logging.5cf5542b-c8d8-11e6-bf83-42010af00028.* does not exist in cluster metadata
[2016-12-23 08:37:16,023][WARN ][com.floragunn.searchguard.configuration.PrivilegesEvaluator] logstash-* does not exist in cluster metadata
[2016-12-23 08:37:16,024][WARN ][com.floragunn.searchguard.configuration.PrivilegesEvaluator] logstash-* does not exist in cluster metadata

*******************************************************************************
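For illustration, a rough way to compare how much each ES pod has logged (a sketch only; the 'logging' namespace and the logging-es pod-name prefix are taken from comment 7 and may differ in other environments):

for pod in $(oc get pods -n logging | awk '/^logging-es/ {print $1}'); do
  # count the lines the container runtime captured for this pod
  echo "$pod: $(oc logs -n logging "$pod" | wc -l) lines"
done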


This issue was originally found in OCP 3.4.0 and happened occasionally; it did not happen in my env.

It is not a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1426137; BZ #1426137 was found in a logging 3.5 ansible installation.

It is not a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1397293 either; we use journald as the logging driver, and although it took a long time to show logs in the Kibana UI, the key point of this defect is that the ES logs are not completely displayed.

Comment 11 Rich Megginson 2017-02-24 02:52:20 UTC
I don't understand what you mean by "logs are not completely displayed".  Are you saying that the content of es log #1 and es log #2 should be identical?  Because they should not be.  They are the logs of different instances of elasticsearch serving different clients and different roles in the cluster.

Comment 12 Junqi Zhao 2017-02-27 01:08:57 UTC
(In reply to Rich Megginson from comment #11)
> I don't understand what you mean by "logs are not completely displayed". 
> Are you saying that the content of es log #1 and es log #2 should be
> identical?  Because they should not be.  They are the logs of different
> instances of elasticsearch serving different clients and different roles in
> the cluster.

I should explain more clearly.
See the attached es log #1 for a description of this defect: when the logging stacks are running, there are only 3 lines recorded in the ES pod log and no further logs appear; that is what I mean by "logs are not completely displayed". The attached es log #2 is a sample of what the ES log should look like; for example, we can find update_mapping info and "Cluster health status changed to GREEN".

Comment 13 Rich Megginson 2017-02-27 03:50:50 UTC
> when the logging stacks are running, there are only 3 lines recorded in the ES pod log and no further logs appear; that is what I mean by "logs are not completely displayed".

Why is it a problem if there are only 3 lines recorded in one of the ES pod logs?

Comment 14 Junqi Zhao 2017-02-27 09:43:57 UTC
(In reply to Rich Megginson from comment #13)
> > when the logging stacks are running, there are only 3 lines recorded in the ES pod log and no further logs appear; that is what I mean by "logs are not completely displayed".
> 
> Why is it a problem if there are only 3 lines recorded in one of the ES pod
> logs?

See the attached file es log #2; this is what the ES log looks like after the ES pods are running: we can find user project update_mapping info and "Cluster health status changed to GREEN".

This defect is not reproduced in my env, and I did not encounter this error before. We can continue to monitor the ES logs; if this defect does not happen again, I think we can close it as 'Not a Bug' or 'Works for me'.

Comment 15 Junqi Zhao 2017-02-27 09:45:42 UTC
Changed the status to closed by mistake, reset it.

Comment 16 Rich Megginson 2017-02-27 18:18:48 UTC
I'm not sure how Elasticsearch logging works when running in a cluster, and I'm also not sure if this behavior changed from ES 1.5 to 2.x (OSE 3.3 -> 3.5), or from ES 2.1 -> 2.4.  Assuming that when fluentd talks to Elasticsearch through the Service, the Service does some sort of load balancing/round robin, fluentd would send data to different Elasticsearch members of the cluster at different times.  As a result, each Elasticsearch instance may not register the same traffic as the others.  For example, if fluentd adds an index to ElasticsearchA, we would expect a message like this in the ElasticsearchA log:

[2017-01-04 07:52:55,356][INFO ][cluster.metadata         ] [Earthquake] [project.install-test.5a80392c-d23f-11e6-9bb9-42010af0000e.2017.01.04] creating index, cause [auto(bulk api)], templates [com.redhat.viaq-openshift-project.template.json], shards [1]/[0], mappings [_default_]

But we wouldn't necessarily expect a log message like this in the ElasticsearchB pod log.  We would expect that doing a search on ElasticsearchB would return the data in that index.  But the OpenShift Elasticsearch service hides that from you - if you query Elasticsearch from outside the pod (e.g. via Kibana, not via oc exec $espod -- curl ...), you don't really know if you are talking to ElasticsearchA or B.  Same with fluentd - when fluentd opens a connection to send data to Elasticsearch, it is using the Elasticsearch Service - it may be talking to ElasticsearchA or ElasticsearchB for the duration of its connection.

Does this explain why there are different logs in the ElasticsearchA and ElasticsearchB logs?  If not, can you explain exactly which log messages are completely displayed, and which are not?

I'm including Lukáš and Peter to see if they can help explain why different members of the Elasticsearch cluster would log different messages.
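As a rough sketch of that distinction (not taken from this bug; it assumes curl is available in the ES image and that the admin client certificate is mounted at the paths below, which may differ): querying each non-ops ES pod directly should return the same index list even though the pods' own logs differ.

for pod in $(oc get pods -n logging | awk '/^logging-es-/ && !/^logging-es-ops-/ {print $1}'); do
  echo "== $pod"
  # ask this specific cluster member directly, bypassing the Service
  oc exec -n logging "$pod" -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    "https://localhost:9200/_cat/indices?v"
done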

Comment 17 Rich Megginson 2017-03-13 22:43:09 UTC
ping - can someone please comment on https://bugzilla.redhat.com/show_bug.cgi?id=1408397#c16 ?

I would like to close this bug if it really isn't a bug, or somehow fix it.

Comment 18 Junqi Zhao 2017-03-14 00:33:03 UTC
(In reply to Rich Megginson from comment #17)
> ping - can someone please comment on
> https://bugzilla.redhat.com/show_bug.cgi?id=1408397#c16 ?
> 
> I would like to close this bug if it really isn't a bug, or somehow fix it.

Yes, it happened occasionally and was not reproduced in my env. I think we can close it as 'Not a Bug' or 'Works for me' if Lukáš and Peter do not have different opinions.

Comment 19 Rich Megginson 2017-03-14 02:55:55 UTC
(In reply to Junqi Zhao from comment #18)
> (In reply to Rich Megginson from comment #17)
> > ping - can someone please comment on
> > https://bugzilla.redhat.com/show_bug.cgi?id=1408397#c16 ?
> > 
> > I would like to close this bug if it really isn't a bug, or somehow fix it.
> 
> Yes, it happened occasionally and was not reproduced in my env. I think we can
> close it as 'Not a Bug' or 'Works for me' if Lukáš and Peter do not have
> different opinions.

Either this is a bug in Elasticsearch, that it somehow doesn't print its own logs, or it is a bug in docker, that it is missing or not displaying some of the logs it gets from containers/pods.  At this point I don't know which.

Comment 20 Peter Portante 2017-03-14 03:10:22 UTC
We can verify exactly which logs are missing by checking what the various ES pods write to their local Elasticsearch log files.
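A minimal sketch of that comparison (assuming the Elasticsearch log files live somewhere under /elasticsearch inside the pod; adjust the search root if the image writes them elsewhere):

for pod in $(oc get pods -n logging | awk '/^logging-es/ {print $1}'); do
  # what the container runtime captured
  oc logs -n logging "$pod" > "/tmp/${pod}.container.log"
  # what Elasticsearch itself wrote to disk inside the pod
  oc exec -n logging "$pod" -- sh -c 'find /elasticsearch -name "*.log" 2>/dev/null | xargs -r wc -l'
done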

Comment 21 Junqi Zhao 2017-03-14 05:45:20 UTC
(In reply to Rich Megginson from comment #19)

> Either this is a bug in Elasticsearch, that it somehow doesn't print its own
> logs, or it is a bug in docker, that it is missing or not displaying some of
> the logs it gets from containers/pods.  At this point I don't know which.

This defect did not happen in my environment when I tried to reproduce it, and it does not exist in our environments now, so we don't know the root cause. In my opinion, we should check whether this error happens again and whether logs are missing, per Peter's comments in Comment 20. If it does not happen again, we should not spend so much effort on it, and we can close it.

What do you think?

Comment 22 Lukas Vlcek 2017-03-14 14:28:51 UTC
Maybe I am missing something here, but in general the logs for each ES node are expected to be different. Specifically, log messages like:

- creating index
- update_mapping
- auto expanded replicas to [?]
- Cluster health status changed from [RED] to [YELLOW]
- Cluster health status changed from [YELLOW] to [GREEN]

can be expected only on the current cluster master node. They all refer to cluster state changes, and at any given point in time there should be only a single node in charge of making those changes - the master node. Note that if the existing master node goes away, another master-eligible node is elected to take the role of cluster master, and from then on these messages will appear in that node's logs.

I tested this with ES 5.2.2 in a two-node cluster: no matter whether you send the index create request to the first or the second node (9200 or 9201), "creating index" always appears only in the master node's log. I am not sure how this worked in previous ES versions, but I would be very surprised to find out that it worked differently.
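A quick way to see which pod currently holds the master role, and therefore which pod's log should carry these cluster-state messages (a sketch only; the certificate paths and availability of curl inside the ES image are assumptions based on the standard logging deployment):

espod=$(oc get pods -n logging | awk '/^logging-es-/ {print $1; exit}')
oc exec -n logging "$espod" -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  "https://localhost:9200/_cat/master?v"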

FYI, in future releases we can expect more role-specific nodes (master-only, data-only, coordinating [aka client] nodes, etc.), so we can expect even more diversity in the logs.

Another reason the logs differ is JVM GC logging, which is entirely node-specific.

HTH

Comment 23 Peter Portante 2017-05-09 01:00:05 UTC
Is this still a problem with the latest 3.4.1 release?

Comment 24 Xia Zhao 2017-05-09 03:00:32 UTC
@Peter,

Checked with the latest 3.4.1 images; this issue does not reproduce anymore, and the full Elasticsearch log is visible there. Please feel free to transfer it back to ON_QA for closure, thanks!

Images tested with:
openshift3/logging-deployer    dcee53833a87
openshift3/logging-kibana    e7b6eb3c6d3c
openshift3/logging-elasticsearch    fab3ad9b2410
openshift3/logging-fluentd    10c6287124fc
openshift3/logging-curator    e8b1bbdfa30f
openshift3/logging-auth-proxy    f2750505bbf8

# openshift version
openshift v3.4.1.18
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 25 Xia Zhao 2017-05-10 01:43:56 UTC
Set to verified according to comment #24

Comment 29 errata-xmlrpc 2017-06-15 18:36:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1425

