Bug 1530866 - [free-stg][free-int][online-int] project index names missing in Kibana
Summary: [free-stg][free-int][online-int] project index names missing in Kibana
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Logging
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.x
Assignee: Jeff Cantrill
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1511432
Blocks: 1512495
 
Reported: 2018-01-04 04:38 UTC by Junqi Zhao
Modified: 2018-05-10 20:52 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1511432
Environment:
Last Closed: 2018-05-10 20:52:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
* could view user project logs on kibana UI -- free-int (164.50 KB, image/png) -- 2018-01-31 08:38 UTC, Junqi Zhao
* could not view user project logs on kibana UI -- online-int (122.91 KB, image/png) -- 2018-02-09 05:29 UTC, Junqi Zhao
* es pod log -- online-int cluster (19.47 KB, text/plain) -- 2018-02-09 05:30 UTC, Junqi Zhao
* no project index -- pro-us-east-1 cluster (117.49 KB, image/png) -- 2018-02-11 00:52 UTC, Junqi Zhao
* Kibana View of admin user in pro-us-east-1 cluster (233.43 KB, image/png) -- 2018-02-12 14:04 UTC, Jeff Cantrill
* could view user project logs on kibana UI -- pro-us-east-1 (218.19 KB, image/png) -- 2018-02-13 11:19 UTC, Junqi Zhao
* no project indices in kibana -- free-stg (78.98 KB, image/png) -- 2018-03-21 05:07 UTC, Junqi Zhao

Comment 1 Junqi Zhao 2018-01-31 08:37:53 UTC
Tested; project indices and logs are shown in Kibana now.
Please change the status to ON_QA.

env:
OpenShift Master:v3.9.0-0.33.0
Kubernetes Master:v1.9.1+a0ce1bc657
OpenShift Web Console:v3.9.0-0.34.0

Comment 2 Junqi Zhao 2018-01-31 08:38:36 UTC
Created attachment 1388802 [details]
could view user project logs on kibana UI -- free-int

Comment 3 Junqi Zhao 2018-02-09 05:23:56 UTC
online-int,

Comment 4 Junqi Zhao 2018-02-09 05:29:16 UTC
online-int cluster: project index names are missing in Kibana. See the attached picture.
env:
OpenShift Master:v3.7.9 (online version 3.6.0.90)
Kubernetes Master:v1.7.6+a08f5eeb62

There is an "IOException: No space left on device" in the ES pod log:
java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
	at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
	at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
	at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
	at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
	at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
	at org.apache.log4j.Category.callAppenders(Category.java:206)
	at org.apache.log4j.Category.forcedLog(Category.java:391)
	at org.apache.log4j.Category.log(Category.java:856)
	at org.elasticsearch.common.logging.log4j.Log4jESLogger.internalWarn(Log4jESLogger.java:135)
	at org.elasticsearch.common.logging.support.AbstractESLogger.warn(AbstractESLogger.java:109)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.sendFailShard(IndicesClusterStateService.java:779)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.failAndRemoveShard(IndicesClusterStateService.java:773)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.handleRecoveryFailure(IndicesClusterStateService.java:740)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.access$300(IndicesClusterStateService.java:80)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService$2.onRecoveryFailed(IndicesClusterStateService.java:670)
	at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:179)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
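Since the stack trace points at a full disk, a first check is free space on each ES data volume. A minimal sketch, assuming the default 3.x logging deployment (namespace `logging`, pod label `component=es`, container name `elasticsearch`, data path `/elasticsearch/persistent`):

```shell
# Hedged sketch: inspect free space on each ES pod's data volume.
# Namespace, label, container name, and data path are assumptions
# based on the default openshift-logging 3.x layout.
for pod in $(oc get pods -n logging -l component=es \
             -o jsonpath='{.items[*].metadata.name}'); do
  echo "== ${pod} =="
  oc exec -n logging "${pod}" -c elasticsearch -- \
    df -h /elasticsearch/persistent
done
```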

Comment 5 Junqi Zhao 2018-02-09 05:29:54 UTC
Created attachment 1393540 [details]
could not view user project logs on kibana UI -- online-int

Comment 6 Junqi Zhao 2018-02-09 05:30:48 UTC
Created attachment 1393541 [details]
es pod log --online-int cluster

Comment 7 Junqi Zhao 2018-02-11 00:51:06 UTC
pro-us-east-1 cluster: project indices are not generated, so project logs cannot be shown in Kibana. See the attached picture.

Comment 8 Junqi Zhao 2018-02-11 00:52:56 UTC
Created attachment 1394467 [details]
no project index -- pro-us-east-1 cluster

Comment 9 Junqi Zhao 2018-02-11 01:17:17 UTC
pro-us-east-1 env:
OpenShift Master: v3.7.9 (online version 3.6.0.87)
Kubernetes Master: v1.7.6+a08f5eeb62

logging version: v3.7.9

Comment 10 Jeff Cantrill 2018-02-12 14:04:36 UTC
Created attachment 1394933 [details]
Kibana View of admin user in pro-us-east-1 cluster

Comment 11 Jeff Cantrill 2018-02-12 14:07:21 UTC
Attachment [1] shows a working Kibana UI in pro-us-east-1 from the perspective of a cluster-admin. I am unable to test as a non-admin user. This cluster was modified on Feb 9 to expand the PVs for the ES nodes. It additionally includes images that increase the ES request timeout. Looking at the Kibana image, it appears there are no logs available.

* Does the webconsole or 'oc logs' display logs that should have been collected?

[1] https://bugzilla.redhat.com/attachment.cgi?id=1394933
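The cross-check asked for above can be done outside Kibana. A hedged sketch, where `<project>`/`<pod>` are placeholders and the cert paths assume the default openshift-logging ES secret mount:

```shell
# Hedged sketch: verify collection without going through Kibana.
oc logs -n <project> <pod>     # logs read directly from the node/container

# Were per-project indices actually created in ES?
ES_POD=$(oc get pods -n logging -l component=es \
         -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging "${ES_POD}" -c elasticsearch -- \
  curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       'https://localhost:9200/_cat/indices?v'
```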

Comment 12 Junqi Zhao 2018-02-13 11:18:05 UTC
pro-us-east-1 cluster: logs can be shown in Kibana now; see the attached picture.

Removing the OnlinePro keyword and [pro-us-east-1] from the title.
Env:
OpenShift Master:v3.7.9 (online version 3.6.0.87)
Kubernetes Master:v1.7.6+a08f5eeb62

Comment 13 Junqi Zhao 2018-02-13 11:19:17 UTC
Created attachment 1395275 [details]
could view user project logs on kibana UI -- pro-us-east-1

Comment 14 Junqi Zhao 2018-03-02 02:05:13 UTC
free-int cluster: project index names are missing in Kibana. Checked the ES logs and found "ERR: Timed out while waiting for a green or yellow cluster state."


Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
ERR: Timed out while waiting for a green or yellow cluster state.
   * Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)
13

Checked the ES events:
  Warning  Unhealthy              1h (x2 over 1h)  kubelet, ip-172-31-53-92.ec2.internal  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]


OpenShift Master:v3.9.1 (online version 3.6.0.83) 
Kubernetes Master:v1.9.1+a0ce1bc657 
OpenShift Web Console:v3.9.1
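The readiness failure above can be reproduced by querying cluster health directly. A hedged sketch, assuming the default container name and admin-cert paths of the 3.x logging ES pods:

```shell
# Hedged sketch: query cluster health the same way the readiness
# check does (container name and cert paths are assumptions).
ES_POD=$(oc get pods -n logging -l component=es \
         -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging "${ES_POD}" -c elasticsearch -- \
  curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       'https://localhost:9200/_cluster/health?pretty'
```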

Comment 15 Jeff Cantrill 2018-03-20 22:44:28 UTC
The pro-int cluster is still at v3.7.9 and suffers from a timeout that is not configurable. The configuration issue has been resolved in the latest 3.7 release. I can fix it up, but why has the cluster not been upgraded? What do we need to do to make that happen? It is otherwise functional from what I can tell.

Comment 16 Junqi Zhao 2018-03-21 05:07:00 UTC
free-stg cluster: cannot show project indices in Kibana; see the attached picture.

OpenShift Master: v3.9.12 (online version 3.6.0.78) 
Kubernetes Master:v1.9.1+a0ce1bc657 
OpenShift Web Console:v3.9.12

Comment 17 Junqi Zhao 2018-03-21 05:07:28 UTC
Created attachment 1410942 [details]
no project indices in kibana -- free-stg

Comment 18 Junqi Zhao 2018-03-21 05:23:57 UTC
free-int cluster: cannot show project indices in Kibana. Checked the ES logs; ES is in YELLOW status:

2018-03-13 17:14:59 INFO  transport:99 - [Gargoyle] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Clustername: logging-es
Clusterstate: YELLOW
Number of nodes: 3
Number of data nodes: 3




OpenShift Master: v3.9.7 (online version 3.6.0.83) 
Kubernetes Master: v1.9.1+a0ce1bc657 
OpenShift Web Console: v3.9.7

Comment 19 Jeff Cantrill 2018-03-21 14:08:04 UTC
free-stg is green. It looks like the deployment timed out before it was able to complete, which resulted in:

* 2 of 3 ES pods being at the wrong version
* shard allocation disabled

Resolved by:

* rolling out dcs for the pods which were not at the desired version
* enabling shard allocation
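The two recovery steps above can be sketched as follows; the DC name is a placeholder and the cert paths assume the default ES secret mount:

```shell
# Hedged sketch of the recovery steps described above.
# Redeploy the ES pods that are still at the wrong version:
oc rollout latest dc/<logging-es-dc-name> -n logging

# Once all pods are back at the desired version, re-enable shard allocation:
ES_POD=$(oc get pods -n logging -l component=es \
         -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging "${ES_POD}" -c elasticsearch -- \
  curl -s -k -XPUT --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       'https://localhost:9200/_cluster/settings' \
       -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
```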

Comment 20 Jeff Cantrill 2018-03-21 14:17:17 UTC
As of 3/21 at 10:16 EST, the cluster is green and Kibana is functional.

Comment 21 Junqi Zhao 2018-03-23 07:08:49 UTC
1) free-int cluster: ES is still in YELLOW status.
2018-03-13 17:10:32 INFO  transport:99 - [Shathra] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Clustername: logging-es
Clusterstate: YELLOW
Number of nodes: 3
Number of data nodes: 3

OpenShift Master: v3.9.7 (online version 3.6.0.83) 
Kubernetes Master: v1.9.1+a0ce1bc657 
OpenShift Web Console: v3.9.7
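A YELLOW state usually means unassigned replica shards. They can be listed with their reason; a hedged sketch, with the same assumed cert paths as the other snippets:

```shell
# Hedged sketch: list non-started shards and why they are unassigned.
ES_POD=$(oc get pods -n logging -l component=es \
         -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging "${ES_POD}" -c elasticsearch -- \
  curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       'https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
  | grep -v STARTED
```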


2) free-stg cluster: the fluentd pod is being recreated continuously on node
ip-172-31-74-247.us-east-2.compute.internal:

Non-terminated Pods:         (3 in total)
  Namespace                  Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                              ------------  ----------  ---------------  -------------
  logging                    logging-fluentd-x652g             100m (4%)     0 (0%)      512Mi (3%)       512Mi (3%)
  openshift-clam-server      oso-clam-server-77n5x             0 (0%)        0 (0%)      0 (0%)           0 (0%)
  openshift-devops-monitor   prometheus-node-exporter-b5ktw    100m (4%)     200m (8%)   30Mi (0%)        50Mi (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  200m (8%)     200m (8%)   542Mi (3%)       562Mi (4%)
Events:
  Type     Reason                Age                From                                                  Message
  ----     ------                ----               ----                                                  -------
  Warning  EvictionThresholdMet  1m (x854 over 3h)  kubelet, ip-172-31-74-247.us-east-2.compute.internal  Attempting to reclaim nodefs

logging version: v3.9.13

OpenShift Master: v3.9.14 (online version 3.6.0.78)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.14

Comment 22 Junqi Zhao 2018-03-28 02:58:23 UTC
online-int cluster: project index names are missing in Kibana.
"No space left on device" appears in one ES pod's log:

Exception in thread "elasticsearch[logging-es-data-master-v5qaelaq][refresh][T#1]" [.operations.2018.03.14][[.operations.2018.03.14][0]] RefreshFailedEngineException[Refresh failed]; nested: IOException[No space left on device];
	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:677)
	at org.elasticsearch.index.engine.InternalEngine$1.run(InternalEngine.java:481)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

env:
OpenShift Master:v3.9.14 (online version 3.6.0.90) 
Kubernetes Master:v1.9.1+a0ce1bc657 
OpenShift Web Console:v3.9.14 

Logging images version: v3.9.14

Comment 23 Jeff Cantrill 2018-05-10 20:52:05 UTC
Closing, as this environment goes up and down based on CD. Initial investigation shows there are pieces (e.g. disk, node labels) that are out of sync, which prevents logging from being functional. I expect this will be ironed out with Prometheus alerts, etc.

