Tested; the project index and logs are shown in Kibana now. Please change to ON_QA.
Env:
OpenShift Master: v3.9.0-0.33.0
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.0-0.34.0
Created attachment 1388802 [details] could view user project logs on kibana UI -- free-int
online-int cluster: project index names are missing in Kibana, see the picture.

Env:
OpenShift Master: v3.7.9 (online version 3.6.0.90)
Kubernetes Master: v1.7.6+a08f5eeb62

There is an IOException: No space left on device in the ES pod:

java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
        at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.log(Category.java:856)
        at org.elasticsearch.common.logging.log4j.Log4jESLogger.internalWarn(Log4jESLogger.java:135)
        at org.elasticsearch.common.logging.support.AbstractESLogger.warn(AbstractESLogger.java:109)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.sendFailShard(IndicesClusterStateService.java:779)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.failAndRemoveShard(IndicesClusterStateService.java:773)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.handleRecoveryFailure(IndicesClusterStateService.java:740)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.access$300(IndicesClusterStateService.java:80)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService$2.onRecoveryFailed(IndicesClusterStateService.java:670)
        at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:179)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
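A minimal sketch for confirming the "No space left on device" condition in the ES pods. It assumes the stock openshift-logging layout (project "logging", pods labeled component=es, data volume mounted at /elasticsearch/persistent); all of those names are assumptions, so adjust for your deployment.

```shell
# Flag a df usage percentage as critical above a threshold. By default ES
# stops allocating shards to a node near 85% disk usage, and indexing fails
# outright at 100%.
disk_pct_critical() {  # usage: disk_pct_critical "91%" [threshold]
  pct=${1%\%}
  threshold=${2:-85}
  [ "$pct" -ge "$threshold" ]
}

# Against a live cluster (commented out; needs oc access; label/path assumed):
#   for pod in $(oc get pods -n logging -l component=es -o name); do
#     use=$(oc exec -n logging "${pod#pod/}" -- \
#           df -P /elasticsearch/persistent | awk 'NR==2 {print $5}')
#     disk_pct_critical "$use" && echo "${pod#pod/}: disk critical ($use)"
#   done

# Local demonstration with sample df percentages:
for use in 42% 86% 100%; do
  if disk_pct_critical "$use"; then echo "$use critical"; else echo "$use ok"; fi
done
```

The threshold is deliberately set at the default allocation watermark rather than 100%, so the check fires before ES starts refusing shards.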
Created attachment 1393540 [details] could not view user project logs on kibana UI -- online-int
Created attachment 1393541 [details] es pod log --online-int cluster
pro-us-east-1 cluster: project indices are not generated, so project logs cannot be shown in Kibana. See the attached picture.
Created attachment 1394467 [details] no project index -- pro-us-east-1 cluster
pro-us-east-1 env:
OpenShift Master: v3.7.9 (online version 3.6.0.87)
Kubernetes Master: v1.7.6+a08f5eeb62
Logging version: v3.7.9
Created attachment 1394933 [details] Kibana View of admin user in pro-us-east-1 cluster
Attachment [1] is of a working kibana UI in pro-us-east-1 from the perspective of a cluster-admin. I am unable to test a non-admin user. This cluster was modified on Feb 9 to expand the PVs for the ES nodes. It additionally includes images that increase the ES request timeout. Looking at the Kibana image it appears there are no logs available. * Does the webconsole or 'oc logs' display logs that should have been collected? [1] https://bugzilla.redhat.com/attachment.cgi?id=1394933
pro-us-east-1 cluster: logs can be shown in Kibana now, see the attached picture. Removing the OnlinePro keyword and [pro-us-east-1] from the title.
Env:
OpenShift Master: v3.7.9 (online version 3.6.0.87)
Kubernetes Master: v1.7.6+a08f5eeb62
Created attachment 1395275 [details] could view user project logs on kibana UI -- pro-us-east-1
free-int cluster: project index names are missing in Kibana. Checked the ES logs and found "ERR: Timed out while waiting for a green or yellow cluster state.":

Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
ERR: Timed out while waiting for a green or yellow cluster state.
  * Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)

Checked the ES events:

Warning  Unhealthy  1h (x2 over 1h)  kubelet, ip-172-31-53-92.ec2.internal  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]

Env:
OpenShift Master: v3.9.1 (online version 3.6.0.83)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.1
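Response code 000 in the probe failure above is curl's "no HTTP response at all" value, not a real HTTP status: the ES node is not listening yet, which matches the sgadmin timeout. A manual probe from inside the pod can distinguish the cases; a sketch, assuming the stock 3.x admin-cert mount paths (an assumption, and <es-pod> is a placeholder):

```shell
# Manual probe (commented out; needs oc access; <es-pod> is a placeholder):
#   oc exec -n logging <es-pod> -- curl -s -o /dev/null -w '%{http_code}' \
#     --cacert /etc/elasticsearch/secret/admin-ca \
#     --cert   /etc/elasticsearch/secret/admin-cert \
#     --key    /etc/elasticsearch/secret/admin-key \
#     https://localhost:9200/

# Interpret the numeric result of the probe:
probe_verdict() {
  case "$1" in
    000) echo "not listening (no HTTP response at all)" ;;
    200) echo "ready" ;;
    *)   echo "listening but unhealthy (HTTP $1)" ;;
  esac
}

probe_verdict 000
probe_verdict 200
probe_verdict 503
```

A 503 here would point at the cluster (e.g. state not recovered) rather than the process, while a persistent 000 means the JVM has not bound the HTTP port yet.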
The pro-int cluster is still at v3.7.9 and suffers from a timeout that is not configurable there. The configuration has been resolved in the latest release of 3.7. I can fix it up, but why has the cluster not been upgraded? What do we need to do to make that happen? It is otherwise functional from what I can tell.
free-stg cluster: cannot show project indices in Kibana, see the attached picture.
OpenShift Master: v3.9.12 (online version 3.6.0.78)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.12
Created attachment 1410942 [details] no project indices in kibana -- free-stg
free-int cluster: cannot show project indices in Kibana. Checked the ES logs; ES is in YELLOW status:

2018-03-13 17:14:59 INFO  transport:99 - [Gargoyle] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Clustername: logging-es
Clusterstate: YELLOW
Number of nodes: 3
Number of data nodes: 3

Env:
OpenShift Master: v3.9.7 (online version 3.6.0.83)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.7
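The sgadmin banner only shows the state at startup; the current state, and the usual reason for YELLOW (unassigned replica shards), can be read from the _cluster/health API. A sketch, again assuming the stock admin-cert paths (an assumption; <es-pod> is a placeholder):

```shell
# Extract the "status" field from a _cluster/health JSON response.
health_status() {
  printf '%s\n' "$1" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Live query (commented out; needs oc access):
#   json=$(oc exec -n logging <es-pod> -- curl -s \
#     --cacert /etc/elasticsearch/secret/admin-ca \
#     --cert   /etc/elasticsearch/secret/admin-cert \
#     --key    /etc/elasticsearch/secret/admin-key \
#     https://localhost:9200/_cluster/health)
#   health_status "$json"     # green | yellow | red

# Local demonstration on a sample payload:
sample='{"cluster_name":"logging-es","status":"yellow","unassigned_shards":4}'
health_status "$sample"
```

If the status is yellow, the unassigned_shards count in the same response tells you how many replicas are waiting for a node or for allocation to be re-enabled.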
free-stg is in green. It looks like the deployment timed out before it was able to complete, which resulted in:
* 2 of 3 ES pods being at the wrong version
* shard allocation disabled

Resolved by:
* rolling out the DCs for the pods which were not at the desired version
* enabling shard allocation
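The two recovery steps above can be sketched as follows, assuming stock names (component=es label, one DC per ES node, admin certs mounted in the ES pods); all of those names are assumptions and <es-pod> is a placeholder:

```shell
# Build the settings payload that re-enables shard allocation (transient,
# so it is cleared again on a full cluster restart).
alloc_payload() {  # usage: alloc_payload all|none|primaries|new_primaries
  printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Live recovery steps (commented out; need oc access):
# 1) Redeploy the DCs whose pods are still on the old image:
#   for dc in $(oc get dc -n logging -l component=es -o name); do
#     oc rollout latest -n logging "$dc"
#   done
# 2) Re-enable shard allocation once the pods are back:
#   oc exec -n logging <es-pod> -- curl -s -XPUT \
#     --cacert /etc/elasticsearch/secret/admin-ca \
#     --cert   /etc/elasticsearch/secret/admin-cert \
#     --key    /etc/elasticsearch/secret/admin-key \
#     https://localhost:9200/_cluster/settings \
#     -d "$(alloc_payload all)"

alloc_payload all
```

Using a transient rather than persistent setting means a later full restart falls back to the default ("all") instead of pinning the override forever.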
As of 3/21 @ 10:16 EST the cluster is green and kibana is functional
1) free-int cluster: ES is still in YELLOW status:

2018-03-13 17:10:32 INFO  transport:99 - [Shathra] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Clustername: logging-es
Clusterstate: YELLOW
Number of nodes: 3
Number of data nodes: 3

OpenShift Master: v3.9.7 (online version 3.6.0.83)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.7

2) free-stg cluster: the fluentd pod is recreated continuously on node ip-172-31-74-247.us-east-2.compute.internal:

Non-terminated Pods:         (3 in total)
  Namespace                 Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                 ----                            ------------  ----------  ---------------  -------------
  logging                   logging-fluentd-x652g           100m (4%)     0 (0%)      512Mi (3%)       512Mi (3%)
  openshift-clam-server     oso-clam-server-77n5x           0 (0%)        0 (0%)      0 (0%)           0 (0%)
  openshift-devops-monitor  prometheus-node-exporter-b5ktw  100m (4%)     200m (8%)   30Mi (0%)        50Mi (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  200m (8%)     200m (8%)   542Mi (3%)       562Mi (4%)
Events:
  Type     Reason                Age                From                                                  Message
  ----     ------                ----               ----                                                  -------
  Warning  EvictionThresholdMet  1m (x854 over 3h)  kubelet, ip-172-31-74-247.us-east-2.compute.internal  Attempting to reclaim nodefs

Logging version: v3.9.13
OpenShift Master: v3.9.14 (online version 3.6.0.78)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.14
online-int cluster: project index names are missing in Kibana. [No space left on device] in one ES pod log:

Exception in thread "elasticsearch[logging-es-data-master-v5qaelaq][refresh][T#1]" [.operations.2018.03.14][[.operations.2018.03.14][0]] RefreshFailedEngineException[Refresh failed]; nested: IOException[No space left on device];
        at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:677)
        at org.elasticsearch.index.engine.InternalEngine$1.run(InternalEngine.java:481)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Env:
OpenShift Master: v3.9.14 (online version 3.6.0.90)
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.14
Logging images version: v3.9.14
Closing, as this env is up and down based on CD. Initial investigation shows there are bits (e.g. disk, node labels) that are out of sync, which precludes logging from being functional. I'm expecting this will be ironed out with prometheus alerts, etc.