This issue blocks the logging tests related to Kibana.
Created attachment 1217285 [details] es_log
Created attachment 1217286 [details] fluentd_log
Could you please run 'oadm diagnostics AggregatedLogging' to provide additional information?
@Eric Yes, I noticed from the logs that Kibana is in green status, but the URL cannot be reached when I visit the route (I reproduced this on 3 different OCP envs):

https://kibana.1104-xha.qe.rhcloud.com
503 Service Unavailable
No server is available to handle this request.

At the same time, the router pod in -n default looked fine on my env:

router-1-xm9uw   1/1   Running   0   2h

Bug title was updated to reflect the problems with Kibana.
The kibana-proxy container looks to be the cause of the CrashLoopBackOff, which would explain why there may not be a service available to back the route. I turned the proxy logging up to DEBUG and see the following:

...
master-url: https://kubernetes.default.svc.cluster.local
masterUrl: https://kubernetes.default.svc.cluster.local
master-ca: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
masterCa: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
transform: user_header,token_header
user-header: X-Proxy-Remote-User
userHeader: X-Proxy-Remote-User
$0: usr/lib/node_modules/openshift-auth-proxy/openshift-auth-proxy.js
debugLog: function () { if (this.debug) console.log.apply(console, arguments) }
sessionSecret: c0ujSWEWvIv313Gb0toRpyXQEbAbMxptRkmrdTlHxguWHaQCJzilJ6ilUGHpGnu9pR9JjsvaKHCdYJCaUb6jnBDFBYzFWpwHTsUyfry2hS4XNBtQI4DE0jw77n5xsZfKCEimOMSvKq9FbBTm2DQoDNSHwMYHrUuvbvDh0uGWRwiP4EW84Bl0Lc5soluCPE16JI4tfkmk
error: short write
I also see the 'error: short write' in the ES pod on this master.

...
[2016-11-07 06:00:04,733][INFO ][cluster.routing.allocation] [Empathoid] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.operations.2016.11.07][0]] ...]).
[2016-11-07 06:00:04,792][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,418][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,479][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,528][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
error: short write
@Eric, Yes, and I've reproduced this issue on another OCP 3.4.0 env:

$ oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-876u9       1/1       Running            0          7m
logging-deployer-wyh4t        0/1       Completed          0          8m
logging-es-b211vv47-1-3jz2x   1/1       Running            0          7m
logging-fluentd-xpebx         1/1       Running            0          7m
logging-kibana-1-wss7m        1/2       CrashLoopBackOff   6          7m

$ oc describe po logging-kibana-1-wss7m
...
failed to "StartContainer" for "kibana-proxy" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kibana-proxy pod=logging-kibana-1-wss7m_xiazhao(16a7e6b3-a55c-11e6-83b9-fa163ede30cd)"
(In reply to ewolinet from comment #13)
> I also see the 'error: short write' in the ES pod on this master.
>
> ...
> [2016-11-07 06:00:04,733][INFO ][cluster.routing.allocation] [Empathoid] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.operations.2016.11.07][0]] ...]).
> [2016-11-07 06:00:04,792][INFO ][cluster.metadata ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
> [2016-11-07 06:00:20,418][INFO ][cluster.metadata ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
> [2016-11-07 06:00:20,479][INFO ][cluster.metadata ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
> [2016-11-07 06:00:20,528][INFO ][cluster.metadata ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
> error: short write

The OpenShift version on the OCP env I provided here is a little bit old; the short write issue has been verified with the latest puddle, which contains the bug fix: https://bugzilla.redhat.com/show_bug.cgi?id=1389617#c16
Xia, Can you update the value of "OAP_DEBUG" to "True" for dc/logging-kibana, redeploy it, and post the logs for the kibana-proxy container?
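The request above can be sketched as the following `oc` commands. This is a minimal sketch, assuming the standard `oc` client against the logging project; the pod label selector `component=kibana` is an assumption about how the deployer labels the Kibana pods, and the placeholder pod name must be substituted from the `oc get pods` output:

```shell
# Set OAP_DEBUG=True on the Kibana deployment config; with the default
# config-change trigger this rolls out a new deployment automatically.
oc set env dc/logging-kibana OAP_DEBUG=True

# Find the new Kibana pod (label selector is an assumption; plain
# `oc get pods` works too).
oc get pods -l component=kibana

# Collect logs from the kibana-proxy container of that pod
# (replace <kibana-pod-name> with the actual pod name).
oc logs <kibana-pod-name> -c kibana-proxy
```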
Created attachment 1218756 [details] kibana_proxy_container_log kibana_proxy_container_log attached
Removed keyword TestBlocker after discussing with my manager.
(In reply to Xia Zhao from comment #21)
> Removed keyword TestBlocker after discussing with my manager.

Since it's not blocking our tests any more, the previously blocked cases are unblocked now (with the workaround).
Verified with image registry.ops.openshift.com/openshift3/logging-auth-proxy:3.4.0: the Kibana pods can start up and become ready now, and the Kibana UI is accessible. Setting to verified.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-kv9t5           1/1       Running     1          27m
logging-curator-ops-1-xainz       1/1       Running     1          27m
logging-deployer-g7bj5            0/1       Completed   0          29m
logging-es-kj4q1dos-1-9k9uh       1/1       Running     0          27m
logging-es-ops-vtxgj260-1-r53s9   1/1       Running     0          27m
logging-kibana-1-1smkj            2/2       Running     0          27m
logging-kibana-ops-1-054n3        2/2       Running     0          27m

Images tested:
openshift3/logging-deployer        08eaf2753130
openshift3/logging-auth-proxy      ec334b0c2669
openshift3/logging-elasticsearch   9b9452c0f8c2
openshift3/logging-kibana          7fc9916eea4d
openshift3/logging-curator         9af78fc06248
Prerelease issue, no docs needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066