Bug 1391803 - failed to "StartContainer" for "kibana-proxy" with CrashLoopBackOff
Summary: failed to "StartContainer" for "kibana-proxy" with CrashLoopBackOff
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1373611 1388031
Reported: 2016-11-04 05:19 UTC by Xia Zhao
Modified: 2017-03-08 18:43 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 12:49:23 UTC
Target Upstream Version:
Embargoed:


Attachments
es_log (19.41 KB, text/plain)
2016-11-04 05:48 UTC, Xia Zhao
fluentd_log (21.86 KB, text/plain)
2016-11-04 05:49 UTC, Xia Zhao
kibana_proxy_container_log (11.47 KB, text/plain)
2016-11-09 02:04 UTC, Xia Zhao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Comment 1 Xia Zhao 2016-11-04 05:21:25 UTC
This issue blocks the logging tests related with kibana.

Comment 2 Xia Zhao 2016-11-04 05:48:44 UTC
Created attachment 1217285
es_log

Comment 3 Xia Zhao 2016-11-04 05:49:07 UTC
Created attachment 1217286
fluentd_log

Comment 5 Jeff Cantrill 2016-11-04 12:51:08 UTC
Could you please run 'oadm diagnostics AggregatedLogging' to provide additional information?

Comment 9 Xia Zhao 2016-11-07 05:28:36 UTC
@Eric

Yes, I noticed from the logs that Kibana is in green status, but the URL cannot be reached when I visit the route (I reproduced this on 3 different OCP environments):

https://kibana.1104-xha.qe.rhcloud.com

503 Service Unavailable

No server is available to handle this request.

Meanwhile, the router pod in the default namespace (-n default) looked fine in my environment:
router-1-xm9uw             1/1       Running   0          2h

The bug title was updated to reflect the problems with Kibana.

Comment 12 ewolinet 2016-11-07 15:41:07 UTC
The kibana-proxy container looks to be the cause of the CrashLoopBackOff, which would explain why there may not be a service available to back the route.

I turned the proxy logging up to DEBUG and see the following:

...
master-url: https://kubernetes.default.svc.cluster.local
masterUrl: https://kubernetes.default.svc.cluster.local
master-ca: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
masterCa: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
transform: user_header,token_header
user-header: X-Proxy-Remote-User
userHeader: X-Proxy-Remote-User
$0: usr/lib/node_modules/openshift-auth-proxy/openshift-auth-proxy.js
debugLog: function () {
  if (this.debug) console.log.apply(console, arguments)
}
sessionSecret:
c0ujSWEWvIv313Gb0toRpyXQEbAbMxptRkmrdTlHxguWHaQCJzilJ6ilUGHpGnu9pR9JjsvaKHCdYJCaUb6jnBDFBYzFWpwHTsUyfry2hS4XNBtQI4DE0jw77n5xsZfKCEimOMSvKq9FbBTm2DQoDNSHwMYHrUuvbvDh0uGWRwiP4EW84Bl0Lc5soluCPE16JI4tfkmk
error: short write
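
Because the container keeps restarting, the fatal message usually sits at the end of the log of the previous (crashed) instance. The following is a minimal sketch of pulling that line out, not the exact steps used in this bug. The helper name last_crash_error is hypothetical, and it assumes the fatal line starts with "error:" as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the last line starting with "error:" from the
# log of the previous (crashed) instance of a container.
#   $1 = pod name, $2 = container name
last_crash_error() {
  oc logs "$1" -c "$2" --previous | grep '^error:' | tail -n 1
}

# Example (take the pod name from a live `oc get po` listing):
#   last_crash_error <kibana-pod> kibana-proxy
```

The same helper can be pointed at any other container that shows the same symptom.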

Comment 13 ewolinet 2016-11-07 15:51:29 UTC
I also see the 'error: short write' in the ES pod on this master.

...
[2016-11-07 06:00:04,733][INFO ][cluster.routing.allocation] [Empathoid] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.operations.2016.11.07][0]] ...]).
[2016-11-07 06:00:04,792][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,418][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,479][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,528][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
error: short write

Comment 14 Xia Zhao 2016-11-08 02:44:26 UTC
@Eric,

Yes, and I've reproduced this issue on another OCP 3.4.0 environment:

$ oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-876u9       1/1       Running            0          7m
logging-deployer-wyh4t        0/1       Completed          0          8m
logging-es-b211vv47-1-3jz2x   1/1       Running            0          7m
logging-fluentd-xpebx         1/1       Running            0          7m
logging-kibana-1-wss7m        1/2       CrashLoopBackOff   6          7m

$ oc describe po logging-kibana-1-wss7m
...
failed to "StartContainer" for "kibana-proxy" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kibana-proxy pod=logging-kibana-1-wss7m_xiazhao(16a7e6b3-a55c-11e6-83b9-fa163ede30cd)"
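
A pod in this state can be picked out of the `oc get po` listing mechanically. The following is a minimal sketch, not part of the original report: the helper name not_ready_pods is hypothetical, and it assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `oc get po` output on stdin and print the name,
# READY count and STATUS of every pod whose containers are not all ready.
# Pods in the Completed state (such as the deployer) are skipped.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
         split($2, ready, "/")          # READY looks like "1/2"
         if (ready[1] != ready[2]) print $1, $2, $3
       }'
}

# Usage against a live cluster (assumes you are logged in to the project):
#   oc get po | not_ready_pods
```

Any pod it reports can then be inspected with `oc describe po <name>` as above.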

Comment 15 Xia Zhao 2016-11-08 07:25:58 UTC
(In reply to ewolinet from comment #13)
> I also see the 'error: short write' in the ES pod on this master.

The OpenShift version on the OCP environment I provided here is slightly old. The short write issue has been verified as fixed with the latest puddle, which contains the bug fix: https://bugzilla.redhat.com/show_bug.cgi?id=1389617#c16

Comment 16 ewolinet 2016-11-08 16:34:02 UTC
Xia,

Can you update the value of "OAP_DEBUG" to "True" for dc/logging-kibana, redeploy it, and post the logs for the kibana-proxy container?
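
The request above could be carried out roughly as follows. This is a sketch under assumptions, not the exact commands used in this bug: it assumes the client provides `oc set env` and `oc rollout latest` (older clients used `oc env` and `oc deploy --latest`), and that the proxy container in the deployment config is named kibana-proxy, as in the pod events. The function name enable_proxy_debug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; run it while logged in to the logging project.
# Sets OAP_DEBUG=True on the kibana-proxy container of dc/logging-kibana,
# then starts a new deployment so the change takes effect.
enable_proxy_debug() {
  oc set env dc/logging-kibana -c kibana-proxy OAP_DEBUG=True &&
  oc rollout latest dc/logging-kibana
}

# Usage:
#   enable_proxy_debug
```

If the deployment config already redeploys on configuration changes, the second command may be unnecessary. Once the new pod is up, the proxy log can be read with `oc logs <pod> -c kibana-proxy`.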

Comment 17 Xia Zhao 2016-11-09 02:04:44 UTC
Created attachment 1218756
kibana_proxy_container_log

kibana_proxy_container_log attached

Comment 21 Xia Zhao 2016-11-10 06:03:42 UTC
Removed keyword TestBlocker after discussing with my manager.

Comment 22 Xia Zhao 2016-11-10 06:24:24 UTC
(In reply to Xia Zhao from comment #21)
> Removed keyword TestBlocker after discussing with my manager.

Since it is no longer blocking our tests, the previously blocked cases are now unblocked (with the workaround).

Comment 26 Xia Zhao 2016-11-14 06:40:51 UTC
Verified with image registry.ops.openshift.com/openshift3/logging-auth-proxy:3.4.0: the Kibana pods now start up and become ready, and the Kibana UI is accessible. Setting to VERIFIED:

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-kv9t5           1/1       Running     1          27m
logging-curator-ops-1-xainz       1/1       Running     1          27m
logging-deployer-g7bj5            0/1       Completed   0          29m
logging-es-kj4q1dos-1-9k9uh       1/1       Running     0          27m
logging-es-ops-vtxgj260-1-r53s9   1/1       Running     0          27m
logging-kibana-1-1smkj            2/2       Running     0          27m
logging-kibana-ops-1-054n3        2/2       Running     0          27m

Images tested:
openshift3/logging-deployer    08eaf2753130
openshift3/logging-auth-proxy    ec334b0c2669
openshift3/logging-elasticsearch    9b9452c0f8c2
openshift3/logging-kibana    7fc9916eea4d
openshift3/logging-curator    9af78fc06248

Comment 34 ewolinet 2016-12-12 15:45:40 UTC
Prerelease issue, no docs needed.

Comment 36 errata-xmlrpc 2017-01-18 12:49:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

