Bug 1391803
| Field | Value |
|---|---|
| Summary: | failed to "StartContainer" for "kibana-proxy" with CrashLoopBackOff |
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.4.0 |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Xia Zhao <xiazhao> |
| Assignee: | ewolinet |
| QA Contact: | Xia Zhao <xiazhao> |
| CC: | aos-bugs, ewolinet, jcantril, rmeggins, tdawson, xiazhao |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | No Doc Update |
| Type: | Bug |
| Last Closed: | 2017-01-18 12:49:23 UTC |
| Bug Blocks: | 1373611, 1388031 |
Comment 1
Xia Zhao — 2016-11-04 05:21:25 UTC

Created attachment 1217285 [details]
es_log

Created attachment 1217286 [details]
fluentd_log
Could you please run `oadm diagnostics AggregatedLogging` to provide additional information.

@Eric Yes, I noticed from the logs that Kibana is in green status, but the URL cannot be reached when I visit the route (I reproduced this on 3 different OCP envs): https://kibana.1104-xha.qe.rhcloud.com

```
503 Service Unavailable
No server is available to handle this request.
```

At the same time, the router pod in `-n default` looked fine on my env:

```
router-1-xm9uw   1/1       Running   0          2h
```

Bug title was updated to reflect the problems with Kibana. The kibana-proxy container looks to be the cause of the CrashLoopBackOff, which would explain why there may not be a service available to back the route. I turned the proxy logging up to DEBUG and see the following:

```
...
master-url: https://kubernetes.default.svc.cluster.local
masterUrl: https://kubernetes.default.svc.cluster.local
master-ca: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
masterCa: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
transform: user_header,token_header
user-header: X-Proxy-Remote-User
userHeader: X-Proxy-Remote-User
$0: usr/lib/node_modules/openshift-auth-proxy/openshift-auth-proxy.js
debugLog: function () { if (this.debug) console.log.apply(console, arguments) }
sessionSecret: c0ujSWEWvIv313Gb0toRpyXQEbAbMxptRkmrdTlHxguWHaQCJzilJ6ilUGHpGnu9pR9JjsvaKHCdYJCaUb6jnBDFBYzFWpwHTsUyfry2hS4XNBtQI4DE0jw77n5xsZfKCEimOMSvKq9FbBTm2DQoDNSHwMYHrUuvbvDh0uGWRwiP4EW84Bl0Lc5soluCPE16JI4tfkmk
error: short write
```

I also see the 'error: short write' in the ES pod on this master:

```
...
[2016-11-07 06:00:04,733][INFO ][cluster.routing.allocation] [Empathoid] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.operations.2016.11.07][0]] ...]).
[2016-11-07 06:00:04,792][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,418][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,479][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
[2016-11-07 06:00:20,528][INFO ][cluster.metadata         ] [Empathoid] [.operations.2016.11.07] update_mapping [com.redhat.viaq.common]
error: short write
```

@Eric, yes, and I've reproduced this issue on another OCP 3.4.0:

```
$ oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-876u9       1/1       Running            0          7m
logging-deployer-wyh4t        0/1       Completed          0          8m
logging-es-b211vv47-1-3jz2x   1/1       Running            0          7m
logging-fluentd-xpebx         1/1       Running            0          7m
logging-kibana-1-wss7m        1/2       CrashLoopBackOff   6          7m

$ oc describe po logging-kibana-1-wss7m
...
failed to "StartContainer" for "kibana-proxy" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kibana-proxy pod=logging-kibana-1-wss7m_xiazhao(16a7e6b3-a55c-11e6-83b9-fa163ede30cd)"
```

(In reply to ewolinet from comment #13)
> I also see the 'error: short write' in the ES pod on this master.
> [...]
> error: short write

The OpenShift version on my OCP provided here is a little bit old; the short write issue has been verified with the latest puddle, which contains the bug fix: https://bugzilla.redhat.com/show_bug.cgi?id=1389617#c16

Xia, can you update the value of "OAP_DEBUG" to "True" for dc/logging-kibana, redeploy it, and post the logs for the kibana-proxy container?

Created attachment 1218756 [details]
kibana_proxy_container_log
kibana_proxy_container_log attached
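The debug request above can be sketched roughly as follows. This is a hypothetical sequence, not taken from the report: it assumes a logged-in `oc` client with access to the logging project, and 3.x-era command spellings (older clients use `oc env` instead of `oc set env`); the pod name is a placeholder.

```shell
# Sketch (assumed commands): enable openshift-auth-proxy debug logging on the
# Kibana deployment config, redeploy, and capture the kibana-proxy logs.
oc project logging                           # or wherever the logging stack is deployed
oc set env dc/logging-kibana OAP_DEBUG=True  # older clients: oc env dc/logging-kibana OAP_DEBUG=True
oc deploy logging-kibana --latest            # trigger a redeploy (3.x-era syntax)
oc logs <new-kibana-pod> -c kibana-proxy > kibana_proxy_container_log
```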
Removed keyword TestBlocker after discussing with my manager.

(In reply to Xia Zhao from comment #21)
> Removed keyword TestBlocker after discussing with my manager.

Since it's not blocking our tests any more, the previously blocked cases are unblocked now (with the workaround).

Verified with image registry.ops.openshift.com/openshift3/logging-auth-proxy:3.4.0: the Kibana pods can start up and become ready, and the Kibana UI is accessible. Set to verified:

```
# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-kv9t5           1/1       Running     1          27m
logging-curator-ops-1-xainz       1/1       Running     1          27m
logging-deployer-g7bj5            0/1       Completed   0          29m
logging-es-kj4q1dos-1-9k9uh       1/1       Running     0          27m
logging-es-ops-vtxgj260-1-r53s9   1/1       Running     0          27m
logging-kibana-1-1smkj            2/2       Running     0          27m
logging-kibana-ops-1-054n3        2/2       Running     0          27m
```

Images tested:

```
openshift3/logging-deployer        08eaf2753130
openshift3/logging-auth-proxy      ec334b0c2669
openshift3/logging-elasticsearch   9b9452c0f8c2
openshift3/logging-kibana          7fc9916eea4d
openshift3/logging-curator         9af78fc06248
```

Prerelease issue, no docs needed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
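As an aside, the unhealthy pod in listings like the ones in this report can be picked out mechanically. A small sketch, using a sample listing copied from this report as stand-in input; on a live cluster you would pipe `oc get po` output into the same `awk` filter (hypothetical usage, not part of the original report):

```shell
# Flag pods that are not fully ready and not simply Completed -- a quick
# check for states like CrashLoopBackOff.
pods='NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-876u9       1/1       Running            0          7m
logging-deployer-wyh4t        0/1       Completed          0          8m
logging-es-b211vv47-1-3jz2x   1/1       Running            0          7m
logging-fluentd-xpebx         1/1       Running            0          7m
logging-kibana-1-wss7m        1/2       CrashLoopBackOff   6          7m'

printf '%s\n' "$pods" | awk 'NR > 1 {
  split($2, ready, "/")                      # READY column, e.g. "1/2"
  if ($3 != "Completed" && (ready[1] != ready[2] || $3 != "Running"))
    print $1, $3                             # pod name and its bad state
}'
# Prints: logging-kibana-1-wss7m CrashLoopBackOff
```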