This can be added as a new subheading under Troubleshooting Kibana.
@ewolinet We usually use the kibana-hostname and kibana-ops-hostname parameters to deploy logging stacks:

kibana-hostname=kibana.$SUBDOMAIN
kibana-ops-hostname=kibana-ops.$SUBDOMAIN

The $SUBDOMAIN value can be retrieved with:

$ grep subdomain /etc/origin/master/master-config.yaml

If the customer uses an F5 load balancer, the subdomain value is empty in master-config.yaml, so how can we deploy logging with kibana-hostname and kibana-ops-hostname in that case?
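For reference, a minimal sketch of the usual flow, assuming the ConfigMap-based deployer described in the 3.3/3.4 docs; apps.example.com is a placeholder subdomain, not a value from this report:

$ grep subdomain /etc/origin/master/master-config.yaml
$ oc create configmap logging-deployer \
    --from-literal kibana-hostname=kibana.apps.example.com \
    --from-literal kibana-ops-hostname=kibana-ops.apps.example.com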
Verified again on OCP 3.4 with logging 3.3.0. The Kibana UI could not be opened at first; after applying the workaround, the Kibana UI can be accessed and logs are shown. https://access.redhat.com/solutions/2774691 lists additional info: "Error: UnknownHostException[No trusted proxies]". I did not see this error in my trace, but the Kibana UI could not be accessed; after applying the workaround from https://docs.openshift.com/container-platform/3.3/install_config/aggregate_logging.html#troubleshooting-kibana, the Kibana UI can be accessed. The error trace is below:

Kibana: Unknown error while connecting to Elasticsearch
Error: Unknown error while connecting to Elasticsearch
ErrorAbstract@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:85025:19
StatusCodeError@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:85174:5
respond@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:86367:15
checkRespForFailure@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:86335:7
[24]</AngularConnector.prototype.request/<@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:84973:7
qFactory/defer/deferred.promise.then/wrappedErrback@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:20902:31
qFactory/defer/deferred.promise.then/wrappedErrback@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:20902:31
qFactory/defer/deferred.promise.then/wrappedErrback@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:20902:31
qFactory/createInternalRejectedPromise/<.then/<@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:21035:29
$RootScopeProvider/this.$get</Scope.prototype.$eval@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:22022:16
$RootScopeProvider/this.$get</Scope.prototype.$digest@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:21834:15
$RootScopeProvider/this.$get</Scope.prototype.$apply@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:22126:13
done@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:17661:34
completeRequest@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:17875:7
createHttpBackend/</xhr.onreadystatechange@https://kibana.1215-x1e.qe.rhcloud.com/index.js?_b=7668:17814:11

Image IDs:
openshift3/logging-kibana f049d51b3f87
openshift3/logging-fluentd d8aa2d33d02c
openshift3/logging-elasticsearch 10e45bf2a923
openshift3/logging-auth-proxy b7dfb8ce3e4a
openshift3/logging-curator 39ef2f42b595

F5 load balancer version 12.1.1

Setting this issue to VERIFIED and closing it.
It's been brought to my attention that these steps are not quite clear:

>Scale down all Fluentd pods.

and

>Scale up all Fluentd pods.

As written in the docs, they are ambiguous now that Fluentd is deployed as a daemonset. It should be as simple as noting that "scaling down" means:

$ oc delete daemonset logging-fluentd

and "scaling up" means:

$ oc new-app logging-fluentd-template

These steps appear elsewhere on the docs page, but it would be safest to add them to this section as well.
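A rough sketch of how the full sequence could read in that section, using the daemonset and template names above; the maintenance step in the middle is a placeholder, not part of this report:

# "Scale down" all Fluentd pods by removing the daemonset
$ oc delete daemonset logging-fluentd
# ...perform the maintenance the troubleshooting section describes...
# "Scale up" again by recreating the daemonset from the template
$ oc new-app logging-fluentd-template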
Is that the approach we want to take to scale daemonset pods, or would we rather label and unlabel nodes? I agree that this is probably too vague, but we should agree on a method for unscheduling the pods.

I know the perf team has seen issues when we try to deploy 200+ daemonset pods at once, and deleting the daemonset definition and recreating it may run into the same problem. Their performance docs note this and suggest that customers label nodes in batches to avoid it:

https://docs.openshift.org/latest/install_config/aggregate_logging_sizing.html#install-config-aggregate-logging-sizing-guidelines-large-cluster-installation
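For reference, a sketch of the label/unlabel approach, assuming the Fluentd daemonset selects nodes on logging-infra-fluentd=true as in the aggregate logging docs; <node-name> is a placeholder:

# unschedule Fluentd from a node by removing the label the daemonset selects on
$ oc label node <node-name> logging-infra-fluentd-
# reschedule Fluentd on that node by re-applying the label
$ oc label node <node-name> logging-infra-fluentd=true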
I would generally defer to you on this, although an easy way to batch-label nodes would be a nice feature to make this less of a headache and less prone to mistakes. In a cluster with 200 nodes, you'd obviously not want to do this manually. A script or an Ansible playbook using "batched" host files could work, but it would be ideal if we had a concrete suggestion to give. Either way, if we go the node-labeling route, we should specify that in this section. :)
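A hypothetical sketch of batched labeling, not an official recommendation; the batch size of 20, the 60-second pause, and the logging-infra-fluentd=true label are all assumptions:

$ oc get nodes -o name | xargs -n 20 sh -c '
    for node in "$@"; do
      oc label "$node" logging-infra-fluentd=true --overwrite
    done
    # pause between batches so daemonset pods come up gradually
    sleep 60' _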
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1235