Bug 1945168

Summary: Elasticsearch hostname is not resolved by fluentd and kibana when an internal domain is used
Product: OpenShift Container Platform Reporter: Dhruv Gautam <dgautam>
Component: Logging    Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: high Docs Contact:
Priority: medium    
Version: 4.6    CC: aos-bugs, ewolinet, ocasalsa, periklis
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: logging-exploration
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Cause: The domain for accessing Elasticsearch was hardcoded with `.cluster.local` in EO and CLO. Consequence: Clusters with custom DNS configurations could not resolve the domain, and in turn Elasticsearch remained inaccessible to Fluentd and Kibana. Fix: The Elasticsearch domain configuration now uses `.svc`, as per convention, which local DNS can resolve appropriately. Result: The Elasticsearch instance is now accessible both with and without a custom DNS configuration. (A minimal illustrative sketch of the change follows after this header.)
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-12 12:16:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
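
The fix described in the Doc Text above amounts to truncating the hardcoded service FQDN at the `.svc` label. The following is a minimal, hypothetical Go sketch of that idea; the function name and call site are illustrative and are not the actual EO/CLO source:

package main

import "fmt"

// elasticsearchEndpoint builds the in-cluster URL of the Elasticsearch
// service. Stopping at the ".svc" suffix (instead of a hardcoded
// ".svc.cluster.local") lets cluster DNS expand the name to whatever
// clusterDomain is actually configured.
func elasticsearchEndpoint(service, namespace string, port int) string {
	return fmt.Sprintf("https://%s.%s.svc:%d", service, namespace, port)
}

func main() {
	// Prints: https://elasticsearch.openshift-logging.svc:9200
	fmt.Println(elasticsearchEndpoint("elasticsearch", "openshift-logging", 9200))
}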

Description Dhruv Gautam 2021-03-31 12:56:02 UTC
Description of problem:
After deploying the logging stack, fluentd and kibana are unable to resolve the elasticsearch hostname

Fluentd logs :
2021-03-28T04:25:16.095607233Z 2021-03-28 04:25:16 +0000 [warn]: [default] failed to flush the buffer. retry_time=3 next_retry_seconds=2021-03-28 04:25:20 +0000 chunk="5be8a1fd530465ad135bed371a2c729a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\"}): Couldn't resolve host name"

Kibana logs:
2021-03-28T04:24:50.963666757Z {"type":"log","@timestamp":"2021-03-28T04:24:50Z","tags":["warning","elasticsearch","admin"],"pid":117,"message":"No living connections"}
2021-03-28T04:24:53.469808289Z {"type":"log","@timestamp":"2021-03-28T04:24:53Z","tags":["warning","elasticsearch","admin"],"pid":117,"message":"Unable to revive connection: https://elasticsearch.openshift-logging.svc.cluster.local:9200/"}


Version-Release number of selected component (if applicable):
4.7

How reproducible:
100%

Steps to Reproduce:
1. Use an internal (custom) cluster domain for RHOCP 4.7
2. Install Red Hat OpenShift Logging
3. Check fluentd and kibana pod logs

Actual results:
Fluentd and kibana are not able to resolve elasticsearch's hostname

Expected results:
Fluentd and kibana should be able to resolve the elasticsearch hostname

Additional info:

Comment 2 Dhruv Gautam 2021-04-08 15:02:52 UTC
Hi Team

Can you share the steps to work around the issue?
I will get it updated in the KCS solution.

Comment 6 Anping Li 2021-05-06 08:46:10 UTC
Changed the cluster domain to dev.test in two steps: 1) update clusterDomain in the nodes' /etc/kubernetes/kubelet.conf; 2) update the CoreDNS configmap to serve the new clusterDomain (illustrative excerpts of both changes follow below).
Verified on clusterlogging.4.6.0-202104300142.p0, elasticsearch-operator.4.6.0-202104302046.p0.
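
For reference, the two changes roughly look like the following excerpts. These are illustrative only: the exact kubelet configuration layout and the CoreDNS Corefile (managed via the dns-default configmap by the DNS operator) may differ on a given cluster.

/etc/kubernetes/kubelet.conf (excerpt):

    clusterDomain: dev.test

CoreDNS Corefile (excerpt, kubernetes plugin stanza serving the new domain):

    kubernetes dev.test in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }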

Comment 8 errata-xmlrpc 2021-05-12 12:16:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.28 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1489