Description of problem:

OpenShift Container Platform 4.6.26 with clusterlogging.4.6.0-202106010807.p0.git.e091c13 and elasticsearch-operator.4.6.0-202106010807.p0.git.c07c7ab fails to serve connections via the https://elasticsearch.openshift-logging.svc:9200 service. As a result, fluentd is unable to push logs to elasticsearch and kibana is unable to access logs. Running commands directly on `elasticsearch` works fine, and the `elasticsearch` cluster itself also appears healthy. Running curl from `elasticsearch-proxy` against https://localhost:9200 also works. But everything going through https://elasticsearch.openshift-logging.svc:9200 is stuck, either hanging or returning an error (see the verification sketch below).

Version-Release number of selected component (if applicable):
- clusterlogging.4.6.0-202106010807.p0.git.e091c13
- elasticsearch-operator.4.6.0-202106010807.p0.git.c07c7ab

How reproducible:
- N/A

Steps to Reproduce:
1. N/A

Actual results:

Logs from `kibana` show the following error messages:

["status","plugin:elasticsearch.1","error"],"pid":117,"state":"red","message":"Status changed from green to red - Request Timeout after 3000ms","prevState":"green","prevMsg":"Ready"}

Elasticsearch ERROR: 2021-06-10T09:08:49Z
Error: Request error, retrying
GET https://elasticsearch.openshift-logging.svc:9200/_opendistro/_security/tenantinfo => socket hang up
    at Log.error (/opt/app-root/src/node_modules/elasticsearch/src/lib/log.js:226:56)
    at checkRespForFailure (/opt/app-root/src/node_modules/elasticsearch/src/lib/transport.js:259:18)
    at HttpConnector.<anonymous> (/opt/app-root/src/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
    at ClientRequest.wrapper (/opt/app-root/src/node_modules/elasticsearch/node_modules/lodash/lodash.js:4935:19)
    at ClientRequest.emit (events.js:198:13)
    at TLSSocket.socketCloseListener (_http_client.js:373:11)
    at TLSSocket.emit (events.js:203:15)
    at _handle.close (net.js:607:12)
    at TCP.done (_tls_wrap.js:400:7)

{"type":"log","@timestamp":"2021-06-10T09:09:31Z","tags":["status","plugin:elasticsearch.1","info"],"pid":117,"state":"green","message":"Status changed from red to green - Ready","prevState":"red","prevMsg":"Request Timeout after 3000ms"}

Elasticsearch ERROR: 2021-06-10T09:09:42Z
Error: Request error, retrying
GET https://elasticsearch.openshift-logging.svc:9200/_opendistro/_security/authinfo => socket hang up
    at Log.error (/opt/app-root/src/node_modules/elasticsearch/src/lib/log.js:226:56)
    at checkRespForFailure (/opt/app-root/src/node_modules/elasticsearch/src/lib/transport.js:259:18)
    at HttpConnector.<anonymous> (/opt/app-root/src/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
    at ClientRequest.wrapper (/opt/app-root/src/node_modules/elasticsearch/node_modules/lodash/lodash.js:4935:19)
    at ClientRequest.emit (events.js:198:13)
    at TLSSocket.socketCloseListener (_http_client.js:373:11)
    at TLSSocket.emit (events.js:203:15)
    at _handle.close (net.js:607:12)
    at TCP.done (_tls_wrap.js:400:7)

{"type":"log","@timestamp":"2021-06-10T09:09:52Z","tags":["status","plugin:elasticsearch.1","error"],"pid":117,"state":"red","message":"Status changed from green to red - Request Timeout after 3000ms","prevState":"green","prevMsg":"Ready"}
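For reference, the checks described above can be reproduced roughly as follows. This is a minimal sketch: the pod and container names are placeholders, and the certificate paths assume the default admin-cert secret mount used by the elasticsearch-operator deployment.

# Direct request inside the elasticsearch container (works):
$ oc exec -n openshift-logging <elasticsearch-pod> -c elasticsearch -- \
    curl -sk --cert /etc/elasticsearch/secret/admin-cert \
         --key /etc/elasticsearch/secret/admin-key \
         'https://localhost:9200/_cluster/health?pretty'

# Same request, but through the service address (hangs or errors):
$ oc exec -n openshift-logging <elasticsearch-pod> -c elasticsearch -- \
    curl -sk --cert /etc/elasticsearch/secret/admin-cert \
         --key /etc/elasticsearch/secret/admin-key \
         'https://elasticsearch.openshift-logging.svc:9200/_cluster/health?pretty'

# From the proxy container to localhost (connectivity check only; may
# still return an auth error without client certificates):
$ oc exec -n openshift-logging <elasticsearch-pod> -c proxy -- \
    curl -sk 'https://localhost:9200/'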
time="2021-06-11T12:07:49Z" level=info msg="mapping path \"/\" => upstream \"https://localhost:9200/\"" 2021/06/11 12:07:54 http: proxy error: dial tcp [::1]:9200: connect: connection refused 2021/06/11 12:08:13 http: proxy error: dial tcp [::1]:9200: connect: connection refused 2021/06/11 12:08:23 http: proxy error: dial tcp [::1]:9200: connect: connection refused 2021/06/11 12:09:23 http: proxy error: context canceled 2021/06/11 12:12:53 http: proxy error: context canceled 2021/06/11 12:13:03 http: proxy error: context canceled 2021/06/11 12:13:23 http: proxy error: context canceled 2021/06/11 12:13:33 http: proxy error: context canceled 2021/06/11 12:13:34 http: proxy error: read tcp 10.94.17.22:60000->10.94.59.4:58224: i/o timeout 2021/06/11 12:13:43 http: proxy error: read tcp 10.94.17.22:60000->10.94.83.9:41504: i/o timeout 2021/06/11 12:13:43 http: proxy error: context canceled 2021/06/11 12:16:33 http: proxy error: context canceled time="2021-06-11T12:16:36Z" level=info msg="Error processing request in handler authorization: Unable to determine username" 2021/06/11 12:16:53 http: proxy error: context canceled 2021/06/11 12:17:03 http: proxy error: context canceled 2021/06/11 12:17:24 http: proxy error: context canceled 2021/06/11 12:17:33 http: proxy error: context canceled 2021/06/11 12:17:53 http: proxy error: context canceled 2021/06/11 12:18:03 http: proxy error: context canceled 2021/06/11 12:18:23 http: proxy error: context canceled 2021/06/11 12:18:33 http: proxy error: context canceled 2021/06/11 12:18:53 http: proxy error: context canceled 2021/06/11 12:19:03 http: proxy error: context canceled 2021/06/11 12:19:23 http: proxy error: context canceled 2021/06/11 12:19:33 http: proxy error: context canceled 2021/06/11 12:19:36 http: proxy error: context canceled 2021/06/11 12:19:53 http: proxy error: context canceled 2021/06/11 12:20:03 http: proxy error: context canceled 2021/06/11 12:20:23 http: proxy error: context canceled Expected results: No error being reported and access to elasticsearch and thus overall OpenShift Container Platform 4 - Logging to work as expected. Additional info:
Putting this back to medium Severity/Priority as the affected clusters recovered. Further investigation is ongoing by support.
@sreber Do you have any outcome from the header investigation on the customer cluster?
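(For context, a sketch of the kind of request that exercises the proxy's authorization-header handling, relevant to the "Unable to determine username" errors above. The bearer-token auth path is an assumption about the proxy configuration, and the pod name is a placeholder.)

# Send a token-authenticated request through the service while watching the
# elasticsearch-proxy logs for "Unable to determine username":
$ TOKEN=$(oc whoami -t)
$ oc exec -n openshift-logging <kibana-pod> -c kibana -- \
    curl -skv -H "Authorization: Bearer ${TOKEN}" \
    'https://elasticsearch.openshift-logging.svc:9200/_cat/health'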