Description of problem:

OpenShift Container Platform 4.6.26 with clusterlogging.4.6.0-202106010807.p0.git.e091c13 and elasticsearch-operator.4.6.0-202106010807.p0.git.c07c7ab fails to serve connections via the https://elasticsearch.openshift-logging.svc:9200 service. As a result, fluentd is unable to push logs to elasticsearch and kibana is unable to access logs. Running commands directly on `elasticsearch` works fine, and the `elasticsearch` cluster itself also appears healthy. Running curl from `elasticsearch-proxy` against https://localhost:9200 also works. But everything going through https://elasticsearch.openshift-logging.svc:9200 is stuck, either hanging or returning an error (see the verification sketch below).

Version-Release number of selected component (if applicable):
- clusterlogging.4.6.0-202106010807.p0.git.e091c13
- elasticsearch-operator.4.6.0-202106010807.p0.git.c07c7ab

How reproducible:
- N/A

Steps to Reproduce:
1. N/A

Actual results:

Logs from `kibana` show the following error messages:

["status","plugin:elasticsearch.1","error"],"pid":117,"state":"red","message":"Status changed from green to red - Request Timeout after 3000ms","prevState":"green","prevMsg":"Ready"}

Elasticsearch ERROR: 2021-06-10T09:08:49Z
Error: Request error, retrying
GET https://elasticsearch.openshift-logging.svc:9200/_opendistro/_security/tenantinfo => socket hang up
    at Log.error (/opt/app-root/src/node_modules/elasticsearch/src/lib/log.js:226:56)
    at checkRespForFailure (/opt/app-root/src/node_modules/elasticsearch/src/lib/transport.js:259:18)
    at HttpConnector.<anonymous> (/opt/app-root/src/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
    at ClientRequest.wrapper (/opt/app-root/src/node_modules/elasticsearch/node_modules/lodash/lodash.js:4935:19)
    at ClientRequest.emit (events.js:198:13)
    at TLSSocket.socketCloseListener (_http_client.js:373:11)
    at TLSSocket.emit (events.js:203:15)
    at _handle.close (net.js:607:12)
    at TCP.done (_tls_wrap.js:400:7)

{"type":"log","@timestamp":"2021-06-10T09:09:31Z","tags":["status","plugin:elasticsearch.1","info"],"pid":117,"state":"green","message":"Status changed from red to green - Ready","prevState":"red","prevMsg":"Request Timeout after 3000ms"}

Elasticsearch ERROR: 2021-06-10T09:09:42Z
Error: Request error, retrying
GET https://elasticsearch.openshift-logging.svc:9200/_opendistro/_security/authinfo => socket hang up
    at Log.error (/opt/app-root/src/node_modules/elasticsearch/src/lib/log.js:226:56)
    at checkRespForFailure (/opt/app-root/src/node_modules/elasticsearch/src/lib/transport.js:259:18)
    at HttpConnector.<anonymous> (/opt/app-root/src/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
    at ClientRequest.wrapper (/opt/app-root/src/node_modules/elasticsearch/node_modules/lodash/lodash.js:4935:19)
    at ClientRequest.emit (events.js:198:13)
    at TLSSocket.socketCloseListener (_http_client.js:373:11)
    at TLSSocket.emit (events.js:203:15)
    at _handle.close (net.js:607:12)
    at TCP.done (_tls_wrap.js:400:7)

{"type":"log","@timestamp":"2021-06-10T09:09:52Z","tags":["status","plugin:elasticsearch.1","error"],"pid":117,"state":"red","message":"Status changed from green to red - Request Timeout after 3000ms","prevState":"green","prevMsg":"Ready"}
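For reference, the checks described above can be reproduced roughly as follows. This is a minimal sketch: the pod and container names are placeholders, and the certificate paths assume the default admin-cert secret mount used by the elasticsearch-operator deployment.

# Direct request inside the elasticsearch container (works):
$ oc exec -n openshift-logging <elasticsearch-pod> -c elasticsearch -- \
    curl -sk --cert /etc/elasticsearch/secret/admin-cert \
         --key /etc/elasticsearch/secret/admin-key \
         'https://localhost:9200/_cluster/health?pretty'

# Same request, but through the service address (hangs or errors):
$ oc exec -n openshift-logging <elasticsearch-pod> -c elasticsearch -- \
    curl -sk --cert /etc/elasticsearch/secret/admin-cert \
         --key /etc/elasticsearch/secret/admin-key \
         'https://elasticsearch.openshift-logging.svc:9200/_cluster/health?pretty'

# From the proxy container to localhost (connectivity check only; may
# still return an auth error without client certificates):
$ oc exec -n openshift-logging <elasticsearch-pod> -c proxy -- \
    curl -sk 'https://localhost:9200/'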
time="2021-06-11T12:07:49Z" level=info msg="mapping path \"/\" => upstream \"https://localhost:9200/\"" 2021/06/11 12:07:54 http: proxy error: dial tcp [::1]:9200: connect: connection refused 2021/06/11 12:08:13 http: proxy error: dial tcp [::1]:9200: connect: connection refused 2021/06/11 12:08:23 http: proxy error: dial tcp [::1]:9200: connect: connection refused 2021/06/11 12:09:23 http: proxy error: context canceled 2021/06/11 12:12:53 http: proxy error: context canceled 2021/06/11 12:13:03 http: proxy error: context canceled 2021/06/11 12:13:23 http: proxy error: context canceled 2021/06/11 12:13:33 http: proxy error: context canceled 2021/06/11 12:13:34 http: proxy error: read tcp 10.94.17.22:60000->10.94.59.4:58224: i/o timeout 2021/06/11 12:13:43 http: proxy error: read tcp 10.94.17.22:60000->10.94.83.9:41504: i/o timeout 2021/06/11 12:13:43 http: proxy error: context canceled 2021/06/11 12:16:33 http: proxy error: context canceled time="2021-06-11T12:16:36Z" level=info msg="Error processing request in handler authorization: Unable to determine username" 2021/06/11 12:16:53 http: proxy error: context canceled 2021/06/11 12:17:03 http: proxy error: context canceled 2021/06/11 12:17:24 http: proxy error: context canceled 2021/06/11 12:17:33 http: proxy error: context canceled 2021/06/11 12:17:53 http: proxy error: context canceled 2021/06/11 12:18:03 http: proxy error: context canceled 2021/06/11 12:18:23 http: proxy error: context canceled 2021/06/11 12:18:33 http: proxy error: context canceled 2021/06/11 12:18:53 http: proxy error: context canceled 2021/06/11 12:19:03 http: proxy error: context canceled 2021/06/11 12:19:23 http: proxy error: context canceled 2021/06/11 12:19:33 http: proxy error: context canceled 2021/06/11 12:19:36 http: proxy error: context canceled 2021/06/11 12:19:53 http: proxy error: context canceled 2021/06/11 12:20:03 http: proxy error: context canceled 2021/06/11 12:20:23 http: proxy error: context canceled Expected results: No error being reported and access to elasticsearch and thus overall OpenShift Container Platform 4 - Logging to work as expected. Additional info:
Putting this back to medium Severity/Priority as the affected clusters recovered. Further investigation is ongoing by support.
@sreber Do you have any outcome from the header investigation on the customer cluster?
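(For context, a sketch of the kind of request that exercises the proxy's authorization-header handling, relevant to the "Unable to determine username" errors above. The bearer-token auth path is an assumption about the proxy configuration, and the pod name is a placeholder.)

# Send a token-authenticated request through the service while watching the
# elasticsearch-proxy logs for "Unable to determine username":
$ TOKEN=$(oc whoami -t)
$ oc exec -n openshift-logging <kibana-pod> -c kibana -- \
    curl -skv -H "Authorization: Bearer ${TOKEN}" \
    'https://elasticsearch.openshift-logging.svc:9200/_cat/health'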