+++ This bug was initially created as a clone of Bug #1841832 +++

Description of problem:
The elasticsearch-operator (EO) can't access the Elasticsearch service:

time="2020-05-29T13:28:07Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: dial tcp 172.30.212.255:9200: i/o timeout"

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Reproducible by upgrading the 4.4 CLO and EO with a CLO instance created.

Steps to Reproduce:
1. Install 4.4 CLO and EO
2. Create the CLO's CR
3. Upgrade the EO to 4.5

Actual results:
The Elasticsearch cluster can't start.

Expected results:
The Elasticsearch cluster is upgraded, up, and running.

Additional info:

--- Additional comment from ikarpukh on 2020-05-29 15:27:47 UTC ---

EO LOGS:

time="2020-05-29T13:28:07Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: dial tcp 172.30.212.255:9200: i/o timeout"
time="2020-05-29T13:29:43Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:29:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-1: Node elasticsearch-cdm-tudtgwxx-1 has not rejoined cluster elasticsearch yet"
time="2020-05-29T13:30:13Z" level=info msg="Waiting for cluster to be recovered before upgrading elasticsearch-cdm-tudtgwxx-2: / [yellow green]"
time="2020-05-29T13:30:13Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-2: Cluster not in at least yellow state before beginning upgrade: "
time="2020-05-29T13:30:43Z" level=info msg="Waiting for cluster to be recovered before upgrading elasticsearch-cdm-tudtgwxx-3: / [yellow green]"
time="2020-05-29T13:30:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-3: Cluster not in at least yellow state before beginning upgrade: "
time="2020-05-29T13:33:45Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:36:47Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:39:49Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:42:51Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:45:54Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:48:56Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:51:58Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"

OPENSHIFT-LOGGING NAMESPACE STATUS:

NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-7d7bbc88f5-vvv75       1/1     Running     0          72m
curator-1590759000-7jkdq                        1/1     Running     0          56m
curator-1590760200-9pfq2                        1/1     Running     0          36m
curator-1590762000-fm7t7                        0/1     Completed   0          6m34s
elasticsearch-cdm-tudtgwxx-1-85d69b58fc-cc5dg   1/2     Running     0          3m5s
elasticsearch-cdm-tudtgwxx-2-b888c4d44-djvnf    1/2     Running     0          2m22s
elasticsearch-cdm-tudtgwxx-3-69db67bd47-8hdpt   1/2     Running     0          34s
fluentd-7mblm                                   1/1     Running     0          71m
fluentd-82vkv                                   1/1     Running     0          71m
fluentd-qvkkd                                   1/1     Running     0          71m
fluentd-smhcc                                   1/1     Running     0          71m
fluentd-tfjqj                                   1/1     Running     0          71m
fluentd-zcwl7                                   1/1     Running     0          71m
kibana-7676965bcf-dn65g                         2/2     Running     0          71m

--- Additional comment from anli on 2020-06-03 04:23:33 UTC ---

After I deleted all ES pods, the new ES pods became running. What is the root cause?
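For anyone who needs the same workaround, a minimal sketch of the delete-and-recreate step, assuming the ES pods carry the component=elasticsearch label referenced in the NetworkPolicy below:

  # Delete all Elasticsearch pods at once; the operator's deployments recreate them:
  oc delete pod -l component=elasticsearch -n openshift-logging

  # Watch the replacements come up and reach 2/2 Ready:
  oc get pods -n openshift-logging -w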
Namespace status after the pods were recreated:

NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-98f5c5fd-4pw4x         1/1     Running     0          20m
curator-1591153800-zb8lb                        0/1     Completed   0          66m
curator-1591157400-6gdbs                        0/1     Error       0          6m45s
elasticsearch-cdm-knaloezd-1-7bbdf76f85-z2pqv   2/2     Running     0          22m
elasticsearch-cdm-knaloezd-2-5d9d75f6fb-qs7gc   2/2     Running     0          22m
elasticsearch-cdm-knaloezd-3-86dd567d7-b2vlz    2/2     Running     0          22m
elasticsearch-delete-app-1591157700-4gmcl       0/1     Completed   0          112s
elasticsearch-delete-audit-1591157700-tm2lh     0/1     Completed   0          112s
elasticsearch-delete-infra-1591157700-jlhmg     0/1     Completed   0          112s
elasticsearch-rollover-app-1591157700-5l9rh     0/1     Error       0          112s
elasticsearch-rollover-audit-1591157700-sm7rm   0/1     Error       0          112s
elasticsearch-rollover-infra-1591157700-78mpt   0/1     Error       0          112s

--- Additional comment from anli on 2020-06-03 09:06:19 UTC ---

This NetworkPolicy works for me:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restricted-es-policy
  namespace: openshift-logging
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          openshift.io/cluster-logging: "true"
      podSelector:
        matchLabels:
          name: elasticsearch-operator
    - podSelector:
        matchLabels:
          component: elasticsearch
  podSelector:
    matchLabels:
      component: elasticsearch
  policyTypes:
  - Ingress
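Applying and then re-checking connectivity could look like the following sketch. The file name is arbitrary, and the exec'd pod name is taken from the listing above; es_util is the query helper shipped in the OpenShift ES image:

  # Apply the policy and confirm it exists:
  oc apply -f restricted-es-policy.yaml
  oc get networkpolicy restricted-es-policy -n openshift-logging

  # Confirm the EO can reach ES again by checking cluster health from an ES pod:
  oc exec -n openshift-logging -c elasticsearch elasticsearch-cdm-knaloezd-1-7bbdf76f85-z2pqv -- es_util --query=_cluster/health?pretty=true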
Verified on elasticsearch-operator.4.5.0-202006091957
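A quick way to confirm the installed EO build matches the fixed version (the namespace is an assumption; the EO is typically installed in openshift-operators-redhat):

  oc get csv -n openshift-operators-redhat | grep elasticsearch-operator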
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409