
Bug 1843715

Summary: EO operator can't talk to itself after 4.4 -> 4.5 upgrade
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Logging
Assignee: ewolinet
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Priority: unspecified
Version: 4.5
CC: aos-bugs, jcantril, scuppett
Keywords: TestBlocker, Upgrades
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-07-13 17:43:10 UTC
Bug Depends On: 1841832

Description OpenShift BugZilla Robot 2020-06-03 22:25:42 UTC
+++ This bug was initially created as a clone of Bug #1841832 +++

Description of problem:
The Elasticsearch Operator (EO) cannot reach the Elasticsearch service:
time="2020-05-29T13:28:07Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: dial tcp 172.30.212.255:9200: i/o timeout"
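
The error above is a connect timeout ("i/o timeout"), not a refused connection. A timeout usually suggests that packets are being dropped on the way to the service rather than that nothing is listening, though that is an inference and not something the log states. A minimal sketch for telling the two cases apart from inside a pod follows; the function name `probe` and its return strings are hypothetical and are not part of any operator tooling.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within `timeout` seconds: packets are probably dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The peer answered with a reset: reachable, but nothing is listening.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"


# Example call, using the service name and port from the log line:
#   probe("elasticsearch.openshift-logging.svc", 9200)
```

Run from the operator pod, a result of "timeout" would be consistent with the log line above, while "refused" would point at the Elasticsearch containers themselves.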



Version-Release number of selected component (if applicable): 4.5


How reproducible: 
Upgrade CLO and EO from 4.4 with a CLO instance (CR) already created.


Steps to Reproduce:
1. Install 4.4 CLO and EO
2. Create CLO's CR
3. Upgrade EO to 4.5


Actual results:
Elasticsearch cluster can't start


Expected results:
Elasticsearch cluster upgraded, up and running

Additional info:

--- Additional comment from ikarpukh on 2020-05-29 15:27:47 UTC ---

EO LOGS:

time="2020-05-29T13:28:07Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: dial tcp 172.30.212.255:9200: i/o timeout"
time="2020-05-29T13:29:43Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:29:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-1: Node elasticsearch-cdm-tudtgwxx-1 has not rejoined cluster elasticsearch yet"
time="2020-05-29T13:30:13Z" level=info msg="Waiting for cluster to be recovered before upgrading elasticsearch-cdm-tudtgwxx-2:  / [yellow green]"
time="2020-05-29T13:30:13Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-2: Cluster not in at least yellow state before beginning upgrade: "
time="2020-05-29T13:30:43Z" level=info msg="Waiting for cluster to be recovered before upgrading elasticsearch-cdm-tudtgwxx-3:  / [yellow green]"
time="2020-05-29T13:30:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-3: Cluster not in at least yellow state before beginning upgrade: "
time="2020-05-29T13:33:45Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:36:47Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:39:49Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:42:51Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:45:54Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:48:56Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
time="2020-05-29T13:51:58Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"
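
The repeating entries above can be summarized per node to make the pattern easier to see. The following is a minimal sketch that assumes the operator log keeps the `time="..." level=... msg="..."` layout shown in this comment; the names `summarize`, `LINE_RE`, and `REJOIN_RE` are hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# One log entry: time="..." level=... msg="..."
LINE_RE = re.compile(r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$')
# The message the operator logs when a restarted node does not come back.
REJOIN_RE = re.compile(r"Timed out waiting for (?P<node>\S+) to rejoin cluster")


def summarize(lines):
    """Return (entries per log level, rejoin timeouts per node)."""
    levels, timeouts = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not in the expected layout
        levels[m["level"]] += 1
        hit = REJOIN_RE.search(m["msg"])
        if hit:
            timeouts[hit["node"]] += 1
    return levels, timeouts


SAMPLE = [
    'time="2020-05-29T13:29:43Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"',
    'time="2020-05-29T13:29:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-tudtgwxx-1: Node elasticsearch-cdm-tudtgwxx-1 has not rejoined cluster elasticsearch yet"',
    'time="2020-05-29T13:33:45Z" level=info msg="Timed out waiting for elasticsearch-cdm-tudtgwxx-1 to rejoin cluster"',
]

if __name__ == "__main__":
    levels, timeouts = summarize(SAMPLE)
    print(dict(levels), dict(timeouts))
```

In the full log, every rejoin timeout refers to the same node, elasticsearch-cdm-tudtgwxx-1, while nodes 2 and 3 are only waiting for the cluster to recover.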


OPENSHIFT-LOGGING NAMESPACE STATUS:
cluster-logging-operator-7d7bbc88f5-vvv75       1/1       Running     0          72m
curator-1590759000-7jkdq                        1/1       Running     0          56m
curator-1590760200-9pfq2                        1/1       Running     0          36m
curator-1590762000-fm7t7                        0/1       Completed   0          6m34s
elasticsearch-cdm-tudtgwxx-1-85d69b58fc-cc5dg   1/2       Running     0          3m5s
elasticsearch-cdm-tudtgwxx-2-b888c4d44-djvnf    1/2       Running     0          2m22s
elasticsearch-cdm-tudtgwxx-3-69db67bd47-8hdpt   1/2       Running     0          34s
fluentd-7mblm                                   1/1       Running     0          71m
fluentd-82vkv                                   1/1       Running     0          71m
fluentd-qvkkd                                   1/1       Running     0          71m
fluentd-smhcc                                   1/1       Running     0          71m
fluentd-tfjqj                                   1/1       Running     0          71m
fluentd-zcwl7                                   1/1       Running     0          71m
kibana-7676965bcf-dn65g                         2/2       Running     0          71m

--- Additional comment from anli on 2020-06-03 04:23:33 UTC ---

After I deleted all ES pods, the new ES pods started and are now running. What is the root cause?

cluster-logging-operator-98f5c5fd-4pw4x         1/1     Running     0          20m
curator-1591153800-zb8lb                        0/1     Completed   0          66m
curator-1591157400-6gdbs                        0/1     Error       0          6m45s
elasticsearch-cdm-knaloezd-1-7bbdf76f85-z2pqv   2/2     Running     0          22m
elasticsearch-cdm-knaloezd-2-5d9d75f6fb-qs7gc   2/2     Running     0          22m
elasticsearch-cdm-knaloezd-3-86dd567d7-b2vlz    2/2     Running     0          22m
elasticsearch-delete-app-1591157700-4gmcl       0/1     Completed   0          112s
elasticsearch-delete-audit-1591157700-tm2lh     0/1     Completed   0          112s
elasticsearch-delete-infra-1591157700-jlhmg     0/1     Completed   0          112s
elasticsearch-rollover-app-1591157700-5l9rh     0/1     Error       0          112s
elasticsearch-rollover-audit-1591157700-sm7rm   0/1     Error       0          112s
elasticsearch-rollover-infra-1591157700-78mpt   0/1     Error       0          112s

--- Additional comment from anli on 2020-06-03 09:06:19 UTC ---

This network policy works for me.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: restricted-es-policy
      namespace: openshift-logging
    spec:
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              openshift.io/cluster-logging: "true"
          podSelector:
            matchLabels:
              name: elasticsearch-operator
        - podSelector:
            matchLabels:
              component: elasticsearch
      podSelector:
        matchLabels:
          component: elasticsearch
      policyTypes:
      - Ingress
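
To make the effect of the policy above easier to follow, here is a minimal sketch that models its two `from` peers in plain Python. It relies on standard Kubernetes NetworkPolicy semantics: peers in the `from` list are ORed, a `namespaceSelector` and `podSelector` inside the same peer are ANDed, and a peer with only a `podSelector` applies to pods in the policy's own namespace. Only `matchLabels` is modeled. The function names, the `openshift-operators-redhat` namespace, and the `app: other` pod are assumptions used for illustration and do not come from this report.

```python
POLICY_NAMESPACE = "openshift-logging"

# The two ingress peers of restricted-es-policy, transcribed from the YAML.
PEERS = [
    {
        "namespaceSelector": {"openshift.io/cluster-logging": "true"},
        "podSelector": {"name": "elasticsearch-operator"},
    },
    {"podSelector": {"component": "elasticsearch"}},
]


def selects(match_labels, labels):
    """matchLabels semantics: every key/value pair must be present."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def ingress_allowed(pod_labels, pod_namespace, namespace_labels):
    """Return True if a source pod may connect to the Elasticsearch pods."""
    for peer in PEERS:
        ns_selector = peer.get("namespaceSelector")
        if ns_selector is None:
            # No namespaceSelector: only pods in the policy's namespace match.
            ns_ok = pod_namespace == POLICY_NAMESPACE
        else:
            ns_ok = selects(ns_selector, namespace_labels)
        if ns_ok and selects(peer["podSelector"], pod_labels):
            return True  # peers are ORed, so one match is enough
    return False


if __name__ == "__main__":
    labeled = {"openshift.io/cluster-logging": "true"}
    # Operator pod in a labeled namespace (namespace name is an assumption).
    print(ingress_allowed({"name": "elasticsearch-operator"},
                          "openshift-operators-redhat", labeled))
    # Elasticsearch pod talking to its peers in the same namespace.
    print(ingress_allowed({"component": "elasticsearch"},
                          "openshift-logging", labeled))
    # Unrelated pod in an unlabeled namespace.
    print(ingress_allowed({"app": "other"}, "default", {}))
```

Under this model the operator pod is admitted only when its namespace carries the `openshift.io/cluster-logging: "true"` label, which may be worth checking on an upgraded cluster.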

Comment 3 Anping Li 2020-06-10 06:21:07 UTC
Verified on elasticsearch-operator.4.5.0-202006091957

Comment 4 errata-xmlrpc 2020-07-13 17:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409