Bug 1782740 - Elasticsearch container fails to start after `scheduledCertRedeploy` when forwarding logs securely to an external log store.
Summary: Elasticsearch container fails to start after `scheduledCertRedeploy` when forwarding logs securely to an external log store.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1813381
 
Reported: 2019-12-12 09:23 UTC by Qiaoling Tang
Modified: 2020-05-04 11:20 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:19:53 UTC
Target Upstream Version:
hyoskim: needinfo+


Attachments
CLO and ES pods' logs (1006.03 KB, application/gzip)
2019-12-12 09:23 UTC, Qiaoling Tang


Links
Github: openshift/cluster-logging-operator pull 323 (closed): "Bug 1782740: Addressing LF causing cert rotation" (last updated 2020-12-29 11:32:43 UTC)
Red Hat Product Errata: RHBA-2020:0581 (last updated 2020-05-04 11:20:27 UTC)

Description Qiaoling Tang 2019-12-12 09:23:41 UTC
Created attachment 1644333 [details]
CLO and ES pods' logs

Description of problem:
Deploy logging and enable log forwarding to send logs to an external log store over a secure connection. The secrets in the openshift-logging namespace keep changing, and after each change the ES pods cannot start.
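A quick way to observe the churn (a minimal sketch; the secret name below is one example from this deployment, not an exhaustive list) is to watch the secrets and spot-check a resourceVersion over time:

$ oc get secrets -n openshift-logging -w
# compare one secret's resourceVersion across a few minutes:
$ oc get secret elasticsearch -n openshift-logging -o jsonpath='{.metadata.resourceVersion}'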

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-5d4fb68497-pwwn5       1/1     Running     0          33m
curator-1576139400-xd2x6                        0/1     Completed   0          27m
elasticsearch-cdm-33pwd8fe-1-5458bc5cb-7l7r2    1/2     Running     0          18m
elasticsearch-cdm-33pwd8fe-2-8596cd8b5c-mkdzn   1/2     Running     0          18m
elasticsearch-cdm-33pwd8fe-3-5888454ccc-xnlpc   1/2     Running     0          18m
fluentd-6rrs5                                   1/1     Running     0          28m
fluentd-mmzp4                                   1/1     Running     0          28m
fluentd-nhwh7                                   1/1     Running     0          28m
fluentd-r5wbq                                   1/1     Running     0          28m
fluentd-t2ssn                                   1/1     Running     0          28m
fluentd-vrhch                                   1/1     Running     0          28m
kibana-c7785c496-ggfj5                          2/2     Running     0          13m


$ oc get logforwarding -oyaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1alpha1
  kind: LogForwarding
  metadata:
    creationTimestamp: "2019-12-12T08:29:04Z"
    generation: 1
    name: instance
    namespace: openshift-logging
    resourceVersion: "367839"
    selfLink: /apis/logging.openshift.io/v1alpha1/namespaces/openshift-logging/logforwardings/instance
    uid: f09de864-9941-4d4c-86c6-58db033d26c5
  spec:
    outputs:
    - endpoint: fluentdserver.fluentd.svc:24224
      name: fluentd-created-by-user
      secret:
        name: fluentdserver-test
      type: forward
    pipelines:
    - inputSource: logs.app
      name: app-pipeline
      outputRefs:
      - fluentd-created-by-user
    - inputSource: logs.infra
      name: infra-pipeline
      outputRefs:
      - fluentd-created-by-user
    - inputSource: logs.audit
      name: audit-pipeline
      outputRefs:
      - fluentd-created-by-user
  status:
    lastUpdated: null
    outputs:
    - lastUpdated: "2019-12-12T08:57:34Z"
      name: fluentd-created-by-user
      state: Accepted
    pipelines:
    - lastUpdated: "2019-12-12T08:57:34Z"
      name: app-pipeline
      state: Accepted
    - lastUpdated: "2019-12-12T08:57:34Z"
      name: infra-pipeline
      state: Accepted
    - lastUpdated: "2019-12-12T08:57:34Z"
      name: audit-pipeline
      state: Accepted
    sources:
    - logs.app
    - logs.infra
    - logs.audit
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Elasticsearch operator (EO) log:
$ oc logs -n openshift-operators-redhat elasticsearch-operator-5698b6bcb7-prbzp
time="2019-12-12T08:25:25Z" level=warning msg="Unable to parse loglevel \"\""
{"level":"info","ts":1576139125.2378323,"logger":"cmd","msg":"Go Version: go1.12.12"}
{"level":"info","ts":1576139125.237872,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1576139125.2378771,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1576139125.2383296,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1576139125.3436556,"logger":"leader","msg":"Found existing lock","LockOwner":"elasticsearch-operator-5698b6bcb7-pnzsv"}
{"level":"info","ts":1576139125.3521707,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1576139126.4785707,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1576139126.5475724,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1576139126.5478206,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1576139126.6440153,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1576139126.644037,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1576139126.7442143,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1576139126.8443587,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
time="2019-12-12T08:28:15Z" level=info msg="Flushing nodes for openshift-logging/elasticsearch"
time="2019-12-12T08:28:43Z" level=info msg="Flushing nodes for openshift-logging/elasticsearch"
time="2019-12-12T08:33:22Z" level=info msg="Beginning full cluster restart for cert redeploy on elasticsearch"
time="2019-12-12T08:33:22Z" level=warning msg="Unable to disable shard allocation: Put https://elasticsearch.openshift-logging.svc:9200/_cluster/settings: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2019-12-12T08:33:22Z" level=warning msg="Unable to perform synchronized flush: Post https://elasticsearch.openshift-logging.svc:9200/_flush/synced: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"openshift-cluster-logging-signer\")"
time="2019-12-12T08:33:22Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-33pwd8fe-1"
time="2019-12-12T08:33:22Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-33pwd8fe-2"
time="2019-12-12T08:33:22Z" level=warning msg="Unable to get cluster size prior to restart for elasticsearch-cdm-33pwd8fe-3"
time="2019-12-12T08:34:37Z" level=info msg="Timed out waiting for elasticsearch-cdm-33pwd8fe-1 to leave the cluster"
time="2019-12-12T08:36:18Z" level=info msg="Timed out waiting for elasticsearch-cdm-33pwd8fe-2 to leave the cluster"
time="2019-12-12T08:37:50Z" level=info msg="Timed out waiting for elasticsearch-cdm-33pwd8fe-3 to leave the cluster"
time="2019-12-12T08:39:50Z" level=warning msg="Unable to enable shard allocation: Put https://elasticsearch.openshift-logging.svc:9200/_cluster/settings: dial tcp 172.30.221.223:9200: i/o timeout"
time="2019-12-12T08:41:50Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:43:50Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:45:51Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:47:51Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:49:51Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:51:51Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:53:52Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:55:52Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:57:52Z" level=info msg="Waiting for cluster to complete recovery:  / green"
time="2019-12-12T08:59:52Z" level=info msg="Waiting for cluster to complete recovery:  / green"



Version-Release number of selected component (if applicable):
ose-elasticsearch-operator-v4.3.0-201912111602
ose-cluster-logging-operator-v4.3.0-201912111602

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging.
2. Enable log forwarding to send logs to an external log store over a secure connection (a sketch of the referenced secret follows the list).
3. Wait for several minutes.
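For step 2, a minimal sketch of creating the secret referenced by the LogForwarding instance above (the key names tls.crt/tls.key/ca-bundle.crt follow the forward-output convention of this release, and the local file paths are placeholders):

$ oc create secret generic fluentdserver-test -n openshift-logging \
    --from-file=tls.crt=/path/to/client.crt \
    --from-file=tls.key=/path/to/client.key \
    --from-file=ca-bundle.crt=/path/to/ca.crt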

Actual results:
After log forwarding is enabled, the secrets keep changing, and the ES pods cannot start successfully.
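To confirm which container in an ES pod is failing (a sketch; the pod name is taken from the listing above, and the component=elasticsearch label selector is an assumption):

$ oc get pods -n openshift-logging -l component=elasticsearch
$ oc logs elasticsearch-cdm-33pwd8fe-1-5458bc5cb-7l7r2 -c elasticsearch -n openshift-logging --tail=50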

Expected results:
The secrets should not change.
If the secrets do change, the ES cluster should be able to recover.
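The linked PR ("Bug 1782740: Addressing LF causing cert rotation") points at a stray line feed in stored cert material making the operator treat the certs as changed on every reconcile. A hedged check for a trailing newline in a secret's cert data (the secret and key names are illustrative; the backslash in the jsonpath escapes the dot in the key name):

$ oc get secret fluentdserver-test -n openshift-logging \
    -o jsonpath='{.data.ca-bundle\.crt}' | base64 -d | tail -c 1 | od -c

If the last byte prints as \n, a byte-for-byte comparison against freshly generated cert data that lacks the newline would flag a change and re-trigger the redeploy on every pass.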

Additional info:

Comment 6 Qiaoling Tang 2020-01-13 02:01:35 UTC
Verified with ose-cluster-logging-operator-v4.4.0-202001102023

Comment 11 errata-xmlrpc 2020-05-04 11:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

