Bug 1918920 - The elasticsearch pods don't get restarted automatically after each update
Summary: The elasticsearch pods don't get restarted automatically after each update
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.z
Assignee: Gerard Vanloo
QA Contact: Anping Li
URL:
Whiteboard: logging-exploration
Depends On:
Blocks:
 
Reported: 2021-01-21 17:20 UTC by KOSAL RAJ I
Modified: 2021-07-23 14:50 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The elasticsearch pods are not marked for restart after secret changes.
Consequence: The elasticsearch pods don't get restarted automatically after each update.
Fix: Add a new controller to watch the elasticsearch secret. If there is any change to the secret, the controller sets the elasticsearch cluster's status to "scheduledRedeploy".
Result: The elasticsearch cluster is restarted correctly after its secret is changed.
Clone Of:
Clones: 1923788 1952968
Environment:
Last Closed: 2021-07-23 14:50:24 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 628 0 None closed Bug 1918920: [LOGEXP 1009] Watch secret update for elasticsearch cluster 2021-02-20 00:02:58 UTC
Github openshift elasticsearch-operator pull 667 0 None closed BUG 1918920: requeue secret update event when the cluster is being ce… 2021-04-28 14:42:47 UTC
Github openshift elasticsearch-operator pull 695 0 None closed Bug 1918920: Allow secret reconciler to reconcile during create 2021-04-28 14:42:54 UTC

Description KOSAL RAJ I 2021-01-21 17:20:18 UTC
Description of problem:
The elasticsearch pods don't get restarted automatically after each update.

Version-Release number of selected component (if applicable):
OCP 4.5

How reproducible:
Enable automatic updates for the logging cluster.

Steps to Reproduce:
1.
2.
3.

Actual results:
ES log:
2021/01/04 08:48:25 http: TLS handshake error from 10.128.0.29:54900: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:27 http: TLS handshake error from 10.128.4.21:49714: remote error: tls: unknown certificate authority
2021/01/04 08:48:41 http: TLS handshake error from 10.128.0.29:55360: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:46 http: TLS handshake error from 10.128.0.29:55518: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:48 http: TLS handshake error from 10.128.0.29:55600: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
time="2021-01-04T08:48:51Z" level=info msg="Handling request \"authorization\""
time="2021-01-04T08:48:52Z" level=info msg="Handling request \"authorization\""

Expected results:
ES pods should get updated.

Additional info:

A similar issue is described in this KCS article: https://access.redhat.com/solutions/5347071

Manually restarting the pods fixes the issue.
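
For reference, a minimal sketch of what that manual workaround amounts to programmatically. The namespace "openshift-logging" and the label "component=elasticsearch" are assumptions about this cluster; in practice the same restart is usually done with oc delete pod.

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (the same credentials `oc` uses).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Delete every Elasticsearch pod; the owning deployments recreate them,
	// and the new pods mount the current contents of the secret.
	pods := clientset.CoreV1().Pods("openshift-logging")
	list, err := pods.List(context.TODO(), metav1.ListOptions{LabelSelector: "component=elasticsearch"})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range list.Items {
		if err := pods.Delete(context.TODO(), p.Name, metav1.DeleteOptions{}); err != nil {
			log.Fatal(err)
		}
		fmt.Println("deleted", p.Name)
	}
}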

Comment 1 Jeff Cantrill 2021-01-25 17:12:00 UTC
Eric, thoughts? Would this be fixed by https://github.com/openshift/cluster-logging-operator/pull/858, which fixes up the cert storage and generation? It smells like the same issue.

Comment 2 ewolinet 2021-01-25 18:09:15 UTC
It seems to be based on the same issue, yes.

We have a PR in the works for EO that should help the operator recognize when this happens and reschedule all es pods to be restarted.
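
For illustration only, here is a minimal controller-runtime sketch of that idea; it is not the code from the linked PRs. The secret name "elasticsearch", the label "component=elasticsearch", and the annotation key are assumptions, and where the real operator marks the Elasticsearch cluster's status as "scheduledRedeploy", this sketch simply rolls the matching deployments by bumping a pod-template annotation.

package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// secretReconciler reacts to changes of the Elasticsearch credential secret
// and rolls the Elasticsearch workloads so they pick up the new certificates.
type secretReconciler struct {
	client.Client
}

func (r *secretReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Only the Elasticsearch secret is interesting (name assumed here; a
	// real operator would scope its watch with predicates instead).
	if req.Name != "elasticsearch" {
		return ctrl.Result{}, nil
	}

	secret := &corev1.Secret{}
	if err := r.Get(ctx, req.NamespacedName, secret); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Find the Elasticsearch deployments (label assumed) and record the
	// secret's current ResourceVersion in the pod template. Any change to
	// the secret changes the template, which triggers a rolling restart.
	deployments := &appsv1.DeploymentList{}
	if err := r.List(ctx, deployments,
		client.InNamespace(req.Namespace),
		client.MatchingLabels{"component": "elasticsearch"},
	); err != nil {
		return ctrl.Result{}, err
	}
	for i := range deployments.Items {
		d := &deployments.Items[i]
		if d.Spec.Template.Annotations == nil {
			d.Spec.Template.Annotations = map[string]string{}
		}
		d.Spec.Template.Annotations["logging.openshift.io/secret-version"] = secret.ResourceVersion
		if err := r.Update(ctx, d); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Secret{}).
		Complete(&secretReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

The advantage of going through the cluster status (as the actual fix does) rather than patching workloads directly is that the operator's existing rollout logic can restart the nodes in a controlled, one-at-a-time fashion.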

Comment 13 Matthew Sweikert 2021-02-24 20:50:33 UTC
Hello,

As mentioned earlier, this ticket is currently blocking https://issues.redhat.com/browse/TRACING-1725. I moved this issue to urgent, as this will now push the customer into an unsupported state while they sit on OCP 4.4.

Comment 17 Hui Kang 2021-03-12 19:27:42 UTC
Two jira issues:

- 5.1: https://issues.redhat.com/browse/LOG-1205
- 5.0: https://issues.redhat.com/browse/LOG-1206

Comment 22 Gerard Vanloo 2021-07-23 14:50:24 UTC
Closed. This was moved to JIRA: https://issues.redhat.com/browse/LOG-1619

