Description of problem:
The Elasticsearch pods don't get restarted automatically after each update.

Version-Release number of selected component (if applicable):
OCP 4.5

How reproducible:
Enabling automatic update for the logging cluster

Steps to Reproduce:
1.
2.
3.

Actual results:
ES log:
2021/01/04 08:48:25 http: TLS handshake error from 10.128.0.29:54900: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:27 http: TLS handshake error from 10.128.4.21:49714: remote error: tls: unknown certificate authority
2021/01/04 08:48:41 http: TLS handshake error from 10.128.0.29:55360: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:46 http: TLS handshake error from 10.128.0.29:55518: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:48 http: TLS handshake error from 10.128.0.29:55600: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
time="2021-01-04T08:48:51Z" level=info msg="Handling request \"authorization\""
time="2021-01-04T08:48:52Z" level=info msg="Handling request \"authorization\""

Expected results:
The ES pods should get updated.

Additional info:
A similar issue is described in this KCS article: https://access.redhat.com/solutions/5347071
Manually restarting the pods fixes the issue.
Eric, thoughts: would this be fixed by https://github.com/openshift/cluster-logging-operator/pull/858, which fixes up the cert storage and generation? It smells like the same issue.
It seems to be based on the same issue, yes. We have a PR in the works for EO that should help the operator recognize when this happens and reschedule all ES pods so they get restarted.
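For reference, a minimal sketch of what "reschedule all ES pods" could look like on the operator side: delete the affected pods so their controller recreates them and they remount the regenerated certificates. This is not the actual EO change; the namespace openshift-logging and the label selector component=elasticsearch are assumptions based on a default cluster-logging deployment.

// Hypothetical sketch, not the EO PR: force the Elasticsearch pods to be
// rescheduled after a certificate rotation so they pick up the new certs.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func restartElasticsearchPods() error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// Assumption: ES pods live in openshift-logging and carry component=elasticsearch.
	pods := client.CoreV1().Pods("openshift-logging")
	list, err := pods.List(context.TODO(), metav1.ListOptions{
		LabelSelector: "component=elasticsearch",
	})
	if err != nil {
		return err
	}

	// Deleting the pods lets their controller recreate them, so they remount
	// the regenerated certificates instead of serving the stale ones.
	for _, p := range list.Items {
		if err := pods.Delete(context.TODO(), p.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("failed to delete pod %s: %w", p.Name, err)
		}
	}
	return nil
}

func main() {
	if err := restartElasticsearchPods(); err != nil {
		log.Fatal(err)
	}
}

In practice the operator would trigger this only after detecting that the serving certs no longer match the current signer, rather than on every reconcile.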
Hello. As mentioned earlier, this ticket is currently blocking https://issues.redhat.com/browse/TRACING-1725. I have moved this issue to urgent, as it will now push the customer into an unsupported state, sitting on OCP 4.4.
Two Jira issues:
- 5.1: https://issues.redhat.com/browse/LOG-1205
- 5.0: https://issues.redhat.com/browse/LOG-1206
Closed. This was moved to JIRA: https://issues.redhat.com/browse/LOG-1619