Description of problem:
The Elasticsearch pods don't get restarted automatically after each update.

Version-Release number of selected component (if applicable):
OCP 4.5

How reproducible:
Enabling automatic update for the logging cluster

Steps to Reproduce:
1.
2.
3.

Actual results:
ES log:
2021/01/04 08:48:25 http: TLS handshake error from 10.128.0.29:54900: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:27 http: TLS handshake error from 10.128.4.21:49714: remote error: tls: unknown certificate authority
2021/01/04 08:48:41 http: TLS handshake error from 10.128.0.29:55360: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:46 http: TLS handshake error from 10.128.0.29:55518: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
2021/01/04 08:48:48 http: TLS handshake error from 10.128.0.29:55600: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-cluster-logging-signer")
time="2021-01-04T08:48:51Z" level=info msg="Handling request \"authorization\""
time="2021-01-04T08:48:52Z" level=info msg="Handling request \"authorization\""

Expected results:
The ES pods should get updated.

Additional info:
A similar issue is described in this KCS article: https://access.redhat.com/solutions/5347071
Manually restarting the pods fixes the issue.
Eric, thoughts: would this be fixed by https://github.com/openshift/cluster-logging-operator/pull/858, which fixes up the cert storage and generation? It smells like the same issue.
It seems to be based on the same issue, yes. We have a PR in the works for EO that should help the operator recognize when this happens and reschedule all ES pods so they get restarted.
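For reference, a minimal sketch of what "reschedule all ES pods" could look like on the operator side: delete the affected pods so their controller recreates them and they remount the regenerated certificates. This is not the actual EO change; the namespace openshift-logging and the label selector component=elasticsearch are assumptions based on a default cluster-logging deployment.

// Hypothetical sketch, not the EO PR: force the Elasticsearch pods to be
// rescheduled after a certificate rotation so they pick up the new certs.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func restartElasticsearchPods() error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// Assumption: ES pods live in openshift-logging and carry component=elasticsearch.
	pods := client.CoreV1().Pods("openshift-logging")
	list, err := pods.List(context.TODO(), metav1.ListOptions{
		LabelSelector: "component=elasticsearch",
	})
	if err != nil {
		return err
	}

	// Deleting the pods lets their controller recreate them, so they remount
	// the regenerated certificates instead of serving the stale ones.
	for _, p := range list.Items {
		if err := pods.Delete(context.TODO(), p.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("failed to delete pod %s: %w", p.Name, err)
		}
	}
	return nil
}

func main() {
	if err := restartElasticsearchPods(); err != nil {
		log.Fatal(err)
	}
}

In practice the operator would trigger this only after detecting that the serving certs no longer match the current signer, rather than on every reconcile.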
Hello. As mentioned earlier, this ticket is currently blocking https://issues.redhat.com/browse/TRACING-1725. I have moved this issue to urgent, as it will now push the customer into an unsupported state, sitting on OCP 4.4.
Two Jira issues:
- 5.1: https://issues.redhat.com/browse/LOG-1205
- 5.0: https://issues.redhat.com/browse/LOG-1206
Closed. This was moved to JIRA: https://issues.redhat.com/browse/LOG-1619