1730087 – Regenerating logging certificates can fail with SEC_ERROR_REUSED_ISSUER_AND_SERIAL

Summary: Regenerating logging certificates can fail with SEC_ERROR_REUSED_ISSUER_AND_S...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.9.0
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Cantrill
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-15 19:28 UTC by Matthew Robson
Modified:	2019-12-04 15:52 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-07-18 05:17:58 UTC
Target Upstream Version:
Embargoed:

Description Matthew Robson 2019-07-15 19:28:55 UTC

Description of problem:

We followed the steps to regenerate the logging certificates in 3.9: https://docs.openshift.com/container-platform/3.9/install_config/aggregate_logging.html#fluentd-redeploy-certs

The Ansible playbook failed:

TASK [openshift_logging_elasticsearch : Getting ES version for logging-es cluster] ***
fatal: [myDRserver2001.myDomain.tld]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "exec", "logging-es-nh7hh97z-12-hpdlk", "-c", "elasticsearch", "-n", "logging", "--", "curl", "-s", "--cacert", "/etc/elasticsearch/secret/admin-ca", "--cert", "/etc/elasticsearch/secret/admin-cert", "--key", "/etc/elasticsearch/secret/admin-key", "-XGET", "https://localhost:9200/"], "delta": "0:00:00.646870", "end": "2019-07-15 08:47:36.275881", "failed": true, "msg": "non-zero return code", "rc": 35, "start": "2019-07-15 08:47:35.629011", "stderr": "command terminated with exit code 35", "stderr_lines": ["command terminated with exit code 35"], "stdout": "", "stdout_lines": []}

Running the same curl command by hand with verbose output shows the underlying error:

[ansible@myDRserver2001 ~]$ oc exec logging-es-nh7hh97z-12-hpdlk -c elasticsearch -n logging -- curl -vs --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key -XGET https://localhost:9200/
* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/elasticsearch/secret/admin-ca
  CApath: none
* NSS error -8054 (SEC_ERROR_REUSED_ISSUER_AND_SERIAL)
* You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
* Closing connection 0
command terminated with exit code 35

Looking at the CAs and certificates, the serials are all OK (none of them match), so this is not related to https://github.com/openshift/openshift-ansible/issues/7252.

The issue seems to be that the old and new certificates have the same issuer and serial, and the old certificate is cached somewhere by NSS, causing curl to fail.
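The condition NSS rejects, as its own error text says, is a certificate with the same issuer and serial as an existing one that is not the same certificate. A minimal Python sketch of that check follows. The helper names are hypothetical, and it assumes each certificate's fields were dumped with `openssl x509 -noout -issuer -serial -fingerprint`, with the fingerprint standing in for "not the same cert".

```python
# Hypothetical helpers (not part of OpenShift or NSS): they model the rule
# behind SEC_ERROR_REUSED_ISSUER_AND_SERIAL using text dumped by
# `openssl x509 -noout -issuer -serial -fingerprint` (assumed input format).
from typing import Dict, NamedTuple


class CertId(NamedTuple):
    issuer: str
    serial: str
    fingerprint: str


def parse_openssl_fields(text: str) -> CertId:
    """Parse key=value lines such as 'issuer=...', 'serial=05', 'SHA1 Fingerprint=...'."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # skip lines that are not key=value pairs
        fields[key.strip().lower()] = value.strip()
    fingerprint = next((v for k, v in fields.items() if k.endswith("fingerprint")), "")
    return CertId(
        issuer=fields["issuer"],
        # Serials are printed in hex; normalise case and drop leading zeros.
        serial=fields["serial"].upper().lstrip("0") or "0",
        fingerprint=fingerprint.upper(),
    )


def reuses_issuer_and_serial(old: CertId, new: CertId) -> bool:
    """True when two *different* certificates share issuer and serial."""
    return (
        old.issuer == new.issuer
        and old.serial == new.serial
        and old.fingerprint != new.fingerprint
    )
```

For example, two dumps that both contain `issuer=CN = logging-signer-test` and `serial=05` but carry different fingerprints would be flagged, while a certificate compared with itself would not.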

We cannot find where the old certificate is; certutil shows an empty database:

sh-4.2$ certutil -L -d sql:/etc/pki/nssdb

Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

Deleting the ES pod and allowing it to be recreated resolves the issue, as the old certificate appears to be purged. However, because the playbook failed, everything else in the logging stack is out of sync.

Version-Release number of selected component (if applicable):

v3.9.51


How reproducible:

Random


Steps to Reproduce:

1) Deleted the certs from the masters

2) Reran the logging playbook

3) The CA and certificates were recreated; the CA looks fine (serial 1)

4) The new ES certificate has the same issuer, CN=logging-signer-test; this is expected

5) The new ES certificate has serial 5, which we assume is the same serial as the old certificate, and thus curl fails
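Steps 3 to 5 are consistent with a signer that numbers certificates sequentially and starts over when the CA is regenerated with the same subject. That mechanism is an assumption, not something confirmed in this report. The Python sketch below (hypothetical names and certificate list) only illustrates how such a counter would give the regenerated certificates the same issuer/serial pairs as the old ones, while their new keys make them different certificates.

```python
# Hypothetical model (assumption, not confirmed by this report): the signer
# assigns serial numbers from a counter that starts over when the CA is
# regenerated with the same subject.
from typing import Dict, List, Tuple


def issue_certs(
    ca_subject: str, names: List[str], first_serial: int = 2
) -> Dict[str, Tuple[str, int]]:
    """Return name -> (issuer, serial) for certificates signed in order."""
    issued: Dict[str, Tuple[str, int]] = {}
    serial = first_serial  # assumption: serial 1 belongs to the CA certificate
    for name in names:
        issued[name] = (ca_subject, serial)
        serial += 1
    return issued


def reused_pairs(
    before: Dict[str, Tuple[str, int]], after: Dict[str, Tuple[str, int]]
) -> List[str]:
    """Names whose (issuer, serial) pair is identical across the two runs."""
    return [name for name in after if before.get(name) == after[name]]
```

Issuing the same hypothetical list of certificates before and after regenerating `CN=logging-signer-test` yields identical (issuer, serial) pairs for every entry, which is exactly the combination NSS refuses once the key material differs.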

Actual results:
The playbook fails and the logging stack is broken.

Expected results:

Additional info:

Comment 11 Dirk Porter 2019-11-27 15:32:00 UTC

Hello, 

I have a customer with this issue in 3.9. Is there a workaround that was discovered for this? 

Regards, 

Dirk Porter

Note You need to log in before you can comment on or make changes to this bug.