Bug 1730087 - Regenerating logging certificates can fail with SEC_ERROR_REUSED_ISSUER_AND_SERIAL
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Jeff Cantrill
QA Contact: Anping Li
Depends On:
Reported: 2019-07-15 19:28 UTC by Matthew Robson
Modified: 2019-12-04 15:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-07-18 05:17:58 UTC
Target Upstream Version:


Description Matthew Robson 2019-07-15 19:28:55 UTC
Description of problem:

Following the steps to regenerate new logging certs in 3.9: https://docs.openshift.com/container-platform/3.9/install_config/aggregate_logging.html#fluentd-redeploy-certs

The Ansible playbook failed:

TASK [openshift_logging_elasticsearch : Getting ES version for logging-es cluster] ***
fatal: [myDRserver2001.myDomain.tld]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "exec", "logging-es-nh7hh97z-12-hpdlk", "-c", "elasticsearch", "-n", "logging", "--", "curl", "-s", "--cacert", "/etc/elasticsearch/secret/admin-ca", "--cert", "/etc/elasticsearch/secret/admin-cert", "--key", "/etc/elasticsearch/secret/admin-key", "-XGET", "https://localhost:9200/"], "delta": "0:00:00.646870", "end": "2019-07-15 08:47:36.275881", "failed": true, "msg": "non-zero return code", "rc": 35, "start": "2019-07-15 08:47:35.629011", "stderr": "command terminated with exit code 35", "stderr_lines": ["command terminated with exit code 35"], "stdout": "", "stdout_lines": []}

The actual issue is:

[ansible@myDRserver2001 ~]$ oc exec logging-es-nh7hh97z-12-hpdlk -c elasticsearch -n logging -- curl -vs --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key -XGET https://localhost:9200/
* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/elasticsearch/secret/admin-ca
  CApath: none
* You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
* Closing connection 0
command terminated with exit code 35
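
For triage, the exit code 35 above is curl's "SSL connect error", meaning the TLS handshake itself failed. A minimal sketch of a helper that maps a few curl exit codes to hints follows. The function name describe_curl_rc is hypothetical, and the descriptions are paraphrased from the curl man page rather than taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: translate a curl exit code into a short hint.
# Only a handful of codes relevant to TLS troubleshooting are covered.
describe_curl_rc() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "failed to connect to host" ;;
        35) echo "SSL connect error: the TLS handshake failed" ;;
        58) echo "problem with the local certificate" ;;
        60) echo "peer certificate cannot be authenticated with known CA certificates" ;;
        *)  echo "unhandled curl exit code $1" ;;
    esac
}

# Example call with the code seen in this report:
#   describe_curl_rc 35
```

A code of 35 rather than 7 suggests Elasticsearch was reachable and the failure is on the certificate side, which matches the NSS message in the verbose output.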

Looking at the CAs and CRTs, the serials are all OK (not matching), so this is not related to https://github.com/openshift/openshift-ansible/issues/7252

The issue seems to be that the old crt and the new crt have the same serial, and the old crt is cached somewhere by NSS, causing curl to fail.

I cannot find where the old crt is; certutil shows an empty DB:

sh-4.2$ certutil -L -d sql:/etc/pki/nssdb

Certificate Nickname                                         Trust Attributes

Deleting the ES pod and allowing it to be recreated resolves the issue, as the old crt seems to be purged. However, because the playbook failed, everything else in the logging stack is out of sync.
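
The pod deletion described above could be scripted roughly as follows. This is a sketch, not a verified fix: the function name recreate_es_pod and the overridable OC variable are hypothetical, and the sketch assumes the pod is managed by a controller that recreates it after deletion. The pod name must be supplied by the caller.

```shell
#!/bin/sh
# OC can be overridden (for example OC=echo) to dry-run the sequence;
# it defaults to the real OpenShift client.
OC=${OC:-oc}

# Hypothetical helper: delete one Elasticsearch pod so that it is
# recreated, then list the pods in the namespace for a manual check.
# $1 = pod name, $2 = namespace (defaults to "logging" as in this report).
recreate_es_pod() {
    pod=$1
    ns=${2:-logging}
    "$OC" delete pod "$pod" -n "$ns" || return 1
    # The replacement pod gets a new name; list pods to find it.
    "$OC" get pods -n "$ns"
}

# Example call with the pod name from this report:
#   recreate_es_pod logging-es-nh7hh97z-12-hpdlk
```

Once the replacement pod is running, the curl check from the failed task can be repeated against it before re-running the logging playbook.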

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1) Deleted the certs from the masters

2) Reran logging playbook

3) CA and Certs were recreated - the CA looks fine - serial 1

4) The new cert for ES has the same issuer CN=logging-signer-test - this is expected

5) The new cert for ES has serial 5, which we assume is the same serial as the old certificate, and thus fails
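
The assumption in step 5 could be checked if a copy of the old certificate is still available. The sketch below uses openssl to compare two PEM files and reports a conflict when issuer and serial match but the fingerprints differ, which is the condition the NSS message describes. The function names and the old.crt/new.crt paths are hypothetical.

```shell
#!/bin/sh
# Print "issuer|serial|sha256-fingerprint" for a PEM certificate file.
cert_identity() {
    issuer=$(openssl x509 -noout -issuer -in "$1") || return 1
    serial=$(openssl x509 -noout -serial -in "$1") || return 1
    fp=$(openssl x509 -noout -fingerprint -sha256 -in "$1") || return 1
    printf '%s|%s|%s\n' "$issuer" "$serial" "$fp"
}

# Succeed (exit 0) when two identity strings share issuer and serial
# but carry different fingerprints, i.e. they are different certificates
# that reuse the same issuer/serial pair.
identities_conflict() {
    a_key=${1%'|'*}; b_key=${2%'|'*}    # issuer|serial part
    a_fp=${1##*'|'}; b_fp=${2##*'|'}    # fingerprint part
    [ "$a_key" = "$b_key" ] && [ "$a_fp" != "$b_fp" ]
}

# Usage: script.sh old.crt new.crt   (paths are placeholders)
if [ "$#" -eq 2 ]; then
    old=$(cert_identity "$1") || exit 2
    new=$(cert_identity "$2") || exit 2
    if identities_conflict "$old" "$new"; then
        echo "conflict: same issuer and serial, different certificate"
    else
        echo "no issuer/serial conflict"
    fi
fi
```

If the old certificate was already deleted from the masters, as in step 1, this comparison is only possible against a backup or a copy still mounted in a running pod.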

Actual results:
Playbook fails and the logging stack is broken.

Expected results:

Additional info:

Comment 11 Dirk Porter 2019-11-27 15:32:00 UTC

I have a customer with this issue in 3.9. Was a workaround discovered for this?


Dirk Porter
