Bug 1888958
Summary: certificates are regenerated only when crt is missing.

Product: OpenShift Container Platform
Component: Logging
Version: 4.5
Reporter: German Parente <gparente>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
CC: aos-bugs, dahernan, jcantril, jeder, ocasalsa, periklis, qitang, rrackow, rsandu, syedriko
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: ServiceDeliveryImpact
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: logging-core
Doc Type: No Doc Update
Type: Bug
Regression: ---
Clones: 1895607 (view as bug list)
Bug Blocks: 1895607
Last Closed: 2021-02-24 11:21:19 UTC
Description
German Parente
2020-10-16 15:12:39 UTC
This may be resolved in 4.7 with https://issues.redhat.com/browse/LOG-422

Hello,

I was commenting with @German earlier that I was able to reproduce the issue. The files in the openshift logging operator pod looked like this:

~~~
sh-4.2$ ls -lrt
total 80
-rw-r--r--. 1 1000600000 root    3 Oct 15 08:39 ca.serial.txt
-rw-r--r--. 1 1000600000 root    0 Oct 16 15:14 ca.db
-rw-r--r--. 1 1000600000 root 1935 Oct 16 15:48 system.logging.fluentd.crt_backup
-rw-r--r--. 1 1000600000 root 1935 Oct 16 15:48 system.logging.curator.crt
-rw-r--r--. 1 1000600000 root    0 Oct 16 15:48 system.logging.fluentd.key
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 system.logging.curator.key
-rw-r--r--. 1 1000600000 root 1935 Oct 16 15:48 system.logging.fluentd.crt
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 ca.key
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 elasticsearch.key
-rw-r--r--. 1 1000600000 root 2411 Oct 16 15:48 elasticsearch.crt
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 logging-es.key
-rw-r--r--. 1 1000600000 root 2171 Oct 16 15:48 logging-es.crt
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 system.admin.key
-rw-r--r--. 1 1000600000 root 1923 Oct 16 15:48 system.admin.crt
-rw-r--r--. 1 1000600000 root 1850 Oct 16 15:48 ca.crt
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 system.logging.kibana.key
-rw-r--r--. 1 1000600000 root 1935 Oct 16 15:48 system.logging.kibana.crt
-rw-r--r--. 1 1000600000 root 3272 Oct 16 15:48 kibana-internal.key
-rw-r--r--. 1 1000600000 root   32 Oct 16 15:48 kibana-session-secret
-rw-r--r--. 1 1000600000 root 1956 Oct 16 15:48 kibana-internal.crt
-rw-r--r--. 1 1000600000 root 4256 Oct 16 15:48 signing.conf
~~~

Both ca.db and system.logging.fluentd.key were empty, and because ca.db was empty, system.logging.fluentd.key was not regenerated.

I tried the following in my own lab:

~~~
$ oc -n openshift-logging rsh <operator pod>
$ cd /tmp/ocp-clo/
$ echo "" > system.logging.fluentd.key
~~~

With only the key emptied, system.logging.fluentd.key is always regenerated, the secret is updated, and everything goes back to working.

The next step was to reproduce a configuration similar to the one where the issue happened, with ca.db empty as well:

~~~
$ oc -n openshift-logging rsh <operator pod>
$ cd /tmp/ocp-clo/
### Delete content from ca.db
$ echo "" > ca.db
### Delete content from system.logging.fluentd.key
$ echo "" > system.logging.fluentd.key
~~~

In this case, as expected, system.logging.fluentd.key is never regenerated. I then restored the original content of ca.db (I knew it in my lab, but in a real cluster you usually will not), and system.logging.fluentd.key was regenerated again.

Reviewing the code [1], a possible workaround is to force regeneration of the certificate. The relevant check is:

~~~
if [ $REGENERATE_NEEDED = 1 ] || [ ! -f ${WORKING_DIR}/${component}.crt ] || ! openssl x509 -checkend 0 -noout -in ${WORKING_DIR}/${component}.crt;
~~~

Since it only checks whether "${WORKING_DIR}/${component}.crt" exists, we opted to delete system.logging.fluentd.crt, and that forced system.logging.fluentd.crt to be regenerated (a sketch of how the check could be extended follows below).

What I still do not know is how the ca.db and system.logging.fluentd.key files ended up empty in the first place.

- @Jeff, do you have any idea how this situation could occur?

Thanks in advance,
Oscar

[1] https://github.com/openshift/cluster-logging-operator/blob/6f5ed87809a34a678637ee219e7b4d91cfb979a4/scripts/cert_generation.sh#L210
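For illustration only, a minimal sketch of how that check in cert_generation.sh could also treat a missing or empty key file as a trigger for regeneration. The added `-s` test, the `then ... fi` framing, and the placeholder body are assumptions layered on top of the quoted line, not the fix that actually shipped (that work is tracked in LOG-422):

~~~
# Sketch only, not the shipped fix: also regenerate when the key file is
# missing or zero-length. "-s" is true only for files that exist and are non-empty.
if [ "${REGENERATE_NEEDED}" = 1 ] || \
   [ ! -f "${WORKING_DIR}/${component}.crt" ] || \
   [ ! -s "${WORKING_DIR}/${component}.key" ] || \
   ! openssl x509 -checkend 0 -noout -in "${WORKING_DIR}/${component}.crt"; then
    # ... existing regeneration logic for ${component} would run here ...
    :
fi
~~~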
Hello,

I was not able to see a bug created for backporting this to OCP 4.5. Do we have one, or should I create a new one for it?

Regards,
Oscar

(In reply to Oscar Casal Sanchez from comment #11)
> Hello,
>
> I was not able to see a Bug created for being backported this to OCP 4.5. Do
> we have one? Should I create a new one for it?
>
> Regards,
> Oscar

@Oscar Casal Sanchez I suggest some patience here. The backport BZs are created when the parent BZ is VERIFIED. The OpenShift bots create them automatically if, for example, the 4.6 PR is marked with "/cherry-pick release-4.5".

Verified on clusterlogging.4.7.0-202011071430.p0:

1. Set clusterlogging/instance to Unmanaged.
2. Set the fluentd key to blank: oc edit secret fluentd -n logging and set data.tls.key to "".
3. Delete one fluentd pod. The new pod goes into Error or Init:CrashLoopBackOff, and /etc/fluent/keys/tls.key in the fluentd pods is blank.
   Note: the other fluentd pods remain in Running status even though /etc/fluent/keys/tls.key is blank, because fluentd keeps using the old secret until the process is restarted to load the new one.
4. Set clusterlogging/instance back to Managed and wait for a while (a command sketch for this toggle is included at the end of this report).
5. Check the fluentd pod status: all fluentd pods are running.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652
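For reference, a hedged sketch of the Managed/Unmanaged toggle used in steps 1 and 4 of the verification above. It assumes the default ClusterLogging resource name `instance` and the `openshift-logging` namespace, which are not stated explicitly in the verification comment:

~~~
# Step 1: stop the operator from reconciling the logging stack (Unmanaged)
oc -n openshift-logging patch clusterlogging instance --type merge \
  -p '{"spec":{"managementState":"Unmanaged"}}'

# ... blank the fluentd key and delete a fluentd pod as described above ...

# Step 4: hand control back to the operator (Managed) so it reconciles the certificates
oc -n openshift-logging patch clusterlogging instance --type merge \
  -p '{"spec":{"managementState":"Managed"}}'
~~~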