Bug 1961811

Summary: Creating a configmap for a CA without a trailing newline in source file results in non-working CA verification
Product: OpenShift Container Platform Reporter: Mike Allmen <mallmen>
Component: NetworkingAssignee: Surya Seetharaman <surya>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: low CC: aconstan, ddelcian, lmohanty, mallmen, mrobson, operations, steven.barre, surya, tas, wking
Version: 4.6.z   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:31:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1985308    
Attachments:
Description Flags
test showing buggy concatenation vs. AppendCertsFromPEM none

Description Mike Allmen 2021-05-18 18:20:07 UTC
User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0
Build Identifier: 

A missing newline at the end of the certificate file when creating a ConfigMap
for a custom root CA causes issues with verifying certificates in the cluster signed by that CA.

In my case, using the Update Service operator, after installing and configuring an update service, the CVO failed to verify the certificate causing a failure to determine correct update status for the cluster.

Reproducible: Always

Steps to Reproduce:
1.  install cluster

2.  configure custom ConfigMap for root CA (with no newline at the end of the certificate) per https://docs.openshift.com/container-platform/4.6/security/certificates/replacing-default-ingress-certificate.html, step 1

$ cat root-ca.crt
-----BEGIN CERTIFICATE-----
<snip>
-----END CERTIFICATE-----$    <== note, the command prompt is not on the next line

3.  configure proxy object to use new CA per https://docs.openshift.com/container-platform/4.6/security/certificates/replacing-default-ingress-certificate.html, step2

4. (optional maybe) not sure what else this may be affecting, but in my case I configured the Update Service operator at this point which resulted in the error I saw
Actual Results:  
ClusterVersion fails to reconcile cluster upgrade status using the custom service/route configured through Update Service

message: 'Unable to retrieve available updates: Get "https://service-policy-engine-route-openshift-update-service.apps.ocp4.example.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.6&id=b821e26d-0cb0-4879-a4d5-16f2e9af916c&version=4.6.23": x509: certificate signed by unknown authority'

Expected Results:  
The ClusterVersion properly determines cluster upgrade status

$ oc adm upgrade
Cluster version is 4.6.23

Updates:

VERSION IMAGE
4.6.25  registry.ocp4.example.com:8443/ocp-release/release-images@sha256:7f26b56dc31547a26ce1f67eeb59ecee92dc07f3622e203c51e39fd6d7bcc930

This was performed in a disconnected cluster while testing the new Upgrade Service operator

Comment 2 Mike Allmen 2021-05-18 21:11:12 UTC
I had the same issue a while back https://chat.google.com/room/AAAA2bt6nL0/So41yxed4io
In this case I had created a chain cert for ingress, router pods were not starting, and adding a newline to the chain cert fixed that issue too.  Just noting another case whether the behavior was observed.

Comment 3 Alexander Constantinescu 2021-05-19 14:57:04 UTC
I don't __think__ this is a valid bug. I'm not sure if a certificate without a missing new line is a valid certificate, it certainly isn't for ssh keys, see below:

$ cat id_rsa 
...
...
...
48hFyUfuu+2HZmZHgFf9YximqauW5V6G64OHVscU592WS6ScysZ687f454pVUjyrXmmTpp
2PzbGnTJl6ujR79LU1kgi902nCwl/EXGqQQVJW95elhdDjOOW3n0QobDDLSFRwqsOEUexa
k94F1WYlcQB4WdGniqiyO26IQEhSnMyzhnyPeG/0R/hBurt5rlxGtDoDnEwWa3HhTvpc8S
5QDJiY7W9ywWm2LZI2NxVQ6rIZxOgOyzzdSwis1xcdu47Wte
-----END OPENSSH PRIVATE KEY-----aconstan@work ~ $
$ ssh-keygen -y -f id_rsa
Load key "id_rsa": invalid format

I am assigning this to Daneyon who wrote the proxy/certificate rotation code in the CNO. He might have more insight.

Comment 4 W. Trevor King 2021-05-19 19:51:10 UTC
This is definitely a bug.  From the must-gather:

$ yaml2json - <cluster-scoped-resources/config.openshift.io/proxies/cluster.yaml | jq -r .spec.trustedCA.name
root-ca-ocp4-example-com
$ yaml2json - <namespaces/openshift-config/core/configmaps.yaml | jq '.items[] | select(.metadata.name == "root-ca-ocp4-example-com").data["ca-bundle.crt"]'
"-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----"
$ yaml2json - <namespaces/openshift-config-managed/core/configmaps.yaml | jq -r '.items[] | select(.metadata.name == "trusted-ca-bundle").data["ca-bundle.crt"]'
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----# ACCVRAIZ1
-----BEGIN CERTIFICATE-----
...

The network operator should not be writing that bogus content to the managed ConfigMap.  To close this out, I'm fine with either:

a. Network operator complains about the unparsed certificates, or lack of newline, or whatever... wherever it usually complains about invalid content.  Network operator does not update the managed ConfigMap.  Consumers of the managed ConfigMap continue to successfully use the previous, valid CAs.
b. Network operator realizes that a newline would recover the user-configured certificate, and adds one automatically.  Network operator updates the managed ConfigMap with valid content.  Consumers of the managed ConfigMap begin to successfully use the new, valid CAs.

Comment 5 Surya Seetharaman 2021-06-15 11:59:54 UTC
well, the CNO seems to be using https://github.com/openshift/cluster-network-operator/blob/1e43785913eaad403e15c96c2589ed9fb83b22a9/pkg/util/validation/trustbundle.go#L54 golang's https://golang.org/pkg/crypto/x509/#ParseCertificate ParseCertificate library. According to the implementation of that function, it looks like it doesn't really allow any trailing data: https://golang.org/src/crypto/x509/x509.go?s=51605:51665#L1537 which I assume also includes spaces or newlines?

Comment 6 Surya Seetharaman 2021-06-15 13:29:29 UTC
The problem is https://golang.org/src/encoding/pem/pem.go?s=2505:2553#L76 kind of cuts of all trailing things and only uses the pem block. Perhaps we should add back the newline?

Comment 7 W. Trevor King 2021-06-15 16:46:40 UTC
Because ParseCertificates doesn't give convenient access to any unparsed remainder, I think the easiest implementations of my two options from comment 4 are:

a. Network operator complains because the configured ConfigMap ca-bundle.crt value doesn't end in a newline.
b. Network operator notices that the configured ConfigMap ca-bundle.crt value doesn't end in a newline, and appends one before calling ParseCertificates.

Comment 8 Surya Seetharaman 2021-06-16 15:07:14 UTC
@wking: yea I am trying to do option a). So for that the problem is, even if the user gave a newline the pem.Decode would remove it (AFAICS) - we can get the remaining portion back from this function. So if its a newline that is the remainder, I'll add it back before calling ParseCertificates and if no remainder then I'll make CNO complain user did not give a newline. Still testing this.

Comment 9 W. Trevor King 2021-06-16 16:55:10 UTC
With (a), you don't even need to get as far as a pem.Decode attempt.  You can just say "hey, this string does not end in a newline" and complain without looking any further at the content.

Comment 10 Surya Seetharaman 2021-06-17 12:23:03 UTC
oh hmm yea that makes my job easier! cool I'll include that in the validation portion.

Comment 11 operations 2021-06-24 15:28:17 UTC
a certificate without new line is valid certificate, it's not a golang error,
the issue is in https://github.com/openshift/router/blob/88ecceb8953751fcde2fe2eab52a5c15e07a955e/pkg/router/routeapihelpers/validation.go#L99
this method should check if last byte is carriage return and insert it if not.

Comment 12 W. Trevor King 2021-06-25 06:55:33 UTC
> ... this method should check if last byte is carriage return and insert it if not.

Yeah, that's my option (b).

> a certificate without new line is valid certificate...

Hmm.  [1] is the X.509 v3 spec, pointing at RFC 1422 for PEM.  RFC 1422 points at RFC 1421 for the protocol [2].  RFC 1421 points at RFC 934 for encapsulation boundaries [3].  RFC 934 has a funky finite-state automaton for bursting (i.e. parsing) text which may contain encapsulation boundaries [4].  Here's my take at annotating the certificate:

  S1 - S3 - S4 ---BEGIN CERTIFICATE-----
  S5 < S2 snip>
  S1 - S3 - S4 ---END CERTIFICATE-----

So yeah, that puts you pretty clearly in the closing encapsulation boundary, regardless of newlines.  And from RFC 7468's much more familiar ABNF [5]:

  textualmsg = preeb *WSP eol
               *eolWSP
               base64text
               posteb *WSP [eol]

So that makes the fact that the trailing newline is optional super clear.  But they also give [5]:

   stricttextualmsg = preeb eol
                      strictbase64text
                      posteb eol

where the trailing newline is required.  And they say [5]:

   New implementations SHOULD emit the strict format (Figure 3)
   specified above.  The choice of parsing strategy depends on the
   context of use.

So looks like both Go and the network operator are technically free to go either way on whether they require a trailing newline or not after the closing encapsulation boundary's dashes.

[1]: https://datatracker.ietf.org/doc/html/rfc5280#section-3.1
[2]: https://datatracker.ietf.org/doc/html/rfc1422#section-1
[3]: https://datatracker.ietf.org/doc/html/rfc1421#section-4.4
[4]: https://datatracker.ietf.org/doc/html/rfc934#page-7
[5]: https://datatracker.ietf.org/doc/html/rfc7468#section-3

Comment 13 operations 2021-06-25 08:09:54 UTC
Hello, 

I investigated a little bit further, in our case we were subject to this issue because we used cert-manager together with a vault Issuer. Vault pki API seem to render the certificate chain without trailing newline. In cert-manager, it seems they decided to add a carriage return in ANY case as you can see there:
https://github.com/jetstack/cert-manager/blob/8aeb724f84f1acaf918e4dd336812c5978fee79c/pkg/internal/vault/vault.go#L367 

I'm a little bit afraid that it'll potentially create another issue if suddenly vault start to put that newline in place. (you'll potentially end up with a file containing one extra line between certificates...)

To me you should probably think the other way around, I mean, this certificate will be consumed by haproxy at the end, so, whatever you receive, the concatenated version of the certificates (and eventually private key in case the parsing occurs on the server certificate and not the ca bundle) must be valid for haproxy configuration. 

It impacts the need of checking if there is one or more trailing newline and have only one of them in any case at the end of the parsing.

Only my point of view here.

Thanks for your work.

Comment 14 Surya Seetharaman 2021-06-25 12:42:54 UTC
@operations and @wking based on the discussion here, are we sure we need this fixed in the CNO?

Here are the steps happening from CNO pov:

We first do a pem.Decode on the cert block: https://github.com/openshift/cluster-network-operator/blob/1e43785913eaad403e15c96c2589ed9fb83b22a9/pkg/util/validation/trustbundle.go#L45

That splits the cert data and rest of trailers including the newline.

We then feed the cert data alone into ParseCertificates: https://github.com/openshift/cluster-network-operator/blob/1e43785913eaad403e15c96c2589ed9fb83b22a9/pkg/util/validation/trustbundle.go#L54

We are ignoring the trailing data completely, including the newline or anything else given. So it seems useless to complain about missing newline or even add a new one after the parsing is done considering that's how the CNO works and as stated its upto the CNO/golang if it requires a newline or not right? This doesn't seem like a CNO bug to me. 

Please correct me if I am wrong.

Comment 15 W. Trevor King 2021-06-25 17:33:22 UTC
Created attachment 1794498 [details]
test showing buggy concatenation vs. AppendCertsFromPEM

Going back to comment 4, the issue is that the user-supplied data (which is missing a newline) is being blindly concatenated with the default CA pool.  See

  -----END CERTIFICATE-----# ACCVRAIZ1

in that comment.  So consumers of the network-operator-written data like the CVO which are using AppendCertsFromPEM will fail to parse the final user-supplied cert.  Testing with the attached script (and eliding some junk from the subject bytes to make the output more readable):

  $ go run test.go 
  test "%s\n%s\n"
  loaded any certificates? true
  2 subjects:
  ...Let's Encrypt...
  ...Internet Security Research Group...

  test "%s\n%s"
  loaded any certificates? true
  2 subjects:
  ...Let's Encrypt...
  ...Internet Security Research Group...

  test "%s# hash-for-the-next-key\n%s\n"
  loaded any certificates? true
  1 subjects:
  ...Internet Security Research Group...

  test "%s\n\n%s\n"
  loaded any certificates? true
  2 subjects:
  ...Let's Encrypt...
  ...Internet Security Research Group...

So as far as the CVO is concerned, we don't care if the network-operator written value has a trailing newline or not.  We will miss a user-intended CA if the network operator fails to add a newline before appending the default CAs.  We don't care if the network operator adds additional newlines before appending the default CAs.  So in the generous (b) case, where the network operator decides to follow Go and not care about a trailing newline at the end of the whole string, one possible route-(b) fix would be something like:

  $ git --no-pager diff -U1
  diff --git a/pkg/controller/proxyconfig/controller.go b/pkg/controller/proxyconfig/controller.go
  index 8ce1d1e..2b4a38c 100644
  --- a/pkg/controller/proxyconfig/controller.go
  +++ b/pkg/controller/proxyconfig/controller.go
  @@ -373,3 +373,3 @@ func (r *ReconcileProxyConfig) 
  mergeTrustBundlesToConfigMap(additionalData, syst
          combinedTrustData = append(combinedTrustData, additionalData...)
  -       combinedTrustData = append(combinedTrustData, systemData...)
  +       combinedTrustData = append(combinedTrustData, []byte("\n"), systemData...)

Comment 16 Daniel Del Ciancio 2021-07-16 13:55:46 UTC
@wking   My customer ran into the same issue on 4.7.18.  Do we have a corresponding BZ or need to clone this one?

Comment 17 W. Trevor King 2021-07-16 19:37:16 UTC
I don't think we need to clone.  This bug already sets Version: 4.6.z, so at least 4.6.z and later are impacted.  I dunno if folks have tracked down a formal first-impact time; I'd guess the controller was probably born with this issue.  This bug has Target Release set to 4.9.0, since that's where master's currently pointed.  Fixes for earlier 4.y will happen as backports, once the master/4.9 fix is landed and verified.  It's possible to clone this bug back and set earlier Target Release values if folks don't want to hear about it until the 4.y they care about is seeing some action.  But I don't expect this bug to be particularly chatty either, so just attaching here and moving back if/when backport bugs are opened seems fine too.

Surya, on the fix front, does the newline injection I floated in comment 15 sound ok to you?  If it does, I'm happy to turn it into a PR if that would be helpful.

Comment 18 Surya Seetharaman 2021-07-19 08:19:49 UTC
Hi, apologies for the delay, ideally I'd like the CNO to reject the ones without a newline, but to be fair that's an implementation requirement. So I am totally fine with going ahead with what Trevor is suggesting here. I have posted the PR and also tested it against cluster-bot.

Without the PR:

RJPkcV9aF5Oc08hytpBfeU/H8v+e2DQK258=
-----END CERTIFICATE-----# Red Hat IT Root CA
-----BEGIN CERTIFICATE-----

With the PR:

RJPkcV9aF5Oc08hytpBfeU/H8v+e2DQK258=
-----END CERTIFICATE-----
# Red Hat IT Root CA
-----BEGIN CERTIFICATE-----


I'll expedite the merging process as much as possible and do the backports.

Cheers,
Surya.

Comment 20 zhaozhanqi 2021-07-27 08:30:38 UTC
Verified this bug on 4.9.0-0.nightly-2021-07-27-043424

1. Tested with the following file 
$cat ca.crt 
-----BEGIN CERTIFICATE-----
xxxxxxx
-----END CERTIFICATE-----[zzhao$

2. oc create configmap custom-ca49 --from-file=ca-bundle.crt=/home/zzhao/bug_1961811/ca.crt -n openshift-config

3. oc patch proxy/cluster      --type=merge      --patch='{"spec":{"trustedCA":{"name":"custom-ca49"}}}'

4. oc get cm -n openshift-config-managed trusted-ca-bundle -o yaml > trusted-ca

5. $ cat trusted-ca | grep yo= -A 3
    Mn9OTfIwCuXDuvQwp5s7vZ56mpqJYKJ9lyo=
    -----END CERTIFICATE-----
    # ACCVRAIZ1
    -----BEGIN CERTIFICATE-----

Comment 23 errata-xmlrpc 2021-10-18 17:31:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759