Bug 2058661

Summary: Since FF94 if a CA has no subject we get "improperly formatted DER-encoded message"
Product: [Fedora] Fedora
Reporter: Michele Baldessari <michele>
Component: nss
Assignee: Bob Relyea <rrelyea>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: medium
Version: 35
CC: crypto-team, elio.maldonado.batiz, erack, gecko-bugs-nobody, jeckersb, jfischer, jgato, jhorak, kai-engert-fedora, kdudka, klaas, pe.antonov, pjasicek, rhughes, rrelyea, rstrode, sandmann, stransky, tbielawa, zeguan
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-09 09:19:55 UTC
Type: Bug

Description Michele Baldessari 2022-02-25 14:45:06 UTC
Preamble:
I fully realize that this is very likely part server configuration issue and/or part NSS, but I am a bit at a loss as to where the exact problem lies and I'd love it if you guys could throw some cluebones my way :)

Description of problem:
Starting with Firefox 94, opening the HTTPS URL of an OpenShift operator called "openshift-gitops" inside an OCP cluster returns "Error code: SEC_ERROR_BAD_DER."

Chrome works without issues and so do Firefox 93 and curl. openssl seems to be fine with it too (barring the unknown CA): "openssl s_client -connect openshift-gitops-server-openshift-gitops.apps.bandini-dc.blueprints.rhecoeng.com:443" complains about the verification bits but reports nothing close to the DER-encoded message issue.

I sort of bisected the breakage between nightlies to have happened here:
* 94.0a1 (https://ftp.mozilla.org/pub/firefox/nightly/2021/09/2021-09-30-14-38-54-mozilla-central/firefox-94.0a1.en-US.linux-x86_64.tar.bz2) -> WORKS
* 94.0a1 (https://ftp.mozilla.org/pub/firefox/nightly/2021/10/2021-10-04-09-53-42-mozilla-central/firefox-94.0a1.en-US.linux-x86_64.tar.bz2) -> BROKEN

So the change must have happened (in the nightlies) between 2021-09-30 and 2021-10-04. I cloned the mercurial mozilla-central tree and, since the nightly folders above contain the hg revisions, I went ahead and tried my luck with: hg log -r "85b7b3d04d9f249418325f02d4b93046724b496b:06e67beeafc265ff1aef7d033706a67d91ef0186"
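
To narrow a range like this down to NSS-related changesets, something along these lines may help. It is only a sketch: it assumes NSS is vendored under security/nss in the mozilla-central checkout, and log_range is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: list only the changesets in a revision range that touch one path.
# Assumption: NSS lives under security/nss in the mozilla-central clone.
HG=${HG:-hg}

# log_range GOOD_REV BAD_REV [PATH]
# Prints one line per changeset: short hash, date, first line of the summary.
log_range() {
    $HG log -r "$1:$2" \
        --template '{node|short} {date|shortdate} {desc|firstline}\n' \
        "${3:-security/nss}"
}
```

Run from inside the clone, for example: log_range 85b7b3d04d9f249418325f02d4b93046724b496b 06e67beeafc265ff1aef7d033706a67d91ef0186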

The only change that sort of made sense was:
changeset:   593811:96a66f46f801
user:        Benjamin Beurdouche <bbeurdouche>
date:        Thu Sep 30 12:57:34 2021 +0000
summary:     Bug 1729163 - land NSS NSS_3_71_RTM UPGRADE_NSS_RELEASE, r=djackson DONTBUILD

Looking at the bug I see this changeset: https://hg.mozilla.org/mozilla-central/rev/330c22fc463e which seems potentially related so, as suspected, likely more of an NSS thing.

After some googling here and there I ran firefox with the following:
export NSPR_LOG_MODULES="nsHttp:3,nsHostResolver:3,pkix:5,nss_mod_log:4" NSS_STRICT_SHUTDOWN="x" NSS_OUTPUT_FILE=/tmp/tls.log NSPR_LOG_FILE=/tmp/firefox.log  SSLTRACE=127

The firefox log (which does not really have much in terms of ssl errors/warnings) is here: https://acksyn.org/files/tls-der-issue/firefox.log.moz_log

The cert pem file itself is here: https://acksyn.org/files/tls-der-issue/gitops.pem
The corresponding CA is here: https://acksyn.org/files/tls-der-issue/ca.pem

I sniffed the traffic for this specific failure here:
https://acksyn.org/files/tls-der-issue/filtered.pcap

Note that the cert is not only signed by a private ephemeral CA, its dns/cname entries are also not matching the FQDN:
$ openssl x509 -in gitops.pem -text |grep -i -e dns -e cname
DNS:openshift-gitops, DNS:openshift-gitops-grpc, DNS:openshift-gitops.openshift-gitops.svc.cluster.local

I also tried a couple of ssl checker websites https://www.sslshopper.com/ssl-checker.htm and https://www.digicert.com/help/ and barring the CA thing, they were okay with the cert. Interestingly SSLLabs keeps having internal server errors on my url.

To check the NSS parts I tried the following:
mkdir -p /tmp/ssl/db
certutil -N -d /tmp/ssl/db
certutil -d /tmp/ssl/db -A -t "C,," -n ocp-ca -i ca.pem
certutil -d /tmp/ssl/db -A -t ",," -n gitops -i gitops.pem

I see that "certutil -d /tmp/ssl/db -L -n gitops" spits out the cert just fine. Maybe the problem here is that the Issuer is empty?:
$ certutil -d /tmp/ssl/db -L -n gitops |head -n10
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            6f:e2:33:d6:c0:54:e0:d4
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "(empty)"
        Validity:
            Not Before: Fri Feb 25 09:07:04 2022
            Not After : Sat Feb 25 09:07:05 2023

Interestingly, the CA verification (if I got the command right) seems to work:
$ certutil -d /tmp/ssl/db -V -n gitops -u Y
certutil: certificate is valid


The expectation, though, is that we get asked whether we're okay with this self-signed/internal cert, just like FF93 did and Chrome still does.

Any tips for me to debug this further? What's the right incantation to get firefox to log all tls/pki bits to a file? Or should I move this to NSS component and try my luck there?

Comment 2 Martin Stransky 2022-02-28 08:37:09 UTC
Bob, any idea here? Moving to NSS to get the attention of the NSS crew, please move back if that's a Firefox bug.
Thanks.

Comment 3 Bob Relyea 2022-03-02 16:49:18 UTC
In NSS there are multiple cert validators: 1) the classic validator, 2) libpkix, 3) Firefox pkix. certutil uses the classic validator, and can be told to use libpkix. Firefox uses the Firefox pkix validator, which has not been integrated into NSS proper. It lives in the NSS tree, but doesn't have its own shared library. As part of the NSS release, we don't build it, but it would land into mozilla with a new release of NSS.

I think a good test is: run the Fedora Firefox 93 with NSS 3.72 installed. If the problems are still there, then it's likely some change in NSS that only the Firefox validator sees. If it goes away, then it's probably in the Firefox validator (which would be in the firefox package in our release).

bob

Comment 5 Bob Relyea 2022-03-02 17:26:03 UTC
Please try the experiment outlined in comment 3. Thanks.

Comment 6 Bob Relyea 2022-03-02 17:28:42 UTC
Oh, wait, the issue was introduced in 3.71, so the patch I put up was irrelevant. The test is still valid, though. If you can run Firefox 93 with NSS 3.72 without a problem, then we know it's in mozpkix.

Comment 7 jfischer 2022-03-04 10:08:33 UTC
The root cause is most likely that the GitOps Operator creates its own CA certificate to sign the other TLS certificates it creates. However, it doesn't set a Subject for the CA certificate, so the certificates it signs carry an empty Issuer. This is most likely what Firefox doesn't like about the certificate presented by the GitOps Operator.

Once the CA certificate has a subject set, Firefox happily accepts the TLS certificate of GitOps Operator.

Tested with Firefox 97.0.1 (64-bit)

Upstream PR https://github.com/argoproj-labs/argocd-operator/pull/582
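
For anyone wanting to check whether their own certificates are affected, here is a minimal sketch. It assumes the openssl CLI is installed and that the PEM files are named as in the description (ca.pem, gitops.pem). The helper names dn_is_empty and check_cert are made up for illustration, and since the exact text openssl prints for an empty name may differ between versions, the check simply strips whitespace after the "=" sign.

```shell
#!/bin/sh
# Sketch: flag an empty distinguished name in `openssl x509` output.
# Assumption: the line looks like "issuer=..." or "subject=..."; spacing
# differs between OpenSSL versions, so all whitespace is stripped first.

# dn_is_empty LINE -> exit status 0 if nothing follows the "name=" prefix.
dn_is_empty() {
    value=${1#*=}                                   # drop "issuer=" / "subject="
    value=$(printf '%s' "$value" | tr -d '[:space:]')
    [ -z "$value" ]
}

# check_cert FILE -> report whether the issuer or subject of a PEM cert is empty.
# Needs the openssl CLI; FILE is e.g. ca.pem or gitops.pem from this report.
check_cert() {
    for field in issuer subject; do
        line=$(openssl x509 -in "$1" -noout "-$field") || return 2
        if dn_is_empty "$line"; then
            echo "$1: $field is EMPTY"
        else
            echo "$1: $line"
        fi
    done
}
```

Based on the analysis above, check_cert ca.pem would be expected to flag the subject, and check_cert gitops.pem the issuer; a regenerated CA should show a name in both fields.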

Comment 8 Michele Baldessari 2022-03-09 09:19:55 UTC
Thanks everyone. I changed the subject to something more searchable in case someone else hits this. While it'd be nice if the error from firefox were a bit more informative, this change in behavior w/ FF94 is not really a bug, so I'll close this one out.

Thanks again!

Comment 9 Jose Gato 2022-05-17 10:35:56 UTC
Hi,
I am having the same issue with Fedora 35 and Firefox 100. What is the solution? For the moment I am patching ArgoCD with:

oc patch argocd -n openshift-gitops openshift-gitops --type='merge' -p '{"spec":{"server":{"route":{"tls":{"termination":"reencrypt"}}}}}'

Comment 10 jfischer 2022-05-17 10:52:47 UTC
@jgato The CA creation has been fixed in OpenShift GitOps 1.5.0. For new installs and instances, it should work ootb now. If you upgraded, you need to

* first delete the openshift-gitops-ca secret and let the Operator recreate it to establish a new CA,
* then delete the openshift-gitops-tls secret and let the Operator recreate it to create a new cert, signed by the new CA.
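
The two steps above can be sketched as a small script. This is only an illustration: it assumes the secrets live in the openshift-gitops namespace (as in the oc patch command from comment 9) and that the Operator recreates each secret shortly after it is deleted. The function names, the retry count and the polling interval are hypothetical.

```shell
#!/bin/sh
# Sketch of the two-step rotation. Assumptions: both secrets are in the
# openshift-gitops namespace and the operator recreates them after deletion.
NAMESPACE=${NAMESPACE:-openshift-gitops}
OC=${OC:-oc}   # set OC=echo to print the commands instead of running them

# Poll until the named secret exists again; give up after about 5 minutes.
wait_for_secret() {
    tries=0
    until $OC get secret "$1" -n "$NAMESPACE" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "timed out waiting for secret $1" >&2
            return 1
        fi
        sleep 5
    done
}

rotate_gitops_certs() {
    # 1. Drop the old CA so the operator establishes a new one.
    $OC delete secret openshift-gitops-ca -n "$NAMESPACE" || return 1
    wait_for_secret openshift-gitops-ca || return 1
    # 2. Drop the old serving cert so it is reissued, signed by the new CA.
    $OC delete secret openshift-gitops-tls -n "$NAMESPACE" || return 1
    wait_for_secret openshift-gitops-tls || return 1
}
```

Calling rotate_gitops_certs with OC=echo prints the two delete commands without touching the cluster, which is a cheap way to review them first.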

HTH.

Comment 11 Jose Gato 2022-05-17 11:39:43 UTC
Do you know if this will also be fixed in previous versions like 1.4.x?

Comment 12 jfischer 2022-05-17 12:01:10 UTC
No, sorry, we haven't backported this fix. I think the suggestion would be to either use reencrypt (as you already do), or to provide your own CA keypair in openshift-gitops-tls. If you choose the latter, the openshift-gitops-tls secret has to be deleted/re-created as well.