Bug 1131079

Summary: TLS client gets SSL_ERROR_BAD_MAC_ALERT in FIPS mode if NSS is initialized twice (without/with certdir)
Product: Red Hat Enterprise Linux 7
Reporter: Alicja Kario <hkario>
Component: nss-softokn
Assignee: Elio Maldonado Batiz <emaldona>
Status: CLOSED ERRATA
QA Contact: Alicja Kario <hkario>
Severity: high
Docs Contact:
Priority: high
Version: 7.0
CC: arubin, emaldona, hkario, james.antill, jantill, jenifer.golmitz, jlyle, kdudka, kengert, ksrot, moorereason, omoris, ovasik, pmatilai, rrelyea, tmraz, vmukhame
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nss-softokn-3.16.2-12.el7
Doc Type: Bug Fix
Doc Text:
Cause: Softoken did not check the mechanism for user tokens correctly.
Consequence: When both client and server worked in FIPS mode, yum was unable to connect to openssl based servers. The server would report "decryption failed or bad record mac" error message.
Fix: softoken now allows FIPS user slots to have the full list of mechanisms just like the main slot.
Result: yum is now able to connect to openssl based servers.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 08:28:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 717789, 839624
Attachments:
Description (Flags)
Patch to allow FIPS user slots to have the full list of mechanisms just like the main slot. (none)
V2: Patch to allow FIPS user slots to have the full list of mechanisms just like the main slot. (none)

Description Alicja Kario 2014-08-18 12:58:38 UTC
Description of problem:
When both client and server are working in FIPS mode, yum is unable to connect to openssl based servers. The server reports "decryption failed or bad record mac" error message.

Version-Release number of selected component (if applicable):
nss-3.16.2-2.el7_0.x86_64
openssl-1.0.1e-34.el7_0.4.x86_64
yum-3.4.3-118.el7.noarch
curl-7.29.0-19.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set system to FIPS mode
2. Set up a repository and have openssl serve it
3. Connect to repo using yum

Actual results:
Checksum type 'md5' disabled
Loaded plugins: product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
https://example.com:443/httpstestrepo1/repodata/repomd.xml: [Errno 14] curl#35 - "SSL peer reports incorrect Message Authentication Code."

Expected results:
Repository working

Additional info:
Component is set to curl temporarily until the root issue is found.

The bug is not reproducible with the curl binary. In detail, (py)curl works but urlgrabber does not.

Marking as regression, since the issue is not reproducible on RHEL 6.

Comment 1 Kamil Dudka 2014-08-18 13:29:58 UTC
Forwarding the previous e-mail communication here...

On Thursday, August 14, 2014 12:13:47 Hubert Kario wrote:
> When openssl server is configured to require client certificates
> in fips mode, yum is unable to download data as server breaks connection
> and reports bad mac alert:
> 139738160654240:error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac:s3_pkt.c:484:
> 139738160654240:error:14076129:SSL routines:SSL23_GET_CLIENT_HELLO:only tls allowed in fips mode:s23_srvr.c:428:

On Monday, August 18, 2014 14:26:01 Kamil Dudka wrote:
> The issue with wrong SSL version being used is only a consequence of the
> first error.  If there is no particular SSL version set, libcurl tries to
> connect using TLSv1, and if the handshake fails with certain error codes,
> it tries to connect using SSLv3.
> 
> The problem seems to be related to a particular cipher-suite(s).  If I
> insert self.curl_obj.setopt(pycurl.SSL_CIPHER_LIST, "rsa_aes_256_sha") in
> urlgrabber code, the test passes just fine.
> 
> I have no idea why it works with (py)curl but not with urlgrabber.  Is there
> an easy way to check what cipher-suite is being negotiated by NSS during
> SSL handshake?
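
To make the fallback described in the quoted e-mail easier to follow, here is a toy model in Python. It is only a sketch and not libcurl's code: HandshakeError, connect_with_fallback and fips_server are hypothetical names, and the model retries on any failure instead of on "certain error codes" only. It shows why the second, SSLv3-related message can be a mere consequence of the first failure.

```python
# Toy model of the TLSv1 -> SSLv3 fallback; NOT libcurl's actual code.
# All names below are hypothetical and exist only for this illustration.


class HandshakeError(Exception):
    """Raised by the stand-in handshake function when the peer rejects us."""


def connect_with_fallback(handshake, versions=("TLSv1", "SSLv3")):
    """Try each protocol version in order.

    Returns (negotiated_version_or_None, list_of_(version, error) pairs).
    """
    errors = []
    for version in versions:
        try:
            handshake(version)
            return version, errors
        except HandshakeError as exc:
            errors.append((version, str(exc)))
    return None, errors


def fips_server(version):
    """Stand-in for the server in this bug: every attempt fails."""
    if version == "TLSv1":
        # The real problem: the first handshake breaks on the record MAC.
        raise HandshakeError("decryption failed or bad record mac")
    # The retry with SSLv3 is then refused because the server is in FIPS mode.
    raise HandshakeError("only tls allowed in fips mode")


if __name__ == "__main__":
    negotiated, errors = connect_with_fallback(fips_server)
    print(negotiated)
    for version, message in errors:
        print(version, message)
```

In this model the server sees both messages in that order, which matches the two errors in the openssl log quoted above.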

Comment 2 Kamil Dudka 2014-08-18 14:24:58 UTC
I have written a minimal example using python-urlgrabber only:

import urlgrabber

g = urlgrabber.grabber.URLGrabber()
g.opts.ssl_ca_cert = "./myca1/cert1.pem"
g.opts.ssl_cert = "./client1/cert1.pem"
g.opts.ssl_key = "./client1/key1.pem"
g.urlgrab("https://pes-guest-66.lab.eng.brq.redhat.com:443/httpstestrepo1/repodata/repomd.xml")

... and it does not trigger the failure that yum triggers.

I am switching the component to yum until we have a yum-independent reproducer...

Comment 4 James Antill 2014-09-29 14:43:00 UTC
(In reply to Kamil Dudka from comment #2)
> I have written a minimal example using python-urlgrabber only:

 Add:

import rpm

...to the top of your example, and try again.
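
Put together, the suggested reproducer might look like the sketch below. The options and the URL are the ones from comment 2. reproducer_options and run_reproducer are hypothetical helper names, and running it is assumed to need the rpm and urlgrabber Python modules, a host in FIPS mode, and the test server and certificate files from comment 2.

```python
# Sketch of the combined reproducer: comment 2 plus the "import rpm" hint.
# The helper names are hypothetical; the option values come from comment 2.


def reproducer_options():
    """SSL options copied from the example in comment 2."""
    return {
        "ssl_ca_cert": "./myca1/cert1.pem",
        "ssl_cert": "./client1/cert1.pem",
        "ssl_key": "./client1/key1.pem",
    }


def run_reproducer(url):
    """Load rpm first, then fetch the URL through urlgrabber."""
    # rpm is imported before urlgrabber, as suggested above ("to the top").
    import rpm  # noqa: F401  (imported only for its side effects)
    import urlgrabber.grabber

    g = urlgrabber.grabber.URLGrabber()
    for name, value in reproducer_options().items():
        setattr(g.opts, name, value)
    return g.urlgrab(url)


# Example call (needs the environment described above):
# run_reproducer("https://pes-guest-66.lab.eng.brq.redhat.com:443/httpstestrepo1/repodata/repomd.xml")
```

The imports live inside run_reproducer so that the module order is explicit and nothing is loaded until the function is actually called.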

Comment 6 Panu Matilainen 2014-10-02 13:22:15 UTC
(In reply to James Antill from comment #4)
> (In reply to Kamil Dudka from comment #2)
> > I have written a minimal example using python-urlgrabber only:
> 
>  Add:
> 
> import rpm
> 
> ...to the top of your example, and try again.

Curious, I would've sort of expected this in older versions, but rpm in rhel-7 uses a private NSS context so as to avoid clashing with other NSS users in the same process (see bug 871485).

Comment 7 Kamil Dudka 2014-10-07 13:57:57 UTC
Sorry for the delay, I came back from vacation today.

This seems to be difficult to debug.  So far I discovered two facts:

1. If I change urlgrabber to force the rsa_aes_256_sha cipher suite, it works.

2. If I change libcurl to initialize NSS with NSS_INIT_NOCERTDB | NSS_INIT_NOMODDB flags, it works.
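
For reference, fact 1 amounts to a single setopt call on the curl handle. The sketch below shows it with pycurl directly, assuming pycurl is installed. force_cipher_suite and fetch are hypothetical helpers; the option and the cipher-suite name are the ones quoted in comment 1. Note that plain (py)curl did not reproduce the bug in the first place, so this only illustrates how the option is set, not a verified workaround for yum.

```python
# Minimal sketch of forcing one cipher suite on a curl handle.
# The helper names are hypothetical; only the setopt call itself is taken
# from the urlgrabber change quoted in comment 1.
from io import BytesIO

CIPHER_SUITE = "rsa_aes_256_sha"


def force_cipher_suite(curl_obj, cipher_list_option, cipher=CIPHER_SUITE):
    """Restrict the handle to a single cipher suite and return the handle."""
    curl_obj.setopt(cipher_list_option, cipher)
    return curl_obj


def fetch(url):
    """Download url with the cipher suite forced; returns the body as bytes."""
    import pycurl  # imported lazily so the helper above works without it

    buf = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEFUNCTION, buf.write)
        force_cipher_suite(c, pycurl.SSL_CIPHER_LIST)
        c.perform()
    finally:
        c.close()
    return buf.getvalue()
```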

Comment 8 Kamil Dudka 2014-10-08 11:55:42 UTC
This seems to be caused by incorrect initialization of nss-softokn.  RPM first initializes NSS with no certificate database.  libcurl then initializes NSS with certdir equal to "sql:/etc/pki/nssdb", which breaks some internal assumptions of NSS.  The fact that it results in SSL_ERROR_BAD_MAC_ALERT suggests there is insufficient error handling elsewhere in the code, which makes it difficult to debug.  The following patch gets the testing program running:

--- a/nss/lib/softoken/fipstokn.c
+++ b/nss/lib/softoken/fipstokn.c
@@ -440,49 +440,51 @@ void fc_log_init_error(CK_RV crv) {
 
 /* FC_Initialize initializes the PKCS #11 library. */
 CK_RV FC_Initialize(CK_VOID_PTR pReserved) {
     const char *envp;
     CK_RV crv;
 
     if ((envp = PR_GetEnv("NSS_ENABLE_AUDIT")) != NULL) {
        sftk_audit_enabled = (atoi(envp) == 1);
     }
 
     /* At this point we should have already done post and integrity checks.
      * if we haven't, it probably means the FIPS product has not been installed
      * or the tests failed. Don't let an application try to enter FIPS mode */
     crv = sftk_FIPSEntryOK();
     if (crv != CKR_OK) {
        sftk_fatalError = PR_TRUE;
        fc_log_init_error(crv);
        return crv;
     }
 
 
     sftk_ForkReset(pReserved, &crv);
 
+#if 0
     if (nsf_init) {
        return CKR_CRYPTOKI_ALREADY_INITIALIZED;
     }
+#endif
 
     crv = nsc_CommonInitialize(pReserved, PR_TRUE);
 
     /* not an 'else' rv can be set by either SFTK_LowInit or SFTK_SlotInit*/
     if (crv != CKR_OK) {
        sftk_fatalError = PR_TRUE;
        return crv;
     }
 
     sftk_fatalError = PR_FALSE; /* any error has been reset */
     nsf_init = PR_TRUE;
     isLevel2 = PR_TRUE; /* assume level 2 unless we learn otherwise */
 
     return CKR_OK;
 }
 
 /*FC_Finalize indicates that an application is done with the PKCS #11 library.*/
 CK_RV FC_Finalize (CK_VOID_PTR pReserved) {
    CK_RV crv;
 
    if (sftk_ForkReset(pReserved, &crv)) {
        return crv;
    }

I am switching the component to nss-softokn...

Comment 9 Bob Relyea 2014-11-04 23:07:13 UTC

The given patch clearly isn't going to fly; ignoring the nsf_init value will likely stomp on all sorts of internal state. The result of returning CKR_CRYPTOKI_ALREADY_INITIALIZED is that we'll create a second FIPS slot (rather than stomping on the first). SSL_ERROR_BAD_MAC_ALERT is actually exactly the error I would expect if there is anything going wrong with key handling in the SSL engine. Various forms of attack require the SSL engine to defer all error handling to the MAC stage so that it never becomes an oracle for attacking the server private key or data encrypted with the server private key.
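
The "defer all error handling to the MAC stage" idea can be illustrated with a small Python sketch. This is a toy model, not NSS or OpenSSL code: BadRecordMac and check_record are hypothetical names, and HMAC-SHA256 is chosen only for the example. The point is that an earlier failure and a genuine MAC mismatch are reported in exactly the same way, so the peer learns nothing about which step went wrong.

```python
# Toy model of uniform error reporting; NOT taken from any TLS library.
import hashlib
import hmac


class BadRecordMac(Exception):
    """The single failure signal exposed to the peer."""


def check_record(mac_key, payload, received_mac, earlier_step_ok=True):
    """Verify a record; every kind of failure ends in BadRecordMac.

    earlier_step_ok stands in for any earlier processing step (for example
    key handling or decryption) that may already have gone wrong.
    """
    # The MAC is always computed and compared, even if an earlier step
    # failed, and the comparison itself is constant time.
    expected = hmac.new(mac_key, payload, hashlib.sha256).digest()
    mac_ok = hmac.compare_digest(expected, received_mac)
    if not (earlier_step_ok and mac_ok):
        raise BadRecordMac("bad record mac")
    return payload
```

This is why a key-handling problem on one side can surface as nothing more specific than a bad MAC alert.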

Most likely something has gone wrong with the softoken mechanisms, and we aren't providing all the proper mechanisms in FIPS mode. We probably hit a place where the SSL engine needs to move a key to complete the handshake and can't because it's in a FIPS token.
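
As a rough illustration of that hypothesis, here is a toy Python model of slots and their mechanism lists. It is not softoken code. The mechanism names are real PKCS #11 identifiers, but which mechanisms were actually missing from user slots is not stated in this report, so the reduced list below is purely hypothetical, as are the function names.

```python
# Toy model only; none of this is softoken code.
# Which mechanisms were really missing is NOT stated in the bug report.

# Stand-in for the full mechanism list of the main FIPS slot.
FULL_MECHANISMS = frozenset([
    "CKM_AES_CBC",
    "CKM_SHA_1_HMAC",
    "CKM_TLS_MASTER_KEY_DERIVE",
])

# Hypothetical reduced list standing in for a user slot before the fix.
REDUCED_MECHANISMS = frozenset(["CKM_AES_CBC"])


def slot_mechanisms(is_main_slot, fixed):
    """Mechanisms offered by a slot, before (fixed=False) or after the fix."""
    if is_main_slot or fixed:
        return FULL_MECHANISMS
    return REDUCED_MECHANISMS


def handshake_possible(mechanisms, needed=FULL_MECHANISMS):
    """True if the slot offers every mechanism the handshake needs."""
    return needed <= mechanisms


if __name__ == "__main__":
    for fixed in (False, True):
        user_slot = slot_mechanisms(is_main_slot=False, fixed=fixed)
        print(fixed, handshake_possible(user_slot))
```

In this model the second (user) slot only becomes usable for the handshake once it is given the same list as the main slot, which is the behaviour the Doc Text describes for the fix.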

Comment 10 Bob Relyea 2014-11-04 23:12:32 UTC
Created attachment 953870 [details]
Patch to allow FIPS user slots to have the full list of mechanisms just like the main slot.

Comment 12 Bob Relyea 2014-11-04 23:40:38 UTC
Created attachment 953874 [details]
V2: Patch to allow FIPS user slots to have the full list of mechanisms just like the main slot.

OK, this is the real patch, not a broken empty patch.

Comment 14 Kamil Dudka 2014-11-05 08:56:39 UTC
(In reply to Bob Relyea from comment #9)
> The given patch clearly isn't going to fly; ignoring the nsf_init value will
> likely stomp on all sorts of internal state.

Sure.  The patch was not meant to fix it, only to better describe the problem.

> The result of returning CKR_CRYPTOKI_ALREADY_INITIALIZED is that we'll
> create a second FIPS slot (rather than stomping on the first).
> SSL_ERROR_BAD_MAC_ALERT is actually exactly the error I would expect if
> there is anything going wrong with key handling in the SSL engine.

The bad MAC alert comes from the server, while the failure happened on the client.

(In reply to Bob Relyea from comment #12)
> Created attachment 953874 [details]
> V2: Patch to allow FIPS user slots to have the full list of mechanisms just
> like the main slot.

The above patch fixes the problem for me.

Comment 15 Bob Relyea 2014-11-05 19:16:54 UTC
> Sure.  The patch was not meant to fix it, only to better describe the problem.

It did help, thanks.

> The bad MAC alert comes from the server, while the failure happened on the client.

That's where I would have expected the alert. The server is the non-oracle device.

> The above patch fixes the problem for me.

Thanks, verification was the one step I couldn't do.

Now pm and blocker and we're set.

bob

Comment 21 Cameron Moore 2014-11-27 01:07:48 UTC
For posterity's sake, I just wanted to report that I ran across this issue while running the official RHEL7 AMI in FIPS mode on Amazon AWS EC2 using the official RH yum repos.

The rh-amazon-rhui-client package uses urlgrabber.  I modified /usr/lib/yum-plugins/rhui-lb.py to print a useful error message.  Here's what I was seeing:

Checksum type 'md5' disabled
Loaded plugins: amazon-id, rhui-lb
Error: curl#35 - "SSL peer reports incorrect Message Authentication Code."
Could not contact CDS load balancer rhui2-cds01.us-west-2.aws.ce.redhat.com, trying others.
Error: curl#35 - "SSL peer reports incorrect Message Authentication Code."
Could not contact CDS load balancer rhui2-cds02.us-west-2.aws.ce.redhat.com, trying others.
Could not contact any CDS load balancers: rhui2-cds01.us-west-2.aws.ce.redhat.com, rhui2-cds02.us-west-2.aws.ce.redhat.com.

Comment 23 errata-xmlrpc 2015-03-05 08:28:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0364.html