Bug 1249426

Summary: curl crashes with "Illegal instruction (core dumped)" at intel_aes_gcmINIT()
Product: Red Hat Enterprise Linux 6 Reporter: Yoshifumi Kinoshita <ykinoshi>
Component: nss-softokn Assignee: nss-nspr-maint <nss-nspr-maint>
Status: CLOSED DUPLICATE QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: high Docs Contact:
Priority: high    
Version: 6.7 CC: asanders, carl, cww, emaldona, jboutaud, john.haxby, jwright, kdudka, kengert, kevin, me, mkolbas, nkinder, pasteur, pwouters, redhat-e27, rrelyea, salmy, tis, tmraz, wburrows
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-28 14:55:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1269194    
Attachments:
Description Flags
CPU flag test none
Suggested Fix none

Comment 2 Kamil Dudka 2015-08-03 08:34:36 UTC
The crypto algorithm in question is implemented by nss-softokn, so I am changing the component accordingly.  Please attach the contents of /proc/cpuinfo from the machine where it crashes.

Comment 4 Kamil Dudka 2015-08-03 16:27:58 UTC
The code in aes_InitContext() mistakenly detected the 'avx' flag on a CPU that does not implement it.  The detection needs to be improved such that it does not enable accelerated GCM in this case.

Please try to use the following command as a workaround:

export NSS_DISABLE_HW_GCM=1

Comment 5 Yoshifumi Kinoshita 2015-08-03 16:48:56 UTC
The user verified the workaround works on their system.

Comment 16 Joe Wright 2016-01-26 15:01:15 UTC
Is there any way at all to fix this without breaking FIPS?

Comment 17 David Zambonini 2016-02-02 11:13:31 UTC
I've experienced this issue in a paravirtualised environment (Virtuozzo) running on a Haswell processor, which most certainly does have AVX2 support. Although this is a non-standard kernel/environment, you may want to test against this to determine whether this is also the problem facing the stock OS.

# curl https://google.com
Illegal instruction

# NSS_DISABLE_HW_GCM=1 curl https://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
etc.

The program terminates with:

SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x2ae0fd11fd60}

So (in my case) the problem was not in the instruction issued, but with the operand.

A trace reveals:

#0  intel_aes_gcmINIT () at intel-gcm.s:71
71          vmovdqu      16*0(KS), T

Where KS is %rsi and T is %xmm0
%rsi was aligned (0x15966600)

Therefore my guess would be that the problem is not with issuing AVX instructions, but with failing to detect whether extended processor state is being saved.

A quick rough and ready test appears to verify this (in my case at least, cpuid.c):

Failed environment (EL6, but on older kernel):
# ./cpuid
%eax    = 0x000306f2
%ebx    = 0x02100800
%ecx    = 0x77fefbff
%edx    = 0xbfebfbff
aes     = 1
clmul   = 1
avx     = 1
xsave   = 1
osxsave = 0

Successful environment (EL6, identical processor):
# ./cpuid
%eax    = 0x000306f2
%ebx    = 0x13100800
%ecx    = 0x7ffefbff
%edx    = 0xbfebfbff
aes     = 1
clmul   = 1
avx     = 1
xsave   = 1
osxsave = 1

I'd suggest testing against both xsave and osxsave in nss-softokn (patch supplied) and see if this solves the problem.
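
For illustration, the flag decoding shown in the two outputs above can be sketched in C. This is not the attached cpuid.c, whose contents are not reproduced in this report; the struct and function names (cpu_flags, decode_ecx, avx_usable, print_flags) are hypothetical. The bit positions are those of the ECX register returned by CPUID leaf 1. The sketch decodes an ECX value passed in as an argument rather than executing CPUID itself, so it can be run against the values quoted above.

```c
#include <stdint.h>
#include <stdio.h>

/* Feature bits in ECX as returned by CPUID leaf 1. */
#define ECX_CLMUL   (1u << 1)   /* PCLMULQDQ */
#define ECX_AES     (1u << 25)  /* AES-NI */
#define ECX_XSAVE   (1u << 26)  /* CPU supports XSAVE */
#define ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE */
#define ECX_AVX     (1u << 28)  /* CPU supports AVX */

/* Hypothetical container for the decoded flags. */
struct cpu_flags {
    int aes, clmul, avx, xsave, osxsave;
};

/* Decode the five flags printed in the report from a raw ECX value. */
struct cpu_flags decode_ecx(uint32_t ecx)
{
    struct cpu_flags f;
    f.aes     = (ecx & ECX_AES) != 0;
    f.clmul   = (ecx & ECX_CLMUL) != 0;
    f.avx     = (ecx & ECX_AVX) != 0;
    f.xsave   = (ecx & ECX_XSAVE) != 0;
    f.osxsave = (ecx & ECX_OSXSAVE) != 0;
    return f;
}

/* Assumed policy: treat AVX as usable only if XSAVE and OSXSAVE are set too. */
int avx_usable(const struct cpu_flags *f)
{
    return f->avx && f->xsave && f->osxsave;
}

/* Print the flags in the same layout as the output quoted above. */
void print_flags(const struct cpu_flags *f)
{
    printf("aes     = %d\n", f->aes);
    printf("clmul   = %d\n", f->clmul);
    printf("avx     = %d\n", f->avx);
    printf("xsave   = %d\n", f->xsave);
    printf("osxsave = %d\n", f->osxsave);
}
```

With the %ecx value from the failed environment (0x77fefbff), decode_ecx() reports avx = 1 but osxsave = 0, so avx_usable() returns 0. With the value from the successful environment (0x7ffefbff), all five flags are set and avx_usable() returns 1.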

Comment 18 David Zambonini 2016-02-02 11:14:46 UTC
Created attachment 1120375 [details]
CPU flag test

Simple rough and ready test based on freebl tests.

Comment 19 David Zambonini 2016-02-02 11:15:37 UTC
Created attachment 1120376 [details]
Suggested Fix

"Works for me" fix by adding XSAVE and OS XSAVE tests before enabling GCM.

Comment 20 Elio Maldonado Batiz 2016-02-02 23:27:22 UTC
I'm afraid the fix would break FIPS because it patches nss-softokn-3.14.3/mozilla/security/nss/lib/freebl/rijndael.c which is inside the crypto boundary.

Comment 21 David Zambonini 2016-02-03 01:06:12 UTC
Sorry, I blithely walked into that and stated the obvious without considering FIPS. While it isn't a consideration for me, I understand why this presents a problem for you. 

If /security/nss/lib/freebl/mpi/mpcpucache* is outside of the FIPS boundary, then altering the result of freebl_cpuid by effectively ANDing the AVX bit in %ecx/%rcx against the XSAVE and OSXSAVE bits before returning (sorry, my assembly is a little rusty), something like:

bt $26, %ecx    # test for xsave support
jnc failavx
bt $27, %ecx    # test for osxsave support
jc out
failavx:
btr $28, %ecx   # clear avx support
out:

would appear to work while not giving any side-effects elsewhere in the code. If it's inside the boundary, then I can't see any way forward, at least for correct detection while maintaining FIPS validation.
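
The masking idea in the assembly above can also be written as a small C function. This is only a sketch under the same assumption the comment makes, namely that the AVX bit should be reported only when both XSAVE and OSXSAVE are set. The name mask_avx is hypothetical and is not part of freebl.

```c
#include <stdint.h>

/* Feature bits in ECX as returned by CPUID leaf 1. */
#define ECX_XSAVE   (1u << 26)
#define ECX_OSXSAVE (1u << 27)
#define ECX_AVX     (1u << 28)

/*
 * Hypothetical helper: return ecx with the AVX bit cleared unless both
 * XSAVE and OSXSAVE are set. All other bits are passed through unchanged.
 */
uint32_t mask_avx(uint32_t ecx)
{
    const uint32_t need = ECX_XSAVE | ECX_OSXSAVE;

    if ((ecx & need) != need)
        ecx &= ~ECX_AVX;
    return ecx;
}
```

Applied to the %ecx value from the failing environment in comment 17 (0x77fefbff), this returns 0x67fefbff, so callers that test only the AVX bit would no longer enable the accelerated path. A stricter check would additionally read XCR0 with XGETBV to confirm that the OS saves both XMM and YMM state; that step is left out of this sketch.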

Comment 22 Elio Maldonado Batiz 2016-02-03 15:10:57 UTC
(In reply to David Zambonini from comment #21)
We may be able to use the originally proposed patch as long as the customer is not concerned with preserving FIPS validation and can upgrade beyond nss-softokn-3.14.3-22.el6_6, which is the version to be validated. I'll have more to say later.

Comment 27 Kevin Stange 2016-06-09 17:27:15 UTC
We're starting to see this issue now that 6.8 has been released with NSS 3.21, which includes a GCM cipher suite.  Previously, with NSS 3.19, the issue did not occur for us.

It appears that NSS fixed this issue in 3.15.4 (which is after the 3.14 version shipped for softokn-freebl), using a patch similar to David's.

https://bugzilla.mozilla.org/show_bug.cgi?id=940794
https://hg.mozilla.org/projects/nss/rev/edda2ba82d22

I'm guessing that 7.2 doesn't have this issue because softokn-freebl is 3.16.

What's the "correct" way to fix this?  The environment variable workaround is not ideal because it's hard to make sure it's set for every process.  Maintaining our own softokn-freebl package is not ideal either.

Comment 28 Kai Engert (:kaie) (inactive account) 2016-06-28 14:55:16 UTC
According to Bob Relyea, this is a duplicate of bug 1335280; a fix should become available with nss-softokn-3.14.3-23.3.el6_8.

*** This bug has been marked as a duplicate of bug 1335280 ***