Bug 1249426
Summary: curl crashes with "Illegal instruction (core dumped)" at intel_aes_gcmINIT()
Product: Red Hat Enterprise Linux 6
Component: nss-softokn
Version: 6.7
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Yoshifumi Kinoshita <ykinoshi>
Assignee: nss-nspr-maint <nss-nspr-maint>
QA Contact: BaseOS QE Security Team <qe-baseos-security>
CC: asanders, carl, cww, emaldona, jboutaud, john.haxby, jwright, kdudka, kengert, kevin, me, mkolbas, nkinder, pasteur, pwouters, redhat-e27, rrelyea, salmy, tis, tmraz, wburrows
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2016-06-28 14:55:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1269194
Comment 2
Kamil Dudka
2015-08-03 08:34:36 UTC
The code in aes_InitContext() mistakenly detected the 'avx' flag on a CPU that does not implement it. The detection needs to be improved such that it does not enable accelerated GCM in this case. Please try to use the following command as a workaround:

    export NSS_DISABLE_HW_GCM=1

The user verified the workaround works on their system.

Is there any way at all to fix this without breaking FIPS?

I've experienced this issue in a paravirtualised environment (Virtuozzo) running on a Haswell processor, which most certainly does have AVX2 support. Although this is a non-standard kernel/environment, you may want to test against this to determine whether this is also the problem facing the stock OS.

    # curl https://google.com
    Illegal instruction
    # NSS_DISABLE_HW_GCM=1 curl https://google.com
    <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
    <TITLE>302 Moved</TITLE></HEAD><BODY>
    etc.

The program terminates with:

    SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x2ae0fd11fd60}

So (in my case) the problem was not in the instruction issued, but with the operand. A trace reveals:

    #0 intel_aes_gcmINIT () at intel-gcm.s:71
    71 vmovdqu 16*0(KS), T

where KS is %rsi and T is %xmm0. %rsi was aligned (0x15966600).

Therefore my guess would be that the problem is not with issuing AVX instructions, but with failing to detect whether extended processor state is being saved or not. A quick rough and ready test appears to verify this (in my case at least, cpuid.c):

Failed environment (EL6, but on older kernel):

    # ./cpuid
    %eax = 0x000306f2
    %ebx = 0x02100800
    %ecx = 0x77fefbff
    %edx = 0xbfebfbff
    aes = 1
    clmul = 1
    avx = 1
    xsave = 1
    osxsave = 0

Successful environment (EL6, identical processor):

    # ./cpuid
    %eax = 0x000306f2
    %ebx = 0x13100800
    %ecx = 0x7ffefbff
    %edx = 0xbfebfbff
    aes = 1
    clmul = 1
    avx = 1
    xsave = 1
    osxsave = 1

I'd suggest testing against both xsave and osxsave in nss-softokn (patch supplied) and see if this solves the problem.

Created attachment 1120375 [details]
CPU flag test
Simple rough and ready test based on freebl tests.
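
Editor's illustration (not the attached cpuid.c): the flag values reported above can be reproduced by decoding the %ecx word of CPUID leaf 1. The sketch below is a minimal example under stated assumptions: the names `cpu_flags`, `decode_ecx` and `hw_gcm_usable` are hypothetical and do not come from nss-softokn, and the gating rule simply mirrors the suggestion to require both xsave and osxsave before trusting avx. A complete check would probably also read XCR0 with XGETBV to confirm that the OS saves XMM and YMM state; that step is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Feature bits in %ecx of CPUID leaf 1. */
#define ECX_PCLMULQDQ (1u << 1)
#define ECX_AES       (1u << 25)
#define ECX_XSAVE     (1u << 26)
#define ECX_OSXSAVE   (1u << 27)
#define ECX_AVX       (1u << 28)

/* Hypothetical container for the five flags printed by the test. */
struct cpu_flags {
    int aes, clmul, avx, xsave, osxsave;
};

/* Decode a raw %ecx value into individual 0/1 flags. */
struct cpu_flags decode_ecx(uint32_t ecx)
{
    struct cpu_flags f;
    f.aes     = (ecx & ECX_AES) != 0;
    f.clmul   = (ecx & ECX_PCLMULQDQ) != 0;
    f.avx     = (ecx & ECX_AVX) != 0;
    f.xsave   = (ecx & ECX_XSAVE) != 0;
    f.osxsave = (ecx & ECX_OSXSAVE) != 0;
    return f;
}

/* Assumed gating rule: use accelerated GCM only when AES, CLMUL and AVX
 * are advertised AND the OS has enabled XSAVE (the OSXSAVE bit). */
int hw_gcm_usable(uint32_t ecx)
{
    const uint32_t required = ECX_AES | ECX_PCLMULQDQ | ECX_AVX |
                              ECX_XSAVE | ECX_OSXSAVE;
    return (ecx & required) == required;
}

/* Print the flags in the same "name = value" layout as the test output. */
void print_flags(uint32_t ecx)
{
    struct cpu_flags f = decode_ecx(ecx);
    printf("aes = %d\nclmul = %d\navx = %d\nxsave = %d\nosxsave = %d\n",
           f.aes, f.clmul, f.avx, f.xsave, f.osxsave);
}
```

Fed the two %ecx values quoted above, `hw_gcm_usable(0x77fefbff)` rejects the failing environment (osxsave clear) while `hw_gcm_usable(0x7ffefbff)` accepts the working one.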
Created attachment 1120376 [details]
Suggested Fix
"Works for me" fix by adding XSAVE and OS XSAVE tests before enabling GCM.
I'm afraid the fix would break FIPS because it patches nss-softokn-3.14.3/mozilla/security/nss/lib/freebl/rijndael.c, which is inside the crypto boundary.

Sorry, I blithely walked into that and stated the obvious without considering FIPS. While it isn't a consideration for me, I understand why this presents a problem for you.

If /security/nss/lib/freebl/mpi/mpcpucache* is outside of the FIPS boundary, then altering the result of freebl_cpuid by effectively ANDing the AVX bit in %ecx/%rcx against the XSAVE and OSXSAVE bits before returning (sorry, my assembly is a little rusty), something like:

    bt %ecx, 26 ; test for xsave support
    jnc failavx
    bt %ecx, 27 ; test for osxsave support
    jc out
    failavx:
    btr %ecx, 28 ; clear avx support
    out:

would appear to work while not giving any side-effects elsewhere in the code. If it's inside the boundary, then I can't see any way forward, at least for correct detection while maintaining FIPS validation.

(In reply to David Zambonini from comment #21)
We may be able to use the original proposed patch as long as the customer is not concerned with preserving FIPS validation and can upgrade beyond nss-softokn-3.14.3-22.el6_6, which is the version to be validated. I'll have more to say later.

We're starting to see this issue now that 6.8 has released with NSS 3.21, which includes a GCM cipher suite. Prior, with NSS 3.19, the issue did not occur for us.

It appears that NSS fixed this issue in 3.15.4 (which is after the 3.14 version shipped for softokn-freebl), using a patch similar to David's.

https://bugzilla.mozilla.org/show_bug.cgi?id=940794
https://hg.mozilla.org/projects/nss/rev/edda2ba82d22

I'm guessing that 7.2 doesn't have this issue because softokn-freebl is 3.16.

What's the "correct" way to fix this? The environment variable workaround is not ideal because it's hard to make sure it's set for every process. Maintaining our own softokn-freebl package is not really ideal either.
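
Editor's illustration: the masking idea described earlier in this thread (clear the AVX bit of the CPUID result unless both XSAVE and OSXSAVE are set) can also be written in C. This is only a sketch under stated assumptions: `mask_unusable_avx` is a hypothetical name, it is not the freebl_cpuid implementation, and it says nothing about where the FIPS boundary lies.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in %ecx of CPUID leaf 1. */
#define ECX_XSAVE   (1u << 26)
#define ECX_OSXSAVE (1u << 27)
#define ECX_AVX     (1u << 28)

/* Return ecx with the AVX bit cleared unless XSAVE and OSXSAVE are both
 * set, so later "is AVX available?" tests see a conservative answer.
 * All other bits are passed through unchanged. */
uint32_t mask_unusable_avx(uint32_t ecx)
{
    const uint32_t need = ECX_XSAVE | ECX_OSXSAVE;
    if ((ecx & need) != need) {
        ecx &= ~ECX_AVX;
    }
    return ecx;
}
```

With the %ecx values reported in this bug, 0x77fefbff (osxsave clear) becomes 0x67fefbff, while 0x7ffefbff is returned unchanged.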
According to Bob Relyea, this is a duplicate of bug 1335280. A fix should become available with nss-softokn-3.14.3-23.3.el6_8.

*** This bug has been marked as a duplicate of bug 1335280 ***