1257332 – crash when using startTLS from Android LDAP client

Summary: crash when using startTLS from Android LDAP client

Keywords:
Status:	CLOSED DUPLICATE of bug 1335280
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	nss-softokn
Sub Component:
Version:	6.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Bob Relyea
QA Contact:	BaseOS QE Security Team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1258003

Reported:	2015-08-26 20:04 UTC by Rich Megginson
Modified:	2020-09-13 21:31 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-06-28 14:56:11 UTC
Target Upstream Version:
Embargoed:

Attachments
Sample test program to see what cpuid is returning. (1.53 KB, text/x-csrc) 2015-08-27 16:09 UTC, Bob Relyea

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	389ds 389-ds-base issues 1613	0	None	None	None	2020-09-13 21:31:55 UTC

Description Rich Megginson 2015-08-26 20:04:56 UTC

Description of problem:
389-ds-base crashes when using startTLS from Android LDAP client.  Crash appears to be deep in nss-softokn assembly code - illegal instruction.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Rich Megginson 2015-08-26 20:05:31 UTC

Marking as security because unauthenticated users can crash the server.

Comment 2 Noriko Hosoi 2015-08-26 20:09:34 UTC

Version of 389-ds-base: 389-ds-base-1.2.11.15-60.el6.x86_64

Do we have the version info of NSS?

Comment 4 Rich Megginson 2015-08-26 20:25:04 UTC

nss-3.19.1-3.el6_6.x86_64
nss-softokn-3.14.3-22.el6_6.x86_64

Comment 9 Noriko Hosoi 2015-08-26 23:02:38 UTC

Hello Andrew,

1) We are interested in the processor you are running the Directory Server on.  Could you please provide the output of `uname -a` and the content of /proc/cpuinfo?

2) We would like to know whether setting the following environment variable(s) changes the behaviour.
2-1)
Open /etc/sysconfig/dirsrv and add the following line:
export NSS_DISABLE_HW_GCM=1
Restart the Directory Server.
Does the LDAP/TLS request crash the server?
2-2)
If the server crashes, add another variable to /etc/sysconfig/dirsrv:
export NSS_DISABLE_HW_AES=1
Restart the Directory Server.
Does the LDAP/TLS request crash the server?

Thanks.

Comment 13 andyzwieg 2015-08-27 00:07:37 UTC

1)
[root@zldap1 ~]# uname -a
Linux zldap1.zwiegnet.local 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@zldap1 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD FX(tm)-8120 Eight-Core Processor
stepping        : 2
cpu MHz         : 3110.448
cache size      : 2048 KB
physical id     : 0
siblings        : 1
core id         : 4
cpu cores       : 1
apicid          : 0
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch xop perfctr_core perfctr_nb
bogomips        : 6220.89
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD FX(tm)-8120 Eight-Core Processor
stepping        : 2
cpu MHz         : 3110.448
cache size      : 2048 KB
physical id     : 0
siblings        : 1
core id         : 4
cpu cores       : 1
apicid          : 0
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch xop perfctr_core perfctr_nb
bogomips        : 6220.89
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:



2-1 no longer crashes the dir server
NSS_DISABLE_HW_GCM=1

2-2 no longer crashes the dir server
NSS_DISABLE_HW_AES=1

Comment 14 Noriko Hosoi 2015-08-27 00:26:15 UTC

Thank you, Andrew.

The NSS expert gave us the explanation and the above instructions:

> The processor on the machine is claiming it can do AVX when it can't.
> We can get the user working securely by setting NSS_DISABLE_HW_GCM=1.
> That will keep NSS away from using any AVX instructions.

Please see also 
> http://www.felixcloutier.com/x86/MOVDQU.html

It seems the "AMD FX(tm)-8120 Eight-Core Processor" has the issue.  Since setting "NSS_DISABLE_HW_GCM=1" solves the crash issue, could you keep it in your sysconfig dirsrv file?  Please note that setting NSS_DISABLE_HW_AES=1 is not necessary.

Comment 15 Rich Megginson 2015-08-27 00:28:55 UTC

It doesn't appear that the processor is advertising avx support.  That is, why does NSS think it has avx support?  For example, here is my laptop info which advertises aes, avx, and clmul:

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt

It has aes, avx, avx2, and pclmulqdq

Comment 17 Bob Relyea 2015-08-27 00:37:29 UTC

We're reading bit 28 of ecx from the cpuid instruction (bits numbered low to high starting at 0) for avx:

    freebl_cpuid(1, &eax, &ebx, &ecx, &edx);
    has_intel_aes = (ecx & (1 << 25)) != 0 ? 1 : -1;
#ifdef USE_HW_GCM
    disable_hw_gcm = getenv("NSS_DISABLE_HW_GCM");
    if (disable_hw_gcm == NULL) {
        has_intel_clmul = (ecx & (1 << 1)) != 0 ? 1 : -1;
        has_intel_avx = (ecx & (1 << 28)) != 0 ? 1 : -1;
    }


    use_hw_gcm = (PRBool)
                 (use_hw_aes && has_intel_avx > 0 && has_intel_clmul > 0);
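
One consequence of the excerpt above is that the environment variable is tested for presence only, so any value (even "0") keeps NSS away from the hardware GCM path. A minimal sketch of that decision follows. The function name hw_gcm_allowed is hypothetical and not part of NSS, the ECX value and the getenv() result are passed in as parameters so the logic can be exercised without running cpuid, and use_hw_aes is simplified here to the AES feature bit alone, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an NSS function. Mirrors the quoted checks.
 * ecx:            ECX as returned by CPUID leaf 1
 * disable_hw_gcm: result of getenv("NSS_DISABLE_HW_GCM"), NULL if unset */
static int hw_gcm_allowed(unsigned long ecx, const char *disable_hw_gcm)
{
    /* Assumption: use_hw_aes is reduced to the AES-NI feature bit. */
    int has_aes   = (ecx & (1UL << 25)) != 0;
    int has_clmul = 0;
    int has_avx   = 0;

    /* Only the presence of the variable matters, never its value. */
    if (disable_hw_gcm == NULL) {
        has_clmul = (ecx & (1UL << 1)) != 0;   /* PCLMULQDQ */
        has_avx   = (ecx & (1UL << 28)) != 0;  /* AVX */
    }
    return has_aes && has_avx && has_clmul;
}
```

A caller would pass getenv("NSS_DISABLE_HW_GCM") as the second argument; passing "0" disables the path just as "1" does.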

Comment 18 andyzwieg 2015-08-27 00:59:00 UTC

"Since setting "NSS_DISABLE_HW_GCM=1" solves the crash issue, could you keep it in your sysconfig dirsrv file?" Yes I can.

Will there be an official patch to fix this issue? I assume others may be using the processor, and likely could have the same issue.

Comment 19 Bob Relyea 2015-08-27 16:09:39 UTC

Created attachment 1067827
Sample test program to see what cpuid is returning.

If the reporter could compile this program and paste its output, we could get an idea of what NSS is seeing from the CPU:

cc -o raw_cpuid raw_cpuid.c

./raw_cpuid

Comment 20 Bob Relyea 2015-08-27 16:25:51 UTC

According to http://support.amd.com/TechDocs/25481.pdf we are checking the correct bit.

According to https://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29 this processor should support AVX. I wonder 1) why it's failing, and 2) how the kernel knows to turn off AVX support on it.

bob

Comment 21 andyzwieg 2015-08-27 16:42:58 UTC

The server in question is a VM, running on XenServer 6.1

./raw_cpuid 
eax = 0x00600f12
ebx = 0x03080800
ecx = 0x1698220b
edx = 0x178bfbff

has_intel_aes = 1
has_intel_clmul = 1
has_intel_avx = 1
has_intel_avx = 1

use_hw_aes = 1
use_hw_gcm = 1
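
The reported ecx value can also be decoded by hand. The sketch below is illustrative only: the helper names ecx_bit and print_gcm_bits are hypothetical, and the bit positions (1 = PCLMULQDQ, 25 = AES, 26 = XSAVE, 27 = OSXSAVE, 28 = AVX) are taken from the CPUID leaf 1 layout in the vendor manuals, not from this report.

```c
#include <assert.h>
#include <stdio.h>

/* ECX value from CPUID leaf 1 as pasted in the raw_cpuid output above. */
#define REPORTED_ECX 0x1698220bUL

/* Hypothetical helper: 1 if the given bit of ecx is set, else 0. */
static int ecx_bit(unsigned long ecx, unsigned int bit)
{
    return (int)((ecx >> bit) & 1UL);
}

/* Print the leaf 1 ECX bits that matter for the hardware GCM decision. */
static void print_gcm_bits(unsigned long ecx)
{
    printf("pclmulqdq (bit 1)  = %d\n", ecx_bit(ecx, 1));
    printf("aes       (bit 25) = %d\n", ecx_bit(ecx, 25));
    printf("xsave     (bit 26) = %d\n", ecx_bit(ecx, 26));
    printf("osxsave   (bit 27) = %d\n", ecx_bit(ecx, 27));
    printf("avx       (bit 28) = %d\n", ecx_bit(ecx, 28));
}
```

For 0x1698220b, bits 1, 25, 26, and 28 are set while bit 27 (OSXSAVE) is clear. That would be consistent with a CPU that advertises AVX while the operating system has not enabled extended state saving.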

Comment 22 Bob Relyea 2015-08-27 17:31:04 UTC

OK, so the processor can do avx, but Xen turns it off without catching the cpuid instruction. This is looking like a Xen issue.

Comment 23 Bob Relyea 2015-08-27 18:09:44 UTC

OK, I've researched this some more, and it looks like we need to check whether or not the OS supports save and restore. RHEL 6 always supports save and restore, but it looks like Xen doesn't always support it.

The issue here is that the code in question is FIPS code, so the bar to change it is pretty high, and we do have a workaround. I recommend fixing the issue upstream and creating a knowledge base doc describing the issue and the workaround. When and if we require a softoken refresh with the relevant FIPS validation, we would then pick up the fix.

bob
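
The check described in comment 23 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual upstream NSS patch: the function name avx_usable and the macro names are hypothetical, and the XCR0 value is passed in as a parameter, whereas real code would read it with the XGETBV instruction only after confirming that the OSXSAVE bit is set.

```c
#include <assert.h>

/* CPUID leaf 1 ECX bits (macro names are illustrative). */
#define CPUID_ECX_OSXSAVE (1UL << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID_ECX_AVX     (1UL << 28) /* CPU implements AVX */

/* XCR0 bits: which register state the OS saves and restores. */
#define XCR0_SSE_STATE    (1UL << 1)  /* XMM registers */
#define XCR0_AVX_STATE    (1UL << 2)  /* upper halves of YMM registers */

/* Hypothetical helper: AVX is usable only if the CPU implements it AND
 * the OS saves and restores the YMM state across context switches. */
static int avx_usable(unsigned long ecx, unsigned long xcr0)
{
    unsigned long need = XCR0_SSE_STATE | XCR0_AVX_STATE;

    if ((ecx & CPUID_ECX_AVX) == 0)
        return 0;               /* no AVX in hardware */
    if ((ecx & CPUID_ECX_OSXSAVE) == 0)
        return 0;               /* XCR0 cannot be read, assume unsupported */
    return (xcr0 & need) == need;
}
```

With the ecx value reported in comment 21 (0x1698220b), this sketch returns 0 because bit 27 is clear, regardless of the xcr0 argument.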

Comment 24 Bob Relyea 2015-08-27 18:52:43 UTC

So part 1 of my recommendation is already done. There is an upstream patch that handles this case. It's also in RHEL 7. When we do a RHEL 6 FIPS refresh, we could pick it up then. In the meantime we can release not the work around for RHEL 6 running as a client under Xen.

bob

Comment 25 Bob Relyea 2015-08-27 18:58:14 UTC

> release not the work around

That should be "release note the work around".

Comment 26 andyzwieg 2015-09-20 15:04:46 UTC

I was now able to duplicate this on ANOTHER PROCESSOR, on a separate server. It is resolved the same way: add export NSS_DISABLE_HW_GCM=1 to /etc/sysconfig/dirsrv

I think there needs to be a RHEL/389 patch for this

[root@zldap2 ~]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 1
model name	: AMD FX(tm)-4130 Quad-Core Processor            
stepping	: 2
cpu MHz		: 3825.752
cache size	: 2048 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch xop fma4 perfctr_core perfctr_nb
bogomips	: 7651.50
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management:

Comment 27 andyzwieg 2015-09-20 15:06:29 UTC

This is a different XenServer version as well, so it could affect a broad range of hardware/software.

Comment 28 Rich Megginson 2015-09-21 16:48:22 UTC

(In reply to andyzwieg from comment #26)
> I was now able to duplicate this on ANOTHER PROCESSOR, separate server. Same
> way to resolve, add  export NSS_DISABLE_HW_GCM=1 to /etc/sysconfig/dirsrv
> 
> I think there needs to be a RHEL/389 patch for this

There needs to be a XenServer patch to return the correct processor codes.

There already is an NSS patch for Fedora and RHEL7 that will eventually make its way into RHEL6.

This isn't a 389 issue, it's just that 389 is one of the few servers that uses NSS for crypto.

Comment 29 Rich Megginson 2015-09-24 17:32:30 UTC

Can we open this bug to the general public?  I don't think this is a CVE issue, and we have another 389 community user who is running into the same issue.

Comment 30 Bob Relyea 2015-09-29 17:50:45 UTC

Yes, this bug should be opened so that others can find it until there's a release note on this.

It also should be assigned to NSS, though right now the only fix is the environment work around.

The environments that fail are RHEL 6 guests running under Xen versions earlier than (I think) 4.5 on a processor with AVX support.
