Bug 454328 - Stack overflow when using memberOf plug-in on x86_64
Summary: Stack overflow when using memberOf plug-in on x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: 389
Classification: Retired
Component: Unknown
Version: 1.1.1
Hardware: x86_64
OS: Linux
low
low
Target Milestone: ---
Assignee: Nathan Kinder
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks: 249650 FDS112 453229
TreeView+ depends on / blocked
 
Reported: 2008-07-07 18:19 UTC by Nathan Kinder
Modified: 2015-01-04 23:33 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2008-08-27 20:39:32 UTC
Embargoed:


Attachments (Terms of Use)
Call Stacks (8.13 KB, text/plain)
2008-07-07 18:19 UTC, Nathan Kinder
no flags Details
CVS Diffs (1.70 KB, patch)
2008-07-08 18:31 UTC, Nathan Kinder
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2008:0602 0 normal SHIPPED_LIVE Moderate: redhat-ds-base and redhat-ds-admin security and bug fix update 2008-08-27 20:38:30 UTC

Description Nathan Kinder 2008-07-07 18:19:07 UTC
I've been running into crashes related to stack corruption when doing some tests
with the memberOf plug-in on x86_64 hardware.  I do not run into any problems on
32-bit i386 hardware.

I've also found that both the IPA and DS memberOf plug-in's both trigger the
problem, but the stacks are a bit different (which is not too odd considering
the code is quite a bit different and the stacks are corrupt).

To reproduce the issue, use the following steps:

  - Setup a DS instance and enable the memberOf plug-in.
  - Add 20 user entries (user1 - user20):

    dn: cn=user1,dc=example,dc=com
    changetype: add
    cn: user1
    objectclass: person
    objectclass: inetuser
    sn: 1

  - Add 20 group entries (group1 - group20):

    dn: cn=group1,dc=example,dc=com
    changetype: add
    objectClass: groupOfNames
    objectClass: inetUser
    cn: group1

  - Add each user as a member to their matching group (user1 to group1, etc.):

    dn: cn=group1,dc=example,dc=com
    changetype: modify
    add: member
    member: cn=user1,dc=example,dc=com

  - Create a group loop across all 20 groups (g1->g2->g3....g20->g1):

    dn: cn=group1,dc=example,dc=com
    changetype: modify
    add: member
    member: cn=group2,dc=example,dc=com

The ns-slapd process will crash about half-way through the group loop creation.

I've attached the top portions of some stack traces for a few different
scenarios.  For each scenario, the stack is consistently corrupted.  The stacks
included are one with the IPA memberOf plug-in, one with the DS memberOf
plug-in, and another with the DS plug-in with the "-fstack-protector-all" gcc
option enabled at build time.  This gcc option doesn't seem to detect the
corruption, but I believe the stack is different due to the use of stack canaries.

Comment 1 Nathan Kinder 2008-07-07 18:19:07 UTC
Created attachment 311199 [details]
Call Stacks

Comment 2 Nathan Kinder 2008-07-08 01:45:37 UTC
I'm beginning to think that the problem is that we're overflowing the stack. 
The crash is occurring at the end of the function prologue when it attempts to
make space for the local variables (the sub assembly instruction).  Displaying
the instruction in the program counter register in gdb shows that the next
instruction is the "mov" of the first passed in variable, meaning that the "sub"
instruction must have failed.

I do not have access to my test system at the moment to confirm this by
increasing the stack size, but I think that the theory holds water.  I built a
little test program that intentionally overflows the stack in a function called
from main.  I see the crash happen at the same point in the function prologue as
it does when ns-slapd crashes:

  (gdb) disassemble
  Dump of assembler code for function big_stuff:
  0x000000000040048e <big_stuff+0>:       push   %rbp
  0x000000000040048f <big_stuff+1>:       mov    %rsp,%rbp
  0x0000000000400492 <big_stuff+4>:       sub    $0x1c9c318,%rsp
  0x0000000000400499 <big_stuff+11>:      mov    %edi,0xfffffffffe363c7c(%rbp)
  0x000000000040049f <big_stuff+17>:      mov    %esi,0xfffffffffe363c78(%rbp)
  0x00000000004004a5 <big_stuff+23>:      mov    0xfffffffffe363c78(%rbp),%eax
  0x00000000004004ab <big_stuff+29>:      add    0xfffffffffe363c7c(%rbp),%eax
  0x00000000004004b1 <big_stuff+35>:      leaveq 
  0x00000000004004b2 <big_stuff+36>:      retq   
  End of assembler dump.
  (gdb) x/i $pc
  0x400499 <big_stuff+11>:        mov    %edi,0xfffffffffe363c7c(%rbp)

I think the reason I don't see this same issue on my 32-bit test machine is that
the larger 64-bit pointer size used for all local pointer variables used by this
thread is enough to push us past the stack limit with the test that I'm running.
 The stack limit on both my 32-bit and 64-bit test machines is a default of 10MB.

Comment 3 Nathan Kinder 2008-07-08 04:36:52 UTC
This issue is indeed stack size related.  I tried increasing the limit with
"ulimit -s", but it didn't help.  I started looking into the server code to see
if we hard-code the stack size, and I noticed this define in slapi-plugin.h:

  /* All 64-bit builds get a bigger stack size */
  #elif ( defined ( __LP64__ )) || defined (_LP64)
  #define SLAPD_DEFAULT_THREAD_STACKSIZE  262144L
  #else
  #define SLAPD_DEFAULT_THREAD_STACKSIZE  0
  #endif

We pass this as the requested stack size to PR_CreateThread().  If we pass a
"0", then NSPR will use the default pthread stack size, which is the same as the
current soft resource limit (10MB).  This is what happens on a 32-bit Linux system.

On a LP64 system, we are hard-coding a stack size of 256k.  This explains why we
run out of stack space on such a 20 group loop on 64-bit Linux when I've been
able to create group loops of 100 groups on 32-bit Linux.  Changing this to "0"
for LP64 systems makes the crash go away.  I think we should just get rid of the
LP64 specific define and use the default define of "0" for
SLAPD_DEFAULT_THREAD_STACKSIZE.



Comment 4 Rich Megginson 2008-07-08 15:33:24 UTC
I think you've got it.  Note - this is why recursion is generally bad in daemon
code :P   Would probably be better to rewrite to limit recursion or avoid
recursion altogether, but let's go with your solution for now.

Comment 5 Nathan Kinder 2008-07-08 18:31:40 UTC
Created attachment 311304 [details]
CVS Diffs

This fixes the define for the default stack size.  I also dealt with a compiler
error with a function not returning a value when it is supposed to return an
int in the memberOf code.

Comment 6 Nathan Kinder 2008-07-08 20:00:14 UTC
Checked into ldapserver (HEAD).  Thanks to Rich and Noriko for their reviews!

Checking in ldap/servers/plugins/memberof/memberof.c;
/cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof.c,v  <--  memberof.c
new revision: 1.12; previous revision: 1.11
done
Checking in ldap/servers/slapd/slapi-plugin.h;
/cvs/dirsec/ldapserver/ldap/servers/slapd/slapi-plugin.h,v  <--  slapi-plugin.h
new revision: 1.25; previous revision: 1.24
done

Comment 7 Nathan Kinder 2008-07-09 17:08:50 UTC
Checked into Directory71RtmBranch.

Checking in slapd/slapi-plugin.h;
/cvs/dirsec/ldapserver/ldap/servers/slapd/slapi-plugin.h,v  <--  slapi-plugin.h
new revision: 1.8.2.3; previous revision: 1.8.2.2
done

Comment 8 Nathan Kinder 2008-07-10 22:42:22 UTC
Checked into Directory_Server_8_0_Branch.

Checking in ldap/servers/slapd/slapi-plugin.h;
/cvs/dirsec/ldapserver/ldap/servers/slapd/slapi-plugin.h,v  <--  slapi-plugin.h
new revision: 1.19.2.1; previous revision: 1.19
done

Comment 13 errata-xmlrpc 2008-08-27 20:39:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0602.html


Note You need to log in before you can comment on or make changes to this bug.