Bug 458683 - openldap crashes with large groups
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: openldap
Version: 9
Hardware: All Linux
Priority: medium Severity: high
Assigned To: Jan Safranek
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-08-11 11:39 EDT by Marek Greško
Modified: 2008-09-04 09:34 EDT (History)
Last Closed: 2008-09-04 09:34:51 EDT
Attachments
DB_CONFIG (886 bytes, application/octet-stream)
2008-08-25 05:51 EDT, Marek Greško
slapd.conf (2.35 KB, application/octet-stream)
2008-08-25 05:52 EDT, Marek Greško
Backtrace of crashed slapd (4.42 KB, text/plain)
2008-09-04 07:19 EDT, Marek Greško

Description Marek Greško 2008-08-11 11:39:36 EDT
Description of problem:
I created more than 30,000 users using smbldap-tools.
Now, when I try to assign them to groups using smbldap-tools, my LDAP server crashes.

Every user is to be a member of 2 or 3 groups. Several groups are quite large; one of them contains nearly all users. I successfully added around 12,000 users to groups, but since then the LDAP server has crashed after every few hundred assigned users.

I am now running the LDAP server with syslog level 2 (-s 2). I saw this:

Aug 11 17:30:47 boot slapd[532]: conn=970 fd=49 ACCEPT from IP=127.0.0.1:38042 (IP=0.0.0.0:389)
Aug 11 17:30:47 boot slapd[532]: conn=970 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Aug 11 17:30:47 boot slapd[532]: conn=970 op=0 STARTTLS
Aug 11 17:30:47 boot slapd[532]: conn=970 op=0 RESULT oid= err=0 text=
Aug 11 17:30:47 boot slapd[532]: conn=970 fd=49 TLS established tls_ssf=128 ssf=128
Aug 11 17:30:47 boot slapd[532]: conn=970 op=1 BIND dn="cn=smbldap-tools,ou=DSA,dc=mydomain,dc=lan" method=128
Aug 11 17:30:47 boot slapd[532]: conn=970 op=1 BIND dn="cn=smbldap-tools,ou=DSA,dc=mydomain,dc=lan" mech=SIMPLE ssf=0
Aug 11 17:30:47 boot slapd[532]: conn=970 op=1 RESULT tag=97 err=0 text=
Aug 11 17:30:47 boot slapd[532]: conn=970 op=2 SRCH base="ou=Groups,dc=mydomain,dc=lan" scope=2 deref=2 filter="(&(objectClass=posixGroup)(cn=groupname))"
Aug 11 17:30:47 boot slapd[532]: conn=970 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=
Aug 11 17:30:47 boot slapd[532]: conn=970 op=3 SRCH base="ou=Groups,dc=mydomain,dc=lan" scope=2 deref=2 filter="(&(objectClass=posixGroup)(cn=groupname))"
Aug 11 17:30:47 boot slapd[532]: conn=970 op=3 SEARCH RESULT tag=101 err=0 nentries=1 text=
Aug 11 17:30:48 boot slapd[532]: conn=970 op=4 SRCH base="dc=mydomain,dc=lan" scope=2 deref=2 filter="(&(objectClass=posixAccount)(uid=1234567890))"
Aug 11 17:30:48 boot slapd[532]: conn=970 op=4 SEARCH RESULT tag=101 err=0 nentries=1 text=
Aug 11 17:30:48 boot slapd[532]: conn=970 op=5 SRCH base="cn=groupname,ou=Groups,dc=mydomain,dc=lan" scope=0 deref=2 filter="(&(memberUid=1234567890))"
Aug 11 17:30:48 boot slapd[532]: conn=970 op=5 SEARCH RESULT tag=101 err=0 nentries=0 text=
Aug 11 17:30:48 boot slapd[532]: conn=970 op=6 MOD dn="cn=groupname,ou=Groups,dc=mydomain,dc=lan"
Aug 11 17:30:48 boot slapd[532]: conn=970 op=6 MOD attr=memberUid
Aug 11 17:30:48 boot slapd[532]: ch_malloc of 1379704 bytes failed


Version-Release number of selected component (if applicable):
openldap-2.4.10-1.fc9.i386




How reproducible:


Steps to Reproduce:
1. Create many users using smbldap-tools.
2. Assign users to groups. (Large groups).
3. Openldap server crashes.
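
For anyone trying to reproduce these steps without smbldap-tools, here is a minimal sketch that generates comparable traffic as LDIF: one modify operation per user, each adding a single memberUid value, which is what the MOD attr=memberUid lines in the log above suggest. The function name memberuid_ldif and the sample user names are hypothetical, and this is not what smbldap-tools itself runs.

```python
def memberuid_ldif(group_dn, uids):
    """Return LDIF text that adds each uid to group_dn as a memberUid value.

    One modify record is emitted per user, so the server sees one MOD
    operation per assignment instead of a single bulk change.
    """
    records = []
    for uid in uids:
        records.append(
            f"dn: {group_dn}\n"
            "changetype: modify\n"
            "add: memberUid\n"
            f"memberUid: {uid}\n"
        )
    # LDIF records are separated by a blank line.
    return "\n".join(records)


if __name__ == "__main__":
    # Hypothetical data: ten-digit numeric user names, as described in the report.
    # Raise the count (for example to 30000) for a realistic load.
    sample_uids = [str(1000000000 + i) for i in range(3)]
    group = "cn=groupname,ou=Groups,dc=mydomain,dc=lan"
    print(memberuid_ldif(group, sample_uids), end="")
```

The output could be saved to a file and fed to a client such as ldapmodify. Running several such clients in parallel would also exercise concurrent access.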
  
Actual results:
Openldap server crashes on assigning users to groups.

Expected results:
Openldap server does not crash.

Additional info:
When running getent group on some large groups I do not get the complete list of members, but that is probably a different bug in nss_ldap.
Comment 1 Jan Safranek 2008-08-22 09:27:58 EDT
I have been unable to reproduce the bug. The smbldap tools are painfully slow; I ran some tests with large groups and many users overnight, but nothing interesting happened.

1. Could you please post your slapd.conf and DB_CONFIG, plus a script that simulates your traffic and reliably reproduces the bug?

2. Please try running slapd with loglevel 256. The output should be small enough, and it should contain basically a brief summary of all operations and errors only.

3. And please test with openldap-2.4.11-1.fc10 from rawhide, just to see whether it has been fixed upstream.

Thanks in advance
Comment 2 Marek Greško 2008-08-25 05:43:09 EDT
Maybe it is because of the string length of the group membership. Try longer user names. I use user names that consist only of digits, with an average length of 10 characters.

Ad 1.: I will post them. Access rules will be stripped; I hope this does not matter.

Ad 2. and 3.: I would have to create a test environment first and test afterwards. In our production environment we decided to stop using groups until this bug is fixed.
Comment 3 Marek Greško 2008-08-25 05:51:14 EDT
Created attachment 314918 [details]
DB_CONFIG
Comment 4 Marek Greško 2008-08-25 05:52:42 EDT
Created attachment 314919 [details]
slapd.conf
Comment 5 Marek Greško 2008-09-03 04:14:20 EDT
The problem is probably not related to large groups. Large groups only make the problem more apparent. The crash is probably triggered by concurrent access of smbldap-tools to LDAP.

I also found this in the log, but it is not always there:

kernel: slapd[24863]: segfault at 0 ip 002dde26 sp 9ce87c18 error 6 in libcrypto.so.0.9.8g[29f000+137000]

So the bug may not be in slapd but in libcrypto.
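
The kernel line above can be decoded a little further. The sketch below assumes the usual x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode) and subtracts the mapping base from the instruction pointer to get an offset inside the library. The names SEGFAULT_RE and decode_segfault are hypothetical helpers, not part of any tool.

```python
import re

# Matches the "segfault at ... ip ... sp ... error N in lib[base+size]" part.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode a kernel segfault log line into a small dictionary."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"))
    return {
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        "offset_in_library": ip - base,   # instruction offset inside the mapping
        "page_present": bool(err & 1),    # assumed x86 error-code bit 0
        "write_access": bool(err & 2),    # assumed x86 error-code bit 1
        "user_mode": bool(err & 4),       # assumed x86 error-code bit 2
    }


if __name__ == "__main__":
    line = ("kernel: slapd[24863]: segfault at 0 ip 002dde26 sp 9ce87c18 "
            "error 6 in libcrypto.so.0.9.8g[29f000+137000]")
    info = decode_segfault(line)
    print(info["library"], hex(info["offset_in_library"]), info["write_access"])
```

Under those assumptions the line describes a user-mode write to address 0 at offset 0x3ee26 inside libcrypto, that is, a write through a NULL pointer. That alone does not show which component is at fault; a stack trace is still needed for that.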
Comment 6 Jan Safranek 2008-09-03 04:18:10 EDT
Could you please post a complete stack trace?
Comment 7 Marek Greško 2008-09-04 07:19:43 EDT
Created attachment 315738 [details]
Backtrace of crashed slapd
Comment 8 Jan Safranek 2008-09-04 07:43:47 EDT
(adding the content of /proc/<pid of slapd>/limits, which the reporter sent by email)

[root@boot ~]# cat /proc/31125/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            ms        
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          524288000            524288000            bytes     
Max processes             1024                 32767                processes 
Max open files            1024                 1024                 files     
Max locked memory         32768                32768                bytes     
Max address space         524288000            524288000            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       32767                32767                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us
Comment 9 Jan Safranek 2008-09-04 07:47:15 EDT
(In reply to comment #8)

> Max resident set          524288000            524288000            bytes     
> Max address space         524288000            524288000            bytes     

Why this? It could be a bit low for slapd. What RSS does slapd usually have before it crashes? Could it hit the 512MB limit?

Try looking in /etc/security/* to find out what sets the memory constraints so low, and why. Try increasing the limits or lowering the BDB cache size in DB_CONFIG; maybe it helps.
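
One way to answer the question about memory use before a crash is to compare the process's current virtual size with its address-space limit. The sketch below parses the text of /proc/<pid>/limits and /proc/<pid>/status; it assumes the status file contains a "VmSize:" line in kB, as on typical Linux kernels, and the helper names are hypothetical.

```python
import re


def parse_limits(text):
    """Map limit name -> soft limit (int, or None for 'unlimited')."""
    limits = {}
    for line in text.splitlines():
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3 or fields[0] == "Limit":
            continue  # skip the header row and blank lines
        soft = fields[1]
        limits[fields[0]] = None if soft == "unlimited" else int(soft)
    return limits


def parse_status_kb(text, key):
    """Return the kB value of a 'VmSize:'-style line, or None if absent."""
    for line in text.splitlines():
        if line.startswith(key + ":"):
            return int(line.split()[1])
    return None


def headroom_bytes(limits_text, status_text):
    """Bytes left before VmSize reaches 'Max address space', or None."""
    limit = parse_limits(limits_text).get("Max address space")
    vmsize_kb = parse_status_kb(status_text, "VmSize")
    if limit is None or vmsize_kb is None:
        return None
    return limit - vmsize_kb * 1024
```

On the affected machine the two inputs would come from reading /proc/<pid of slapd>/limits and /proc/<pid of slapd>/status. A small or shrinking result shortly before a crash would support the theory that slapd is hitting the limit.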
Comment 10 Marek Greško 2008-09-04 09:08:49 EDT
/etc/security/limits.conf:

*  hard    rss     512000
*  hard    as      512000

If this is the cause, you may close the bug as not a bug / configuration error.
I will reopen if slapd continues crashing after removal.
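
As a cross-check, the limits.conf values line up with the /proc output in comment 8, assuming limits.conf expresses rss and as in KB (as its man page describes). The function name below is a hypothetical helper for the arithmetic only.

```python
KIB = 1024  # assumption: limits.conf gives "rss" and "as" in units of 1024 bytes


def limits_conf_to_bytes(value_kb):
    """Convert a limits.conf memory value (KB) to bytes."""
    return value_kb * KIB


if __name__ == "__main__":
    as_bytes = limits_conf_to_bytes(512000)
    print(as_bytes)            # 524288000, the value shown in /proc/31125/limits
    print(as_bytes / 2**20)    # 500.0 (MiB)
```

So the 512000 entry corresponds exactly to the 524288000-byte "Max address space" seen for slapd, which is 500 MiB.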
Comment 11 Jan Safranek 2008-09-04 09:34:51 EDT
Fine, please post your results if it crashes again.
