Bug 450528

Summary: "binder-based resource limits - slapi_reslimit_get_integer_limit(): slapi_get_object_extension() returned NULL" error and server abnormal exit
Product: [Retired] 389
Component: Directory Server
Version: 1.1.0
Hardware: x86_64
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: high
Priority: low
Reporter: Aleksander Adamowski <bugs-redhat>
Assignee: Rich Megginson <rmeggins>
QA Contact: Orla Hegarty <ohegarty>
CC: ckannan
Doc Type: Bug Fix
Last Closed: 2009-03-25 20:04:00 UTC

Attachments:
Description: errors log; Flags: none

Description Aleksander Adamowski 2008-06-09 12:20:22 UTC
Description of problem:

I've had the server exit abnormally twice in a row with the following errors in
the logs:

[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server3" (server3:636): Beginning linger on the connection
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server3" (server3:636): State: sending_updates -> wait_for_changes
[09/Jun/2008:13:30:53 +0200] - _cl5PositionCursorForReplay (agmt="cn=with_server1" (server1:636)): Consumer RUV:
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replicageneration} 47ffda11000000010000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replica 3 ldap://server1.example.com:389} 48018448000000030000 484d30e9000000030000 00000000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replica 1 ldap://server3.example.com:389} 47ffe6c4000000010000 484d1afe000000010000 00000000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replica 2 ldap://server2.example.com:389} 4800030b000000020000 484d2f93000000020000 00000000
[09/Jun/2008:13:30:53 +0200] - _cl5PositionCursorForReplay (agmt="cn=with_server1" (server1:636)): Supplier RUV:
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replicageneration} 47ffda11000000010000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replica 2 ldap://server2.example.com:389} 4800030b000000020000 484d2f93000000020000 00000000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replica 3 ldap://server1.example.com:389} 48018448000000030000 484d30e9000000030000 00000000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): {replica 1 ldap://server3.example.com:389} 47ffe6c4000000010000 484d1afe000000010000 00000000
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): No changes to send
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): Successfully released consumer
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): Beginning linger on the connection
[09/Jun/2008:13:30:53 +0200] NSMMReplicationPlugin - agmt="cn=with_server1" (server1:636): State: sending_updates -> wait_for_changes
[09/Jun/2008:13:30:53 +0200] binder-based resource limits - slapi_reslimit_get_integer_limit(): slapi_get_object_extension() returned NULL
[09/Jun/2008:13:30:53 +0200] binder-based resource limits - slapi_reslimit_get_integer_limit(): slapi_get_object_extension() returned NULL



Version-Release number of selected component (if applicable):
fedora-ds-1.1.0-3.fc6 on RHEL5 on x86_64

How reproducible:
Hard to reproduce, exact cause unknown. Attaching error logs that include
replication debugging data.

Comment 1 Aleksander Adamowski 2008-06-09 12:20:23 UTC
Created attachment 308681 [details]
errors log

Comment 2 Aleksander Adamowski 2008-06-09 12:23:23 UTC
I can also supply access and audit logs; however, these contain mildly sensitive
data, so I would need to send them to a private Red Hat address.

Comment 3 Rich Megginson 2008-06-09 18:02:26 UTC
Is there anything in the access log which looks suspicious?  If so, perhaps you
could attach a small excerpt with any sensitive data obscured.

Are there any messages in the syslog which show the process was killed or
crashed?  Any core dumps?  Note that it does not appear that the reslimit errors
would cause the server to exit - so this message is probably a symptom of the
actual failure.

Are you using bind dn based resource limits?

Comment 4 Aleksander Adamowski 2008-06-10 10:16:31 UTC
I'm not using resource limits anywhere.

Searching with this filter on my directory yields 0 entries as a result:

'(|(nsLookThroughLimit=*)(nsSizeLimit=*)(nsTimeLimit=*)(nsIdleTimeout=*))'

I'll send the access and audit logs to your private mail address.
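
For reference, the presence filter quoted above can be rebuilt from the attribute names. This is only a minimal sketch: the helper names (`presence_or_filter`, `has_any_attr`) and the dict-based entry shape are assumptions made for illustration, not part of the directory server or any LDAP client library.

```python
# Resource-limit attributes named in the filter from this comment.
RESLIMIT_ATTRS = ["nsLookThroughLimit", "nsSizeLimit", "nsTimeLimit", "nsIdleTimeout"]


def presence_or_filter(attrs):
    """Build an LDAP OR filter that matches entries having any of the attributes."""
    return "(|" + "".join(f"({a}=*)" for a in attrs) + ")"


def has_any_attr(entry, attrs):
    """Local check mirroring the filter on an entry given as {attr_name: [values]}.

    Attribute names are compared case-insensitively, and an attribute with
    no values does not count as present.
    """
    wanted = {a.lower() for a in attrs}
    return any(name.lower() in wanted and bool(values) for name, values in entry.items())


if __name__ == "__main__":
    print(presence_or_filter(RESLIMIT_ATTRS))
```

A search with this filter that returns zero entries means no entry carries any of the listed attributes, which is the check described in this comment.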

Comment 5 Rich Megginson 2008-06-23 18:54:58 UTC
This problem only occurs on a VM guest, correct?  Can this problem be reproduced
on a bare metal system?

Comment 6 Aleksander Adamowski 2008-06-23 23:00:33 UTC
I'll try; however, the bare metal system is a test system and I doubt I can
simulate this specific load there.

Reproducing problems on LDAP servers is usually quite tricky, since most of the
time the actual query contents matter, not the number of simultaneous
connections etc. Every detail might be important, like the scope, the attributes
requested, and the actual data returned.

On OpenLDAP I once discovered a socket leak that was caused by binding multiple
times with different DNs on the same connection over a long period of time.



Comment 7 Rich Megginson 2008-06-23 23:20:33 UTC
OK. It will help a great deal if we can determine that this is a Xen-only bug.

Comment 8 Rich Megginson 2009-03-25 20:04:00 UTC
This may be fixed with the next release.