Bug 1261218

Summary: replication and ns-slapd crash in csnset_dup in ipa context
Product: Red Hat Enterprise Linux 7
Reporter: Marc Sauton <msauton>
Component: 389-ds-base
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED DUPLICATE
QA Contact: Viktor Ashirov <vashirov>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.1
CC: gparente, nkinder, rmeggins
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-09-10 19:38:44 UTC
Type: Bug

Description Marc Sauton 2015-09-09 00:05:02 UTC
Description of problem:

ns-slapd crashes in an IPA context on several replicas, multiple times, with the exact same stack trace signature. This happens in the customer environment and has not been reproduced in house.


The topology in the customer report looks like this:
                    m1.1
                /      |      \
          m2.1     m1.2 - m1.3
                \      /
               m2.2
At the time of this report, the crashes had happened on the masters I call m1.2 and m1.3.


The stack trace shows a MOD operation for replication on a host group entry
cn=...,cn=hostgroups,cn=accounts,dc=...
on one of the member entries
fqdn=..,cn=computers,cn=accounts,dc=...
possibly a host enrollment.


Program terminated with signal 11, Segmentation fault.
#0  csnset_dup (csnset=<optimized out>) at ldap/servers/slapd/csnset.c:381
381                     csnset_add_csn(curnode,n->type,&n->csn);

(gdb) list
376             CSNSet *newcsnset= NULL;
377             CSNSet **curnode = &newcsnset;
378             const CSNSet *n= csnset;
379             while(n!=NULL)  
380             {
381                     csnset_add_csn(curnode,n->type,&n->csn);
382                     n= n->next;
383                     curnode = &((*curnode)->next);
384             }
385             return newcsnset;
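
For readers without the source at hand, the loop in the listing can be modelled as follows. This is a simplified standalone sketch, not the 389-ds-base code: ModelCSNSet, model_add, model_dup, model_len and model_free are hypothetical names, and the field types are placeholders. It illustrates that the walk only compares n against NULL, so any non-NULL garbage in a next pointer is dereferenced on the following iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CSNSet node; the real field types differ. */
typedef struct model_csnset {
    int type;                  /* placeholder for the CSN type */
    unsigned long csn;         /* placeholder for the CSN itself */
    struct model_csnset *next; /* NULL terminates the list */
} ModelCSNSet;

/* Allocate a node holding (type, csn) and store it at *tail.
 * Returns 0 on success, -1 if tail is NULL or allocation fails. */
static int model_add(ModelCSNSet **tail, int type, unsigned long csn)
{
    ModelCSNSet *node;

    if (tail == NULL) {
        return -1;
    }
    node = malloc(sizeof(*node));
    if (node == NULL) {
        return -1;
    }
    node->type = type;
    node->csn = csn;
    node->next = NULL;
    *tail = node;
    return 0;
}

/* Same shape as the loop in the listing: copy a node, follow n->next, repeat.
 * The only validity check on n is the comparison with NULL. */
static ModelCSNSet *model_dup(const ModelCSNSet *src)
{
    ModelCSNSet *copy = NULL;
    ModelCSNSet **cur = &copy;
    const ModelCSNSet *n = src;

    while (n != NULL) {
        /* A corrupted, non-NULL n faults here when n->type is read. */
        if (model_add(cur, n->type, n->csn) != 0) {
            break;
        }
        assert(*cur != NULL);
        n = n->next;
        cur = &((*cur)->next);
    }
    return copy;
}

/* Count the nodes in a list. */
static size_t model_len(const ModelCSNSet *s)
{
    size_t len = 0;

    for (; s != NULL; s = s->next) {
        len++;
    }
    return len;
}

/* Free every node of a list created by model_dup. */
static void model_free(ModelCSNSet *s)
{
    while (s != NULL) {
        ModelCSNSet *next = s->next;
        free(s);
        s = next;
    }
}
```

With a well-formed list, model_dup returns an independent copy of the same length. A node whose next field has been overwritten with arbitrary bytes passes the NULL check and triggers the invalid read on the next pass through the loop.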

The "n" value looks suspiciously large, as if corrupted:

Thread 1 (Thread 0x7fc0227f4700 (LWP 41887)):
#0  csnset_dup (csnset=<optimized out>) at ldap/servers/slapd/csnset.c:381
        newcsnset = 0x7fbfcc028f50
        curnode = 0x7fbfcc028f68
        n = 0x6e632c736e696775

and:
(gdb) p curnode
$6 = (CSNSet **) 0x7f5b7000ccd8
(gdb) p *curnode
$7 = (CSNSet *) 0x0

(gdb) p n
$1 = (const CSNSet *) 0x6e632c736e696775

(gdb) p n->type
Cannot access memory at address 0x6e632c736e696775
(gdb)
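
One observation that may help triage: every byte of the bad pointer value 0x6e632c736e696775 is printable ASCII. Read in x86_64 little-endian memory order, the bytes spell "ugins,cn", which looks like a fragment of a string (perhaps part of a DN) written over the CSNSet pointer. This is an interpretation of the value, not something confirmed in this report. The helper below is a hypothetical standalone sketch (ptr_bytes_as_text is not a 389-ds-base function) that performs the decoding:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Render the 8 bytes of a 64-bit value as text, least significant byte
 * first (the order they occupy in memory on x86_64). Non-printable bytes
 * are shown as '.'. The out buffer must hold at least 9 chars. */
static void ptr_bytes_as_text(uint64_t value, char out[9])
{
    int i;

    assert(out != NULL);
    for (i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[8] = '\0';
}
```

Calling ptr_bytes_as_text(0x6e632c736e696775ULL, buf) fills buf with "ugins,cn". A genuine heap pointer from these cores, such as 0x7fbfcc028f50, decodes mostly to '.' placeholders instead.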


#0  csnset_dup (csnset=<optimized out>) at ldap/servers/slapd/csnset.c:381
#1  0x00007fc05f6891ce in slapi_value_dup (v=0x7fbfcc023d90) at ldap/servers/slapd/value.c:173
#2  0x00007fc05f68b269 in valueset_set_valueset (vs1=0x7fbfcc027fb8, vs2=0x7fbfcc022dc8) at ldap/servers/slapd/valueset.c:1205
#3  0x00007fc05f5ffc67 in slapi_attr_dup (attr=attr@entry=0x7fbfcc022dc0) at ldap/servers/slapd/attr.c:440
#4  0x00007fc05f615cc8 in slapi_entry_dup (e=0x7fbfcc011f00) at ldap/servers/slapd/entry.c:2161
#5  0x00007fc053b669ef in backentry_dup (e=0x7fbfcc011e90) at ldap/servers/slapd/back-ldbm/backentry.c:114
#6  0x00007fc053bad484 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:669
#7  0x00007fc05f6430e1 in op_shared_modify (pb=pb@entry=0x7fc0227f3ae0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1086
#8  0x00007fc05f64442f in do_modify (pb=pb@entry=0x7fc0227f3ae0) at ldap/servers/slapd/modify.c:419
#9  0x00007fc05fb24361 in connection_dispatch_operation (pb=0x7fc0227f3ae0, op=0x7fc066361740, conn=0x7fc03c359410) at ldap/servers/slapd/connection.c:660
#10 connection_threadmain () at ldap/servers/slapd/connection.c:2534
#11 0x00007fc05da4b9db in _pt_root () from /lib64/libnspr4.so
#12 0x00007fc05d3ecdf5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fc05d11a1ad in clone () from /lib64/libc.so.6



Version-Release number of selected component (if applicable):
RHEL 7.1
redhat-release-server-7.1-1.el7.x86_64
389-ds-base-1.3.3.1-15.el7_1.x86_64


How reproducible:
Not reproduced in house, but multiple crashes occurred in the customer environment.


Steps to Reproduce:
1. N/A

Comment 4 Marc Sauton 2015-09-09 01:34:15 UTC
Actually, the 389-ds-base version in use when the crashes happened is the newer
389-ds-base-1.3.3.1-20.el7_1

We have several sosreports, and I picked an old version number in the initial description of bz 1261218.

Comment 11 Marc Sauton 2015-09-10 19:38:44 UTC
Closing bz 1261218 as a dup of bz 1243970, as per dev review and request.
bz 1243970 has all acks and is in the 7.2 errata.
There will be a cloned bz to backport the 48226 patch to 7.1 and include it in 7.1.z.

*** This bug has been marked as a duplicate of bug 1243970 ***