Bug 1368520

Summary: Crash in import_wait_for_space_in_fifo().
Product: Red Hat Enterprise Linux 7
Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-base
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA
QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: lmiksik, nkinder, rmeggins, sramling, tbordaz
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 389-ds-base-1.3.5.10-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 20:45:00 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Noriko Hosoi 2016-08-19 15:58:25 UTC
Description of problem:

An online reinitialization from a supplier to a consumer causes the consumer to
crash after about 15 hours.

Version-Release number of selected component (if applicable):

How reproducible:

Difficult to reproduce reliably, as the import runs for a long time (about 15
hours) before the crash.

Steps to Reproduce:

Customer was reinitializing a consumer from a supplier using the Console.
After some hours, the consumer crashed.

Additional info:
The crash seems to happen in the import code:

========================================
Core was generated by `/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-389 -i
/var/run/dirsrv/slapd-389.pid -w'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f688fa8d072 in import_wait_for_space_in_fifo (job=0x7f6744032850,
new_esize=7011) at ldap/servers/slapd/back-ldbm/import-threads.c:1857
1857                temp_ep = job->fifo.item[i].entry;
========================================

An excerpt of the code:
========================================
1844 static void
1845 import_wait_for_space_in_fifo(ImportJob *job, size_t new_esize)
1846 {
1847     struct backentry *temp_ep = NULL;
1848     size_t i;
1849     int slot_found;
1850     PRIntervalTime sleeptime;
1851
1852     sleeptime = PR_MillisecondsToInterval(import_sleep_time);
1853
1854     /* Now check if fifo has enough space for the new entry */
1855     while ((job->fifo.c_bsize + new_esize) > job->fifo.bsize) {
1856         for ( i = 0, slot_found = 0 ; i < job->fifo.size ; i++ ) {
1857             temp_ep = job->fifo.item[i].entry;
1858             if (temp_ep) {
1859                 if (temp_ep->ep_refcnt == 0 && temp_ep->ep_id <= job->ready_EID) {
1860                     job->fifo.item[i].entry = NULL;
1861                     if (job->fifo.c_bsize > job->fifo.item[i].esize)
1862                         job->fifo.c_bsize -= job->fifo.item[i].esize;
1863                     else
1864                         job->fifo.c_bsize = 0;
1865                     backentry_free(&temp_ep);
1866                     slot_found = 1;
1867                 }
1868             }
1869         }
1870         if ( slot_found == 0 )
1871             DS_Sleep(sleeptime);
1872     }
1873 }
========================================

See also the original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1368209

Comment 3 Sankar Ramalingam 2016-09-19 13:30:06 UTC
1. Created a 2-master, 2-consumer replication setup.
2. Synced entries across masters and consumers.
3. Created a few entries on M1 and left the setup for 2 days.
4. After 2 days, stopped M2 and created 15000 entries on M1.
5. Started M2 and initialized it from a replica.
6. Reinitialization didn't cause any problems.
7. No crash observed.


 [root@ratangad MMR_WINSYNC]# PORT=1189; SUFF="dc=passsync,dc=com"; for PORT in `echo "1189 1289 1389 1489 "`; do /usr/bin/ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b $SUFF |grep -i dn: | wc -l ; done
15585
1
15585
15585
[root@ratangad MMR_WINSYNC]# PORT=1189; SUFF="dc=passsync,dc=com"; for PORT in `echo "1189 1289 1389 1489 "`; do /usr/bin/ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b $SUFF |grep -i dn: | wc -l ; done
15585
15585
15585
15585
[root@ratangad MMR_WINSYNC]# ps -eaf |grep -i slapd
dsuser   16729     1  0 Sep17 ?        00:04:06 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-C1 -i /var/run/dirsrv/slapd-C1.pid
dsuser   16732     1  0 Sep17 ?        00:03:56 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-C2 -i /var/run/dirsrv/slapd-C2.pid
dsuser   16736     1  0 Sep17 ?        00:04:32 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-M1 -i /var/run/dirsrv/slapd-M1.pid
dsuser   16757     1  0 Sep17 ?        00:03:29 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-newinst2 -i /var/run/dirsrv/slapd-newinst2.pid
dsuser   22971     1  1 18:52 ?        00:00:07 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-M2 -i /var/run/dirsrv/slapd-M2.pid
root     23054 12186  0 18:53 pts/0    00:00:00 tail -f /var/log/dirsrv/slapd-M1/errors /var/log/dirsrv/slapd-M2/access
root     23199 22837  0 18:59 pts/2    00:00:00 grep --color=auto -i slapd
[root@ratangad MMR_WINSYNC]# rpm -qa |grep -i 389-ds-base
389-ds-base-snmp-1.3.5.10-11.el7.x86_64
389-ds-base-libs-1.3.5.10-11.el7.x86_64
389-ds-base-debuginfo-1.3.5.10-6.el7.x86_64
389-ds-base-1.3.5.10-11.el7.x86_64
389-ds-base-devel-1.3.5.10-11.el7.x86_64


Hence, marking the bug as Verified.

Comment 5 errata-xmlrpc 2016-11-03 20:45:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2594.html