Red Hat Bugzilla – Bug 1368520
Crash in import_wait_for_space_in_fifo().
Last modified: 2016-11-03 16:45:00 EDT
Description of problem:
An online reinitialization from a supplier to a consumer is causing the consumer to crash after about 15 hours.

Version-Release number of selected component (if applicable):

How reproducible:
Not sure how easy it is to reproduce, as the import runs for a long time (about 15 hours) before the crash.

Steps to Reproduce:
The customer was reinitializing a consumer from a supplier using the Console. After some hours, the consumer crashed.

Additional info:
The crash seems to happen in the import code:
========================================
Core was generated by `/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-389 -i /var/run/dirsrv/slapd-389.pid -w'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f688fa8d072 in import_wait_for_space_in_fifo (job=0x7f6744032850, new_esize=7011)
    at ldap/servers/slapd/back-ldbm/import-threads.c:1857
1857            temp_ep = job->fifo.item[i].entry;
========================================

An excerpt of the code:
========================================
1844 static void
1845 import_wait_for_space_in_fifo(ImportJob *job, size_t new_esize)
1846 {
1847     struct backentry *temp_ep = NULL;
1848     size_t i;
1849     int slot_found;
1850     PRIntervalTime sleeptime;
1851
1852     sleeptime = PR_MillisecondsToInterval(import_sleep_time);
1853
1854     /* Now check if fifo has enough space for the new entry */
1855     while ((job->fifo.c_bsize + new_esize) > job->fifo.bsize) {
1856         for (i = 0, slot_found = 0; i < job->fifo.size; i++) {
1857             temp_ep = job->fifo.item[i].entry;
1858             if (temp_ep) {
1859                 if (temp_ep->ep_refcnt == 0 && temp_ep->ep_id <= job->ready_EID) {
1860                     job->fifo.item[i].entry = NULL;
1861                     if (job->fifo.c_bsize > job->fifo.item[i].esize)
1862                         job->fifo.c_bsize -= job->fifo.item[i].esize;
1863                     else
1864                         job->fifo.c_bsize = 0;
1865                     backentry_free(&temp_ep);
1866                     slot_found = 1;
1867                 }
1868             }
1869         }
1870         if (slot_found == 0)
1871             DS_Sleep(sleeptime);
1872     }
1873 }
========================================

See also the original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1368209
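The faulting frame dereferences job->fifo.item[i].entry, which is consistent with the fifo array having been freed or never allocated while a thread is still waiting for space in it. Below is a minimal, self-contained sketch of that failure mode and a defensive guard, modeled as a standalone program. The names fifo_model, entry_slot and wait_for_space are hypothetical stand-ins for the slapd structures, and the guard is only an illustration of the suspected problem, not the actual upstream fix.
========================================
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the slapd structures; the field names mirror
 * the excerpt above, everything else is a hypothetical model. */
struct entry_slot {
    void  *entry;   /* corresponds to job->fifo.item[i].entry */
    size_t esize;
};

struct fifo_model {
    struct entry_slot *item;   /* NULL once the fifo has been torn down */
    size_t size;
    size_t bsize;              /* total byte budget of the fifo */
    size_t c_bsize;            /* bytes currently in use */
};

/* Returns 0 when enough space is available (or was freed), -1 when the
 * fifo is gone or no slot can be released. The early NULL check is the
 * illustrative guard: without it, the loop would dereference a freed or
 * NULL item array, which matches the segfault in the core dump. */
static int
wait_for_space(struct fifo_model *fifo, size_t new_esize)
{
    while ((fifo->c_bsize + new_esize) > fifo->bsize) {
        if (fifo->item == NULL) {
            return -1;         /* fifo torn down; give up instead of crashing */
        }
        int slot_found = 0;
        for (size_t i = 0; i < fifo->size; i++) {
            if (fifo->item[i].entry != NULL) {
                /* the real code only releases an entry once its refcount is
                 * 0 and it has been processed (ep_id <= ready_EID) */
                fifo->item[i].entry = NULL;
                fifo->c_bsize = (fifo->c_bsize > fifo->item[i].esize)
                                    ? fifo->c_bsize - fifo->item[i].esize
                                    : 0;
                slot_found = 1;
            }
        }
        if (!slot_found) {
            return -1;         /* the real code sleeps (DS_Sleep) and retries */
        }
    }
    return 0;
}

int
main(void)
{
    /* item == NULL with a full budget: the unguarded loop would crash here,
     * the guarded version just reports that the fifo is unavailable. */
    struct fifo_model fifo = { .item = NULL, .size = 0, .bsize = 100, .c_bsize = 100 };
    printf("wait_for_space -> %d\n", wait_for_space(&fifo, 10));
    return 0;
}
========================================
In the real server such a guard would have to coordinate with whatever code path tears down job->fifo (for example an aborted import), which is why this is only a sketch of the failure mode rather than a proposed patch.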
1. Created a 2-master, 2-consumer replication setup.
2. Synced entries across masters and consumers.
3. Created a few entries on M1 and left the setup for 2 days.
4. After 2 days, stopped M2 and created 15000 entries on M1.
5. Started M2 and initialized it from the replica.
6. The reinitialization didn't cause any problems.
7. No crash observed.

[root@ratangad MMR_WINSYNC]# PORT=1189; SUFF="dc=passsync,dc=com"; for PORT in `echo "1189 1289 1389 1489 "`; do /usr/bin/ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b $SUFF |grep -i dn: | wc -l ; done
15585
1
15585
15585

[root@ratangad MMR_WINSYNC]# PORT=1189; SUFF="dc=passsync,dc=com"; for PORT in `echo "1189 1289 1389 1489 "`; do /usr/bin/ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b $SUFF |grep -i dn: | wc -l ; done
15585
15585
15585
15585

[root@ratangad MMR_WINSYNC]# ps -eaf |grep -i slapd
dsuser   16729     1  0 Sep17 ?        00:04:06 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-C1 -i /var/run/dirsrv/slapd-C1.pid
dsuser   16732     1  0 Sep17 ?        00:03:56 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-C2 -i /var/run/dirsrv/slapd-C2.pid
dsuser   16736     1  0 Sep17 ?        00:04:32 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-M1 -i /var/run/dirsrv/slapd-M1.pid
dsuser   16757     1  0 Sep17 ?        00:03:29 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-newinst2 -i /var/run/dirsrv/slapd-newinst2.pid
dsuser   22971     1  1 18:52 ?        00:00:07 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-M2 -i /var/run/dirsrv/slapd-M2.pid
root     23054 12186  0 18:53 pts/0    00:00:00 tail -f /var/log/dirsrv/slapd-M1/errors /var/log/dirsrv/slapd-M2/access
root     23199 22837  0 18:59 pts/2    00:00:00 grep --color=auto -i slapd

[root@ratangad MMR_WINSYNC]# rpm -qa |grep -i 389-ds-base
389-ds-base-snmp-1.3.5.10-11.el7.x86_64
389-ds-base-libs-1.3.5.10-11.el7.x86_64
389-ds-base-debuginfo-1.3.5.10-6.el7.x86_64
389-ds-base-1.3.5.10-11.el7.x86_64
389-ds-base-devel-1.3.5.10-11.el7.x86_64

Hence, marking the bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2594.html