Bug 1007988 - Under specific values of nsDS5ReplicaName, replication may get broken or updates missing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Rich Megginson
QA Contact: Sankar Ramalingam
Depends On: 1007452
Blocks:
Reported: 2013-09-13 13:10 EDT by Nathan Kinder
Modified: 2014-06-17 22:59 EDT

See Also:
Fixed In Version: 389-ds-base-1.3.1.6-4.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1007452
Environment:
Last Closed: 2014-06-13 08:00:15 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Nathan Kinder 2013-09-13 13:10:03 EDT
+++ This bug was initially created as a clone of Bug #1007452 +++

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47489

In a replication environment, if the changelog db file name contains the extension string more than once, the changelog file is recreated when db2ldif and ldif2db are run on the master/hub instance.

Ex: d736e482-198111e1-8d7bedb4-8c53b85f_502ce263000000020000.db4
    In this file name, "db4" appears twice: once as the extension and once inside the replica name string ("8d7bedb4").

There is a logic problem in the function below, which checks whether the filename ends with the extension. It calls strstr() to search for "ext", and strstr() returns only the first occurrence of "ext" in the filename. If "ext" also occurs earlier in the file name, the function tests that earlier occurrence and always returns false, which results in multiple changelog db files being created.

====
filename: cl5_api.c

/*
 * return 1: true (the "filename" ends with "ext")
 * return 0: false
 */
static int _cl5FileEndsWith(const char *filename, const char *ext)
{
	char *p = NULL;
	int flen = strlen(filename);
	int elen = strlen(ext);
	if (0 == flen || 0 == elen)
	{
		return 0;
	}
	p = strstr(filename, ext);
	if (NULL == p)
	{
		return 0;
	}
	if (p - filename + elen == flen)
	{
		return 1;
	}
	return 0;
}
=====
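
The failure can be reproduced in isolation. The sketch below copies the logic of the function above into a standalone helper and applies it to the example filename from this report. The names ends_with_first_match and demo_first_match, and the filename "userRoot.db4", are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Same logic as the reported function: only the FIRST occurrence of
 * ext is tested against the end of filename. */
static int ends_with_first_match(const char *filename, const char *ext)
{
    size_t flen = strlen(filename);
    size_t elen = strlen(ext);
    const char *p;

    if (flen == 0 || elen == 0) {
        return 0;
    }
    p = strstr(filename, ext);
    if (p == NULL) {
        return 0;
    }
    return ((size_t)(p - filename) + elen == flen) ? 1 : 0;
}

/* Illustration only: "userRoot.db4" is a made-up name, the second
 * name is the example quoted in this report. */
static void demo_first_match(void)
{
    /* "db4" occurs once, at the end: reported as a match. */
    assert(ends_with_first_match("userRoot.db4", "db4") == 1);

    /* "db4" also occurs inside "8d7bedb4", so the first match is not
     * at the end and the check wrongly reports "no match". */
    assert(ends_with_first_match(
        "d736e482-198111e1-8d7bedb4-8c53b85f_502ce263000000020000.db4",
        "db4") == 0);
}
```

Calling demo_first_match() passes both assertions, which confirms that the second filename is rejected even though it does end with "db4".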

I have modified this function to fix this issue. Could you please verify the same and include the fix in the master branch?

/*
 * return 1: true (the "filename" ends with "ext")
 * return 0: false
 */
static int _cl5FileEndsWith(const char *filename, const char *ext)
{
	char *p = NULL;
	int flen = strlen(filename);
	int elen = strlen(ext);
	if (0 == flen || 0 == elen)
	{
		return 0;
	}
	p = strstr(filename, ext);
	if (NULL == p)
	{
		return 0;
	}

	do {
		if (p - filename + elen == flen)
		{
			return 1;
		}
		p = strstr(p+elen, ext);
	} while ( p != NULL );

	return 0;
}
=====
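
An alternative that avoids searching altogether is to compare only the last strlen(ext) bytes of the name. This is a minimal sketch under that assumption, not necessarily the change that was committed upstream; file_ends_with is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * return 1: true (the "filename" ends with "ext")
 * return 0: false (including when either string is empty)
 */
static int file_ends_with(const char *filename, const char *ext)
{
    size_t flen = strlen(filename);
    size_t elen = strlen(ext);

    if (flen == 0 || elen == 0 || elen > flen) {
        return 0;
    }
    /* Compare the tail of filename with ext directly. */
    return strcmp(filename + (flen - elen), ext) == 0;
}

/* Illustration only, using the filename quoted in this report. */
static void demo_file_ends_with(void)
{
    assert(file_ends_with(
        "d736e482-198111e1-8d7bedb4-8c53b85f_502ce263000000020000.db4",
        "db4") == 1);
    assert(file_ends_with("8d7bedb4-8c53b85f", "db4") == 0);
}
```

Comparing the tail directly also covers an extension that overlaps itself, such as "aa" in "aaa", where resuming the search at p+elen would step past the final match.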

Thanks and Regards,
Jyoti

--- Additional comment from Rich Megginson on 2013-09-12 10:09:56 EDT ---

This issue seems serious enough to warrant inclusion in RHEL 6.5.
Comment 1 Jenny Galipeau 2013-09-13 13:12:34 EDT
need steps to reproduce
Comment 2 Rich Megginson 2013-09-13 13:13:56 EDT
need steps to reproduce
Comment 3 thierry bordaz 2013-09-13 14:54:19 EDT
Steps to reproduce are
	Create Master, C1, C2
        Create ReplAgreement M->C1 and M->C2

        <before doing any update>
        Stop Master
        Identify the database file extension, such as db4 (e.g. /var/lib/dirsrv/slapd-master/db/userRoot/id2entry.db4)
        Edit master dse.ldif.
              Look for cn=replica,cn=<suffix>,cn=mapping tree,cn=config
              Update nsDS5ReplicaName so that the name contains the database file extension, e.g.
                before: afbf227b-1ca411e3-8cdaf60b-fc2f2a5a
                after : afbf227b-1ca411e3-8cdafdb4-fc2f2a5a
        Start Master

        # The following steps create the changelog file whose name contains 'db4'
	Create user t1
	Create user t2
	<check replication is working>

        # The following steps create entry t3 on M and C1 but not on C2.
        # t3 is also recorded in the changelog
	Stop C2
	Create user t3
	<check t3 is replicated on C1>

        # The following step will corrupt the t3 entry in the changelog, so that
        # the entry ADD can no longer be replicated
	Stop Master, C1
	export Master (-r)
	import C1 (this step can likely be skipped)
	Start Master, C1, C2

        # The following step causes C2 to miss t3
	Create user t4

		-> On Master: t1, t2, t3, t4
		-> On Cons.1: t1, t2, t3, t4
		-> On Cons.2: t1, t2,     t4

        # An update on t3 should break replication
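
Whether a given nsDS5ReplicaName value can trigger the problem comes down to whether it contains the extension string. A small check along those lines, assuming the extension is "db4" as in this report; replica_name_contains_ext is a hypothetical helper and not part of 389-ds-base.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if replica_name contains ext anywhere, 0 otherwise.
 * Hypothetical helper for illustration only. */
static int replica_name_contains_ext(const char *replica_name, const char *ext)
{
    if (replica_name == NULL || ext == NULL || *ext == '\0') {
        return 0;
    }
    return strstr(replica_name, ext) != NULL;
}

/* The "before" and "after" values from the steps above. */
static void demo_replica_names(void)
{
    assert(replica_name_contains_ext(
        "afbf227b-1ca411e3-8cdaf60b-fc2f2a5a", "db4") == 0);
    assert(replica_name_contains_ext(
        "afbf227b-1ca411e3-8cdafdb4-fc2f2a5a", "db4") == 1);
}
```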
Comment 4 Rich Megginson 2013-10-01 19:25:34 EDT
moving all ON_QA bugs to MODIFIED in order to add them to the errata (can't add bugs in the ON_QA state to an errata).  When the errata is created, the bugs should be automatically moved back to ON_QA.
Comment 8 Amita Sharma 2014-02-03 10:28:35 EST
[root@dhcp201-149 userRoot]# tail -f /var/log/dirsrv/slapd-dhcp201-149/errors
[03/Feb/2014:20:29:14 +051800] - All database threads now stopped
[03/Feb/2014:20:29:14 +051800] - slapd stopped.
[03/Feb/2014:20:30:04 +051800] - export userRoot: Processed 11 entries (100%).
[03/Feb/2014:20:30:04 +051800] - All database threads now stopped
[03/Feb/2014:20:30:46 +051800] - 389-Directory/1.3.1.6 B2014.030.026 starting up
[03/Feb/2014:20:30:46 +051800] slapi_ldap_bind - Error: could not send bind request for id [cn=replication manager,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 107 (Transport endpoint is not connected, host "dhcp201-149.englab.pnq.redhat.com")
[03/Feb/2014:20:30:46 +051800] NSMMReplicationPlugin - agmt="cn=Master-consumer1" (dhcp201-149:391): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[03/Feb/2014:20:30:46 +051800] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[03/Feb/2014:20:30:49 +051800] slapi_ldap_bind - Error: could not send bind request for id [cn=replication manager,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 107 (Transport endpoint is not connected, host "dhcp201-149.englab.pnq.redhat.com")
[03/Feb/2014:20:30:55 +051800] NSMMReplicationPlugin - agmt="cn=Master-consumer1" (dhcp201-149:391): Replication bind with SIMPLE auth resumed


After starting all the nodes, all entries were present on all the nodes (Master and consumers). Replication did not break.
Hence marking as VERIFIED.
Comment 9 Ludek Smid 2014-06-13 08:00:15 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
