Hide Forgot
Description of problem: Replication problems between freeipa servers. Password synchronization doesn't seem to work. The whole problem imho is caused by this uniqueid: .... targetuniqueid: ec5ffe96-c65211e5-a537be1e-d065a3fa .... targetDn: cn=repl keep alive 3,dc=intra,dc=sub,dc=domain,dc=tld Scenario: 1) ipamaster [root@ipamaster ~]# rpm -qa 389-ds-* 389-ds-base-libs-1.3.4.0-33.el7_2.x86_64 389-ds-base-debuginfo-1.3.4.0-33.el7_2.x86_64 389-ds-base-1.3.4.0-33.el7_2.x86_64 2) ipamaster-1 [root@ipamaster-1 ~]# rpm -qa 389-ds-* 389-ds-base-libs-1.3.4.0-33.el7_2.x86_64 389-ds-base-debuginfo-1.3.4.0-33.el7_2.x86_64 389-ds-base-1.3.4.0-33.el7_2.x86_64 root@ipamaster # tail -F /var/log/dirsrv/slapd-INTRA-SUB-DOMAIN-TLD/errors* [06/Dec/2016:11:54:30 +0100] NSMMReplicationPlugin - process_postop: Failed to apply update (580f4ec7000100030000) error (-1). Aborting replication session(conn=2632893 op=6) [06/Dec/2016:11:54:35 +0100] NSMMReplicationPlugin - process_postop: Failed to apply update (580f4ec7000100030000) error (-1). Aborting replication session(conn=2632898 op=6) [06/Dec/2016:11:54:41 +0100] NSMMReplicationPlugin - process_postop: Failed to apply update (580f4ec7000100030000) error (-1). Aborting replication session(conn=2632904 op=6) [06/Dec/2016:11:54:46 +0100] NSMMReplicationPlugin - process_postop: Failed to apply update (580f4ec7000100030000) error (-1). Aborting replication session(conn=2632905 op=6) root@ipamaster-1 # tail -F /var/log/dirsrv/slapd-INTRA-SUB-DOMAIN-TLD/errors* [05/Dec/2016:15:40:07 +0100] NSMMReplicationPlugin - agmt="cn=meToipamaster.intra.astro.rug.nl" (ipamaster:389): Replication bind with GSSAPI auth resumed [05/Dec/2016:15:40:08 +0100] NSMMReplicationPlugin - agmt="cn=meToipamaster.intra.astro.rug.nl" (ipamaster:389): Failed to send modify operation: LDAP error -1 (Can't contact LDAP server) [05/Dec/2016:15:40:08 +0100] NSMMReplicationPlugin - agmt="cn=meToipamaster.intra.astro.rug.nl" (ipamaster:389): Failed to send update operation to consumer (uniqueid ec5ffe96-c65211e5-a537be1e-d065a3fa, CSN 580f6b6b000200030000): Can't contact LDAP server. Will retry later. [05/Dec/2016:15:40:08 +0100] NSMMReplicationPlugin - agmt="cn=meToipamaster.intra.astro.rug.nl" (ipamaster:389): Consumer failed to replay change (uniqueid ec5ffe96-c65211e5-a537be1e-d065a3fa, CSN 580f52a6000300030000): Can't contact LDAP server(-1). Will retry later. [05/Dec/2016:15:40:08 +0100] NSMMReplicationPlugin - agmt="cn=meToipamaster.intra.astro.rug.nl" (ipamaster:389): Warning: unable to send endReplication extended operation (Can't contact LDAP server) Running the following command on the ipamaster, produces this output root@ipamaster # ldapsearch -x -W -h localhost -p 389 -D "cn=Directory Manager" -b "cn=changelog" | grep -C17 'ec5ffe96-c65211e5-a537be1e-d065a3fa' # 31567089, changelog dn: changenumber=31567089,cn=changelog objectClass: top objectClass: changelogentry objectClass: extensibleObject targetuniqueid: ec5ffe96-c65211e5-a537be1e-d065a3fa changeNumber: 31567089 targetDn: cn=repl keep alive 3,dc=intra,dc=sub,dc=domain,dc=tld changeTime: 20161205144220Z changeType: modify changes:: cmVwbGFjZToga2VlcGFsaXZldGltZXN0YW1wCmtlZXBhbGl2ZXRpbWVzdGFtcDogMjAx NjEyMDUxNDQyMjBaCi0KcmVwbGFjZTogbW9kaWZpZXJzbmFtZQptb2RpZmllcnNuYW1lOiBjbj1Nd Wx0aW1hc3RlciBSZXBsaWNhdGlvbiBQbHVnaW4sY249cGx1Z2lucyxjbj1jb25maWcKLQpyZXBsYW NlOiBtb2RpZtld0aW1lc3RhbXAKbW9kaWZ5dGltZXN0YW1wOiAyMDE2MTIwNTE0NDIyMFoKLQpyZXB sYWNlOiBlbnRyeXVzbgplbnRyeXVzbjogMTA1NzY4NTQzCi0KAA== Anyone any suggestion?
Hello, Valentin. Could you tell us how this issue started? The replication between ipamaster and ipamaster-1 was working, then it stopped? Or it never worked? You mentioned "Password synchronization". Could you give us more details about it? For instance, could you share the steps to duplicate the problem with us? We usually use the term "password synchronization" for the password synchronization between Active Directory and Directory Server/IPA. I'd think you don't mean it? Thierry, do you have any idea what is causing this problem? Thanks.
Hello Valentin, It is looking like the bug is related to a case, would you please link the case so that I can access all sosreport files. With the current data I would say: - On ipamaster it exists the entry cn=repl keep alive 3,dc=intra,dc=sub,dc=domain,dc=tld because its update was successfully applied in ipamaster as it was recorded in the retroCL I do not know if it was created on ipamaster or on ipamaster-1 - ipamaster-1 --> ipamaster fails to send 580f52a6000300030000 but it is likely the consequence of 580f4ec7000100030000 aborted on ipamaster that triggered the connection closure under 580f52a6000300030000 A complete dataset is needed (config/access/errors log). In addition while it seems broken, you may turn replication logging on both servers to get more details. It would be interested to know what is 580f4ec7000100030000 that is aborted. Access log would help but also dumping the change from the changelog. Is it a test environment or a production
Hi Valentin, We can not really progress on this bug without the requested data (https://bugzilla.redhat.com/show_bug.cgi?id=1401897#c2). By any chance are they still available somewhere ? Else we would have to close the bug. regards thierry
(In reply to thierry bordaz from comment #4) > Hi Valentin, > > We can not really progress on this bug without the requested data > (https://bugzilla.redhat.com/show_bug.cgi?id=1401897#c2). > By any chance are they still available somewhere ? Else we would have to > close the bug. > > regards > thierry Hi Thierry, Thanks for getting back on this. The problem has been solved just today by re-initializing the replication. It took ~40s to replicate and the message was: [10/Jan/2017:12:07:41 +0100] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=meToipamaster.sub.domain.tld" (ipamaster:389)". [10/Jan/2017:12:08:13 +0100] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=meToipamaster.sub.domain.tld" (ipamaster:389)". Sent 2289 entries. I'm not sure if this contributes anything to the problem, but this was the most straight forward step to take. We first tried to use force-sync which wasn't a success. Please close this bug report as you best see fit given the description.
Hi Valentin, Thanks for your feedback. The messages "Beginning total" and "Finished total" are normal and show the normal completion of the re-init. It is difficult to comment the import rate (2289entries/40s) because many parameters can contribute (power of the box, size of entries, plugins enabled, indexed attributes...) but I would assume it is normal. You are right, reinit (moreover if it takes only few sec) is usually the fastest approach to recover. If you do not have the old logs (from early december) it will not be possible to identify the RC of the bug and closing it makes sense. In case you hit again such error: NSMMReplicationPlugin - process_postop: Failed to apply update (xxx) error (-1). Aborting replication session(conn=xxx op=xxx) You would get more info turning on some error log level (http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting) especially replication and possibly plugin. Then reopen that bug and attach the log(access/error)/config thanks
Following previous update, closing the BZ