| Summary: | Replication issues between two FreeIPA servers | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Valentin Bajrami <valentin.bajrami> |
| Component: | 389-ds-base | Assignee: | Noriko Hosoi <nhosoi> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Viktor Ashirov <vashirov> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.2 | CC: | nhosoi, nkinder, rmeggins, tbordaz, valentin.bajrami |
| Target Milestone: | pre-dev-freeze | | |
| Target Release: | 7.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-19 16:32:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Valentin Bajrami
2016-12-06 11:19:29 UTC
Noriko Hosoi:

Hello, Valentin. Could you tell us how this issue started? Was replication between ipamaster and ipamaster-1 working and then stopped, or did it never work? You mentioned "password synchronization"; could you give us more details about it? For instance, could you share the steps to reproduce the problem? We usually use the term "password synchronization" for synchronizing passwords between Active Directory and Directory Server/IPA, so I assume that is not what you mean. Thierry, do you have any idea what is causing this problem? Thanks.

Thierry Bordaz:

Hello Valentin,
It looks like this bug is related to a support case; would you please link the case so that I can access all the sosreport files?
With the current data I would say:
- On ipamaster, the entry cn=repl keep alive 3,dc=intra,dc=sub,dc=domain,dc=tld exists because its update was successfully applied on ipamaster, as recorded in the retroCL. I do not know whether the entry was created on ipamaster or on ipamaster-1.
- ipamaster-1 --> ipamaster fails to send CSN 580f52a6000300030000, but that is likely a consequence of CSN 580f4ec7000100030000 being aborted on ipamaster, which triggered the connection closure during the session for 580f52a6000300030000.
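The two identifiers above are 389-ds CSNs (change sequence numbers). Assuming the usual layout (8 hex digits of Unix time, then 4 each for sequence number, replica ID, and sub-sequence number), they can be decoded with a quick shell sketch:

```shell
#!/bin/bash
# Decode a 389-ds CSN: time (8 hex) + seqnum (4) + replica ID (4) + subseq (4).
csn=580f4ec7000100030000
ts=$((16#${csn:0:8}))     # seconds since the epoch when the change was made
seq=$((16#${csn:8:4}))    # sequence number within that second
rid=$((16#${csn:12:4}))   # ID of the replica that originated the change
printf 'time=%s seq=%d rid=%d\n' \
    "$(date -u -d "@${ts}" +%Y-%m-%dT%H:%M:%SZ)" "$seq" "$rid"
```

Note that both CSNs in the comment carry replica ID 3, i.e. they originated on the same replica.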
A complete data set is needed (config, access log, errors log). In addition, since replication already seems broken, you may enable replication logging on both servers to get more details.
It would be interesting to know what the aborted update 580f4ec7000100030000 is. The access log would help, as would dumping that change from the changelog.
Is this a test environment or a production one?
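The data gathering requested above could be done along these lines. This is a sketch: the suffix comes from this bug's environment, the password is a placeholder, and the `cl-dump` options should be checked against the local 389-ds-base version:

```shell
# Enable replication (8192) and plugin (65536) error logging on each server;
# 8192 + 65536 = 73728. Set the level back to 0 (default) once logs are captured.
ldapmodify -x -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 73728
EOF

# Dump the replication changelog (cl-dump ships with 389-ds-base) and look up
# the aborted change by its CSN.
cl-dump -D "cn=Directory Manager" -w password \
    -r "dc=intra,dc=sub,dc=domain,dc=tld" -o /tmp/changelog.ldif
grep -B 2 -A 8 '580f4ec7000100030000' /tmp/changelog.ldif
```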
Thierry Bordaz (comment #4):

Hi Valentin,

We cannot really make progress on this bug without the requested data (https://bugzilla.redhat.com/show_bug.cgi?id=1401897#c2). By any chance, are the logs still available somewhere? Otherwise we will have to close the bug.

regards
thierry

Valentin Bajrami:

Hi Thierry,

Thanks for getting back on this. The problem was solved just today by re-initializing the replication. The total update took ~40s, and the log messages were:

[10/Jan/2017:12:07:41 +0100] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=meToipamaster.sub.domain.tld" (ipamaster:389)".
[10/Jan/2017:12:08:13 +0100] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=meToipamaster.sub.domain.tld" (ipamaster:389)". Sent 2289 entries.

I'm not sure whether this contributes anything to understanding the problem, but it was the most straightforward step to take. We first tried force-sync, which was not successful. Please close this bug report as you see fit given the description.

Thierry Bordaz:

Hi Valentin,

Thanks for your feedback. The "Beginning total" and "Finished total" messages are normal and show the successful completion of the re-init. It is difficult to comment on the import rate (2289 entries in ~40s) because many parameters contribute (power of the box, size of the entries, plugins enabled, indexed attributes, ...), but I would assume it is normal. You are right: a re-init (especially one that takes only a few seconds) is usually the fastest way to recover. If you do not have the old logs (from early December), it will not be possible to identify the root cause of the bug, so closing it makes sense.

In case you hit such an error again:

NSMMReplicationPlugin - process_postop: Failed to apply update (xxx) error (-1). Aborting replication session (conn=xxx op=xxx)

you can get more information by turning on additional error log levels (http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting), especially replication and possibly plugin. Then reopen this bug and attach the logs (access/error) and the config.

Thanks.

Following the previous update, closing the BZ.
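The recovery path Valentin describes corresponds roughly to the following FreeIPA commands, run on the replica that is out of sync (ipamaster-1 here). This is a sketch; note that re-initialize replaces the local data with a fresh copy from the named server:

```shell
# First try a forced incremental update from the healthy server.
ipa-replica-manage force-sync --from ipamaster.sub.domain.tld

# If the agreement still does not recover, fall back to a full
# re-initialization (total update), overwriting the local replica data.
ipa-replica-manage re-initialize --from ipamaster.sub.domain.tld
```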