Bug 962885
Summary: | RHEL 6.2 to 6.4 ipa upgrade selinuxusermap data not replicating | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Scott Poore <spoore> |
Component: | 389-ds-base | Assignee: | Rich Megginson <rmeggins> |
Status: | CLOSED ERRATA | QA Contact: | Sankar Ramalingam <sramling> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4 | CC: | jgalipea, jrusnack, lnovich, mkosek, nhosoi, nkinder, rmeggins |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 389-ds-base-1.2.11.15-22.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: IPA upgrade changes nsslapd-port to 0 for security reasons, to completely deny any traffic on the unencrypted port.
Consequence: The nsslapd-port is also used to construct the RUV used by replication. The replication startup code checks the existing RUV against the current hostname and port, finds that it has changed, removes the RUV, and removes the changelog. This causes a loss of changes and a partial reset of the state of the replica. A supplier will then attempt to send changes that already exist. A certain combination of these will cause the supplier to get into an endless loop, attempting to send the same duplicate update over and over again.
Fix: At replication startup, if the port number is 0, do not remove the RUV element for the localhost; instead, assume the port number should not be changed.
Result: Changing the nsslapd-port to 0 will not remove the local replica from the RUV, and replication will continue to work.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-11-21 21:07:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
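The startup check described in the Doc Text above can be illustrated with a minimal sketch. This is not the 389-ds-base code (which is written in C); the function name `should_reset_local_ruv` and its parameters are hypothetical, and the sketch assumes the stored RUV element carries a pURL of the form `ldap://host:port`, as seen in the logs of this report. Whether the hostname is still compared when the port is 0 is not stated in the Doc Text, so the sketch simply skips the check in that case.

```python
from urllib.parse import urlparse


def should_reset_local_ruv(stored_purl: str, hostname: str, port: int) -> bool:
    """Return True if the local RUV element (and the changelog) would be discarded.

    Hypothetical helper that models the behaviour described in the Doc Text.
    """
    if port == 0:
        # Fix: nsslapd-port=0 only disables the unencrypted port, so the
        # stored RUV element is kept and the port is assumed unchanged.
        return False
    parsed = urlparse(stored_purl)
    # Behaviour described for other cases: a changed host or port leads to
    # removal of the RUV element and the changelog.
    return (parsed.hostname, parsed.port) != (hostname.lower(), port)


if __name__ == "__main__":
    purl = "ldap://ipaqavma.testrelm.com:389"
    print(should_reset_local_ruv(purl, "ipaqavma.testrelm.com", 389))
    print(should_reset_local_ruv(purl, "ipaqavma.testrelm.com", 0))
```

Without the `port == 0` branch, the second call would report a mismatch (389 versus 0), which corresponds to the reset that caused the problem in this bug.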
Description
Scott Poore
2013-05-14 17:13:51 UTC
Created attachment 747768
slapd log from master
Created attachment 747769
slapd log from replica
Is this a 389-ds-base issue? Should I change the component?

Yes. The directory server supplier is sending over the same changes twice. The reason is that the RUV returned from the consumer (the Consumer RUV in the supplier error log) is bogus - the RUV element for the supplier (rid 4) is empty and even has the wrong port number in the pURL. The RUV element for the supplier should contain the max CSN of the most recent changes sent over.

What I find odd is that we did a full complement of replication testing for RHEL 6.4 and we did not see this issue.

Ok. Changing the component.

This bug does sound like something we would want for RHEL-6.5 (also based on your additional investigation).

This is a case of a duplicate ADD - the entries were added directly to the replica earlier:

[14/May/2013:12:57:00 -0400] conn=7 op=25 ADD dn="cn=selinux,dc=testrelm,dc=com"
[14/May/2013:12:57:00 -0400] conn=7 op=25 RESULT err=0 tag=105 nentries=0 etime=0 csn=51926cdd000000030000

The replica was unable to send this change to the master because there was a problem with replication:

[14/May/2013:12:57:00 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[14/May/2013:12:57:00 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Cannot determine realm for numeric host address))

replication resumes:

[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): Replication bind with GSSAPI auth resumed

schema repl issue:

[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): Warning: unable to replicate schema: rc=1

The above add, and several other changes, appear to be missing from the supplier RUV:

[14/May/2013:12:59:12 -0400] - _cl5PositionCursorForReplay (agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389)): Supplier RUV:
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): {replicageneration} 51926881000000040000
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): {replica 3 ldap://ipaqavma.testrelm.com:389} 51926d51000000030000 51926d61000000030000 51926d60

Note that the min csn in the RUV element for the replica (rid 3) is 51926d51000000030000, which is greater than 51926cdd000000030000. But the consumer has this:

[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): {replica 3 ldap://ipaqavma.testrelm.com:389} 51926887000800030000 51926cc4000000030000 00000000

51926cc4000000030000 is less than 51926cdd000000030000, so the master has not seen that change yet.

The replica attempts to replay these changes to the master:

[14/May/2013:12:59:12 -0400] agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389) - session start: anchorcsn=51926cc4000000030000
[14/May/2013:12:59:12 -0400] agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389) - clcache_load_buffer: rc=-30988
[14/May/2013:12:59:12 -0400] NSMMReplicationPlugin - changelog program - agmt="cn=meToqe-blade-09.testrelm.com" (qe-blade-09:389): CSN 51926aa6000000040000 not found and no purging, probably a reinit

So, for some reason, the replica doesn't have 51926cc4000000030000. This is a change which originated on the replica (rid 3) - not sure why it isn't found. Since it isn't found, it tries to use the min csn from the supplier, which is 51926d51000000030000, which skips the changes made earlier.

Upstream ticket: https://fedorahosted.org/389/ticket/47362

TET RHEL64
Sending testcases/DS/6.0/mmrepl/accept/accept.sh
Transmitting file data .
Committed revision 7625.

TET trunk
Sending testcases/DS/6.0/mmrepl/accept/accept.sh
Transmitting file data .
Committed revision 7626.

As per comment #17, marking this bug with qe_test_coverage+ flag.

TestCase [trac47362] result-> [PASS] with 389-ds-base-1.2.11.15-22

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1653.html
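
The CSN comparisons made in the analysis above can be reproduced with a small sketch. It assumes the usual 20-character CSN string layout (8 hex digits of timestamp, then 4 each for sequence number, replica id, and sub-sequence number), which matches the rid 3 and rid 4 values quoted in this report. The `CSN` type and `parse_csn` function are hypothetical names for illustration, not part of 389-ds-base.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields in comparison order; tuples compare field by field."""
    timestamp: int
    seqnum: int
    rid: int
    subseqnum: int


def parse_csn(text: str) -> CSN:
    """Split a 20-character hex CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex characters, got {len(text)}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        rid=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


if __name__ == "__main__":
    added = parse_csn("51926cdd000000030000")         # the ADD on the replica
    supplier_min = parse_csn("51926d51000000030000")  # min csn in supplier RUV
    consumer_max = parse_csn("51926cc4000000030000")  # max csn seen by consumer
    print(added.rid)
    print(supplier_min > added)   # the supplier RUV starts after the ADD
    print(consumer_max < added)   # the consumer has not seen the ADD
```

Because the fields are fixed-width hex, comparing the raw strings gives the same ordering; the parsed form mainly makes the replica id visible.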