Bug 1377730
| Field | Value |
| --- | --- |
| Summary | Replica install fails with old IPA master sometimes during replication process |
| Product | Red Hat Enterprise Linux 7 |
| Component | 389-ds-base |
| Version | 7.3 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Reporter | Kaleem <ksiddiqu> |
| Assignee | Noriko Hosoi <nhosoi> |
| QA Contact | Viktor Ashirov <vashirov> |
| CC | anazmy, gagriogi, gparente, mkolaja, mreynolds, msauton, nkinder, pvoborni, rcritten, rmeggins, spichugi, tbordaz, vonsch |
| Target Milestone | rc |
| Keywords | Regression, ZStream |
| Hardware | All |
| OS | Linux |
| Fixed In Version | 389-ds-base-1.3.6.1-3.el7 |
| Clones | 1378353, 1384785, 1385300 |
| Bug Blocks | 1378353, 1384785, 1385300 |
| Type | Bug |
| Last Closed | 2017-08-01 21:10:21 UTC |
Description
Kaleem
2016-09-20 13:27:07 UTC
- Schema definitions can differ between two replicated instances, while replication relies on all instances sharing a common schema. If a supplier detects that the schema definitions differ, it "pushes" its schema at the beginning of a replication session. A problem exists if the pushed schema (from the supplier) is a subset of the consumer schema. To prevent this, DS 1.3.3 introduced a mechanism (tickets 48988, 47490 and 47676) that checks, before accepting a pushed schema, that the supplier schema is a superset of the consumer schema.
- The pushed schema is rejected (UNWILLING_TO_PERFORM). During the total update the schema is sent, but since the server schema (6.8) is a subset of the replica schema (7.3), the update fails.

Server:

```
[20/Sep/2016:10:04:00 +0200] NSMMReplicationPlugin - agmt="cn=meTo<vm-fqdn>" (<vm>:389): Schema replication update failed: Server is unwilling to perform
[20/Sep/2016:10:04:00 +0200] NSMMReplicationPlugin - Warning: unable to replicate schema to host <vm-fqdn>, port 389. Continuing with total update session.
[20/Sep/2016:10:04:00 +0200] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=meTo<vm-fqdn>" (<vm>:389)".
```

Consumer:

```
[20/Sep/2016:10:05:17.857331444 +0200] conn=4 op=1 BIND dn="cn=replication manager,cn=config" method=128 version=3
[20/Sep/2016:10:05:17.857701775 +0200] conn=4 op=1 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=replication manager,cn=config"
...
[20/Sep/2016:10:05:17.860818168 +0200] conn=4 op=4 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[20/Sep/2016:10:05:18.206903643 +0200] conn=4 op=5 RESULT err=0 tag=101 nentries=1 etime=1
...
[20/Sep/2016:10:05:18.264291536 +0200] conn=4 op=6 MOD dn="cn=schema"
[20/Sep/2016:10:05:18.970809613 +0200] conn=4 op=6 RESULT err=53 tag=103 nentries=0 etime=0
[20/Sep/2016:10:05:18.971069207 +0200] NSMMReplicationPlugin - process_postop: Failed to apply update () error (53). Aborting replication session(conn=4 op=6)
```

- The total update continues on the server side:

```
[20/Sep/2016:10:04:00 +0200] NSMMReplicationPlugin - Warning: unable to replicate schema to host <vm-fqdn>, port 389. Continuing with total update session.
```

- The total update is aborted on the client side. Since DS 1.3.5.1 (https://fedorahosted.org/389/ticket/47788), UNWILLING_TO_PERFORM on a replicated operation is treated as a fatal error for the session and triggers closure of the connection. In that case the total init does not complete, and the server and replica will not be in sync.
- There is no easy workaround. The only one I can think of is to copy the consumer (7.3) schema to the server side (6.8).
- It is not clear to me why this problem is not systematic. The scenario of a 6.x server and a 7.3 replica, with the schema on 6.x being a subset of the 7.3 schema, should systematically trigger the failure of the total update.

In conclusion: a total update can fail between a 6.x server and a replica >= 1.3.5.1, depending on whether the 6.x schema is a subset of the 7.3 schema. The DS upstream ticket is https://fedorahosted.org/389/ticket/48992. A test fix was provided and replica installation succeeded every time. Continuing testing (beaker).

How to detect that you hit this bug:

On the replica side:

- In the access log, on a replication connection, the update of 'cn=schema' returns err=53 (unwilling to perform), and the replication connection gets closed immediately after the failure:

```
[22/Sep/2016:12:43:32.573970277 +051800] conn=4 fd=66 slot=66 connection ...
```
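As a minimal sketch of the superset check described above (names and the set representation are illustrative, not the actual 389-ds-base code, which compares objectClass and attributeType definitions in detail):

```python
# Hypothetical sketch: before a consumer accepts a pushed schema, every
# definition it already holds must also be present in the supplier's
# schema. Real 389-ds schema comparison is per-definition, not a plain
# set test; this only illustrates the superset rule.

def supplier_schema_is_superset(supplier_defs: set, consumer_defs: set) -> bool:
    """Return True if the pushed (supplier) schema may be accepted."""
    return consumer_defs <= supplier_defs

# A 6.8 master missing definitions that the 7.3 replica already has:
master_68 = {"person", "inetOrgPerson"}
replica_73 = {"person", "inetOrgPerson", "ipaUserAuthTypeClass"}

# The pushed schema is a subset of the consumer's, so the replica
# rejects it with LDAP err=53 (UNWILLING_TO_PERFORM).
assert not supplier_schema_is_superset(master_68, replica_73)
```

This is why only the old (6.x) master as supplier hits the rejection: its schema is strictly smaller than the 7.3 replica's.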
```
[22/Sep/2016:12:43:32.574302922 +051800] conn=4 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin"
[22/Sep/2016:12:43:32.574384201 +051800] conn=4 op=0 RESULT err=0 tag=120 nentries=0 etime=0
[22/Sep/2016:12:43:32.586336334 +051800] conn=4 TLS1.2 256-bit AES
[22/Sep/2016:12:43:32.586562444 +051800] conn=4 op=1 BIND dn="cn=replication manager,cn=config" method=128 version=3
[22/Sep/2016:12:43:32.587033902 +051800] conn=4 op=1 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=replication manager,cn=config"
...
[22/Sep/2016:12:43:32.589824012 +051800] conn=4 op=4 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[22/Sep/2016:12:43:32.747245763 +051800] conn=4 op=4 RESULT err=0 tag=120 nentries=0 etime=0
[22/Sep/2016:12:43:32.846425733 +051800] conn=4 op=5 SRCH base="cn=schema" scope=0 filter="(objectClass=*)" attrs="nsSchemaCSN"
[22/Sep/2016:12:43:33.096844173 +051800] conn=4 op=5 RESULT err=0 tag=101 nentries=1 etime=0
[22/Sep/2016:12:43:33.228306558 +051800] conn=4 op=6 MOD dn="cn=schema"
[22/Sep/2016:12:43:33.918336785 +051800] conn=4 op=6 RESULT err=53 tag=103 nentries=0 etime=1
[22/Sep/2016:12:43:33.921507058 +051800] conn=4 op=-1 fd=66 closed
```

- In the error log at the same time:

```
[22/Sep/2016:12:43:33.860875397 +051800] schema - [C] Local objectClasses must not be overwritten (set replication log for additional info)
[22/Sep/2016:12:43:33.918889184 +051800] NSMMReplicationPlugin - process_postop: Failed to apply update () error (53). Aborting replication session(conn=4 op=6)
[22/Sep/2016:12:43:33.921988415 +051800] ERROR bulk import abandoned
```

On the server side:

- In the error log, the following messages are present:

```
[22/Sep/2016:12:43:34 +051800] NSMMReplicationPlugin - agmt="...: Schema replication update failed: Server is unwilling to perform
[22/Sep/2016:12:43:34 +051800] NSMMReplicationPlugin - Warning: unable to replicate schema to host .... Continuing with total update session.
[22/Sep/2016:12:43:34 +051800] NSMMReplicationPlugin - Beginning total update of replica "agmt="...."
```

but NOT this one:

```
[22/Sep/2016:12:43:40 +051800] NSMMReplicationPlugin - Finished total update of replica "agmt="...". Sent 274 entries.
```

The root cause is identified and is the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1377730#c3. When it fails, the replica access log shows the same sequence as quoted above for the replica side: the MOD of "cn=schema" returns err=53 and the connection is closed immediately.

If the MOD failure is made non-fatal (e.g. err=19), the total update continues even after err=19:

```
[22/Sep/2016:11:00:11.610214950 -0400] conn=4 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="start_tls_plugin"
[22/Sep/2016:11:00:11.610263490 -0400] conn=4 op=0 RESULT err=0 tag=120 nentries=0 etime=0
[22/Sep/2016:11:00:11.627744863 -0400] conn=4 TLS1.2 256-bit AES
[22/Sep/2016:11:00:11.628268117 -0400] conn=4 op=1 BIND dn="cn=replication manager,cn=config" method=128 version=3
[22/Sep/2016:11:00:11.628666564 -0400] conn=4 op=1 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=replication manager,cn=config"
...
[22/Sep/2016:11:00:12.388909563 -0400] conn=4 op=6 MOD dn="cn=schema"
[22/Sep/2016:11:00:13.015912789 -0400] conn=4 op=6 RESULT err=19 tag=103 nentries=0 etime=1
[22/Sep/2016:11:00:13.670500128 -0400] conn=4 op=7 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[22/Sep/2016:11:00:13.676098009 -0400] conn=4 op=7 RESULT err=0 tag=120 nentries=0 etime=0
[22/Sep/2016:11:00:13.676219673 -0400] conn=4 op=8 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[22/Sep/2016:11:00:13.676332467 -0400] conn=4 op=8 RESULT err=0 tag=120 nentries=0 etime=0
[22/Sep/2016:11:00:13.676443585 -0400] conn=4 op=9 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[22/Sep/2016:11:00:13.676536335 -0400] conn=4 op=9 RESULT err=0 tag=120 nentries=0 etime=0
[22/Sep/2016:11:00:13.676628945 -0400] conn=4 op=10 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[22/Sep/2016:11:00:13.676706609 -0400] conn=4 op=10 RESULT err=0 tag=120 nentries=0 etime=0
...
[22/Sep/2016:11:00:18.233988631 -0400] conn=4 op=282 UNBIND
```

The problem occurs under two conditions:

- The replica is >= DS 1.3.5 (https://fedorahosted.org/389/ticket/47788). That fix triggers closure of the connection when a replicated update fails, for example with err=53.
- The server/supplier pushes a schema even though it is a subset of the replica schema. This should only occur with 6.x.
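The detection steps above (a MOD of "cn=schema" returning err=53, immediately followed by the connection closing) can be sketched as a small access-log scan. This is a rough illustration over the log format shown in this report, not a supported tool:

```python
import re

# Flag connections where a MOD of "cn=schema" returned err=53 and the
# same connection was closed afterwards, matching the 389-ds access-log
# lines quoted in this report.

def find_schema_push_failures(access_log_lines):
    failed_conns = set()   # conns whose cn=schema MOD hit err=53
    pending_mod = {}       # conn -> op of an in-flight MOD dn="cn=schema"
    hits = []
    for line in access_log_lines:
        m = re.search(r'conn=(\d+) op=(-?\d+) (.*)', line)
        if not m:
            continue
        conn, op, rest = m.group(1), m.group(2), m.group(3)
        if rest.startswith('MOD dn="cn=schema"'):
            pending_mod[conn] = op
        elif rest.startswith('RESULT err=53') and pending_mod.get(conn) == op:
            failed_conns.add(conn)
        elif 'closed' in rest and conn in failed_conns:
            hits.append(conn)
    return hits

log = [
    '[22/Sep/2016:12:43:33.228306558 +051800] conn=4 op=6 MOD dn="cn=schema"',
    '[22/Sep/2016:12:43:33.918336785 +051800] conn=4 op=6 RESULT err=53 tag=103 nentries=0 etime=1',
    '[22/Sep/2016:12:43:33.921507058 +051800] conn=4 op=-1 fd=66 closed',
]
assert find_schema_push_failures(log) == ['4']
```

A healthy total init would instead show the MOD succeeding (or the connection surviving), so an empty result here means this particular failure signature is absent.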
In 7.x, the server checks that its schema is a superset before sending it.

Impact of the bug: it can trigger a failure of the total init as well as of incremental updates. It can be transient, because the local 7.x server may "learn" definitions from the received schema; this increases its nsSchemaCSN, so that later replication connections will find the greater nsSchemaCSN and will not attempt to send the schema.

The fix was pushed to upstream master and 1.3.5. Moving this bug to POST.

Setting GSSApproved for RHEL-7.3.z; this is important for successful IdM customer migration from RHEL 6 to RHEL 7.

As per Kaleem's recommendation on IRC, I mark this bugzilla as VERIFIED based on the comment https://bugzilla.redhat.com/show_bug.cgi?id=1378353#c9, which shows that an IPA replica-install on 7.4 works with an IPA master on 6.9.

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2086
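The transient behavior described in the comments above hinges on the nsSchemaCSN comparison: the supplier only attempts to push its schema when the consumer's nsSchemaCSN is older. A toy model (the CSN values and the plain string comparison are simplifications for illustration, not the actual plugin code):

```python
# Toy model of the push decision. 389-ds CSNs are fixed-width hex
# timestamp strings; for equal-length strings, lexicographic order
# matches chronological order, which this sketch relies on.

def supplier_pushes_schema(supplier_csn: str, consumer_csn: str) -> bool:
    """Supplier pushes its schema only if the consumer's CSN is older."""
    return consumer_csn < supplier_csn

supplier = "57e8f2a3000000000000"       # hypothetical supplier nsSchemaCSN
old_consumer = "57e8f100000000000000"   # consumer behind the supplier
assert supplier_pushes_schema(supplier, old_consumer)

# Once the 7.x side "learns" definitions and bumps its nsSchemaCSN past
# the supplier's, later sessions skip the schema push entirely, which is
# why the failure can look intermittent.
learned_consumer = "57e8f3ff000000000000"
assert not supplier_pushes_schema(supplier, learned_consumer)
```

This also explains the SRCH for attrs="nsSchemaCSN" visible in the access logs above: each replication session reads the consumer's CSN before deciding whether to push.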