Description of problem: Full updates initialized by a server of version 389-Directory/1.2.6 B2010.238.2133 via "nsDS5BeginReplicaRefresh: start" result in the following log entries: [26/Sep/2010:22:00:09 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=scripts,dc=mit,dc=edu is going off line; disabling replication [26/Sep/2010:22:00:09 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database [26/Sep/2010:22:00:14 -0400] - sasl_io_start_packet: failed - read only 2 bytes of sasl packet length on connection 2 [26/Sep/2010:22:00:14 -0400] - ERROR bulk import abandoned [26/Sep/2010:22:00:14 -0400] - import userRoot: Aborting all Import threads... [26/Sep/2010:22:00:20 -0400] - import userRoot: Import threads aborted. [26/Sep/2010:22:00:20 -0400] - import userRoot: Closing files... [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/entryrdn.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/uidnumber.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/apacheServerAlias.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/gidnumber.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/parentid.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/nsuniqueid.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/aci.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/apacheServerName.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/uniquemember.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/cn.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/objectclass.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/uid.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - libdb: userRoot/id2entry.db4: unable to flush: No such file or directory [26/Sep/2010:22:00:21 -0400] - import userRoot: Import failed. [26/Sep/2010:22:00:21 -0400] - process_bulk_import_op: NULL backend The replication fails and the entire database is wiped and needs to be reinitialized. Full updates from 1.2.5 B2010.012.2024 work fine. [26/Sep/2010:22:13:25 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=scripts,dc=mit,dc=edu is going offline; disabling replication [26/Sep/2010:22:13:26 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database [26/Sep/2010:22:13:37 -0400] - import userRoot: Workers finished; cleaning up... [26/Sep/2010:22:13:37 -0400] - import userRoot: Workers cleaned up. [26/Sep/2010:22:13:37 -0400] - import userRoot: Indexing complete. Post-processing... [26/Sep/2010:22:13:38 -0400] - import userRoot: Flushing caches... [26/Sep/2010:22:13:38 -0400] - import userRoot: Closing files... [26/Sep/2010:22:13:39 -0400] - import userRoot: Import complete. Processed 14280 entries in 13 seconds. (1098.46 entries/sec) [26/Sep/2010:22:13:39 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=scripts,dc=mit,dc=edu is coming online; enabling replication Version-Release number of selected component (if applicable): 389-Directory/1.2.6 B2010.238.2133 How reproducible: Always (though I haven't tried on a bare-bones setup yet.) Steps to Reproduce: 1. Setup two directory servers as per a multimaster replication setup. Initialize one server with some data. 2. Ask that server to do a full update of the other 3. Observe the logs of the "slave" sever. Actual results: Update fails, database is wiped. Expected results: Update proceeds as usual. Additional info: The source code that emits this error is: /* * NOTE: A better way to do this would be to read the bytes and add them to· * sp->encrypted_buffer - if offset < 4, tell caller we didn't read enough * bytes yet - if offset >= 4, decode the length and proceed. However, it * is highly unlikely that a request to read 4 bytes will return < 4 bytes, * perhaps only in error conditions, in which case the ret < 0 case above * will run */ So an obvious first fix is to implement the real algorithm suggested here, and see if another error shows up.
Created attachment 452769 [details] git patch file (master) Description: A SASL packet is made from the 4 byte length and the length size of payload. When the first 4 bytes were not successfully received by one PR_Recv call, sasl_io_start_packet in sasl_io.c considered an error occurred and set PR_IO_ERROR, which terminates the SASL IO session. To give clients a chance to send the rest of the length in the next packet, this patch sets PR_WOULD_BLOCK_ERROR to the nspr error code and EWOULDBLOCK/EAGAIN to errno and once the succeeding packet comes in, it appends it to the previous incomplete length data and continues the SASL IO. File: ldap/servers/slapd/sasl_io.c
Created attachment 452772 [details] git patch file (master) Description: A SASL packet is made from the 4 byte length and the length size of payload. When the first 4 bytes were not successfully received by one PR_Recv call, sasl_io_start_packet in sasl_io.c considered an error occurred and set PR_IO_ERROR, which terminates the SASL IO session. To give clients a chance to send the rest of the length in the next packet, this patch sets PR_WOULD_BLOCK_ERROR to the nspr error code and EWOULDBLOCK/EAGAIN to errno and once the succeeding packet comes in, it appends it to the previous incomplete length data and continues the SASL IO. File: ldap/servers/slapd/sasl_io.c
Thanks to Rich and Nathan for their comments and reviews. Pushed to master. $ git merge work Updating 6a1c273..310c056 Fast-forward ldap/servers/slapd/sasl_io.c | 101 +++++++++++++++++++++++++---------------- 1 files changed, 61 insertions(+), 40 deletions(-) $ git push Counting objects: 11, done. Delta compression using up to 4 threads. Compressing objects: 100% (6/6), done. Writing objects: 100% (6/6), 1.48 KiB, done. Total 6 (delta 4), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git 6a1c273..310c056 master -> master
Hello all, we verified today that this patch appears to fix our error, and we are now able to do full updates from 1.2.6 to 1.2.6 with GSSAPI. Thanks for the fix, and we look forward to seeing it in Fedora 13!
(In reply to comment #7) > Hello all, we verified today that this patch appears to fix our error, and we > are now able to do full updates from 1.2.6 to 1.2.6 with GSSAPI. Thanks for the > fix, and we look forward to seeing it in Fedora 13! Yes. 389-ds-base-1.2.7.a3 should be out in Updates Testing soon.
Thanks Rich, I am marking this bug as VERIFIED U based on comment#10.