637852 – sasl_io_start_packet: failed - read only 3 bytes of sasl packet length on connection 4

Bug 637852 - sasl_io_start_packet: failed - read only 3 bytes of sasl packet length on connection 4

Summary: sasl_io_start_packet: failed - read only 3 bytes of sasl packet length on con...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	389
Classification:	Retired
Component:	Security - SASL
Sub Component:
Version:	1.2.6
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Noriko Hosoi
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	389_1.2.7 639035
TreeView+	depends on / blocked

Reported:	2010-09-27 15:39 UTC by Edward Z. Yang
Modified:	2015-12-10 18:38 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-12-10 18:38:49 UTC
Embargoed:

Attachments	(Terms of Use)
git patch file (master) (8.28 KB, patch) 2010-10-11 18:19 UTC, Noriko Hosoi	no flags	Details \| Diff
git patch file (master) (8.40 KB, patch) 2010-10-11 18:37 UTC, Noriko Hosoi	nkinder: review+ rmeggins: review+	Details \| Diff
Show Obsolete (1) View All

Description Edward Z. Yang 2010-09-27 15:39:46 UTC

Description of problem:
Full updates initialized by a server of version 389-Directory/1.2.6 B2010.238.2133 via "nsDS5BeginReplicaRefresh: start" result in the following log entries:

[26/Sep/2010:22:00:09 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=scripts,dc=mit,dc=edu is going off
line; disabling replication
[26/Sep/2010:22:00:09 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to
 access the database
[26/Sep/2010:22:00:14 -0400] - sasl_io_start_packet: failed - read only 2 bytes of sasl packet length on connection 2
[26/Sep/2010:22:00:14 -0400] - ERROR bulk import abandoned
[26/Sep/2010:22:00:14 -0400] - import userRoot: Aborting all Import threads...
[26/Sep/2010:22:00:20 -0400] - import userRoot: Import threads aborted.
[26/Sep/2010:22:00:20 -0400] - import userRoot: Closing files...
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/entryrdn.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/uidnumber.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/apacheServerAlias.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/gidnumber.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/parentid.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/nsuniqueid.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/aci.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/apacheServerName.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/uniquemember.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/cn.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/objectclass.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/uid.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - libdb: userRoot/id2entry.db4: unable to flush: No such file or directory
[26/Sep/2010:22:00:21 -0400] - import userRoot: Import failed.
[26/Sep/2010:22:00:21 -0400] - process_bulk_import_op: NULL backend

The replication fails and the entire database is wiped and needs to be reinitialized.

Full updates from 1.2.5 B2010.012.2024 work fine.

[26/Sep/2010:22:13:25 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=scripts,dc=mit,dc=edu is going offline; disabling replication
[26/Sep/2010:22:13:26 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[26/Sep/2010:22:13:37 -0400] - import userRoot: Workers finished; cleaning up...
[26/Sep/2010:22:13:37 -0400] - import userRoot: Workers cleaned up.
[26/Sep/2010:22:13:37 -0400] - import userRoot: Indexing complete.  Post-processing...
[26/Sep/2010:22:13:38 -0400] - import userRoot: Flushing caches...
[26/Sep/2010:22:13:38 -0400] - import userRoot: Closing files...
[26/Sep/2010:22:13:39 -0400] - import userRoot: Import complete.  Processed 14280 entries in 13 seconds. (1098.46 entries/sec)
[26/Sep/2010:22:13:39 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=scripts,dc=mit,dc=edu is coming online; enabling replication

Version-Release number of selected component (if applicable):
389-Directory/1.2.6 B2010.238.2133

How reproducible:
Always (though I haven't tried on a bare-bones setup yet.)

Steps to Reproduce:
1. Setup two directory servers as per a multimaster replication setup. Initialize one server with some data.
2. Ask that server to do a full update of the other
3. Observe the logs of the "slave" sever.
  
Actual results:
Update fails, database is wiped.

Expected results:
Update proceeds as usual.

Additional info:
The source code that emits this error is:

/*
 * NOTE: A better way to do this would be to read the bytes and add them to·
 * sp->encrypted_buffer - if offset < 4, tell caller we didn't read enough
 * bytes yet - if offset >= 4, decode the length and proceed.  However, it
 * is highly unlikely that a request to read 4 bytes will return < 4 bytes,
 * perhaps only in error conditions, in which case the ret < 0 case above
 * will run
 */

So an obvious first fix is to implement the real algorithm suggested here, and see if another error shows up.

Comment 3 Noriko Hosoi 2010-10-11 18:19:21 UTC

Created attachment 452769 [details]
git patch file (master)

Description: A SASL packet is made from the 4 byte length and
the length size of payload.  When the first 4 bytes were not
successfully received by one PR_Recv call, sasl_io_start_packet
in sasl_io.c considered an error occurred and set PR_IO_ERROR,
which terminates the SASL IO session.
To give clients a chance to send the rest of the length in the
next packet, this patch sets PR_WOULD_BLOCK_ERROR to the nspr
error code and EWOULDBLOCK/EAGAIN to errno and once the succeeding
packet comes in, it appends it to the previous incomplete length
data and continues the SASL IO.

File: 
 ldap/servers/slapd/sasl_io.c

Comment 4 Noriko Hosoi 2010-10-11 18:37:59 UTC

Created attachment 452772 [details]
git patch file (master)

Description: A SASL packet is made from the 4 byte length and
the length size of payload.  When the first 4 bytes were not
successfully received by one PR_Recv call, sasl_io_start_packet
in sasl_io.c considered an error occurred and set PR_IO_ERROR,
which terminates the SASL IO session.
To give clients a chance to send the rest of the length in the
next packet, this patch sets PR_WOULD_BLOCK_ERROR to the nspr
error code and EWOULDBLOCK/EAGAIN to errno and once the succeeding
packet comes in, it appends it to the previous incomplete length
data and continues the SASL IO.

File:
 ldap/servers/slapd/sasl_io.c

Comment 6 Noriko Hosoi 2010-10-11 23:41:18 UTC

Thanks to Rich and Nathan for their comments and reviews.

Pushed to master.

$ git merge work
Updating 6a1c273..310c056
Fast-forward
 ldap/servers/slapd/sasl_io.c |  101 +++++++++++++++++++++++++----------------
 1 files changed, 61 insertions(+), 40 deletions(-)

$ git push
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.48 KiB, done.
Total 6 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   6a1c273..310c056  master -> master

Comment 7 Edward Z. Yang 2010-10-27 13:00:36 UTC

Hello all, we verified today that this patch appears to fix our error, and we are now able to do full updates from 1.2.6 to 1.2.6 with GSSAPI. Thanks for the fix, and we look forward to seeing it in Fedora 13!

Comment 8 Rich Megginson 2010-10-27 21:26:46 UTC

(In reply to comment #7)
> Hello all, we verified today that this patch appears to fix our error, and we
> are now able to do full updates from 1.2.6 to 1.2.6 with GSSAPI. Thanks for the
> fix, and we look forward to seeing it in Fedora 13!

Yes.  389-ds-base-1.2.7.a3 should be out in Updates Testing soon.

Comment 11 Amita Sharma 2011-09-29 08:43:46 UTC

Thanks Rich, I am marking this bug as VERIFIED U based on comment#10.

Note You need to log in before you can comment on or make changes to this bug.