Bug 237731
Summary: | Random SASL GSSAPI test failure on shadowfoot | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Retired] 389 | Reporter: | Noriko Hosoi <nhosoi> | ||||||||||
Component: | Security - SASL | Assignee: | Noriko Hosoi <nhosoi> | ||||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Viktor Ashirov <vashirov> | ||||||||||
Severity: | medium | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | 1.0.4 | CC: | nkinder, rmeggins | ||||||||||
Target Milestone: | --- | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | All | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2015-12-07 16:51:53 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Bug Depends On: | |||||||||||||
Bug Blocks: | 152373, 240316, 427409 | ||||||||||||
Attachments: |
|
Description
Noriko Hosoi
2007-04-25 01:22:11 UTC
Created attachment 153394 [details]
cvs diff sasl_io.c and the fix description
(In reply to comment #1) > Created an attachment (id=153394) [edit] > cvs diff sasl_io.c and the fix description It looks ok, but how do we handle this condition in other code? SSL code or even regular code may have the same problem, or they solve it in a different way? (In reply to comment #2) > (In reply to comment #1) > > Created an attachment (id=153394) [edit] [edit] > > cvs diff sasl_io.c and the fix description > > It looks ok, but how do we handle this condition in other code? SSL code or > even regular code may have the same problem, or they solve it in a different way? Regular code does not have this problem. When connection_read_ldap_data returns 0 (returned from PR_Recv), it really means the connection was closed by peer. Not like this SASL case: the connection is okay and a packet was received, but more packet(s) is expected to decrypt the entire message. I'll double-check the SSL code next (but I think the chance that is sharing the problem is very slim...). (In reply to comment #3) > Regular code does not have this problem. When connection_read_ldap_data returns > 0 (returned from PR_Recv), it really means the connection was closed by peer. > Not like this SASL case: the connection is okay and a packet was received, but > more packet(s) is expected to decrypt the entire message. I'll double-check the > SSL code next (but I think the chance that is sharing the problem is very slim...). Ok. Then I think your proposal is fine. I also think your proposal is fine. Created attachment 153451 [details]
cvs commit message
Thank you, Rich and Nathan, for the comments and the discussion.
Checked in into HEAD.
> It failed again, but it's changed the behavior! > [27/Apr/2007:05:14:39 -0700] conn=13 fd=64 slot=64 connection from 10.14.0.25 to 10.14.0.25 > [27/Apr/2007:05:14:39 -0700] conn=13 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI > [27/Apr/2007:05:14:39 -0700] conn=13 op=0 RESULT err=14 tag=97 nentries=0 etime=0, SASL bind in progress > [27/Apr/2007:05:14:39 -0700] conn=13 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI > [27/Apr/2007:05:14:39 -0700] conn=13 op=1 RESULT err=14 tag=97 nentries=0 etime=0, SASL bind in progress > [27/Apr/2007:05:14:39 -0700] conn=13 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI > [27/Apr/2007:05:14:39 -0700] conn=13 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=sasl testentry 1,ou=testpeople,o=sasl.com" > *[27/Apr/2007:05:14:39 -0700] conn=13 op=-1 fd=64 closed error 11 (Resource temporarily unavailable) - unknown* > [27/Apr/2007:05:14:39 -0700] conn=14 fd=64 slot=64 connection from 10.14.0.25 to 10.14.0.25 > [27/Apr/2007:05:14:39 -0700] conn=14 op=0 BIND dn="cn=directory manager" method=128 version=3 > > Now, we have the better sight on this problem. PR_Recv in sasl_recv_connection gets EAGAIN (== errno 11), which is returned to connection_read_operation via connection_read_ldap_data. And the error number is checked in the macro SLAPD_PR_WOULD_BLOCK_ERROR if it's a temporary non-blocking I/O error or not as follows. > # define SLAPD_PR_WOULD_BLOCK_ERROR( prerrno ) \ > ((prerrno) == PR_WOULD_BLOCK_ERROR || (prerrno) == PR_IO_TIMEOUT_ERROR) > This EAGAIN is another candidate to be retried, but for now it's never been done and the connection is closed. > > We have another macro SLAPD_SYSTEM_WOULD_BLOCK_ERROR in slap.h. > /* > * SLAPD_SYSTEM_WOULD_BLOCK_ERROR() returns non-zero if syserrno is an OS > * error code that indicates a temporary non-blocking I/O error, > * e.g., EAGAIN. > */ > #define SLAPD_SYSTEM_WOULD_BLOCK_ERROR( syserrno ) \ > ((syserrno)==EAGAIN || (syserrno)==EWOULDBLOCK) > > Can we check the error number with the macro, as well? And if PR_Poll agrees, can we retry reading from the connection instead of closing it immediately? > Index: connection.c > =================================================================== > RCS file: /cvs/dirsec/ldapserver/ldap/servers/slapd/connection.c,v > retrieving revision 1.14 > diff -t -w -U4 -r1.14 connection.c > --- connection.c 27 Feb 2007 02:57:29 -0000 1.14 > +++ connection.c 27 Apr 2007 17:03:00 -0000 > @@ -1744,9 +1744,10 @@ > return CONN_DONE; > } > /* err = PR_GetError(); */ > /* If we would block, we need to poll for a while */ > - if ( SLAPD_PR_WOULD_BLOCK_ERROR( err ) ) { > + if ( SLAPD_PR_WOULD_BLOCK_ERROR( err ) || > + SLAPD_SYSTEM_WOULD_BLOCK_ERROR( err ) ) { > struct PRPollDesc pr_pd; > PRIntervalTime timeout = PR_MillisecondsToInterval(CONN_TURBO_TIMEOUT_INTERVAL); > pr_pd.fd = (PRFileDesc *)conn->c_prfd; > pr_pd.in_flags = PR_POLL_READ; Ok. Created attachment 153651 [details]
cvs commit message
Reviewed by Rich (Thank you!)
Checked in into HEAD.
Today's acceptance test still shows the error: [13/May/2007:05:14:30 -0700] conn=13 op=-1 fd=65 closed error 11 (Resource temporarily unavailable) - unknown The proposal in the comment #7 was not good enough to catch EAGAIN. I should have checked system error. Index: connection.c =================================================================== RCS file: /cvs/dirsec/ldapserver/ldap/servers/slapd/connection.c,v retrieving revision 1.15 diff -t -w -U4 -r1.15 connection.c --- connection.c 27 Apr 2007 18:00:21 -0000 1.15 +++ connection.c 14 May 2007 21:09:32 -0000 @@ -1744,10 +1744,11 @@ return CONN_DONE; } /* err = PR_GetError(); */ /* If we would block, we need to poll for a while */ + syserr = PR_GetOSError(); if ( SLAPD_PR_WOULD_BLOCK_ERROR( err ) || - SLAPD_SYSTEM_WOULD_BLOCK_ERROR( err ) ) { + SLAPD_SYSTEM_WOULD_BLOCK_ERROR( syserr ) ) { struct PRPollDesc pr_pd; PRIntervalTime timeout = PR_MillisecondsToInterval(CONN_TURBO_TIMEOUT_INTERVAL); pr_pd.fd = (PRFileDesc *)conn->c_prfd; pr_pd.in_flags = PR_POLL_READ; (In reply to comment #9) > Index: connection.c > =================================================================== > RCS file: /cvs/dirsec/ldapserver/ldap/servers/slapd/connection.c,v > retrieving revision 1.15 > diff -t -w -U4 -r1.15 connection.c > --- connection.c 27 Apr 2007 18:00:21 -0000 1.15 > +++ connection.c 14 May 2007 21:09:32 -0000 > @@ -1744,10 +1744,11 @@ > return CONN_DONE; > } > /* err = PR_GetError(); */ > /* If we would block, we need to poll for a while */ > + syserr = PR_GetOSError(); > if ( SLAPD_PR_WOULD_BLOCK_ERROR( err ) || > - SLAPD_SYSTEM_WOULD_BLOCK_ERROR( err ) ) { > + SLAPD_SYSTEM_WOULD_BLOCK_ERROR( syserr ) ) { > struct PRPollDesc pr_pd; > PRIntervalTime timeout = > PR_MillisecondsToInterval(CONN_TURBO_TIMEOUT_INTERVAL); > pr_pd.fd = (PRFileDesc *)conn->c_prfd; > pr_pd.in_flags = PR_POLL_READ; > Ok. Created attachment 154692 [details]
cvs commit message
Reviewed by Rich (Thank you!)
Checked in into HEAD.
|