Bug 187494

Summary: CVE-2006-2275 SCTP traffic probably never resumes
Product: Red Hat Enterprise Linux 4 Reporter: Issue Tracker <tao>
Component: kernel Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: holtmann, jbaron, security-response-team, tao
Target Milestone: --- Keywords: Security
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: impact=moderate,source=vendorsec,reported=20060509,public=20060509
Fixed In Version: RHSA-2006-0575 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-10 22:59:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 181409    
Attachments:
Description                                   Flags
patch to correct deadlock                     none
new patch that solves the deadlock issue      none
Version of patch that was accepted upstream   none

Description Issue Tracker 2006-03-31 12:23:34 UTC
Escalated to Bugzilla from IssueTracker

Comment 3 Neil Horman 2006-03-31 21:56:58 UTC
I think I'm beginning to understand what's going on here.  It appears that we
have two problems:

1) Even though we are discarding frames due to lack of buffer space on the
receive side, we continue to ack them, constantly reopening our receive window.  

2) When we fill our receive window, and then receive a frame, we immediately
fill it up again with the next packet that arrives, which we then discard, as
the chunks are bundled, making for a bigger packet that we must unilaterally
accept or deny.  This leads to a constant lack of reception of frames, which
eventually leads to the sender giving up and sending an abort message.

I think what I need to do here is write a patch that:
1) doesn't SACK frames that are discarded (i.e. don't add a GEN_SACK command in
sctp_eat_data_* if the return from sctp_eat_data is IGNORE_TSN).

2) provides hysteresis on receive buffer accounting that only processes received
frames when there is enough open space in the receive buffer to handle as much
data as has previously been dropped (to prevent the constant fill problem).
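
The two changes proposed above can be illustrated with a toy model. This is
only a sketch in Python, not the kernel patch: the class `Receiver`, its
fields, and the byte counts are hypothetical, and "as much data as has
previously been dropped" is interpreted here as the size of the largest packet
dropped since the last accepted one. It shows the approach described in this
comment, which is not necessarily what was merged upstream.

```python
class Receiver:
    """Toy receive-side model; every name here is hypothetical."""

    def __init__(self, rcvbuf):
        self.rcvbuf = rcvbuf   # total receive buffer, in bytes
        self.used = 0          # bytes queued and not yet read by the application
        self.dropped = 0       # largest packet dropped since the last accept

    def free_space(self):
        return self.rcvbuf - self.used

    def receive(self, size):
        """Return True if the packet is accepted (and may be SACKed)."""
        # Hysteresis: after a drop, require room for at least as much
        # data as was dropped before accepting anything again.
        needed = max(size, self.dropped)
        if self.free_space() < needed:
            self.dropped = max(self.dropped, size)
            return False       # discarded, so no SACK is generated
        self.used += size
        self.dropped = 0
        return True

    def app_read(self, nbytes):
        """The application drains up to nbytes from the queue."""
        self.used -= min(nbytes, self.used)


if __name__ == "__main__":
    r = Receiver(rcvbuf=1000)
    print(r.receive(800))   # fits in the empty buffer
    print(r.receive(500))   # does not fit: dropped, not SACKed
    r.app_read(100)
    print(r.receive(100))   # would fit, but held back by the hysteresis
    r.app_read(200)
    print(r.receive(500))   # enough room has opened up again
```

Because a discarded packet is never acknowledged, the sender keeps it queued
for retransmission instead of treating the window as reopened.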

Comment 4 Neil Horman 2006-04-04 21:01:11 UTC
Created attachment 127313 [details]
patch to correct deadlock

This is the first pass at the patch that I am proposing upstream.  It still
needs some cleanup, but it's functional, and solves the problem.

Comment 5 Neil Horman 2006-04-07 12:10:07 UTC
FYI, upstream had some disagreements with the patch and I am currently
reworking it.  Also, I've been meaning to mention this:  A good deal of the
delay you may be seeing with this issue comes from the fact that heartbeat
messages are on a 30 second timer; reducing this value may help your
association recover more quickly.
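
For reference, current Linux kernels expose the default heartbeat interval
through the `net.sctp.hb_interval` sysctl, documented in milliseconds. The
sketch below only reads that value. The helper name `read_hb_interval` is
hypothetical, the file exists only when the SCTP module is loaded, and whether
the RHEL4 kernel uses the same unit is an assumption that should be checked on
the system in question.

```python
from pathlib import Path

# Assumed sysctl location; present only when the sctp module is loaded.
HB_INTERVAL = Path("/proc/sys/net/sctp/hb_interval")


def read_hb_interval(path=HB_INTERVAL):
    """Return the configured heartbeat interval, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("hb_interval:", read_hb_interval())
```

The interval can also be set per association from the application through the
`SCTP_PEER_ADDR_PARAMS` socket option (`spp_hbinterval`), which avoids changing
the system-wide default.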


Comment 6 Neil Horman 2006-04-07 20:43:40 UTC
Created attachment 127482 [details]
new patch that solves the deadlock issue

This is the new version of the patch that I used to solve the deadlock issue.
It's much smaller and cleaner, and I'm currently proposing it upstream.

Comment 7 Neil Horman 2006-04-11 19:35:31 UTC
Created attachment 127633 [details]
Version of patch that was accepted upstream

This is the patch that is being accepted upstream, and it is what I will be
backporting to RHEL4.

Comment 9 Jason Baron 2006-04-25 10:40:33 UTC
Committed in stream U4 build 34.23. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 12 Neil Horman 2006-05-10 15:14:13 UTC
*** Bug 191259 has been marked as a duplicate of this bug. ***

Comment 13 Marcel Holtmann 2006-05-10 15:24:27 UTC
The upstream fix is different from the proposed fix:

http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7c3ceb4fb9667f34f1599a062efecf4cdc4a4ce5


Comment 14 Marcel Holtmann 2006-05-10 15:26:41 UTC
This has been assigned CVE-2006-2275.


Comment 18 Neil Horman 2006-05-22 14:46:34 UTC
So I just tried the reproducer they provided, and this definitely isn't a
regression.  In fact, this isn't really a bug at all; it's working as
designed.  When the receiver is reniced to +19, the receive queue slowly backs up
(as one would expect, since the scheduler doesn't run the receiver as often), to
the point where frames are occasionally dropped and retransmits are required.
So yes, traffic slows down, but it has to, because traffic isn't being consumed
as fast at the receiver.  But it's definitely not deadlocking, which is what this
bug was opened to fix.  I'm removing the regression keyword.

Comment 20 Mike Gahagan 2006-07-14 21:27:32 UTC
Patch is in -42, setting verified.


Comment 22 Red Hat Bugzilla 2006-08-10 22:59:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html