Red Hat Bugzilla – Bug 187494
CVE-2006-2275 SCTP traffic probably never resumes
Last modified: 2007-11-30 17:07:24 EST
Escalated to Bugzilla from IssueTracker
I think I'm starting to understand what's going on here. It appears that we have
two problems:
1) Even though we are discarding frames due to lack of buffer space on the
receive side, we continue to ack them, constantly reopening our receive window.
2) When we fill our receive window, and then receive a frame, we immediately
fill it up again with the next packet that arrives, which we then discard, as
the chunks are bundled, making for a bigger packet that we must unilaterally
accept or deny. This leads to a constant lack of reception of frames, which
eventually leads to the sender giving up and sending an abort message.
I think what I need to do here is write a patch that:
1) doesn't SACK frames that are discarded (i.e. don't add a GEN_SACK command in
sctp_eat_data_* if the return from sctp_eat_data is IGNORE_TSN).
2) provides hysteresis on receive buffer accounting that only processes received
frames when there is enough open space in the receive buffer to handle as much
data as has previously been dropped (to prevent the constant fill problem).
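As a rough illustration of the two points above (not the attached patch, and
not the kernel's real data structures), the accounting could be sketched in
userspace C as follows. The names rx_buf, rx_on_data, rx_on_read and the
rx_verdict values are hypothetical and exist only for this sketch; capping the
dropped counter at the buffer capacity is also an assumption, added so the
threshold can always be reached again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive-side accounting; not the kernel's real structures. */
struct rx_buf {
    size_t capacity; /* total receive buffer size in bytes */
    size_t used;     /* bytes queued and not yet read by the application */
    size_t dropped;  /* bytes discarded since the last accepted packet */
};

enum rx_verdict {
    RX_ACCEPT_AND_SACK, /* queue the data and generate a SACK */
    RX_IGNORE_NO_SACK   /* discard the data and do not SACK it */
};

/* Decide what to do with an incoming (possibly bundled) packet of len bytes. */
static enum rx_verdict rx_on_data(struct rx_buf *b, size_t len)
{
    assert(b->used <= b->capacity);

    size_t free_space = b->capacity - b->used;
    /* Hysteresis: after drops, require room for at least as much data as
     * has been dropped, not just for this one packet. */
    size_t threshold = b->dropped > len ? b->dropped : len;

    if (free_space < threshold) {
        b->dropped += len;
        if (b->dropped > b->capacity)
            b->dropped = b->capacity; /* assumed cap, keeps recovery possible */
        return RX_IGNORE_NO_SACK;     /* point 1: discarded data is not acked */
    }

    b->used += len;
    b->dropped = 0;
    return RX_ACCEPT_AND_SACK;
}

/* The application consumed len bytes from the receive queue. */
static void rx_on_read(struct rx_buf *b, size_t len)
{
    b->used -= len > b->used ? b->used : len;
}
```

With a 1000-byte buffer, two 400-byte packets are accepted and a third is
ignored without a SACK. A later 100-byte packet is still ignored while only 350
bytes are free, and is accepted once the application has read enough to free at
least as much space as was dropped.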
Created attachment 127313 [details]
patch to correct deadlock
This is the first pass at the patch that I am proposing upstream. It still
needs some cleanup, but it's functional, and solves the problem.
FYI, upstream had some disagreements with the patch and I am currently
reworking it. Also, I've been meaning to mention this: a good deal of the delay
you may be seeing with this issue comes from the fact that heartbeat messages
are on a 30 second timer. Reducing this value may help your association recover
more quickly.
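For reference, a minimal sketch of lowering the heartbeat interval from C. It
assumes the kernel exposes the interval in milliseconds through the
net.sctp.hb_interval sysctl (/proc/sys/net/sctp/hb_interval); whether that file
is present depends on the kernel in use and on the sctp module being loaded.
The helper name set_hb_interval is hypothetical, and the 5000 ms value in the
usage note is only an example, not a recommendation.

```c
#include <stdio.h>

/* Write a heartbeat interval in milliseconds to the given sysctl file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int set_hb_interval(const char *path, unsigned int ms)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%u\n", ms) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Run as root, set_hb_interval("/proc/sys/net/sctp/hb_interval", 5000) would
request a 5 second heartbeat interval.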
Created attachment 127482 [details]
new patch that solves the deadlock issue
This is the new version of the patch that I used to solve the deadlock issue.
It's much smaller and cleaner, and I'm currently proposing it upstream.
Created attachment 127633 [details]
Version of patch that was accepted upstream
This is the patch that is getting acceptance upstream, and what I will be
backporting to RHEL4.
committed in stream U4 build 34.23. A test kernel with this patch is available
*** Bug 191259 has been marked as a duplicate of this bug. ***
The upstream fix is different from the proposed fix:
This has been assigned CVE-2006-2275.
So I just tried the reproducer they provided, and this definitely isn't a
regression. In fact, this isn't really a bug at all; it's working as
designed. When the receiver is reniced to +19, the receive queue slowly backs up
(as one would expect, since the scheduler doesn't run the receiver as often), to
the point where frames are occasionally dropped and retransmits are required.
So yes, traffic slows down, but it has to, because traffic isn't being consumed
as fast at the receiver. But it's definitely not deadlocking, which is what this
bug was opened to fix. I'm removing the regression keyword.
Patch is in -42, setting verified.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.