Description of problem:
RST is received, but sockets are hung until timeout.
Version-Release number of selected component (if applicable):
- RHEL3 Update1 (BEA certified level).
- Proliant DL380
- WebLogic 8.1 (jrockit)
- There is a load balancer in front of this
Sooner or later there are so many of these hung sockets that the
service is blocked. Happens randomly.
Steps to Reproduce:
1. Wait and wait
JRockit runs out of free threads, as they are stuck 'sending until timeout'
An RST should kill the connection/socket
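One way to see how many sockets are stuck like this is to count them by TCP state. The following is only a rough sketch, assuming the usual Linux /proc/net/tcp column layout (state in the fourth column, tx_queue:rx_queue in the fifth, both in hex). The function name summarize is made up for illustration and is not part of any tool mentioned in this report.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def summarize(proc_net_tcp_text):
    """Return (sockets per state, ESTABLISHED sockets with unsent data)."""
    states = Counter()
    stuck_sending = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated rows
        state = TCP_STATES.get(fields[3].upper(), fields[3])
        states[state] += 1
        # Fifth column is "tx_queue:rx_queue", both hex byte counts.
        tx_queue = int(fields[4].split(":")[0], 16)
        if state == "ESTABLISHED" and tx_queue > 0:
            stuck_sending += 1
    return states, stuck_sending
```

On the affected machine this could be fed with the contents of /proc/net/tcp, for example summarize(open("/proc/net/tcp").read()). A count of ESTABLISHED sockets with a non-empty send queue that keeps growing would be consistent with threads that are 'sending until timeout', though it does not prove the cause.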
Created attachment 101632
The captured tcpdump data + strace info
Either the checksum or something else about the RESET packet
makes it unacceptable. That is why there are still ACKs
coming back from the machine.
Something, either aaa.bbb.ccc.ddd or some machine in between
(most likely the latter; load balancing and firewall boxes are
notorious for corrupting TCP packets), is messing with the contents.
I don't think anything shown so far indicates that the RHEL3
machine is doing anything wrong.
Another thing of note in the traces is that as 'machine' is
still sending ACKs back, aaa.bbb.ccc.ddd is not sending an
RST packet back in response when it very well should.
Something is definitely amiss on the path from aaa.bbb.ccc.ddd
to the RHEL3 box.
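To check the checksum theory against the captured RST packet, the TCP checksum can be recomputed by hand. The sketch below assumes IPv4 and uses the standard Internet checksum (RFC 1071) over the TCP pseudo-header plus the segment. The function names are made up for illustration, and the segment bytes would have to be extracted from the tcpdump capture separately.

```python
import socket
import struct


def internet_checksum(data):
    """RFC 1071 checksum: one's complement of the one's-complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip, dst_ip, segment):
    """True if the TCP segment (header + payload, checksum field included) verifies."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                    # reserved zero byte
        socket.IPPROTO_TCP,   # protocol number 6
        len(segment),         # TCP length: header plus payload
    )
    # A valid segment sums to 0xFFFF, so the complemented result is 0.
    return internet_checksum(pseudo_header + segment) == 0
```

If tcp_checksum_ok returns False for the RST as captured on the RHEL3 box, that would support the idea that something on the path altered the packet. Note that a capture taken on the sending host can show wrong checksums when checksum offloading is in use, so the receiving side is the better place to check.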
OK. Customer and BEA asked me to make this bugzilla entry.
Then it must be some other equipment (the firewall or the load
balancer). I have been kind of stupid. I have watched tools like nmap
send ACKs to a host to probe it and seen RSTs coming back. It's so
obvious now that I have read David's comment above.
I have to bug other people with this. Lowering the value for
will help those sockets time out sooner, but there is still a risk that
WebLogic runs out of threads.
I am changing this entry to NOTABUG myself.