205860 – yum does not detect/close TCP sockets in CLOSE_WAIT

Summary: yum does not detect/close TCP sockets in CLOSE_WAIT

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	python-urlgrabber
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	James Antill
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-09-09 01:51 UTC by James Ralston
Modified:	2014-01-21 22:55 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-07-14 17:00:21 UTC
Type:	---
Embargoed:
Dependent Products:

Description James Ralston 2006-09-09 01:51:31 UTC

Yum does not promptly close the TCP sockets it uses to contact various
repository servers.  This can be observed by waiting until yum has downloaded
all updates and is applying them, and then using lsof to examine yum's open file
descriptors.  Sockets to the repository servers yum utilized will still be in
the ESTABLISHED state.

Furthermore, if the remote server initiates a close of the TCP socket (S: FIN;
C: ACK), yum doesn't realize that the socket has new data waiting to be read
(the EOF).  On the yum host, the socket stays in CLOSE_WAIT; on the server, the
socket stays in FIN_WAIT_2.  This state persists until yum has completed all
operations, at which point it closes all sockets before it exits.  This can be
observed using lsof as per the previous paragraph.

This is nasty for two reasons:

1.  Yum shouldn't be holding sockets open to repo servers any longer than it
actually needs them.  Most repository servers are busy enough as it is; they
don't need additional resources needlessly consumed because yum isn't promptly
closing sockets.

2.  If the updates take any non-trivial amount of time to install, there is a
high probability that a stateful firewall somewhere between the yum host and
the repo server will have timed out the connection, so the FIN packet from
the client (the third packet in the FIN/ACK/FIN/ACK sequence which closes a TCP
socket) will never reach the repo server, leaving a half-closed socket on both
the server and client (as the server waits for the final FIN and the client
waits for the final ACK).

Most operating systems nowadays will clean up these sockets eventually, but
again, they shouldn't occur in the first place.  Yum should promptly close
sockets that it is no longer using, and also promptly notice whenever the server
closes its end of the TCP socket.

Comment 1 Seth Vidal 2006-09-09 10:44:43 UTC

this is intentional. it is called 'keepalive' and it is also configurable

set keepalive=0 in your yum.conf under [main] and see if it goes away.

it's listed in the man page.

Comment 2 James Ralston 2007-05-02 16:53:17 UTC

While enabling the keepalive option is necessary for this bug to occur, yum's
mismanagement of TCP sockets has nothing to do with the keepalive option.

As per RFC793, a TCP connection progresses through a series of states during its
lifetime.  A TCP session is established via a three-way handshake: the client
sends a SYN packet to the server, the server responds with a SYN+ACK packet, and
the client responds with an ACK packet.  After the three-way handshake occurs,
the TCP connection is in the ESTABLISHED state on both the client and the server.

When yum is using the keepalive option, after yum establishes a TCP connection
to the web server, yum asks for the web server to enable the keepalive option
when it issues its request.  If the web server supports keepalive, after the web
server sends back its response, it will honor yum's request by keeping the TCP
connection open, instead of closing it immediately.
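
As an illustration of the keepalive behavior described above, here is a minimal
Python sketch that issues two requests over one TCP connection.  It is not
urlgrabber's code: the handler class, the function name, and the throwaway
local server are all assumptions made for the example.

```python
import http.client
import http.server
import threading


class QuietHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local test server that honors keepalive requests."""

    protocol_version = "HTTP/1.1"  # persistent connections need HTTP/1.1

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example silent


def fetch_twice_on_one_connection():
    """Issue two GETs and return the client's local port for each one."""
    server = http.server.HTTPServer(("127.0.0.1", 0), QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    ports = []
    try:
        for path in ("/a", "/b"):
            # Ask the server to keep the connection open after responding.
            conn.request("GET", path, headers={"Connection": "keep-alive"})
            conn.getresponse().read()
            # The same local port both times means the socket was reused.
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports
```

If the server honors the request, both entries in the returned list are the
same port, which shows that the second request reused the connection.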

So, at this point, yum has a TCP connection to the web server in the ESTABLISHED
state.  If yum needs to issue another request to the web server, it can do so
using the already-established TCP connection, instead of making a new TCP
connection.  Yum is free to do other things while the ESTABLISHED connection
sits around; TCP connections can sit in the ESTABLISHED state indefinitely. 
Everything is fine with this picture so far.

If yum finishes its other business and wants to exit, it will close any
ESTABLISHED connections to web servers before doing so.  This is the proper
thing to do.  Everything is still fine.

However, web servers only honor the keepalive option for a limited time.  If the
web server doesn't see any additional requests from the client in a certain
amount of time (30 seconds is a common value), the web server will close the TCP
connection.
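
The server side of that timeout could look roughly like the following Python
sketch.  It is an assumption-laden illustration, not the code of any real web
server: serve_until_idle is a hypothetical name, and the "request handling" is
just an echo.

```python
import socket


def serve_until_idle(conn, idle_seconds):
    """Echo data on conn until the client is idle too long, then close.

    Returns the number of chunks handled before the connection was closed.
    """
    conn.settimeout(idle_seconds)
    handled = 0
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # no request within idle_seconds: give up on keepalive
            if not data:
                break  # the client closed its end first
            conn.sendall(data)
            handled += 1
    finally:
        # close() is what makes the server's TCP/IP stack send the FIN.
        conn.close()
    return handled
```

A real server would pass something like 30 for idle_seconds; the point is only
that the close is initiated by the server, whether or not the client is
listening for it.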

At a programming level, in order to close the TCP connection, the web server
will call the close() (or equivalent) call.  As per RFC793, the TCP/IP stack on
the web server will send a FIN packet to the client.  The TCP/IP stack of
the client will automatically reply with an ACK packet, and the client's
operating system will note that there is new incoming data (an EOF indicator) to
be read on the TCP connection.

At this point, on the web server, the TCP connection is in the FIN_WAIT2 state.
On the client, the TCP connection is in the CLOSE_WAIT state.  (See the diagram
on page 23 of RFC793.)  This is commonly referred to as a "half-closed" TCP
connection (although RFC793 does not use that terminology).

However, in order for the server to free the resources being consumed by the TCP
connection (called the TCB in RFC793), it needs the client to close its end of
the TCP connection.  The following steps need to occur:

1.  The client detects that the TCP connection to the server has new data to be
read.

2.  The client attempts to read the new data and receives an EOF indicator.

3.  The client, realizing that the server is attempting to close the TCP
connection, will call close() (or equivalent) on the TCP connection.

4.  The TCP/IP stack on the client will send a FIN packet to the server and move
to the LAST-ACK state.  The TCP/IP stack of the server, upon receiving the FIN,
replies with an ACK and moves to the TIME_WAIT state.  The TCP/IP stack of the
client, upon receiving the ACK, tears down the TCP connection and frees all
associated resources.  On the server, after the timeout for the TIME_WAIT state
expires, the TCP/IP stack tears down the TCP connection and frees all associated
resources.
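
Steps 1 through 3 above can be sketched in Python roughly as follows.  This is
a minimal sketch, not what urlgrabber actually does: peer_has_closed and
reap_closed_connections are hypothetical names, and the sketch assumes the
caller keeps a list of idle keepalive sockets that it can check between other
work.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed and no unread data remains."""
    # Step 1: poll with a zero timeout, so checking an idle socket never blocks.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False
    try:
        # Step 2: peek rather than read, so real response bytes are not
        # consumed.  An empty result is the EOF indicator.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        # A reset or similar error also means the connection is unusable.
        return True


def reap_closed_connections(idle_socks):
    """Close idle sockets whose peer has closed; return the ones still open."""
    alive = []
    for sock in idle_socks:
        if peer_has_closed(sock):
            # Step 3: close() lets the local TCP/IP stack send its FIN (step 4).
            sock.close()
        else:
            alive.append(sock)
    return alive
```

Calling reap_closed_connections periodically, or just before reusing a cached
connection, would be enough to move a socket out of CLOSE_WAIT promptly.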

*This* is where yum loses: yum never bothers to monitor any of its ESTABLISHED
(via keepalive) connections to detect when the (respective) servers are
attempting to close them.  Yum *must not* assume that it can let those
connections sit around for minutes or hours or days without bothering to check
them; yum *must* promptly notice new incoming data on TCP sockets and react
appropriately (detect the EOF and close its end of the connection).  If yum does
not do this, yum is essentially executing a denial of service attack against the
web server.

It's trivial to prove that yum is doing the wrong thing: run "yum update" and
ensure that at least one update is pending.  Wait until yum gets to the "Is this
ok [y/N]:" prompt, and then run lsof on the yum process:

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:08-0400
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(ESTABLISHED)

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:12-04
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(ESTABLISHED)

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:15-04
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(CLOSE_WAIT)

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:18-04
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(CLOSE_WAIT)

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:21-04
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(CLOSE_WAIT)

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:23-04
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(CLOSE_WAIT)

$ date --iso-8601=seconds; lsof -p 5716 | grep TCP
2007-05-02T12:35:26-04
yum 5716 root 6u IPv4 171185 TCP example.com:60773->ie.freshrpms.net:http
(CLOSE_WAIT)

The connection will sit in CLOSE_WAIT on the client for as long as yum keeps
running (and in FIN_WAIT2 on the server until the server's TCP/IP stack gives
up on it), because yum isn't paying the slightest attention to whether any
of its established TCP connections have pending data to be read.
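
For anyone who wants to script the observation instead of re-running lsof by
hand, here is a hedged Python sketch that reads the Linux-specific
/proc/net/tcp table.  The function names are hypothetical, and unlike
"lsof -p" this lists IPv4 sockets system-wide rather than for one process.

```python
# State codes used by the Linux kernel in /proc/net/tcp (hexadecimal).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def parse_proc_net_tcp(text):
    """Return (local_port, remote_port, state) tuples from /proc/net/tcp text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; skip the header.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        # Addresses look like HEXIP:HEXPORT; only the port is decoded here.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        state = TCP_STATES.get(fields[3], fields[3])
        rows.append((local_port, remote_port, state))
    return rows


def close_wait_ports(path="/proc/net/tcp"):
    """List local ports of IPv4 TCP sockets currently in CLOSE_WAIT."""
    with open(path) as handle:
        rows = parse_proc_net_tcp(handle.read())
    return [local for local, _, state in rows if state == "CLOSE_WAIT"]
```

Running close_wait_ports() in a loop while yum sits at its prompt should show
the client port (60773 in the transcript above) appear and stay in the list.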

This isn't an uncommon error in programs that deal with non-transient TCP
connections: at first blush, TCP connections look synchronous, but in fact they
are asynchronous beasts, and programs which use them must be prepared to handle
asynchronous events.  Both the Netscape Communicator and Mozilla browsers
suffered from the same problem.  After repeated bludgeoning, however, Mozilla
finally cleaned up its act:

https://bugzilla.mozilla.org/show_bug.cgi?id=104138
https://bugzilla.mozilla.org/show_bug.cgi?id=97833

Yum needs to do the same.

Comment 3 Seth Vidal 2007-05-02 17:00:53 UTC

just in case it wasn't obvious - yum doesn't do any of this directly.
reassigning to urlgrabber, but I'm betting it is below that level.

Comment 4 Bug Zapper 2008-05-14 02:20:31 UTC

Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 Bug Zapper 2009-06-09 22:16:55 UTC

This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2009-07-14 17:00:21 UTC

Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
