Bug 357361 - slow package download (4/minute) with many tcp CLOSE_WAIT
Summary: slow package download (4/minute) with many tcp CLOSE_WAIT
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: pungi
Version: rawhide
Hardware: All
OS: Linux
low
low
Target Milestone: ---
Assignee: David Cantrell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-10-29 21:26 UTC by John Reiser
Modified: 2013-01-10 01:42 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-16 12:37:08 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
netstat -a (9.63 KB, text/plain)
2007-10-29 21:27 UTC, John Reiser
no flags Details
Fedora8/work/i386/yumcache/rawhide/mirrorlist.txt (1.02 KB, text/plain)
2007-10-29 21:28 UTC, John Reiser
no flags Details
strace of pungi.gather on one already-cached file (16.09 KB, text/plain)
2007-10-30 04:16 UTC, John Reiser
no flags Details

Description John Reiser 2007-10-29 21:26:50 UTC
Description of problem: Running pungi.gather against a mostly-good cache, I get 
"Downloading <package>" once per 15 seconds; I expect once per 1 second.  There
are 17 lines in work/i386/yumcache/rawhide/mirrorlist.txt.  The network is
mostly idle, but capable of 150KB/sec (and tested concurrently, and the
occasional package that actually does get downloaded does see almost that
speed.)  "netstat -a" shows 103 connections in tcp CLOSE_WAIT, all of them to
mirrors.  Why are things so slow?

Version-Release number of selected component (if applicable):
pungi-1.1.8-1.fc8
kernel-2.6.23.1-37.fc8


How reproducible: always


Steps to Reproduce:
1.pungi -c /usr/share/pungi/f8-fedora.ks --destdir=/data/Fedora8 --name Fedora
--ver 8 --discs=3
2.
3.
  
Actual results: one package per 15 seconds (if already in cache)


Expected results: one package per 1 seconds (if already in cache)


Additional info: cache is from Sat.Oct.27, before most [many?] rpms were signed;
but only about 10% of packages are actually re-downloaded.

Comment 1 John Reiser 2007-10-29 21:27:56 UTC
Created attachment 242421 [details]
netstat -a

note many connections in tcp CLOSE_WAIT with non-zero Recv-Q

Comment 2 John Reiser 2007-10-29 21:28:47 UTC
Created attachment 242441 [details]
Fedora8/work/i386/yumcache/rawhide/mirrorlist.txt

mirror list with 16 sites listed

Comment 3 Jesse Keating 2007-10-29 21:43:32 UTC
Are you looking at the cli output or the logfile itself?  The logfile will list
the files we skipped over because they were complete in the cache, while the cli
will just say the ones we're actually downloading.

Other than that, all that code is within yum and urllib, not pungi.

Comment 4 John Reiser 2007-10-30 04:15:45 UTC
Attached will be an example strace which shows the 16-second elapsed time for
each already-cached file.  The net effect is a hardlink from a file in
/var/cache/pungi to the proper place in --destdir, but it takes 16 seconds. 
Some code thought that the cached copy was not good enough, tried to update it,
nothing changed, and the orginal cached copy got used anyway.

Background: Gathering all .rpm finally finished (17 hours over 2 days), then
pungi started to gather the .srpm, couldn't find some particular .srpm ("mo more
mirrors"), and aborted.  So I restarted pungi with the same commandline, but
this time pungi aborts soon after downloading the "indexes" because some file
already exists.  pungi does not tell the name of the culprit that already
exists.  So, I "rm -rf <destdir>", and re-issue the same pungi command again. 
Both the console (stderr, I guess) and the  logs/i386.log file say "Downloading
<package>", there is a 15 second pause during which there is some very small
network activity (only a very small fraction of what would be necessary to
download the entire package), then on about half the packages the i386.log file
says ".rpm already exists and appears to be complete".

The starring role in this excerpt of strace is played by:
-----
$ ls -l /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
-rw-r--r-- 2 root root 11617 2007-09-05 15:32
/var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
-----
[Notice the date is 7 weeks ago.]
Near the beginning of the strace, 3 F_SETLK64 calls are done on the same fd 8,
at offsets 0x40000000, which makes no sense.  Then the .rpm is found in the
/var/cache/pungi, and pungi reads the whole file.  Then 3 more F_SETLK64 are
done, again at offsets 0x40000000, and finally pungi decides to ftp the file
from ftp.cse.buffalo.edu.  Pungi also decides to resume from offset 11617, and
gets no more data.  At last, a hardlink is created from /var/cache/pungi to the
proper place in --destdir.  That took from 19:52:45 to 19:53:01, which is 16
seconds.

Why was pungi not satisfied at first, yet finally happy to use the unmodified
original cached copy?  Could the be one or more logic bugs, such as comparing
dates or sizes using less-than instead of less-than-or-equal?  Or if the file
was indeed not the right one, then why was pungi satisfied in the end?





Comment 5 John Reiser 2007-10-30 04:16:45 UTC
Created attachment 242701 [details]
strace of pungi.gather on one already-cached file

Comment 6 John Reiser 2007-10-30 04:40:12 UTC
The answer is that the .rpms are different!

$ md5sum /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
53f36fa53f0436facc58a0e9730f75f2    ## the [stale] cached copy
/var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
$ md5sum $HOME/Desktop/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
5024e662e6e651dfd8811ed67d5b73f9    ## from a current mirror (27-Oct-2007 09:49)
             11617/home/jreiser/Desktop/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm

So pungi is letting stale cached rpms pass (at least if they have the same
length, and otherwise look OK?)

Comment 7 Jesse Keating 2007-11-15 19:45:35 UTC
Are you still seeing this problem?  I made a change late in how pungi validates
the package in the cache, making it use yum to validate it.  Any downloading or
validation bugs are now yum/urllib issues.

Comment 8 John Reiser 2007-11-16 05:13:58 UTC
No, I am not seeing this problem any more.


Note You need to log in before you can comment on or make changes to this bug.