Bug 357361 - slow package download (4/minute) with many tcp CLOSE_WAIT
slow package download (4/minute) with many tcp CLOSE_WAIT
Product: Fedora
Classification: Fedora
Component: pungi (Show other bugs)
All Linux
low Severity low
: ---
: ---
Assigned To: David Cantrell
Fedora Extras Quality Assurance
Depends On:
  Show dependency treegraph
Reported: 2007-10-29 17:26 EDT by John Reiser
Modified: 2013-01-09 20:42 EST (History)
0 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2007-11-16 07:37:08 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
netstat -a (9.63 KB, text/plain)
2007-10-29 17:27 EDT, John Reiser
no flags Details
Fedora8/work/i386/yumcache/rawhide/mirrorlist.txt (1.02 KB, text/plain)
2007-10-29 17:28 EDT, John Reiser
no flags Details
strace of pungi.gather on one already-cached file (16.09 KB, text/plain)
2007-10-30 00:16 EDT, John Reiser
no flags Details

  None (edit)
Description John Reiser 2007-10-29 17:26:50 EDT
Description of problem: Running pungi.gather against a mostly-good cache, I get 
"Downloading <package>" once per 15 seconds; I expect once per 1 second.  There
are 17 lines in work/i386/yumcache/rawhide/mirrorlist.txt.  The network is
mostly idle, but capable of 150KB/sec (and tested concurrently, and the
occasional package that actually does get downloaded does see almost that
speed.)  "netstat -a" shows 103 connections in tcp CLOSE_WAIT, all of them to
mirrors.  Why are things so slow?

Version-Release number of selected component (if applicable):

How reproducible: always

Steps to Reproduce:
1.pungi -c /usr/share/pungi/f8-fedora.ks --destdir=/data/Fedora8 --name Fedora
--ver 8 --discs=3
Actual results: one package per 15 seconds (if already in cache)

Expected results: one package per 1 seconds (if already in cache)

Additional info: cache is from Sat.Oct.27, before most [many?] rpms were signed;
but only about 10% of packages are actually re-downloaded.
Comment 1 John Reiser 2007-10-29 17:27:56 EDT
Created attachment 242421 [details]
netstat -a

note many connections in tcp CLOSE_WAIT with non-zero Recv-Q
Comment 2 John Reiser 2007-10-29 17:28:47 EDT
Created attachment 242441 [details]

mirror list with 16 sites listed
Comment 3 Jesse Keating 2007-10-29 17:43:32 EDT
Are you looking at the cli output or the logfile itself?  The logfile will list
the files we skipped over because they were complete in the cache, while the cli
will just say the ones we're actually downloading.

Other than that, all that code is within yum and urllib, not pungi.
Comment 4 John Reiser 2007-10-30 00:15:45 EDT
Attached will be an example strace which shows the 16-second elapsed time for
each already-cached file.  The net effect is a hardlink from a file in
/var/cache/pungi to the proper place in --destdir, but it takes 16 seconds. 
Some code thought that the cached copy was not good enough, tried to update it,
nothing changed, and the orginal cached copy got used anyway.

Background: Gathering all .rpm finally finished (17 hours over 2 days), then
pungi started to gather the .srpm, couldn't find some particular .srpm ("mo more
mirrors"), and aborted.  So I restarted pungi with the same commandline, but
this time pungi aborts soon after downloading the "indexes" because some file
already exists.  pungi does not tell the name of the culprit that already
exists.  So, I "rm -rf <destdir>", and re-issue the same pungi command again. 
Both the console (stderr, I guess) and the  logs/i386.log file say "Downloading
<package>", there is a 15 second pause during which there is some very small
network activity (only a very small fraction of what would be necessary to
download the entire package), then on about half the packages the i386.log file
says ".rpm already exists and appears to be complete".

The starring role in this excerpt of strace is played by:
$ ls -l /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
-rw-r--r-- 2 root root 11617 2007-09-05 15:32
[Notice the date is 7 weeks ago.]
Near the beginning of the strace, 3 F_SETLK64 calls are done on the same fd 8,
at offsets 0x40000000, which makes no sense.  Then the .rpm is found in the
/var/cache/pungi, and pungi reads the whole file.  Then 3 more F_SETLK64 are
done, again at offsets 0x40000000, and finally pungi decides to ftp the file
from ftp.cse.buffalo.edu.  Pungi also decides to resume from offset 11617, and
gets no more data.  At last, a hardlink is created from /var/cache/pungi to the
proper place in --destdir.  That took from 19:52:45 to 19:53:01, which is 16

Why was pungi not satisfied at first, yet finally happy to use the unmodified
original cached copy?  Could the be one or more logic bugs, such as comparing
dates or sizes using less-than instead of less-than-or-equal?  Or if the file
was indeed not the right one, then why was pungi satisfied in the end?

Comment 5 John Reiser 2007-10-30 00:16:45 EDT
Created attachment 242701 [details]
strace of pungi.gather on one already-cached file
Comment 6 John Reiser 2007-10-30 00:40:12 EDT
The answer is that the .rpms are different!

$ md5sum /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
53f36fa53f0436facc58a0e9730f75f2    ## the [stale] cached copy
$ md5sum $HOME/Desktop/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm
5024e662e6e651dfd8811ed67d5b73f9    ## from a current mirror (27-Oct-2007 09:49)

So pungi is letting stale cached rpms pass (at least if they have the same
length, and otherwise look OK?)
Comment 7 Jesse Keating 2007-11-15 14:45:35 EST
Are you still seeing this problem?  I made a change late in how pungi validates
the package in the cache, making it use yum to validate it.  Any downloading or
validation bugs are now yum/urllib issues.
Comment 8 John Reiser 2007-11-16 00:13:58 EST
No, I am not seeing this problem any more.

Note You need to log in before you can comment on or make changes to this bug.