Bug 357361
Summary: | slow package download (4/minute) with many tcp CLOSE_WAIT | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | John Reiser <jreiser> | ||||||||
Component: | pungi | Assignee: | David Cantrell <dcantrell> | ||||||||
Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | low | Docs Contact: | |||||||||
Priority: | low | ||||||||||
Version: | rawhide | ||||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2007-11-16 12:37:08 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
John Reiser
2007-10-29 21:26:50 UTC
Created attachment 242421 [details]
netstat -a
note many connections in tcp CLOSE_WAIT with non-zero Recv-Q
Created attachment 242441 [details]
Fedora8/work/i386/yumcache/rawhide/mirrorlist.txt
mirror list with 16 sites listed
Are you looking at the cli output or the logfile itself? The logfile will list the files we skipped over because they were complete in the cache, while the cli will just say the ones we're actually downloading. Other than that, all that code is within yum and urllib, not pungi. Attached will be an example strace which shows the 16-second elapsed time for each already-cached file. The net effect is a hardlink from a file in /var/cache/pungi to the proper place in --destdir, but it takes 16 seconds. Some code thought that the cached copy was not good enough, tried to update it, nothing changed, and the orginal cached copy got used anyway. Background: Gathering all .rpm finally finished (17 hours over 2 days), then pungi started to gather the .srpm, couldn't find some particular .srpm ("mo more mirrors"), and aborted. So I restarted pungi with the same commandline, but this time pungi aborts soon after downloading the "indexes" because some file already exists. pungi does not tell the name of the culprit that already exists. So, I "rm -rf <destdir>", and re-issue the same pungi command again. Both the console (stderr, I guess) and the logs/i386.log file say "Downloading <package>", there is a 15 second pause during which there is some very small network activity (only a very small fraction of what would be necessary to download the entire package), then on about half the packages the i386.log file says ".rpm already exists and appears to be complete". The starring role in this excerpt of strace is played by: ----- $ ls -l /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm -rw-r--r-- 2 root root 11617 2007-09-05 15:32 /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm ----- [Notice the date is 7 weeks ago.] Near the beginning of the strace, 3 F_SETLK64 calls are done on the same fd 8, at offsets 0x40000000, which makes no sense. Then the .rpm is found in the /var/cache/pungi, and pungi reads the whole file. Then 3 more F_SETLK64 are done, again at offsets 0x40000000, and finally pungi decides to ftp the file from ftp.cse.buffalo.edu. Pungi also decides to resume from offset 11617, and gets no more data. At last, a hardlink is created from /var/cache/pungi to the proper place in --destdir. That took from 19:52:45 to 19:53:01, which is 16 seconds. Why was pungi not satisfied at first, yet finally happy to use the unmodified original cached copy? Could the be one or more logic bugs, such as comparing dates or sizes using less-than instead of less-than-or-equal? Or if the file was indeed not the right one, then why was pungi satisfied in the end? Created attachment 242701 [details]
strace of pungi.gather on one already-cached file
The answer is that the .rpms are different! $ md5sum /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm 53f36fa53f0436facc58a0e9730f75f2 ## the [stale] cached copy /var/cache/pungi/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm $ md5sum $HOME/Desktop/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm 5024e662e6e651dfd8811ed67d5b73f9 ## from a current mirror (27-Oct-2007 09:49) 11617/home/jreiser/Desktop/xorg-x11-drv-calcomp-1.1.0-4.fc8.i386.rpm So pungi is letting stale cached rpms pass (at least if they have the same length, and otherwise look OK?) Are you still seeing this problem? I made a change late in how pungi validates the package in the cache, making it use yum to validate it. Any downloading or validation bugs are now yum/urllib issues. No, I am not seeing this problem any more. |