Bug 675590

Summary: Fedora Project's metalink files don't use block checksums
Product: Fedora
Component: distribution
Version: rawhide
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Hardware: All
OS: Linux
Reporter: Andre Robatino <robatino>
Assignee: Matt Domsch <matt_domsch>
QA Contact: Bill Nottingham <notting>
CC: dcantrell, james.antill, notting, rvokal
Doc Type: Bug Fix
Last Closed: 2011-02-07 16:15:16 UTC

Description Andre Robatino 2011-02-06 21:11:33 UTC
Description of problem:
Metalink files can use block checksums to give direct downloads BitTorrent-like robustness, so that only bad blocks need to be re-downloaded. However, the automatically generated files contain only a single sha256 hash (like the regular checksum files). If the files were generated with the "-d sha1pieces" option described in the metalink man page (in addition to "-d sha256"), it would be possible to repair bad downloads, not just detect them. This would make the metalink files bigger, but they are relatively small regardless.

For example, take a look at

http://mirrors.fedoraproject.org/metalink?path=pub/fedora/linux/releases/14/Fedora/i386/iso/Fedora-14-i386-DVD.iso
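To make the "repair, not just detect" point concrete, here is a minimal Python sketch of what per-piece checksums let a client do: hash a file in fixed-size pieces, compare against a reference list, and compute the byte ranges of only the mismatched pieces. The 256 KiB piece size and the function names (piece_hashes, bad_pieces, piece_range) are hypothetical, chosen for illustration. SHA-1 is used only to match the "sha1pieces" name above. This is not how MirrorManager or the metalink tool computes anything.

```python
import hashlib
import io
from typing import BinaryIO, List, Tuple

# Hypothetical piece size (256 KiB); a real metalink file declares its own
# piece length alongside the piece hashes.
PIECE_SIZE = 256 * 1024


def piece_hashes(stream: BinaryIO, piece_size: int = PIECE_SIZE) -> List[str]:
    """SHA-1 hex digest of each consecutive piece of the stream."""
    digests = []
    while True:
        chunk = stream.read(piece_size)
        if not chunk:
            break
        digests.append(hashlib.sha1(chunk).hexdigest())
    return digests


def bad_pieces(stream: BinaryIO, expected: List[str],
               piece_size: int = PIECE_SIZE) -> List[int]:
    """Indices of pieces that are corrupt, missing, or unexpected.

    Only these pieces would need fetching again; a single whole-file
    sha256 can only say that something, somewhere, is wrong.
    """
    actual = piece_hashes(stream, piece_size)
    count = max(len(actual), len(expected))
    return [i for i in range(count)
            if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]]


def piece_range(index: int, file_size: int,
                piece_size: int = PIECE_SIZE) -> Tuple[int, int]:
    """Inclusive byte range of a piece, e.g. for 'Range: bytes=start-end'."""
    start = index * piece_size
    end = min(start + piece_size, file_size) - 1
    return start, end


if __name__ == "__main__":
    original = bytes(range(256)) * 4096   # 1 MiB of sample data: 4 pieces
    expected = piece_hashes(io.BytesIO(original))
    damaged = bytearray(original)
    damaged[300_000] ^= 0xFF              # corrupt one byte inside piece 1
    for i in bad_pieces(io.BytesIO(bytes(damaged)), expected):
        print("re-fetch piece", i, "bytes", piece_range(i, len(original)))
```

Running it flags only piece 1 (bytes 262144 to 524287) for re-fetching. A mismatch against a single whole-file hash would instead force the whole 1 MiB to be downloaded again.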

Disclaimer: I just started looking into metalink files, so I might be misunderstanding something.

Comment 1 Bill Nottingham 2011-02-07 15:36:08 UTC
Assigning to the MirrorManager (MM) maintainer; cc'ing yum folks for comments on what's supported there.

Comment 2 seth vidal 2011-02-07 15:41:31 UTC
yum doesn't download the entire ISO, so that's out of scope for yum.

For the data yum cares about, I don't think storing block checksums of all packages or all metadata is a good way to provide partial downloads. It would mostly just create a much larger file that must be downloaded every time you check the repo.
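The size trade-off raised here (and the "relatively small regardless" remark in the description) can be checked with back-of-the-envelope arithmetic: a piece list grows with file size divided by piece size. The sketch below assumes each entry costs a 40-character hex SHA-1 digest plus about 50 bytes of XML markup. The 50-byte figure, the 3.5 GiB image, and the 20,000-package repository are illustrative assumptions, not measurements of Fedora's metalinks or repodata.

```python
# Rough estimate of how much a per-piece SHA-1 list adds to a metalink file.
# Assumption: each piece entry costs a 40-char hex digest plus ~50 bytes of
# XML markup; the 50 is a guess for illustration, not a measured figure.
HEX_DIGEST_LEN = 40  # SHA-1 digest written as hexadecimal text
ASSUMED_MARKUP_PER_ENTRY = 50


def piece_count(file_size: int, piece_size: int) -> int:
    """Number of pieces needed to cover file_size bytes (ceiling division)."""
    return -(-file_size // piece_size)


def piece_list_bytes(file_size: int, piece_size: int,
                     markup: int = ASSUMED_MARKUP_PER_ENTRY) -> int:
    """Approximate bytes added by listing one SHA-1 hash per piece."""
    return piece_count(file_size, piece_size) * (HEX_DIGEST_LEN + markup)


if __name__ == "__main__":
    KiB, MiB, GiB = 1024, 1024 ** 2, 1024 ** 3
    dvd = 7 * GiB // 2  # hypothetical 3.5 GiB DVD image
    for piece in (256 * KiB, 4 * MiB):
        print(piece // KiB, "KiB pieces:", piece_list_bytes(dvd, piece), "bytes")
    # Hypothetical repo: 20,000 packages of ~1 MiB each, listed in one metalink.
    print("repo:", 20_000 * piece_list_bytes(1 * MiB, 256 * KiB), "bytes")
```

Under these assumptions the single DVD image gains about 1.29 MB of piece hashes at 256 KiB pieces, but only about 81 KB at 4 MiB pieces. The hypothetical repository would add about 7.2 MB to a file fetched on every repository check, which is the cost described above.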

Comment 3 Matt Domsch 2011-02-07 16:15:16 UTC
MirrorManager generates its own metalink documents; it doesn't use any stand-alone metalink creator. You are correct: MM doesn't generate per-block SHA pieces for any file, including ISOs. MM also "suggests" a single downloader connection per file (I say suggests because it can't force client behavior), rather than encouraging multiple parallel connections to multiple mirrors, each pulling partial files. This is an intentional omission on my part. Treating mirrors as if they were BitTorrent servers, without the mirrors running as BitTorrent seeds, thrashes the mirrors' buffer caches and increases disk I/O, reducing their capacity to serve a larger number of users. See "Bittorrent Considered Harmful" by John Hawley, kernel.org admin.

Comment 4 Andre Robatino 2011-02-07 16:45:44 UTC
But does providing block checksums make it more likely that a client will use parallel connections? (I'm ignorant in this area.)