Bug 326101

Summary: Please add some ECC to the distribution file set and also file SHA hashes for the individual file contents within the ISO images
Product: [Fedora] Fedora
Reporter: c.h. <fc6_req>
Component: fedora-release
Assignee: David Cantrell <dcantrell>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low    
Version: 8   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-10-10 14:18:29 UTC

Description c.h. 2007-10-10 10:48:56 UTC
Description of problem:

[Please forgive me for not finding the appropriate
'general Fedora distribution file set' component ID, if there is one.
There's a scrolling list box, only 5 components high, of hundreds and hundreds
of generally rather cryptically named components, and apparently no
way to navigate / search / copy-paste (not even view source, apparently!)
the list to find promising-sounding ones.]

Just as a personal use case example of a larger problem --
I seem to have found some problem which
corrupts restarted downloads of Fedora's file:
Fedora-7.92-x86_64-DVD.iso.
I'm using GNU Wget 1.10.2 on Cygwin, and I really don't know if it's
the fault of my wget, the mirror site, or what, but long story short,
after a ~10 hour download, I ended up with a "successfully downloaded"
(according to wget) image with the expected number of bytes, the only
problem being that it didn't match the listed MD5/SHA hash given
in the distribution tree.  Most likely there is something like a
single-byte error at the exact file offset where the transfer was
interrupted and subsequently restarted.

Anyway, short of another multi-hour download of 100% of the original
file, there is apparently no way with FTP/HTTP (and I'm not sure about rsync)
to recover from such a problem.  It's happened to me several times in the past,
and I'm sure I'm not alone.  It's especially bad for multi-gigabyte files,
especially if one may have some kind of bandwidth cap / quota within which
one has to download it all anyway.

I suggest it'd be in the best interests of the users, as well as conserving
the bandwidth of the mirror sites et al., to generate some ECC files, for
instance in 'par2' format (q.v. SourceForge et al.), for the
Fedora distributed files, and make those a part of the standard
distribution generation process.

I believe that this technique has been a de-facto standard for USENET for
a number of years.

Even if the error detection / recovery files were only enough to fix even 1%
of the data in the original file, it'd be a very cheap (in time and bandwidth)
'fix' for anyone with a slightly corrupted download that's almost certainly
99.9999% correct but which is totally useless and must be wholly discarded
unless some such ECC correction is available.  

Such errors could just as easily occur if slight local disk sector or
memory corruption introduces a small number of errors into the large
downloaded files, so I believe it'd be useful in saving hundreds of people
thousands of hours of repeated downloads for a very trivial cost in added
distribution generation scripting and distribution storage space.
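
For illustration only, the par2 idea could look roughly like the sketch below. It assumes the par2cmdline tool is installed. The function names make_recovery and check_or_repair and the 1% default redundancy are made up for this example and are not part of any Fedora release tooling.

```shell
# Hypothetical helpers built on par2cmdline; names are illustrative only.

# Create recovery data for one file.
# $1 = file to protect, $2 = redundancy in percent (assumed default: 1)
# Writes $1.par2 plus recovery volume files next to the original.
make_recovery() {
    (
        # Work inside the file's directory so par2 only sees relative names.
        cd "$(dirname "$1")" || exit 1
        f=$(basename "$1")
        par2 create "-r${2:-1}" "$f.par2" "$f"
    )
}

# Verify a downloaded file against its recovery set and attempt a
# repair only if verification reports damage.
# $1 = the downloaded file; $1.par2 and its volumes must sit beside it.
check_or_repair() {
    (
        cd "$(dirname "$1")" || exit 1
        f=$(basename "$1")
        par2 verify "$f.par2" || par2 repair "$f.par2"
    )
}
```

The distributor would run make_recovery once per image; a user with a damaged download would fetch only the small .par2 files and run check_or_repair instead of downloading the whole image again.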


Furthermore, in another use case I encountered, I redownloaded the large
x86_64 DVD image and finally got the expected checksum; then I burned
it to DVD and wanted to verify that the ostensibly successful burn
had in fact produced media that 'PASS'es the correctness test.
Since the x86_64 box wasn't physically available, I loaded the disc into
an i686 system to run just the media test.  That did not work, since
apparently there is no generic i386 media test option; it has to be run
on x86_64 just to test the media, which is a bit strange.

Anyway, it strikes me as odd to have the MD5/SHA listed for, say, the ISO
images et al. but not ALSO have the ISO images themselves contain
MD5SUM or SHA1SUM or whatever files for the contained files on the disc.
That'd make it much easier to validate your mirrored or burned copy of
Fedora without having to physically reboot the box to run the 'media test'
utility!

Something as simple as the following would make it much easier for people
to detect problems with the files within the iso images:
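cd /FC8_DVD_image
# Hash every file; skip the manifest itself, which the shell creates before find runs
find . -type f ! -name AllFiles.sha1sums -print0 | xargs -0 sha1sum > AllFiles.sha1sums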
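
The matching check on the user's side could be as small as the sketch below. It assumes a manifest in sha1sum format sits at the top of the mounted or extracted tree. The helper name verify_tree is hypothetical, and the default manifest name is simply the AllFiles.sha1sums used in the command above.

```shell
# Hypothetical helper: check every file under a mounted or extracted
# image against a manifest written by sha1sum.
# $1 = root of the tree
# $2 = manifest name relative to that root (default: AllFiles.sha1sums)
verify_tree() {
    # Run in a subshell so the caller's working directory is unchanged.
    # sha1sum -c prints one OK/FAILED line per file and exits non-zero
    # if any file is missing or does not match its recorded hash.
    ( cd "$1" && sha1sum -c "${2:-AllFiles.sha1sums}" )
}
```

For a burned disc this would be run against the mount point, for example verify_tree /mnt/cdrom (the mount point is an assumption), with no reboot into the media test needed.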



Comment 1 Jesse Keating 2007-10-10 14:18:29 UTC
rsync does this automatically, no need for secondary files.

BitTorrent also does this automatically, no need for secondary files.

Many mirrors offer rsync, and the torrents are very actively used.  These are
the best two methods to achieve what you're looking for.
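
A minimal sketch of the rsync route for an already-downloaded but damaged image, assuming rsync is installed locally and the chosen mirror offers rsync access. The helper name repair_with_rsync is hypothetical, and the mirror host and path in the usage line are placeholders, not a real location.

```shell
# Hypothetical helper: repair a damaged local copy by letting rsync
# transfer only the blocks that differ from the source copy.
# $1 = source (an rsync:// URL or host:path on a mirror offering rsync,
#      or a local path)
# $2 = existing, possibly corrupted, local file
repair_with_rsync() {
    # --ignore-times  : do not skip the file just because size and
    #                   modification time already match the source
    # --inplace       : patch the existing file rather than building a
    #                   complete temporary copy beside it
    # --no-whole-file : keep the delta-transfer algorithm enabled even
    #                   when both paths are local
    rsync --ignore-times --inplace --no-whole-file "$1" "$2"
}

# Usage (placeholder mirror and path):
# repair_with_rsync rsync://mirror.example.org/path/to/Fedora-7.92-x86_64-DVD.iso ./Fedora-7.92-x86_64-DVD.iso
```

Because the damaged file is used as the basis, only the mismatching blocks plus checksum traffic should cross the network, which is the saving the report asks for.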