Bug 41757

Summary: filesystem corrupted - sectors not written out to optical/DVD-RAM disk
Product: [Retired] Red Hat Linux Reporter: Trevin Beattie <trevin>
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE QA Contact: Brock Organ <borgan>
Severity: high Docs Contact:
Priority: medium    
Version: 7.1   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-09-30 15:39:01 UTC Type: ---

Description Trevin Beattie 2001-05-22 02:46:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i586)

Description of problem:
I first noticed this problem when I tried to update my backup files on
DVD-RAM and got dozens of errors, including strange things like regular
files being linked to directories.  I spent a couple of days repairing the
backups; each one had to be checked and often redone several times, the
most common problem occurring on rewrites being directory entries linked to
unused inodes or an inode linked with unused blocks.

After I finally had a stable backup that compared equal to the original
files, I upgraded from Red Hat 7.0 to Red Hat 7.1.  Then I attempted a
recursive diff between my backup of the /etc directory and the updated
configuration files.  To my surprise, the backup DVD which had passed
e2fsck before the upgrade was now showing dozens more errors!  This time,
every error looked like it was due to inodes which were zeroed out that
should have been in use.

I was hoping that the error was just in Red Hat 7.0, so I opened up a new
optical disk (640MB for a Fujitsu DynaMO), created a new e2fs filesystem on
it, mounted it with the 'errors=remount-ro' option, and tried to copy
~600MB worth of files to it.  Halfway through (49% full) I got an error.

Just to make sure this wasn't a bad disk, I ran badblocks(8) on the disk,
re-created a new e2fs filesystem, and tried copying ~600MB worth of files
to it again.  This time I left the 'errors' option off.  It got almost all
the way through this time (85%) before reporting an I/O error on a file,
but when I ran a recursive diff, three files before that one were
mismatched.  I did a quick cmp of one of them and found that 32 blocks in
the middle of the file were never written out.  (They were allocated, but
the sectors were blank).
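
The block-level check described above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for this report: the function name find_mismatched_blocks is hypothetical, the 2048-byte default is an assumption taken from the media sector size (the filesystem block size may differ), and "blank" is assumed to mean a block consisting entirely of zero bytes.

```python
def find_mismatched_blocks(original_path, copy_path, block_size=2048):
    """Return a list of (block_index, copy_is_blank) for each differing block.

    block_size=2048 is an assumption (the media sector size); pass the
    filesystem block size instead if that is what you want to count in.
    """
    mismatches = []
    with open(original_path, "rb") as orig, open(copy_path, "rb") as copy:
        index = 0
        while True:
            a = orig.read(block_size)
            b = copy.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Assumption: "blank" means the block is present in the
                # copy but contains only zero bytes.
                is_blank = len(b) > 0 and b.count(0) == len(b)
                mismatches.append((index, is_blank))
            index += 1
    return mismatches
```

A run of consecutive indices all flagged blank would match the symptom described here: blocks that were allocated to the file but never written to the disk.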

How reproducible:
Always

Steps to Reproduce:
1. You need either a MO drive or DVD-RAM, with 2048 bytes-per-sector media
(not sure if this is required, but I haven't seen any corruption on my hard
drive ... yet ... so I presume it's limited to optical disks).
2. "mke2fs /dev/scd0" (or whichever device is your optical drive)
3. "mount /dev/scd0 /mnt/dvdram"
4. "cp -a /etc /var /mnt/dvdram/" (copy whatever you like.  You should have
a lot of files to copy.  The occurrence of the problem seems to be random,
but the more files I have to back up, the more corrupted files/inodes I'm
likely to get.)
5. Your choice - to detect inode/directory corruption:
5a. "umount /dev/scd0"
5b. "e2fsck -f -C 0 /dev/scd0"
6. To detect file corruption:
6a. "mount -o ro /dev/scd0 /mnt/dvdram" (if you have unmounted it in step
5)
6b. "diff --recursive --brief /etc /mnt/dvdram/etc"
6c. "diff --recursive --brief /var /mnt/dvdram/var"
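
Step 6 can also be automated. The following is a rough sketch of the same idea as "diff --recursive --brief", under these assumptions: the function name compare_trees is hypothetical, only regular files are compared (symlinks, sockets, FIFOs and device nodes are skipped), and a file that raises an I/O error while being read is reported as "unreadable" rather than aborting the run.

```python
import filecmp
import os

def compare_trees(source_root, backup_root):
    """Return a sorted list of (relative_path, problem) tuples.

    problem is "missing", "differs", or "unreadable".
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        rel_dir = os.path.relpath(dirpath, source_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Compare regular files only; skip symlinks and special files.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst = os.path.join(backup_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
                continue
            try:
                # shallow=False forces a byte-by-byte content comparison.
                same = filecmp.cmp(src, dst, shallow=False)
            except OSError:
                problems.append((rel, "unreadable"))
                continue
            if not same:
                problems.append((rel, "differs"))
    return sorted(problems)
```

For the steps above, this would be called as compare_trees("/etc", "/mnt/dvdram/etc") and again for /var; an empty list means every regular file in the source tree matched its backup.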


Actual Results:
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'rc0.d' in /etc (898) has deleted/unused inode 1763.  Clear? no

Entry 'rc0.d' in /etc (898) has an incorrect filetype (was 0t, should be 0)
Fix? no

... (100 more errors exactly like the above) ...
... (I'm not kidding: # grep "deleted/unused inode" /tmp/backup-errors | wc -l
    101
     ) ...

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 898 ref count is 47, should be 39.  Fix? no

Inode 1361 ref count is 10, should be 5.  Fix? no

Pass 5: Checking group summary information

Block bitmap differences:  -5300 -5301 -5302 -5303 -5304 -5305 -5306 -5307 -5308
... (a few dozen lines of this) ...

Inode bitmap differences:  -1409 -1410 -1411 -1412 -1413 -1472 -1473 -1474 -1475
... (a few dozen lines of that) ...
Fix? no

Directories count wrong for group #0 (208, counted=186).
Fix? no

/dev/scd0: 40440/305216 files (0.1% non-contiguous), 184801/609480 blocks


Additional info:

My specific equipment:
Creative PC-DVD RAM:
  Vendor: CREATIVE  Model: DVD-RAM RAM1216S  Rev: 1311
  Type:   CD-ROM                             ANSI SCSI revision: 02
Fujitsu DynaMO 640:
  Vendor: FUJITSU   Model: M2513A            Rev: 1700
  Type:   Optical Device                     ANSI SCSI revision: 02

Comment 1 Alan Cox 2001-05-22 09:18:49 UTC
Known problem. Use a 2.2 kernel if you are using non-512-byte media.


Comment 2 Trevin Beattie 2001-05-22 18:19:58 UTC
Sorry, I forgot to update the version # before submitting this report, but the
bug affects both 7.0 (kernels 2.2.16-22 and 2.2.17-14) and 7.1 (2.4.2-2).  As I
mentioned in the details, the first errors were found on backup disks I made
before upgrading.


Comment 3 Bugzilla owner 2004-09-30 15:39:01 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/