Bug 141388 - FAT32 file system zero length files corruption after remount
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Alexander Viro
Blocks: 132991
Reported: 2004-11-30 16:22 EST by J. Lewis Muir
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-05-18 09:28:45 EDT


Attachments
Script used in step 4 to generate files (1.35 KB, text/plain)
2004-11-30 16:24 EST, J. Lewis Muir
Script referenced in comment (2.77 KB, text/plain)
2004-12-14 13:10 EST, J. Lewis Muir

Description J. Lewis Muir 2004-11-30 16:22:39 EST
Description of problem:
After many 8MB files are written to a FAT32 file system, the file system
becomes corrupt. After an unmount followed by a remount, some of the 8MB files
have a size of 0. It may happen only on larger file systems, and only once a
certain percentage of the file system is in use.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. # dd if=/dev/zero of=/dev/vg01/data bs=512 count=1
2. # mkfs.vfat -F 32 -n TESTFS -v /dev/vg01/data
3. # mount -t vfat -o 'noatime,nodev,uid=502,gid=502,umask=007' \
   /dev/vg01/data /data
   # df -hl | grep '/data'
   /dev/vg01/data        196G   32K  196G   1% /data
4. The make-test-data.sh script used in this step is an attachment on this bug
   report.
   $ mkdir /data/jlmuir
   $ cd /mnt/ds4/jlmuir
   $ ~/bin/make-test-data.sh 100
   $ du -hs /mnt/ds4/jlmuir/
   44G     /mnt/ds4/jlmuir
5. $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0
6. $ rename 'data' 'data2' /mnt/ds4/jlmuir/*
   $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0
7. $ rename 'data2' 'data3' /mnt/ds4/jlmuir/*
   $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0
8. $ rename 'data3' 'data4' /mnt/ds4/jlmuir/*
   $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0
   $ df -hl | grep '/data'
   /dev/vg01/data        196G  174G   22G  89% /data


Actual Results: In step 8, the file system is now corrupt. The find command
shows that all files in the data4-set-* directories have lost their contents;
they have all become zero length files.


Expected Results: Step 8 should be just like the previous steps 5, 6, and 7:
there should be no zero length files, all of the files should have been copied
correctly, and the file system should not be corrupt.
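
Steps 5 through 8 repeat the same copy, remount, and check cycle. A minimal
sketch of that cycle as shell functions follows. The function names are
hypothetical and are not taken from the attached script. It assumes the mount
point has an /etc/fstab entry (as the bare "mount /data" in the steps implies)
and that the caller has the privileges needed to unmount and mount.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print every zero-length regular file under directory $1
# (the same check as the find command in steps 5-8).
zero_length_files() {
    find "$1" -type f -size 0
}

# One round: copy the source tree, remount the target, count empty files.
# $1 = source dir, $2 = mount point, $3 = destination dir on the mount.
# Returns non-zero if any step fails or if zero-length files are found.
copy_remount_check() {
    rsync -av --modify-window=1 "$1/" "$3" || return 1
    umount "$2" || return 1
    mount "$2" || return 1
    empty=$(( $(zero_length_files "$3" | wc -l) ))
    echo "zero-length files after remount: $empty"
    [ "$empty" -eq 0 ]
}
```

With the paths from the steps above, one round would be
"copy_remount_check /mnt/ds4/jlmuir /data /data/jlmuir", run again after each
rename of the source directories.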


Additional info:

I don't know whether this problem is dependent on timing. For what it's worth,
I unmounted the /data file system a few seconds after the rsync finished in
step 8.

This sounds like it may be the same bug discussed on the linux-kernel mailing
list at

  http://seclists.org/lists/linux-kernel/2003/Dec/1065.html

Erik Andersen posted a patch in his reply to this email at

  http://seclists.org/lists/linux-kernel/2003/Dec/1127.html
Comment 1 J. Lewis Muir 2004-11-30 16:24:00 EST
Created attachment 107651
Script used in step 4 to generate files
Comment 3 Alexander Viro 2004-12-10 18:34:13 EST
I'm not sure that Erik's patch is a full fix, but it certainly shows
real problems around the 2^31 * 0.5KB barrier.  I'll do the full audit
tonight - 2.4 and 2.6 are close enough for that and I think I see how
to make sparse catch these guys; then fixes will be backported.
Comment 4 Alexander Viro 2004-12-10 18:37:18 EST
PS: there's a decent chance that 141381 is the same kind of issue;
I'm not saying that we should merge them right now, but the patch for
this one will definitely be worth checking in case it fixes both.
Comment 5 Alexander Viro 2004-12-10 18:41:05 EST
sigh...  s/141381/141253/.  Sorry.
Comment 7 J. Lewis Muir 2004-12-14 13:10:26 EST
Created attachment 108543
Script referenced in comment
Comment 8 J. Lewis Muir 2004-12-14 13:16:06 EST
I've created an image of a file system that is corrupt and put it at

    http://www.imca.aps.anl.gov/~jlmuir/corrupt-fat32-fs.img.bz2

(File size: ~11MB. Bugzilla wouldn't let me include it as an attachment
because it is too big.) df says it is about 233GB in size with 226GB in use.
This file system was not created using the same steps as in this bug report,
but I suspect the problem may be the same. In this case, the script named
punish-fat32-disk (included on this bug report as an attachment) was used to
perform the test.

Approximately 5GB of test data was generated in a directory (not on the FAT32
file system being tested) named data1 (as 4 directories, each containing
200 8MB files). This directory was then rsync'ed 36 times to the destination
FAT32 file system, each time renaming the source directory by incrementing the
number on the end (e.g. 2, 3, 4, etc.) so that rsync would see it as a new
directory to copy to the destination FAT32 file system. After unmounting and
mounting again (without attempting a dosfsck), the directories data22 through
data36 contain no files, yet the file system is listed as having only 7.9GB of
space left.
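
The attached punish-fat32-disk script is not reproduced in this report. As a
rough sketch of the procedure described above (a reconstruction from the
prose only, not the attached script), the rename-and-copy loop could look
like the following. The function names are hypothetical, and the use of
--modify-window=1 is assumed from the rsync commands in the original steps.

```shell
# Hypothetical reconstruction; not the attached punish-fat32-disk script.

# Print $1 with its trailing number incremented: data1 -> data2.
next_name() {
    prefix=$(printf '%s' "$1" | sed 's/[0-9]*$//')
    number=${1#"$prefix"}
    echo "${prefix}$((number + 1))"
}

# Copy directory $1 (e.g. .../data1) into $2 a total of $3 times, renaming
# the source between passes so rsync sees each pass as a new directory.
punish() {
    src=$1; dest=$2; passes=$3; i=1
    while [ "$i" -le "$passes" ]; do
        # No trailing slash on the source, so the directory itself is copied.
        rsync -a --modify-window=1 "$src" "$dest/" || return 1
        if [ "$i" -lt "$passes" ]; then
            new=$(dirname "$src")/$(next_name "$(basename "$src")")
            mv "$src" "$new" || return 1
            src=$new
        fi
        i=$((i + 1))
    done
}
```

A call such as "punish /scratch/data1 /mnt/fat32/jlmuir 36" (both paths are
placeholders) would leave data1 through data36 on the destination.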

Here's the sfdisk output for the disk:
---
# sfdisk -l /dev/sdc

Disk /dev/sdc: 30401 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1          0+  30400   30401- 244196032    c  Win95 FAT32 (LBA)
/dev/sdc2          0       -       0          0    0  Empty
/dev/sdc3          0       -       0          0    0  Empty
/dev/sdc4          0       -       0          0    0  Empty
---

The image of the file system was created with the following command:
# bzip2 -9 -c /dev/sdc1 > /data/corrupt-fat32-fs.img.bz2

Under Windows 2000, the chkdsk program reports errors with the file system.
Here's a transcript of the cmd.exe session:
---
Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\Documents and Settings\administrator> chkdsk G:
The type of the file system is FAT32.
Volume EXT_HD created 12/9/2004 10:30 AM
Volume Serial Number is 41B8-7DB6
Windows is verifying files and folders...
Windows found errors on the disk, but will not fix them
because disk checking was run without the /F (fix) parameter.
The \jlmuir\data36\data-set-0 entry contains a nonvalid link.
The \jlmuir\data36\data-set-1 entry contains a nonvalid link.
The \jlmuir\data36\data-set-2 entry contains a nonvalid link.
The \jlmuir\data36\data-set-3 entry contains a nonvalid link.
The \jlmuir\data35\data-set-0 entry contains a nonvalid link.
The \jlmuir\data35\data-set-1 entry contains a nonvalid link.
The \jlmuir\data35\data-set-2 entry contains a nonvalid link.
The \jlmuir\data35\data-set-3 entry contains a nonvalid link.
The \jlmuir\data34\data-set-0 entry contains a nonvalid link.
The \jlmuir\data34\data-set-1 entry contains a nonvalid link.
The \jlmuir\data34\data-set-2 entry contains a nonvalid link.
The \jlmuir\data34\data-set-3 entry contains a nonvalid link.
The \jlmuir\data33\data-set-0 entry contains a nonvalid link.
The \jlmuir\data33\data-set-1 entry contains a nonvalid link.
The \jlmuir\data33\data-set-2 entry contains a nonvalid link.
The \jlmuir\data33\data-set-3 entry contains a nonvalid link.
The \jlmuir\data32\data-set-0 entry contains a nonvalid link.
The \jlmuir\data32\data-set-1 entry contains a nonvalid link.
The \jlmuir\data32\data-set-2 entry contains a nonvalid link.
The \jlmuir\data32\data-set-3 entry contains a nonvalid link.
The \jlmuir\data31\data-set-0 entry contains a nonvalid link.
The \jlmuir\data31\data-set-1 entry contains a nonvalid link.
The \jlmuir\data31\data-set-2 entry contains a nonvalid link.
The \jlmuir\data31\data-set-3 entry contains a nonvalid link.
The \jlmuir\data30\data-set-0 entry contains a nonvalid link.
The \jlmuir\data30\data-set-1 entry contains a nonvalid link.
The \jlmuir\data30\data-set-2 entry contains a nonvalid link.
The \jlmuir\data30\data-set-3 entry contains a nonvalid link.
The \jlmuir\data29\data-set-0 entry contains a nonvalid link.
The \jlmuir\data29\data-set-1 entry contains a nonvalid link.
The \jlmuir\data29\data-set-2 entry contains a nonvalid link.
The \jlmuir\data29\data-set-3 entry contains a nonvalid link.
The \jlmuir\data28\data-set-0 entry contains a nonvalid link.
The \jlmuir\data28\data-set-1 entry contains a nonvalid link.
The \jlmuir\data28\data-set-2 entry contains a nonvalid link.
The \jlmuir\data28\data-set-3 entry contains a nonvalid link.
The \jlmuir\data27\data-set-0 entry contains a nonvalid link.
The \jlmuir\data27\data-set-1 entry contains a nonvalid link.
The \jlmuir\data27\data-set-2 entry contains a nonvalid link.
The \jlmuir\data27\data-set-3 entry contains a nonvalid link.
The \jlmuir\data26\data-set-0 entry contains a nonvalid link.
The \jlmuir\data26\data-set-1 entry contains a nonvalid link.
The \jlmuir\data26\data-set-2 entry contains a nonvalid link.
The \jlmuir\data26\data-set-3 entry contains a nonvalid link.
The \jlmuir\data25\data-set-0 entry contains a nonvalid link.
The \jlmuir\data25\data-set-1 entry contains a nonvalid link.
The \jlmuir\data25\data-set-2 entry contains a nonvalid link.
The \jlmuir\data25\data-set-3 entry contains a nonvalid link.
The \jlmuir\data24\data-set-0 entry contains a nonvalid link.
The \jlmuir\data24\data-set-1 entry contains a nonvalid link.
The \jlmuir\data24\data-set-2 entry contains a nonvalid link.
The \jlmuir\data24\data-set-3 entry contains a nonvalid link.
The \jlmuir\data23\data-set-0 entry contains a nonvalid link.
The \jlmuir\data23\data-set-1 entry contains a nonvalid link.
The \jlmuir\data23\data-set-2 entry contains a nonvalid link.
The \jlmuir\data23\data-set-3 entry contains a nonvalid link.
The \jlmuir\data22\data-set-0 entry contains a nonvalid link.
The \jlmuir\data22\data-set-1 entry contains a nonvalid link.
The \jlmuir\data22\data-set-2 entry contains a nonvalid link.
The \jlmuir\data22\data-set-3 entry contains a nonvalid link.
File and folder verification is complete.
Convert lost chains to files (Y/N)? n
98305920 KB of free disk space would be added.
Windows found problems with the file system.
Run CHKDSK with the /F (fix) option to correct these.
  244,136,384 KB total disk space.
        3,872 KB in 121 folders.
  137,625,600 KB in 16,800 files.
    8,200,960 KB are available.

       32,768 bytes in each allocation unit.
    7,629,262 total allocation units on disk.
      256,280 allocation units available on disk.

C:\Documents and Settings\administrator>
---
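
The figures in the chkdsk transcript can be cross-checked against the test
description. The shell arithmetic below is a sketch under two assumptions
that are not stated in the report: each "8MB" file is exactly 8192 KB, and
each emptied data-set directory occupied one 32 KB allocation unit.

```shell
# All sizes in KB. Inputs come from the chkdsk transcript and the test description.
cluster_kb=32                 # 32,768 bytes in each allocation unit
file_kb=8192                  # assumed: one "8MB" test file = 8192 KB
files_per_copy=$((4 * 200))   # 4 directories of 200 files per rsync pass

# Copies data1..data21 kept their files: 21 passes of 800 files.
surviving_kb=$((21 * files_per_copy * file_kb))

# Copies data22..data36 lost their files: 15 passes of 800 files, plus the
# 60 data-set directories at one allocation unit each (assumed).
lost_kb=$((15 * files_per_copy * file_kb + 15 * 4 * cluster_kb))

total_kb=$((7629262 * cluster_kb))
free_kb=$((256280 * cluster_kb))

echo "surviving files: $surviving_kb KB"   # chkdsk: 137,625,600 KB in 16,800 files
echo "lost chains:     $lost_kb KB"        # chkdsk: 98,305,920 KB
echo "total space:     $total_kb KB"       # chkdsk: 244,136,384 KB
echo "available:       $free_kb KB"        # chkdsk: 8,200,960 KB
```

On these assumptions all four computed values match the transcript, and the
last intact copy (data21) ends after about 131 GiB of file data.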
Comment 9 Ernie Petrides 2005-01-06 14:23:09 EST
Fixes for this problem have just been committed to the RHEL3 U5
patch pool this afternoon (in kernel version 2.4.21-27.6.EL).
Comment 10 Karen Bruner 2005-02-04 18:04:25 EST
We've been experiencing similar problems with 250GB USB-attached
drives, although unmounting and remounting isn't even required to
reproduce the problem.  We currently have the 2.4.21-27.0.2.ELsmp
kernel.  When will this new kernel get released as an rpm?
Comment 11 Ernie Petrides 2005-02-07 17:32:26 EST
Release of RHEL3 U5 is currently scheduled for the beginning of May.
Comment 12 Tim Powers 2005-05-18 09:28:45 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html
