Red Hat Bugzilla – Bug 141388
FAT32 file system zero length files corruption after remount
Last modified: 2007-11-30 17:07:05 EST
Description of problem:

After writing 8MB files to a FAT32 file system, the file system becomes corrupt. After an unmount followed by a remount, some 8MB files end up with a size of 0. It may only happen on larger file systems, and may only happen once a certain percentage of the file system is in use.

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:

1. # dd if=/dev/zero of=/dev/vg01/data bs=512 count=1

2. # mkfs.vfat -F 32 -n TESTFS -v /dev/vg01/data

3. # mount -t vfat -o 'noatime,nodev,uid=502,gid=502,umask=007' \
       /dev/vg01/data /data
   # df -hl | grep '/data'
   /dev/vg01/data        196G   32K  196G   1% /data

4. The make-test-data.sh script used in this step is an attachment on this bug report.
   $ mkdir /data/jlmuir
   $ cd /mnt/ds4/jlmuir
   $ ~/bin/make-test-data.sh 100
   $ du -hs /mnt/ds4/jlmuir/
   44G     /mnt/ds4/jlmuir

5. $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0

6. $ rename 'data' 'data2' /mnt/ds4/jlmuir/*
   $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0

7. $ rename 'data2' 'data3' /mnt/ds4/jlmuir/*
   $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0

8. $ rename 'data3' 'data4' /mnt/ds4/jlmuir/*
   $ rsync -av --modify-window=1 /mnt/ds4/jlmuir/ /data/jlmuir
   # umount /data
   # mount /data
   $ find /data/jlmuir -type f -size 0
   $ df -hl | grep '/data'
   /dev/vg01/data        196G  174G   22G  89% /data

Actual Results:

In step 8, the file system is now corrupt. The find command shows that all files in the data4-set-* directories have lost their contents; they have all become zero length files.

Expected Results:

Step 8 should be just like the previous steps 5, 6, and 7: there should be no zero length files, all of the files should have been copied correctly, and the file system should not be corrupt.

Additional info: I don't know whether this problem is dependent on timing. For what it's worth, I unmounted the /data file system a few seconds after the rsync finished in step 8. This sounds like it may be the same bug discussed on the linux-kernel mailing list at http://seclists.org/lists/linux-kernel/2003/Dec/1065.html Erik Andersen posted a patch in his reply to this email at http://seclists.org/lists/linux-kernel/2003/Dec/1127.html
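
The check that closes each of steps 5 through 8 can be wrapped in a small helper so that the result is a single number rather than a file listing. This is only a sketch and is not taken from the attached make-test-data.sh script; the function name count_zero_length_files is hypothetical, and the only assumption is a find that accepts -type and -size as used in the steps above.

```shell
# Hypothetical helper (not part of the attachments on this bug):
# print how many regular files under a directory are exactly 0 bytes long.
count_zero_length_files() {
    dir=$1
    # "-size 0" matches only empty files, because any non-empty file
    # rounds up to at least one block. "wc -l" counts one line per match,
    # so file names that contain newlines would be over-counted.
    find "$dir" -type f -size 0 | wc -l | tr -d ' '
}
```

After the umount and mount in any of steps 5 through 8, running count_zero_length_files /data/jlmuir should print 0 on a healthy file system; any other value means files lost their contents across the remount.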
Created attachment 107651 [details] Script used in step 4 to generate files
I'm not sure that Erik's patch is a full fix, but it certainly shows real problems around the 2^31 * 0.5Kb barrier. I'll do the full audit tonight; 2.4 and 2.6 are close enough for that, and I think I see how to make sparse catch these guys. Fixes will then be backported.
PS: there's a decent chance that 141381 is the same kind of issue. I'm not saying that we should merge them right now, but the patch for this one will definitely be worth checking in case it fixes both.
sigh... s/141381/141253/. Sorry.
Created attachment 108543 [details] Script referenced in comment
I've created an image of a file system that is corrupt and put it at http://www.imca.aps.anl.gov/~jlmuir/corrupt-fat32-fs.img.bz2 (File size: ~11MB. Bugzilla wouldn't let me include it as an attachment because it is too big). df says it is about 233GB in size with 226GB in use.

This file system was not created using the same steps in this bug report, but I suspect the problem may be the same. In this case, the script named punish-fat32-disk (included on this bug report as an attachment) was used to perform the test. Approximately 5GB of test data was generated in a directory (not on the FAT32 file system being tested) named data1 (as 4 directories where each contains 200 8MB files), then this directory was rsync'ed 36 times to the destination FAT32 file system, each time renaming the source directory by incrementing the number on the end (e.g. 2, 3, 4, etc.) so that rsync would see it as a new directory to copy to the destination FAT32 file system.

After unmounting and mounting again (without attempting a dosfsck), the directories data22 through data36 contain no files yet the file system is listed as having only 7.9GB of space left.

Here's the sfdisk output for the disk:

---
# sfdisk -l /dev/sdc

Disk /dev/sdc: 30401 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1          0+  30400   30401- 244196032    c  Win95 FAT32 (LBA)
/dev/sdc2          0       -       0          0    0  Empty
/dev/sdc3          0       -       0          0    0  Empty
/dev/sdc4          0       -       0          0    0  Empty
---

The image of the file system was created with the following command:

# bzip2 -9 -c /dev/sdc1 > /data/corrupt-fat32-fs.img.bz2

Under Windows 2000, the chkdsk program reports errors with the file system. Here's a transcript of the cmd.exe session:

---
Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\Documents and Settings\administrator> chkdsk G:
The type of the file system is FAT32.
Volume EXT_HD created 12/9/2004 10:30 AM
Volume Serial Number is 41B8-7DB6
Windows is verifying files and folders...
Windows found errors on the disk, but will not fix them
because disk checking was run without the /F (fix) parameter.
The \jlmuir\data36\data-set-0 entry contains a nonvalid link.
The \jlmuir\data36\data-set-1 entry contains a nonvalid link.
The \jlmuir\data36\data-set-2 entry contains a nonvalid link.
The \jlmuir\data36\data-set-3 entry contains a nonvalid link.
The \jlmuir\data35\data-set-0 entry contains a nonvalid link.
The \jlmuir\data35\data-set-1 entry contains a nonvalid link.
The \jlmuir\data35\data-set-2 entry contains a nonvalid link.
The \jlmuir\data35\data-set-3 entry contains a nonvalid link.
The \jlmuir\data34\data-set-0 entry contains a nonvalid link.
The \jlmuir\data34\data-set-1 entry contains a nonvalid link.
The \jlmuir\data34\data-set-2 entry contains a nonvalid link.
The \jlmuir\data34\data-set-3 entry contains a nonvalid link.
The \jlmuir\data33\data-set-0 entry contains a nonvalid link.
The \jlmuir\data33\data-set-1 entry contains a nonvalid link.
The \jlmuir\data33\data-set-2 entry contains a nonvalid link.
The \jlmuir\data33\data-set-3 entry contains a nonvalid link.
The \jlmuir\data32\data-set-0 entry contains a nonvalid link.
The \jlmuir\data32\data-set-1 entry contains a nonvalid link.
The \jlmuir\data32\data-set-2 entry contains a nonvalid link.
The \jlmuir\data32\data-set-3 entry contains a nonvalid link.
The \jlmuir\data31\data-set-0 entry contains a nonvalid link.
The \jlmuir\data31\data-set-1 entry contains a nonvalid link.
The \jlmuir\data31\data-set-2 entry contains a nonvalid link.
The \jlmuir\data31\data-set-3 entry contains a nonvalid link.
The \jlmuir\data30\data-set-0 entry contains a nonvalid link.
The \jlmuir\data30\data-set-1 entry contains a nonvalid link.
The \jlmuir\data30\data-set-2 entry contains a nonvalid link.
The \jlmuir\data30\data-set-3 entry contains a nonvalid link.
The \jlmuir\data29\data-set-0 entry contains a nonvalid link.
The \jlmuir\data29\data-set-1 entry contains a nonvalid link.
The \jlmuir\data29\data-set-2 entry contains a nonvalid link.
The \jlmuir\data29\data-set-3 entry contains a nonvalid link.
The \jlmuir\data28\data-set-0 entry contains a nonvalid link.
The \jlmuir\data28\data-set-1 entry contains a nonvalid link.
The \jlmuir\data28\data-set-2 entry contains a nonvalid link.
The \jlmuir\data28\data-set-3 entry contains a nonvalid link.
The \jlmuir\data27\data-set-0 entry contains a nonvalid link.
The \jlmuir\data27\data-set-1 entry contains a nonvalid link.
The \jlmuir\data27\data-set-2 entry contains a nonvalid link.
The \jlmuir\data27\data-set-3 entry contains a nonvalid link.
The \jlmuir\data26\data-set-0 entry contains a nonvalid link.
The \jlmuir\data26\data-set-1 entry contains a nonvalid link.
The \jlmuir\data26\data-set-2 entry contains a nonvalid link.
The \jlmuir\data26\data-set-3 entry contains a nonvalid link.
The \jlmuir\data25\data-set-0 entry contains a nonvalid link.
The \jlmuir\data25\data-set-1 entry contains a nonvalid link.
The \jlmuir\data25\data-set-2 entry contains a nonvalid link.
The \jlmuir\data25\data-set-3 entry contains a nonvalid link.
The \jlmuir\data24\data-set-0 entry contains a nonvalid link.
The \jlmuir\data24\data-set-1 entry contains a nonvalid link.
The \jlmuir\data24\data-set-2 entry contains a nonvalid link.
The \jlmuir\data24\data-set-3 entry contains a nonvalid link.
The \jlmuir\data23\data-set-0 entry contains a nonvalid link.
The \jlmuir\data23\data-set-1 entry contains a nonvalid link.
The \jlmuir\data23\data-set-2 entry contains a nonvalid link.
The \jlmuir\data23\data-set-3 entry contains a nonvalid link.
The \jlmuir\data22\data-set-0 entry contains a nonvalid link.
The \jlmuir\data22\data-set-1 entry contains a nonvalid link.
The \jlmuir\data22\data-set-2 entry contains a nonvalid link.
The \jlmuir\data22\data-set-3 entry contains a nonvalid link.
File and folder verification is complete.
Convert lost chains to files (Y/N)? n
98305920 KB of free disk space would be added.
Windows found problems with the file system.
Run CHKDSK with the /F (fix) option to correct these.

244,136,384 KB total disk space.
      3,872 KB in 121 folders.
137,625,600 KB in 16,800 files.
  8,200,960 KB are available.

     32,768 bytes in each allocation unit.
  7,629,262 total allocation units on disk.
    256,280 allocation units available on disk.

C:\Documents and Settings\administrator>
---
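
The chkdsk figures appear to be consistent with the test procedure described above, which a little shell arithmetic can confirm. This sketch only re-derives numbers that are already in the transcript. The one interpretation it adds, labeled as an assumption in the code, is that the lost chains consist of the files from the 15 emptied copies plus one allocation unit for each of their 60 data-set-* directories; chkdsk itself does not say this.

```shell
# All figures are copied from the chkdsk transcript; sizes are in KB.
cluster_kb=32          # 32,768 bytes in each allocation unit
total_units=7629262    # total allocation units on disk
free_units=256280      # allocation units available on disk
folders=121            # folders chkdsk could still reach
files=16800            # files chkdsk could still reach
files_kb=137625600     # space used by those files
lost_kb=98305920       # "KB of free disk space would be added"

# From the test description: 4 directories x 200 files per rsync'ed copy.
files_per_copy=$((4 * 200))

total_kb=$((total_units * cluster_kb))      # compare with "total disk space"
free_kb=$((free_units * cluster_kb))        # compare with "are available"
folders_kb=$((folders * cluster_kb))        # compare with "KB in 121 folders"
file_kb=$((files_kb / files))               # size of one test file
intact_copies=$((files / files_per_copy))   # copies that still have files

# Assumption (not stated by chkdsk): the lost chains are the files of the
# emptied copies plus one allocation unit per data-set-* directory in them.
lost_copies=$((36 - intact_copies))
expected_lost_kb=$((lost_copies * files_per_copy * file_kb + lost_copies * 4 * cluster_kb))

echo "total=$total_kb free=$free_kb folders=$folders_kb per_file=$file_kb"
echo "intact copies=$intact_copies lost copies=$lost_copies"
echo "lost chains: expected=$expected_lost_kb reported=$lost_kb"
```

Under that assumption the expected and reported lost-chain sizes agree, and the count of intact copies matches the observation that data22 through data36 contain no files.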
Fixes for this problem have just been committed to the RHEL3 U5 patch pool this afternoon (in kernel version 2.4.21-27.6.EL).
We've been experiencing similar problems with 250GB USB-attached drives, although unmounting and remounting isn't even required to reproduce the problem. We currently have the 2.4.21-27.0.2.ELsmp kernel. When will this new kernel be released as an rpm?
Release of RHEL3 U5 is currently scheduled for the beginning of May.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html