Red Hat Bugzilla – Bug 125646
files restored from dump are corrupt
Last modified: 2013-07-02 19:00:39 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040510
Description of problem:
When dumping only a set of files (not an entire partition), in some cases
files in the dump archive are corrupted. My colleague Don Tiessen
discovered this with dump-0.4b27-3 on Red Hat 7.3, and I confirmed it and
did some additional testing with dump-0.4b33-3 on Fedora Core 2.
The corruption is not always reproducible. Sometimes (but only
sometimes) everything goes fine; sometimes some files are corrupted.
What I did was to set up the directory /root/test and copy the current set
of official RPM updates for Fedora Core 2 into it as test files. Then I did:
# cd /root
# dump -0 -f test1.dump test
# restore -ivf test1.dump
# mv root root1
# cd root1/test
# for a in *; do diff $a /root/test/$a; done
I repeated this test three times (changing 1 to 2 and 3 in the above
commands). The first and third time I ended up with some files corrupted;
the second time everything was correctly restored. Dump and restore
did not print any error messages.
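The manual round-trip test above can be automated. The helper below is a hypothetical sketch (the paths and the dump/restore invocations are only assumptions based on this report); the comparison uses cmp so that corrupt files of identical size are still caught. The demonstration at the bottom runs the check on two throwaway directories, so it works even without dump installed; in the real test, the second directory would be the restored tree.

```shell
#!/bin/sh
# Compare every file in $1 against the file of the same name in $2
# and report any mismatch. With dump/restore this would be called as:
#   dump -0 -f test1.dump /root/test
#   restore -rf test1.dump          # into a scratch directory
#   verify_roundtrip /root/test scratch/root/test
verify_roundtrip() {
    src="$1"; restored="$2"
    for f in "$src"/*; do
        name=$(basename "$f")
        # cmp -s is silent and exits non-zero when contents differ
        if ! cmp -s "$f" "$restored/$name"; then
            echo "CORRUPT: $name"
        fi
    done
    echo "verification done"
}

# Demonstration on two throwaway directories (no dump/restore involved):
tmp=$(mktemp -d)
mkdir "$tmp/orig" "$tmp/copy"
printf 'abc' > "$tmp/orig/a"; printf 'abc' > "$tmp/copy/a"   # identical
printf 'xyz' > "$tmp/orig/b"; printf 'xyJ' > "$tmp/copy/b"   # same size, corrupt
verify_roundtrip "$tmp/orig" "$tmp/copy"   # prints "CORRUPT: b"
rm -rf "$tmp"
```

Checksums or cmp are preferable to diff here, since diff's "Binary files ... differ" output does not say where or how badly the file diverged.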
The results of one run:
Binary files ipsec-tools-0.2.5-2.i386.rpm and /root/test/ipsec-tools-0.2.5-2.i386.rpm differ
Binary files kdelibs-3.2.2-6.i386.rpm and /root/test/kdelibs-3.2.2-6.i386.rpm differ
Binary files kdelibs-devel-3.2.2-6.i386.rpm and /root/test/kdelibs-devel-3.2.2-6.i386.rpm differ
Binary files subversion-perl-1.0.2-2.1.i386.rpm and /root/test/subversion-perl-1.0.2-2.1.i386.rpm differ
The results of another run:
Binary files cups-1.1.20-11.1.i386.rpm and /root/test/cups-1.1.20-11.1.i386.rpm differ
Binary files php-ldap-4.3.6-5.i386.rpm and /root/test/php-ldap-4.3.6-5.i386.rpm differ
Binary files php-pear-4.3.6-5.i386.rpm and /root/test/php-pear-4.3.6-5.i386.rpm differ
Binary files subversion-1.0.2-2.1.i386.rpm and /root/test/subversion-1.0.2-2.1.i386.rpm differ
The corrupted files have the same size as the original files.
Then I did two additional rounds of tests.
In the second round of tests, I put the dump file on a different partition
(original files on hda1, dump on hda6). In this case I was not able
to reproduce the problem (but it might be that I wasn't trying hard
enough).
In the third round of tests, I dumped the entire partition. Again, I was not
able to reproduce the problem (but it might be that I wasn't trying hard
enough).
I'm reporting the bug with high severity, since it might result in
loss of valuable data (we all trust utilities such as dump or tar to
keep our data safe).
Version-Release number of selected component (if applicable):
dump-0.4b33-3 (originally seen with dump-0.4b27-3)
Steps to Reproduce:
1. dump -0 -f dump_file dir
2. restore -ivf dump_file
3. diff between original and restored files
Actual Results: The restored files were different from the original files.
Expected Results: The restored files should be the same as the original files.
Your trust is misplaced if you are using tar and/or dump to back up
active filesystems. dump has never promised reliable archives on active
Unix systems, and the problem is worse on Linux: because of
the lack of a raw character device, output read from the block device is
subject to change while cached by the kernel.
I'm pretty sure that's what the problem is, but I
will upgrade to -b36 and double check that the problem
exists there as well.
I know all about the dangers of backing up files on an active file
system. However, I expect that files that are not active (no processes
have them open for writing) should be backed up fine. The files that
I was trying to archive using dump were untouched by any process on
the system for a long enough time that the kernel had certainly flushed
all pages to the disk.
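Rather than relying on the kernel's writeback timers, the cached state can be pushed out explicitly before dumping. The sketch below is an assumption on my part, not a guarantee of consistency (dirty pages can reappear the moment anything writes again); the device name is a placeholder and the commands need root, so the function is only defined here, not executed.

```shell
#!/bin/sh
# Hypothetical pre-dump flush: write out dirty pages, then invalidate the
# block-device buffers so a subsequent raw read sees on-disk data.
flush_before_dump() {
    dev="$1"                      # e.g. /dev/hda1 (placeholder)
    sync                          # commit dirty pages to disk
    blockdev --flushbufs "$dev"   # drop stale block-device buffers
}

echo "flush_before_dump defined (not run)"
```

Even with this, the window between the flush and the end of the dump is unprotected, which is exactly the race being discussed in this report.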
Anyhow, in case (parts of) the files are still in the cache and not
committed to disk, shouldn't the kernel return the information from the cache?
Jeff: you are probably correct, this looks very much like an active
filesystem problem.
Aleksandar: dump bypasses the kernel cache when it accesses the raw
disk, so if the kernel hasn't flushed the data, dump will not see it.
Even worse, if the kernel has flushed only part of the data (for
example, flushed the inode metadata but not the data blocks), dump can
see an invalid filesystem.
Most of the time dumping a mounted filesystem works just fine, but
there is no guarantee, and you'd better run restore -C to verify it.
On the other hand, dumping an unmounted filesystem (or a filesystem
snapshot created by LVM/EVMS) is 100% guaranteed to be valid.
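The snapshot workflow described here can be sketched as follows. The volume-group, mount-point and archive names are hypothetical, and the function is only defined, not executed, since lvcreate and dump need root and a real volume group:

```shell
#!/bin/sh
# Hypothetical LVM snapshot dump: dump a read-only snapshot so the image
# cannot change mid-dump, then use restore -C to verify the archive.
snapshot_dump() {
    vg=vg0; lv=home                            # placeholder volume names
    lvcreate -s -L 1G -n "${lv}_snap" "/dev/$vg/$lv"
    mount -o ro "/dev/$vg/${lv}_snap" /mnt/snap
    dump -0 -f /backup/home.dump /mnt/snap     # dump the frozen view
    umount /mnt/snap
    lvremove -f "/dev/$vg/${lv}_snap"
    restore -C -f /backup/home.dump            # compare archive vs. disk
}

echo "snapshot_dump defined (not run)"
```

The snapshot gives dump a filesystem image that cannot change underneath it, which is what removes the race with the kernel cache.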
Then I guess we can mark this as NOTABUG (I'll leave it to Jeff to
do that).
Stelian, if I understood you correctly: if I do something like
raidhotremove a disk from the mirror, dump it, and raidhotadd it back, the
Linux kernel will flush all data to the disk prior to it being removed
from the RAID device? Is that documented and guaranteed behaviour? Or did
you have something else in mind?
Jeff: I released 0.4b37 today, which fixes a filesystem offset
calculation which could also lead to read errors or data corruption.
Make sure you package 0.4b37 if you decide to upgrade.
Aleksandar: no need for raidhotremove. I was talking about:
dump 0f /dev/tape /dev/whatever
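Stelian's one-liner above assumes the filesystem is offline. A hedged sketch of that workflow (device and archive paths are placeholders; the function is defined but not executed, since unmounting a live filesystem is disruptive):

```shell
#!/bin/sh
# Hypothetical offline dump: unmount first so no cache can change the
# on-disk data underneath dump, then remount via the /etc/fstab entry.
dump_unmounted() {
    dev="$1"; archive="$2"       # e.g. /dev/hda6, /dev/tape (placeholders)
    umount "$dev" || return 1    # refuse to dump a still-mounted filesystem
    dump -0 -f "$archive" "$dev"
    mount "$dev"                 # remount using the fstab entry
}

echo "dump_unmounted defined (not run)"
```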
Stelian: The workaround doesn't work in my case (or I would be doing
it in the first place, and there wouldn't be this bug report). It would
be nice if /dev/whatever could be unmounted. However, it can't, so the
backup must be done on a mounted file system. No way around it.
Booting single user or from CD is not an option either (for starters,
no physical access to console, not to mention other restrictions).
Never mind; on Linux this obviously can't be done in a safe and simple
way with minimum (application) downtime like on Solaris, using
lockfs -fw; metaoffline; lockfs -u; ufsdump; metaonline (which only
syncs the differences, if any, after metaonline and takes seconds to
complete, instead of resyncing the entire metadevice like raidhotadd,
which takes a very long time for large volumes). Time to stop typing,
I'm going too much off topic anyway...
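For what it's worth, the Solaris lockfs -fw / lockfs -u pair has a rough Linux counterpart in fsfreeze (not yet available at the time of this exchange). The mount point is a placeholder and the function is only defined, not executed:

```shell
#!/bin/sh
# Hypothetical freeze-dump-thaw, analogous to lockfs -fw ... lockfs -u:
# fsfreeze -f blocks new writes and flushes dirty data, so a dump taken
# while frozen sees a consistent image.
frozen_dump() {
    mnt="$1"; archive="$2"          # e.g. /home, /dev/tape (placeholders)
    fsfreeze -f "$mnt"              # block writes, flush to disk
    dump -0 -f "$archive" "$mnt"
    fsfreeze -u "$mnt"              # thaw the filesystem
}

echo "frozen_dump defined (not run)"
```

Applications block on writes only for the duration of the dump, which keeps downtime low, though not as low as the Solaris metaoffline trick above.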
I would consider it a kernel bug if 'raidhotremove' doesn't flush the
data to the hardware device.
Anyway, you may want to look at LVM (or EVMS) snapshots, this may be
the only way for you to run dump without problems.
Just in case, please give 0.4b37 a try too (you can download the
.src.rpm or binary rpm from dump.sf.net); a corruption fix is included
and, who knows, you may be hitting exactly this bug...
Alex, have you tested dump 0.4b37? Does the restore bug occur
again? I doubt it helps: even if Stelian tried hard to fix the problem,
dump cannot be safely used on mounted (even idle) filesystems
without the EVMS kernel patches. If you agree, I'll close this bug as
NOTABUG.
I've downloaded and installed 0.4b37. The fix Stelian made fixed
this bug too (or maybe it was the same bug). If you agree, it can be
closed as CURRENTRELEASE or NEXTRELEASE (whichever is appropriate).
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.