Bug 125646
Summary: | files restored from dump are corrupt | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Aleksandar Milivojevic <alex> |
Component: | dump | Assignee: | Jindrich Novy <jnovy> |
Status: | CLOSED ERRATA | Severity: | high |
Priority: | medium | Version: | 2 |
Hardware: | i686 | OS: | Linux |
CC: | pknirsch, stelian, wtogami | Doc Type: | Bug Fix |
Last Closed: | 2004-08-12 07:10:16 UTC | | |
Description
Aleksandar Milivojevic
2004-06-09 17:03:27 UTC
Your trust is misplaced if you are using tar and/or dump to back up active systems. dump has never promised reliable archives on active Unix systems, and the problem is worse on Linux: because of the lack of a character device, output from raw devices is subject to change while cached by the kernel. I'm pretty sure that's what the problem is, but I will upgrade to -b36 and double check that the problem exists there as well.

I know all about the dangers of backing up files on an active file system. However, I expect that files that are not active (no process has them open for writing) should be backed up fine. The files that I was trying to archive using dump had been untouched by any process on the system for long enough that the kernel had surely flushed all pages to disk. In any case, if parts of the file are still in cache and not committed to disk, shouldn't the kernel return the information from the cache?

Jeff: you are probably correct, this looks very much like an active filesystem problem.
Aleksandar: dump bypasses the kernel cache when it accesses the raw disk, so if the kernel hasn't flushed the data, dump will not see it. Even worse, if the kernel has flushed only part of the data (for example, the inode metadata but not the data blocks), dump can see an invalid filesystem. Most of the time dumping a mounted filesystem works just fine, but there is no guarantee, and you'd better run restore -C to verify it. On the other hand, a dump of an unmounted filesystem (or of a filesystem snapshot created by LVM/EVMS) is 100% guaranteed to be valid.
Stelian.

Then I guess we can mark this as NOTABUG (I'll leave it to Jeff to decide)?
Stelian, if I understood you correctly, if I do something like:
    raidhotremove
    dump
    raidhotadd
the Linux kernel will flush all data to the disk before it is removed from the RAID device? Is that documented and guaranteed behaviour? Or did you have something else in mind?
Jeff: I released 0.4b37 today, which fixes a filesystem offset calculation that could also lead to read errors or data corruption. Make sure you package 0.4b37 if you decide to upgrade.
Aleksandar: no need for raidhotremove. I was talking about:
    umount /dev/whatever
    dump 0f /dev/tape /dev/whatever
    mount /dev/whatever

Stelian: The workaround doesn't work in my case (or I would have been doing it in the first place, and there wouldn't be this bug report). It would be nice if /dev/whatever were unmountable. However, it isn't, so the backup must be done on a mounted file system. No way around it. Booting single user or from CD is not an option either (for starters, there is no physical access to the console, not to mention other restrictions). Never mind, on Linux this obviously can't be done in a safe and simple way with minimum (application) downtime, as it can on Solaris with lockfs -fw;metaoffline;lockfs -u;ufsdump;metaonline (where metaonline only syncs the differences, if any, which takes seconds to complete, rather than resyncing the entire meta device like raidhotadd, which takes a long time for large volumes). Time to stop typing, I'm going too far off topic anyway...

I would consider that if 'raidhotremove' doesn't flush the data to the hardware device then you have a kernel bug. Anyway, you may want to look at LVM (or EVMS) snapshots; this may be the only way for you to run dump without problems. Just in case, please give 0.4b37 a try too (you can download the .src.rpm or binary rpm from dump.sf.net). A corruption fix is included and, who knows, you may be hitting exactly this bug...
Stelian.

Alex, have you tested dump 0.4b37? Does the restore bug occur again? I doubt that even Stelian tried hard to fix the problem; dump cannot be safely used on mounted filesystems (not even idle ones) without the EVMS kernel patches. If you agree, I'll close this bug as NOTABUG.
Jindrich

I've downloaded and installed 0.4b37. The fix Stelian made fixed this bug too (or maybe it was the same bug). If you agree, it can be closed as CURRENTRELEASE or NEXTRELEASE (whichever is appropriate).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2005-439.html
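
For readers who can unmount the filesystem, the offline sequence quoted in the thread, followed by the restore -C check Stelian recommends, could be sketched roughly as below. This is a dry run that only prints the commands. The function name plan_offline_dump is hypothetical, the device names are the thread's placeholders, and the placement of restore -C after the remount (with -f pointing at the tape) is an assumption based on restore comparing the archive against files on disk.

```shell
#!/bin/sh
# Dry-run sketch (assumption, not part of the original report):
# print the commands for an offline dump followed by a verification pass.
# Nothing is executed; review the output before running anything as root.
plan_offline_dump() {
    fs=${1:-/dev/whatever}    # filesystem device (placeholder from the thread)
    tape=${2:-/dev/tape}      # dump target (placeholder from the thread)

    echo "umount $fs"             # take the filesystem offline first
    echo "dump 0f $tape $fs"      # level-0 dump, as quoted in the thread
    echo "mount $fs"              # bring the filesystem back
    echo "restore -C -f $tape"    # compare the archive with the files on disk
}
```

Calling plan_offline_dump with no arguments prints the four commands for the placeholder devices. Passing a device and a target as the two arguments substitutes real names.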