Bug 845233
Summary: | XFS regularly truncating files after crash/reboot | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Daire Byrne <daire.byrne> |
Component: | kernel | Assignee: | Dave Chinner <dchinner> |
Status: | CLOSED ERRATA | QA Contact: | Boris Ranto <branto> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 6.2 | CC: | cww, dchinner, dhoward, eguan, esandeen, fhirtz, fs-qe, jamesb, kernel-eus-qe, ksquizza, kzhang, npajkovs, pasteur, pds, rwheeler, yowang |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-2.6.32-328.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-02-21 06:44:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 846704, 960437, 960438 |
Description
Daire Byrne
2012-08-02 11:39:51 UTC
Hi Daire, Please open up a support ticket with RH so our support staff can help gather the needed information. In general, it is the application's duty to use fsync() or fdatasync() when it wants to have data persist over a power failure.

Hi Daire, We are looking to pull some upstream fixes back into RHEL6. This BZ will get updated with the details once that happens. Thanks for the report!

We appear to be seeing files get truncated (zero bytes) after rebooting even when the files are opened read only. This is a bit concerning... This thread describes the same thing as we're seeing: http://thread.gmane.org/gmane.comp.file-systems.xfs.general/45648

James, if you see that behavior (which I have *never* seen or heard of), please open a ticket with Red Hat support so we can debug with you. That specific thread you reference was not on RHEL (CentOS) and the SGI engineer and reporter saw it only on a specific machine/hardware type. Thanks!

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

Posted to rhkernel-list: http://post-office.corp.redhat.com/archives/rhkernel-list/2012-September/msg02828.html

*** Bug 835623 has been marked as a duplicate of this bug. ***

Patch(es) available on kernel-2.6.32-328.el6

Hi Jarod; I'm confused by this issue. We are seeing a serious 0-length file problem on XFS partitions after a system crash. These are files which were written to the disk over 18 hours before the crash and not modified since (they were programs, not data files etc.) I've been doing an hourly scan for 0-length files and in one case after the crash I found 379 new 0-length files on the system, compared to the scan before the crash! We're running RHEL 6.2.
I found, in the release notes for RHEL 6.3, a reference to Bug 856686 which seems like it might be our problem. However I can't see that bug as it's apparently marked private, so I can't be sure. The dup bug 835623 here is also private. Now I find this bug, which also sounds similar and is marked as available in 2.6.32-328 which I guess will be the kernel for RHEL 6.4? Is there any possibility of backports of this bug to the current RHEL 6.3 (at least)? I don't have access to the rhelkernel-list link above so I'm not sure how much work the fix would be. I'm wondering if XFS is simply not reliable for use in currently-released versions of Red Hat EL, and I should avoid it. Unfortunately we do a lot of formatting of very large partitions and switching back to ext4, with the orders of magnitude longer format times, would be very painful.

Hi Paul, Perhaps you should have contacted RH support as soon as you started seeing data loss problems rather than working around them. As it is, you're going to be looking for the fix to 856685, which has been available for RHEL6.2 since this errata was released: http://rhn.redhat.com/errata/RHSA-2012-1401.html It was also fixed in 6.3 at the same time. This bug was never triaged as the reporter never followed up, and so was used to close off the last known, quite rare recovery problem (reported maybe 5 times in the past 5 years!) that was solved upstream that could have resulted in zero length files. So I think the above errata kernel is what you want. If it doesn't fix your problems, then please go through the usual channels to get a new bug opened. -Dave.

Thanks.
I haven't tried any workarounds, I was obtaining tracking data with a simple cron.hourly job to search for 0-length files; I've just started seriously looking into this problem and only today did I discover it was related to XFS and system crashes (the nodes are remote and headless and I didn't realize they were crashing in the first place--we would just notice that some files were 0 length and we had no idea when or how it happened). Luckily we're still in development so no customer data lost! I'll take a look at that errata. Cheers!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0496.html

*** Bug 960641 has been marked as a duplicate of this bug. ***
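
For readers landing on this report, two points from the thread can be sketched in code: the general advice that an application should call fsync() when it needs data to survive a power failure, and the hourly scan for 0-length files mentioned by one commenter. This is only an illustrative sketch, not anything posted in the bug. The helper names `durable_write` and `find_zero_length_files` are hypothetical, the write-to-temp-then-rename pattern is a common convention assumed here rather than something the thread prescribes, and the directory fsync assumes a POSIX system such as Linux. It does not work around the kernel recovery bug itself, which the thread says is addressed by the errata kernels.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and flush them to stable storage (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the later rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffered data into the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the file data to disk
        os.replace(tmp_path, path)  # atomically put the new file in place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: POSIX. fsync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def find_zero_length_files(root):
    """Return a sorted list of regular files under root whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if (os.path.isfile(full) and not os.path.islink(full)
                        and os.path.getsize(full) == 0):
                    empty.append(full)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(empty)
```

Saving the output of `find_zero_length_files` from each run and comparing it with the previous run would show which files became empty in between, which is roughly how the "379 new 0-length files" figure in the thread could be obtained.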