845233 – XFS regularly truncating files after crash/reboot

Summary: XFS regularly truncating files after crash/reboot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.2
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Dave Chinner
QA Contact:	Boris Ranto
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	835623 960641
Depends On:
Blocks:	846704 960437 960438

Reported:	2012-08-02 11:39 UTC by Daire Byrne
Modified:	2018-12-04 14:43 UTC
CC List:	16 users
Fixed In Version:	kernel-2.6.32-328.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-02-21 06:44:51 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2013:0496	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 6 kernel update	2013-02-20 21:40:54 UTC

Description Daire Byrne 2012-08-02 11:39:51 UTC

Description of problem:

We have observed that files can be truncated (0 bytes for small files) on an XFS filesystem after a power cycle or crash. It seems like perhaps data is not being flushed to disk often enough. A search brought this patch to my attention which seems to describe the issue we are experiencing:

http://oss.sgi.com/archives/xfs/2012-06/msg00350.html

Are there plans to commit this patch to the EL6 kernel? Are there any other workarounds we can try in the meantime?

Version-Release number of selected component (if applicable):
kernel-2.6.32-220.17.1.el6.x86_64

How reproducible:
Not sure how to reproduce every time - it has happened numerous times in the last couple of weeks on our EL6 file server.

Steps to Reproduce:
1. Power cycle server not long after "writing" some small files (e.g. a source tree)
  
Actual results:
Files being truncated after log replay

Expected results:
File data should be committed to disk.

Additional info:

Comment 2 Ric Wheeler 2012-08-02 13:44:58 UTC

Hi Daire,

Please open up a support ticket with RH so our support staff can help gather the needed information.

In general, it is the application's duty to use fsync() or fdatasync() when it wants to have data persist over a power failure.
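
As an illustration of that point, here is a minimal sketch of how an application might write a small file so that it survives a power failure. It is not taken from this bug or from any patch; the helper name write_durably is hypothetical, and the temporary-file-plus-rename pattern is an assumption about what the application wants (an all-or-nothing update). It assumes a Linux system, where a directory can be opened read-only and passed to fsync().

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and return only after they have been fsync()ed.

    Hypothetical helper for illustration. The data goes to a temporary file
    in the same directory, is flushed and fsync()ed, and is then renamed over
    the target, so a crash leaves either the old file or the complete new one.
    Note that mkstemp() creates the file with mode 0600.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push the data to stable storage
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so that the rename itself is also made durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as write_durably("/srv/data/config.txt", b"contents") (the path is only an example) returns after the fsync() calls complete. Tools that copy a source tree typically do not do this for every file, which is why recently written files depend on the filesystem's own writeback and log recovery behaviour after a crash.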

Comment 7 Ric Wheeler 2012-08-03 11:03:43 UTC

Hi Daire,

We are looking to pull some upstream fixes back into RHEL6. This BZ will get updated with the details once that happens.

Thanks for the report!

Comment 8 James Braid 2012-08-15 22:59:52 UTC

We appear to be seeing files get truncated (zero bytes) after rebooting even when the files are opened read-only. This is a bit concerning...

This thread describes the same thing as we're seeing:

http://thread.gmane.org/gmane.comp.file-systems.xfs.general/45648

Comment 9 Ric Wheeler 2012-08-15 23:20:28 UTC

James, if you see that behavior (which I have *never* seen or heard of), please open a ticket with Red Hat support so we can debug with you.

That specific thread you reference was not on RHEL (CentOS) and the SGI engineer and reporter saw it only on a specific machine/hardware type.

Thanks!

Comment 10 RHEL Program Management 2012-09-20 21:11:01 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 11 Dave Chinner 2012-09-25 02:16:58 UTC

Posted to rhkernel-list:

http://post-office.corp.redhat.com/archives/rhkernel-list/2012-September/msg02828.html

Comment 12 Dave Chinner 2012-09-25 02:22:03 UTC

*** Bug 835623 has been marked as a duplicate of this bug. ***

Comment 14 Jarod Wilson 2012-10-10 20:08:56 UTC

Patch(es) available on kernel-2.6.32-328.el6

Comment 17 Paul Smith 2012-11-08 23:34:07 UTC

Hi Jarod; I'm confused by this issue.  We are seeing a serious 0-length file problem on XFS partitions after a system crash.  These are files which were written to the disk over 18 hours before the crash and not modified since (they were programs, not data files etc.)  I've been doing an hourly scan for 0-length files and in one case after the crash I found 379 new 0-length files on the system, compared to the scan before the crash!

We're running RHEL 6.2.  I found, in the release notes for RHEL 6.3, a reference to Bug 856686 which seems like it might be our problem.  However I can't see that bug as it's apparently marked private, so I can't be sure.  The dup bug 835623 here is also private.

Now I find this bug, which also sounds similar and is marked as available in 2.6.32-328 which I guess will be the kernel for RHEL 6.4?

Is there any possibility of a backport of this fix to the current RHEL 6.3 (at least)?  I don't have access to the rhkernel-list link above so I'm not sure how much work the fix would be.

I'm wondering if XFS is simply not reliable for use in currently-released versions of Red Hat EL, and I should avoid it.  Unfortunately we do a lot of formatting of very large partitions and switching back to ext4, with the orders of magnitude longer format times, would be very painful.

Comment 18 Dave Chinner 2012-11-09 00:01:32 UTC

Hi Paul,

Perhaps you should have contacted RH support as soon as you started seeing data loss problems rather than working around them. As it is, you're going to be looking for the fix to 856685, which has been available for RHEL6.2 since this errata was released:

http://rhn.redhat.com/errata/RHSA-2012-1401.html

It was also fixed in 6.3 at the same time.

This bug was never triaged as the reporter never followed up, and so was used to close off the last known, quite rare recovery problem (reported maybe 5 times in the past 5 years!) that was solved upstream that could have resulted in zero length files. So I think the above errata kernel is what you want. If it doesn't fix your problems, then please go through the usual channels to get a new bug opened.

-Dave.

Comment 19 Paul Smith 2012-11-09 00:20:53 UTC

Thanks.  I haven't tried any workarounds, I was obtaining tracking data with a simple cron.hourly job to search for 0-length files; I've just started seriously looking into this problem and only today did I discover it was related to XFS and system crashes (the nodes are remote and headless and I didn't realize they were crashing in the first place--we would just notice that some files were 0 length and we had no idea when or how it happened).  Luckily we're still in development so no customer data lost!

I'll take a look at that errata.  Cheers!
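
The hourly scan described above was not posted in this bug. A rough sketch of what such a tracking job could look like follows; the function names find_empty_files and newly_empty are hypothetical, and how the previous scan's result is stored between runs is left out.

```python
import os
import stat


def find_empty_files(root):
    """Return the set of paths of zero-length regular files under root."""
    empty = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                empty.add(path)
    return empty


def newly_empty(previous, current):
    """Return, sorted, the paths that are empty now but were not in the previous scan."""
    return sorted(current - previous)
```

Comparing the scan taken before a crash with the one taken after it, as in newly_empty(before, after), yields the list of files that became zero-length in between. Files that are legitimately empty appear in both scans and are filtered out by the set difference.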

Comment 22 errata-xmlrpc 2013-02-21 06:44:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html

Comment 27 Frank Hirtz 2013-05-09 21:36:33 UTC

*** Bug 960641 has been marked as a duplicate of this bug. ***
