Red Hat Bugzilla – Bug 70622
fsync_inode_buffers is unreliable
Last modified: 2007-11-30 17:06:51 EST
Description of Problem:
The routine fsync_inode_buffers() should write out the dirty buffers for the
specified inode, but there exists a scenario where it can return having failed
to do so:
bdflush() runs and gets its hands on a dirty buffer. It locks, cleans, and
submits it. Before the I/O completes, commit_write() dirties it again.
Now fsync_inode_buffers() runs. It sees the buffer as dirty, so it calls
ll_rw_block() on it, but its try lock fails, so it does nothing.
fsync_inode_buffers() later waits on the buffer. Then the I/O finally
completes, and fsync_inode_buffers() returns. That I/O was on a previous
version of the block -- the latest data have not necessarily reached the disk yet.
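The interleaving above can be replayed in a small userspace model. This is only a sketch: struct model_buffer and every model_* function are hypothetical stand-ins invented for illustration, not the kernel's struct buffer_head or its real routines. The model assumes a submit that silently does nothing when the buffer is already locked, as the report describes for ll_rw_block().

```c
#include <stdbool.h>

/* Hypothetical model of one buffer; NOT the kernel's struct buffer_head. */
struct model_buffer {
    bool dirty;        /* memory holds data that has not been submitted */
    bool locked;       /* a write is currently in flight */
    int  mem_version;  /* version of the data in memory */
    int  io_version;   /* version captured when the in-flight write was submitted */
    int  disk_version; /* version that has reached the disk */
};

/* Assumed try-lock submit: if the buffer is already locked, do nothing. */
static bool model_try_submit(struct model_buffer *b)
{
    if (b->locked)
        return false;
    b->locked = true;
    b->dirty = false;
    b->io_version = b->mem_version;
    return true;
}

/* I/O completion: the submitted version lands on disk and the lock drops. */
static void model_complete_io(struct model_buffer *b)
{
    b->disk_version = b->io_version;
    b->locked = false;
}

/* Stand-in for commit_write(): new data in memory, buffer marked dirty. */
static void model_commit_write(struct model_buffer *b)
{
    b->mem_version++;
    b->dirty = true;
}

/* Replays the reported sequence. Returns true if the disk is stale at the
 * point where the modelled fsync would return. */
bool model_race_leaves_stale_disk(void)
{
    struct model_buffer b = { .dirty = true, .locked = false,
                              .mem_version = 1, .io_version = 0,
                              .disk_version = 0 };

    model_try_submit(&b);     /* flusher: lock, clean, submit version 1 */
    model_commit_write(&b);   /* re-dirtied before the I/O completes: version 2 */

    if (b.dirty)              /* fsync sees the buffer as dirty ... */
        model_try_submit(&b); /* ... but the try lock fails, nothing is submitted */

    model_complete_io(&b);    /* fsync waits; the old write (version 1) finishes */

    return b.disk_version != b.mem_version;
}
```

In this model, model_race_leaves_stale_disk() returns true: the disk holds version 1 while memory holds version 2, and the buffer is still marked dirty.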
Version-Release number of selected component (if applicable):
2.4.9-e.3, e.5 and e.8
Steps to Reproduce:
1. This is a race condition. We encountered it in work on a shared/clustered
filesystem. Corruption occurred and the bug was found through inspection.
Occasionally, not all buffers will be flushed when fsync_inode_buffers returns.
In a clustered filesystem, this can cause data corruption.
On return from fsync_inode_buffers(), all dirty buffers should have been written,
at least in the form they had at the time of the call.
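One ordering that would meet this expectation is to wait for any write already in flight before submitting the buffer again. The sketch below illustrates that ordering in a toy userspace model. It is an assumption made for illustration, not a reproduction of the kernel patch; struct sketch_buf and the sketch_* functions are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical buffer model; NOT kernel code. */
struct sketch_buf {
    bool dirty;    /* memory holds unsubmitted data */
    bool locked;   /* a write is in flight */
    int  mem;      /* version in memory */
    int  inflight; /* version captured by the in-flight write */
    int  disk;     /* version on disk */
};

/* Completion of the in-flight write. */
static void sketch_complete(struct sketch_buf *b)
{
    b->disk = b->inflight;
    b->locked = false;
}

/* Hypothetical sync helper: wait out any old write first, then submit the
 * current data and wait for that write as well. */
static void sketch_sync(struct sketch_buf *b)
{
    if (b->locked)
        sketch_complete(b);   /* stands in for blocking until the old I/O ends */
    if (b->dirty) {
        b->locked = true;     /* the lock is now free, so this submit takes effect */
        b->dirty = false;
        b->inflight = b->mem;
        sketch_complete(b);   /* stands in for waiting on the new write */
    }
}

/* Starts from the problematic state (version 1 in flight, version 2 dirty in
 * memory) and checks that the disk holds at least the call-time version. */
bool sketch_sync_meets_expectation(void)
{
    struct sketch_buf b = { .dirty = true, .locked = true,
                            .mem = 2, .inflight = 1, .disk = 0 };
    int version_at_call = b.mem;

    sketch_sync(&b);

    return b.disk >= version_at_call && !b.dirty;
}
```

Here sketch_sync_meets_expectation() returns true, because the second submit can no longer be skipped on account of a stale lock.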
This causes data corruption for a shared/clustered filesystem, which needs to use
clusterwide inode locks and expects fsync_inode_buffers() to reliably write all the
FYI, the patch for this bug made it into 2.4.19.