Red Hat Bugzilla – Bug 179324
NFS: ^C on "iozone -I" causes oops
Last modified: 2010-03-16 14:19:46 EDT
Description of problem:
I just found this bug a few days ago. The patch is tested against 2.6.16-rc1
with my aio+dio patches, but should apply with little or no adjustment to
RHEL 4 U3.
The problem is that a ^C usually causes files to be closed while direct I/O
may still be outstanding on them. Either the application catches the ^C and
closes the files immediately before shutting down, or the system catches
the ^C, shuts down the application, and closes the files itself.
In that case we hit a BUG_ON in nfs_clear_inode(), which asserts that
nfsi->data_updates == 0, and the system oopses. (That assertion may not
exist in 2.6.9, but this is still a good patch to include.)
It's a racy thing to reproduce. I didn't start seeing it until I began
testing against slow JBOD servers (Solaris 10 in this case, but Linux servers
are probably also susceptible). If the I/O is slow, it's much more likely
that the file close will finish before the outstanding I/O does.
The fix is to copy what the cached path does: bump i_count before starting
the I/O, and decrement it after the I/O has finished. That prevents the
inode from being destroyed (via iput) until all direct I/O has completed.
-- Chuck Lever
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run "iozone -I" against a slow NFS server
2. Press ^C
Actual Results: System oopses
Expected Results: iozone stops
Created attachment 123863
^C against "iozone -I" is hitting the assertion in nfs_clear_inode().
Run "iozone -i0 -I -a -c" against a slow server, then press Ctrl-C. This
should not cause an oops.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
Chuck, I noticed that this patch was superseded upstream by commit
a8881f5a5c723f82da84b786d3ca83a0df9e0c33, which removes the igrab and iput. That
change might be too invasive for RHEL, and it's not even clear to me that it
addresses this issue. Comments?
Yes, a8881f5a is the correct fix for this. Trond pointed out that the i_count
is supposed to reach zero in nfs_clear_inode; bumping the i_count in the
direct I/O path was only a workaround.
What is your concern about the change's invasiveness?
Created attachment 139788
Proposed upstream patch
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time. This request will be
reviewed for a future Red Hat Enterprise Linux release.
How reproducible was this system failure?
It was two and a half years ago, so my memory is hazy. I don't recall it being difficult to hit
using the reproducer, but it was rarer with everyday applications.
Hmmm. After two days of trying, I _may_ have gotten it to fail, but
I am not sure. I will keep trying and will verify it next time.
I am using iozone, but it still doesn't happen very often.