Bug 179324 - NFS: ^C on "iozone -I" causes oops
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: high
Assigned To: Ric Wheeler
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2006-01-30 06:47 EST by Steve Dickson
Modified: 2010-03-16 14:19 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-16 14:19:46 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Purposed Patch (1018 bytes, patch)
2006-01-30 07:06 EST, Steve Dickson
Proposed upstream patch (4.05 KB, patch)
2006-10-30 19:54 EST, Steve Dickson
Description Steve Dickson 2006-01-30 06:47:29 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc3 Firefox/1.0.7

Description of problem:
i just found this bug a few days ago.  patch is tested against 2.6.16-rc1
with my aio+dio patches, but should fit with little or no adjustment on
RHEL 4 u3.

the problem is a ^C usually causes files to be closed that may have
outstanding direct I/O going on them.  either the app catches the ^C and
closes the files immediately before shutting down, or the system catches
the ^C, closes down the app and closes the files too.

in that case we hit a BUG_ON in nfs_clear_inode (which may not exist in
2.6.9, but this is still a good patch to include) which asserts that
nfsi->data_updates == 0 or else oops.

it's a racy thing to reproduce.  i didn't start seeing it until i began
testing against slow JBOD servers (Solaris 10 in this case, but Linux is
probably also susceptible).  if the I/O is slow, then it's much more
likely that the file close will finish before the outstanding I/O.

the fix is to copy what the cached path does: bump the i_count before
starting the I/O, and dec the i_count after it is finished.  that will
prevent the inode from being destroyed (via iput) until all direct I/O
is complete.

        -- Chuck Lever
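
The pinning idea described above can be sketched with a toy userspace model.
This is an illustration only, not the attached patch and not kernel code:
the Inode class, start_direct_io, complete_direct_io, and close_during_io
are hypothetical names, and the "oops" is modelled as a Python exception
standing in for the BUG_ON in nfs_clear_inode. The only things taken from
the report are the i_count reference count, the data_updates counter, and
the rule that the inode must not be cleared while data_updates is nonzero.

```python
class Inode:
    """Toy stand-in for an inode: a reference count plus an in-flight I/O counter."""

    def __init__(self):
        self.i_count = 1        # reference held by the open file
        self.data_updates = 0   # outstanding direct I/O requests
        self.destroyed = False

    def igrab(self):
        # Take an extra reference (hypothetical analogue of bumping i_count).
        self.i_count += 1

    def iput(self):
        # Drop a reference; the last drop tears the inode down.
        self.i_count -= 1
        if self.i_count == 0:
            self.clear_inode()

    def clear_inode(self):
        # Models the assertion: teardown with I/O still in flight is fatal.
        if self.data_updates != 0:
            raise RuntimeError("oops: inode cleared with direct I/O outstanding")
        self.destroyed = True


def start_direct_io(inode, pin):
    inode.data_updates += 1
    if pin:
        inode.igrab()           # hold the inode for the duration of the I/O


def complete_direct_io(inode, pin):
    inode.data_updates -= 1
    if pin:
        inode.iput()            # release the reference taken at start


def close_during_io(pin):
    """Simulate ^C closing the file while one direct I/O is still outstanding."""
    inode = Inode()
    start_direct_io(inode, pin)
    try:
        inode.iput()            # the close drops the file's reference
    except RuntimeError:
        return "oops"
    complete_direct_io(inode, pin)
    return "clean" if inode.destroyed else "leaked"


if __name__ == "__main__":
    print("without pin:", close_during_io(pin=False))
    print("with pin:   ", close_during_io(pin=True))
```

In this model the unpinned case hits the assertion as soon as the file is
closed, while the pinned case defers teardown until the I/O completes. Note
that comment 8 later describes this approach as only a workaround.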

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run iozone -I
2. Type ^C


Actual Results:  The system oopses

Expected Results:  iozone stops

Additional info:
Comment 1 Steve Dickson 2006-01-30 07:06:06 EST
Created attachment 123863 [details]
Purposed Patch

^C against "iozone -I" is hitting the assertion in nfs_clear_inode().

Test plan:
"iozone -i0 -I -a -c" against a slow server, then control C.  This should
not cause an oops.
Comment 3 RHEL Product and Program Management 2006-09-07 15:33:25 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 RHEL Product and Program Management 2006-09-07 15:33:36 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 RHEL Product and Program Management 2006-09-07 15:33:51 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Jason Baron 2006-09-13 13:33:49 EDT
Chuck, I noticed that this patch was superseded upstream by commit
a8881f5a5c723f82da84b786d3ca83a0df9e0c33, which removes the igrab and iput. That
change might be too invasive for RHEL, and it's not even clear to me that it
addresses this issue. Comments?
Comment 8 Chuck Lever 2006-10-30 19:16:09 EST
Jason-

Yes, a8881f5a is the correct fix for this.  Trond pointed out that the i_count
is supposed to be going to zero in nfs_clear_inode -- bumping the i_count in the
direct I/O path was only a workaround.

What is your concern about the change's invasiveness?
Comment 9 Steve Dickson 2006-10-30 19:54:02 EST
Created attachment 139788 [details]
Proposed upstream patch
Comment 11 RHEL Product and Program Management 2007-05-09 06:54:22 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 RHEL Product and Program Management 2007-09-07 15:45:59 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 13 Peter Staubach 2008-06-19 14:09:33 EDT
How reproducible was this system failure?
Comment 14 Chuck Lever 2008-06-20 13:08:07 EDT
It was two and a half years ago, so I don't remember clearly.  I don't
remember it being difficult to hit using the reproducer, but it was
somewhat more rare with everyday applications.
Comment 15 Peter Staubach 2008-06-20 13:47:52 EDT
Hmmm.  After two days of trying, I _may_ have gotten it to fail, but
I am not sure.  I will continue to try and to verify it this next time.
I am using iozone, but it doesn't happen very often nonetheless.