Bug 809937

Summary: Fix RPC priority queue wake up all tasks processing
Product: Red Hat Enterprise Linux 5
Reporter: Steve Dickson <steved>
Component: kernel
Assignee: nfs-maint
Status: CLOSED ERRATA
QA Contact: Eryu Guan <eguan>
Severity: high
Priority: urgent
Version: 5.9
CC: andros, bikash, ccui, cww, dhoward, eguan, kzhang, msvoboda, yanwang
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: NFS
Doc Type: Bug Fix
Doc Text:
The RPC task scheduler did not handle RPC priority wait queues correctly. Consequently, it failed to wake up all waiting tasks after an RPC timeout, which could cause the system to become unresponsive and significantly decrease system performance. This update modifies the RPC task scheduler to handle RPC priority wait queues as expected. All waiting tasks are now properly woken up after an RPC timeout.
Clone Of: 809928
Last Closed: 2013-01-08 04:51:21 UTC
Type: Bug
Bug Depends On: 809928    
Bug Blocks: 817569, 817570, 817571    

Description Steve Dickson 2012-04-04 17:18:02 UTC
+++ This bug was initially created as a clone of Bug #809928 +++

Description of problem: The for loop built on the list macro list_for_each_entry_safe() fails to enumerate RPC priority wait queue tasks stored on the tk_wait.links lists, so rpc_wake_up() and rpc_wake_up_status() fail to wake up all tasks. Waking a task on a priority queue promotes the next task from its tk_wait.links list into the queue, but the _safe iterator has already cached its next pointer and skips the promoted task. This results in nasty hangs and poor performance. The NFSv4.1 session slot table wait queue implementation uses RPC priority queues heavily, so NFSv4.1 is especially affected.
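
To make the failure mode concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not the kernel code: the list helpers, struct task, and the functions enqueue, wake_task, wake_all_buggy, wake_all_fixed and run are hypothetical stand-ins. It assumes, as in net/sunrpc/sched.c at the time, that a task whose owner already has a task on the queue waits on that task's tk_wait.links list, and that dequeuing the queued task promotes its first follower into the queue directly after it. A loop that caches the next pointer before each wakeup then steps past the promoted task; the fixed loop keeps taking the first entry until the queue is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Link n between prev and next. */
static void list_insert(struct list_head *n, struct list_head *prev, struct list_head *next)
{
	next->prev = n; n->next = next; n->prev = prev; prev->next = n;
}

static void list_add_tail(struct list_head *n, struct list_head *h) { list_insert(n, h->prev, h); }

/* Unlink e and leave it self-linked, like list_del_init(). */
static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next; e->next->prev = e->prev; list_init(e);
}

struct task {
	int owner;              /* stands in for tk_owner (assumption) */
	int woken;
	struct list_head list;  /* stands in for tk_wait.list */
	struct list_head links; /* stands in for tk_wait.links (same-owner followers) */
};

#define task_of(p) ((struct task *)((char *)(p) - offsetof(struct task, list)))

/* A task whose owner already has a task queued waits on that task's links. */
static void enqueue(struct list_head *q, struct task *t)
{
	list_init(&t->links);
	t->woken = 0;
	for (struct list_head *p = q->next; p != q; p = p->next) {
		if (task_of(p)->owner == t->owner) {
			list_add_tail(&t->list, &task_of(p)->links);
			return;
		}
	}
	list_add_tail(&t->list, q);
}

/* Dequeue t; its first follower is promoted into the queue right after t. */
static void wake_task(struct task *t)
{
	assert(!t->woken);
	if (!list_empty(&t->links)) {
		struct task *f = task_of(t->links.next);
		list_unlink(&f->list);
		list_insert(&f->list, &t->list, t->list.next);
		while (!list_empty(&t->links)) { /* remaining followers move to f */
			struct list_head *e = t->links.next;
			list_unlink(e);
			list_add_tail(e, &f->links);
		}
	}
	list_unlink(&t->list);
	t->woken = 1;
}

/* Pre-fix pattern: n is cached before waking p, as list_for_each_entry_safe() does. */
static void wake_all_buggy(struct list_head *q)
{
	for (struct list_head *p = q->next, *n = p->next; p != q; p = n, n = p->next)
		wake_task(task_of(p));
}

/* Post-fix pattern: re-read the first entry until the queue is empty. */
static void wake_all_fixed(struct list_head *q)
{
	while (!list_empty(q))
		wake_task(task_of(q->next));
}

/* Queue n tasks (task i has owner i % owners), run one wake-all pass, count wakeups. */
static int run(int n, int owners, int fixed)
{
	struct task tasks[16];
	struct list_head q;
	int woken = 0;

	assert(n > 0 && n <= 16 && owners > 0);
	list_init(&q);
	for (int i = 0; i < n; i++) {
		tasks[i].owner = i % owners;
		enqueue(&q, &tasks[i]);
	}
	(fixed ? wake_all_fixed : wake_all_buggy)(&q);
	for (int i = 0; i < n; i++)
		woken += tasks[i].woken;
	return woken;
}
```

With six tasks from a single owner, run(6, 1, 0) returns 1 (only the queued task is woken in one pass) while run(6, 1, 1) returns 6; with two owners, run(6, 2, 0) returns 2. In this model each unfixed pass wakes only one task per owner group, which is consistent with the report that rpc_wake_up() leaves most waiters asleep.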

I noticed the bug while investigating Bug 756212 ("Redirecting I/O through the MDS after a data server network partition is very slow"). We are currently running NFSv4.1 performance tests on RHEL 6.3 with and without the fix, because we suspect this bug is responsible for the poor performance results and for the wide standard deviation between test runs.


Version-Release number of selected component (if applicable): All versions of RHEL.


How reproducible:
100% with the newly submitted NFSv4.1 file layout data server quick-failover patch set.

Steps to Reproduce:
1. Start a large I/O workload on an NFSv4.1 pNFS mount.
2. Network-partition a data server that is receiving a large amount of data.
3. Do not reconnect the data server.

The client gets a data server connection error, resets the failed RPC to go to the MDS, and marks the pNFS deviceid as bad. This causes RPC tasks passing through the rpc_call_prepare state to go to the MDS instead of using pNFS. rpc_wake_up() is then called to drain (wake up) all RPC tasks waiting for a session slot on the failed data server session's fore channel slot table wait queue.


  
Actual results:

We wait out the RPC timeout for all in-flight RPCs to fail and be redirected. However, without the fix rpc_wake_up() is broken and wakes up only one priority-queue task per call, so at most as many waiting RPC tasks as there are session slots get processed, and the application hangs.


Expected results:

We wait out the RPC timeout for all in-flight RPCs to fail; all RPC tasks on the slot table wait queue then immediately wake up and are redirected to the MDS. The application succeeds.


Additional info:

We are running other tests and should get a new reproducer that doesn't depend on a new patch set.

Here is the message from Trond:

Date: Mon, 19 Mar 2012 21:29:32 +0000
From: Myklebust, Trond <Trond.Myklebust>
To: Steve Dickson <SteveD>
CC: Adamson, Andy <William.Adamson>

Steve,

This bug probably explains a good chunk of the random hangs that we've
been seeing (particularly on connection losses etc) in the past few
years. Please queue it up for _all_ versions of RHEL asap.

Kudos to Andy for noticing the problem and working out the bug!

Cheers
Trond

Comment 1 Steve Dickson 2012-04-04 17:21:46 UTC
*** Bug 809502 has been marked as a duplicate of this bug. ***

Comment 2 RHEL Program Management 2012-04-09 17:19:39 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 8 Miroslav Svoboda 2012-07-11 09:42:41 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A process scheduler did not handle RPC priority wait queues correctly. Consequently, the process scheduler failed to wake up all scheduled tasks as expected after RPC timeout, which caused the system to become unresponsive and could significantly decrease system performance. This update modifies the process scheduler to handle RPC priority wait queues as expected. All scheduled tasks are now properly woken up after RPC timeout and the system behaves as expected.

Comment 11 errata-xmlrpc 2013-01-08 04:51:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0006.html