Bug 661730

Summary: NFS4 clients cannot reclaim locks after server reboot [rhel-6.0.z]
Product: Red Hat Enterprise Linux 6
Reporter: RHEL Program Management <pm-rhel>
Component: kernel
Assignee: Frantisek Hrbata <fhrbata>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 6.0
CC: bfields, dhoward, jlayton, jmalanik, pbenas, pm-eus, rwheeler, sprabhu, steved, tscofield, yanwang
Target Milestone: rc
Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.32-71.16.1.el6
Doc Type: Bug Fix
Doc Text:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation that resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by improper use of the state flags. While investigating this bug, a different bug was discovered in the state recovery operation, which resulted in a reclaim thread looping in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.
Last Closed: 2011-02-22 17:40:02 UTC
Type: ---
Bug Depends On: 638269    
Bug Blocks:    
Attachments:
Description                              Flags
tcpdump on kernel-2.6.32-71.18.1.el6     none
full log on kernel 2.6.32-71.18.1.el6    none

Description RHEL Program Management 2010-12-09 14:35:25 UTC
This bug has been copied from bug #638269 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 10 yanfu,wang 2011-02-12 08:26:03 UTC
Created attachment 478366 [details]
tcpdump on kernel-2.6.32-71.18.1.el6

Comment 11 yanfu,wang 2011-02-12 08:28:47 UTC
Hi,
I am hitting the same problem as in comment #5: the reproducer process continued writing after the server reboot. What do you think of this test result?
I've attached the tcpdump log above for reference. Thanks.

Comment 17 yanfu,wang 2011-02-16 07:33:42 UTC
Created attachment 479031 [details]
full log on kernel 2.6.32-71.18.1.el6

Comment 18 yanfu,wang 2011-02-16 07:37:46 UTC
Using the test steps above and running "virsh destroy rhel6.0; virsh start rhel6.0" to crash the NFS server, I got the following results:
 57.141223 10.16.42.210 -> 10.66.65.95  NFS [RPC retransmission of #394][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 60.717421 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 61.005714  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 740) <EMPTY> PUTFH;WRITE WRITE(10023)
 61.005837 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> RENEW RENEW
 61.289059  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 745) <EMPTY> RENEW RENEW(10022)
 61.289117 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID SETCLIENTID
 61.572563  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 752) <EMPTY> SETCLIENTID SETCLIENTID
 61.572591 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856146  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 756) <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856182 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.137404  10.66.65.95 -> 10.16.42.210 NFS V1 CB_NULL Call
 62.137444 10.16.42.210 -> 10.66.65.95  NFS V1 CB_NULL Reply (Call In 764)
 62.142706  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 760) <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.142760 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;LOCK LOCK
 62.427752  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 768) <EMPTY> PUTFH;LOCK LOCK
 62.427803 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 62.711675  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 777) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 62.711751 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 63.007462  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 779) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 64.007684 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 64.291681  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 791) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 64.291754 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 64.587636  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 798) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 65.587782 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 65.870717  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 811) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 65.870786 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
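
To make the trace above easier to read, here is a minimal Python sketch that pulls the status codes and the order of client calls out of tshark summary lines like these. The codes 10023 and 10022 are NFS4ERR_STALE_STATEID and NFS4ERR_STALE_CLIENTID in RFC 3530. The helper names (parse_line, recovery_sequence, FRAMING_OPS, EXPECTED_RECOVERY) are made up for illustration, and the embedded lines are copied from the trace. Note that the summary lines do not show whether the OPEN and LOCK calls carry reclaim flags, so this only checks ordering, not that the locks were actually reclaimed.

```python
import re

# NFSv4 status codes from RFC 3530 (only the two that appear in the trace).
NFS4_STATUS = {
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
}

# Assumption: the call order we hope to see after the stale error,
# read off the trace itself rather than taken from the kernel source.
EXPECTED_RECOVERY = ["RENEW", "SETCLIENTID", "SETCLIENTID_CONFIRM", "OPEN", "LOCK"]

# Operations that only frame a compound and are not the call of interest.
FRAMING_OPS = {"PUTFH", "PUTROOTFH", "GETATTR"}

TRACE = """\
 61.005714  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 740) <EMPTY> PUTFH;WRITE WRITE(10023)
 61.005837 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> RENEW RENEW
 61.289059  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 745) <EMPTY> RENEW RENEW(10022)
 61.289117 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID SETCLIENTID
 61.572563  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 752) <EMPTY> SETCLIENTID SETCLIENTID
 61.572591 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856146  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 756) <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856182 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.137404  10.66.65.95 -> 10.16.42.210 NFS V1 CB_NULL Call
 62.137444 10.16.42.210 -> 10.66.65.95  NFS V1 CB_NULL Reply (Call In 764)
 62.142706  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 760) <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.142760 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;LOCK LOCK
 62.427752  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 768) <EMPTY> PUTFH;LOCK LOCK
 62.427803 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
"""


def parse_line(line):
    """Return (kind, ops, errors) for a V4 COMPOUND summary line, else None."""
    m = re.search(r"V4 COMP (Call|Reply).*?<EMPTY> (.*)$", line.strip())
    if not m:
        return None  # e.g. the CB_NULL callback probes
    kind, rest = m.groups()
    ops, errors = [], {}
    # tshark repeats each op name; tokens are separated by ';' or spaces.
    for token in re.split(r"[;\s]+", rest.strip()):
        tm = re.fullmatch(r"(\w+)(?:\((\d+)\))?", token)
        if not tm:
            continue
        name, code = tm.groups()
        if name not in ops:
            ops.append(name)
        if code:
            errors[name] = int(code)
    return kind, ops, errors


def recovery_sequence(lines):
    """Collect stale errors and the client calls issued after the first one."""
    stale, calls = [], []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        kind, ops, errors = parsed
        if kind == "Reply":
            for op, code in errors.items():
                if code in NFS4_STATUS:
                    stale.append((op, NFS4_STATUS[code]))
        elif stale:
            main = [op for op in ops if op not in FRAMING_OPS]
            if main:
                calls.append(main[0])
    return stale, calls


stale, calls = recovery_sequence(TRACE.splitlines())
print("stale errors:", stale)
print("calls after first stale error:", calls)
print("matches expected recovery order:", calls[:5] == EXPECTED_RECOVERY)
```

For this excerpt, the calls that follow the stale WRITE reply are RENEW, SETCLIENTID, SETCLIENTID_CONFIRM, OPEN and LOCK, and only then does the client resume WRITE. To confirm that the OPEN and LOCK are real reclaims, the full capture would need to be inspected for the claim type and reclaim fields, which the one-line summaries omit.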

Comment 19 errata-xmlrpc 2011-02-22 17:40:02 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0283.html

Comment 20 Martin Prpič 2011-02-23 15:06:07 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation that resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by improper use of the state flags. While investigating this bug, a different bug was discovered in the state recovery operation, which resulted in a reclaim thread looping in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.