Bug 661730 - NFS4 clients cannot reclaim locks after server reboot [rhel-6.0.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Frantisek Hrbata
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 638269
Blocks:
Reported: 2010-12-09 14:35 UTC by RHEL Program Management
Modified: 2018-11-14 16:47 UTC (History)

Fixed In Version: kernel-2.6.32-71.16.1.el6
Doc Type: Bug Fix
Doc Text:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation that resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by improper use of the state flags. While investigating this bug, a different bug was discovered in the state recovery operation, which resulted in a reclaim thread looping in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.
Clone Of:
Environment:
Last Closed: 2011-02-22 17:40:02 UTC
Target Upstream Version:


Attachments
tcpdump on kernel-2.6.32-71.18.1.el6 (1.03 MB, application/octet-stream)
2011-02-12 08:26 UTC, yanfu,wang
full log on kernel 2.6.32-71.18.1.el6 (28.00 KB, application/x-troff-man)
2011-02-16 07:33 UTC, yanfu,wang


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0283 normal SHIPPED_LIVE Moderate: kernel security, bug fix, and enhancement update 2011-02-22 17:38:22 UTC

Description RHEL Program Management 2010-12-09 14:35:25 UTC
This bug has been copied from bug #638269 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 10 yanfu,wang 2011-02-12 08:26:03 UTC
Created attachment 478366
tcpdump on kernel-2.6.32-71.18.1.el6

Comment 11 yanfu,wang 2011-02-12 08:28:47 UTC
Hi,
I encountered the same problem as in comment #5: the reproducer process continued writing after the server reboot. What do you think about this test result?
I've attached the tcpdump log above for reference. Thanks.

Comment 17 yanfu,wang 2011-02-16 07:33:42 UTC
Created attachment 479031
full log on kernel 2.6.32-71.18.1.el6

Comment 18 yanfu,wang 2011-02-16 07:37:46 UTC
Using the test steps above and running "# virsh destroy rhel6.0; virsh start rhel6.0" to crash the NFS server, I got the following results:
 57.141223 10.16.42.210 -> 10.66.65.95  NFS [RPC retransmission of #394][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 60.717421 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 61.005714  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 740) <EMPTY> PUTFH;WRITE WRITE(10023)
 61.005837 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> RENEW RENEW
 61.289059  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 745) <EMPTY> RENEW RENEW(10022)
 61.289117 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID SETCLIENTID
 61.572563  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 752) <EMPTY> SETCLIENTID SETCLIENTID
 61.572591 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856146  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 756) <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856182 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.137404  10.66.65.95 -> 10.16.42.210 NFS V1 CB_NULL Call
 62.137444 10.16.42.210 -> 10.66.65.95  NFS V1 CB_NULL Reply (Call In 764)
 62.142706  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 760) <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.142760 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;LOCK LOCK
 62.427752  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 768) <EMPTY> PUTFH;LOCK LOCK
 62.427803 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 62.711675  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 777) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 62.711751 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 63.007462  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 779) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 64.007684 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 64.291681  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 791) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 64.291754 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 64.587636  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 798) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 65.587782 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 65.870717  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 811) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 65.870786 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
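
The numbers in parentheses in the replies above are NFSv4 status codes. In RFC 3530, 10023 is NFS4ERR_STALE_STATEID and 10022 is NFS4ERR_STALE_CLIENTID, so the WRITE reply at 61.005714 and the RENEW reply at 61.289059 both indicate that the server no longer holds the pre-reboot state. A minimal Python sketch for pulling such codes out of capture lines follows. The helper name failed_op and the regular expression are assumptions made for illustration, based only on the line format shown in this comment.

```python
import re

# Status codes as defined in RFC 3530 (only the two seen in this trace).
NFS4_STATUS = {
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
}

# Assumed line format: a failing reply ends with "<OPERATION>(<code>)".
_STATUS_RE = re.compile(r"(\w+)\((\d+)\)\s*$")


def failed_op(line):
    """Return (operation, code, name) for a reply line that ends with an
    error code, or None for calls and for replies without a code."""
    if "Reply" not in line:
        return None
    match = _STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), code, NFS4_STATUS.get(code, "unknown")


if __name__ == "__main__":
    sample = [
        " 61.005714  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 740) <EMPTY> PUTFH;WRITE WRITE(10023)",
        " 61.289059  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 745) <EMPTY> RENEW RENEW(10022)",
        " 62.427752  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 768) <EMPTY> PUTFH;LOCK LOCK",
    ]
    for entry in sample:
        print(failed_op(entry))
```

Only the two codes that appear in this capture are listed; any other code is reported as "unknown" rather than guessed.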

Comment 19 errata-xmlrpc 2011-02-22 17:40:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0283.html

Comment 20 Martin Prpič 2011-02-23 15:06:07 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation that resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by improper use of the state flags. While investigating this bug, a different bug was discovered in the state recovery operation, which resulted in a reclaim thread looping in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.
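
As a rough illustration of the behavior the note describes (this is not the kernel code), the Python sketch below checks whether a list of client calls contains the recovery steps in order: RENEW, SETCLIENTID, SETCLIENTID_CONFIRM, a reclaiming OPEN, a reclaiming LOCK, and only then further WRITEs. The function name and the expected-order list are assumptions, derived from the capture in comment 18 rather than from the patch itself.

```python
# Expected order of client calls after a stale-stateid error. This list is
# an assumption taken from the capture in comment 18, not from kernel sources.
EXPECTED_RECOVERY = [
    "RENEW",
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN",
    "LOCK",
    "WRITE",
]


def follows_recovery_order(ops, expected=EXPECTED_RECOVERY):
    """Return True if every step in `expected` appears in `ops` in order
    (other operations may be interleaved between the steps)."""
    remaining = iter(ops)
    # `step in remaining` advances the iterator until it finds `step`,
    # so each later step is only searched for after the earlier one.
    return all(step in remaining for step in expected)


if __name__ == "__main__":
    observed = ["WRITE", "RENEW", "SETCLIENTID", "SETCLIENTID_CONFIRM",
                "OPEN", "LOCK", "WRITE", "COMMIT"]
    print(follows_recovery_order(observed))
```

A sequence in which the client keeps issuing WRITEs without the reclaiming OPEN and LOCK would fail this check, which is the symptom the bug title describes.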

