Bug 661730

Summary:

NFS4 clients cannot reclaim locks after server reboot [rhel-6.0.z]

Product:

Red Hat Enterprise Linux 6

Reporter:

RHEL Program Management <pm-rhel>

Component:

kernel

Assignee:

Frantisek Hrbata <fhrbata>

Status:

CLOSED ERRATA

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

high

Docs Contact:

Priority:

high

Version:

6.0

CC:

bfields, dhoward, jlayton, jmalanik, pbenas, pm-eus, rwheeler, sprabhu, steved, tscofield, yanwang

Target Milestone:

Keywords:

ZStream

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

kernel-2.6.32-71.16.1.el6

Doc Type:

Bug Fix

Doc Text:

The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation that resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by improper use of the state flags. While investigating this bug, a different bug was discovered in the state recovery operation, which resulted in a reclaim thread looping in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2011-02-22 17:40:02 UTC

Bug Depends On:

638269

Bug Blocks:

Attachments:

Description	Flags
tcpdump on kernel-2.6.32-71.18.1.el6	none
full log on kernel 2.6.32-71.18.1.el6	none

Description RHEL Program Management 2010-12-09 14:35:25 UTC

This bug has been copied from bug #638269 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 10 yanfu,wang 2011-02-12 08:26:03 UTC

Created attachment 478366 [details]
tcpdump on kernel-2.6.32-71.18.1.el6

Comment 11 yanfu,wang 2011-02-12 08:28:47 UTC

Hi,
I encountered the same problem as in comment #5: the reproducer process continued writing after the server reboot. What do you think about this test result?
I've attached the tcpdump log above; please refer to it. Thanks.
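
For readers without access to the original reproducer (it is attached to bug #638269 and not shown here), the following is a minimal sketch of what such a process might look like. It is an assumption, not the test actually used: the function name locked_writer and its parameters are hypothetical. It takes a POSIX lock on a file and then writes and syncs in a loop, which on an NFSv4 mount would be expected to produce a LOCK followed by repeated WRITE and COMMIT calls.

```python
import fcntl
import os
import time


def locked_writer(path, iterations, interval=1.0):
    """Hold an exclusive lock on `path` and append one record per iteration.

    Hypothetical stand-in for the reproducer; returns the number of
    records written.
    """
    written = 0
    with open(path, "ab") as f:
        # Exclusive POSIX lock over the whole file; on an NFSv4 mount this
        # is the lock the client has to reclaim after a server reboot.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            for i in range(iterations):
                f.write(f"write {i}\n".encode())
                f.flush()
                # Force the data out to the server instead of leaving it
                # in the client's page cache.
                os.fsync(f.fileno())
                written += 1
                time.sleep(interval)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return written
```

To use it, point the path at a file on the NFSv4 mount (for example a placeholder such as /mnt/nfs4/lockfile), start the loop with a large iteration count, and reboot the server while it runs. That the process keeps writing after the reboot does not show by itself that the lock was reclaimed; the network trace has to be checked for that.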

Comment 17 yanfu,wang 2011-02-16 07:33:42 UTC

Created attachment 479031 [details]
full log on kernel 2.6.32-71.18.1.el6

Comment 18 yanfu,wang 2011-02-16 07:37:46 UTC

Using the above test steps and running "# virsh destroy rhel6.0; virsh start rhel6.0" to crash the NFS server, I got the following results:
 57.141223 10.16.42.210 -> 10.66.65.95  NFS [RPC retransmission of #394][TCP Retransmission] V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 60.717421 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 61.005714  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 740) <EMPTY> PUTFH;WRITE WRITE(10023)
 61.005837 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> RENEW RENEW
 61.289059  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 745) <EMPTY> RENEW RENEW(10022)
 61.289117 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID SETCLIENTID
 61.572563  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 752) <EMPTY> SETCLIENTID SETCLIENTID
 61.572591 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856146  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 756) <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 61.856182 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.137404  10.66.65.95 -> 10.16.42.210 NFS V1 CB_NULL Call
 62.137444 10.16.42.210 -> 10.66.65.95  NFS V1 CB_NULL Reply (Call In 764)
 62.142706  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 760) <EMPTY> PUTFH;OPEN OPEN;GETATTR GETATTR
 62.142760 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;LOCK LOCK
 62.427752  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 768) <EMPTY> PUTFH;LOCK LOCK
 62.427803 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 62.711675  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 777) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 62.711751 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 63.007462  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 779) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 64.007684 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 64.291681  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 791) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 64.291754 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 64.587636  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 798) <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
 65.587782 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 65.870717  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 811) <EMPTY> PUTFH;WRITE WRITE;GETATTR GETATTR
 65.870786 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;COMMIT COMMIT;GETATTR GETATTR
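
The trace above can also be checked mechanically. The sketch below is a hedged illustration rather than part of the original test: the helper names parse_line and lock_follows_confirm are hypothetical, and the parser only understands the one-line summary format shown in this comment. The two status codes are the NFSv4 protocol values 10022 (NFS4ERR_STALE_CLIENTID) and 10023 (NFS4ERR_STALE_STATEID). The sample lines are copied from the trace.

```python
import re

# Status codes from the NFSv4 specification that appear in the trace.
NFS4_STATUS = {
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
}

# time, source, destination, then the NFS summary text.
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\S+)\s+->\s+(\S+)\s+NFS\s+(.*)$")
# An operation name immediately followed by a numeric status, e.g. WRITE(10023).
STATUS_RE = re.compile(r"\b([A-Z_]+)\((\d+)\)")


def parse_line(line):
    """Parse one summary line; return None if it is not in the expected format."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    timestamp, src, dst, summary = m.groups()
    kind = "reply" if " Reply" in summary else "call"
    # Operations are listed after "<EMPTY> " as "OP OP;OP OP;...".
    _, found, tail = summary.partition("<EMPTY> ")
    ops = []
    if found:
        ops = [part.split()[0].split("(")[0] for part in tail.split(";") if part.strip()]
    errors = [
        (op, int(code), NFS4_STATUS.get(int(code), "unknown"))
        for op, code in STATUS_RE.findall(summary)
    ]
    return {"time": float(timestamp), "src": src, "dst": dst,
            "kind": kind, "ops": ops, "errors": errors}


def lock_follows_confirm(records):
    """True if some call containing LOCK is sent after SETCLIENTID_CONFIRM."""
    calls = [r["ops"] for r in records if r["kind"] == "call"]
    for i, ops in enumerate(calls):
        if "SETCLIENTID_CONFIRM" in ops:
            return any("LOCK" in later for later in calls[i + 1:])
    return False


SAMPLE = """\
 61.005714  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 740) <EMPTY> PUTFH;WRITE WRITE(10023)
 61.289059  10.66.65.95 -> 10.16.42.210 NFS V4 COMP Reply (Call In 745) <EMPTY> RENEW RENEW(10022)
 61.572591 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> SETCLIENTID_CONFIRM SETCLIENTID_CONFIRM;PUTROOTFH PUTROOTFH;GETATTR GETATTR
 62.142760 10.16.42.210 -> 10.66.65.95  NFS V4 COMP Call <EMPTY> PUTFH;LOCK LOCK
"""

records = [r for r in map(parse_line, SAMPLE.splitlines()) if r is not None]
for r in records:
    for op, code, name in r["errors"]:
        print(f"{r['time']:.6f} {op} failed with {code} ({name})")
print("LOCK sent after SETCLIENTID_CONFIRM:", lock_follows_confirm(records))
```

The summary lines do not show whether the LOCK call carries the reclaim flag, so this check only confirms the ordering of operations. Confirming an actual reclaim requires the full packet decode in the attached capture.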

Comment 19 errata-xmlrpc 2011-02-22 17:40:02 UTC

An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0283.html

Comment 20 Martin Prpič 2011-02-23 15:06:07 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The lock reclaim operation on a Red Hat Enterprise Linux 6 NFSv4 client did not work properly when, after a server reboot, an I/O operation that resulted in a STALE_STATEID response was performed before the RENEW call was sent to the server. This behavior was caused by improper use of the state flags. While investigating this bug, a different bug was discovered in the state recovery operation, which resulted in a reclaim thread looping in the nfs4_reclaim_open_state() function. With this update, both operations have been fixed and work as expected.
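
The recovery path described in this note can be pictured with a small toy model. This is a sketch under stated assumptions, not the kernel's implementation: the class ToyClient, its flag names, and its method names are all hypothetical. Only the operation names and the two status codes (10022 = NFS4ERR_STALE_CLIENTID, 10023 = NFS4ERR_STALE_STATEID) come from the NFSv4 protocol. It illustrates the intended outcome: when an I/O reply is the first sign of the server reboot, the client should still go on to renew, re-establish its client ID, and reclaim its opens and locks.

```python
NFS4ERR_STALE_CLIENTID = 10022
NFS4ERR_STALE_STATEID = 10023


class ToyClient:
    """Hypothetical model of NFSv4 client reboot recovery (not kernel code)."""

    def __init__(self, open_files, locked_files):
        self.open_files = list(open_files)
        self.locked_files = list(locked_files)
        self.flags = set()  # illustrative flag names, not the kernel's
        self.sent = []      # operations the model "sends", in order

    def on_io_reply(self, status):
        # An I/O reply can be the first sign that the server rebooted,
        # arriving before any RENEW has been sent.
        if status == NFS4ERR_STALE_STATEID:
            self.flags.add("CHECK_LEASE")

    def run_recovery(self, renew_status):
        if "CHECK_LEASE" in self.flags:
            self.flags.discard("CHECK_LEASE")
            self.sent.append("RENEW")
            if renew_status == NFS4ERR_STALE_CLIENTID:
                # The server no longer knows this client: reclaim is needed.
                self.flags.add("RECLAIM_REBOOT")
        if "RECLAIM_REBOOT" in self.flags:
            self.flags.discard("RECLAIM_REBOOT")
            self.sent += ["SETCLIENTID", "SETCLIENTID_CONFIRM"]
            self.sent += [f"OPEN {name}" for name in self.open_files]
            self.sent += [f"LOCK {name}" for name in self.locked_files]
        return self.sent


client = ToyClient(["testfile"], ["testfile"])
client.on_io_reply(NFS4ERR_STALE_STATEID)
print(client.run_recovery(NFS4ERR_STALE_CLIENTID))
# prints ['RENEW', 'SETCLIENTID', 'SETCLIENTID_CONFIRM', 'OPEN testfile', 'LOCK testfile']
```

The order of operations produced by the model matches the order seen in the trace in comment 18: WRITE fails with 10023, RENEW fails with 10022, and the client then sends SETCLIENTID, SETCLIENTID_CONFIRM, OPEN, and LOCK.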