Bug 1733347

Summary: NFSv4.0 client sending a double CLOSE (leading to EIO application failure)
Product: Red Hat Enterprise Linux 7 Reporter: Olga Kornieskaia <kolga>
Component: kernelAssignee: Steve Dickson <steved>
kernel sub component: NFS QA Contact: Zhi Li <yieli>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: high CC: bfields, dhoward, dwysocha, kolga, mora, steved, svmichel, xzhou, yieli, yoyang
Version: 7.6Keywords: Reproducer, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-3.10.0-1080.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1744945 1744946 (view as bug list) Environment:
Last Closed: 2020-03-31 19:26:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1744945, 1744946    

Description Olga Kornieskaia 2019-07-25 19:24:26 UTC
Description of problem:

An NFS4.0 client that gets write delegations (thus Netapp server is probably required) sends a double close given the following set of operations:
fd1 = open(foo, RDWR) (will have to make sure a write delegation was granted)
fd2 = open(foo, RDONLY)
close(fd1)
close(fd2)  (a double close can be observed on the network trace)

This is problematic because it leads to open_owner sequencing problems between the client and server which can later on lead to application failing with EIO if application was using that open owner to do open_owner to lock_owner during sending a lock request.

This problem is in 7.5, 7.6, 7.7 (maybe early than 7.5 but I didn't check). Also in kernels 4.14 or lower.
This is not a problem in RHEL8 or in kernel 4.15 or higher.

Trond put in a number of open/close races for the 4.15:
      NFSv4: Fix OPEN / CLOSE race
      NFSv4: Add a tracepoint to document open stateid updates
      NFSv4: Fix open create exclusive when the server reboots
      NFS: Fix a typo in nfs_rename()
      NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.
      NFSv4: Don't try to CLOSE if the stateid 'other' field has changed
      pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close
      NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn
      (**) NFSv4: cleanup nfs4_close_done
      (**) NFSv4: Clean up nfs4_delegreturn_done
      NFSv4: Check the open stateid when searching for expired state
      NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state
      NFSv4: Replace closed stateids with the "invalid special stateid"

All but the (**) patches apply cleanly to the RHEL7.6 kernel tree. I think all of the patches should be consider. However, the patch that fixes the double close is
" NFSv4: Replace closed stateids with the "invalid special stateid"" commit fcd8843c406b46433857ae45e5e9d84b01a7d20b. At minimum this patch is needed to fix the problem


An application that we used to trigger an EIO (though not necessarily every time) is nfstest_io:
[mora@netapp-17 test]$ sudo mount -o vers=4.0 192.168.0.61:/vol1 /mnt/t
[mora@netapp-17 test]$ ./nfstest_io -d /mnt/t/data -v all -n 10 -r 600 -e --createlog --logdir /home/mora/logs -s 1562961880971

I don't recommend using it to trigger the issue as it doesn't happen every time (as the runs are not deterministic). However, running it will trigger double close and BAD_SEQID on the network trace every time. 

We will modify the nfstest_delegation (in the near future) to check for this error.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Jorge Mora 2019-07-25 21:28:31 UTC
I have modified nfstest_delegation script to check for the double CLOSE.
Here are the steps to reproduce it.

How reproducible:
100%

Steps to Reproduce:
1. Get the latest NFStest
   git clone git://linux-nfs.org/projects/mora/nfstest

2. Run any of the delegation tests which exposes this issue (basic07-basic12)
   cd nfstest
   export PYTHONPATH=$PWD
   cd test
   ./nfstest_delegation -v all -s <server-ip> -e </exported/volume> --nfsversion 4.0 basic07


Example:
./nfstest_delegation -v all -s 192.168.0.61 -e /vol1 --nfsversion 4.0 basic07
*** Basic WRITE delegation test using RDWR open while reading
    TEST: Running test 'basic07'
    PASS: Open file for RDWR should succeed
    PASS: Read file should succeed
    PASS: Open file for READ on same process should succeed
    PASS: Read file on same process should succeed
    PASS: Open file for READ from a different process should succeed
    PASS: Read file from a different process should succeed
    PASS: OPEN should be sent
    PASS: OPEN should be sent with CLAIM_NULL
    PASS: OPEN should be sent with the name of the file to be opened
    PASS: OPEN should be sent with the filehandle of the directory
    PASS: WRITE delegation should be granted
    PASS: OPEN should not be sent for the same file
    PASS: CLOSE should be sent to the server
    PASS: CLOSE should be sent with correct OPEN stateid
    FAIL: CLOSE was sent twice to the server
    PASS: DELEGRETURN should be sent after the close
    PASS: DELEGRETURN should be sent with the delegation stateid
    TIME: 1.588014s

17 tests (16 passed, 1 failed)

Total time: 1.731056s

The packet trace shows the client sends two CLOSE operations for the same file:
./nfstest_pkt /tmp/nfstest_delegation_20190725152045_1.cap
78 2019-07-25 15:20:47.257295 192.168.0.26  -> 192.168.0.61  NFS   v4.0 call  xid:0x82fee1fd PUTFH;CLOSE FH:0xfb47fd1e stid:1,0xbb0d
79 2019-07-25 15:20:47.258143 192.168.0.61  -> 192.168.0.26  NFS   v4.0 reply xid:0x82fee1fd PUTFH;CLOSE stid:1,0xbb0d
80 2019-07-25 15:20:47.258232 192.168.0.26  -> 192.168.0.61  NFS   v4.0 call  xid:0x83fee1fd PUTFH;CLOSE FH:0xfb47fd1e stid:1,0xbb0d
81 2019-07-25 15:20:47.258659 192.168.0.61  -> 192.168.0.26  NFS   v4.0 reply xid:0x83fee1fd PUTFH;CLOSE stid:1,0xbb0d

Comment 6 Dave Wysochanski 2019-08-13 14:48:57 UTC
Steve / Olga,

This sounds identical to a bug that was supposedly previously fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=1542191

Do you think maybe in that bug we all got confused due to:
- presence of backport issue with that other patch (see https://bugzilla.redhat.com/show_bug.cgi?id=1542191#c12)
- difficulty reproducing this issue

There was no indication in bug 1542191 anyone validated the fix (if it was validated it was not stated anywhere in the bug).

Are we sure this only needs one patch to be backported to the z-streams?
" NFSv4: Replace closed stateids with the "invalid special stateid"" commit fcd8843c406b46433857ae45e5e9d84b01a7d20b. At minimum this patch is needed to fix the problem

Comment 7 Olga Kornieskaia 2019-08-13 14:58:56 UTC
(In reply to Dave Wysochanski from comment #6)
> Steve / Olga,
> 
> This sounds identical to a bug that was supposedly previously fixed:
> https://bugzilla.redhat.com/show_bug.cgi?id=1542191

No this is a different trigger. Previous case had to do with recovering state.
 
> Do you think maybe in that bug we all got confused due to:
> - presence of backport issue with that other patch (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1542191#c12)
> - difficulty reproducing this issue

No I don't think so.

> There was no indication in bug 1542191 anyone validated the fix (if it was
> validated it was not stated anywhere in the bug).
> 
> Are we sure this only needs one patch to be backported to the z-streams?
> " NFSv4: Replace closed stateids with the "invalid special stateid"" commit
> fcd8843c406b46433857ae45e5e9d84b01a7d20b. At minimum this patch is needed to
> fix the problem

According to my testing, it is the only thing that's needed. Jorge has modified nfstest to easily check for this case so it'll be easy to see if I was wrong.

Comment 8 Dave Wysochanski 2019-08-13 15:16:33 UTC
Thanks Olga.  Given the extreme similarities, it will be close to a nightmare to diagnose this vs previous bug.  I'm going to prep a test kernel and see if our QE will test this one patch out on RHEL7.7 to start.  If that goes ok we can push into 7.7.z and consider other z-streams.

Comment 13 Dave Wysochanski 2019-08-14 14:37:08 UTC
I asked whether there were any open cases with this issue but so far no one has responded that there is one.  I think if NetApp has any customer hitting this issue or having state exhaustion as a result we can easily try a test kernel.  I built 3.10.0-1062.el7 plus the one patch fcd8843 NFSv4: Replace closed stateids with the "invalid special stateid" and can make it available.  Normally to do z-streams we need a customer wanting it and/or strong validation patch(es) fix known issue and do not cause other side-effects, or it is a known regression, etc.

We could not get the test from comment #2 to run internally.  Both filers we have seem to be rejecting write delegation or the test is bad somehow - maybe only works on certain ontap versions?  Did ontap maybe change the logic on how it hands out write delegations?


*** Basic WRITE delegation test using RDWR open while reading
    TEST: Running test 'basic07'
    DBG3: 14:43:26.820030 - Sync all buffers to disk
    DBG2: 14:43:26.820431 - Unmount volume: umount -f /mnt/t
    DBG2: 14:43:26.834613 - Start remote procedure server locally
    DBG2: 14:43:27.158384 - Trace start: /usr/sbin/tcpdump -i eth0 -n -B 196608 -s 0 -w /tmp/nfstest_delegation_20190813144318_1.cap host <DELETED>
    DBG2: 14:43:27.431549 - Mount volume: mount -o vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
    DBG5: 14:43:28.121777 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 14:43:28.142828 -     NFS version of mount point: 4.0
    DBG4: 14:43:28.143462 - Open /mnt/t/nfstest_delegation_20190813144318_f_1 so open owner sticks around
    DBG2: 14:43:28.156584 - Open file for RDWR [/mnt/t/nfstest_delegation_20190813144318_f_2]
    PASS: Open file for RDWR should succeed
    DBG3: 14:43:28.160387 - Read file [/mnt/t/nfstest_delegation_20190813144318_f_2]
    PASS: Read file should succeed
    DBG2: 14:43:28.264360 - Open file for READ on same process [/mnt/t/nfstest_delegation_20190813144318_f_2]
    PASS: Open file for READ on same process should succeed
    DBG3: 14:43:28.271569 - Read file on same process [/mnt/t/nfstest_delegation_20190813144318_f_2]
    PASS: Read file on same process should succeed
    DBG2: 14:43:28.372581 - Open file for READ from a different process [/mnt/t/nfstest_delegation_20190813144318_f_2]
    PASS: Open file for READ from a different process should succeed
    DBG3: 14:43:28.379687 - Read file from a different process [/mnt/t/nfstest_delegation_20190813144318_f_2]
    PASS: Read file from a different process should succeed
    DBG3: 14:43:28.591665 - Sync all buffers to disk
    DBG2: 14:43:28.605811 - Unmount volume: umount -f /mnt/t
    DBG2: 14:43:28.643743 - Trace stop
    DBG2: 14:43:28.644572 - killall tcpdump
    DBG1: 14:43:28.789474 - trace_open [/tmp/nfstest_delegation_20190813144318_1.cap]
    FAIL: WRITE delegation should be granted
    TIME: 2.095088s

    DBG7: 14:43:28.914912 - CLEANUP starts
    DBG2: 14:43:28.915313 - Mount volume: mount -o vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
    DBG5: 14:43:29.604300 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 14:43:29.624233 -     NFS version of mount point: 4.0
    DBG3: 14:43:29.624783 -     Stop remote procedure server locally
    DBG5: 14:43:29.632720 -     Removing file [/mnt/t/nfstest_delegation_20190813144318_f_2]
    DBG5: 14:43:29.643995 -     Removing file [/mnt/t/nfstest_delegation_20190813144318_f_1]
    DBG3: 14:43:29.647685 - Sync all buffers to disk
    DBG2: 14:43:29.661206 - Unmount volume: umount -f /mnt/t
    DBG7: 14:43:29.696403 - CLEANUP done
    TIME: 0.796754s

    DBG7: 14:43:29.711772 - Calling DelegTest() destructor
    DBG3: 14:43:29.712110 - Removing mount point directory: rmdir /mnt/t

7 tests (6 passed, 1 failed)

Total time: 11.625838s

Comment 15 Olga Kornieskaia 2019-08-14 15:12:46 UTC
(In reply to Dave Wysochanski from comment #13)
> I asked whether there were any open cases with this issue but so far no one
> has responded that there is one.  I think if NetApp has any customer hitting
> this issue or having state exhaustion as a result we can easily try a test

There is a customer as the result of which all this came to be. I have reached out to Netapp support to ask the customer to file a customer case with Redhat and reference this bugzilla. I will ping my end to see where things stand with it. Given it's still vacation season I'm thinking things might move slow.

> kernel.  I built 3.10.0-1062.el7 plus the one patch fcd8843 NFSv4: Replace
> closed stateids with the "invalid special stateid" and can make it
> available.  Normally to do z-streams we need a customer wanting it and/or
> strong validation patch(es) fix known issue and do not cause other
> side-effects, or it is a known regression, etc.

Yes understood, that's why I asked the support to file a case with Redhat (otherwise I told them it'll only be fixed in 7.8).
 
> We could not get the test from comment #2 to run internally.  Both filers we
> have seem to be rejecting write delegation or the test is bad somehow -
> maybe only works on certain ontap versions?  Did ontap maybe change the
> logic on how it hands out write delegations?

It looks like server isn't given a delegation. It most cases it's because you have a firewall up and there is no back channel to do recalls so it won't give out delegation. Please check the firewall and do a sanity check (just like a cat test or echo 1 > test to make sure you have delegations). Also make sure that 4.0 delegations are enabled on the server (as 4.1 and 4.0 have to be enabled individually)


> *** Basic WRITE delegation test using RDWR open while reading
>     TEST: Running test 'basic07'
>     DBG3: 14:43:26.820030 - Sync all buffers to disk
>     DBG2: 14:43:26.820431 - Unmount volume: umount -f /mnt/t
>     DBG2: 14:43:26.834613 - Start remote procedure server locally
>     DBG2: 14:43:27.158384 - Trace start: /usr/sbin/tcpdump -i eth0 -n -B
> 196608 -s 0 -w /tmp/nfstest_delegation_20190813144318_1.cap host <DELETED>
>     DBG2: 14:43:27.431549 - Mount volume: mount -o
> vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
>     DBG5: 14:43:28.121777 - Get the actual NFS version of mount point:
> findmnt /mnt/t
>     DBG6: 14:43:28.142828 -     NFS version of mount point: 4.0
>     DBG4: 14:43:28.143462 - Open
> /mnt/t/nfstest_delegation_20190813144318_f_1 so open owner sticks around
>     DBG2: 14:43:28.156584 - Open file for RDWR
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     PASS: Open file for RDWR should succeed
>     DBG3: 14:43:28.160387 - Read file
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     PASS: Read file should succeed
>     DBG2: 14:43:28.264360 - Open file for READ on same process
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     PASS: Open file for READ on same process should succeed
>     DBG3: 14:43:28.271569 - Read file on same process
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     PASS: Read file on same process should succeed
>     DBG2: 14:43:28.372581 - Open file for READ from a different process
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     PASS: Open file for READ from a different process should succeed
>     DBG3: 14:43:28.379687 - Read file from a different process
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     PASS: Read file from a different process should succeed
>     DBG3: 14:43:28.591665 - Sync all buffers to disk
>     DBG2: 14:43:28.605811 - Unmount volume: umount -f /mnt/t
>     DBG2: 14:43:28.643743 - Trace stop
>     DBG2: 14:43:28.644572 - killall tcpdump
>     DBG1: 14:43:28.789474 - trace_open
> [/tmp/nfstest_delegation_20190813144318_1.cap]
>     FAIL: WRITE delegation should be granted
>     TIME: 2.095088s
> 
>     DBG7: 14:43:28.914912 - CLEANUP starts
>     DBG2: 14:43:28.915313 - Mount volume: mount -o
> vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
>     DBG5: 14:43:29.604300 - Get the actual NFS version of mount point:
> findmnt /mnt/t
>     DBG6: 14:43:29.624233 -     NFS version of mount point: 4.0
>     DBG3: 14:43:29.624783 -     Stop remote procedure server locally
>     DBG5: 14:43:29.632720 -     Removing file
> [/mnt/t/nfstest_delegation_20190813144318_f_2]
>     DBG5: 14:43:29.643995 -     Removing file
> [/mnt/t/nfstest_delegation_20190813144318_f_1]
>     DBG3: 14:43:29.647685 - Sync all buffers to disk
>     DBG2: 14:43:29.661206 - Unmount volume: umount -f /mnt/t
>     DBG7: 14:43:29.696403 - CLEANUP done
>     TIME: 0.796754s
> 
>     DBG7: 14:43:29.711772 - Calling DelegTest() destructor
>     DBG3: 14:43:29.712110 - Removing mount point directory: rmdir /mnt/t
> 
> 7 tests (6 passed, 1 failed)
> 
> Total time: 11.625838s

Comment 16 Dave Wysochanski 2019-08-14 15:47:45 UTC
Thanks Olga!  Ok I got the test working now.  You'd think by now I would not forget the backchannel, but indeed looks like for me there was something in between blocking it but fixed now.

PASS:

[root@rhel7-node1-esx-rdu test]# uname -r
3.10.0-1062.el7.bz1733347.1.x86_64

*** Basic WRITE delegation test using RDWR open while reading
    TEST: Running test 'basic07'
    DBG3: 11:24:05.179965 - Sync all buffers to disk
    DBG2: 11:24:05.180433 - Unmount volume: umount -f /mnt/t
    DBG2: 11:24:05.196460 - Start remote procedure server locally
    DBG2: 11:24:05.521842 - Trace start: /usr/sbin/tcpdump -i ens32 -n -B 196608 -s 0 -w /tmp/nfstest_delegation_20190814112404_1.cap <DELETED>
    DBG2: 11:24:05.731549 - Mount volume: mount -o vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
    DBG5: 11:24:05.792999 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 11:24:05.811105 -     NFS version of mount point: 4.0
    DBG4: 11:24:05.811644 - Open /mnt/t/nfstest_delegation_20190814112404_f_1 so open owner sticks around
    DBG2: 11:24:05.814022 - Open file for RDWR [/mnt/t/nfstest_delegation_20190814112404_f_2]
    PASS: Open file for RDWR should succeed
    DBG3: 11:24:05.815396 - Read file [/mnt/t/nfstest_delegation_20190814112404_f_2]
    PASS: Read file should succeed
    DBG2: 11:24:05.916850 - Open file for READ on same process [/mnt/t/nfstest_delegation_20190814112404_f_2]
    PASS: Open file for READ on same process should succeed
    DBG3: 11:24:05.917754 - Read file on same process [/mnt/t/nfstest_delegation_20190814112404_f_2]
    PASS: Read file on same process should succeed
    DBG2: 11:24:06.018279 - Open file for READ from a different process [/mnt/t/nfstest_delegation_20190814112404_f_2]
    PASS: Open file for READ from a different process should succeed
    DBG3: 11:24:06.019718 - Read file from a different process [/mnt/t/nfstest_delegation_20190814112404_f_2]
    PASS: Read file from a different process should succeed
    DBG3: 11:24:06.222342 - Sync all buffers to disk
    DBG2: 11:24:06.228723 - Unmount volume: umount -f /mnt/t
    DBG2: 11:24:06.265938 - Trace stop
    DBG2: 11:24:06.266580 - killall tcpdump
    DBG1: 11:24:06.447325 - trace_open [/tmp/nfstest_delegation_20190814112404_1.cap]
    PASS: OPEN should be sent
    PASS: OPEN should be sent with CLAIM_NULL
    PASS: OPEN should be sent with the name of the file to be opened
    PASS: OPEN should be sent with the filehandle of the directory
    PASS: WRITE delegation should be granted
    PASS: OPEN should not be sent for the same file
    PASS: CLOSE should be sent to the server
    PASS: CLOSE should be sent with correct OPEN stateid
    PASS: DELEGRETURN should be sent after the close
    PASS: DELEGRETURN should be sent with the delegation stateid
    TIME: 1.434247s

    DBG7: 11:24:06.613934 - CLEANUP starts
    DBG2: 11:24:06.614321 - Mount volume: mount -o vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
    DBG5: 11:24:06.676012 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 11:24:06.694755 -     NFS version of mount point: 4.0
    DBG3: 11:24:06.695378 -     Stop remote procedure server locally
    DBG5: 11:24:06.697693 -     Removing trace file [/tmp/nfstest_delegation_20190814112404_1.cap]
    DBG5: 11:24:06.707159 -     Removing file [/mnt/t/nfstest_delegation_20190814112404_f_2]
    DBG5: 11:24:06.710000 -     Removing file [/mnt/t/nfstest_delegation_20190814112404_f_1]
    DBG3: 11:24:06.711218 - Sync all buffers to disk
    DBG2: 11:24:06.712376 - Unmount volume: umount -f /mnt/t
    DBG7: 11:24:06.747847 - CLEANUP done
    TIME: 0.151087s

    DBG7: 11:24:06.765156 - Calling DelegTest() destructor
    DBG3: 11:24:06.765516 - Removing mount point directory: rmdir /mnt/t

16 tests (16 passed, 0 failed)

Total time: 2.369198s


FAIL:
[root@rhel7-node1-esx-rdu test]# uname -r
3.10.0-1062.el7.x86_64
*** Basic WRITE delegation test using RDWR open while reading
    TEST: Running test 'basic07'
    DBG3: 11:19:57.957700 - Sync all buffers to disk
    DBG2: 11:19:57.958332 - Unmount volume: umount -f /mnt/t
    DBG2: 11:19:57.973910 - Start remote procedure server locally
    DBG2: 11:19:58.296777 - Trace start: /usr/sbin/tcpdump -i ens32 -n -B 196608 -s 0 -w /tmp/nfstest_delegation_20190814111957_1.cap <DELETED>
    DBG2: 11:19:58.478046 - Mount volume: mount -o vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
    DBG5: 11:19:58.542649 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 11:19:58.560403 -     NFS version of mount point: 4.0
    DBG4: 11:19:58.560986 - Open /mnt/t/nfstest_delegation_20190814111957_f_1 so open owner sticks around
    DBG2: 11:19:58.563982 - Open file for RDWR [/mnt/t/nfstest_delegation_20190814111957_f_2]
    PASS: Open file for RDWR should succeed
    DBG3: 11:19:58.566184 - Read file [/mnt/t/nfstest_delegation_20190814111957_f_2]
    PASS: Read file should succeed
    DBG2: 11:19:58.667644 - Open file for READ on same process [/mnt/t/nfstest_delegation_20190814111957_f_2]
    PASS: Open file for READ on same process should succeed
    DBG3: 11:19:58.668670 - Read file on same process [/mnt/t/nfstest_delegation_20190814111957_f_2]
    PASS: Read file on same process should succeed
    DBG2: 11:19:58.769187 - Open file for READ from a different process [/mnt/t/nfstest_delegation_20190814111957_f_2]
    PASS: Open file for READ from a different process should succeed
    DBG3: 11:19:58.772851 - Read file from a different process [/mnt/t/nfstest_delegation_20190814111957_f_2]
    PASS: Read file from a different process should succeed
    DBG3: 11:19:58.976236 - Sync all buffers to disk
    DBG2: 11:19:58.982698 - Unmount volume: umount -f /mnt/t
    DBG2: 11:19:59.017684 - Trace stop
    DBG2: 11:19:59.018301 - killall tcpdump
    DBG1: 11:19:59.179333 - trace_open [/tmp/nfstest_delegation_20190814111957_1.cap]
    PASS: OPEN should be sent
    PASS: OPEN should be sent with CLAIM_NULL
    PASS: OPEN should be sent with the name of the file to be opened
    PASS: OPEN should be sent with the filehandle of the directory
    PASS: WRITE delegation should be granted
    PASS: OPEN should not be sent for the same file
    PASS: CLOSE should be sent to the server
    PASS: CLOSE should be sent with correct OPEN stateid
    FAIL: CLOSE was sent twice to the server
    PASS: DELEGRETURN should be sent after the close
    PASS: DELEGRETURN should be sent with the delegation stateid
    TIME: 1.395959s

    DBG7: 11:19:59.353287 - CLEANUP starts
    DBG2: 11:19:59.353686 - Mount volume: mount -o vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
    DBG5: 11:19:59.419457 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 11:19:59.438251 -     NFS version of mount point: 4.0
    DBG3: 11:19:59.438863 -     Stop remote procedure server locally
    DBG5: 11:19:59.442250 -     Removing file [/mnt/t/nfstest_delegation_20190814111957_f_2]
    DBG5: 11:19:59.445303 -     Removing file [/mnt/t/nfstest_delegation_20190814111957_f_1]
    DBG3: 11:19:59.446470 - Sync all buffers to disk
    DBG2: 11:19:59.448352 - Unmount volume: umount -f /mnt/t
    DBG7: 11:19:59.486514 - CLEANUP done
    TIME: 0.150495s

    DBG7: 11:19:59.503844 - Calling DelegTest() destructor
    DBG3: 11:19:59.504229 - Removing mount point directory: rmdir /mnt/t

17 tests (16 passed, 1 failed)

Comment 18 Olga Kornieskaia 2019-08-14 15:51:49 UTC
(In reply to Dave Wysochanski from comment #16)
> Thanks Olga!  Ok I got the test working now.  You'd think by now I would not
> forget the backchannel, but indeed looks like for me there was something in
> between blocking it but fixed now.

Yep I forget about it too :-) I'm glad you have a reliable reproducer now (many thanks or Jorge).
 
> PASS:
> 
> [root@rhel7-node1-esx-rdu test]# uname -r
> 3.10.0-1062.el7.bz1733347.1.x86_64
> 
> *** Basic WRITE delegation test using RDWR open while reading
>     TEST: Running test 'basic07'
>     DBG3: 11:24:05.179965 - Sync all buffers to disk
>     DBG2: 11:24:05.180433 - Unmount volume: umount -f /mnt/t
>     DBG2: 11:24:05.196460 - Start remote procedure server locally
>     DBG2: 11:24:05.521842 - Trace start: /usr/sbin/tcpdump -i ens32 -n -B
> 196608 -s 0 -w /tmp/nfstest_delegation_20190814112404_1.cap <DELETED>
>     DBG2: 11:24:05.731549 - Mount volume: mount -o
> vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
>     DBG5: 11:24:05.792999 - Get the actual NFS version of mount point:
> findmnt /mnt/t
>     DBG6: 11:24:05.811105 -     NFS version of mount point: 4.0
>     DBG4: 11:24:05.811644 - Open
> /mnt/t/nfstest_delegation_20190814112404_f_1 so open owner sticks around
>     DBG2: 11:24:05.814022 - Open file for RDWR
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     PASS: Open file for RDWR should succeed
>     DBG3: 11:24:05.815396 - Read file
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     PASS: Read file should succeed
>     DBG2: 11:24:05.916850 - Open file for READ on same process
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     PASS: Open file for READ on same process should succeed
>     DBG3: 11:24:05.917754 - Read file on same process
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     PASS: Read file on same process should succeed
>     DBG2: 11:24:06.018279 - Open file for READ from a different process
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     PASS: Open file for READ from a different process should succeed
>     DBG3: 11:24:06.019718 - Read file from a different process
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     PASS: Read file from a different process should succeed
>     DBG3: 11:24:06.222342 - Sync all buffers to disk
>     DBG2: 11:24:06.228723 - Unmount volume: umount -f /mnt/t
>     DBG2: 11:24:06.265938 - Trace stop
>     DBG2: 11:24:06.266580 - killall tcpdump
>     DBG1: 11:24:06.447325 - trace_open
> [/tmp/nfstest_delegation_20190814112404_1.cap]
>     PASS: OPEN should be sent
>     PASS: OPEN should be sent with CLAIM_NULL
>     PASS: OPEN should be sent with the name of the file to be opened
>     PASS: OPEN should be sent with the filehandle of the directory
>     PASS: WRITE delegation should be granted
>     PASS: OPEN should not be sent for the same file
>     PASS: CLOSE should be sent to the server
>     PASS: CLOSE should be sent with correct OPEN stateid
>     PASS: DELEGRETURN should be sent after the close
>     PASS: DELEGRETURN should be sent with the delegation stateid
>     TIME: 1.434247s
> 
>     DBG7: 11:24:06.613934 - CLEANUP starts
>     DBG2: 11:24:06.614321 - Mount volume: mount -o
> vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
>     DBG5: 11:24:06.676012 - Get the actual NFS version of mount point:
> findmnt /mnt/t
>     DBG6: 11:24:06.694755 -     NFS version of mount point: 4.0
>     DBG3: 11:24:06.695378 -     Stop remote procedure server locally
>     DBG5: 11:24:06.697693 -     Removing trace file
> [/tmp/nfstest_delegation_20190814112404_1.cap]
>     DBG5: 11:24:06.707159 -     Removing file
> [/mnt/t/nfstest_delegation_20190814112404_f_2]
>     DBG5: 11:24:06.710000 -     Removing file
> [/mnt/t/nfstest_delegation_20190814112404_f_1]
>     DBG3: 11:24:06.711218 - Sync all buffers to disk
>     DBG2: 11:24:06.712376 - Unmount volume: umount -f /mnt/t
>     DBG7: 11:24:06.747847 - CLEANUP done
>     TIME: 0.151087s
> 
>     DBG7: 11:24:06.765156 - Calling DelegTest() destructor
>     DBG3: 11:24:06.765516 - Removing mount point directory: rmdir /mnt/t
> 
> 16 tests (16 passed, 0 failed)
> 
> Total time: 2.369198s
> 
> 
> FAIL:
> [root@rhel7-node1-esx-rdu test]# uname -r
> 3.10.0-1062.el7.x86_64
> *** Basic WRITE delegation test using RDWR open while reading
>     TEST: Running test 'basic07'
>     DBG3: 11:19:57.957700 - Sync all buffers to disk
>     DBG2: 11:19:57.958332 - Unmount volume: umount -f /mnt/t
>     DBG2: 11:19:57.973910 - Start remote procedure server locally
>     DBG2: 11:19:58.296777 - Trace start: /usr/sbin/tcpdump -i ens32 -n -B
> 196608 -s 0 -w /tmp/nfstest_delegation_20190814111957_1.cap <DELETED>
>     DBG2: 11:19:58.478046 - Mount volume: mount -o
> vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
>     DBG5: 11:19:58.542649 - Get the actual NFS version of mount point:
> findmnt /mnt/t
>     DBG6: 11:19:58.560403 -     NFS version of mount point: 4.0
>     DBG4: 11:19:58.560986 - Open
> /mnt/t/nfstest_delegation_20190814111957_f_1 so open owner sticks around
>     DBG2: 11:19:58.563982 - Open file for RDWR
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     PASS: Open file for RDWR should succeed
>     DBG3: 11:19:58.566184 - Read file
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     PASS: Read file should succeed
>     DBG2: 11:19:58.667644 - Open file for READ on same process
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     PASS: Open file for READ on same process should succeed
>     DBG3: 11:19:58.668670 - Read file on same process
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     PASS: Read file on same process should succeed
>     DBG2: 11:19:58.769187 - Open file for READ from a different process
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     PASS: Open file for READ from a different process should succeed
>     DBG3: 11:19:58.772851 - Read file from a different process
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     PASS: Read file from a different process should succeed
>     DBG3: 11:19:58.976236 - Sync all buffers to disk
>     DBG2: 11:19:58.982698 - Unmount volume: umount -f /mnt/t
>     DBG2: 11:19:59.017684 - Trace stop
>     DBG2: 11:19:59.018301 - killall tcpdump
>     DBG1: 11:19:59.179333 - trace_open
> [/tmp/nfstest_delegation_20190814111957_1.cap]
>     PASS: OPEN should be sent
>     PASS: OPEN should be sent with CLAIM_NULL
>     PASS: OPEN should be sent with the name of the file to be opened
>     PASS: OPEN should be sent with the filehandle of the directory
>     PASS: WRITE delegation should be granted
>     PASS: OPEN should not be sent for the same file
>     PASS: CLOSE should be sent to the server
>     PASS: CLOSE should be sent with correct OPEN stateid
>     FAIL: CLOSE was sent twice to the server
>     PASS: DELEGRETURN should be sent after the close
>     PASS: DELEGRETURN should be sent with the delegation stateid
>     TIME: 1.395959s
> 
>     DBG7: 11:19:59.353287 - CLEANUP starts
>     DBG2: 11:19:59.353686 - Mount volume: mount -o
> vers=4.0,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 <DELETED>
>     DBG5: 11:19:59.419457 - Get the actual NFS version of mount point:
> findmnt /mnt/t
>     DBG6: 11:19:59.438251 -     NFS version of mount point: 4.0
>     DBG3: 11:19:59.438863 -     Stop remote procedure server locally
>     DBG5: 11:19:59.442250 -     Removing file
> [/mnt/t/nfstest_delegation_20190814111957_f_2]
>     DBG5: 11:19:59.445303 -     Removing file
> [/mnt/t/nfstest_delegation_20190814111957_f_1]
>     DBG3: 11:19:59.446470 - Sync all buffers to disk
>     DBG2: 11:19:59.448352 - Unmount volume: umount -f /mnt/t
>     DBG7: 11:19:59.486514 - CLEANUP done
>     TIME: 0.150495s
> 
>     DBG7: 11:19:59.503844 - Calling DelegTest() destructor
>     DBG3: 11:19:59.504229 - Removing mount point directory: rmdir /mnt/t
> 
> 17 tests (16 passed, 1 failed)

Comment 19 Olga Kornieskaia 2019-08-14 15:53:34 UTC
One ask. If and when the zstream(s) bugzilla(s) will be created can you cc me to the bugzillas so that I can keep track and know which releases they are. Thank you.

Comment 20 Dave Wysochanski 2019-08-14 16:02:10 UTC
(In reply to Olga Kornieskaia from comment #19)
> One ask. If and when the zstream(s) bugzilla(s) will be created can you cc
> me to the bugzillas so that I can keep track and know which releases they
> are. Thank you.

I think clones should retain the CC list so I've added you there.  It should at least go to 7.7.z but I cannot say for sure or when at this point.

Comment 21 Jan Stancek 2019-08-23 08:17:28 UTC
Patch(es) committed on kernel-3.10.0-1080.el7

Comment 27 Zhi Li 2019-08-26 02:32:22 UTC
Moving to VERIFIED according to comment#26.

Comment 29 errata-xmlrpc 2020-03-31 19:26:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1016