Bug 2107360 - knfsd not always recalling delegations on contended access
Summary: knfsd not always recalling delegations on contended access
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: Zhi Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-14 19:30 UTC by Olga Kornieskaia
Modified: 2022-11-15 12:45 UTC
8 users

Fixed In Version: kernel-5.14.0-150.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 11:09:31 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
nfstest_interop network trace (23.20 KB, application/gzip), 2022-07-14 19:30 UTC, Olga Kornieskaia
tar with nfstest_interop log and 6 network trace files (165.90 KB, application/gzip), 2022-07-14 19:32 UTC, Olga Kornieskaia
PATCH: nfs: CTO can be skipped, depending on order of reference puts (2.17 KB, patch), 2022-07-27 18:09 UTC, Jeff Layton
test18 failure (103.55 KB, application/vnd.tcpdump.pcap), 2022-07-28 15:22 UTC, Scott Mayhew
test21 failure (97.71 KB, application/vnd.tcpdump.pcap), 2022-07-28 15:24 UTC, Scott Mayhew


Links
Gitlab redhat/centos-stream/src/kernel centos-stream-9 merge request 1255 (opened): "nfsd: eliminate the NFSD_FILE_BREAK_* flags", last updated 2022-08-10 16:11:25 UTC
Red Hat Issue Tracker RHELPLAN-127791, last updated 2022-07-14 19:32:11 UTC
Red Hat Product Errata RHSA-2022:8267, last updated 2022-11-15 11:09:48 UTC

Description Olga Kornieskaia 2022-07-14 19:30:44 UTC
Created attachment 1897214 [details]
nfstest_interop network trace

Description of problem:

During the Spring 2022 Bakeathon, NetApp testing ran into a data corruption issue during the nfstest_interop test, using a RHEL 9.0 GA kernel client running against a Linux server.

The problem has been extremely difficult to reproduce; it was reproduced only once more after that.

*** Verify appending data with NFSv3 and NFSv4.1 is correctly read using NFSv4.1
    TEST: Running test 'test18'
    DBG2: 01:11:51.776173 - Capture reference NFS stats: /usr/sbin/nfsstat > /tmp/nfsstat_e549vfk3
    DBG2: 01:11:51.779789 - Trace start: /usr/bin/sudo /usr/sbin/tcpdump -i tun0 -n -B 196608 -s 0 -w /home/mora/logs/fedora-rawhide/krb5/nfstest_interop_20220506_010905_018.cap host 100.97.13.12
    DBG2: 01:11:52.905995 - Mount volume: /usr/bin/sudo mount -o vers=4.1,proto=tcp,sec=krb5,hard,rsize=4096,wsize=4096 fedora-rawhide:/export/ext4 /mnt/t
    DBG1: 01:11:54.910032 - Get routing info: /usr/sbin/ip route get 100.96.3.52
    DBG2: 01:11:54.914191 - Mount volume: /usr/bin/sudo mount -o vers=3,proto=tcp,sec=krb5,hard,rsize=4096,wsize=4096 fedora-rawhide:/export/ext4 /mnt/t_v30
    DBG1: 01:11:56.520798 - Get routing info: /usr/sbin/ip route get 100.96.3.52
    DBG2: 01:11:56.524484 - Mount volume: /usr/bin/sudo mount -o vers=4.1,proto=tcp,sec=krb5,hard,rsize=4096,wsize=4096 fedora-rawhide:/export/ext4 /mnt/t_v41
    DBG1: 01:11:57.991883 - Opening file /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_001 using NFSv4.1 so owner sticks around
    DBG1: 01:11:58.084534 - Create empty file /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    DBG1: 01:11:58.179785 - Append data to /mnt/t_v30/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv3
    DBG2: 01:11:58.318271 -     Written data: 'AAAAAAAAAA'
    DBG1: 01:11:58.367523 - Append data to /mnt/t_v41/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    DBG2: 01:11:58.414309 -     Written data: 'CCCCCCCCCC'
    DBG1: 01:11:58.463041 - Read contents of /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    PASS: Read data using NFSv4.1 should be correct
    DBG1: 01:11:58.603382 - Append data to /mnt/t_v30/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv3
    DBG2: 01:11:58.742179 -     Written data: 'AAAAAAAAAA'
    DBG1: 01:11:58.790459 - Append data to /mnt/t_v41/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    DBG2: 01:11:58.883116 -     Written data: 'CCCCCCCCCC'
    DBG1: 01:11:58.978174 - Read contents of /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    FAIL: Read data using NFSv4.1 should be correct
    DBG2: 01:11:59.071618 - Expected data: b'AAAAAAAAAACCCCCCCCCCAAAAAAAAAACCCCCCCCCC'
    DBG2: 01:11:59.071718 - Read data:     b'AAAAAAAAAACCCCCCCCCC\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00CCCCCCCCCC'
    DBG2: 01:11:59.119604 - Unmount volume: /usr/bin/sudo umount -f /mnt/t_v41
    DBG2: 01:11:59.174266 - Unmount volume: /usr/bin/sudo umount -f /mnt/t_v30
    DBG2: 01:11:59.525602 - Unmount volume: /usr/bin/sudo umount -f /mnt/t
    DBG2: 01:11:59.722718 - Trace stop
    DBG2: 01:12:01.725135 - Get all processes: ps -ef
    DBG2: 01:12:01.737847 - Stopping packet trace capture: /usr/bin/sudo /usr/bin/kill -SIGINT 1854611
    DBG2: 01:12:01.786371 - Stopping packet trace capture: /usr/bin/sudo /usr/bin/kill -SIGINT 1854609
    DBG2: 01:12:01.933150 - Capture relative NFS stats: /usr/sbin/nfsstat -l -S /tmp/nfsstat_e549vfk3 > /home/mora/logs/fedora-rawhide/krb5/nfstest_interop_20220506_010905_018.stat
    DBG1: 01:12:01.937041 - trace_open [/home/mora/logs/fedora-rawhide/krb5/nfstest_interop_20220506_010905_018.cap]
    WARN: NFS   v4.1 reply xid:0xd45e252f SEQUENCE;PUTFH;GETATTR;CLOSE stid: NFS4ERR_OLD_STATEID
    TIME: 10.208452s

The network trace is attached.

The test in general opens with one protocol, writes a byte range, then opens and appends with a different protocol. The problem is that, for some reason, on the last append the client writes the whole file (offset 0, length 40 bytes) instead of just the appended range, and its cache ends up holding the wrong bytes.
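
In other words, each iteration of the test boils down to something like the following (a rough sketch of the I/O shape only; the real test is the Python-based nfstest_interop, and the paths here stand in for the three mounts of the same export):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* One "append" step: open/append-write/close, all done serially. */
    static int append(const char *path, const char *buf)
    {
            int fd = open(path, O_WRONLY | O_APPEND);
            if (fd < 0)
                    return -1;
            ssize_t n = write(fd, buf, strlen(buf));
            close(fd);
            return n == (ssize_t)strlen(buf) ? 0 : -1;
    }

    int main(void)
    {
            append("/mnt/t_v30/f_019", "AAAAAAAAAA");   /* via NFSv3   */
            append("/mnt/t_v41/f_019", "CCCCCCCCCC");   /* via NFSv4.1 */
            /* ...then read f_019 back through the /mnt/t (NFSv4.1)
             * mount and compare against all appends so far */
            return 0;
    }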

Also attaching another run that hit the problem with more instances of failure; the tgz file contains the log and 6 network trace files.

Comment 1 Olga Kornieskaia 2022-07-14 19:32:03 UTC
Created attachment 1897216 [details]
tar with nfstest_interop log and 6 network trace files

Comment 2 Scott Mayhew 2022-07-26 17:19:57 UTC
Any info on what kernel the NFS server was running?  I ran the test 100x in a loop, and another 100x with an additional 15ms latency added to the connection (since that would simulate conditions during the virtual BAT) but didn't see any failures.  Also the bug title says v4.2 krb5i, but in the description and the tarball it looks like the failure occurred with v4.0 and v4.1 krb5.

Comment 3 Olga Kornieskaia 2022-07-26 17:49:02 UTC
(In reply to Scott Mayhew from comment #2)
> Any info on what kernel the NFS server was running?  I ran the test 100x in
> a loop, and another 100x with an additional 15ms latency added to the
> connection (since that would simulate conditions during the virtual BAT) but
> didn't see any failures.  Also the bug title says v4.2 krb5i, but in the
> description and the tarball it looks like the failure occurred with v4.0 and
> v4.1 krb5.

Jorge hit it (once) against fedora-rawhide kernel-5.18.0-0.rc2.23.fc37.x86_64 (ext4).

I hit it (once) against a 5.16-rc kernel (unfortunately, I can't be sure of the exact rc). 

No reproductions since and we've tried.

Apologies, the title is wrong; it should probably say 4.1+. This is an nfstest_interop test, so it mixes v3, v4, and v4.1 (appends take turns as to which protocol each append uses). I believe the "failing" append is always on 4.1. I think 4.1 vs 4.2 doesn't matter here.

Comment 4 Scott Mayhew 2022-07-27 16:35:08 UTC
I tested with 5.14.0-70.13.1.el9_0.x86_64 on the client against the following kernel versions on the NFS server:
4.18.0-411.el8.x86_64
5.14.0-130.el9.x86_64
5.19.0-0.rc7.20220722git68e77ffbfd06.56.fc37.x86_64

No luck in reproducing.

Looking at the pcap files in the tarball, it looks like it's the v3 client that is screwing up.  It's not revalidating the inode on the 2nd open, so it doesn't have the correct file size and the appending write has the wrong offset.
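
For context on why a stale size matters: v3 has no append on the wire, so an O_APPEND write is turned into a plain WRITE at an offset the client computes from its cached attributes. A sketch of the idea (illustrative only, not the actual fs/nfs code):

    /* If the cached size is stale because the inode was never
     * revalidated, the WRITE goes out at the wrong offset. */
    struct v3_inode { long long cached_size; };

    static long long v3_append_offset(const struct v3_inode *ino)
    {
            /* a GETATTR should have refreshed this before we trust it */
            return ino->cached_size;
    }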

Comment 5 Olga Kornieskaia 2022-07-27 17:14:35 UTC
(In reply to Scott Mayhew from comment #4)
> I tested with 5.14.0-70.13.1.el9_0.x86_64 on the client against the
> following kernel versions on the NFS server:
> 4.18.0-411.el8.x86_64
> 5.14.0-130.el9.x86_64
> 5.19.0-0.rc7.20220722git68e77ffbfd06.56.fc37.x86_64
> 
> No luck in reproducing.
> 
> Looking at the pcap files in the tarball, it looks like it's the v3 client
> that is screwing up.  It's not revalidating the inode on the 2nd open, so it
> doesn't have the correct file size and the appending write has the wrong
> offset.

I don't believe that's correct. The problem is on the last append: instead of appending at the correct offset (offset+len), it writes from 0 to the length of the file (in the 1st network trace file, that append is in frame 367; the open before it in frame 365 shows the file size as 30, so the client knows the correct size). I believe the last append is never v3 but v4.1.

Comment 6 Jeff Layton 2022-07-27 17:27:19 UTC
Looking at test 012 in the tarball:

    DBG1: 10:18:35.830119 - Create empty file /mnt/t/nfstest_interop_20220627_101748_f_013 using NFSv4.1
    DBG1: 10:18:35.832244 - Append data to /mnt/t_v30/nfstest_interop_20220627_101748_f_013 using NFSv3
    DBG2: 10:18:35.833515 -     Written data: 'AAAAAAAAAA'                                          
    DBG1: 10:18:35.835148 - Append data to /mnt/t_v40/nfstest_interop_20220627_101748_f_013 using NFSv4
    DBG2: 10:18:35.836047 -     Written data: 'BBBBBBBBBB'                                          
    DBG1: 10:18:35.837187 - Read contents of /mnt/t/nfstest_interop_20220627_101748_f_013 using NFSv4.1
    PASS: Read data using NFSv4.1 should be correct                                                 
    DBG1: 10:18:35.838575 - Append data to /mnt/t_v30/nfstest_interop_20220627_101748_f_013 using NFSv3
    DBG2: 10:18:35.839016 -     Written data: 'AAAAAAAAAA'                                          
    DBG1: 10:18:35.839731 - Append data to /mnt/t_v40/nfstest_interop_20220627_101748_f_013 using NFSv4
    DBG2: 10:18:35.842605 -     Written data: 'BBBBBBBBBB'                                          
    DBG1: 10:18:35.843858 - Read contents of /mnt/t/nfstest_interop_20220627_101748_f_013 using NFSv4.1
    FAIL: Read data using NFSv4.1 should be correct                                                 
    DBG2: 10:18:35.845365 - Expected data: b'AAAAAAAAAABBBBBBBBBBAAAAAAAAAABBBBBBBBBB'              
    DBG2: 10:18:35.845548 - Read data:     b'AAAAAAAAAAAAAAAAAAAABBBBBBBBBB'

I'm assuming here that these "Appends" are doing open/pwrite/close (or something similar) and that everything is done serially. Relevant part of the capture:

  373   0.460108 192.168.1.53 → 192.168.1.50 NFS 242 V3 LOOKUP Call, DH: 0xa58289c5/nfstest_interop_20220627_101748_f_013
  375   0.460433 192.168.1.50 → 192.168.1.53 NFS 346 V3 LOOKUP Reply (Call In 373), FH: 0xf045b36e
  377   0.460567 192.168.1.53 → 192.168.1.50 NFS 214 V3 ACCESS Call, FH: 0xf045b36e, [Check: RD MD XT XE XAR XAW XAL]
  378   0.460872 192.168.1.50 → 192.168.1.53 NFS 218 V3 ACCESS Reply (Call In 377), [Allowed: RD MD XT XE XAR XAW XAL]
  380   0.461554 192.168.1.53 → 192.168.1.50 NFS 242 V3 WRITE Call, FH: 0xf045b36e Offset: 0 Len: 10 FILE_SYNC
  381   0.462456 192.168.1.50 → 192.168.1.53 NFS 234 V3 WRITE Reply (Call In 380) Len: 10 FILE_SYNC
  383   0.462804 192.168.1.53 → 192.168.1.50 NFS 346 V4 Call OPEN DH: 0xa58289c5/nfstest_interop_20220627_101748_f_013
  384   0.463057 192.168.1.50 → 192.168.1.53 NFS 446 V4 Reply (Call In 383) OPEN StateID: 0xc446
  386   0.463180 192.168.1.53 → 192.168.1.50 NFS 250 V4 Call OPEN_CONFIRM
  387   0.463416 192.168.1.50 → 192.168.1.53 NFS 166 V4 Reply (Call In 386) OPEN_CONFIRM
  389   0.463744 192.168.1.53 → 192.168.1.50 NFS 290 V4 Call WRITE StateID: 0xcab1 Offset: 10 Len: 10
  390   0.464165 192.168.1.50 → 192.168.1.53 NFS 230 V4 Reply (Call In 389) WRITE
  392   0.464339 192.168.1.53 → 192.168.1.50 NFS 266 V4 Call CLOSE StateID: 0xcab1
  393   0.464558 192.168.1.50 → 192.168.1.53 NFS 238 V4 Reply (Call In 392) CLOSE
  395   0.464829 192.168.1.53 → 192.168.1.50 NFS 346 V4 Call OPEN DH: 0xf045b36e/
  396   0.465071 192.168.1.50 → 192.168.1.53 NFS 438 V4 Reply (Call In 395) OPEN StateID: 0x5db0
  398   0.465219 192.168.1.53 → 192.168.1.50 NFS 294 V4 Call READ StateID: 0x581d Offset: 0 Len: 4096
  399   0.465451 192.168.1.50 → 192.168.1.53 NFS 222 V4 Reply (Call In 398) READ
  401   0.465571 192.168.1.53 → 192.168.1.50 NFS 342 V4 Call CLOSE StateID: 0x5db0
  402   0.465802 192.168.1.50 → 192.168.1.53 NFS 326 V4 Reply (Call In 401) CLOSE
  404   0.466176 192.168.1.53 → 192.168.1.50 NFS 202 V3 ACCESS Call, FH: 0xa58289c5, [Check: RD LU MD XT DL XAR XAW XAL]
  405   0.466400 192.168.1.50 → 192.168.1.53 NFS 218 V3 ACCESS Reply (Call In 404), [Allowed: RD LU MD XT DL XAR XAW XAL]
  407   0.466670 192.168.1.53 → 192.168.1.50 NFS 250 V3 WRITE Call, FH: 0xf045b36e Offset: 0 Len: 20 FILE_SYNC                <<<< PROBLEM HERE
  408   0.467074 192.168.1.50 → 192.168.1.53 NFS 234 V3 WRITE Reply (Call In 407) Len: 20 FILE_SYNC
  410   0.469688 192.168.1.53 → 192.168.1.50 NFS 346 V4 Call OPEN DH: 0xa58289c5/nfstest_interop_20220627_101748_f_013
  411   0.469969 192.168.1.50 → 192.168.1.53 NFS 446 V4 Reply (Call In 410) OPEN StateID: 0xd8fd
  413   0.470292 192.168.1.53 → 192.168.1.50 NFS 290 V4 Call WRITE StateID: 0xd8fd Offset: 20 Len: 10
  414   0.470832 192.168.1.50 → 192.168.1.53 NFS 230 V4 Reply (Call In 413) WRITE
  416   0.470983 192.168.1.53 → 192.168.1.50 NFS 266 V4 Call CLOSE StateID: 0xd8fd
  417   0.471227 192.168.1.50 → 192.168.1.53 NFS 238 V4 Reply (Call In 416) CLOSE
  419   0.471489 192.168.1.53 → 192.168.1.50 NFS 346 V4 Call OPEN DH: 0xf045b36e/
  420   0.471804 192.168.1.50 → 192.168.1.53 NFS 438 V4 Reply (Call In 419) OPEN StateID: 0x64c6
  422   0.471958 192.168.1.53 → 192.168.1.50 NFS 294 V4 Call READ StateID: 0x616b Offset: 0 Len: 4096
  423   0.472215 192.168.1.50 → 192.168.1.53 NFS 234 V4 Reply (Call In 422) READ
  425   0.472356 192.168.1.53 → 192.168.1.50 NFS 342 V4 Call CLOSE StateID: 0x64c6
  426   0.472621 192.168.1.50 → 192.168.1.53 NFS 326 V4 Reply (Call In 425) CLOSE
  428   0.473166 192.168.1.53 → 192.168.1.50 NFS 326 V4 Call CLOSE StateID: 0x33e7
  429   0.473436 192.168.1.50 → 192.168.1.53 NFS 250 V4 Reply (Call In 428) CLOSE

The v3 client in this case should have seen the size change from the v4.0 client's WRITE. It didn't because it didn't issue any GETATTRs between the close and open. It should have done 2 of them for CTO purposes (AFAIU).

My guess is that the refcount on the nfs_open_context did not go to zero here for some reason, so all of the CTO stuff never got triggered.
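
For reference, the GETATTR-on-close / GETATTR-on-open dance I'm expecting amounts to something like this (a simplified user-space sketch of the idea, not the actual fs/nfs code):

    #include <stdbool.h>
    #include <string.h>

    struct nfs_attrs {
            unsigned long long change;  /* change attribute */
            long long size;
    };

    struct cto_state {
            struct nfs_attrs last;      /* attrs recorded at close() */
            bool cache_valid;
    };

    /* close(): flush dirty data (elided), then GETATTR and record the
     * result so the next open can detect intervening changes */
    static void cto_close(struct cto_state *st, struct nfs_attrs server)
    {
            st->last = server;
    }

    /* open(): GETATTR again; if the attributes moved while the file
     * was closed, cached file data must be invalidated */
    static void cto_open(struct cto_state *st, struct nfs_attrs server)
    {
            if (memcmp(&st->last, &server, sizeof(server)) != 0)
                    st->cache_valid = false;    /* drop the page cache */
            st->last = server;
    }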

Comment 7 Jeff Layton 2022-07-27 17:41:04 UTC
Looking more closely at the client code, I suspect that the client ended up with the put_nfs_open_context_sync in nfs_file_clear_open_context occurring on a reference that wasn't the last one. If the final reference gets put by put_nfs_open instead, then the CTO work never happens.

What we probably need to do is ensure that when the "sync" variant gets called, it flags the open context in such a way that the last refcount put results in all of the open_context stuff occurring.

Comment 8 Jeff Layton 2022-07-27 17:42:15 UTC
Make that "If the final reference gets put by put_nfs_open_context..."

Comment 9 Jeff Layton 2022-07-27 18:09:59 UTC
Created attachment 1899765 [details]
PATCH: nfs: CTO can be skipped, depending on order of reference puts

Proposed patch. I wonder if this might prevent the issue. OTOH, I'm not sure we need the is_sync argument at all with this flag. It may also be possible to just ignore the is_sync variable in this function instead.
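
The idea is roughly the following (an illustrative user-space sketch with made-up names such as want_cto, not the attached patch itself): have the sync variant mark the context, so that whichever caller ends up dropping the final reference still does the CTO work.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct open_context {
            atomic_int refcount;
            atomic_bool want_cto;       /* hypothetical flag */
    };

    static void do_cto_close(struct open_context *ctx)
    {
            (void)ctx;  /* flush dirty data, GETATTR, release state (elided) */
    }

    static void put_context(struct open_context *ctx)
    {
            /* last put does the work, regardless of which caller it is */
            if (atomic_fetch_sub(&ctx->refcount, 1) == 1 &&
                atomic_load(&ctx->want_cto))
                    do_cto_close(ctx);
    }

    static void put_context_sync(struct open_context *ctx)
    {
            /* request the CTO work even if we don't hold the last ref */
            atomic_store(&ctx->want_cto, true);
            put_context(ctx);
    }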

Comment 10 Olga Kornieskaia 2022-07-27 18:18:11 UTC
So there might be 2+ different problems. The one you picked has the wrong size in v3, but that's different from what's happening in the 4.1 case.

Comment 11 Jeff Layton 2022-07-27 18:44:44 UTC
Looking at the original problem description, it looks like the v4.1 client saw the size change due to the v3 client's writes, but it didn't drop the cache and satisfied the read out of it:

    DBG1: 01:11:58.084534 - Create empty file /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    DBG1: 01:11:58.179785 - Append data to /mnt/t_v30/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv3
    DBG2: 01:11:58.318271 -     Written data: 'AAAAAAAAAA'
    DBG1: 01:11:58.367523 - Append data to /mnt/t_v41/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    DBG2: 01:11:58.414309 -     Written data: 'CCCCCCCCCC'
    DBG1: 01:11:58.463041 - Read contents of /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    PASS: Read data using NFSv4.1 should be correct
    DBG1: 01:11:58.603382 - Append data to /mnt/t_v30/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv3
    DBG2: 01:11:58.742179 -     Written data: 'AAAAAAAAAA'
    DBG1: 01:11:58.790459 - Append data to /mnt/t_v41/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    DBG2: 01:11:58.883116 -     Written data: 'CCCCCCCCCC'
    DBG1: 01:11:58.978174 - Read contents of /mnt/t/mora_krb5/nfstest_interop_20220506_010905_f_019 using NFSv4.1
    FAIL: Read data using NFSv4.1 should be correct
    DBG2: 01:11:59.071618 - Expected data: b'AAAAAAAAAACCCCCCCCCCAAAAAAAAAACCCCCCCCCC'
    DBG2: 01:11:59.071718 - Read data:     b'AAAAAAAAAACCCCCCCCCC\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00CCCCCCCCCC'


Here's the relevant part of the capture:

  298   5.645866 100.97.13.12 → 100.96.3.52  NFS 252 V4 Call ACCESS FH: 0x3e88b868, [Check: RD LU MD XT DL]
  301   5.691776  100.96.3.52 → 100.97.13.12 NFS 252 V4 Reply (Call In 298) ACCESS, [Allowed: RD LU MD XT DL]
  303   5.691845 100.97.13.12 → 100.96.3.52  NFS 268 V4 Call LOOKUP DH: 0x3e88b868/mora_krb5
  304   5.738008  100.96.3.52 → 100.97.13.12 NFS 396 V4 Reply (Call In 303) LOOKUP
  306   5.738460 100.97.13.12 → 100.96.3.52  NFS 260 V4 Call ACCESS FH: 0x202b033c, [Check: RD LU MD XT DL]
  307   5.784630  100.96.3.52 → 100.97.13.12 NFS 252 V4 Reply (Call In 306) ACCESS, [Allowed: RD LU MD XT DL]
  309   5.784706 100.97.13.12 → 100.96.3.52  NFS 368 V4 Call OPEN DH: 0x202b033c/nfstest_interop_20220506_010905_f_001
  310   5.830759  100.96.3.52 → 100.97.13.12 NFS 500 V4 Reply (Call In 309) OPEN StateID: 0xfd45
  311   5.831071 100.97.13.12 → 100.96.3.52  NFS 392 V4 Call OPEN DH: 0x202b033c/nfstest_interop_20220506_010905_f_019
  312   5.879480  100.96.3.52 → 100.97.13.12 NFS 468 V4 Reply (Call In 311) OPEN StateID: 0xc433
  313   5.879776 100.97.13.12 → 100.96.3.52  NFS 316 V4 Call CLOSE StateID: 0xc433
  314   5.925966  100.96.3.52 → 100.97.13.12 NFS 300 V4 Reply (Call In 313) CLOSE
  315   5.926252 100.97.13.12 → 100.96.3.52  NFS 188 V3 ACCESS Call, FH: 0x202b033c, [Check: RD LU MD XT DL XAR XAW XAL]
  317   5.972350  100.96.3.52 → 100.97.13.12 NFS 204 V3 ACCESS Reply (Call In 315), [Allowed: RD LU MD XT DL XAR XAW XAL]
  319   5.972501 100.97.13.12 → 100.96.3.52  NFS 228 V3 LOOKUP Call, DH: 0x202b033c/nfstest_interop_20220506_010905_f_019
  320   6.018289  100.96.3.52 → 100.97.13.12 NFS 320 V3 LOOKUP Reply (Call In 319), FH: 0x8da8453d
  322   6.018578 100.97.13.12 → 100.96.3.52  NFS 188 V3 ACCESS Call, FH: 0x8da8453d, [Check: RD MD XT XE XAR XAW XAL]
  323   6.064482  100.96.3.52 → 100.97.13.12 NFS 204 V3 ACCESS Reply (Call In 322), [Allowed: RD MD XT XE XAR XAW XAL]
  324   6.064752 100.97.13.12 → 100.96.3.52  NFS 216 V3 WRITE Call, FH: 0x8da8453d Offset: 0 Len: 10 FILE_SYNC
  325   6.113721  100.96.3.52 → 100.97.13.12 NFS 220 V3 WRITE Reply (Call In 324) Len: 10 FILE_SYNC
  326   6.114007 100.97.13.12 → 100.96.3.52  NFS 320 V4 Call OPEN DH: 0x8da8453d/
  328   6.160498  100.96.3.52 → 100.97.13.12 NFS 424 V4 Reply (Call In 326) OPEN StateID: 0x9312
  330   6.160783 100.97.13.12 → 100.96.3.52  NFS 300 V4 Call WRITE StateID: 0x96bf Offset: 10 Len: 10
  331   6.208892  100.96.3.52 → 100.97.13.12 NFS 260 V4 Reply (Call In 330) WRITE
  333   6.209341 100.97.13.12 → 100.96.3.52  NFS 316 V4 Call CLOSE StateID: 0x9312
  334   6.209773 100.97.13.12 → 100.96.3.52  NFS 320 V4 Call OPEN DH: 0x8da8453d/
  336   6.255602  100.96.3.52 → 100.97.13.12 NFS 700 V4 Reply (Call In 334) OPEN StateID: 0x9de5  ; V4 Reply (Call In 333) CLOSE Status: NFS4ERR_OLD_STATEID
  337   6.255991 100.97.13.12 → 100.96.3.52  NFS 308 V4 Call OPEN_DOWNGRADE
  338   6.256148 100.97.13.12 → 100.96.3.52  NFS 268 V4 Call READ StateID: 0xafc9 Offset: 0 Len: 4096
  340   6.302881  100.96.3.52 → 100.97.13.12 NFS 208 V4 Reply (Call In 338) READ
  341   6.303032  100.96.3.52 → 100.97.13.12 NFS 236 V4 Reply (Call In 337) OPEN_DOWNGRADE
  343   6.303244 100.97.13.12 → 100.96.3.52  NFS 300 V4 Call CLOSE StateID: 0x9848
  344   6.349476  100.96.3.52 → 100.97.13.12 NFS 236 V4 Reply (Call In 343) CLOSE
  345   6.349867 100.97.13.12 → 100.96.3.52  NFS 188 V3 ACCESS Call, FH: 0x202b033c, [Check: RD LU MD XT DL XAR XAW XAL]
  347   6.395946  100.96.3.52 → 100.97.13.12 NFS 204 V3 ACCESS Reply (Call In 345), [Allowed: RD LU MD XT DL XAR XAW XAL]
  349   6.396088 100.97.13.12 → 100.96.3.52  NFS 184 V3 GETATTR Call, FH: 0x8da8453d
  350   6.442136  100.96.3.52 → 100.97.13.12 NFS 196 V3 GETATTR Reply (Call In 349)  Regular File mode: 0755 uid: 65534 gid: 65534
  352   6.442284 100.97.13.12 → 100.96.3.52  NFS 188 V3 ACCESS Call, FH: 0x8da8453d, [Check: RD MD XT XE XAR XAW XAL]
  353   6.488373  100.96.3.52 → 100.97.13.12 NFS 204 V3 ACCESS Reply (Call In 352), [Allowed: RD MD XT XE XAR XAW XAL]
  355   6.488679 100.97.13.12 → 100.96.3.52  NFS 216 V3 WRITE Call, FH: 0x8da8453d Offset: 20 Len: 10 FILE_SYNC
  357   6.536729  100.96.3.52 → 100.97.13.12 NFS 220 V3 WRITE Reply (Call In 355) Len: 10 FILE_SYNC                     <<<< Why was a v3 write allowed to proceed before the CB_RECALL + DELEGRETURN? I also don't see CB_RECALL in capture at all.
  358   6.536937 100.97.13.12 → 100.96.3.52  NFS 312 V4 Call DELEGRETURN StateID: 0xaa64
  362   6.583108  100.96.3.52 → 100.97.13.12 NFS 296 V4 Reply (Call In 358) DELEGRETURN                                 <<<< DELEGRETURN reply has change attr 7094486757652095996
  364   6.583217 100.97.13.12 → 100.96.3.52  NFS 320 V4 Call OPEN DH: 0x8da8453d/
  365   6.629329  100.96.3.52 → 100.97.13.12 NFS 424 V4 Reply (Call In 364) OPEN StateID: 0xb6df                        <<<< OPEN reply has change attr 7094486757652095996 (same as above)
  367   6.629583 100.97.13.12 → 100.96.3.52  NFS 328 V4 Call WRITE StateID: 0xb372 Offset: 0 Len: 40
  368   6.678163  100.96.3.52 → 100.97.13.12 NFS 260 V4 Reply (Call In 367) WRITE
  370   6.678284 100.97.13.12 → 100.96.3.52  NFS 316 V4 Call CLOSE StateID: 0xb6df
  371   6.724375  100.96.3.52 → 100.97.13.12 NFS 308 V4 Reply (Call In 370) CLOSE
  372   6.724702 100.97.13.12 → 100.96.3.52  NFS 320 V4 Call OPEN DH: 0x8da8453d/
  373   6.771095  100.96.3.52 → 100.97.13.12 NFS 424 V4 Reply (Call In 372) OPEN StateID: 0x0426
  374   6.771356 100.97.13.12 → 100.96.3.52  NFS 316 V4 Call CLOSE StateID: 0x0426
  375   6.817702  100.96.3.52 → 100.97.13.12 NFS 300 V4 Reply (Call In 374) CLOSE
  376   6.818148 100.97.13.12 → 100.96.3.52  NFS 300 V4 Call CLOSE StateID: 0xfd45
  377   6.864508  100.96.3.52 → 100.97.13.12 NFS 236 V4 Reply (Call In 376) CLOSE
  407   7.266009 100.97.13.12 → 100.96.3.52  NFS 152 V3 NULL Call
  408   7.266029 100.97.13.12 → 100.96.3.52  NFS 152 V3 NULL Call
  413   7.311831  100.96.3.52 → 100.97.13.12 NFS 164 V3 NULL Reply (Call In 408)  ; V3 NULL Reply (Call In 407)
  417   7.317125 100.97.13.12 → 100.96.3.52  NFS 312 V4 Call DELEGRETURN StateID: 0xd888
  418   7.363154  100.96.3.52 → 100.97.13.12 NFS 284 V4 Reply (Call In 417) DELEGRETURN
  420   7.369035 100.97.13.12 → 100.96.3.52  NFS 152 V4 NULL Call
  421   7.369102 100.97.13.12 → 100.96.3.52  NFS 152 V4 NULL Call
  422   7.369241 100.97.13.12 → 100.96.3.52  NFS 224 V4 Call DESTROY_SESSION
  424   7.415354  100.96.3.52 → 100.97.13.12 NFS 108 V4 NULL Reply (Call In 421)
  426   7.415412  100.96.3.52 → 100.97.13.12 NFS 168 V4 Reply (Call In 422) DESTROY_SESSION
  428   7.415685 100.97.13.12 → 100.96.3.52  NFS 216 V4 Call DESTROY_CLIENTID
  429   7.415803  100.96.3.52 → 100.97.13.12 NFS 108 V4 NULL Reply (Call In 420)
  431   7.466029  100.96.3.52 → 100.97.13.12 NFS 168 V4 Reply (Call In 428) DESTROY_CLIENTID
  433   7.466539 100.97.13.12 → 100.96.3.52  NFS 152 V4 NULL Call
  434   7.512410  100.96.3.52 → 100.97.13.12 NFS 108 V4 NULL Reply (Call In 433)

This one looks more like a server bug to me. The server allowed a synchronous v3 write to occur while there was a read delegation outstanding to the v4.1 client, and I don't see the delegation ever being recalled. The v4.1 client assumes that when the DELEGRETURN comes back with a particular change attribute, its cache doesn't need to be invalidated on a subsequent open that returns the same change attr.
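
That client-side assumption boils down to something like this (a sketch of the logic, not the actual fs/nfs code):

    /* If the change attribute returned by DELEGRETURN matches the one
     * returned by the next OPEN, the client concludes nothing wrote to
     * the file in between and keeps its cached data. A server that
     * lets a v3 WRITE through without recalling the delegation first
     * silently breaks this. */
    static int need_cache_invalidate(unsigned long long change_at_delegreturn,
                                     unsigned long long change_at_open)
    {
            return change_at_delegreturn != change_at_open;
    }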

Comment 12 Jeff Layton 2022-07-28 14:44:27 UTC
Do we happen to know what kernel the server was running in the originally reported test? I know the client was RHEL 9.0GA.  Was the server the same?

Comment 13 Olga Kornieskaia 2022-07-28 15:01:21 UTC
(In reply to Jeff Layton from comment #12)
> Do we happen to know what kernel the server was running in the originally
> reported test? I know the client was RHEL 9.0GA.  Was the server the same?

Jorge hit it (once) against fedora-rawhide kernel-5.18.0-0.rc2.23.fc37.x86_64 (ext4).

I hit it (once) against a 5.16-rc kernel (unfortunately, I can't be sure of the exact rc). 

No reproductions since and we've tried.

Comment 14 Scott Mayhew 2022-07-28 15:22:46 UTC
Created attachment 1899997 [details]
test18 failure

I finally managed to get two failures.  I was trying to force out-of-order reference puts with systemtap, to trigger the scenario Jeff's patch would address:

# stap -v -g -e 'probe module("nfs").function("put_nfs_open_context") { mdelay(500); printf("."); } probe module("nfs").function("put_nfs_open_context_sync") { printf("^") } probe module("nfs").function("nfs_close_context") { printf("o") }'

But the test failure that was reproduced matches the scenario in the bug description, rather than the failures in the tarball (/me scratches head):

*** Verify appending data with NFSv3 and NFSv4.1 is correctly read using NFSv4.1
    TEST: Running test 'test18'
    DBG2: 11:00:01.296594 - Capture reference NFS stats: /usr/sbin/nfsstat > /tmp/nfsstat_4d40nbvo
    DBG2: 11:00:01.299087 - Trace start: /usr/sbin/tcpdump -i enp1s0 -n -B 196608 -s 0 -w /root/logs/nfstest_interop_20220728_105716_018.cap host 192.168.122.241
    DBG2: 11:00:02.350565 - Mount volume: mount -o vers=4.1,proto=tcp,sec=krb5i,hard,rsize=4096,wsize=4096 rawhide.smayhew.test:/export /mnt/t
    DBG1: 11:00:02.531629 - Get routing info: /usr/sbin/ip route get 192.168.122.107
    DBG2: 11:00:02.545857 - Mount volume: mount -o vers=3,proto=tcp,sec=krb5i,hard,rsize=4096,wsize=4096 rawhide.smayhew.test:/export /mnt/t_v30
    DBG1: 11:00:02.600678 - Get routing info: /usr/sbin/ip route get 192.168.122.107
    DBG2: 11:00:02.608099 - Mount volume: mount -o vers=4.1,proto=tcp,sec=krb5i,hard,rsize=4096,wsize=4096 rawhide.smayhew.test:/export /mnt/t_v41
    DBG1: 11:00:02.642228 - Opening file /mnt/t/nfstest_interop_20220728_105716_f_001 using NFSv4.1 so owner sticks around
    DBG1: 11:00:03.143697 - Create empty file /mnt/t/nfstest_interop_20220728_105716_f_019 using NFSv4.1
    DBG1: 11:00:03.662452 - Append data to /mnt/t_v30/nfstest_interop_20220728_105716_f_019 using NFSv3
    DBG2: 11:00:04.163635 -     Written data: 'AAAAAAAAAA'
    DBG1: 11:00:04.184253 - Append data to /mnt/t_v41/nfstest_interop_20220728_105716_f_019 using NFSv4.1
    DBG2: 11:00:05.684918 -     Written data: 'CCCCCCCCCC'
    DBG1: 11:00:05.689175 - Read contents of /mnt/t/nfstest_interop_20220728_105716_f_019 using NFSv4.1
    PASS: Read data using NFSv4.1 should be correct
    DBG1: 11:00:06.690554 - Append data to /mnt/t_v30/nfstest_interop_20220728_105716_f_019 using NFSv3
    DBG2: 11:00:07.691797 -     Written data: 'AAAAAAAAAA'
    DBG1: 11:00:07.709498 - Append data to /mnt/t_v41/nfstest_interop_20220728_105716_f_019 using NFSv4.1
    DBG2: 11:00:08.210588 -     Written data: 'CCCCCCCCCC'
    DBG1: 11:00:08.709650 - Read contents of /mnt/t/nfstest_interop_20220728_105716_f_019 using NFSv4.1
    FAIL: Read data using NFSv4.1 should be correct
    DBG2: 11:00:09.210992 - Expected data: b'AAAAAAAAAACCCCCCCCCCAAAAAAAAAACCCCCCCCCC'
    DBG2: 11:00:09.211084 - Read data:     b'AAAAAAAAAACCCCCCCCCC\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00CCCCCCCCCC'
    DBG2: 11:00:09.229629 - Unmount volume: umount -f /mnt/t_v41
    DBG2: 11:00:09.239553 - Unmount volume: umount -f /mnt/t_v30
    DBG2: 11:00:09.285848 - Unmount volume: umount -f /mnt/t
    DBG2: 11:00:09.310211 - Trace stop
    DBG2: 11:00:11.311337 - Stopping packet trace capture: stopping process 27093
    DBG2: 11:00:11.342328 - Capture relative NFS stats: /usr/sbin/nfsstat -l -S /tmp/nfsstat_4d40nbvo > /root/logs/nfstest_interop_20220728_105716_018.stat
    DBG1: 11:00:11.353277 - trace_open [/root/logs/nfstest_interop_20220728_105716_018.cap]
    TIME: 10.135189s

Comment 15 Scott Mayhew 2022-07-28 15:24:26 UTC
Created attachment 1899998 [details]
test21 failure

Here's the second failure.  This one definitely looks delegation-related.  Notice in the pcap that there's no on-the-wire read from the v4.1 client after the v3 client does its last append.

*** Verify appending data with NFSv4.1 and NFSv3 is correctly read using NFSv4.1
    TEST: Running test 'test21'
    DBG2: 11:00:34.163118 - Capture reference NFS stats: /usr/sbin/nfsstat > /tmp/nfsstat_mkj__vdr
    DBG2: 11:00:34.165626 - Trace start: /usr/sbin/tcpdump -i enp1s0 -n -B 196608 -s 0 -w /root/logs/nfstest_interop_20220728_105716_021.cap host 192.168.122.241
    DBG2: 11:00:35.196822 - Mount volume: mount -o vers=4.1,proto=tcp,sec=krb5i,hard,rsize=4096,wsize=4096 rawhide.smayhew.test:/export /mnt/t
    DBG1: 11:00:35.369719 - Get routing info: /usr/sbin/ip route get 192.168.122.107
    DBG2: 11:00:35.381106 - Mount volume: mount -o vers=4.1,proto=tcp,sec=krb5i,hard,rsize=4096,wsize=4096 rawhide.smayhew.test:/export /mnt/t_v41
    DBG1: 11:00:35.437433 - Get routing info: /usr/sbin/ip route get 192.168.122.107
    DBG2: 11:00:35.448081 - Mount volume: mount -o vers=3,proto=tcp,sec=krb5i,hard,rsize=4096,wsize=4096 rawhide.smayhew.test:/export /mnt/t_v30
    DBG1: 11:00:35.495767 - Opening file /mnt/t/nfstest_interop_20220728_105716_f_001 using NFSv4.1 so owner sticks around
    DBG1: 11:00:35.997079 - Create empty file /mnt/t/nfstest_interop_20220728_105716_f_022 using NFSv4.1
    DBG1: 11:00:36.516998 - Append data to /mnt/t_v41/nfstest_interop_20220728_105716_f_022 using NFSv4.1
    DBG2: 11:00:37.017700 -     Written data: 'CCCCCCCCCC'
    DBG1: 11:00:37.038584 - Append data to /mnt/t_v30/nfstest_interop_20220728_105716_f_022 using NFSv3
    DBG2: 11:00:37.539595 -     Written data: 'AAAAAAAAAA'
    DBG1: 11:00:38.039020 - Read contents of /mnt/t/nfstest_interop_20220728_105716_f_022 using NFSv4.1
    PASS: Read data using NFSv4.1 should be correct
    DBG1: 11:00:39.040663 - Append data to /mnt/t_v41/nfstest_interop_20220728_105716_f_022 using NFSv4.1
    DBG2: 11:00:40.041033 -     Written data: 'CCCCCCCCCC'
    DBG1: 11:00:40.058683 - Append data to /mnt/t_v30/nfstest_interop_20220728_105716_f_022 using NFSv3
    DBG2: 11:00:41.560117 -     Written data: 'AAAAAAAAAA'
    DBG1: 11:00:41.577173 - Read contents of /mnt/t/nfstest_interop_20220728_105716_f_022 using NFSv4.1
    FAIL: Read data using NFSv4.1 should be correct
    DBG2: 11:00:42.078413 - Expected data: b'CCCCCCCCCCAAAAAAAAAACCCCCCCCCCAAAAAAAAAA'
    DBG2: 11:00:42.078482 - Read data:     b'CCCCCCCCCCAAAAAAAAAACCCCCCCCCC'
    DBG2: 11:00:42.098046 - Unmount volume: umount -f /mnt/t_v30
    DBG2: 11:00:42.616666 - Unmount volume: umount -f /mnt/t_v41
    DBG2: 11:00:42.625642 - Unmount volume: umount -f /mnt/t
    DBG2: 11:00:42.652847 - Trace stop
    DBG2: 11:00:44.655345 - Stopping packet trace capture: stopping process 27463
    DBG2: 11:00:44.682049 - Capture relative NFS stats: /usr/sbin/nfsstat -l -S /tmp/nfsstat_mkj__vdr > /root/logs/nfstest_interop_20220728_105716_021.stat
    DBG1: 11:00:44.687361 - trace_open [/root/logs/nfstest_interop_20220728_105716_021.cap]
    TIME: 10.578718s

Comment 16 Jeff Layton 2022-07-28 15:28:18 UTC
Olga, any objection to making this bug public? I'd like to be able to refer to it in upstream discussion.

Comment 17 Olga Kornieskaia 2022-07-28 15:40:49 UTC
(In reply to Jeff Layton from comment #16)
> Olga, any objection to making this bug public? I'd like to be able to refer
> to it in upstream discussion.

No objections at all.

Comment 18 Jeff Layton 2022-07-28 16:43:52 UTC
Both the test18 and test21 failures that Scott posted seem to be due to the same server bug. The server allowed the v3 client to write to the file without recalling the delegation first.

Comment 19 Jeff Layton 2022-07-29 11:53:01 UTC
I spent some time looking at this yesterday. First, krb5i seems to be a red herring; we can reproduce the same problem with AUTH_SYS. The server bug is a little tricky to reproduce, but it reproduces more reliably when using a stap script to delay close calls on the client.

With a recent mainline kernel on the server (5.19.0-0.rc8.20220727git39c3c396f813.60.fc37.x86_64), I can reproduce the problem pretty readily (no delegation recall before letting the v3 WRITE through). With a kernel based on Chuck's for-next branch, I can't, so I suspect this is something that has already been fixed there (maybe Chuck's filecache overhaul?).

Comment 20 Jeff Layton 2022-07-29 12:31:36 UTC
Looks like this patch is what fixes it:

commit 4ba22226f3beae8f9d4764baee6e5e3cf6878659
Author: Chuck Lever <chuck.lever>
Date:   Fri Jul 8 14:27:02 2022 -0400

    NFSD: NFSv4 CLOSE should release an nfsd_file immediately
    
    The last close of a file should enable other accessors to open and
    use that file immediately. Leaving the file open in the filecache
    prevents other users from accessing that file until the filecache
    garbage-collects the file -- sometimes that takes several seconds.

I still don't quite understand the mechanism but when I revert that patch, the bug resurfaces.

Comment 21 Jeff Layton 2022-07-29 13:58:24 UTC
I suspect that patch is just papering over the real bug.

I've reverted it for now, and added a tracepoint in nfsd_open_break_lease. Here's the v3 write portion of the trace, along with the voluntary DELEGRETURN the v4.1 client does before writing:

            nfsd-1035    [003] .....   154.909908: nfsd_write_start: xid=0x5972c9ce fh_hash=0x44095ceb offset=20 len=10
            nfsd-1035    [003] .....   154.910004: nfsd_fh_verify: xid=0x5972c9ce fh_hash=0x44095ceb inode=00000000a9fd0d31 type=REG access=WRITE|OWNER_OVERRIDE
            nfsd-1035    [003] .....   154.910349: nfsd_file_lru_del: inode=00000000a9fd0d31 ref=2 nf_flags=HASHED|REFERENCED nf_file=0000000085bc2447
            nfsd-1035    [003] .....   154.910354: nfsd_file_acquire: xid=0x5972c9ce inode=00000000a9fd0d31 may_flags=WRITE ref=2 nf_flags=HASHED|REFERENCED nf_may=WRITE nf_file=0000000085bc2447 status=0

^^^ Shouldn't the above have NFSD_FILE_BREAK_WRITE in nf_flags?

            nfsd-1035    [003] .....   154.910358: nfsd_write_opened: xid=0x5972c9ce fh_hash=0x44095ceb offset=20 len=10
            nfsd-1035    [003] .....   154.938751: nfsd_write_io_done: xid=0x5972c9ce fh_hash=0x44095ceb offset=20 len=10
            nfsd-1035    [003] .....   154.938762: nfsd_file_lru_add: inode=00000000a9fd0d31 ref=2 nf_flags=HASHED|REFERENCED nf_file=0000000085bc2447
            nfsd-1035    [003] .....   154.938765: nfsd_file_put: inode=00000000a9fd0d31 ref=2 flags=HASHED|REFERENCED may=WRITE nf_file=0000000085bc2447
            nfsd-1035    [003] .....   154.938769: nfsd_write_done: xid=0x5972c9ce fh_hash=0x44095ceb offset=20 len=10
            nfsd-1035    [003] .....   154.940229: nfsd_compound: xid=0x0745fc68 opcnt=4
            nfsd-1035    [003] .....   154.940290: nfsd_compound_status: op=1/4 OP_SEQUENCE status=0
            nfsd-1035    [003] .....   154.940380: nfsd_fh_verify: xid=0x0745fc68 fh_hash=0x44095ceb inode=00000000a9fd0d31 type=0x0 access=BYPASS_GSS
            nfsd-1035    [003] .....   154.940458: nfsd_compound_status: op=2/4 OP_PUTFH status=0
            nfsd-1035    [003] .....   154.940464: nfsd_fh_verify: xid=0x0745fc68 fh_hash=0x44095ceb inode=00000000a9fd0d31 type=0x0 access=
            nfsd-1035    [003] .....   154.940548: nfsd_compound_status: op=3/4 OP_GETATTR status=0
            nfsd-1035    [003] .....   154.940553: nfsd_fh_verify: xid=0x0745fc68 fh_hash=0x44095ceb inode=00000000a9fd0d31 type=REG access=
            nfsd-1035    [003] .....   154.940725: nfsd_file_put: inode=00000000a9fd0d31 ref=2 flags=HASHED|REFERENCED may=READ nf_file=00000000a1c3bfec
            nfsd-1035    [003] .....   154.940797: nfsd_compound_status: op=4/4 OP_DELEGRETURN status=0
            nfsd-1035    [003] .....   154.941885: nfsd_compound: xid=0x0845fc68 opcnt=5

...note that nfsd_open_break_lease was _not_ called in this codepath. I think it should have been, and that the handling of NFSD_FILE_BREAK_WRITE is probably not correct in this code.
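
For reference, the helper that conflicting-access opens are supposed to go through looks roughly like this (paraphrased from my reading of fs/nfsd/vfs.c of that era, so treat it as a sketch rather than the exact code); break_lease() is what kicks off the CB_RECALL via the lease machinery:

    static int nfsd_open_break_lease(struct inode *inode, int may_flags)
    {
            int flags = O_RDONLY;

            if (may_flags & NFSD_MAY_NOT_BREAK_LEASE)
                    return 0;               /* caller opted out */
            if (may_flags & NFSD_MAY_WRITE)
                    flags = O_WRONLY;       /* conflicts with read delegations */
            /* O_NONBLOCK: start the recall and fail with -EWOULDBLOCK
             * rather than waiting; nfsd maps that to a retryable error
             * (NFS4ERR_DELAY / NFS3ERR_JUKEBOX) */
            return break_lease(inode, flags | O_NONBLOCK);
    }

The cached-open path through the filecache appears to rely on the NFSD_FILE_BREAK_* flags instead of calling this, which looks like where it goes wrong.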

Comment 22 Jeff Layton 2022-08-01 13:45:58 UTC
Patch posted upstream:

    https://lore.kernel.org/linux-nfs/20220729210107.373538-1-jlayton@kernel.org/T/#u

This seems to fix things for me, and should apply to older kernels without too much wiggling.

Comment 23 Olga Kornieskaia 2022-08-08 18:44:54 UTC
Hi folks, 

I see that this bugzilla has been turned into handling the server-side issue. However, there is still the other issue (which I think is likely to be client-side -- where after the v3 write the client-side cache loses that data -- i.e., what the description of this bugzilla covers). How should the 2nd problem be handled? Should I open another bugzilla with the same problem description (i.e., the client-side issue)?

Comment 24 Scott Mayhew 2022-08-08 19:16:04 UTC
(In reply to Olga Kornieskaia from comment #23)
> Hi folks, 
> 
> I see that this bugzilla has been turned into handling the server-side
> issue. However, there is still the other issue (which I think is likely
> to be client-side -- where after the v3 write the client-side cache
> loses that data -- i.e., what the description of this bugzilla covers).
> How should the 2nd problem be handled? Should I open another bugzilla
> with the same problem description (i.e., the client-side issue)?

If you want to file a bugzilla for the v3 issue, go ahead... but we won't make any progress unless we have a reproducer.  Since you weren't sure what kernel you were testing against when you triggered the issue, I tried testing against each of the 5.16.0-rc Fedora kernels that I could find on koji.fedoraproject.org.  I didn't have any luck.  The only issue I could reproduce was the one that Jeff's patch fixes.

Comment 26 Jeff Layton 2022-08-10 16:32:36 UTC
MR is up for this bug, and I'm hoping to get this in for 9.1 (and maybe a backport to 9.0.z kernels).

Testing this is tricky, as the client can paper over the problem in some cases by invalidating the cache even though it has a read delegation, or by virtue of just not having read in the requisite data yet.

What would be ideal would be to modify nfstests to watch for this situation (particularly in the interop tests): servers that allow writes or other conflicting activity to proceed while a delegation is outstanding. It would be nice to have a noisy warning for that if it occurs, along the lines of the sketch below.
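
The check itself could be as simple as this rule over a decoded per-file event stream (just a sketch; nfstest itself is Python, and this event model is made up):

    enum ev { DELEG_GRANTED, CB_RECALL_SENT, DELEGRETURN_SEEN,
              CONFLICTING_WRITE };

    /* Returns the index of a conflicting WRITE that the server let
     * through while a delegation was outstanding and unrecalled, or
     * -1 if the capture is clean. */
    static int delegation_conflict(const enum ev *trace, int n)
    {
            int outstanding = 0;

            for (int i = 0; i < n; i++) {
                    switch (trace[i]) {
                    case DELEG_GRANTED:     outstanding = 1; break;
                    case CB_RECALL_SENT:    /* server did its job */
                    case DELEGRETURN_SEEN:  outstanding = 0; break;
                    case CONFLICTING_WRITE:
                            if (outstanding)
                                    return i;
                            break;
                    }
            }
            return -1;
    }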

Comment 30 Zhi Li 2022-08-17 15:31:06 UTC
I have not reproduced it with the steps in comment #1 despite trying many times, so I think it is really a rare scenario; also, regression testing on this MR build shows no new issues.

Comment 31 Jeff Layton 2022-08-17 15:56:55 UTC
Yes. This is difficult to reproduce, as it's somewhat dependent on the client delaying certain operations. I was only able to reproduce it using systemtap to slow down put_nfs_open_context:

    stap -v -g -e 'probe module("nfs").function("put_nfs_open_context") { mdelay(200); }'

...and then used nfstest_interop test18 to try to reproduce the situation while sniffing traffic with Wireshark. I never reproduced the data corruption, as the client would usually notice via other means that something had changed. I just used Wireshark to verify that the server was properly issuing CB_RECALLs.

Comment 38 errata-xmlrpc 2022-11-15 11:09:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8267

