Bug 1803162 - [NFS] Dataloss with copy_file_range on NFS-mounted files that is not 4K aligned on RHEL 8.
Summary: [NFS] Dataloss with copy_file_range on NFS-mounted files that is not 4K align...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.1
Hardware: x86_64
OS: Linux
urgent
high
Target Milestone: rc
: 8.2
Assignee: Benjamin Coddington
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-14 15:20 UTC by David Koppelman
Modified: 2020-04-28 16:39 UTC (History)
13 users (show)

Fixed In Version: kernel-4.18.0-185.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:39:11 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
C code to reproduce problem. See comments. (1.86 KB, text/x-csrc)
2020-02-14 15:20 UTC, David Koppelman
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4840141 None None None 2020-02-18 19:31:48 UTC
Red Hat Product Errata RHSA-2020:1769 None None None 2020-04-28 16:39:41 UTC

Description David Koppelman 2020-02-14 15:20:32 UTC
Created attachment 1663143 [details]
C code to reproduce problem. See comments.

Description of problem:

Copying a 36368-byte file to an existing file on an NFS-mounted
filesystem using copy_file_range followed by a timestamp update with
futimens results in an incomplete copy. The target file has the
correct length, and the first 32768 bytes are correct but the
remaining bytes are zero. I'm attaching C code (mostly written by Paul
Eggert eggert@cs.ucla.edu) that reproduces the problem.


Version-Release number of selected component (if applicable):

The problem occurs on:
NAME="Red Hat Enterprise Linux"
VERSION="8.1 (Ootpa)"
Linux 4.18.0-147.3.1.el8_1.x86_64 #1 SMP Wed Nov 27 01:11:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

The NFS-mounted share is on a RHEL 7.7 system using the Red Hat
nfs-server code. The system is regularly updated.



How reproducible:

Every time.

Steps to Reproduce:
1. Compile the attached C code. (See comments.)
2. Run in an nfs-mounted directory.
3. 

Actual results:

Program reports outcome 15, meaning that files differ.


Expected results:

Program reports outcome 0, meaning that files are the same.


Additional info:

Using the executable built with the attached file, cfrbug3.c, I was
able to reproduce the problem on two different NFS mounted
filesystems, and was able to show that the problem does not occur on
XFS filesystems. I verified the problem on two rhel8 systems.

The problem does not occur (at least not reliably) if the file copy is
not followed by a timestamp change.

I've also reported this as a support case to Red Hat, Case #02585137.
I'm opening this bug because it is of obvious interest to the
developer community, including the Emacs developers to whom I
originally reported the problem. I first encountered the problem in a
development version of Emacs, including recent commit
de1d150a6ef58760ab0a58dbee84596623d85d14, but the problem occurs in earlier
versions.

Comment 1 David Koppelman 2020-02-14 20:08:20 UTC
I noticed that this bug has been restricted to the redhat group. Was
that intentional? I filed this bug so that the Emacs developers could
follow its progress. Would it be possible to make this bug public
again?

Comment 2 Jacob Shivers 2020-02-15 23:44:57 UTC
This problem is readily reproducible.
The NFS client is doing what appears to be a short read from the NFS server and then filling in the remaining 3600 bytes with zeroes:

# cat setup_nfs.sh 
#! /bin/bash

function setup() {
        logger "start test rhel-$1"
        #rpcdebug -m nfs -s all 
        #rpcdebug -m rpc -s all 
        tcpdump -s0 -n -i $(tcpdump -D | awk 'NR == 1{gsub(/.*\./,"",$1); print $1}') -w /tmp/$(uname -r)-cfr_test-rhel$1.pcap &
}

function action() {
        if [[ $1 == 7 ]]; then
                mount ad-nfs-server.example.net:/test /mnt/test -o vers=4.2,sec=sys
        elif [[ $1 == 8 ]]; then
                mount nfs-server-8.example.net:/test /mnt/test -o vers=4.2,sec=sys
        else
                exit 1
        fi
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-00_man2cfr-rhel$1.strace /usr/local/bin/copy_file_range /mnt/test/test_file /mnt/test/test_file-cfr
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-01_man2cfr-md5sum-rhel$1.strace md5sum /mnt/test/test_file{,-cfr}
        cd /mnt/test
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-02_cfrbug3-rhel$1.strace /usr/local/bin/cfrbug3
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-03_cfrbug3-md5sum-rhel$1.strace md5sum /mnt/test/{src,dst}
}

function cleanup() {
        pkill tcpdump
        #rpcdebug -m rpc -c all 
        #rpcdebug -m nfs -c all 
        logger "stop test rhel-$1"
        journalctl -b0 | sed '/start test/,/stop test/ !d' > /tmp/$(uname -r)-cfr_test-rhel$1.log
        rm -f /mnt/test/{test_file-cfr,src,dst}
        umount /mnt/test
}

case $1 in 
        setup)
                setup $2
                ;;
        action)
                action $2
                ;;
        cleanup)
                cleanup $2
                ;;
        *)
                echo "Usage: $0 {setup|cleanup}"
                exit 1
esac

 ### RHEL 8.0

# uname -r
4.18.0-80.11.2.el8_0.x86_64

# ./setup_nfs.sh setup 8

# ./setup_nfs.sh action 8
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file-cfr
Outcome 15
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
13e79e367f6157a0123171e861b5af2e  /mnt/test/dst
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/dst

# ./setup_nfs.sh cleanup 8

$ grep read 4.18.0-80.11.2.el8_0.x86_64-cfr_test-03_cfrbug3-md5sum-rhel8.strace | egrep -o "^.{,150}|.{,30}$" | tail -5
1588  17:44:35.398747 read(3</mnt/test/dst>, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
..., 32768) = 32768 <0.000012>
1588  17:44:35.398901 read(3</mnt/test/dst>, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
x00", 32768) = 3600 <0.000010>
1588  17:44:35.399030 read(3</mnt/test/dst>, "", 24576) = 0 <0.000009>

$ tshark -tad -n -r 4.18.0-80.11.2.el8_0.x86_64-cfr_test-*pcap -Y 'nfs && frame.number in {1001..1037}' -z proto,colinfo,nfs.fh.hash,nfs.fh.hash
 1001 2020-02-15 17:44:35.374094 192.168.124.148 → 192.168.124.138 NFS 426 V4 Call OPEN DH: 0x60df7df3/src  nfs.fh.hash == 0x60df7df3
 1002 2020-02-15 17:44:35.375492 192.168.124.138 → 192.168.124.148 NFS 498 V4 Reply (Call In 1001) OPEN StateID: 0xa949  nfs.fh.hash == 0x11ce1f1f
 1006 2020-02-15 17:44:35.376131 192.168.124.148 → 192.168.124.138 NFS 4814 V4 Call WRITE StateID: 0xace4 Offset: 0 Len: 36368  nfs.fh.hash == 0x11ce1f1f
 1008 2020-02-15 17:44:35.378912 192.168.124.138 → 192.168.124.148 NFS 246 V4 Reply (Call In 1006) WRITE
 1009 2020-02-15 17:44:35.379092 192.168.124.148 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0xa949  nfs.fh.hash == 0x11ce1f1f
 1010 2020-02-15 17:44:35.379286 192.168.124.138 → 192.168.124.148 NFS 246 V4 Reply (Call In 1009) CLOSE
 1011 2020-02-15 17:44:35.379460 192.168.124.148 → 192.168.124.138 NFS 426 V4 Call OPEN DH: 0x60df7df3/dst  nfs.fh.hash == 0x60df7df3
 1012 2020-02-15 17:44:35.380558 192.168.124.138 → 192.168.124.148 NFS 498 V4 Reply (Call In 1011) OPEN StateID: 0x8c84  nfs.fh.hash == 0x44df8d2d
 1017 2020-02-15 17:44:35.381127 192.168.124.148 → 192.168.124.138 NFS 3366 V4 Call WRITE StateID: 0x8929 Offset: 0 Len: 36368  nfs.fh.hash == 0x44df8d2d
 1019 2020-02-15 17:44:35.383729 192.168.124.138 → 192.168.124.148 NFS 246 V4 Reply (Call In 1017) WRITE
 1020 2020-02-15 17:44:35.383831 192.168.124.148 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0x8c84  nfs.fh.hash == 0x44df8d2d
 1021 2020-02-15 17:44:35.384016 192.168.124.138 → 192.168.124.148 NFS 246 V4 Reply (Call In 1020) CLOSE
 1022 2020-02-15 17:44:35.384157 192.168.124.148 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x11ce1f1f/  nfs.fh.hash == 0x11ce1f1f
 1023 2020-02-15 17:44:35.384274 192.168.124.138 → 192.168.124.148 NFS 474 V4 Reply (Call In 1022) OPEN StateID: 0x903f
 1024 2020-02-15 17:44:35.384424 192.168.124.148 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x44df8d2d/  nfs.fh.hash == 0x44df8d2d
 1025 2020-02-15 17:44:35.384532 192.168.124.138 → 192.168.124.148 NFS 438 V4 Reply (Call In 1024) OPEN StateID: 0x3e7d
 1026 2020-02-15 17:44:35.384728 192.168.124.148 → 192.168.124.138 NFS 378 V4 Call CLONE Src StateID: 0x276b Offset: 0 Len: 36368 Dst StateID: 0x3bd0 Offset: 0  nfs.fh.hash == 0x11ce1f1f  nfs.fh.hash == 0x44df8d2d
 1027 2020-02-15 17:44:35.384927 192.168.124.138 → 192.168.124.148 NFS 246 V4 Reply (Call In 1026) CLONE
 1028 2020-02-15 17:44:35.385092 192.168.124.148 → 192.168.124.138 NFS 338 V4 Call SETATTR FH: 0x44df8d2d  nfs.fh.hash == 0x44df8d2d
 1029 2020-02-15 17:44:35.386627 192.168.124.138 → 192.168.124.148 NFS 386 V4 Reply (Call In 1028) SETATTR
 1030 2020-02-15 17:44:35.386782 192.168.124.148 → 192.168.124.138 NFS 274 V4 Call CLOSE StateID: 0x903f  nfs.fh.hash == 0x11ce1f1f
 1031 2020-02-15 17:44:35.386942 192.168.124.138 → 192.168.124.148 NFS 182 V4 Reply (Call In 1030) CLOSE
 1032 2020-02-15 17:44:35.387078 192.168.124.148 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0x3e7d  nfs.fh.hash == 0x44df8d2d
 1033 2020-02-15 17:44:35.387191 192.168.124.138 → 192.168.124.148 NFS 246 V4 Reply (Call In 1032) CLOSE
 1034 2020-02-15 17:44:35.387318 192.168.124.148 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x44df8d2d/  nfs.fh.hash == 0x44df8d2d
 1035 2020-02-15 17:44:35.387457 192.168.124.138 → 192.168.124.148 NFS 438 V4 Reply (Call In 1034) OPEN StateID: 0x1bb0
 1036 2020-02-15 17:44:35.387621 192.168.124.148 → 192.168.124.138 NFS 282 V4 Call READ StateID: 0x1e1d Offset: 0 Len: 32768  nfs.fh.hash == 0x44df8d2d
 1037 2020-02-15 17:44:35.388042 192.168.124.138 → 192.168.124.148 NFS 32942 V4 Reply (Call In 1036) READ

 ### RHEL 8.1

# uname -r
4.18.0-147.el8.x86_64

# ./setup_nfs.sh setup 8

# ./setup_nfs.sh action 8
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file-cfr
Outcome 15
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
13e79e367f6157a0123171e861b5af2e  /mnt/test/dst
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/dst

# ./setup_nfs.sh cleanup 8

$ grep read 4.18.0-147.el8.x86_64-cfr_test-03_cfrbug3-md5sum-rhel8.strace | egrep -o "^.{,150}|.{,30}$" | tail -5
1475  23:03:56.050933 read(3</mnt/test/dst>, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
..., 32768) = 32768 <0.000012>
1475  23:03:56.051087 read(3</mnt/test/dst>, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
x00", 32768) = 3600 <0.000030>
1475  23:03:56.051225 read(3</mnt/test/dst>, "", 24576) = 0 <0.000009>

$ tshark -tad -n -r 4.18.0-147.el8.x86_64-cfr_test-rhel8.pcap -Y 'nfs && frame.number in {1007..1046}' -z proto,colinfo,nfs.fh.hash,nfs.fh.hash
 1007 2020-02-15 18:03:56.023583 192.168.124.151 → 192.168.124.138 NFS 426 V4 Call OPEN DH: 0x60df7df3/src  nfs.fh.hash == 0x60df7df3
 1008 2020-02-15 18:03:56.024831 192.168.124.138 → 192.168.124.151 NFS 498 V4 Reply (Call In 1007) OPEN StateID: 0x922b  nfs.fh.hash == 0xfe9f62c8
 1013 2020-02-15 18:03:56.025522 192.168.124.151 → 192.168.124.138 NFS 1918 V4 Call WRITE StateID: 0x9786 Offset: 0 Len: 36368  nfs.fh.hash == 0xfe9f62c8
 1016 2020-02-15 18:03:56.028531 192.168.124.138 → 192.168.124.151 NFS 246 V4 Reply (Call In 1013) WRITE
 1017 2020-02-15 18:03:56.028726 192.168.124.151 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0x922b  nfs.fh.hash == 0xfe9f62c8
 1018 2020-02-15 18:03:56.028917 192.168.124.138 → 192.168.124.151 NFS 246 V4 Reply (Call In 1017) CLOSE
 1019 2020-02-15 18:03:56.029083 192.168.124.151 → 192.168.124.138 NFS 426 V4 Call OPEN DH: 0x60df7df3/dst  nfs.fh.hash == 0x60df7df3
 1020 2020-02-15 18:03:56.029999 192.168.124.138 → 192.168.124.151 NFS 498 V4 Reply (Call In 1019) OPEN StateID: 0xb7e6  nfs.fh.hash == 0x28a9c242
 1025 2020-02-15 18:03:56.030500 192.168.124.151 → 192.168.124.138 NFS 1918 V4 Call WRITE StateID: 0xb24b Offset: 0 Len: 36368  nfs.fh.hash == 0x28a9c242
 1028 2020-02-15 18:03:56.033207 192.168.124.138 → 192.168.124.151 NFS 246 V4 Reply (Call In 1025) WRITE
 1029 2020-02-15 18:03:56.033288 192.168.124.151 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0xb7e6  nfs.fh.hash == 0x28a9c242
 1030 2020-02-15 18:03:56.033402 192.168.124.138 → 192.168.124.151 NFS 246 V4 Reply (Call In 1029) CLOSE
 1031 2020-02-15 18:03:56.033524 192.168.124.151 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0xfe9f62c8/  nfs.fh.hash == 0xfe9f62c8
 1032 2020-02-15 18:03:56.033667 192.168.124.138 → 192.168.124.151 NFS 474 V4 Reply (Call In 1031) OPEN StateID: 0xab5d
 1033 2020-02-15 18:03:56.033800 192.168.124.151 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x28a9c242/  nfs.fh.hash == 0x28a9c242
 1034 2020-02-15 18:03:56.033908 192.168.124.138 → 192.168.124.151 NFS 438 V4 Reply (Call In 1033) OPEN StateID: 0x051f
 1035 2020-02-15 18:03:56.034035 192.168.124.151 → 192.168.124.138 NFS 378 V4 Call CLONE Src StateID: 0x1c09 Offset: 0 Len: 36368 Dst StateID: 0x00b2 Offset: 0  nfs.fh.hash == 0xfe9f62c8  nfs.fh.hash == 0x28a9c242
 1036 2020-02-15 18:03:56.034186 192.168.124.138 → 192.168.124.151 NFS 246 V4 Reply (Call In 1035) CLONE
 1037 2020-02-15 18:03:56.034316 192.168.124.151 → 192.168.124.138 NFS 338 V4 Call SETATTR FH: 0x28a9c242  nfs.fh.hash == 0x28a9c242
 1038 2020-02-15 18:03:56.035619 192.168.124.138 → 192.168.124.151 NFS 386 V4 Reply (Call In 1037) SETATTR
 1039 2020-02-15 18:03:56.035734 192.168.124.151 → 192.168.124.138 NFS 274 V4 Call CLOSE StateID: 0xab5d  nfs.fh.hash == 0xfe9f62c8
 1040 2020-02-15 18:03:56.035885 192.168.124.138 → 192.168.124.151 NFS 182 V4 Reply (Call In 1039) CLOSE
 1041 2020-02-15 18:03:56.035984 192.168.124.151 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0x051f  nfs.fh.hash == 0x28a9c242
 1042 2020-02-15 18:03:56.036089 192.168.124.138 → 192.168.124.151 NFS 246 V4 Reply (Call In 1041) CLOSE
 1043 2020-02-15 18:03:56.036224 192.168.124.151 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x28a9c242/  nfs.fh.hash == 0x28a9c242
 1044 2020-02-15 18:03:56.036336 192.168.124.138 → 192.168.124.151 NFS 438 V4 Reply (Call In 1043) OPEN StateID: 0x20d2
 1045 2020-02-15 18:03:56.036471 192.168.124.151 → 192.168.124.138 NFS 282 V4 Call READ StateID: 0x257f Offset: 0 Len: 32768  nfs.fh.hash == 0x28a9c242
 1046 2020-02-15 18:03:56.036842 192.168.124.138 → 192.168.124.151 NFS 32942 V4 Reply (Call In 1045) READ

 ### Rawhide

# uname -r
5.6.0-0.rc0.git5.1.fc32.x86_64

# ./setup_nfs.sh setup 8

# ./setup_nfs.sh action 8
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file-cfr
Outcome 0
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/dst
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/dst

# ./setup_nfs.sh cleanup 8

$ grep read 5.6.0-0.rc0.git5.1.fc32.x86_64-cfr_test-03_cfrbug3-md5sum-rhel8.strace | egrep -o "^.{,150}|.{,30}$" | tail -5
763   17:55:18.870011 read(3</mnt/test/dst>, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
..., 32768) = 32768 <0.000044>
763   17:55:18.870284 read(3</mnt/test/dst>, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaa", 32768) = 3600 <0.000031>
763   17:55:18.870433 read(3</mnt/test/dst>, "", 24576) = 0 <0.000026>

$ tshark -tad -n -r 5.6.0-0.rc0.git5.1.fc32.x86_64-cfr_test-rhel8.pcap -Y 'nfs && frame.number in {1262..1298}' -z proto,colinfo,nfs.fh.hash,nfs.fh.hash
 1262 2020-02-15 17:55:18.805792 192.168.124.142 → 192.168.124.138 NFS 426 V4 Call OPEN DH: 0x60df7df3/src  nfs.fh.hash == 0x60df7df3
 1263 2020-02-15 17:55:18.807618 192.168.124.138 → 192.168.124.142 NFS 498 V4 Reply (Call In 1262) OPEN StateID: 0xa226  nfs.fh.hash == 0x5a167e87
 1268 2020-02-15 17:55:18.809501 192.168.124.142 → 192.168.124.138 NFS 7710 V4 Call WRITE StateID: 0xa78b Offset: 0 Len: 36368  nfs.fh.hash == 0x5a167e87
 1271 2020-02-15 17:55:18.812009 192.168.124.138 → 192.168.124.142 NFS 246 V4 Reply (Call In 1268) WRITE
 1272 2020-02-15 17:55:18.812539 192.168.124.142 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0xa226  nfs.fh.hash == 0x5a167e87
 1273 2020-02-15 17:55:18.812734 192.168.124.138 → 192.168.124.142 NFS 246 V4 Reply (Call In 1272) CLOSE
 1274 2020-02-15 17:55:18.813158 192.168.124.142 → 192.168.124.138 NFS 426 V4 Call OPEN DH: 0x60df7df3/dst  nfs.fh.hash == 0x60df7df3
 1275 2020-02-15 17:55:18.814467 192.168.124.138 → 192.168.124.142 NFS 498 V4 Reply (Call In 1274) OPEN StateID: 0x87eb  nfs.fh.hash == 0x137f2b3d
 1278 2020-02-15 17:55:18.815585 192.168.124.142 → 192.168.124.138 NFS 470 V4 Call WRITE StateID: 0x8246 Offset: 0 Len: 36368  nfs.fh.hash == 0x137f2b3d
 1280 2020-02-15 17:55:18.817811 192.168.124.138 → 192.168.124.142 NFS 246 V4 Reply (Call In 1278) WRITE
 1281 2020-02-15 17:55:18.818082 192.168.124.142 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0x87eb  nfs.fh.hash == 0x137f2b3d
 1282 2020-02-15 17:55:18.818281 192.168.124.138 → 192.168.124.142 NFS 246 V4 Reply (Call In 1281) CLOSE
 1283 2020-02-15 17:55:18.818616 192.168.124.142 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x5a167e87/  nfs.fh.hash == 0x5a167e87
 1284 2020-02-15 17:55:18.818806 192.168.124.138 → 192.168.124.142 NFS 474 V4 Reply (Call In 1283) OPEN StateID: 0x9b50
 1285 2020-02-15 17:55:18.819199 192.168.124.142 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x137f2b3d/  nfs.fh.hash == 0x137f2b3d
 1286 2020-02-15 17:55:18.819347 192.168.124.138 → 192.168.124.142 NFS 438 V4 Reply (Call In 1285) OPEN StateID: 0x3512
 1287 2020-02-15 17:55:18.819870 192.168.124.142 → 192.168.124.138 NFS 378 V4 Call CLONE Src StateID: 0x2c04 Offset: 0 Len: 36368 Dst StateID: 0x30bf Offset: 0  nfs.fh.hash == 0x5a167e87  nfs.fh.hash == 0x137f2b3d
 1288 2020-02-15 17:55:18.820080 192.168.124.138 → 192.168.124.142 NFS 246 V4 Reply (Call In 1287) CLONE
 1289 2020-02-15 17:55:18.820671 192.168.124.142 → 192.168.124.138 NFS 338 V4 Call SETATTR FH: 0x137f2b3d  nfs.fh.hash == 0x137f2b3d
 1290 2020-02-15 17:55:18.822063 192.168.124.138 → 192.168.124.142 NFS 386 V4 Reply (Call In 1289) SETATTR
 1291 2020-02-15 17:55:18.822566 192.168.124.142 → 192.168.124.138 NFS 274 V4 Call CLOSE StateID: 0x9b50  nfs.fh.hash == 0x5a167e87
 1292 2020-02-15 17:55:18.822768 192.168.124.138 → 192.168.124.142 NFS 182 V4 Reply (Call In 1291) CLOSE
 1293 2020-02-15 17:55:18.823074 192.168.124.142 → 192.168.124.138 NFS 290 V4 Call CLOSE StateID: 0x3512  nfs.fh.hash == 0x137f2b3d
 1294 2020-02-15 17:55:18.823253 192.168.124.138 → 192.168.124.142 NFS 246 V4 Reply (Call In 1293) CLOSE
 1295 2020-02-15 17:55:18.823559 192.168.124.142 → 192.168.124.138 NFS 338 V4 Call OPEN DH: 0x137f2b3d/  nfs.fh.hash == 0x137f2b3d
 1296 2020-02-15 17:55:18.823768 192.168.124.138 → 192.168.124.142 NFS 438 V4 Reply (Call In 1295) OPEN StateID: 0x10df
 1297 2020-02-15 17:55:18.824274 192.168.124.142 → 192.168.124.138 NFS 282 V4 Call READ StateID: 0x1572 Offset: 0 Len: 36368  nfs.fh.hash == 0x137f2b3d
 1298 2020-02-15 17:55:18.824607 192.168.124.138 → 192.168.124.142 NFS 36542 V4 Reply (Call In 1297) READ


Still looking into this issue.

Comment 3 Jacob Shivers 2020-02-16 01:36:32 UTC
Noticed that I did a bad copy and did not paste in the wrapper script I ended up using:

# cat setup_nfs.sh
#! /bin/bash

function setup() {
        logger "start test rhel-$1"
        #rpcdebug -m nfs -s all
        #rpcdebug -m rpc -s all
        tcpdump -s0 -n -i $(tcpdump -D | awk 'NR == 1{gsub(/.*\./,"",$1); print $1}') -w /tmp/$(uname -r)-cfr_test-rhel$1.pcap &
}

function action() {
        if [[ $1 == 7 ]]; then
                mount ad-nfs-server.example.net:/test /mnt/test -o vers=4.2,sec=sys
        elif [[ $1 == 8 ]]; then
                mount nfs-server-8.example.net:/test /mnt/test -o vers=4.2,sec=sys
        else
                exit 1
        fi
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-00_man2cfr-rhel$1.strace /usr/local/bin/copy_file_range /mnt/test/test_file /mnt/test/test_file-cfr
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-01_man2cfr-md5sum-rhel$1.strace md5sum /mnt/test/test_file{,-cfr}
        cd /mnt/test
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-02_cfrbug3-rhel$1.strace /usr/local/bin/cfrbug3
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-03_cfrbug3-md5sum-rhel$1.strace md5sum /mnt/test/{src,dst}
        echo 3 > /proc/sys/vm/drop_caches
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfr_test-04_cfrbug3-md5sum-rhel$1.strace md5sum /mnt/test/{src,dst}
}

function cleanup() {
        pkill tcpdump
        #rpcdebug -m rpc -c all
        #rpcdebug -m nfs -c all
        logger "stop test rhel-$1"
        journalctl -b0 | sed '/start test/,/stop test/ !d' > /tmp/$(uname -r)-cfr_test-rhel$1.log
        #rm -f /mnt/test/{test_file-cfr,src,dst}
        umount /mnt/test
}

case $1 in
        setup)  
                setup $2
                ;;
        action) 
                action $2
                ;;
        cleanup)
                cleanup $2
                ;;
        *)
                echo "Usage: $0 {setup|cleanup}"
                exit 1
esac

Comment 4 Jacob Shivers 2020-02-18 16:08:01 UTC
As a sanity check, I tested with SMBv3.11 and server side offload and I can not reproduce the behavior.

# ./setup_cifs.sh action smb
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file
f1c9645dbc14efddc7d8a322685f26eb  /mnt/test/test_file-cfr
Outcome 0
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/dst
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/src
5897a64f1ccbf727b28fb60d43f60c0d  /mnt/test/dst

 o The client does a full READ for dst instead of a partial
$ tshark -tad -n -r 4.18.0-179.el8.x86_64-cfr_test-rhelsmb.pcap -Y 'smb2 && frame.number in {817..903}'
  817 2020-02-18 10:56:17.838733 192.168.124.151 → 192.168.124.153 SMB2 342 Create Request File: src
  818 2020-02-18 10:56:17.840471 192.168.124.153 → 192.168.124.151 SMB2 354 Create Response File: src
  819 2020-02-18 10:56:17.840527 192.168.124.151 → 192.168.124.153 SMB2 158 Flush Request File: src
  820 2020-02-18 10:56:17.842144 192.168.124.153 → 192.168.124.151 SMB2 143 Flush Response, Error: STATUS_PENDING
  821 2020-02-18 10:56:17.860476 192.168.124.153 → 192.168.124.151 SMB2 138 Flush Response
  823 2020-02-18 10:56:17.860533 192.168.124.151 → 192.168.124.153 SMB2 174 SetInfo Request FILE_INFO/SMB2_FILE_ENDOFFILE_INFO File: src
  824 2020-02-18 10:56:17.860798 192.168.124.153 → 192.168.124.151 SMB2 136 SetInfo Response
  825 2020-02-18 10:56:17.860895 192.168.124.151 → 192.168.124.153 SMB2 446 Create Request File: src;SetInfo Request FILE_INFO/SMB2_FILE_BASIC_INFO;Close Request
  826 2020-02-18 10:56:17.861897 192.168.124.153 → 192.168.124.151 SMB2 478 Create Response File: src;SetInfo Response;Close Response
  827 2020-02-18 10:56:17.861977 192.168.124.151 → 192.168.124.153 SMB2 414 Create Request File: src;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  828 2020-02-18 10:56:17.862673 192.168.124.153 → 192.168.124.151 SMB2 590 Create Response File: src;GetInfo Response;Close Response
  835 2020-02-18 10:56:17.863428 192.168.124.151 → 192.168.124.153 SMB2 7590 Write Request Len:36368 Off:0 File: src
  838 2020-02-18 10:56:17.864122 192.168.124.153 → 192.168.124.151 SMB2 150 Write Response
  839 2020-02-18 10:56:17.864217 192.168.124.151 → 192.168.124.153 SMB2 158 Close Request File: src
  840 2020-02-18 10:56:17.865022 192.168.124.153 → 192.168.124.151 SMB2 194 Close Response
  841 2020-02-18 10:56:17.865257 192.168.124.151 → 192.168.124.153 SMB2 342 Create Request File: dst
  842 2020-02-18 10:56:17.866499 192.168.124.153 → 192.168.124.151 SMB2 354 Create Response File: dst
  843 2020-02-18 10:56:17.866549 192.168.124.151 → 192.168.124.153 SMB2 158 Flush Request File: dst
  844 2020-02-18 10:56:17.868005 192.168.124.153 → 192.168.124.151 SMB2 143 Flush Response, Error: STATUS_PENDING
  845 2020-02-18 10:56:17.871735 192.168.124.153 → 192.168.124.151 SMB2 138 Flush Response
  847 2020-02-18 10:56:17.871789 192.168.124.151 → 192.168.124.153 SMB2 174 SetInfo Request FILE_INFO/SMB2_FILE_ENDOFFILE_INFO File: dst
  848 2020-02-18 10:56:17.872106 192.168.124.153 → 192.168.124.151 SMB2 136 SetInfo Response
  849 2020-02-18 10:56:17.872159 192.168.124.151 → 192.168.124.153 SMB2 446 Create Request File: dst;SetInfo Request FILE_INFO/SMB2_FILE_BASIC_INFO;Close Request
  850 2020-02-18 10:56:17.873072 192.168.124.153 → 192.168.124.151 SMB2 478 Create Response File: dst;SetInfo Response;Close Response
  851 2020-02-18 10:56:17.873161 192.168.124.151 → 192.168.124.153 SMB2 414 Create Request File: dst;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  852 2020-02-18 10:56:17.873870 192.168.124.153 → 192.168.124.151 SMB2 590 Create Response File: dst;GetInfo Response;Close Response
  862 2020-02-18 10:56:17.874323 192.168.124.151 → 192.168.124.153 SMB2 350 Write Request Len:36368 Off:0 File: dst
  872 2020-02-18 10:56:17.874988 192.168.124.153 → 192.168.124.151 SMB2 150 Write Response
  873 2020-02-18 10:56:17.875039 192.168.124.151 → 192.168.124.153 SMB2 158 Close Request File: dst
  874 2020-02-18 10:56:17.875593 192.168.124.153 → 192.168.124.151 SMB2 194 Close Response
  875 2020-02-18 10:56:17.875717 192.168.124.151 → 192.168.124.153 SMB2 342 Create Request File: src
  876 2020-02-18 10:56:17.876300 192.168.124.153 → 192.168.124.151 SMB2 354 Create Response File: src
  877 2020-02-18 10:56:17.876438 192.168.124.151 → 192.168.124.153 SMB2 342 Create Request File: dst
  878 2020-02-18 10:56:17.876962 192.168.124.153 → 192.168.124.151 SMB2 354 Create Response File: dst
  879 2020-02-18 10:56:17.877116 192.168.124.151 → 192.168.124.153 SMB2 191 Ioctl Request FSCTL_SRV_REQUEST_RESUME_KEY File: src
  880 2020-02-18 10:56:17.877286 192.168.124.153 → 192.168.124.151 SMB2 214 Ioctl Response FSCTL_SRV_REQUEST_RESUME_KEY File: src
  881 2020-02-18 10:56:17.877311 192.168.124.151 → 192.168.124.153 SMB2 246 Ioctl Request FSCTL_SRV_COPYCHUNK_WRITE File: dst
  882 2020-02-18 10:56:17.877696 192.168.124.153 → 192.168.124.151 SMB2 194 Ioctl Response FSCTL_SRV_COPYCHUNK_WRITE File: dst
  883 2020-02-18 10:56:17.877922 192.168.124.151 → 192.168.124.153 SMB2 414 Create Request File: dst;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  884 2020-02-18 10:56:17.878611 192.168.124.153 → 192.168.124.151 SMB2 590 Create Response File: dst;GetInfo Response;Close Response
  885 2020-02-18 10:56:17.878751 192.168.124.151 → 192.168.124.153 SMB2 158 Flush Request File: dst
  886 2020-02-18 10:56:17.880487 192.168.124.153 → 192.168.124.151 SMB2 143 Flush Response, Error: STATUS_PENDING
  887 2020-02-18 10:56:17.882262 192.168.124.153 → 192.168.124.151 SMB2 138 Flush Response
  889 2020-02-18 10:56:17.882322 192.168.124.151 → 192.168.124.153 SMB2 446 Create Request File: dst;SetInfo Request FILE_INFO/SMB2_FILE_BASIC_INFO;Close Request
  890 2020-02-18 10:56:17.883260 192.168.124.153 → 192.168.124.151 SMB2 478 Create Response File: dst;SetInfo Response;Close Response
  891 2020-02-18 10:56:17.883440 192.168.124.151 → 192.168.124.153 SMB2 158 Close Request File: src
  892 2020-02-18 10:56:17.883852 192.168.124.153 → 192.168.124.151 SMB2 194 Close Response
  893 2020-02-18 10:56:17.883974 192.168.124.151 → 192.168.124.153 SMB2 158 Close Request File: dst
  894 2020-02-18 10:56:17.884331 192.168.124.153 → 192.168.124.151 SMB2 194 Close Response
  895 2020-02-18 10:56:17.884438 192.168.124.151 → 192.168.124.153 SMB2 342 Create Request File: dst
  896 2020-02-18 10:56:17.884983 192.168.124.153 → 192.168.124.151 SMB2 354 Create Response File: dst
  897 2020-02-18 10:56:17.885136 192.168.124.151 → 192.168.124.153 SMB2 183 Read Request Len:36864 Off:0 File: dst
  898 2020-02-18 10:56:17.885586 192.168.124.153 → 192.168.124.151 SMB2 36518 Read Response
  902 2020-02-18 10:56:17.886309 192.168.124.151 → 192.168.124.153 SMB2 158 Close Request File: dst
  903 2020-02-18 10:56:17.886727 192.168.124.153 → 192.168.124.151 SMB2 194 Close Response

I showed above, but this behavior does not exist upstream on the latest 5.6rc kernel.

Comment 5 Jacob Shivers 2020-02-20 23:33:17 UTC
 ### 4.19 stable

# uname -r
4.19.105

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test
# rm dst src 
rm: remove regular file 'dst'? y
rm: remove regular file 'src'? y

# cfrbug3 
Outcome 15

 ### 5.0 stable

# uname -r
5.0.21

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test
# rm dst src
rm: remove regular file 'dst'? y
rm: remove regular file 'src'? y

# cfrbug3 
Outcome 15

------------------------------------------------------------8<------------------------------------------------------------
# git log --pretty=oneline stable/linux-5.0.y..master fs/nfs | wc -l
373

# git log --pretty=oneline stable/linux-5.3.y..master fs/nfs | wc -l
186
------------------------------------------------------------8<------------------------------------------------------------

 ### 5.3 stable

# uname -r
5.3.18

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test/
# rm src dst
rm: remove regular file 'src'? y
rm: remove regular file 'dst'? y

# cfrbug3 
Outcome 0 <--------


 ### 5.1 stable

# uname -r
5.1.21

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test

# rm src dst
rm: cannot remove 'src': No such file or directory
rm: cannot remove 'dst': No such file or directory

# cfrbug3 
Outcome 15

 ### 5.2 stable

# uname -r
5.2.21

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test

# rm src dst
rm: remove regular file 'src'? y
rm: remove regular file 'dst'? y

# cfrbug3 
Outcome 15 <--------

So the last "bad" kernel is in 5.2 and the first good is 5.3.

Comment 6 JianHong Yin 2020-02-21 04:45:14 UTC
Hi David Koppelman

JFYI: I can reproduce the issue even after comment out futimens() call

'''
[root@yjh-rhel-810 rw]# mount -t nfs4
192.168.122.51:/ on /mnt/nfsmp type nfs4 (rw,relatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.251,local_lock=none,addr=192.168.122.51)
192.168.122.51:/nfsshare on /mnt/nfsmp/nfsshare type nfs4 (rw,relatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.251,local_lock=none,addr=192.168.122.51)
[root@yjh-rhel-810 rw]# pwd
/mnt/nfsmp/nfsshare/rw
[root@yjh-rhel-810 rw]# grep futimens crfbug3.c -A1
  //if ( 0 && futimens(dst,tam) )
    //return 22;
[root@yjh-rhel-810 rw]# gcc crfbug3.c
[root@yjh-rhel-810 rw]# ./a.out
Outcome 15
[root@yjh-rhel-810 rw]# md5sum testfile-*
13e79e367f6157a0123171e861b5af2e  testfile-dst-I-hope-this-name-wasnt-used
5897a64f1ccbf727b28fb60d43f60c0d  testfile-src-I-hope-this-name-wasnt-used
'''


another reproducer:
'''
dd if=/dev/zero bs=36368 count=1 | tr "\000" "a" >testfile
dd if=/dev/zero bs=36368 count=1 | tr "\000" "b" >out
wget -O copy_file_range.c https://raw.githubusercontent.com/tcler/linux-network-filesystems/master/utils/syscall_wrapper/copy_file_range.c
gcc -o copy_file_range copy_file_range.c
./copy_file_range out testfile
cmp out testfile
md5sum out testfile
'''

Comment 7 JianHong Yin 2020-02-21 06:02:44 UTC
I found that only 4K aligned files can pass the test:

'''
#wget -O copy_file_range.c https://raw.githubusercontent.com/tcler/linux-network-filesystems/master/utils/syscall_wrapper/copy_file_range.c
#gcc -o copy_file_range copy_file_range.c

#cd /nfs_mountpoint
for ((BS=1; BS<36368; BS++)); do
        echo -e "\E[44mBS=$BS\E[0m"
        dd if=/dev/zero bs=$BS count=1 | tr "\000" "a" >testfile
        dd if=/dev/zero bs=$BS count=1 | tr "\000" "b" >out
        ./copy_file_range out testfile
        cmp out testfile && echo -e "\E[44mOK: test pass(BS=$BS)\E[0m"
        #md5sum out testfile
done
'''

'''
[root@yjh-rhel-810 rw]# grep OK test.log
OK: test pass(BS=4096)
OK: test pass(BS=8192)
OK: test pass(BS=12288)
OK: test pass(BS=16384)
OK: test pass(BS=20480)
...
'''

Comment 11 David Koppelman 2020-02-21 14:41:49 UTC
> JFYI: I can reproduce the issue even after comment out futimens() call

On my system it does not always occur without the futimens call. I
wanted to provide a consistent reproducer to make it easy to track
down the flaw.

Comment 12 Jacob Shivers 2020-02-21 19:10:12 UTC
# git branch | head -1
* (HEAD detached at nfs-for-5.3-1)

# head Makefile 
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 2
SUBLEVEL = 0
EXTRAVERSION = -rc7
NAME = Bobtail Squid

# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
# More info can be located in ./README

# uname -r
5.2.0-rc7

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test

# rm src dst
rm: remove regular file 'src'? y
rm: remove regular file 'dst'? y

# cfrbug3 
Outcome 0

------------------------------------------------------------8<------------------------------------------------------------

# git branch | head -1
* (HEAD detached at nfs-for-5.1-5)

# head Makefile 
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 1
SUBLEVEL = 0
EXTRAVERSION = -rc5
NAME = Shy Crocodile

# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
# More info can be located in ./README

# uname -r
5.1.0-rc5

# mount ad-nfs-server.example.net:/test /mnt/test -o sec=sys,vers=4.2

# cd /mnt/test

# rm src dst
rm: remove regular file 'src'? y
rm: remove regular file 'dst'? y

# cfrbug3 
Outcome 15


# cat $(which setup_nfs.sh)
#! /bin/bash

function setup() {
        logger "start test"
        rpcdebug -m nfs -s all
        rpcdebug -m rpc -s all
        tcpdump -s0 -n -i $(tcpdump -D | awk 'NR == 1{gsub(/.*\./,"",$1); print $1}') -w /tmp/$(uname -r)-cfrbug3.pcap &
}

function action() {
        mount ad-nfs-server.example.net:/test /mnt/test -o vers=4.2,sec=sys
        cd /mnt/test
        strace -fvttTyyx -s 4096 -o /tmp/$(uname -r)-cfrbug3.strace /usr/local/bin/cfrbug3
}

function cleanup() {
        pkill tcpdump
        rpcdebug -m rpc -c all
        rpcdebug -m nfs -c all
        logger "stop test"
        journalctl -b0 -o short-precise | sed '/start test/,/stop test/ !d' > /tmp/$(uname -r)-cfrbug3.log
        rm -f /mnt/test/{src,dst}
        umount /mnt/test
}

case $1 in
        setup)  
                setup $2
                ;;
        action) 
                action $2
                ;;
        cleanup)
                cleanup $2
                ;;
        *)
                echo "Usage: $0 {setup|action|cleanup}"
                exit 1
esac

### Testing

# uname -r
5.1.0-rc5

# setup_nfs.sh setup
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache

# setup_nfs.sh action
Outcome 15

# setup_nfs.sh cleanup


# uname -r
5.2.0-rc7

# setup_nfs.sh setup

# setup_nfs.sh action
Outcome 0

# setup_nfs.sh cleanup


$ diff -y <(egrep -C 1 "32768|12991402" 5.1.0-rc5-cfrbug3.log) <(egrep -C 1 "32768|13009944" 5.2.0-rc7-cfrbug3.log) --width=250
------------------------------------------------------------8<------------------------------------------------------------
Feb 21 18:22:09.379706 git-box-8.example.net kernel: decode_attr_fsid: fsid=(0x1f61d39b60334fc9/0xbac92e2072e86029)         |   Feb 21 18:24:29.497678 git-box-8.example.net kernel: decode_attr_fsid: fsid=(0x1f61d39b60334fc9/0xbac92e2072e86029)
Feb 21 18:22:09.381831 git-box-8.example.net kernel: decode_attr_fileid: fileid=12991402                                    |   Feb 21 18:24:29.499764 git-box-8.example.net kernel: decode_attr_fileid: fileid=13009944
Feb 21 18:22:09.383187 git-box-8.example.net kernel: decode_attr_fs_locations: fs_locations done, error = 0                 |   Feb 21 18:24:29.501120 git-box-8.example.net kernel: decode_attr_fs_locations: fs_locations done, error = 0
--                                                                                                                          |   --
Feb 21 18:22:09.395541 git-box-8.example.net kernel: decode_attr_time_modify: mtime=1582309328                              |   Feb 21 18:24:29.513512 git-box-8.example.net kernel: decode_attr_time_modify: mtime=1582309468
Feb 21 18:22:09.397133 git-box-8.example.net kernel: decode_attr_mounted_on_fileid: fileid=12991402                         |   Feb 21 18:24:29.515044 git-box-8.example.net kernel: decode_attr_mounted_on_fileid: fileid=13009944
Feb 21 18:22:09.398806 git-box-8.example.net kernel: decode_getfattr_attrs: xdr returned 0                                  |   Feb 21 18:24:29.516691 git-box-8.example.net kernel: decode_getfattr_attrs: xdr returned 0
--                                                                                                                          |   --
Feb 21 18:22:09.424885 git-box-8.example.net kernel: RPC:    37 freeing task                                                |   Feb 21 18:24:29.542208 git-box-8.example.net kernel: RPC:    37 freeing task
Feb 21 18:22:09.427896 git-box-8.example.net kernel: NFS: nfs_update_inode(0:46/12991402 fh_crc=0xeb760fad ct=3 info=0x427  |   Feb 21 18:24:29.546035 git-box-8.example.net kernel: NFS: nfs_update_inode(0:46/13009944 fh_crc=0x7860c055 ct=3 info=0x427
Feb 21 18:22:09.427959 git-box-8.example.net kernel: RPC:       wake_up_first(000000004d48935f "ForeChannel Slot table")    |   Feb 21 18:24:29.546108 git-box-8.example.net kernel: RPC:       wake_up_first(00000000fcde2231 "ForeChannel Slot table")
--                                                                                                                          |   --
Feb 21 18:22:09.438865 git-box-8.example.net kernel: NFS: read(/dst, 36368@0)                                               |   Feb 21 18:24:29.557693 git-box-8.example.net kernel: NFS: read(/dst, 36368@0)
Feb 21 18:22:09.441247 git-box-8.example.net kernel: NFS: nfs_readpages (0:46/12991402 8)                                   |   Feb 21 18:24:29.559747 git-box-8.example.net kernel: NFS: (0:46/13009944) data cache invalidated <--------
Feb 21 18:22:09.441311 git-box-8.example.net kernel: NFS: initiated pgio call (req 0:46/12991402, 32768 bytes @ offset 0)   |   Feb 21 18:24:29.559809 git-box-8.example.net kernel: NFS: nfs_readpages (0:46/13009944 9)
Feb 21 18:22:09.443431 git-box-8.example.net kernel: RPC:       new task initialized, procpid 2185                          |   Feb 21 18:24:29.561041 git-box-8.example.net kernel: NFS: initiated pgio call (req 0:46/13009944, 36368 bytes @ offset 0)
--                                                                                                                          |   Feb 21 18:24:29.563066 git-box-8.example.net kernel: RPC:       new task initialized, procpid 2091
Feb 21 18:22:09.501012 git-box-8.example.net kernel: RPC:    38 call_decode (status 0)                                      |   --
Feb 21 18:22:09.502577 git-box-8.example.net kernel: RPC:    38 call_decode result 32768                                    |   Feb 21 18:24:29.641631 git-box-8.example.net kernel: RPC:       rpc_release_client(000000003ab0a38d)
Feb 21 18:22:09.504251 git-box-8.example.net kernel: NFS: nfs_pgio_result:    38, (status 32768)                            |   Feb 21 18:24:29.643307 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@0)
Feb 21 18:22:09.506168 git-box-8.example.net kernel: --> nfs4_read_done                                                     |   Feb 21 18:24:29.644639 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@4096)
--                                                                                                                          |   Feb 21 18:24:29.646173 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@8192)
Feb 21 18:22:09.521554 git-box-8.example.net kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295                |   Feb 21 18:24:29.647663 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@12288)
Feb 21 18:22:09.523835 git-box-8.example.net kernel: RPC:    38 return 0, status 32768                                      |   Feb 21 18:24:29.649174 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@16384)
Feb 21 18:22:09.525489 git-box-8.example.net kernel: RPC:    38 release task                                                |   Feb 21 18:24:29.650669 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@20480)
--                                                                                                                          |   Feb 21 18:24:29.652197 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@24576)
Feb 21 18:22:09.533663 git-box-8.example.net kernel: RPC:       rpc_release_client(00000000d5a15508)                        |   Feb 21 18:24:29.653705 git-box-8.example.net kernel: NFS: read done (0:46/13009944 4096@28672)
Feb 21 18:22:09.535705 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@0)                                  |   Feb 21 18:24:29.655228 git-box-8.example.net kernel: NFS: read done (0:46/13009944 3600@32768)
Feb 21 18:22:09.539362 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@4096)                               |   Feb 21 18:24:29.658302 git-box-8.example.net kernel: NFS: flush(/dst)
Feb 21 18:22:09.539439 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@8192)                               <
Feb 21 18:22:09.541232 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@12288)                              <
Feb 21 18:22:09.543080 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@16384)                              <
Feb 21 18:22:09.544940 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@20480)                              <
Feb 21 18:22:09.546825 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@24576)                              <
Feb 21 18:22:09.548699 git-box-8.example.net kernel: NFS: read done (0:46/12991402 4096@28672)                              <
Feb 21 18:22:09.550890 git-box-8.example.net kernel: NFS: flush(/dst)  

 o Patch we need

# git show -s --format='%s %b%n' 1c341b777501613aad83f9c233a3fe5701cff083
NFS: Add deferred cache invalidation for close-to-open consistency violations If the client detects that close-to-open cache consistency has been
violated, and that the file or directory has been changed on the
server, then do a cache invalidation when we're done working with
the file.
The reason we don't do an immediate cache invalidation is that we
want to avoid performance problems due to false positives. Also,
note that we cannot guarantee cache consistency in this situation
even if we do invalidate the cache.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

# git show 1c341b777501613aad83f9c233a3fe5701cff083 | sed '/diff/,$ !d' > invalidate.patch

# patch -p1 < invalidate.patch 
patching file fs/nfs/dir.c
Hunk #1 succeeded at 79 (offset -1 lines).
patching file fs/nfs/inode.c
Hunk #1 succeeded at 207 (offset -1 lines).
Hunk #2 succeeded at 651 (offset -1 lines).
Hunk #3 succeeded at 1027 (offset -6 lines).
Hunk #4 succeeded at 1311 (offset -7 lines).
Hunk #5 succeeded at 1870 (offset -7 lines).
patching file include/linux/nfs_fs.h
Hunk #1 succeeded at 224 (offset 1 line).

------------------------------------------------------------8<------------------------------------------------------------

The patch indeed properly invalidates cache and causes a full read of the file.

# uname -r
5.1.0-rc5+

# setup_nfs.sh setup

# setup_nfs.sh action
Outcome 0

# setup_nfs.sh cleanup

2134 Feb 21 19:04:05.986310 git-box-8.example.net kernel: NFS: read(/dst, 36368@0)
2135 Feb 21 19:04:05.988814 git-box-8.example.net kernel: NFS: (0:46/13009944) data cache invalidated <--------
2136 Feb 21 19:04:05.988865 git-box-8.example.net kernel: NFS: nfs_readpages (0:46/13009944 9)
2137 Feb 21 19:04:05.993195 git-box-8.example.net kernel: NFS: initiated pgio call (req 0:46/13009944, 36368 bytes @ offset 0)
2138 Feb 21 19:04:05.993263 git-box-8.example.net kernel: RPC:       new task initialized, procpid 5491
2139 Feb 21 19:04:05.995125 git-box-8.example.net kernel: RPC:       allocated task 00000000481707b6
2140 Feb 21 19:04:06.016310 git-box-8.example.net kernel: RPC:    38 __rpc_execute flags=0x4001

Comment 21 Bruno Meneguele 2020-03-02 10:13:00 UTC
Patch(es) available on kernel-4.18.0-185.el8

Comment 27 errata-xmlrpc 2020-04-28 16:39:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769


Note You need to log in before you can comment on or make changes to this bug.