Bug 1442797

Summary: nfs client gets binary zeroes when a file grows on the server
Product: [Fedora] Fedora
Reporter: Eyal Lebedinsky <bugzilla>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 25
CC: gansalmon, ichavero, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, samuel-rhbugs
Hardware: x86_64   
OS: Linux   
Last Closed: 2017-06-03 04:03:03 UTC
Type: Bug

Description Eyal Lebedinsky 2017-04-17 14:56:07 UTC
Description of problem:
When I examine a file on an NFS-mounted filesystem with tail, less, vi or similar while the file is growing on the server, I get binary zeroes at the tail for a few minutes.

Version-Release number of selected component (if applicable):
Tried the last three kernels that I have:
	4.10.8-100.fc24.x86_64 broken (my latest)
	4.10.6-100.fc24.x86_64 broken
	4.9.17-100.fc24.x86_64 good
Was also good for earlier kernels for years.

How reproducible:
Highly repeatable.

Steps to Reproduce:
1. Select a file on an NFS mount that is growing regularly.
2. $ less /nfs-mounted-path/file
3. Hit 'G' to refresh and see zeroes at the tail.

Actual results:
...older content...
^@^@^@^@^@^@^@

Expected results:
...older content...
...later content...

Additional info:

1. The server runs an old f19.

2. mount configuration:

on the server (192.168.3.7)
=============

$ cat /etc/exports
/data              192.168.3.0/24(rw,async)

$ sudo exportfs -v
/data           192.168.3.0/24(rw,async,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

on the client (192.168.3.4)
=============

$ cat /etc/fstab
files:/data     /data-e7       nfs    soft   0   0

$ mount
=======
files:/data on /data-e7 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.3.4,local_lock=none,addr=192.168.3.7)

Comment 1 Eyal Lebedinsky 2017-04-21 12:26:40 UTC
- Tested latest kernel 4.10.10-100.fc24.x86_64 and it also shows the problem.
- I should mention that the mount options are the same on all the tested kernels.

Comment 2 Eyal Lebedinsky 2017-05-22 22:42:14 UTC
What is happening with this issue? I am staying on an old 4.9 kernel because of this.
TIA

Comment 3 Eyal Lebedinsky 2017-05-23 04:57:56 UTC
I want to add that:
1) The latest kernels, up to 4.10.15-100.fc24.x86_64, show the same problem.
2) Downgrading nfs-utils did not resolve the issue:
   DEBUG ---> Package nfs-utils.x86_64 1:1.3.4-2.rc3.fc24 will be downgraded
   DEBUG ---> Package nfs-utils.x86_64 1:1.3.3-8.rc5.fc24 will be a downgrade

TIA

Comment 4 Eyal Lebedinsky 2017-05-27 05:58:28 UTC
Following is a script and test results showing the problem. It does not happen with the old 4.9.17-100 kernel.
Note the blocks of zeroes dumped every few seconds.

Any help will be appreciated.

[eyal@e7:~]$ cat nfs-test.sh
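#!/bin/sh
# Reproducer: run "s" in the exported directory on the server and "c" in the
# same directory as seen through the NFS mount on the client.

case "$1" in
c|client)
        # Dump the last 35 bytes of t1 (one record in the run shown) every 5 seconds.
        while true ; do
                tail -c 35 t1 | od -c
                sleep 5
        done
        ;;
s|server)
        # Append one timestamped, numbered line to t1 every 5 seconds.
        n=1000
        while true ; do
                n=$((n+1))
                msg="$(date) $n"
                echo "$msg"
                echo "$msg" >>t1
                sleep 5
        done
        ;;
*)
        echo "specify argument c[lient] or s[erver]" >&2
        exit 1
        ;;
esac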

=========> this is running on f19 and /data2 is exported by nfs
[eyal@e7:~]$ uname -a
Linux e7.eyal.emu.id.au 3.14.27-100.fc19.x86_64 #1 SMP Wed Dec 17 19:36:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[eyal@e7:~]$ sudo exportfs
/data           192.168.3.0/24
/data1          192.168.3.0/24
/data2          192.168.3.0/24

[eyal@e7:~]$ cd /data2/tmp
[eyal@e7:/data2/tmp]$ ./nfs-test.sh s
Sat May 27 14:56:34 AEST 2017 1001
Sat May 27 14:56:39 AEST 2017 1002
Sat May 27 14:56:44 AEST 2017 1003
Sat May 27 14:56:49 AEST 2017 1004
Sat May 27 14:56:54 AEST 2017 1005
Sat May 27 14:56:59 AEST 2017 1006
Sat May 27 14:57:04 AEST 2017 1007
Sat May 27 14:57:09 AEST 2017 1008
Sat May 27 14:57:14 AEST 2017 1009


=========> this is running on f24 (kernel 4.10.15-100) and /data2 is mounted by nfs
[eyal@e4:~]$ grep data2 /etc/fstab
files:/data2                    /data2                  nfs     soft            0 0
[eyal@e4:~]$ mount|grep data2
files:/data2 on /data2 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.3.4,local_lock=none,addr=192.168.3.7)

[eyal@e4:~]$ cd /data2/tmp
[eyal@e4:/data2/tmp]$ ./nfs-test.sh c
0000000   S   a   t       M   a   y       2   7       1   4   :   5   6
0000020   :   3   9       A   E   S   T       2   0   1   7       1   0
0000040   0   2  \n
0000043
0000000   S   a   t       M   a   y       2   7       1   4   :   5   6
0000020   :   4   4       A   E   S   T       2   0   1   7       1   0
0000040   0   3  \n
0000043
0000000   S   a   t       M   a   y       2   7       1   4   :   5   6
0000020   :   4   9       A   E   S   T       2   0   1   7       1   0
0000040   0   4  \n
0000043
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000040  \0  \0  \0
0000043
0000000   S   a   t       M   a   y       2   7       1   4   :   5   6
0000020   :   5   9       A   E   S   T       2   0   1   7       1   0
0000040   0   6  \n
0000043
0000000   S   a   t       M   a   y       2   7       1   4   :   5   7
0000020   :   0   4       A   E   S   T       2   0   1   7       1   0
0000040   0   7  \n
0000043
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000040  \0  \0  \0
0000043
0000000   S   a   t       M   a   y       2   7       1   4   :   5   7
0000020   :   1   4       A   E   S   T       2   0   1   7       1   0
0000040   0   9  \n
0000043
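
The blocks of zeroes in the dump above could also be flagged automatically instead of being spotted by eye. The following is only a sketch and was not part of the original test: `tail_is_zeroes` is a hypothetical helper name, and the default of 35 bytes simply matches the record length in this particular run.

```shell
#!/bin/sh
# tail_is_zeroes FILE [N]
# Hypothetical helper (not from the original report): succeeds when the
# last N bytes (default 35) of FILE exist and are all NUL bytes.
tail_is_zeroes() {
        file=$1
        n=${2:-35}
        # Read the tail once and turn it into a plain hex string, e.g. "0a00".
        hex=$(tail -c "$n" "$file" | od -An -v -tx1 | tr -d ' \n')
        # An empty file has no tail to judge.
        [ -n "$hex" ] || return 1
        case $hex in
        *[!0]*) return 1 ;;     # at least one non-zero hex digit: real data
        *)      return 0 ;;     # only zero digits: the tail is all NUL
        esac
}
```

In the client loop, something like `tail_is_zeroes t1 && echo "$(date): zero tail seen"` could replace the `od -c` dump, so that only the bad reads are reported.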

Comment 5 Eyal Lebedinsky 2017-05-28 08:56:31 UTC
I have now tested this on f25 and see the same issue. With an older kernel (4.8.?) booted all is well, while the latest (4.10.17) shows the above problem.


Can I get a status update on this issue? It holds back my updates.
Can I raise the severity?

Regards

Comment 6 Eyal Lebedinsky 2017-06-01 23:22:18 UTC
I have now tested an f25 (kernel 4.9.17-100) client reading from an f24 (same kernel) server, and the problem does not show. This looks like a kernel backward compatibility issue.

At the moment upgrading the server is not an option for me, can this be fixed properly in the later kernels?

Regards

Comment 7 Eyal Lebedinsky 2017-06-01 23:31:58 UTC
I just had an f25 update to kernel 4.11.3-200. It shows the same problem when mounting from a server running kernel 3.14.27-100.fc19.x86_64.

FYI

Comment 8 Eyal Lebedinsky 2017-06-03 04:03:03 UTC
Sorted out. Mounting the nfs shares with 'nfsvers=3' fixes the problem. The default is '4.1'.

The old kernels (3.8 and 3.9 tested) probably did not do a good job with NFS version 4. Or, possibly, the new kernels are bad at supporting the old kernels when using version 3 (if this was selected).
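
For anyone applying the same workaround, the sketch below shows one way it might look. The fstab line is an assumption derived from the client entry quoted earlier with `nfsvers=3` added, and `nfs_versions` is a hypothetical helper (not part of nfs-utils) that reads `mount` output and prints the protocol version each NFS mount is actually using. The new option most likely only takes effect on a fresh mount (unmount, then mount again), not on a remount.

```shell
#!/bin/sh
# Assumed /etc/fstab entry on the client, pinning NFS version 3:
#   files:/data2    /data2    nfs    soft,nfsvers=3    0 0

# nfs_versions: hypothetical helper. Reads `mount` output on stdin and
# prints "<mount point> <vers>" for every nfs/nfs4 line.
# Mount points containing spaces are not handled.
nfs_versions() {
        awk '$4 == "type" && $5 ~ /^nfs4?$/ {
                opts = $6
                gsub(/[()]/, "", opts)          # drop the surrounding parentheses
                n = split(opts, o, ",")
                for (i = 1; i <= n; i++)
                        if (o[i] ~ /^vers=/)
                                print $3, substr(o[i], 6)
        }'
}
```

Running `mount | nfs_versions` before and after the change should show the version for /data2 go from 4.1 to 3 if the option was picked up.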