Bug 1010747

Summary: cp of large file from local disk to nfs mount fails with "Unknown error 527"
Product: [Community] GlusterFS
Reporter: Jim <jim>
Component: nfs
Assignee: bugs <bugs>
Status: CLOSED EOL
Severity: urgent
Priority: unspecified
Version: pre-release
CC: bugs, gluster-bugs, jim, vagarwal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-10-22 15:40:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Attachments:
Log lines showing failure (flags: none)

Description Jim 2013-09-23 00:12:22 UTC
Description of problem:
==========================
cp of a large file from local disk to an nfs-mounted gluster volume fails with "Unknown error 527". nfs.log on the server, with DEBUG logging enabled, shows:

[2013-09-22 23:55:13.364112] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 9ce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497734144, count: 65536, UNSTABLE
[2013-09-22 23:55:13.364157] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: 9ce83e4, WRITE: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 65536, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.364207] D [afr-transaction.c:1094:afr_post_nonblocking_inodelk_cbk] 0-vol2-replicate-1: Non blocking inodelks done. Proceeding to FOP
[2013-09-22 23:55:13.364528] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: ace83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497799680, count: 65536, UNSTABLE
[2013-09-22 23:55:13.364649] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: ace83e4, WRITE: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 65536, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.364748] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: bce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497865216, count: 65536, UNSTABLE
[2013-09-22 23:55:13.364778] W [nfs3.c:2109:nfs3svc_write_cbk] 0-nfs: bce83e4: /CentOS-6.3-x86_64-bin-DVD1.iso => -1 (Input/output error)
[2013-09-22 23:55:13.364792] W [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: bce83e4, WRITE: NFS: 5(I/O error), POSIX: 5(Input/output error), count: 0, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.364917] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: cce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497930752,  count: 65536, UNSTABLE
[2013-09-22 23:55:13.364960] W [nfs3.c:2109:nfs3svc_write_cbk] 0-nfs: cce83e4: /CentOS-6.3-x86_64-bin-DVD1.iso => -1 (Bad file descriptor)
[2013-09-22 23:55:13.364972] D [afr-lk-common.c:447:transaction_lk_op] 0-vol2-replicate-1: lk op is for a transaction
[2013-09-22 23:55:13.364984] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: cce83e4, WRITE: NFS: 10007(Type not supported by the server), POSIX: 9(Bad file descriptor), count: 0, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.365116] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: dce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497996288,  count: 65536, UNSTABLE

Using dd to write a large file works with no issues.

I tried creating files using dd with the following parameters and they work every time:
dd if=/dev/zero of=/mnt/tmp/testfile bs=1M count=102400 
dd if=/dev/zero of=/mnt/tmp/testfile bs=16k count=16384
dd if=/dev/zero of=/mnt/tmp/testfile bs=32k count=16384
dd if=/dev/zero of=/mnt/tmp/testfile bs=64k count=16384


Examples of the failure:
==========================
[root@testclient test]# cp -r /data/pxeimages/centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso .
cp: writing `./centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso': Unknown error 527


[root@testclient test]# cp -r /data/pxeimages/centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso .
cp: closing `./CentOS-6.3-x86_64-bin-DVD1.iso': Unknown error 527


Version-Release number of selected component (if applicable):
=================================================================
Gluster server: CentOS release 6.4 (Final)
Installed Gluster RPMs:
glusterfs-libs-3.4.0-8.el6.x86_64
glusterfs-server-3.4.0-8.el6.x86_64
glusterfs-cli-3.4.0-8.el6.x86_64
glusterfs-rdma-3.4.0-8.el6.x86_64
glusterfs-fuse-3.4.0-8.el6.x86_64
glusterfs-api-3.4.0-8.el6.x86_64
glusterfs-3.4.0-8.el6.x86_64
glusterfs-geo-replication-3.4.0-8.el6.x86_64
fuse-libs-2.8.3-4.el6.x86_64
fuse-2.8.3-4.el6.x86_64


How reproducible:
====================
Happens every time cp is used to copy a large file (4 GB).


Steps to Reproduce:
======================
1. Mount gluster volume over nfs from client machine
2. Change to a directory on the mounted volume
3. cp a large file from the local disk to the nfs mount, e.g. "cp -r /tmp/CentOS-6.3-x86_64-bin-DVD1.iso ."
4. cp fails after some time (10-15 seconds) with "Unknown error 527", reported either while "writing" or while "closing" the file (a condensed sketch follows)
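
A condensed sketch of the steps above, assuming the mount command and file paths quoted elsewhere in this report (hostnames and paths will differ in other environments):

mount -v -t nfs -o rw,vers=3,tcp storage:/vol2 /mnt/tmp
cd /mnt/tmp
cp /tmp/CentOS-6.3-x86_64-bin-DVD1.iso .    # fails after 10-15 seconds with "Unknown error 527"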


Actual results:
=================
cp returns Unknown error 527


Expected results:
===================
File is successfully copied without error


Additional info:
==================
gluster volume info vol2
 
Volume Name: vol2
Type: Distributed-Replicate
Volume ID: 6704915b-8e60-4404-90f3-2567e74ec817
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp,rdma
Bricks:
Brick1: storage01-prv1:/storage/brick3/exp1
Brick2: storage02-prv1:/storage/brick3/exp1
Brick3: storage01-prv1:/storage/brick4/exp1
Brick4: storage02-prv1:/storage/brick4/exp1

No errors in brick logs.

uname info for the storage servers:
Linux storage1 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Linux storage2 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

uname info for nfs client machines:
Linux testclient 2.6.18-348.2.1.el5 #1 SMP Tue Mar 5 16:27:38 EST 2013 i686 i686 i386 GNU/Linux
Linux testclient1 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

The client mounts over TCP via 1 Gb Ethernet with the following command:
mount -v -t nfs -o rw,vers=3,tcp storage:/vol2 /mnt/tmp
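
A hedged diagnostic aside, not part of the original report: the options the client actually negotiated for this mount, including rsize/wsize, can be checked on the client with, for example:

nfsstat -m                      # per-mount NFS options as reported by nfs-utils
grep /mnt/tmp /proc/mounts      # the kernel's view of the mount options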

Comment 1 Jim 2013-09-23 00:16:50 UTC
Created attachment 801437 [details]
Log lines showing failure

The full log is too large to attach.

Comment 2 santosh pradhan 2013-09-23 07:28:53 UTC
Hi Jim,
Could you execute the following commands on the client (NFS client) machine and provide the output:

i) sysctl -a |grep dirty
ii) nfsstat (after you see the 'cp' command fail)

Could you please set the following parameter on the server (for a 2x2 volume it should be set on all the servers) and retry your 'cp' test, just to check whether the limit on in-flight RPC requests is getting in the way.

sysctl -w sunrpc.tcp_max_slot_table_entries=224
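
A hedged aside, not part of the original suggestion: on CentOS 6 a sysctl set this way does not survive a reboot; one common way to make it persistent (an assumption here, verify the parameter exists on your kernel first with "sysctl -a | grep slot_table") is:

echo "sunrpc.tcp_max_slot_table_entries = 224" >> /etc/sysctl.conf
sysctl -p    # reload settings; NFS clients typically need an unmount/remount to pick up the change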

Thanks in advance. 

-Santosh

Comment 3 santosh pradhan 2013-09-23 07:32:08 UTC
Make sure the disks backing the brick(s) are not full.

Comment 4 Jim 2013-09-23 19:22:12 UTC
[root@storage1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               95G  3.9G   86G   5% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/md0              194M   46M  139M  25% /boot
/dev/sdc              2.8T  129G  2.7T   5% /storage/brick1
/dev/sdd              2.8T  5.1G  2.8T   1% /storage/brick2
/dev/sde              2.8T   34M  2.8T   1% /storage/brick3
/dev/sdf              2.8T  6.3G  2.8T   1% /storage/brick4

Not full.

Comment 5 Jim 2013-09-23 19:34:14 UTC
Santosh, here is the output of the commands you requested:

From the client:
================
$ sysctl -a |grep dirty
vm.dirty_background_bytes = 0
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10

# /usr/sbin/nfsstat
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
324672     0          0          0          0       

Server nfs v3:
null         getattr      setattr      lookup       access       readlink     
44        0% 78993    24% 9866      3% 75381    23% 27112     8% 36        0% 
read         write        create       mkdir        symlink      mknod        
9256      2% 79024    24% 10305     3% 1471      0% 0         0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
1972      0% 133       0% 0         0% 0         0% 963       0% 11761     3% 
fsstat       fsinfo       pathconf     commit       
3620      1% 72        0% 0         0% 13087     4% 

Server nfs v4:
null         compound     
10       13% 63       86% 

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit       
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
create       delegpurge   delegreturn  getattr      getfh        link         
0         0% 0         0% 0         0% 0         0% 30       19% 0         0% 
lock         lockt        locku        lookup       lookup_root  nverify      
0         0% 0         0% 0         0% 60       39% 0         0% 0         0% 
open         openattr     open_conf    open_dgrd    putfh        putpubfh     
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
putrootfh    read         readdir      readlink     remove       rename       
63       41% 0         0% 0         0% 0         0% 0         0% 0         0% 
renew        restorefh    savefh       secinfo      setattr      setcltid     
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
setcltidconf verify       write        rellockowner 
0         0% 0         0% 0         0% 0         0% 

Client rpc stats:
calls      retrans    authrefrsh
1567838    1          0       

Client nfs v3:
null         getattr      setattr      lookup       access       readlink     
0         0% 249       0% 25        0% 100       0% 190       0% 0         0% 
read         write        create       mkdir        symlink      mknod        
1702      0% 1565188  99% 36        0% 26        0% 0         0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
25        0% 17        0% 1         0% 0         0% 0         0% 145       0% 
fsstat       fsinfo       pathconf     commit       
50        0% 4         0% 0         0% 79        0% 

I set "sysctl -w sunrpc.tcp_max_slot_table_entries=224" on both servers and re-tried the test. I still see the same issue unfortunately.

Comment 6 santosh pradhan 2013-09-24 10:15:24 UTC
1) Did you unmount/remount the client(s) after setting tcp_max_slot_table_entries on the server side?

2) What is the client's memory size (cat /proc/meminfo)?

If dd completes its writes successfully, I am not clear why cp would fail; both should be doing the same write operations with the same caching.

3) I am not clear about the configuration:

Brick1: storage01-prv1:/storage/brick3/exp1
Brick2: storage02-prv1:/storage/brick3/exp1
Brick3: storage01-prv1:/storage/brick4/exp1
Brick4: storage02-prv1:/storage/brick4/exp1

Are storage01-prv1 and storage02-prv1 different hosts? If yes, which machine is the one below: a server or a client?

[root@storage1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               95G  3.9G   86G   5% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/md0              194M   46M  139M  25% /boot
/dev/sdc              2.8T  129G  2.7T   5% /storage/brick1
/dev/sdd              2.8T  5.1G  2.8T   1% /storage/brick2
/dev/sde              2.8T   34M  2.8T   1% /storage/brick3
/dev/sdf              2.8T  6.3G  2.8T   1% /storage/brick4

Is this a config issue?

I tried with a 2x2 volume and both cp and dd (with a 4 GB file) worked fine.

Comment 7 Jim 2013-09-25 06:31:05 UTC
Santosh,

Unmounting/remounting did not change the behavior.

The memory information on the 1st client machine is:
MemTotal:      8177464 kB
MemFree:         69212 kB
Buffers:         12188 kB
Cached:        5315272 kB
SwapCached:          0 kB
Active:        2523256 kB
Inactive:      5164576 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8177464 kB
LowFree:         69212 kB
SwapTotal:    16383992 kB
SwapFree:     16383788 kB
Dirty:             300 kB
Writeback:           0 kB
AnonPages:     2360292 kB
Mapped:         120716 kB
Slab:           352788 kB
PageTables:      20328 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  20472724 kB
Committed_AS:  3585816 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    271024 kB
VmallocChunk: 34359466807 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

and on the second:
MemTotal:       774516 kB
MemFree:        346340 kB
Buffers:        141792 kB
Cached:         148840 kB
SwapCached:      33900 kB
Active:         251476 kB
Inactive:       126868 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       774516 kB
LowFree:        346340 kB
SwapTotal:     1572856 kB
SwapFree:      1517328 kB
Dirty:               4 kB
Writeback:           0 kB
AnonPages:       75676 kB
Mapped:          13444 kB
Slab:            39988 kB
PageTables:       1656 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   1960112 kB
Committed_AS:   192100 kB
VmallocTotal:   245752 kB
VmallocUsed:      5560 kB
VmallocChunk:   239784 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

There are two servers in the cluster, storage01 and storage02. The volume was created using the following command:

gluster volume create vol2 replica 2 transport tcp,rdma storage01-prv1:/storage/brick3/exp1 storage02-prv1:/storage/brick3/exp1 storage01-prv1:/storage/brick4/exp1 storage02-prv1:/storage/brick4/exp1

The gluster bricks are connected via InfiniBand and the client connects via TCP.

The df output I provided was from the storage01 server. Both servers are identical with the exact same file system layout.
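
A hedged aside, not part of the original comment: on GlusterFS 3.4 the brick processes and the built-in NFS server for this volume can be checked with, for example:

gluster volume status vol2    # lists brick processes and the "NFS Server on ..." entries per node
gluster volume info vol2      # confirms the 2 x 2 distributed-replicate layout and the tcp,rdma transport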

Comment 8 Jim 2013-11-03 02:32:50 UTC
Santosh, you asked if this is a configuration issue. How do I tell? Everything works fine using the gluster native client; it's just the gluster NFS client that is having issues. I have not changed any of the configuration from the defaults.

Thanks

Comment 9 santosh pradhan 2014-05-30 17:16:00 UTC
I am not working on this at the moment. Moving it to the NEW state so that others can look into it.

Comment 10 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.