Bug 1010747 - cp of large file from local disk to nfs mount fails with "Unknown error 527"
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-23 00:12 UTC by Jim
Modified: 2015-10-22 15:40 UTC
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-22 15:40:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Log lines showing failure (112.38 KB, text/plain)
2013-09-23 00:16 UTC, Jim

Description Jim 2013-09-23 00:12:22 UTC
Description of problem:
==========================
cp of a large file from local disk to an NFS-mounted Gluster volume fails with "Unknown error 527". nfs.log on the server, with DEBUG logging enabled, shows:

[2013-09-22 23:55:13.364112] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 9ce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497734144,  count: 65536, UNSTABLE
[2013-09-22 23:55:13.364157] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: 9ce83e4, WRITE: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 65536, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.364207] D [afr-transaction.c:1094:afr_post_nonblocking_inodelk_cbk] 0-vol2-replicate-1: Non blocking inodelks done. Proceeding to FOP
[2013-09-22 23:55:13.364528] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: ace83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497799680,  count: 65536, UNSTABLE
[2013-09-22 23:55:13.364649] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: ace83e4, WRITE: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 65536, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.364748] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: bce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497865216,  count: 65536, UNSTABLE
[2013-09-22 23:55:13.364778] W [nfs3.c:2109:nfs3svc_write_cbk] 0-nfs: bce83e4: /CentOS-6.3-x86_64-bin-DVD1.iso => -1 (Input/output error)
[2013-09-22 23:55:13.364792] W [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: bce83e4, WRITE: NFS: 5(I/O error), POSIX: 5(Input/output error), count: 0, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.364917] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: cce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497930752,  count: 65536, UNSTABLE
[2013-09-22 23:55:13.364960] W [nfs3.c:2109:nfs3svc_write_cbk] 0-nfs: cce83e4: /CentOS-6.3-x86_64-bin-DVD1.iso => -1 (Bad file descriptor)
[2013-09-22 23:55:13.364972] D [afr-lk-common.c:447:transaction_lk_op] 0-vol2-replicate-1: lk op is for a transaction
[2013-09-22 23:55:13.364984] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: cce83e4, WRITE: NFS: 10007(Type not supported by the server), POSIX: 9(Bad file descriptor), count: 0, UNSTABLE,wverf: 1379893029
[2013-09-22 23:55:13.365116] D [nfs3-helpers.c:1762:nfs3_log_rw_call] 0-nfs-nfsv3: XID: dce83e4, WRITE: args: FH: exportid 6704915b-8e60-4404-90f3-2567e74ec817, gfid 0fc1cd1c-8bb7-4fc4-929a-58316177a469, offset: 3497996288,  count: 65536, UNSTABLE

Using dd to write a large file works with no issues.

I tried creating files using dd with the following parameters, and they work every time (see the trace sketch after these commands):
dd if=/dev/zero of=/mnt/tmp/testfile bs=1M count=102400 
dd if=/dev/zero of=/mnt/tmp/testfile bs=16k count=16384
dd if=/dev/zero of=/mnt/tmp/testfile bs=32k count=16384
dd if=/dev/zero of=/mnt/tmp/testfile bs=64k count=16384
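
Editor's sketch (not part of the original report): one way to see how cp's write pattern differs from dd's is to trace both against the NFS mount; the file path and mount point below are taken from this report and may need adjusting.

# Hedged diagnostic: capture the syscalls each tool issues against the mount.
strace -f -e trace=open,write,close -o /tmp/cp.trace \
    cp /data/pxeimages/centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso /mnt/tmp/
strace -f -e trace=open,write,close -o /tmp/dd.trace \
    dd if=/dev/zero of=/mnt/tmp/testfile bs=64k count=16384
# Comparing the two traces shows the write sizes each tool uses and which call
# (write or close) returns the unexpected errno.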


Examples of the failure:
==========================
[root@testclient test]# cp -r /data/pxeimages/centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso .
cp: writing `./centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso': Unknown error 527


[root@testclient test]# cp -r /data/pxeimages/centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso .
cp: closing `./CentOS-6.3-x86_64-bin-DVD1.iso': Unknown error 527
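
Editor's note (not part of the original report): "Unknown error 527" is a kernel-internal errno with no userspace error string, which is why cp prints only the raw number. Assuming a Linux kernel source tree is available, it can be looked up as follows:

# Hedged lookup sketch; the path assumes a kernel source checkout.
grep -n 527 include/linux/errno.h
#   #define EBADTYPE        527     /* Type of an object was bad */
# This lines up with the "NFS: 10007(Type not supported by the server)" write
# reply visible in the nfs.log excerpt above.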


Version-Release number of selected component (if applicable):
=================================================================
Gluster server: CentOS release 6.4 (Final)
Installed Gluster RPMs:
glusterfs-libs-3.4.0-8.el6.x86_64
glusterfs-server-3.4.0-8.el6.x86_64
glusterfs-cli-3.4.0-8.el6.x86_64
glusterfs-rdma-3.4.0-8.el6.x86_64
glusterfs-fuse-3.4.0-8.el6.x86_64
glusterfs-api-3.4.0-8.el6.x86_64
glusterfs-3.4.0-8.el6.x86_64
glusterfs-geo-replication-3.4.0-8.el6.x86_64
fuse-libs-2.8.3-4.el6.x86_64
fuse-2.8.3-4.el6.x86_64


How reproducible:
====================
Happens every time cp is used to copy a large file (~4 GB).


Steps to Reproduce:
======================
1. Mount the gluster volume over NFS from a client machine
2. Change to a directory on the mounted volume
3. cp a large file from local disk to the NFS mount, e.g. "cp -r /tmp/CentOS-6.3-x86_64-bin-DVD1.iso ."
4. cp fails after some time (10-15 seconds) with "Unknown error 527", reported either while "writing" or while "closing" the file (a scripted version of these steps follows)
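
Editor's sketch of the reproduction as a script; the server name, volume name and source ISO path are examples taken from elsewhere in this report and may need adjusting.

#!/bin/sh
# Hedged repro sketch; the cp is expected to fail with "Unknown error 527".
mount -v -t nfs -o rw,vers=3,tcp storage:/vol2 /mnt/tmp
cd /mnt/tmp
cp /data/pxeimages/centos/6.3/CentOS-6.3-x86_64-bin-DVD1.iso .
echo "cp exit status: $?"
# Observed failure after 10-15 seconds:
#   cp: writing `./CentOS-6.3-x86_64-bin-DVD1.iso': Unknown error 527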


Actual results:
=================
cp returns Unknown error 527


Expected results:
===================
File is successfully copied without error


Additional info:
==================
gluster volume info vol2
 
Volume Name: vol2
Type: Distributed-Replicate
Volume ID: 6704915b-8e60-4404-90f3-2567e74ec817
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp,rdma
Bricks:
Brick1: storage01-prv1:/storage/brick3/exp1
Brick2: storage02-prv1:/storage/brick3/exp1
Brick3: storage01-prv1:/storage/brick4/exp1
Brick4: storage02-prv1:/storage/brick4/exp1

No errors in brick logs.

uname info for storage servers:
Linux storage1 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Linux storage2 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

uname info for nfs client machines:
Linux testclient 2.6.18-348.2.1.el5 #1 SMP Tue Mar 5 16:27:38 EST 2013 i686 i686 i386 GNU/Linux
Linux testclient1 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

The client mounts over TCP via 1 Gb Ethernet with the following command:
mount -v -t nfs -o rw,vers=3,tcp storage:/vol2 /mnt/tmp
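
Editor's sketch (not part of the original report): the options the client actually negotiated (NFS version, transport, rsize/wsize) can be confirmed on the client after mounting.

# Hedged check of the live mount options on the client.
grep /mnt/tmp /proc/mounts
nfsstat -m    # per-mount options/statistics, where nfs-utils supports it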

Comment 1 Jim 2013-09-23 00:16:50 UTC
Created attachment 801437 [details]
Log lines showing failure

Full log is too large to attach

Comment 2 santosh pradhan 2013-09-23 07:28:53 UTC
Hi Jim,
Could you execute the following commands on the client (NFS client) machine and provide the output:

i) sysctl -a |grep dirty
ii) nfsstat (after you notice the 'cp' command failed)

Could you please set the following parameter on the server (for a 2x2 volume it should be set on all servers) and retry your 'cp' test, just to check whether the RPC in-flight request limit is getting in the way.

sysctl -w sunrpc.tcp_max_slot_table_entries=224

Thanks in advance. 

-Santosh
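
Editor's sketch (not part of the original comment): applying and verifying the tuning Santosh suggests, on each Gluster server.

# Hedged apply-and-verify sketch for the suggested RPC slot-table tuning.
sysctl -w sunrpc.tcp_max_slot_table_entries=224
sysctl sunrpc.tcp_max_slot_table_entries       # confirm the running value
# To persist across reboots, the same key can be added to /etc/sysctl.conf:
#   sunrpc.tcp_max_slot_table_entries = 224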

Comment 3 santosh pradhan 2013-09-23 07:32:08 UTC
Make sure the disk is not full on the backend brick(s).

Comment 4 Jim 2013-09-23 19:22:12 UTC
[root@storage1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               95G  3.9G   86G   5% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/md0              194M   46M  139M  25% /boot
/dev/sdc              2.8T  129G  2.7T   5% /storage/brick1
/dev/sdd              2.8T  5.1G  2.8T   1% /storage/brick2
/dev/sde              2.8T   34M  2.8T   1% /storage/brick3
/dev/sdf              2.8T  6.3G  2.8T   1% /storage/brick4

Not full.

Comment 5 Jim 2013-09-23 19:34:14 UTC
Santosh, here is the output of the commands you requested:

From the client:
================
$ sysctl -a |grep dirty
vm.dirty_background_bytes = 0
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10

# /usr/sbin/nfsstat
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
324672     0          0          0          0       

Server nfs v3:
null         getattr      setattr      lookup       access       readlink     
44        0% 78993    24% 9866      3% 75381    23% 27112     8% 36        0% 
read         write        create       mkdir        symlink      mknod        
9256      2% 79024    24% 10305     3% 1471      0% 0         0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
1972      0% 133       0% 0         0% 0         0% 963       0% 11761     3% 
fsstat       fsinfo       pathconf     commit       
3620      1% 72        0% 0         0% 13087     4% 

Server nfs v4:
null         compound     
10       13% 63       86% 

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit       
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
create       delegpurge   delegreturn  getattr      getfh        link         
0         0% 0         0% 0         0% 0         0% 30       19% 0         0% 
lock         lockt        locku        lookup       lookup_root  nverify      
0         0% 0         0% 0         0% 60       39% 0         0% 0         0% 
open         openattr     open_conf    open_dgrd    putfh        putpubfh     
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
putrootfh    read         readdir      readlink     remove       rename       
63       41% 0         0% 0         0% 0         0% 0         0% 0         0% 
renew        restorefh    savefh       secinfo      setattr      setcltid     
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
setcltidconf verify       write        rellockowner 
0         0% 0         0% 0         0% 0         0% 

Client rpc stats:
calls      retrans    authrefrsh
1567838    1          0       

Client nfs v3:
null         getattr      setattr      lookup       access       readlink     
0         0% 249       0% 25        0% 100       0% 190       0% 0         0% 
read         write        create       mkdir        symlink      mknod        
1702      0% 1565188  99% 36        0% 26        0% 0         0% 0         0% 
remove       rmdir        rename       link         readdir      readdirplus  
25        0% 17        0% 1         0% 0         0% 0         0% 145       0% 
fsstat       fsinfo       pathconf     commit       
50        0% 4         0% 0         0% 79        0% 

I set "sysctl -w sunrpc.tcp_max_slot_table_entries=224" on both servers and re-tried the test. I still see the same issue unfortunately.

Comment 6 santosh pradhan 2013-09-24 10:15:24 UTC
1) Did you unmount/remount the client(s) after setting tcp_max_slot_table_entries on the server side?

2) What is the client memory size (cat /proc/meminfo) ?

If dd I/O passes, I am not clear why cp would fail; both should be doing the same write operations with the same caching.

3) I am not clear about the config:

Brick1: storage01-prv1:/storage/brick3/exp1
Brick2: storage02-prv1:/storage/brick3/exp1
Brick3: storage01-prv1:/storage/brick4/exp1
Brick4: storage02-prv1:/storage/brick4/exp1

Are storage01-prv1 and storage02-prv1 different hosts? If yes, what is the machine below: a server or a client?

[root@storage1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               95G  3.9G   86G   5% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/md0              194M   46M  139M  25% /boot
/dev/sdc              2.8T  129G  2.7T   5% /storage/brick1
/dev/sdd              2.8T  5.1G  2.8T   1% /storage/brick2
/dev/sde              2.8T   34M  2.8T   1% /storage/brick3
/dev/sdf              2.8T  6.3G  2.8T   1% /storage/brick4

Is this a config issue?

I tried with a 2x2 volume, and both cp and dd (with a 4 GB file) worked fine.

Comment 7 Jim 2013-09-25 06:31:05 UTC
Santosh,

Unmounting/remounting did not change the behavior.

The memory information on the 1st client machine is:
MemTotal:      8177464 kB
MemFree:         69212 kB
Buffers:         12188 kB
Cached:        5315272 kB
SwapCached:          0 kB
Active:        2523256 kB
Inactive:      5164576 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8177464 kB
LowFree:         69212 kB
SwapTotal:    16383992 kB
SwapFree:     16383788 kB
Dirty:             300 kB
Writeback:           0 kB
AnonPages:     2360292 kB
Mapped:         120716 kB
Slab:           352788 kB
PageTables:      20328 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  20472724 kB
Committed_AS:  3585816 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    271024 kB
VmallocChunk: 34359466807 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

and on the second:
MemTotal:       774516 kB
MemFree:        346340 kB
Buffers:        141792 kB
Cached:         148840 kB
SwapCached:      33900 kB
Active:         251476 kB
Inactive:       126868 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       774516 kB
LowFree:        346340 kB
SwapTotal:     1572856 kB
SwapFree:      1517328 kB
Dirty:               4 kB
Writeback:           0 kB
AnonPages:       75676 kB
Mapped:          13444 kB
Slab:            39988 kB
PageTables:       1656 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   1960112 kB
Committed_AS:   192100 kB
VmallocTotal:   245752 kB
VmallocUsed:      5560 kB
VmallocChunk:   239784 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

There are two servers in the cluster, storage01 and storage02. The volume was created using the following command:

gluster volume create vol2 replica 2 transport tcp,rdma storage01-prv1:/storage/brick3/exp1 storage02-prv1:/storage/brick3/exp1 storage01-prv1:/storage/brick4/exp1 storage02-prv1:/storage/brick4/exp1

The gluster bricks are connected via InfiniBand, and the client connects via TCP.

The df output I provided was from the storage01 server. Both servers are identical with the exact same file system layout.

Comment 8 Jim 2013-11-03 02:32:50 UTC
Santosh, you asked if this is a configuration issue. How do I tell? Everything works fine using the Gluster native client; it is only the Gluster NFS mount that has issues. I have not changed any of the configuration from the defaults.

Thanks

Comment 9 santosh pradhan 2014-05-30 17:16:00 UTC
I am not working on this at the moment. Moving it to the NEW state so that others can look into it.

Comment 10 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

