Bug 1010747
Summary: cp of large file from local disk to nfs mount fails with "Unknown error 527"

| Product: | [Community] GlusterFS | Reporter: | Jim <jim> |
|---|---|---|---|
| Component: | nfs | Assignee: | bugs <bugs> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | pre-release | CC: | bugs, gluster-bugs, jim, vagarwal |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-10-22 15:40:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Attachments: Log lines showing failure (attachment 801437)
Description (Jim, 2013-09-23 00:12:22 UTC)

Created attachment 801437 [details]: Log lines showing failure

The full log is too large to attach.
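A note on the error itself: errno values in the 512–528 range are internal to the Linux kernel, and 527 sits in the NFS-specific block of include/linux/errno.h, where it is defined as EBADTYPE ("Type not supported by server"). Since glibc has no message text for kernel-internal errnos, strerror() renders it as "Unknown error 527", which is what cp reports. This can be observed from any shell on a glibc-based Linux client (the python3 call is just a convenient way to reach strerror):

```shell
# strerror(3) has no text for kernel-internal errno 527 (EBADTYPE),
# so userspace tools such as cp print "Unknown error 527".
python3 -c 'import os; print(os.strerror(527))'
# Expected on a glibc-based Linux system: Unknown error 527
```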
---

Hi Jim, could you execute the following commands on the client (NFS client) machine and provide me the output:

i) sysctl -a | grep dirty
ii) nfsstat (after you notice the 'cp' command has failed)

Could you also set the following parameter on the server (in the 2x2 case it should be set on all the servers) and retry your 'cp' test, just to check whether the RPC in-flight request limit gets in the way:

    sysctl -w sunrpc.tcp_max_slot_table_entries=224

Thanks in advance.
-Santosh

---

Make sure the disk is not full in the backend brick(s).

---

    [root@storage1 ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               95G  3.9G   86G   5% /
    tmpfs                 3.9G     0  3.9G   0% /dev/shm
    /dev/md0              194M   46M  139M  25% /boot
    /dev/sdc              2.8T  129G  2.7T   5% /storage/brick1
    /dev/sdd              2.8T  5.1G  2.8T   1% /storage/brick2
    /dev/sde              2.8T   34M  2.8T   1% /storage/brick3
    /dev/sdf              2.8T  6.3G  2.8T   1% /storage/brick4

Not full.

---

Santosh, here is the output of the commands you requested.

From the client:

    $ sysctl -a | grep dirty
    vm.dirty_background_bytes = 0
    vm.dirty_bytes = 0
    vm.dirty_expire_centisecs = 3000
    vm.dirty_writeback_centisecs = 500
    vm.dirty_ratio = 40
    vm.dirty_background_ratio = 10

    # /usr/sbin/nfsstat
    Server rpc stats:
    calls      badcalls   badclnt    badauth    xdrcall
    324672     0          0          0          0

    Server nfs v3:
    null       getattr    setattr    lookup     access     readlink
    44      0% 78993  24% 9866    3% 75381  23% 27112   8% 36      0%
    read       write      create     mkdir      symlink    mknod
    9256    2% 79024  24% 10305   3% 1471    0% 0       0% 0       0%
    remove     rmdir      rename     link       readdir    readdirplus
    1972    0% 133     0% 0       0% 0       0% 963     0% 11761   3%
    fsstat     fsinfo     pathconf   commit
    3620    1% 72      0% 0       0% 13087   4%

    Server nfs v4:
    null       compound
    10     13% 63     86%

    Server nfs v4 operations:
    op0-unused op1-unused op2-future access     close      commit
    0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
    create     delegpurge delegreturn getattr  getfh      link
    0       0% 0       0% 0       0% 0       0% 30     19% 0       0%
    lock       lockt      locku      lookup    lookup_root nverify
    0       0% 0       0% 0       0% 60     39% 0       0% 0       0%
    open       openattr   open_conf  open_dgrd  putfh      putpubfh
    0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
    putrootfh  read       readdir    readlink   remove     rename
    63     41% 0       0% 0       0% 0       0% 0       0% 0       0%
    renew      restorefh  savefh     secinfo    setattr    setcltid
    0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
    setcltidconf verify   write      rellockowner
    0       0% 0       0% 0       0% 0       0%

    Client rpc stats:
    calls      retrans    authrefrsh
    1567838    1          0

    Client nfs v3:
    null       getattr    setattr    lookup     access     readlink
    0       0% 249     0% 25      0% 100     0% 190     0% 0       0%
    read       write      create     mkdir      symlink    mknod
    1702    0% 1565188 99% 36     0% 26      0% 0       0% 0       0%
    remove     rmdir      rename     link       readdir    readdirplus
    25      0% 17      0% 1       0% 0       0% 0       0% 145     0%
    fsstat     fsinfo     pathconf   commit
    50      0% 4       0% 0       0% 79      0%

I set "sysctl -w sunrpc.tcp_max_slot_table_entries=224" on both servers and re-tried the test. Unfortunately, I still see the same issue.

---

1) Did you unmount/mount the client(s) after setting tcp_max_slot_table_entries on the server side?

2) What is the client memory size (cat /proc/meminfo)? If dd passes with I/O, I am not clear why cp would fail; both should be doing the same write operations with the same caching.

3) I am not clear about the config:

    Brick1: storage01-prv1:/storage/brick3/exp1
    Brick2: storage02-prv1:/storage/brick3/exp1
    Brick3: storage01-prv1:/storage/brick4/exp1
    Brick4: storage02-prv1:/storage/brick4/exp1

Are storage01-prv1 and storage02-prv1 different hosts? If yes, which machine is the one below: server or client?

    [root@storage1 ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/md2               95G  3.9G   86G   5% /
    tmpfs                 3.9G     0  3.9G   0% /dev/shm
    /dev/md0              194M   46M  139M  25% /boot
    /dev/sdc              2.8T  129G  2.7T   5% /storage/brick1
    /dev/sdd              2.8T  5.1G  2.8T   1% /storage/brick2
    /dev/sde              2.8T   34M  2.8T   1% /storage/brick3
    /dev/sdf              2.8T  6.3G  2.8T   1% /storage/brick4

Is this a config issue? I tried with a 2x2 volume and cp/dd (with a 4 GB file) both worked fine.

---

Santosh, unmounting/remounting did not change the behavior.
The memory information on the first client machine is:

    MemTotal:      8177464 kB
    MemFree:         69212 kB
    Buffers:         12188 kB
    Cached:        5315272 kB
    SwapCached:          0 kB
    Active:        2523256 kB
    Inactive:      5164576 kB
    HighTotal:           0 kB
    HighFree:            0 kB
    LowTotal:      8177464 kB
    LowFree:         69212 kB
    SwapTotal:    16383992 kB
    SwapFree:     16383788 kB
    Dirty:             300 kB
    Writeback:           0 kB
    AnonPages:     2360292 kB
    Mapped:         120716 kB
    Slab:           352788 kB
    PageTables:      20328 kB
    NFS_Unstable:        0 kB
    Bounce:              0 kB
    CommitLimit:  20472724 kB
    Committed_AS:  3585816 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed:    271024 kB
    VmallocChunk: 34359466807 kB
    HugePages_Total:     0
    HugePages_Free:      0
    HugePages_Rsvd:      0
    Hugepagesize:     2048 kB

and on the second:

    MemTotal:       774516 kB
    MemFree:        346340 kB
    Buffers:        141792 kB
    Cached:         148840 kB
    SwapCached:      33900 kB
    Active:         251476 kB
    Inactive:       126868 kB
    HighTotal:           0 kB
    HighFree:            0 kB
    LowTotal:       774516 kB
    LowFree:        346340 kB
    SwapTotal:     1572856 kB
    SwapFree:      1517328 kB
    Dirty:               4 kB
    Writeback:           0 kB
    AnonPages:       75676 kB
    Mapped:          13444 kB
    Slab:            39988 kB
    PageTables:       1656 kB
    NFS_Unstable:        0 kB
    Bounce:              0 kB
    CommitLimit:   1960112 kB
    Committed_AS:   192100 kB
    VmallocTotal:   245752 kB
    VmallocUsed:      5560 kB
    VmallocChunk:   239784 kB
    HugePages_Total:     0
    HugePages_Free:      0
    HugePages_Rsvd:      0
    Hugepagesize:     4096 kB

There are two servers in the cluster, storage01 and storage02. The volume was created using the following command:

    gluster volume create vol2 replica 2 transport tcp,rdma \
        storage01-prv1:/storage/brick3/exp1 storage02-prv1:/storage/brick3/exp1 \
        storage01-prv1:/storage/brick4/exp1 storage02-prv1:/storage/brick4/exp1

The Gluster bricks are connected via InfiniBand, and the client is connecting via TCP. The df output I provided was from the storage01 server; both servers are identical, with the exact same file system layout.

Santosh, you asked if this is a configuration issue. How do I tell? Everything works fine using the Gluster native client; it is just the Gluster NFS client that is having issues.
I have not changed any of the configuration from the defaults. Thanks.

---

I am not working on this at the moment. Moving it to the NEW state so that others can look into it.

---

The "pre-release" version is ambiguous and is about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
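One detail in the thread can be checked mechanically: the client rpc counters reported above (1,567,838 calls, 1 retransmission) suggest retransmissions are negligible, which would explain why the slot-table tuning made no difference. A small sketch computing the rate from the reported counters (fed from a here-doc here; on a live client the same awk program can be pointed at `nfsstat -rc` output, adjusting the line match to its layout):

```shell
# Retransmission rate from the client rpc stats quoted in this bug.
awk 'NR==2 { printf "retrans rate: %.2e\n", $2 / $1 }' <<'EOF'
calls      retrans    authrefrsh
1567838    1          0
EOF
# Prints: retrans rate: 6.38e-07
```

Separately, since the volume was created with transport tcp,rdma while the client mounts over TCP, confirming the export with `showmount -e storage01` and mounting explicitly with `-o vers=3,proto=tcp` would help rule out a transport mismatch.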