Bug 762592 (GLUSTER-860) - GNFS is so slow to respond to client while running SFS2008 that sfs crashes.
Summary: GNFS is so slow to respond to client while running SFS2008 that sfs crashes.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-860
Product: GlusterFS
Classification: Community
Component: nfs
Version: nfs-beta
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-04-27 06:54 UTC by Prithu Tiwari
Modified: 2015-12-01 16:45 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTNR
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
sfslog (7.72 KB, application/octet-stream)
2010-04-27 05:08 UTC, Prithu Tiwari

Description Shehjar Tikoo 2010-04-27 04:00:38 UTC
Log copied to dev:/share/tickets/860

Comment 1 Prithu Tiwari 2010-04-27 05:08:47 UTC
Created attachment 187 [details]
yet another test program.

Comment 2 Shehjar Tikoo 2010-04-27 05:23:28 UTC
The gnfs trace log file does not hint at any timeout on the NFS side because the requests are getting served right till the last line of the log file. It is possible that the timeout referred to in the SFS log is a timeout for the sfsmanager and sfs client communication. We're going to test with a higher timeout value.

Comment 3 Prithu Tiwari 2010-04-27 06:54:52 UTC
We tried to run SFS2008 in one-server, one-client mode. For low load values
(target IOPS in SFS) it runs, though the CPU load of the glusterfs process is very high on the server side.

Detailed set-up
US-server - brickX
GNFS - GlusterNFS-beta-rc1
export directory: /exports/gnfs (JBOD)
volfile :-

---------------------------------------------------------------------

volume localdisk-posix
        type storage/posix
        option directory /exports/gnfs
end-volume

volume localdisk-ac
        type features/access-control
        subvolumes localdisk-posix
end-volume

volume localdisk
        type features/locks
        subvolumes localdisk-ac
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes localdisk
end-volume


volume nfsd
        type nfs/server
        subvolumes brick
        option rpc-auth.addr.allow *
end-volume

-------------------------------------------------------------------------


US-clientYY

SFS2008 - patched SFS to use mount-proto "tcp".

sfs_nfs_rc file as follows 

--------------------------------------------------------------------------


##############################################################################
#
#       @(#)sfs_nfs_rc  $Revision: 1.13 $
#
# Specify SFS parameters for sfs runs in this file.
#
# The following parameters are configurable within the SFS run and
# reporting rules.
#
# See below for details.
#
# Example shows an NFS V3 run of 100 to 1000 ops/sec
#
LOAD="1000"
INCR_LOAD=1000
NUM_RUNS=4
PROCS=1
CLIENTS="client08"
MNT_POINTS="brick5:/brick"
BIOD_MAX_WRITES=2
BIOD_MAX_READS=2
IPV6_ENABLE="off"
FS_PROTOCOL="nfs"
SFS_DIR="/home/prithu/sfs/bin"
SUFFIX=""
WORK_DIR="result"
PRIME_MON_SCRIPT=""
PRIME_MON_ARGS=""
INIT_TIMEOUT=8000
# Leaving BLOCK_SIZE un-set is the default. This will permit auto-negotiation.
# If you over-ride this and set it to a particular value, you must
# add the value that you used to the Other Notes section of the 
# submission/disclosure.
BLOCK_SIZE=
# SFS_NFS_USER_ID only needed if running NFS load on Windows client 
# Its value should match the UID of the user's account on the NFS server.
SFS_NFS_USER_ID=500
# SFS_NFS_GROUP_ID only needed if running NFS load on Windows client 
# Its value should match the GID of the user's account on the NFS server.
SFS_NFS_GROUP_ID=500
#
# The following parameters are strictly defined within the SFS
# run and reporting rules and may not be changed.
#
RUNTIME=300
WARMUP_TIME=300
MIXFILE=""
ACCESS_PCNT=30
APPEND_PCNT=70
BLOCK_FILE=""
DIR_COUNT=30
FILE_COUNT=
SYMLINK_COUNT=20
TCP="on"
#
# The following parameters are useful for debugging or general system
# tuning.  They may not be used during a reportable SFS run.
#
DEBUG=""
DUMP=
POPULATE=
LAT_GRAPH=
PRIME_SLEEP=0
PRIME_TIMEOUT=0

----------------------------------------------------------------------------
The rest of the file is comments.
----------------------------------------------------------------------------
----------------------------------------------------------------------------

sfs run command

java SfsManager -r sfs_nfs_rc -s junk


The SFS output file is attached; it can be seen that the run crashed due to a response timeout.

The trace-log of the server-side is at dev.gluster.com at ~prithu/gntr.l.bz2.

Comment 4 Shehjar Tikoo 2010-04-28 04:53:10 UTC
Prithu has verified that the run finishes after increasing the timeout value for SFS. Closing.
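For reference, the SFS-side timeout lives in the sfs_nfs_rc file quoted in comment 3; the exact value used in the passing run is not recorded in this bug. A minimal sketch of the kind of change, assuming INIT_TIMEOUT is the parameter that was raised (the original run used 8000):

```
# sfs_nfs_rc -- hypothetical edit; the actual value used is not recorded here.
# INIT_TIMEOUT bounds how long SfsManager waits on client communication
# during setup; raising it gives a slow-to-respond server more headroom.
INIT_TIMEOUT=16000
```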

