Bug 906238 - glusterfs client hangs when the same dir is operated on in parallel
Summary: glusterfs client hangs when the same dir is operated on in parallel
Keywords:
Status: NEW
Alias: None
Product: Gluster-Documentation
Classification: Community
Component: Other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Anjana Suparna Sriram
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-01-31 09:38 UTC by cailiang.song
Modified: 2023-01-04 04:36 UTC (History)
1 user

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
gluster volume info, dump file, glusterfsd log (953.43 KB, application/octet-stream)
2013-01-31 09:38 UTC, cailiang.song

Description cailiang.song 2013-01-31 09:38:52 UTC
Created attachment 690802 [details]
gluster volume info, dump file, glusterfsd log

Description of problem:

Recently, GlusterFS hangs when we run stress tests. To locate the cause, we wrote a test script and ran it on 5 servers at the same time. After a while, some of the test processes hang. The test script is as follows; "/mnt/gfs28" is the mount point of volume gfs28.

for ((i=1; i<=300; i++)); do
  mkdir /mnt/gfs28/songcl/b$i
  if [ "$?" == "0" ]; then
    echo "create dir success"
  fi

  echo "1111" >> /mnt/gfs28/songcl/b$i/001.txt
  echo "2222" >> /mnt/gfs28/songcl/b$i/002.txt
  echo "3333" >> /mnt/gfs28/songcl/b$i/003.txt
done

echo "finish create all dirs"


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a gluster volume "gfs28"; the volume type is DHT+AFR (distributed-replicate). A command sketch follows this list.
2. Natively (FUSE) mount the volume gfs28 at "/mnt/gfs28" on 5 servers.
3. Run the above script on all 5 servers at the same time.
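
A minimal sketch of the setup commands, assuming 2-way replication and placeholder hostnames/brick paths (the attached volume info shows the real layout, e.g. "Brick16: 10.1.10.188:/xmail/disk2/gfs28"):

# Create a distributed-replicate (DHT+AFR) volume: listing more
# bricks than the replica count makes gluster distribute (DHT)
# across the replica (AFR) pairs.
gluster volume create gfs28 replica 2 \
  server1:/xmail/disk1/gfs28 server2:/xmail/disk1/gfs28 \
  server1:/xmail/disk2/gfs28 server2:/xmail/disk2/gfs28
gluster volume start gfs28

# On each of the 5 client machines, do a native (FUSE) mount:
mount -t glusterfs server1:/gfs28 /mnt/gfs28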
  
Actual results:
Some of the test processes hang.

Expected results:


Additional info:
I use "kill -USR1 <hanged glusterfs client process ID>" to dump info and find that "gfs28-replicate-5" maybe be hanged. Then, I dump glusterfsd info of "Brick16: 10.1.10.188:/xmail/disk2/gfs28" and find the "/xmail/disk2/gfs28/songcl/b83/003.txt" is opened two times by "ls -asl /proce/pid/fd" command. Maybe this file is deadlocked according to corresponding glusterfsd log:
[2013-01-31 13:42:20.927077] T [rpcsvc.c:187:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 3.2.7 - INODELK
[2013-01-31 13:42:20.927090] T [server-resolve.c:127:resolve_loc_touchup] 0-gfs28-server: return value inode_path 11
[2013-01-31 13:42:20.927104] T [common.c:103:get_domain] 0-posix-locks: Domain gfs28-replicate-5 found
[2013-01-31 13:42:20.927113] T [inodelk.c:218:__lock_inodelk] 0-gfs28-locks: Lock (pid=1059928640) lk-owner:140197382404672 9223372036854775806 - 0 => Blocked
[2013-01-31 13:42:20.927123] T [inodelk.c:486:pl_inode_setlk] 0-gfs28-locks: Lock (pid=1059928640) (lk-owner=140197382404672) 9223372036854775806 - 0 => NOK
[2013-01-31 13:42:20.927132] T [inodelk.c:218:__lock_inodelk] 0-gfs28-locks: Lock (pid=1059928640) lk-owner:140197382404672 9223372036854775806 - 0 => Blocked
[2013-01-31 13:42:20.933429] T [rpcsvc.c:443:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 987
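
The diagnosis steps above, roughly as commands (the PIDs are the ones from the attachment; the statedump location /tmp/glusterdump.<pid> is an assumption for 3.2.x builds, so verify it on yours):

# SIGUSR1 makes a gluster process write a statedump.
kill -USR1 6988                 # 6988 = hung glusterfs client
ls /tmp/glusterdump.6988*       # assumed dump location on 3.2.x

# Inspect the brick-side glusterfsd fds for Brick16; the same
# file appearing twice points at a stuck second opener.
ls -asl /proc/31100/fd | grep 'songcl/b83/003.txt'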

For more information, please refer to the attachment.

1. PID:6988 is the dump file of the hung glusterfs client.
2. PID:31100 is the glusterfsd dump file for "Brick16".
3. 188-xmail-disk2-gfs28.log.splitab is the glusterfsd log for "Brick16".

Comment 1 Pranith Kumar K 2013-03-16 02:16:07 UTC
hi cailiang.song,
   I performed the same tests on 3.3.1 and the latest upstream code. I found another bug, https://bugzilla.redhat.com/show_bug.cgi?id=922292, while testing this upstream, but both versions ran without any hangs. I am not sure whether we will make any more releases on the 3.2.x line. Let me know how I can help you resolve this issue.

Pranith.

