The bigfile test in the Connectathon test suite hangs the Gluster client:

root@ns224055:/gluster/cthon# while :; do ./bigfile /gluster/foo ; date ; done
Sun Oct 2 15:27:29 CEST 2011
Sun Oct 2 15:27:30 CEST 2011
Sun Oct 2 15:27:32 CEST 2011
Sun Oct 2 15:27:33 CEST 2011
Sun Oct 2 15:27:34 CEST 2011
Sun Oct 2 15:27:35 CEST 2011
Sun Oct 2 15:27:36 CEST 2011
Sun Oct 2 15:27:38 CEST 2011
Sun Oct 2 15:27:39 CEST 2011
Sun Oct 2 15:27:40 CEST 2011
Sun Oct 2 15:27:41 CEST 2011
Sun Oct 2 15:27:42 CEST 2011
<hangs>

In the logs:

[2011-10-02 15:20:55.108447] I [rpc-clnt.c:1551:rpc_clnt_reconfig] 0-vol1-client-0: changing port to 24009 (from 0)
[2011-10-02 15:20:59.38207] I [client-handshake.c:1085:select_server_supported_programs] 0-vol1-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-10-02 15:20:59.38793] I [client-handshake.c:917:client_setvolume_cbk] 0-vol1-client-0: Connected to 46.105.115.86:24009, attached to remote volume '/gluster-vol1'.
[2011-10-02 15:20:59.45684] I [fuse-bridge.c:3340:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-10-02 15:20:59.45903] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
[2011-10-02 15:27:43.759449] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
[2011-10-02 15:27:43.759528] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
[2011-10-02 15:27:43.759553] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
[2011-10-02 15:27:43.759587] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
<end of logs>

The client mount must be killed to unblock the test process.

The test bed is Ubuntu Server 10.04 running Gluster 3.2.3 (built from source, unpatched).
The test suite was obtained from git://fedorapeople.org/~steved/cthon04

The bigfile test writes and reads a 30MB file, first with read/write syscalls, then with mmap and memory accesses. The hang disappears if I #undef MMAP in bigfile.c.
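For reference, the mmap phase that triggers the hang follows the pattern below. This is a minimal standalone sketch, not the actual bigfile.c source from cthon04: the file path, fill byte, and error handling are illustrative; only the write-through-a-shared-mapping idea matches the test.

```c
/* Sketch of the mmap write/read pattern exercised by bigfile:
 * the file content is produced through a MAP_SHARED mapping, so
 * dirty pages reach the filesystem via mmap writeback rather than
 * write(2). Returns 0 on success, -1 on any failure. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE (30 * 1024 * 1024)  /* ~30MB, as in the test */

int mmap_write_read(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, FILE_SIZE) < 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(map, 0xab, FILE_SIZE);        /* write via memory accesses */
    if (msync(map, FILE_SIZE, MS_SYNC) < 0) {  /* flush dirty pages */
        munmap(map, FILE_SIZE);
        close(fd);
        return -1;
    }
    /* read back via memory accesses */
    int ok = (map[0] == (char)0xab) && (map[FILE_SIZE - 1] == (char)0xab);
    munmap(map, FILE_SIZE);
    close(fd);
    return ok ? 0 : -1;
}
```

Run against a Gluster mount instead of a local path, this is roughly the access pattern that stalls; compiled with -UMMAP, bigfile skips this phase and the hang disappears.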
Can you share the output of 'gluster volume info'?
(In reply to comment #1)
> Can you share output of 'gluster volume info'?

I had some changes from the default config in vol1, so I created a vol2 from scratch which also reproduces the problem.

root@ns224053:~# gluster volume create vol2 glu1:/home/gluster-vol2
Creation of volume vol2 has been successful. Please start the volume to access data.
root@ns224053:~# gluster volume start vol2
Starting volume vol2 has been successful
root@ns224053:~# gluster volume info vol2

Volume Name: vol2
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: glu1:/home/gluster-vol2

Server config:

volume vol2-posix
    type storage/posix
    option directory /home/gluster-vol2
end-volume

volume vol2-access-control
    type features/access-control
    subvolumes vol2-posix
end-volume

volume vol2-locks
    type features/locks
    subvolumes vol2-access-control
end-volume

volume vol2-io-threads
    type performance/io-threads
    subvolumes vol2-locks
end-volume

volume vol2-marker
    type features/marker
    option volume-uuid 7158b154-cd90-4049-8048-668f2e9ba769
    option timestamp-file /etc/glusterd/vols/vol2/marker.tstamp
    option xtime off
    option quota off
    subvolumes vol2-io-threads
end-volume

volume /home/gluster-vol2
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes vol2-marker
end-volume

volume vol2-server
    type protocol/server
    option transport-type tcp
    option auth.addr./home/gluster-vol2.allow *
    subvolumes /home/gluster-vol2
end-volume

Client config:

volume vol2-client-0
    type protocol/client
    option remote-host glu1
    option remote-subvolume /home/gluster-vol2
    option transport-type tcp
end-volume

volume vol2-write-behind
    type performance/write-behind
    subvolumes vol2-client-0
end-volume

volume vol2-read-ahead
    type performance/read-ahead
    subvolumes vol2-write-behind
end-volume

volume vol2-io-cache
    type performance/io-cache
    subvolumes vol2-read-ahead
end-volume

volume vol2-quick-read
    type performance/quick-read
    subvolumes vol2-io-cache
end-volume

volume vol2-stat-prefetch
    type performance/stat-prefetch
    subvolumes vol2-quick-read
end-volume

volume vol2
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes vol2-stat-prefetch
end-volume

The client is mounted with:

root@ns224055:~# mount -t glusterfs glu1:/vol2 /gluster2
To run the test:

root@ns224055:~# git clone git://fedorapeople.org/~steved/cthon04
Initialized empty Git repository in /root/cthon04/.git/
remote: Counting objects: 256, done.
remote: Compressing objects: 100% (255/255), done.
remote: Total 256 (delta 139), reused 0 (delta 0)
Receiving objects: 100% (256/256), 124.98 KiB, done.
Resolving deltas: 100% (139/139), done.
root@ns224055:~# cd cthon04/special/
root@ns224055:~/cthon04/special# make bigfile
cd ../basic; make subr.o
make[1]: Entering directory `/root/cthon04/basic'
cc `echo -DLINUX -DGLIBC=22 -DMMAP -DSTDARG` -c -o subr.o subr.c
make[1]: Leaving directory `/root/cthon04/basic'
cc `echo -DLINUX -DGLIBC=22 -DMMAP -DSTDARG` -o bigfile bigfile.c ../basic/subr.o `echo -lnsl`
root@ns224055:~/cthon04/special# while : ; do ./bigfile /gluster2/foo || break ; date ; done
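Since a hung run blocks the loop forever, a variant using coreutils `timeout` (present on Ubuntu 10.04) can turn a hang into a loop-terminating failure. This is a suggested wrapper, not part of the original reproduction; the 300-second limit and paths are illustrative choices.

```shell
# Reproduction loop that aborts instead of blocking when bigfile hangs.
# Usage: run_bigfile_loop ./bigfile /gluster2/foo
run_bigfile_loop() {
    bigfile_cmd="$1"   # path to the bigfile binary (illustrative)
    test_path="$2"     # file on the Gluster mount (illustrative)
    while : ; do
        # 300s is far above a normal ~1s run; a timeout means a hang.
        if ! timeout 300 "$bigfile_cmd" "$test_path" ; then
            echo "bigfile failed or hung on $test_path" >&2
            return 1
        fi
        date
    done
}
```

With this wrapper the loop exits with status 1 on the first hang, so it can be scripted into unattended torture runs.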
Thanks for the detailed description. We could reproduce the issue in-house using your steps. Will try to address this ASAP.

Regards,
Amar
Hi Jean,

Please try with the patch: http://review.gluster.com/555

It solved the issue for me when running bigfile.
(In reply to comment #5)
> Hi Jean,
>
> Please try with the patch: http://review.gluster.com/555
>
> It did solve the issue for me while using bigfile

Can you please attach the patch? For some reason, I can only browse the changes, not download a patch.
Created attachment 683

With this patch, things worked for me (it is based on the master branch).
(In reply to comment #7)
> With this patch, things worked for me (it is based on master branch)

I backported the patch to v3.2.3, and can no longer reproduce the problem. Thanks!

Also, if I may suggest, the Connectathon test suite is a useful tool for regression testing:

$ cd cthon04
$ make
...
$ ./runtests -a -f /gluster/foo

I found this bug by running a loop of torture tests which includes this test suite.
Sure. We will integrate this in our testing suite.
CHANGE: http://review.gluster.com/555 (earlier it was hardcoded to 8, now increased the size to 16.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/570 (earlier it was hardcoded to 8, now increased the size to 16.) merged in release-3.2 by Vijay Bellur (vijay)
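Both merged changes describe the same fix: a capacity that was hardcoded to 8 is raised to 16. The failure mode behind the "cannot add iobuf into iobref" warnings can be sketched as a fixed-slot container rejecting buffers once full. This is an illustrative model only; the real GlusterFS iobref/iobuf structures and function signatures differ, and the names below are made up to mirror the idea of the change.

```c
/* Illustrative sketch (not GlusterFS source) of why adding an iobuf
 * to an iobref can fail: the slot array has a hardcoded size, and
 * once every slot is taken further adds are rejected, producing the
 * "cannot add iobuf into iobref" warning seen in the client log. */
#include <stddef.h>

#define IOBREF_SLOTS 16  /* was hardcoded to 8 before the fix */

struct iobuf {
    void *data;
};

struct iobref {
    struct iobuf *slots[IOBREF_SLOTS];
};

/* Returns 0 on success, -1 when all slots are occupied. */
int iobref_add(struct iobref *ref, struct iobuf *buf)
{
    for (size_t i = 0; i < IOBREF_SLOTS; i++) {
        if (ref->slots[i] == NULL) {
            ref->slots[i] = buf;
            return 0;
        }
    }
    return -1;  /* full: the condition logged as a warning */
}
```

Doubling the slot count gives vectored writes (such as those issued during mmap writeback of a large dirty region) enough room, which is consistent with the warnings disappearing after the patch.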
I do not see any hangs with 3.2.5qa4 on RHEL 6.1.