Bug 765411 (GLUSTER-3679) - cannot add iobuf into iobref during mmap test
Summary: cannot add iobuf into iobref during mmap test
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3679
Product: GlusterFS
Classification: Community
Component: protocol
Version: 3.2.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-02 13:50 UTC by Jean-Marc Saffroy
Modified: 2013-12-19 00:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.2.5qa4
Embargoed:


Attachments
Patch to fix the issue (3.67 KB, patch)
2011-10-04 01:36 UTC, Amar Tumballi

Description Jean-Marc Saffroy 2011-10-02 13:50:03 UTC
The bigfile test in the Connectathon test suite hangs the Gluster client:

root@ns224055:/gluster/cthon# while :; do ./bigfile /gluster/foo ; date ; done
Sun Oct  2 15:27:29 CEST 2011
Sun Oct  2 15:27:30 CEST 2011
Sun Oct  2 15:27:32 CEST 2011
Sun Oct  2 15:27:33 CEST 2011
Sun Oct  2 15:27:34 CEST 2011
Sun Oct  2 15:27:35 CEST 2011
Sun Oct  2 15:27:36 CEST 2011
Sun Oct  2 15:27:38 CEST 2011
Sun Oct  2 15:27:39 CEST 2011
Sun Oct  2 15:27:40 CEST 2011
Sun Oct  2 15:27:41 CEST 2011
Sun Oct  2 15:27:42 CEST 2011
<hangs>

In the logs:

[2011-10-02 15:20:55.108447] I [rpc-clnt.c:1551:rpc_clnt_reconfig] 0-vol1-client-0: changing port to 24009 (from 0)
[2011-10-02 15:20:59.38207] I [client-handshake.c:1085:select_server_supported_programs] 0-vol1-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-10-02 15:20:59.38793] I [client-handshake.c:917:client_setvolume_cbk] 0-vol1-client-0: Connected to 46.105.115.86:24009, attached to remote volume '/gluster-vol1'.
[2011-10-02 15:20:59.45684] I [fuse-bridge.c:3340:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-10-02 15:20:59.45903] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
[2011-10-02 15:27:43.759449] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
[2011-10-02 15:27:43.759528] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
[2011-10-02 15:27:43.759553] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
[2011-10-02 15:27:43.759587] W [client3_1-fops.c:77:client_submit_vec_request] 0-vol1-client-0: cannot add iobuf into iobref
<end of logs>

The client mount must be killed to unblock the test process.

The test bed is Ubuntu server 10.04, Gluster 3.2.3 (built from source, no patch). The test suite was obtained from git://fedorapeople.org/~steved/cthon04

The bigfile test writes and reads a 30MB file, first with read/write syscalls, then with mmap and memory accesses. The hang disappears if I #undef MMAP in bigfile.c.
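
For readers unfamiliar with the mmap phase of such a test, here is a minimal sketch of the access pattern: extend a file, store to it through a shared mapping, flush, and read it back. This is not code from bigfile.c; the function name, the fill pattern and the use of msync() are assumptions made for illustration only.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not taken from bigfile.c): fill a file of `size`
 * bytes through a shared mapping, then verify it through the same mapping.
 * Returns 0 on success, -1 on any failure. */
int mmap_write_then_verify(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must be extended before its pages can be stored to. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Write phase: plain memory stores dirty the mapped pages. */
    for (size_t i = 0; i < size; i++)
        map[i] = (unsigned char)(i & 0xff);

    /* Ask the kernel to write the dirty pages back to the filesystem. */
    int rc = msync(map, size, MS_SYNC);

    /* Read phase: plain memory loads, compared against the pattern. */
    for (size_t i = 0; rc == 0 && i < size; i++) {
        if (map[i] != (unsigned char)(i & 0xff))
            rc = -1;
    }

    if (munmap(map, size) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling mmap_write_then_verify("/gluster/foo", 30 * 1024 * 1024) would exercise a 30MB file in this way; on a FUSE mount the page writeback is turned into write requests to the client, which is presumably where the mmap path differs from the plain read/write path.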

Comment 1 Amar Tumballi 2011-10-02 23:00:38 UTC
Can you share output of 'gluster volume info'?

Comment 2 Jean-Marc Saffroy 2011-10-03 06:07:24 UTC
(In reply to comment #1)
> Can you share output of 'gluster volume info'?

I had some changes from the default config in vol1, so I created a vol2 from scratch which also reproduces the problem.

root@ns224053:~# gluster volume create vol2 glu1:/home/gluster-vol2
Creation of volume vol2 has been successful. Please start the volume to access data.

root@ns224053:~# gluster volume start vol2
Starting volume vol2 has been successful

root@ns224053:~# gluster volume info vol2

Volume Name: vol2
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: glu1:/home/gluster-vol2

Server config:
volume vol2-posix
    type storage/posix
    option directory /home/gluster-vol2
end-volume

volume vol2-access-control
    type features/access-control
    subvolumes vol2-posix
end-volume

volume vol2-locks
    type features/locks
    subvolumes vol2-access-control
end-volume

volume vol2-io-threads
    type performance/io-threads
    subvolumes vol2-locks
end-volume

volume vol2-marker
    type features/marker
    option volume-uuid 7158b154-cd90-4049-8048-668f2e9ba769
    option timestamp-file /etc/glusterd/vols/vol2/marker.tstamp
    option xtime off
    option quota off
    subvolumes vol2-io-threads
end-volume

volume /home/gluster-vol2
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes vol2-marker
end-volume

volume vol2-server
    type protocol/server
    option transport-type tcp
    option auth.addr./home/gluster-vol2.allow *
    subvolumes /home/gluster-vol2
end-volume

Client config:
volume vol2-client-0
    type protocol/client
    option remote-host glu1
    option remote-subvolume /home/gluster-vol2
    option transport-type tcp
end-volume

volume vol2-write-behind
    type performance/write-behind
    subvolumes vol2-client-0
end-volume

volume vol2-read-ahead
    type performance/read-ahead
    subvolumes vol2-write-behind
end-volume

volume vol2-io-cache
    type performance/io-cache
    subvolumes vol2-read-ahead
end-volume

volume vol2-quick-read
    type performance/quick-read
    subvolumes vol2-io-cache
end-volume

volume vol2-stat-prefetch
    type performance/stat-prefetch
    subvolumes vol2-quick-read
end-volume

volume vol2
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes vol2-stat-prefetch
end-volume

The client is mounted with:
root@ns224055:~# mount -t glusterfs glu1:/vol2 /gluster2

Comment 3 Jean-Marc Saffroy 2011-10-03 06:36:30 UTC
To run the test:

root@ns224055:~# git clone git://fedorapeople.org/~steved/cthon04
Initialized empty Git repository in /root/cthon04/.git/
remote: Counting objects: 256, done.
remote: Compressing objects: 100% (255/255), done.
remote: Total 256 (delta 139), reused 0 (delta 0)
Receiving objects: 100% (256/256), 124.98 KiB, done.
Resolving deltas: 100% (139/139), done.

root@ns224055:~# cd cthon04/special/

root@ns224055:~/cthon04/special# make bigfile
cd ../basic; make subr.o
make[1]: Entering directory `/root/cthon04/basic'
cc `echo -DLINUX -DGLIBC=22 -DMMAP -DSTDARG`   -c -o subr.o subr.c
make[1]: Leaving directory `/root/cthon04/basic'
cc `echo -DLINUX -DGLIBC=22 -DMMAP -DSTDARG` -o bigfile bigfile.c ../basic/subr.o `echo -lnsl`

root@ns224055:~/cthon04/special# while : ; do ./bigfile /gluster2/foo || break ; date ; done

Comment 4 Amar Tumballi 2011-10-03 07:00:39 UTC
Thanks for the proper description. We could reproduce the issue in-house with your code. Will try to address this asap.

Regards,
Amar

Comment 5 Amar Tumballi 2011-10-03 07:16:23 UTC
Hi Jean,

Please try with the patch: http://review.gluster.com/555

It did solve the issue for me while using bigfile.

Comment 6 Jean-Marc Saffroy 2011-10-03 07:32:17 UTC
(In reply to comment #5)
> Hi Jean,
> 
> Please try with the patch: http://review.gluster.com/555
> 
> It did solve the issue for me while using bigfile

Can you please attach the patch? For some reason, I can only browse the changes, not download a patch.

Comment 7 Amar Tumballi 2011-10-04 01:36:34 UTC
Created attachment 683


With this patch, things worked for me (it is based on master branch)

Comment 8 Jean-Marc Saffroy 2011-10-04 07:56:36 UTC
(In reply to comment #7)
> With this patch, things worked for me (it is based on master branch)

I backported the patch to v3.2.3, and can no longer reproduce the problem. Thanks!

Also, if I may suggest, the Connectathon test suite is a useful tool for regression testing:

$ cd cthon04
$ make
...
$ ./runtests -a -f /gluster/foo

I found this bug by running a loop of torture tests which includes this test suite.

Comment 9 Amar Tumballi 2011-10-04 10:30:58 UTC
Sure. We will integrate this in our testing suite.

Comment 10 Anand Avati 2011-10-10 07:57:15 UTC
CHANGE: http://review.gluster.com/555 (earlier it was hardcoded to 8, now increased the size to 16.) merged in master by Vijay Bellur (vijay)

Comment 11 Anand Avati 2011-10-10 07:58:31 UTC
CHANGE: http://review.gluster.com/570 (earlier it was hardcoded to 8, now increased the size to 16.) merged in release-3.2 by Vijay Bellur (vijay)
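
To illustrate why a hardcoded size can produce a "cannot add" warning, here is a small sketch of a fixed-capacity reference holder whose add operation fails once every slot is taken. It is not the GlusterFS iobref code: the type ref_set, the function ref_set_add, the constant REF_SLOTS and the error codes are all hypothetical, and reading the change notes as "a fixed array grew from 8 to 16 slots" is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed capacity, mirroring the "8 increased to 16" wording of the
 * change notes; the real structure and constant may differ. */
#define REF_SLOTS 16

/* Hypothetical holder of buffer references with a fixed number of slots. */
struct ref_set {
    const void *slots[REF_SLOTS];
};

/* Store `buf` in the first free slot.
 * Returns 0 on success, -EINVAL on bad arguments,
 * and -ENOMEM when all slots are already in use. */
int ref_set_add(struct ref_set *set, const void *buf)
{
    if (set == NULL || buf == NULL)
        return -EINVAL;

    for (size_t i = 0; i < REF_SLOTS; i++) {
        if (set->slots[i] == NULL) {
            set->slots[i] = buf;
            return 0;
        }
    }

    /* No free slot: the caller has to handle this failure path. */
    return -ENOMEM;
}
```

With a design like this, a request that carries more buffers than there are slots fails on the add, so raising the constant makes the failure rarer without removing the limit itself.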

Comment 12 Rahul C S 2011-11-02 06:22:07 UTC
I do not see any hangs with 3.2.5qa4 on RHEL-6.1
