Created attachment 916658 [details]
Log file written automatically by gfapi

I tried to run fio (almost the latest head) with the command below, in an attempt to reproduce bug #1093594. After that the process hangs (sometimes one thread starts transmitting data, sometimes it only hangs) and must be killed with SIGKILL, probably because glfs_init failed and glfs_fini gets called anyway:

~ # fio --ioengine=gfapi --volume=virtualization --brick=10.1.120.11 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test |& tee fio.out

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=16
...
fio-2.1.9
Starting 16 threads
[2014-07-09 07:10:12.923074] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
[2014-07-09 07:10:12.922950] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
[2014-07-09 07:10:12.924829] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-07-09 07:10:12.924900] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-07-09 07:10:12.926569] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
^C
Using numjobs=4, and sometimes even numjobs=8 (i.e. 4 or 8 threads), seems to work.
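For context, a minimal libgfapi test program along the following lines exercises the same call sequence that each fio job goes through with the gfapi engine. This is only a hedged sketch, not the engine's actual code; the volume name and host are the ones from the command above. It also shows the suspected failure path: when glfs_init() fails, glfs_fini() is still called on the partially initialized handle, which is where the hang appears to come from.

/* Minimal sketch of the per-job libgfapi setup/teardown sequence
 * (assumption: this mirrors roughly what fio's gfapi engine does;
 * it is not the engine's actual source).
 * Build: gcc gfapi-lifecycle.c -lgfapi */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
        glfs_t *fs = glfs_new("virtualization");   /* volume from the fio command above */
        if (!fs)
                return 1;

        glfs_set_volfile_server(fs, "tcp", "10.1.120.11", 24007);
        glfs_set_logging(fs, "/tmp/gfapi.log", 7);

        if (glfs_init(fs) != 0) {
                fprintf(stderr, "glfs_init failed. Is glusterd running on brick?\n");
                /* Suspected hang path: tearing down a handle whose init failed. */
                glfs_fini(fs);
                return 1;
        }

        /* ... I/O via glfs_creat()/glfs_read()/glfs_write() would happen here ... */

        glfs_fini(fs);
        return 0;
}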
I am also seeing the same error and could not run fio with the libgfapi engine.

[root@localhost glusterfs-fio-bgregg]# fio --ioengine=gfapi --volume=vol1 --brick=192.168.1.246 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=1 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test |& tee fio.out

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=1
...
fio-2.1.14-17-g8671
Starting 16 threads
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.284473] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.284603] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.285025] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.284863] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.285292] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.285343] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
fio: pid=3732, err=-1/
fio: pid=3726, err=-1/
fio: pid=3733, err=-1/
fio: pid=3727, err=-1/
fio: pid=3728, err=-1/
fio: pid=3740, err=-1/
fio: pid=3737, err=-1/
fio: pid=3730, err=-1/
fio: pid=3731, err=-1/
fio: pid=3734, err=-1/
fio: pid=3735, err=-1/
fio: pid=3739, err=-1/
fio: pid=3741, err=-1/
fio: pid=3729, err=-1/
fio: pid=3736, err=-1/
fio: pid=3738, err=-1/

Run status group 0 (all jobs):
I am using Gluster version 3.6.1
I am getting the error below now.

[root@localhost glusterfs-fio-bgregg]# fio --ioengine=gfapi --volume=vol1 --brick=192.168.1.246 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=1 --numjobs=4 --thread --stonewall --runtime=60 --group_reporting --name=fio-test |& tee fio.out

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=1
...
fio-2.1.14-17-g8671
Starting 4 threads
failed fio extend file fio.test to 4194304
failed fio extend file fio.test to 4194304
failed fio extend file fio.test to 4194304
failed fio extend file fio.test to 4194304

Run status group 0 (all jobs):
It has started to execute properly now with a few of the options.
I installed the rpm packages rather than building from source.
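Since the failure seems to depend on how many threads call glfs_init() concurrently (numjobs=16 fails while 4 or 8 often work), a standalone reproducer outside of fio might help narrow it down. The following is only a hedged sketch (the volume name and host are placeholders taken from the runs above): it spawns N threads that each create and initialize their own glfs_t, much like fio does with --thread --numjobs=N.

/* Hedged sketch of a standalone reproducer (not taken from this bug report):
 * N threads each create and initialize their own glfs_t concurrently.
 * Build: gcc -pthread gfapi-threads.c -lgfapi */
#include <stdio.h>
#include <pthread.h>
#include <glusterfs/api/glfs.h>

#define NUM_THREADS 16                 /* matches --numjobs=16 */
#define VOLUME      "vol1"             /* placeholder volume name */
#define HOST        "192.168.1.246"    /* placeholder glusterd/brick host */

static void *worker(void *arg)
{
        long id = (long)arg;
        glfs_t *fs = glfs_new(VOLUME);

        if (!fs)
                return NULL;
        glfs_set_volfile_server(fs, "tcp", HOST, 24007);

        if (glfs_init(fs) != 0)
                fprintf(stderr, "thread %ld: glfs_init failed\n", id);
        else
                fprintf(stderr, "thread %ld: glfs_init ok\n", id);

        glfs_fini(fs);
        return NULL;
}

int main(void)
{
        pthread_t threads[NUM_THREADS];
        long i;

        for (i = 0; i < NUM_THREADS; i++)
                pthread_create(&threads[i], NULL, worker, (void *)i);
        for (i = 0; i < NUM_THREADS; i++)
                pthread_join(threads[i], NULL);
        return 0;
}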
OK, with v3.5.3-26-g526448e plus a backport of http://review.gluster.org/7857 the behaviour is now better in the sense that there are proper error messages instead of hangs; see the outputs below.

The reason why I keep pressing on this bug is that we see similar behaviour when restarting libvirt on a server with 10+ Qemu/KVM VMs whose disk images are attached via libgfapi. In that case libvirt starts scanning the images of the running VMs via libgfapi and then fails somewhere in between, which immediately terminates all the machines whose images it could not scan. Together with issue #1093594 this leaves us in a very bad situation: libvirt leaks memory via libgfapi, but we cannot restart it without killing the VMs.

== RUN 1 ==

~ # fio --ioengine=gfapi --volume=gv-tier1-vm-01 --brick=10.1.120.11 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=16
...
fio-2.2.4
Starting 16 threads
[2015-02-27 08:13:46.799546] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:46.799582] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:46.799589] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.308953] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.309001] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.309011] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.357649] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.357663] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.517976] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.518014] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.518021] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.584594] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.584637] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.584646] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.677121] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.677153] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.677161] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.882268] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.882303] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.882313] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.124261] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.124340] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.124348] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.430563] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.430608] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.430614] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Jobs: 7 (f=6): [_(1),m(1),_(2),m(1),_(2),m(1),_(1),m(2),_(1),m(1),_(1),m(1),_(1)] [82.8% done] [1955KB/888KB/0KB /s] [244/111/0 iops]
[2015-02-27 08:13:55.745790] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.745809] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.866561] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.866603] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.866619] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.918069] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.918104] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.918121] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.972400] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.972426] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.972434] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:56.171190] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.171228] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.171235] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:56.323489] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.323516] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.323524] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Jobs: 1 (f=1): [_(7),m(1),_(8)] [100.0% done] [1573KB/622KB/0KB /s] [196/77/0 iops] [eta 00m:00s]
[2015-02-27 08:13:56.855544] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.855579] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.855591] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
fio-test: (groupid=0, jobs=16): err= 0: pid=17173: Fri Feb 27 09:13:56 2015
  read : io=46208KB, bw=2033.5KB/s, iops=254, runt= 22724msec
    clat (usec): min=28, max=1791.9K, avg=54832.87, stdev=57784.01
     lat (usec): min=29, max=1791.9K, avg=54833.10, stdev=57784.00
    clat percentiles (usec):
     |  1.00th=[  868],  5.00th=[ 4512], 10.00th=[13376], 20.00th=[24960],
     | 30.00th=[32640], 40.00th=[40192], 50.00th=[46848], 60.00th=[55040],
     | 70.00th=[64768], 80.00th=[77312], 90.00th=[97792], 95.00th=[118272],
     | 99.00th=[179200], 99.50th=[224256], 99.90th=[741376], 99.95th=[1449984],
     | 99.99th=[1794048]
    bw (KB /s): min= 1, max= 1003, per=7.02%, avg=142.79, stdev=71.35
  write: io=19328KB, bw=870967B/s, iops=106, runt= 22724msec
    clat (usec): min=8, max=4171, avg=52.05, stdev=176.55
     lat (usec): min=8, max=4171, avg=52.27, stdev=176.57
    clat percentiles (usec):
     |  1.00th=[ 9],  5.00th=[ 10], 10.00th=[ 12], 20.00th=[ 15],
     | 30.00th=[ 36], 40.00th=[ 41], 50.00th=[ 44], 60.00th=[ 46],
     | 70.00th=[ 49], 80.00th=[ 56], 90.00th=[ 67], 95.00th=[ 77],
     | 99.00th=[ 110], 99.50th=[ 149], 99.90th=[ 3760], 99.95th=[ 3920],
     | 99.99th=[ 4192]
    bw (KB /s): min= 4, max= 430, per=7.36%, avg=62.59, stdev=37.49
    lat (usec) : 10=1.09%, 20=6.63%, 50=12.99%, 100=8.29%, 250=0.38%
    lat (usec) : 750=0.15%, 1000=0.96%
    lat (msec) : 2=1.16%, 4=1.21%, 10=2.05%, 20=5.14%, 50=27.27%
    lat (msec) : 100=25.95%, 250=6.48%, 500=0.12%, 750=0.07%, 1000=0.02%
    lat (msec) : 2000=0.04%
  cpu          : usr=0.10%, sys=0.05%, ctx=6356, majf=1, minf=585
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=5776/w=2416/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=46208KB, aggrb=2033KB/s, minb=2033KB/s, maxb=2033KB/s, mint=22724msec, maxt=22724msec
  WRITE: io=19328KB, aggrb=850KB/s, minb=850KB/s, maxb=850KB/s, mint=22724msec, maxt=22724msec

== RUN 2 ==

~ # fio --ioengine=gfapi --volume=gv-tier1-vm-01 --brick=10.1.120.11 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=16
...
fio-2.2.4
Starting 16 threads
[2015-02-27 08:18:35.643845] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17314, err=-1/
[2015-02-27 08:18:35.645018] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17313, err=-1/
[2015-02-27 08:18:35.647171] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
[2015-02-27 08:18:35.647186] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2015-02-27 08:18:35.647244] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
glfs_init failed. Is glusterd running on brick?
fio: pid=17322, err=-1/
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17310, err=-1/
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17318, err=-1/
[2015-02-27 08:18:35.647656] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2015-02-27 08:18:35.647674] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17324, err=-1/
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17312, err=-1/
[2015-02-27 08:18:50.486867] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.486902] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.486908] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.616934] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.616989] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.616997] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.617179] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.617205] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.617211] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.676846] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.676874] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.676880] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.678779] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.678806] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.678813] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.700168] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.700187] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.773939] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.773970] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.773976] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.775616] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.775641] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.796954] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.796967] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
fio-test: (groupid=0, jobs=16): err=-1 (): pid=17310: Fri Feb 27 09:18:50 2015
  read : io=25992KB, bw=1999.4KB/s, iops=249, runt= 13000msec
    clat (usec): min=14, max=243923, avg=35654.07, stdev=24352.77
     lat (usec): min=14, max=243924, avg=35654.25, stdev=24352.77
    clat percentiles (usec):
     |  1.00th=[ 1800],  5.00th=[ 5984], 10.00th=[10304], 20.00th=[16512],
     | 30.00th=[20608], 40.00th=[25984], 50.00th=[31360], 60.00th=[36608],
     | 70.00th=[42752], 80.00th=[51968], 90.00th=[66048], 95.00th=[80384],
     | 99.00th=[118272], 99.50th=[140288], 99.90th=[191488], 99.95th=[211968],
     | 99.99th=[244736]
    bw (KB /s): min= 104, max= 418, per=11.10%, avg=221.96, stdev=64.03
  write: io=10872KB, bw=856379B/s, iops=104, runt= 13000msec
    clat (usec): min=9, max=16945, avg=91.96, stdev=886.42
     lat (usec): min=9, max=16945, avg=92.19, stdev=886.43
    clat percentiles (usec):
     |  1.00th=[ 9],  5.00th=[ 9], 10.00th=[ 11], 20.00th=[ 13],
     | 30.00th=[ 19], 40.00th=[ 39], 50.00th=[ 43], 60.00th=[ 44],
     | 70.00th=[ 48], 80.00th=[ 54], 90.00th=[ 63], 95.00th=[ 75],
     | 99.00th=[ 110], 99.50th=[ 1384], 99.90th=[16320], 99.95th=[17024],
     | 99.99th=[17024]
    bw (KB /s): min= 14, max= 229, per=11.15%, avg=93.21, stdev=38.67
    lat (usec) : 10=1.69%, 20=7.18%, 50=12.80%, 100=7.44%, 250=0.20%
    lat (usec) : 750=0.09%, 1000=0.17%
    lat (msec) : 2=0.69%, 4=1.09%, 10=4.71%, 20=13.22%, 50=35.35%
    lat (msec) : 100=14.11%, 250=1.26%
  cpu          : usr=0.10%, sys=0.05%, ctx=3447, majf=0, minf=165
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=3249/w=1359/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=25992KB, aggrb=1999KB/s, minb=1999KB/s, maxb=1999KB/s, mint=13000msec, maxt=13000msec
  WRITE: io=10872KB, aggrb=836KB/s, minb=836KB/s, maxb=836KB/s, mint=13000msec, maxt=13000msec
OK, the error indicates that the dict_pool is accessed before it gets initialized in api/src/glfs.c:115. Since I cannot reproduce the problem in every run (see above), this is indeed a race condition.
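To make the suspected race more concrete, here is a simplified, hedged sketch (explicitly not the actual glusterfs source; all names below are made up): if a process-wide pool is initialized lazily by the first glfs_init() caller without synchronization, a second thread can observe the pool before it is fully set up, which is consistent with mem_get0() rejecting its argument. Serializing the one-time initialization, for example with pthread_once(), removes the race.

/* Simplified illustration of the suspected race condition (assumption:
 * this is NOT the actual glusterfs code; the struct and function names
 * are invented for this sketch). Build: gcc -pthread race-sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_pool {                           /* stand-in for glusterfs' mem_pool */
        size_t item_size;
};

static struct fake_pool *global_dict_pool;   /* stand-in for the shared dict_pool */

static struct fake_pool *fake_pool_new(size_t item_size)
{
        struct fake_pool *p = calloc(1, sizeof(*p));
        if (p) {
                usleep(1000);                /* widen the race window for the demo */
                p->item_size = item_size;
        }
        return p;
}

/* Racy lazy init: two threads can both see NULL, or one thread can use the
 * pool before the other has finished setting it up - analogous to concurrent
 * glfs_init() calls racing on dict_pool and mem_get0() then failing with
 * "invalid argument". */
static void *racy_init(void *arg)
{
        (void)arg;
        if (!global_dict_pool)
                global_dict_pool = fake_pool_new(64);
        if (!global_dict_pool || global_dict_pool->item_size == 0)
                fprintf(stderr, "pool not ready: invalid argument\n");
        return NULL;
}

/* Safe variant: pthread_once() guarantees exactly one fully completed
 * initialization before any caller proceeds. */
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

static void create_pool(void)
{
        global_dict_pool = fake_pool_new(64);
}

static void *safe_init(void *arg)
{
        (void)arg;
        pthread_once(&pool_once, create_pool);
        return NULL;
}

int main(void)
{
        pthread_t t[16];
        int i;

        for (i = 0; i < 16; i++)
                pthread_create(&t[i], NULL, racy_init, NULL);   /* swap in safe_init to fix */
        for (i = 0; i < 16; i++)
                pthread_join(t[i], NULL);
        return 0;
}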
There have been some improvements to glfs_init() and glfs_fini() in recent releases. Could you repeat the tests on glusterfs-3.7* and check whether you still hit this issue?
This bug is being closed because the 3.5 release is marked End-Of-Life and will receive no further updates. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.