Created attachment 916658 [details]
Log file written automatically by gfapi

I tried to run fio (almost the latest head) with the command below, in an attempt to reproduce bug #1093594. After that the process hangs (sometimes one thread starts transmitting data, sometimes it only hangs) and must be killed with SIGKILL, probably because glfs_init failed and glfs_fini gets called anyway:

~ # fio --ioengine=gfapi --volume=virtualization --brick=10.1.120.11 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test |& tee fio.out

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=16
...
fio-2.1.9
Starting 16 threads
[2014-07-09 07:10:12.923074] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
[2014-07-09 07:10:12.922950] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
[2014-07-09 07:10:12.924829] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-07-09 07:10:12.924900] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-07-09 07:10:12.926569] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x9) [0x7fc9af9af369] (-->/usr/lib64/libgfapi.so.0(glfs_init_common+0x93) [0x7fc9af9af223] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25) [0x7fc9ae22abe5]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
^C
Using numjobs=4, and sometimes even numjobs=8 (i.e. 4 or 8 threads), seems to work.
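For context, a minimal libgfapi test program along the following lines exercises the same call sequence that each fio job goes through with the gfapi engine. This is only a hedged sketch, not the engine's actual code; the volume name and host are the ones from the command above. It also shows the suspected failure path: when glfs_init() fails, glfs_fini() is still called on the partially initialized handle, which is where the hang appears to come from.

/* Minimal sketch of the per-job libgfapi setup/teardown sequence
 * (assumption: this mirrors roughly what fio's gfapi engine does;
 * it is not the engine's actual source).
 * Build: gcc gfapi-lifecycle.c -lgfapi */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
        glfs_t *fs = glfs_new("virtualization");   /* volume from the fio command above */
        if (!fs)
                return 1;

        glfs_set_volfile_server(fs, "tcp", "10.1.120.11", 24007);
        glfs_set_logging(fs, "/tmp/gfapi.log", 7);

        if (glfs_init(fs) != 0) {
                fprintf(stderr, "glfs_init failed. Is glusterd running on brick?\n");
                /* Suspected hang path: tearing down a handle whose init failed. */
                glfs_fini(fs);
                return 1;
        }

        /* ... I/O via glfs_creat()/glfs_read()/glfs_write() would happen here ... */

        glfs_fini(fs);
        return 0;
}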
I am also seeing the same error and could not run fio with the libgfapi engine.

[root@localhost glusterfs-fio-bgregg]# fio --ioengine=gfapi --volume=vol1 --brick=192.168.1.246 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=1 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test |& tee fio.out

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=1
...
fio-2.1.14-17-g8671
Starting 16 threads
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.284473] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.284603] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.285025] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.284863] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.285292] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2014-11-29 14:22:45.285343] E [mem-pool.c:355:mem_get0] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1ab)[0x7fe6e21ef12b] (--> /usr/lib64/libglusterfs.so.0(mem_get0+0x78)[0x7fe6e221cd38] (--> /usr/lib64/libglusterfs.so.0(get_new_dict_full+0x25)[0x7fe6e21ea855] (--> /usr/lib64/libgfapi.so.0(+0x6402)[0x7fe6e2471402] (--> /usr/lib64/libgfapi.so.0(glfs_init+0x15)[0x7fe6e2471555] ))))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
glfs_init failed. Is glusterd running on brick?
fio: pid=3732, err=-1/
fio: pid=3726, err=-1/
fio: pid=3733, err=-1/
fio: pid=3727, err=-1/
fio: pid=3728, err=-1/
fio: pid=3740, err=-1/
fio: pid=3737, err=-1/
fio: pid=3730, err=-1/
fio: pid=3731, err=-1/
fio: pid=3734, err=-1/
fio: pid=3735, err=-1/
fio: pid=3739, err=-1/
fio: pid=3741, err=-1/
fio: pid=3729, err=-1/
fio: pid=3736, err=-1/
fio: pid=3738, err=-1/

Run status group 0 (all jobs):
I am using Gluster version 3.6.1
I am getting the error below now.

[root@localhost glusterfs-fio-bgregg]# fio --ioengine=gfapi --volume=vol1 --brick=192.168.1.246 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=1 --numjobs=4 --thread --stonewall --runtime=60 --group_reporting --name=fio-test |& tee fio.out

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=1
...
fio-2.1.14-17-g8671
Starting 4 threads
failed fio extend file fio.test to 4194304
failed fio extend file fio.test to 4194304
failed fio extend file fio.test to 4194304
failed fio extend file fio.test to 4194304

Run status group 0 (all jobs):
It has started to execute properly now with a few of the options.
I installed the rpm packages rather than building from source.
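Since the failure seems to depend on how many threads call glfs_init() concurrently (numjobs=16 fails while 4 or 8 often work), a standalone reproducer outside of fio might help narrow it down. The following is only a hedged sketch (the volume name and host are placeholders taken from the runs above): it spawns N threads that each create and initialize their own glfs_t, much like fio does with --thread --numjobs=N.

/* Hedged sketch of a standalone reproducer (not taken from this bug report):
 * N threads each create and initialize their own glfs_t concurrently.
 * Build: gcc -pthread gfapi-threads.c -lgfapi */
#include <stdio.h>
#include <pthread.h>
#include <glusterfs/api/glfs.h>

#define NUM_THREADS 16                 /* matches --numjobs=16 */
#define VOLUME      "vol1"             /* placeholder volume name */
#define HOST        "192.168.1.246"    /* placeholder glusterd/brick host */

static void *worker(void *arg)
{
        long id = (long)arg;
        glfs_t *fs = glfs_new(VOLUME);

        if (!fs)
                return NULL;
        glfs_set_volfile_server(fs, "tcp", HOST, 24007);

        if (glfs_init(fs) != 0)
                fprintf(stderr, "thread %ld: glfs_init failed\n", id);
        else
                fprintf(stderr, "thread %ld: glfs_init ok\n", id);

        glfs_fini(fs);
        return NULL;
}

int main(void)
{
        pthread_t threads[NUM_THREADS];
        long i;

        for (i = 0; i < NUM_THREADS; i++)
                pthread_create(&threads[i], NULL, worker, (void *)i);
        for (i = 0; i < NUM_THREADS; i++)
                pthread_join(threads[i], NULL);
        return 0;
}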
OK, with v3.5.3-26-g526448e plus a backport of http://review.gluster.org/7857 the behaviour is now better in the sense that there are proper error messages instead of hangs; see the outputs below.

The reason why I keep pressing on this bug is that we see similar behaviour when restarting libvirt on a server with 10+ Qemu/KVM VMs whose disk images are attached via libgfapi. In that case libvirt starts scanning the images of the running VMs via libgfapi and then fails somewhere in between, which immediately terminates all the machines whose images it could not scan. Together with issue #1093594 this leaves us in a very bad situation: libvirt leaks memory via libgfapi, but we cannot restart it without killing the VMs.

== RUN 1 ==

~ # fio --ioengine=gfapi --volume=gv-tier1-vm-01 --brick=10.1.120.11 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=16
...
fio-2.2.4
Starting 16 threads
[2015-02-27 08:13:46.799546] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:46.799582] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:46.799589] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.308953] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.309001] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.309011] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.357649] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.357663] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.517976] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.518014] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.518021] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.584594] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.584637] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.584646] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.677121] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.677153] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.677161] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:54.882268] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.882303] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:54.882313] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.124261] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.124340] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.124348] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.430563] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.430608] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.430614] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Jobs: 7 (f=6): [_(1),m(1),_(2),m(1),_(2),m(1),_(1),m(2),_(1),m(1),_(1),m(1),_(1)] [82.8% done] [1955KB/888KB/0KB /s] [244/111/0 iops]
[2015-02-27 08:13:55.745790] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.745809] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.866561] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.866603] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.866619] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.918069] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.918104] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.918121] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:55.972400] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.972426] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:55.972434] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:56.171190] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.171228] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.171235] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:13:56.323489] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.323516] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.323524] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Jobs: 1 (f=1): [_(7),m(1),_(8)] [100.0% done] [1573KB/622KB/0KB /s] [196/77/0 iops] [eta 00m:00s]
[2015-02-27 08:13:56.855544] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.855579] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:13:56.855591] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
fio-test: (groupid=0, jobs=16): err= 0: pid=17173: Fri Feb 27 09:13:56 2015
  read : io=46208KB, bw=2033.5KB/s, iops=254, runt= 22724msec
    clat (usec): min=28, max=1791.9K, avg=54832.87, stdev=57784.01
     lat (usec): min=29, max=1791.9K, avg=54833.10, stdev=57784.00
    clat percentiles (usec):
     |  1.00th=[  868],  5.00th=[ 4512], 10.00th=[13376], 20.00th=[24960],
     | 30.00th=[32640], 40.00th=[40192], 50.00th=[46848], 60.00th=[55040],
     | 70.00th=[64768], 80.00th=[77312], 90.00th=[97792], 95.00th=[118272],
     | 99.00th=[179200], 99.50th=[224256], 99.90th=[741376], 99.95th=[1449984],
     | 99.99th=[1794048]
    bw (KB /s): min= 1, max= 1003, per=7.02%, avg=142.79, stdev=71.35
  write: io=19328KB, bw=870967B/s, iops=106, runt= 22724msec
    clat (usec): min=8, max=4171, avg=52.05, stdev=176.55
     lat (usec): min=8, max=4171, avg=52.27, stdev=176.57
    clat percentiles (usec):
     |  1.00th=[ 9],  5.00th=[ 10], 10.00th=[ 12], 20.00th=[ 15],
     | 30.00th=[ 36], 40.00th=[ 41], 50.00th=[ 44], 60.00th=[ 46],
     | 70.00th=[ 49], 80.00th=[ 56], 90.00th=[ 67], 95.00th=[ 77],
     | 99.00th=[ 110], 99.50th=[ 149], 99.90th=[ 3760], 99.95th=[ 3920],
     | 99.99th=[ 4192]
    bw (KB /s): min= 4, max= 430, per=7.36%, avg=62.59, stdev=37.49
    lat (usec) : 10=1.09%, 20=6.63%, 50=12.99%, 100=8.29%, 250=0.38%
    lat (usec) : 750=0.15%, 1000=0.96%
    lat (msec) : 2=1.16%, 4=1.21%, 10=2.05%, 20=5.14%, 50=27.27%
    lat (msec) : 100=25.95%, 250=6.48%, 500=0.12%, 750=0.07%, 1000=0.02%
    lat (msec) : 2000=0.04%
  cpu          : usr=0.10%, sys=0.05%, ctx=6356, majf=1, minf=585
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=5776/w=2416/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=46208KB, aggrb=2033KB/s, minb=2033KB/s, maxb=2033KB/s, mint=22724msec, maxt=22724msec
  WRITE: io=19328KB, aggrb=850KB/s, minb=850KB/s, maxb=850KB/s, mint=22724msec, maxt=22724msec

== RUN 2 ==

~ # fio --ioengine=gfapi --volume=gv-tier1-vm-01 --brick=10.1.120.11 --filename=fio.test --size=4M --direct=1 --rw=randrw --refill_buffers --norandommap --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --thread --stonewall --runtime=60 --group_reporting --name=fio-test

fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=gfapi, iodepth=16
...
fio-2.2.4
Starting 16 threads
[2015-02-27 08:18:35.643845] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17314, err=-1/
[2015-02-27 08:18:35.645018] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17313, err=-1/
[2015-02-27 08:18:35.647171] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
[2015-02-27 08:18:35.647186] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2015-02-27 08:18:35.647244] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
glfs_init failed. Is glusterd running on brick?
fio: pid=17322, err=-1/
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17310, err=-1/
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17318, err=-1/
[2015-02-27 08:18:35.647656] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
glfs_init failed. Is glusterd running on brick?
[2015-02-27 08:18:35.647674] E [mem-pool.c:349:mem_get0] (-->/usr/lib64/libgfapi.so.0(glfs_init+0x1d) [0x7fb3cd1f0ced] (-->/usr/lib64/libgfapi.so.0(+0x6b33) [0x7fb3cd1f0b33] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x33) [0x7fb3cba46cd3]))) 0-mem-pool: invalid argument
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17324, err=-1/
glfs_init failed. Is glusterd running on brick?
fio: io engine init failed. Perhaps try reducing io depth?
fio: pid=17312, err=-1/
[2015-02-27 08:18:50.486867] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.486902] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.486908] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.616934] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.616989] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.616997] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.617179] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.617205] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.617211] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.676846] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.676874] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.676880] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.678779] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.678806] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.678813] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.700168] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.700187] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.773939] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-0: disconnected from 10.1.120.11:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.773970] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.773976] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.775616] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.775641] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-02-27 08:18:50.796954] I [client.c:2229:client_rpc_notify] 0-gv-tier1-vm-01-client-1: disconnected from 10.1.120.12:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-27 08:18:50.796967] E [afr-common.c:4524:afr_notify] 0-gv-tier1-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
fio-test: (groupid=0, jobs=16): err=-1 (): pid=17310: Fri Feb 27 09:18:50 2015
  read : io=25992KB, bw=1999.4KB/s, iops=249, runt= 13000msec
    clat (usec): min=14, max=243923, avg=35654.07, stdev=24352.77
     lat (usec): min=14, max=243924, avg=35654.25, stdev=24352.77
    clat percentiles (usec):
     |  1.00th=[ 1800],  5.00th=[ 5984], 10.00th=[10304], 20.00th=[16512],
     | 30.00th=[20608], 40.00th=[25984], 50.00th=[31360], 60.00th=[36608],
     | 70.00th=[42752], 80.00th=[51968], 90.00th=[66048], 95.00th=[80384],
     | 99.00th=[118272], 99.50th=[140288], 99.90th=[191488], 99.95th=[211968],
     | 99.99th=[244736]
    bw (KB /s): min= 104, max= 418, per=11.10%, avg=221.96, stdev=64.03
  write: io=10872KB, bw=856379B/s, iops=104, runt= 13000msec
    clat (usec): min=9, max=16945, avg=91.96, stdev=886.42
     lat (usec): min=9, max=16945, avg=92.19, stdev=886.43
    clat percentiles (usec):
     |  1.00th=[ 9],  5.00th=[ 9], 10.00th=[ 11], 20.00th=[ 13],
     | 30.00th=[ 19], 40.00th=[ 39], 50.00th=[ 43], 60.00th=[ 44],
     | 70.00th=[ 48], 80.00th=[ 54], 90.00th=[ 63], 95.00th=[ 75],
     | 99.00th=[ 110], 99.50th=[ 1384], 99.90th=[16320], 99.95th=[17024],
     | 99.99th=[17024]
    bw (KB /s): min= 14, max= 229, per=11.15%, avg=93.21, stdev=38.67
    lat (usec) : 10=1.69%, 20=7.18%, 50=12.80%, 100=7.44%, 250=0.20%
    lat (usec) : 750=0.09%, 1000=0.17%
    lat (msec) : 2=0.69%, 4=1.09%, 10=4.71%, 20=13.22%, 50=35.35%
    lat (msec) : 100=14.11%, 250=1.26%
  cpu          : usr=0.10%, sys=0.05%, ctx=3447, majf=0, minf=165
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=3249/w=1359/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=25992KB, aggrb=1999KB/s, minb=1999KB/s, maxb=1999KB/s, mint=13000msec, maxt=13000msec
  WRITE: io=10872KB, aggrb=836KB/s, minb=836KB/s, maxb=836KB/s, mint=13000msec, maxt=13000msec
OK, the error indicates that the dict_pool is accessed before it gets initialized in api/src/glfs.c:115. Since I cannot reproduce the problem in every run (see above), this is indeed a race condition.
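To make the suspected race more concrete, here is a simplified, hedged sketch (explicitly not the actual glusterfs source; all names below are made up): if a process-wide pool is initialized lazily by the first glfs_init() caller without synchronization, a second thread can observe the pool before it is fully set up, which is consistent with mem_get0() rejecting its argument. Serializing the one-time initialization, for example with pthread_once(), removes the race.

/* Simplified illustration of the suspected race condition (assumption:
 * this is NOT the actual glusterfs code; the struct and function names
 * are invented for this sketch). Build: gcc -pthread race-sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_pool {                           /* stand-in for glusterfs' mem_pool */
        size_t item_size;
};

static struct fake_pool *global_dict_pool;   /* stand-in for the shared dict_pool */

static struct fake_pool *fake_pool_new(size_t item_size)
{
        struct fake_pool *p = calloc(1, sizeof(*p));
        if (p) {
                usleep(1000);                /* widen the race window for the demo */
                p->item_size = item_size;
        }
        return p;
}

/* Racy lazy init: two threads can both see NULL, or one thread can use the
 * pool before the other has finished setting it up - analogous to concurrent
 * glfs_init() calls racing on dict_pool and mem_get0() then failing with
 * "invalid argument". */
static void *racy_init(void *arg)
{
        (void)arg;
        if (!global_dict_pool)
                global_dict_pool = fake_pool_new(64);
        if (!global_dict_pool || global_dict_pool->item_size == 0)
                fprintf(stderr, "pool not ready: invalid argument\n");
        return NULL;
}

/* Safe variant: pthread_once() guarantees exactly one fully completed
 * initialization before any caller proceeds. */
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

static void create_pool(void)
{
        global_dict_pool = fake_pool_new(64);
}

static void *safe_init(void *arg)
{
        (void)arg;
        pthread_once(&pool_once, create_pool);
        return NULL;
}

int main(void)
{
        pthread_t t[16];
        int i;

        for (i = 0; i < 16; i++)
                pthread_create(&t[i], NULL, racy_init, NULL);   /* swap in safe_init to fix */
        for (i = 0; i < 16; i++)
                pthread_join(t[i], NULL);
        return 0;
}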
There have been some improvements to glfs_init() and glfs_fini() in recent releases. Could you repeat the tests on glusterfs-3.7* and check whether you still hit this issue?
This bug is being closed because the 3.5 release is marked End-Of-Life and will receive no further updates. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.