Bug 1233136 - Libgfapi client program crashes during glfs_fini() because io-cache xlator is leaking iobufs
Summary: Libgfapi client program crashes during glfs_fini() because io-cache xlator is...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: io-cache
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-06-18 09:55 UTC by Prashanth Pai
Modified: 2017-02-21 09:24 UTC (History)
6 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-02-21 09:24:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Sample program to reproduce the crash (801 bytes, text/x-csrc)
2015-06-18 09:55 UTC, Prashanth Pai
no flags Details
bareos-glfs.log (2.48 MB, text/plain)
2015-09-29 12:53 UTC, Soumya Koduri
no flags Details

Description Prashanth Pai 2015-06-18 09:55:08 UTC
Created attachment 1040398 [details]
Sample program to reproduce the crash

Description of problem:
I have a binary file (a PNG picture). I read the first four bytes which works as expected. During glfs_fini() which frees resources, the program crashes.

Workaround:
This crash is not observed when io-cache xlator is turned off
# gluster volume set test io-cache off

Always reproducible with attached program and binary file given below.

Setup:

[root@fed22 glusterfs]# uname -a
Linux fed22 4.0.4-303.fc22.x86_64 #1 SMP Thu May 28 12:37:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@fed22 glusterfs]# gcc --version
gcc (GCC) 5.1.1 20150422 (Red Hat 5.1.1-1)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@fed22 glusterfs]# gluster volume info
 
Volume Name: test
Type: Distributed-Replicate
Volume ID: e52a4cfe-9af8-40cf-ad20-71522bfc26cb
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fed22:/export/brick1/b
Brick2: fed22:/export/brick2/b
Brick3: fed22:/export/brick3/b
Brick4: fed22:/export/brick4/b
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.write-behind: off
performance.open-behind: off
performance.io-cache: on
performance.readdir-ahead: off
server.allow-insecure: on
[root@fed22 glusterfs]# 


Steps to Reproduce:

# mount -t glusterfs localhost:test /mnt
# cd /mnt
# wget https://upload.wikimedia.org/wikipedia/en/2/24/Lenna.png
# umount /mnt

# gcc crash.c -I /usr/local/include/glusterfs/ -lgfapi -O0 -ggdb -o ./crash
# ./crash

Actual results:
[root@fed22 ~]# ./crash
read returned: 4
crash: iobuf.c:128: __iobuf_arena_destroy_iobufs: Assertion `iobuf->ref == 0' failed.
Aborted (core dumped)

Expected results: The client program should not crash.
[root@fed22 ~]# ./crash
read returned: 4
glfs_fini returned: 0

Debug core:
[root@fed22 ~]# gdb crash
GNU gdb (GDB) Fedora 7.9.1-13.fc22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from crash...done.
(gdb) r
Starting program: /root/crash 
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.21-5.fc22.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffedbe2700 (LWP 26296)]
[New Thread 0x7ffff4a9d700 (LWP 26295)]
[New Thread 0x7ffff529e700 (LWP 26294)]
[New Thread 0x7fffeb12a700 (LWP 26301)]
read returned: 4
[Thread 0x7ffff529e700 (LWP 26294) exited]
[Thread 0x7ffff4a9d700 (LWP 26295) exited]
[Thread 0x7fffeb12a700 (LWP 26301) exited]
[Thread 0x7fffedbe2700 (LWP 26296) exited]
crash: iobuf.c:128: __iobuf_arena_destroy_iobufs: Assertion `iobuf->ref == 0' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff782fa98 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install keyutils-libs-1.5.9-4.fc22.x86_64 krb5-libs-1.13.1-3.fc22.x86_64 libacl-2.2.52-7.fc22.x86_64 libattr-2.4.47-9.fc22.x86_64 libcom_err-1.42.12-4.fc22.x86_64 libselinux-2.3-10.fc22.x86_64 openssl-libs-1.0.1k-8.fc22.x86_64 pcre-8.37-1.fc22.x86_64 zlib-1.2.8-7.fc22.x86_64
(gdb) bt
#0  0x00007ffff782fa98 in raise () from /lib64/libc.so.6
#1  0x00007ffff783172a in abort () from /lib64/libc.so.6
#2  0x00007ffff7828247 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff78282f2 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff6f3ce3f in __iobuf_arena_destroy_iobufs (iobuf_arena=0x6201b0) at iobuf.c:128
#5  0x00007ffff6f3cf2f in __iobuf_arena_destroy (iobuf_pool=0x605060, iobuf_arena=0x6201b0) at iobuf.c:152
#6  0x00007ffff6f3d660 in iobuf_pool_destroy (iobuf_pool=0x605060) at iobuf.c:321
#7  0x00007ffff7bc33a0 in glusterfs_ctx_destroy (ctx=0x602190) at glfs.c:973
#8  0x00007ffff7bc38e9 in pub_glfs_fini (fs=0x602010) at glfs.c:1150
#9  0x0000000000400a89 in main () at crash.c:27
(gdb) f 4
#4  0x00007ffff6f3ce3f in __iobuf_arena_destroy_iobufs (iobuf_arena=0x6201b0) at iobuf.c:128
128	                GF_ASSERT (iobuf->ref == 0);
(gdb) p *iobuf
$1 = {{list = {next = 0x620208, prev = 0x620208}, {next = 0x620208, prev = 0x620208}}, iobuf_arena = 0x6201b0, lock = 1, ref = 1, ptr = 0x7ffff5a5f000, free_ptr = 0x0}
(gdb) 


Possible reason:
io-cache xlator is taking a ref somewhere and does not unref.

Comment 1 Vijay Bellur 2015-06-23 12:11:29 UTC
Raghavendra - can you please take a look?

Comment 2 Soumya Koduri 2015-09-29 09:44:29 UTC
Milind has seen similar crash while trying to take backup from gluster volumes using baroeas. And we have observed that there is no issue when io-cache xlator is disabled.


After adding few debug statements in iobref_ref & iobref_unref (attached the log file),
just before the crash -->

[2015-09-29 08:45:47.623190] E [MSGID: 101039] [iobuf.c:536:__iobuf_ref] 0-sun-client-0: $$$$$$$$$$$$$$$$$ in iobuf_ref iobuf(0x7fbb9402bae0), ref(1)
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf2)[0x7fbb9d4f0c52]
/usr/local/lib/libglusterfs.so.0(__iobuf_ref+0x8b)[0x7fbb9d53527b]
/usr/local/lib/libglusterfs.so.0(iobuf_ref+0x87)[0x7fbb9d535e54]
/usr/local/lib/libglusterfs.so.0(iobuf_get2+0x12b)[0x7fbb9d53577e]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_record_build_record+0x1f1)[0x7fbb9d9d9422]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_record+0x2b3)[0x7fbb9d9d97ba]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x1e5)[0x7fbb9d9d9c68]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client_submit_request+0x540)[0x7fbb86dc0ef3]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client3_3_getxattr+0x53a)[0x7fbb86de4e10]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client_getxattr+0xee)[0x7fbb86dc5d1f]
/usr/local/lib/glusterfs/3.8dev/xlator/cluster/distribute.so(dht_getxattr+0x165f)[0x7fbb86b6ee09]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/glusterfs/3.8dev/xlator/performance/md-cache.so(mdc_getxattr+0x462)[0x7fbb85cd7956]
/usr/local/lib/glusterfs/3.8dev/xlator/debug/io-stats.so(io_stats_getxattr+0x23a)[0x7fbb85ac01cc]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(syncop_getxattr+0x2fb)[0x7fbb9d54aab7]
/usr/local/lib/libgfapi.so.0(+0x12de7)[0x7fbb9dbf8de7]
/usr/local/lib/libgfapi.so.0(glfs_llistxattr+0x36)[0x7fbb9dbf8f0d]
/usr/lib64/bareos/plugins/gfapi-fd.so(+0x6a13)[0x7fbb9de0ea13]
/usr/sbin/bareos-fd[0x417828]
/usr/sbin/bareos-fd[0x40d1d8]
/usr/sbin/bareos-fd[0x415f6f]
/usr/lib64/libbareosfind-15.3.0.so(_Z10find_filesP3JCRP6FF_PKTPFiS0_S2_bES4_+0x385)[0x7fbba8478905]
/usr/sbin/bareos-fd[0x40b435]
/usr/sbin/bareos-fd[0x41200f]
/usr/sbin/bareos-fd[0x413665]
/usr/sbin/bareos-fd[0x41cefc]
/usr/lib64/libbareos-15.3.0.so(workq_server+0x1f5)[0x7fbba7e35ec5]
/usr/lib64/libbareos-15.3.0.so(lmgr_thread_launcher+0x5f)[0x7fbba7e1f73f]
/usr/lib64/libpthread.so.0(+0x7555)[0x7fbba7369555]
/usr/lib64/libc.so.6(clone+0x6d)[0x7fbba65ffb9d]
[2015-09-29 08:45:47.623944] T [rpc-clnt.c:1256:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 96, payload: 28, rpc hdr: 68
[2015-09-29 08:45:47.623964] E [MSGID: 101039] [iobuf.c:536:__iobuf_ref] 0-sun-client-0: $$$$$$$$$$$$$$$$$ in iobuf_ref iobuf(0x7fbb9402bae0), ref(2)
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf2)[0x7fbb9d4f0c52]
/usr/local/lib/libglusterfs.so.0(__iobuf_ref+0x8b)[0x7fbb9d53527b]
/usr/local/lib/libglusterfs.so.0(iobuf_ref+0x87)[0x7fbb9d535e54]
/usr/local/lib/libglusterfs.so.0(__iobref_add+0x164)[0x7fbb9d53645d]
/usr/local/lib/libglusterfs.so.0(iobref_add+0xf7)[0x7fbb9d53658a]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x244)[0x7fbb9d9d9cc7]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client_submit_request+0x540)[0x7fbb86dc0ef3]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client3_3_getxattr+0x53a)[0x7fbb86de4e10]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client_getxattr+0xee)[0x7fbb86dc5d1f]
/usr/local/lib/glusterfs/3.8dev/xlator/cluster/distribute.so(dht_getxattr+0x165f)[0x7fbb86b6ee09]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/glusterfs/3.8dev/xlator/performance/md-cache.so(mdc_getxattr+0x462)[0x7fbb85cd7956]
/usr/local/lib/glusterfs/3.8dev/xlator/debug/io-stats.so(io_stats_getxattr+0x23a)[0x7fbb85ac01cc]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(syncop_getxattr+0x2fb)[0x7fbb9d54aab7]
/usr/local/lib/libgfapi.so.0(+0x12de7)[0x7fbb9dbf8de7]
/usr/local/lib/libgfapi.so.0(glfs_llistxattr+0x36)[0x7fbb9dbf8f0d]
/usr/lib64/bareos/plugins/gfapi-fd.so(+0x6a13)[0x7fbb9de0ea13]
/usr/sbin/bareos-fd[0x417828]
/usr/sbin/bareos-fd[0x40d1d8]
/usr/sbin/bareos-fd[0x415f6f]
/usr/lib64/libbareosfind-15.3.0.so(_Z10find_filesP3JCRP6FF_PKTPFiS0_S2_bES4_+0x385)[0x7fbba8478905]
/usr/sbin/bareos-fd[0x40b435]
/usr/sbin/bareos-fd[0x41200f]
/usr/sbin/bareos-fd[0x413665]
/usr/sbin/bareos-fd[0x41cefc]
/usr/lib64/libbareos-15.3.0.so(workq_server+0x1f5)[0x7fbba7e35ec5]
/usr/lib64/libbareos-15.3.0.so(lmgr_thread_launcher+0x5f)[0x7fbba7e1f73f]
/usr/lib64/libpthread.so.0(+0x7555)[0x7fbba7369555]
/usr/lib64/libc.so.6(clone+0x6d)[0x7fbba65ffb9d]
[2015-09-29 08:45:47.624264] T [rpc-clnt.c:1593:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x1e Program: GlusterFS 3.3, ProgVers: 330, Proc: 18) to rpc-transport (sun-client-0)
[2015-09-29 08:45:47.624271] E [MSGID: 101039] [iobuf.c:549:__iobuf_unref] 0-sun-client-0: $$$$$$$$$$$$$$$$$ in iobuf_unref iobuf(0x7fbb9402bae0), ref(1)
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf2)[0x7fbb9d4f0c52]
/usr/local/lib/libglusterfs.so.0(__iobuf_unref+0x8b)[0x7fbb9d535310]
/usr/local/lib/libglusterfs.so.0(iobuf_unref+0x8e)[0x7fbb9d535d9d]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x493)[0x7fbb9d9d9f16]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client_submit_request+0x540)[0x7fbb86dc0ef3]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client3_3_getxattr+0x53a)[0x7fbb86de4e10]
/usr/local/lib/glusterfs/3.8dev/xlator/protocol/client.so(client_getxattr+0xee)[0x7fbb86dc5d1f]
/usr/local/lib/glusterfs/3.8dev/xlator/cluster/distribute.so(dht_getxattr+0x165f)[0x7fbb86b6ee09]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/glusterfs/3.8dev/xlator/performance/md-cache.so(mdc_getxattr+0x462)[0x7fbb85cd7956]
/usr/local/lib/glusterfs/3.8dev/xlator/debug/io-stats.so(io_stats_getxattr+0x23a)[0x7fbb85ac01cc]
/usr/local/lib/libglusterfs.so.0(default_getxattr+0xa5)[0x7fbb9d5089d0]
/usr/local/lib/libglusterfs.so.0(syncop_getxattr+0x2fb)[0x7fbb9d54aab7]
/usr/local/lib/libgfapi.so.0(+0x12de7)[0x7fbb9dbf8de7]
/usr/local/lib/libgfapi.so.0(glfs_llistxattr+0x36)[0x7fbb9dbf8f0d]
/usr/lib64/bareos/plugins/gfapi-fd.so(+0x6a13)[0x7fbb9de0ea13]
/usr/sbin/bareos-fd[0x417828]
/usr/sbin/bareos-fd[0x40d1d8]
/usr/sbin/bareos-fd[0x415f6f]
/usr/lib64/libbareosfind-15.3.0.so(_Z10find_filesP3JCRP6FF_PKTPFiS0_S2_bES4_+0x385)[0x7fbba8478905]
/usr/sbin/bareos-fd[0x40b435]
/usr/sbin/bareos-fd[0x41200f]
/usr/sbin/bareos-fd[0x413665]
/usr/sbin/bareos-fd[0x41cefc]
/usr/lib64/libbareos-15.3.0.so(workq_server+0x1f5)[0x7fbba7e35ec5]
/usr/lib64/libbareos-15.3.0.so(lmgr_thread_launcher+0x5f)[0x7fbba7e1f73f]
/usr/lib64/libpthread.so.0(+0x7555)[0x7fbba7369555]
/usr/lib64/libc.so.6(clone+0x6d)[0x7fbba65ffb9d]


There is a ref taken on iobuf(0x7fbb9402bae0) twice in rpc_clnt_submit. But it has got unref'ed only once which led to the crash.

Comment 3 Soumya Koduri 2015-09-29 12:53:07 UTC
Created attachment 1078321 [details]
bareos-glfs.log

Comment 4 Prashanth Pai 2017-02-21 09:24:24 UTC
As of current master (420c2d03543a35f1f5483f200541577f4a606b22), I can't reproduce this. I'll reopen this bug or file a new one if I ever bump into this again.

[root@gd2-1 1233136]# gcc crash.c -I /usr/local/include/glusterfs/ -lgfapi -O0 -ggdb -o ./crash

[root@gd2-1 1233136]# ./crash 
read returned: 4
glfs_fini returned: 0


Note You need to log in before you can comment on or make changes to this bug.