Created attachment 1186818 [details]
Sample reproducer using libgfapi-python

Description of problem:

I was trying to benchmark the libgfapi-python bindings for a filesystem walk
(metadata-intensive) workload, to compare against FUSE for the same workload.
The program crashes during the virtual unmount (fini).

Setup details:

[root@f24 ~]# rpm -qa | grep gluster
glusterfs-3.8.1-1.fc24.x86_64
python-gluster-3.8.1-1.fc24.noarch
glusterfs-libs-3.8.1-1.fc24.x86_64
glusterfs-client-xlators-3.8.1-1.fc24.x86_64
glusterfs-cli-3.8.1-1.fc24.x86_64
glusterfs-fuse-3.8.1-1.fc24.x86_64
glusterfs-server-3.8.1-1.fc24.x86_64
glusterfs-api-3.8.1-1.fc24.x86_64
glusterfs-debuginfo-3.8.1-1.fc24.x86_64

[root@f24 ~]# uname -a
Linux f24 4.6.4-301.fc24.x86_64 #1 SMP Tue Jul 12 11:50:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@f24]# gluster volume info

Volume Name: test
Type: Distributed-Replicate
Volume ID: e675d53a-c9b6-468f-bb8f-7101828bec70
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: f24:/export/brick1/data
Brick2: f24:/export/brick2/data
Brick3: f24:/export/brick3/data
Brick4: f24:/export/brick4/data
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

NOTE: The volume is created with 4 RAM disks (1GB each) as bricks.

[root@f24 ~]# df -h | grep 'ram\|Filesystem'
Filesystem      Size  Used Avail Use% Mounted on
/dev/ram1       971M  272M  700M  28% /export/brick1
/dev/ram2       971M  272M  700M  28% /export/brick2
/dev/ram3       971M  272M  700M  28% /export/brick3
/dev/ram4       971M  272M  700M  28% /export/brick4

[root@f24 ~]# df -ih | grep 'ram\|Filesystem'
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/ram1        489K  349K  140K  72% /export/brick1
/dev/ram2        489K  349K  140K  72% /export/brick2
/dev/ram3        489K  349K  140K  72% /export/brick3
/dev/ram4        489K  349K  140K  72% /export/brick4

How reproducible:

Always and consistently (at least on my Fedora 24 test VM).

Steps to Reproduce:
1. Create a large nested directory tree using a FUSE mount.
2. Unmount the FUSE mount. This is only to generate the initial data.
3. Use the python-libgfapi bindings with patch http://review.gluster.org/#/c/14770/ applied. It's reproducible without this patch too, but the patch makes the crawling faster.
4. Run the python script that does the walk using libgfapi:

[root@f24 ~]# ./reproducer.py
Segmentation fault (core dumped)

There *could* be a simpler reproducer too, but I haven't had the time to look
further into it.

Excerpt from bt:

#0  list_add_tail (head=0x90, new=0x7fcac00040b8) at list.h:41
41              new->prev = head->prev;
[Current thread is 1 (Thread 0x7fcae3dbb700 (LWP 4165))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.23.1-8.fc24.x86_64 keyutils-libs-1.5.9-8.fc24.x86_64 krb5-libs-1.14.1-8.fc24.x86_64 libacl-2.2.52-11.fc24.x86_64 libattr-2.4.47-16.fc24.x86_64 libcom_err-1.42.13-4.fc24.x86_64 libffi-3.1-9.fc24.x86_64 libselinux-2.5-9.fc24.x86_64 libuuid-2.28-3.fc24.x86_64 openssl-libs-1.0.2h-1.fc24.x86_64 pcre-8.39-2.fc24.x86_64 zlib-1.2.8-10.fc24.x86_64
(gdb) bt
#0  list_add_tail (head=0x90, new=0x7fcac00040b8) at list.h:41
#1  list_move_tail (head=0x90, list=0x7fcac00040b8) at list.h:107
#2  __inode_retire (inode=0x7fcac0004030) at inode.c:439
#3  0x00007fcad764100e in inode_table_prune (table=table@entry=0x7fcac0004040) at inode.c:1521
#4  0x00007fcad7642f22 in inode_table_destroy (inode_table=0x7fcac0004040) at inode.c:1808
#5  0x00007fcad7642fee in inode_table_destroy_all (ctx=ctx@entry=0x55f82360b430) at inode.c:1733
#6  0x00007fcad7f3fde6 in pub_glfs_fini (fs=0x55f823537950) at glfs.c:1204

Expected results:

No crash.

Additional info:

[root@f24 ~]# rpm -qa | grep glusterfs-debuginfo
glusterfs-debuginfo-3.8.1-1.fc24.x86_64
[root@f24 ~]# rpm -qa | grep python-debuginfo
python-debuginfo-2.7.12-1.fc24.x86_64

[root@f24 coredump]# ls -lh
total 311M
-rw-r--r--. 1 root root 350M Aug  2 17:31 core.python.0.21e34182be6844658e00bba43a55dfa0.4165.1470139182000000000000
-rw-r-----. 1 root root  22M Aug  2 17:29 core.python.0.21e34182be6844658e00bba43a55dfa0.4165.1470139182000000000000.lz4

Can provide the coredump offline.
Created attachment 1186826 [details]
Compressed core dump
Also, I can't seem to reproduce this when the test filesystem tree is small enough.
I suspect the following could have caused the issue -

In inode_table_destroy(), we first purge all the lru entries, but the lru
count is not adjusted accordingly. So when inode_table_prune() is called,
if the lru count was larger than the lru limit (as can be seen in the core),
we end up accessing invalid memory.

(gdb) f 3
#3  0x00007fcad764100e in inode_table_prune (table=table@entry=0x7fcac0004040) at inode.c:1521
1521                            __inode_retire (entry);
(gdb) p table->lru_size
$4 = 132396
(gdb) p table->lru_limit
$5 = 131072
(gdb) p table->lru
$6 = {next = 0x90, prev = 0xcafecafe}
(gdb) p &&table->lru
A syntax error in expression, near `&&table->lru'.
(gdb) p &table->lru
$7 = (struct list_head *) 0x7fcac00040b8
(gdb)

I will send a fix for it.
(In reply to Soumya Koduri from comment #3)
> I suspect below could have caused the issue -
>
> In inode_table_destroy(), we first purge all the lru entries but the lru
> count is not adjusted accordingly.

This is not true. Can you point out the code where an inode is moved
into/out of the lru list but lru_size is not modified atomically?

I think we need to explore further to find out why lru_size and lru_list
were out of sync before considering this bug as closed with patch
http://review.gluster.org/#/c/15087/
(In reply to Raghavendra G from comment #4)
> (In reply to Soumya Koduri from comment #3)
> > I suspect below could have caused the issue -
> >
> > In inode_table_destroy(), we first purge all the lru entries but the lru
> > count is not adjusted accordingly.
>
> This is not true. Can you point out the code where an inode is moved
> into/out of lru list but lru_size is not modified atomically?

Sorry, I missed that inode_table_destroy() is the culprit. We need to fix
inode_table_destroy() to update lru_size while moving inodes from the lru
list to the purge list (just like inode_table_prune() does).

> I think we need to explore further to find out why lru_size and lru_list
> were out of sync, before considering this bug as closed with patch
> http://review.gluster.org/#/c/15087/
(In reply to Raghavendra G from comment #5)
> (In reply to Raghavendra G from comment #4)
> > (In reply to Soumya Koduri from comment #3)
> > > I suspect below could have caused the issue -
> > >
> > > In inode_table_destroy(), we first purge all the lru entries but the lru
> > > count is not adjusted accordingly.
> >
> > This is not true. Can you point out the code where an inode is moved
> > into/out of lru list but lru_size is not modified atomically?
>
> Sorry I missed out that inode_table_destroy is the culprit. We need to fix
> inode_table_destroy to update lru_size while moving inodes from lru to purge
> list (just like inode_table_prune).

Yes, and that is exactly what http://review.gluster.org/#/c/15087/ does.
Sorry, am I missing anything?
REVIEW: http://review.gluster.org/15129 (inode: Adjust lru_size while retiring entries in lru list) posted (#2) for review on release-3.8 by Oleksandr Natalenko (oleksandr)
*** Bug 1365748 has been marked as a duplicate of this bug. ***
COMMIT: http://review.gluster.org/15129 committed in release-3.8 by Niels de Vos (ndevos)
------
commit dae860ab7e1c5a205646393f2cb80a0a06986c30
Author: Soumya Koduri <skoduri>
Date:   Thu Aug 4 16:00:31 2016 +0530

    inode: Adjust lru_size while retiring entries in lru list

    As part of inode_table_destroy(), we first retire entries
    in the lru list but the lru_size is not adjusted accordingly.
    This may result in invalid memory reference in
    inode_table_prune if the lru_size > lru_limit.

    > Reviewed-on: http://review.gluster.org/15087
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>
    > Reviewed-by: Prashanth Pai <ppai>

    BUG: 1362540
    Change-Id: I29ee3c03b0eaa8a118d06dc0cefba85877daf963
    Signed-off-by: Soumya Koduri <skoduri>
    Signed-off-by: Oleksandr Natalenko <oleksandr>
    Reviewed-on: http://review.gluster.org/15129
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.3, please open a new bug report.

glusterfs-3.8.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000059.html
[2] https://www.gluster.org/pipermail/gluster-users/