Bug 1362540 - glfs_fini() crashes with SIGSEGV
Summary: glfs_fini() crashes with SIGSEGV
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: 3.8.1
Hardware: x86_64
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact: Sudhir D
URL:
Whiteboard:
Duplicates: 1365748
Depends On: 1364026
Blocks:
 
Reported: 2016-08-02 13:13 UTC by Prashanth Pai
Modified: 2016-08-24 10:20 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.8.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1364026
Environment:
Last Closed: 2016-08-24 10:20:46 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Sample reproducer using libgfapi-python (252 bytes, text/x-python), 2016-08-02 13:13 UTC, Prashanth Pai
Compressed core dump (17.10 MB, application/octet-stream), 2016-08-02 13:24 UTC, Prashanth Pai

Description Prashanth Pai 2016-08-02 13:13:38 UTC
Created attachment 1186818 [details]
Sample reproducer using libgfapi-python

Description of problem:
I was trying to benchmark the libgfapi-python bindings for a filesystem walk (metadata-intensive) workload, to compare against FUSE for the same workload. The program crashes during the virtual unmount (glfs_fini).

Setup details:

[root@f24 ~]# rpm -qa | grep gluster
glusterfs-3.8.1-1.fc24.x86_64
python-gluster-3.8.1-1.fc24.noarch
glusterfs-libs-3.8.1-1.fc24.x86_64
glusterfs-client-xlators-3.8.1-1.fc24.x86_64
glusterfs-cli-3.8.1-1.fc24.x86_64
glusterfs-fuse-3.8.1-1.fc24.x86_64
glusterfs-server-3.8.1-1.fc24.x86_64
glusterfs-api-3.8.1-1.fc24.x86_64
glusterfs-debuginfo-3.8.1-1.fc24.x86_64

[root@f24 ~]# uname -a
Linux f24 4.6.4-301.fc24.x86_64 #1 SMP Tue Jul 12 11:50:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@f24]# gluster volume info
 
Volume Name: test
Type: Distributed-Replicate
Volume ID: e675d53a-c9b6-468f-bb8f-7101828bec70
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: f24:/export/brick1/data
Brick2: f24:/export/brick2/data
Brick3: f24:/export/brick3/data
Brick4: f24:/export/brick4/data
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

NOTE: The volume is created with 4 RAM disks (1GB each) as bricks.

[root@f24 ~]# df -h | grep 'ram\|Filesystem'
Filesystem               Size  Used Avail Use% Mounted on
/dev/ram1                971M  272M  700M  28% /export/brick1
/dev/ram2                971M  272M  700M  28% /export/brick2
/dev/ram3                971M  272M  700M  28% /export/brick3
/dev/ram4                971M  272M  700M  28% /export/brick4

[root@f24 ~]# df -ih | grep 'ram\|Filesystem'
Filesystem              Inodes IUsed IFree IUse% Mounted on
/dev/ram1                 489K  349K  140K   72% /export/brick1
/dev/ram2                 489K  349K  140K   72% /export/brick2
/dev/ram3                 489K  349K  140K   72% /export/brick3
/dev/ram4                 489K  349K  140K   72% /export/brick4


How reproducible:
Always and consistently (at least on my Fedora 24 test VM).

Steps to Reproduce:
1. Create large nested directory tree using FUSE mount.
2. Unmount FUSE mount. This is just to generate initial data.
3. Use the libgfapi-python bindings with the patch http://review.gluster.org/#/c/14770/ applied. It's reproducible without this patch too, but the patch makes the crawling faster.
4. Run the Python script that does the walk using libgfapi.

[root@f24 ~]# ./reproducer.py 
Segmentation fault (core dumped)

There *could* be a simple reproducer too but I haven't had the time to look further into it.
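
For readers without the attachment handy, a minimal, hypothetical C sketch of the call sequence involved (not the attached Python script) is shown below: mount the volume via libgfapi, recursively walk the tree, then call glfs_fini(). The host ("f24") and volume name ("test") are taken from the setup above.

/*
 * Hypothetical C sketch of the reproducer flow; build with:
 *   gcc reproducer.c -o reproducer -lgfapi
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <glusterfs/api/glfs.h>

static void
walk (glfs_t *fs, const char *path)
{
        char           child[4096];
        struct dirent *entry = NULL;
        glfs_fd_t     *fd    = glfs_opendir (fs, path);

        if (!fd)
                return;

        while ((entry = glfs_readdir (fd)) != NULL) {
                if (!strcmp (entry->d_name, ".") || !strcmp (entry->d_name, ".."))
                        continue;
                snprintf (child, sizeof (child), "%s/%s", path, entry->d_name);
                if (entry->d_type == DT_DIR)
                        walk (fs, child);
        }
        glfs_closedir (fd);
}

int
main (void)
{
        glfs_t *fs = glfs_new ("test");

        glfs_set_volfile_server (fs, "tcp", "f24", 24007);
        if (glfs_init (fs) != 0)
                return 1;

        walk (fs, "/");

        /* The reported SIGSEGV happens inside the virtual unmount. */
        return glfs_fini (fs);
}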

Excerpt from bt:

#0  list_add_tail (head=0x90, new=0x7fcac00040b8) at list.h:41
41		new->prev = head->prev;
[Current thread is 1 (Thread 0x7fcae3dbb700 (LWP 4165))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.23.1-8.fc24.x86_64 keyutils-libs-1.5.9-8.fc24.x86_64 krb5-libs-1.14.1-8.fc24.x86_64 libacl-2.2.52-11.fc24.x86_64 libattr-2.4.47-16.fc24.x86_64 libcom_err-1.42.13-4.fc24.x86_64 libffi-3.1-9.fc24.x86_64 libselinux-2.5-9.fc24.x86_64 libuuid-2.28-3.fc24.x86_64 openssl-libs-1.0.2h-1.fc24.x86_64 pcre-8.39-2.fc24.x86_64 zlib-1.2.8-10.fc24.x86_64
(gdb) bt
#0  list_add_tail (head=0x90, new=0x7fcac00040b8) at list.h:41
#1  list_move_tail (head=0x90, list=0x7fcac00040b8) at list.h:107
#2  __inode_retire (inode=0x7fcac0004030) at inode.c:439
#3  0x00007fcad764100e in inode_table_prune (table=table@entry=0x7fcac0004040) at inode.c:1521
#4  0x00007fcad7642f22 in inode_table_destroy (inode_table=0x7fcac0004040) at inode.c:1808
#5  0x00007fcad7642fee in inode_table_destroy_all (ctx=ctx@entry=0x55f82360b430) at inode.c:1733
#6  0x00007fcad7f3fde6 in pub_glfs_fini (fs=0x55f823537950) at glfs.c:1204


Expected results:
No crash

Additional info:

[root@f24 ~]# rpm -qa | grep glusterfs-debuginfo
glusterfs-debuginfo-3.8.1-1.fc24.x86_64
[root@f24 ~]# rpm -qa | grep python-debuginfo
python-debuginfo-2.7.12-1.fc24.x86_64

[root@f24 coredump]# ls -lh
total 311M
-rw-r--r--. 1 root root 350M Aug  2 17:31 core.python.0.21e34182be6844658e00bba43a55dfa0.4165.1470139182000000000000
-rw-r-----. 1 root root  22M Aug  2 17:29 core.python.0.21e34182be6844658e00bba43a55dfa0.4165.1470139182000000000000.lz4

I can provide the core dump offline.

Comment 1 Prashanth Pai 2016-08-02 13:24:40 UTC
Created attachment 1186826 [details]
Compressed core dump

Comment 2 Prashanth Pai 2016-08-02 13:35:30 UTC
Also, I can't seem to reproduce this when the test filesystem tree is small enough.

Comment 3 Soumya Koduri 2016-08-04 10:14:55 UTC
I suspect the following could have caused the issue:

In inode_table_destroy(), we first purge all the lru entries, but the lru count is not adjusted accordingly. So when inode_table_prune() is called, if the lru count was larger than the lru limit (as can be seen in the core), we end up accessing invalid memory.

(gdb) f 3
#3  0x00007fcad764100e in inode_table_prune (table=table@entry=0x7fcac0004040) at inode.c:1521
1521	                        __inode_retire (entry);
(gdb) p table->lru_size
$4 = 132396
(gdb) p table->lru_limit
$5 = 131072
(gdb) p table->lru
$6 = {next = 0x90, prev = 0xcafecafe}
(gdb) p &&table->lru
A syntax error in expression, near `&&table->lru'.
(gdb) p &table->lru
$7 = (struct list_head *) 0x7fcac00040b8
(gdb) 

I will send a fix for it.
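
A minimal sketch of the pattern such a fix needs (not the actual patch): the table fields and __inode_retire() appear in the backtrace and gdb output above, while list_empty()/list_entry() are assumed to be the kernel-style list helpers from libglusterfs, the inode's 'list' member is assumed to be its lru linkage, and the table lock is assumed to be held by the caller.

        inode_t *entry = NULL;

        /* Retire the lru entries while keeping table->lru_size in sync,
         * the same way inode_table_prune() does.  Without the decrement,
         * a later prune pass still believes lru_size > lru_limit and keeps
         * walking a list whose members were already retired. */
        while (!list_empty (&table->lru)) {
                entry = list_entry (table->lru.next, inode_t, list);
                table->lru_size--;       /* keep the counter consistent */
                __inode_retire (entry);  /* moves the inode to the purge list */
        }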

Comment 4 Raghavendra G 2016-08-09 11:30:21 UTC
(In reply to Soumya Koduri from comment #3)
> I suspect below could have caused the issue - 
> 
> In inode_table_destroy(), we first purge all the lru entries but the lru
> count is not adjusted accordingly.

This is not true. Can you point out the code where an inode is moved into/out of the lru list but lru_size is not modified atomically?

I think we need to explore further to find out why lru_size and the lru list were out of sync before considering this bug closed with patch http://review.gluster.org/#/c/15087/.

Comment 5 Raghavendra G 2016-08-09 11:34:36 UTC
(In reply to Raghavendra G from comment #4)
> (In reply to Soumya Koduri from comment #3)
> > I suspect below could have caused the issue - 
> > 
> > In inode_table_destroy(), we first purge all the lru entries but the lru
> > count is not adjusted accordingly.
> 
> This is not true. Can you point out the code where an inode is moved
> into/out of lru list but lru_size is not modified atomically?

Sorry, I missed that inode_table_destroy is the culprit. We need to fix inode_table_destroy to update lru_size while moving inodes from the lru list to the purge list (just like inode_table_prune does).

> 
> I think we need to explore further to find out why lru_size and lru_list
> were out of sync, before considering this bug as closed with patch
> http://review.gluster.org/#/c/15087/

Comment 6 Soumya Koduri 2016-08-09 11:49:53 UTC
(In reply to Raghavendra G from comment #5)
> (In reply to Raghavendra G from comment #4)
> > (In reply to Soumya Koduri from comment #3)
> > > I suspect below could have caused the issue - 
> > > 
> > > In inode_table_destroy(), we first purge all the lru entries but the lru
> > > count is not adjusted accordingly.
> > 
> > This is not true. Can you point out the code where an inode is moved
> > into/out of lru list but lru_size is not modified atomically?
> 
> Sorry I missed out that inode_table_destroy is the culprit. We need to fix
> inode_table_destroy to update lru_size while moving inodes from lru to purge
> list (just like inode_table_prune).
> 

Yes, and that is exactly what http://review.gluster.org/#/c/15087/ does. Sorry, am I missing anything?

Comment 7 Vijay Bellur 2016-08-10 07:27:11 UTC
REVIEW: http://review.gluster.org/15129 (inode: Adjust lru_size while retiring entries in lru list) posted (#2) for review on release-3.8 by Oleksandr Natalenko (oleksandr)

Comment 8 Soumya Koduri 2016-08-10 07:27:40 UTC
*** Bug 1365748 has been marked as a duplicate of this bug. ***

Comment 9 Vijay Bellur 2016-08-13 08:04:04 UTC
COMMIT: http://review.gluster.org/15129 committed in release-3.8 by Niels de Vos (ndevos) 
------
commit dae860ab7e1c5a205646393f2cb80a0a06986c30
Author: Soumya Koduri <skoduri>
Date:   Thu Aug 4 16:00:31 2016 +0530

    inode: Adjust lru_size while retiring entries in lru list
    
    As part of inode_table_destroy(), we first retire entries
    in the lru list but the lru_size is not adjusted accordingly.
    This may result in invalid memory reference in inode_table_prune
    if the lru_size > lru_limit.
    
    > Reviewed-on: http://review.gluster.org/15087
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>
    > Reviewed-by: Prashanth Pai <ppai>
    
    BUG: 1362540
    Change-Id: I29ee3c03b0eaa8a118d06dc0cefba85877daf963
    Signed-off-by: Soumya Koduri <skoduri>
    Signed-off-by: Oleksandr Natalenko <oleksandr>
    Reviewed-on: http://review.gluster.org/15129
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>

Comment 10 Niels de Vos 2016-08-24 10:20:46 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.3, please open a new bug report.

glusterfs-3.8.3 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000059.html
[2] https://www.gluster.org/pipermail/gluster-users/

