Bug 1212110 - bricks process crash
Summary: bricks process crash
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Duplicates: 1222942
Depends On:
Blocks:
 
Reported: 2015-04-15 14:53 UTC by Saurabh
Modified: 2016-06-16 12:52 UTC
CC: 7 users

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-16 12:52:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
coredump of brick process (11.49 MB, application/x-xz)
2015-04-15 14:59 UTC, Saurabh

Description Saurabh 2015-04-15 14:53:11 UTC
Description of problem:

I was running the fs-sanity test suite over glusterfs-nfs with NFS version 3.

During this run, one of the brick processes dumped core; starting this bz off against the posix component.

(gdb) bt
#0  0x0000003315e0c380 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fbfac734d6d in _posix_handle_xattr_keyvalue_pair (d=0x7fbfb6003044, k=0x7fbf5c045310 "trusted.glusterfs.quota.6e6c3d48-fafb-4c3d-a8bb-8cf6240454e1.contri", v=0x7fbfb5e1f7ac, tmp=0x7fbefe9dbb10)
    at posix.c:4648
#2  0x00007fbfb77623a3 in dict_foreach_match (dict=0x7fbfb6003044, match=0x7fbfb7762320 <dict_match_everything>, match_data=0x0, action=0x7fbfac734d10 <_posix_handle_xattr_keyvalue_pair>, 
    action_data=0x7fbefe9dbb10) at dict.c:1182
#3  0x00007fbfb7762438 in dict_foreach (dict=<value optimized out>, fn=<value optimized out>, data=<value optimized out>) at dict.c:1141
#4  0x00007fbfac7342e5 in do_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8006880, loc=0x7fbfb608eb5c, fd=<value optimized out>, optype=GF_XATTROP_ADD_ARRAY64, xattr=0x7fbfb6003044) at posix.c:4821
#5  0x00007fbfac7347f1 in posix_xattrop (frame=<value optimized out>, this=<value optimized out>, loc=<value optimized out>, optype=<value optimized out>, xattr=<value optimized out>, 
    xdata=<value optimized out>) at posix.c:4836
#6  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8009030, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#7  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa800a5f0, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#8  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa800cc00, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#9  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa800eb20, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#10 0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8010060, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#11 0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa80113d0, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#12 0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8012750, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>)
    at defaults.c:1978
#13 0x00007fbfb7772d22 in default_xattrop_resume (frame=0x7fbfb660866c, this=0x7fbfa8013bb0, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=0x7fbfb6003044, xdata=0x0) at defaults.c:1539
#14 0x00007fbfb778e080 in call_resume (stub=0x7fbfb608eb1c) at call-stub.c:2894
#15 0x00007fbfa7118398 in iot_worker (data=0x7fbfa8052990) at io-threads.c:214
#16 0x0000003315e079d1 in start_thread () from /lib64/libpthread.so.0
#17 0x0000003315ae88fd in clone () from /lib64/libc.so.6
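
For illustration only (not GlusterFS source): a SIGSEGV inside pthread_spin_lock() with no visible fault in the caller usually means the spinlock was reached through a NULL or stale object pointer, as in this minimal C sketch with made-up names:

#include <pthread.h>
#include <stddef.h>

struct obj {
    pthread_spinlock_t lock;   /* lock embedded in the object, as in inode_t */
    int                refcount;
};

int main(void)
{
    struct obj *o = NULL;         /* e.g. a lookup that failed unnoticed */
    pthread_spin_lock(&o->lock);  /* &o->lock computed from a NULL base -> SIGSEGV (signal 11) */
    return 0;
}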


Version-Release number of selected component (if applicable):
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
How reproducible:
Seen once so far.

Actual results:
[root@nfs-rdma1 ~]# gluster volume status
Status of volume: vol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.45:/rhs/brick1/d1r1          N/A       N/A        N       11462
Brick 10.70.36.47:/rhs/brick1/d1r1          49152     0          Y       21534
Brick 10.70.36.45:/rhs/brick1/d2r1          49153     0          Y       11479
Brick 10.70.36.47:/rhs/brick1/d2r2          49153     0          Y       21551
NFS Server on localhost                     2049      0          Y       11500
Self-heal Daemon on localhost               N/A       N/A        Y       11507
Quota Daemon on localhost                   N/A       N/A        Y       11514
NFS Server on 10.70.36.47                   2049      0          Y       21572
Self-heal Daemon on 10.70.36.47             N/A       N/A        Y       21579
Quota Daemon on 10.70.36.47                 N/A       N/A        Y       21586
 
Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@nfs-rdma1 ~]# gluster volume info
 
Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 7336876c-c9ab-4dfc-8931-98d84d71d05b
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.45:/rhs/brick1/d1r1
Brick2: 10.70.36.47:/rhs/brick1/d1r1
Brick3: 10.70.36.45:/rhs/brick1/d2r1
Brick4: 10.70.36.47:/rhs/brick1/d2r2
Options Reconfigured:
nfs.disable: off
features.quota: on
features.quota-deem-statfs: on

Logs from the brick process:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-04-14 21:04:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7dev
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7fbfb7769d26]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7fbfb7785a3f]
/lib64/libc.so.6[0x3315a326a0]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x3315e0c380]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(+0xbd6d)[0x7fbfac734d6d]
/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x73)[0x7fbfb77623a3]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x18)[0x7fbfb7762438]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(do_xattrop+0x135)[0x7fbfac7342e5]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_xattrop+0x11)[0x7fbfac7347f1]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop_resume+0x142)[0x7fbfb7772d22]
/usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x7fbfb778e080]
/usr/lib64/glusterfs/3.7dev/xlator/performance/io-threads.so(iot_worker+0x158)[0x7fbfa7118398]
/lib64/libpthread.so.0[0x3315e079d1]
/lib64/libc.so.6(clone+0x6d)[0x3315ae88fd]
---------

Expected results:
The brick process should not crash while fs-sanity is executing.

Additional info:

Comment 1 Saurabh 2015-04-15 14:59:01 UTC
Created attachment 1014818 [details]
coredump of brick process

Comment 2 Krutika Dhananjay 2015-04-16 02:54:05 UTC
The most recent change that went into master in the posix xlator was http://review.gluster.org/#/c/10180/ (sent by me!), which was merged on April 13th. It is probably what caused the crash.

Will investigate. Thanks for the bug report.

Comment 3 Apeksha 2015-04-16 06:42:45 UTC
Seeing the same crashes on the servers while running BVT.

Comment 4 Krutika Dhananjay 2015-04-16 07:07:24 UTC
OK, it has nothing to do with http://review.gluster.org/#/c/10180/. RCA'd it. Will send out a patch soon.

The bug is hit with quota enabled, due to a race between xattrop and unlink.
After quota winds an xattrop fop on a path, if another client unlinks that path before the fop reaches posix, the brick crashes because the gfid link is absent on the backend and there is no NULL-check to handle that case. The fix is to handle the missing gfid link appropriately in the posix xlator and return failure to the xlator above; a minimal sketch of that guard follows.
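
Minimal sketch of the guard described above, with hypothetical names (the real change is the MAKE_INODE_HANDLE handling in the commit at comment 11, not this code):

#include <errno.h>
#include <stddef.h>

/* hypothetical resolver: maps a gfid to its backend path, or NULL when
 * the .glusterfs hard link is gone (e.g. the file was unlinked in parallel) */
char *resolve_gfid_handle(const char *gfid);

/* hypothetical worker that performs the actual xattr update */
int do_xattr_update(const char *real_path);

int xattrop_sketch(const char *gfid)
{
    char *real_path = resolve_gfid_handle(gfid);

    if (real_path == NULL)
        return -ESTALE;   /* fail upward instead of dereferencing NULL */

    return do_xattr_update(real_path);
}

The point is simply that the failed resolution is checked and turned into an error for the xlator above, rather than being passed along as a NULL pointer.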

Comment 6 Pranith Kumar K 2015-05-29 08:59:22 UTC
Sent http://review.gluster.org/10999 to identify the code path where this can happen, a lot earlier. This is not a fix but a step towards identifying how the malformed link got created in the first place, so I am not moving the bug to POST.

Comment 7 Pranith Kumar K 2015-06-01 08:11:26 UTC
http://review.gluster.org/11028 has been sent to prevent the crash when this issue is hit: the fop now fails with ESTALE instead of crashing. This is still not the complete fix, so I am not moving the bug to POST.

Comment 8 Anand Avati 2015-06-04 13:02:17 UTC
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 9 Anand Avati 2015-06-06 17:03:25 UTC
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#3) for review on master by Vijay Bellur (vbellur)

Comment 10 Pranith Kumar K 2015-06-10 17:41:51 UTC
*** Bug 1222942 has been marked as a duplicate of this bug. ***

Comment 11 Anand Avati 2015-06-16 07:27:05 UTC
COMMIT: http://review.gluster.org/11028 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 476d4070dbdb9d73b36bd04f3eb3d6eda84abe73
Author: Pranith Kumar K <pkarampu>
Date:   Mon Jun 1 13:34:33 2015 +0530

    storage/posix: Handle MAKE_INODE_HANDLE failures
    
    Change-Id: Ia176ccd4cac82c66ba50e3896fbe72c2da860c20
    BUG: 1212110
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/11028
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Tested-by: NetBSD Build System <jenkins.org>

Comment 12 Anand Avati 2015-07-09 06:10:38 UTC
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 13 Anand Avati 2015-07-09 10:50:35 UTC
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 15 Niels de Vos 2016-06-16 12:52:08 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

