Bug 1430360 - glusterfsd segfault in trash_truncate_stat_cbk
Summary: glusterfsd segfault in trash_truncate_stat_cbk
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: trash-xlator
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Anoop C S
QA Contact:
URL:
Whiteboard:
Duplicates: 1432043
Depends On: 1432043
Blocks:
Reported: 2017-03-08 12:54 UTC by GCth
Modified: 2019-06-18 10:31 UTC
CC List: 5 users

Fixed In Version: glusterfs-6.x
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-18 10:31:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description GCth 2017-03-08 12:54:26 UTC
Description of problem:

I'm experiencing random segmentation faults in the glusterfsd process. The problem started appearing after the trash feature was enabled.

Version-Release number of selected component (if applicable):

Debian Jessie; the packages are up to date:

ii  glusterfs-client                3.8.9-1                     amd64        clustered file-system (client package)
ii  glusterfs-common                3.8.9-1                     amd64        GlusterFS common libraries and translator modules
ii  glusterfs-dbg                   3.8.9-1                     amd64        GlusterFS debugging symbols
ii  glusterfs-server                3.8.9-1                     amd64        clustered file-system (server package)


How reproducible:

I've assembled the following cluster:

Volume Name: volume1
Type: Distributed-Replicate
Volume ID: c4c0e0a4-e705-472d-8d27-485619cc66db
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.10.9.7:/export/data1
Brick2: 10.10.9.8:/export/data1
Brick3: 10.10.9.9:/export/data1
Brick4: 10.10.9.10:/export/data1
Brick5: 10.10.9.5:/export/data1
Brick6: 10.10.9.6:/export/data1
Options Reconfigured:
features.trash: on
features.trash-eliminate-path: _REMOVED,_db_backup,*/private
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
server.allow-insecure: on
performance.read-ahead: on
cluster.min-free-disk: 5
performance.stat-prefetch: on
performance.quick-read: on
auth.allow: 10.*.*.*
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: ERROR
nfs.disable: on
features.trash-max-filesize: 100MB
performance.cache-size: 1GB
cluster.favorite-child-policy: mtime
cluster.server-quorum-ratio: 51%


Actual results:

Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.9.7:/export/data1               49154     0          Y       25300
Brick 10.10.9.8:/export/data1               49154     0          Y       18486
Brick 10.10.9.9:/export/data1               49156     0          Y       32131
Brick 10.10.9.10:/export/data1              N/A       N/A        N       N/A  
Brick 10.10.9.5:/export/data1               49154     0          Y       8549 
Brick 10.10.9.6:/export/data1               49154     0          Y       18783
Self-heal Daemon on localhost               N/A       N/A        Y       25574
Self-heal Daemon on 10.10.9.9               N/A       N/A        Y       23093
Self-heal Daemon on 10.10.9.6               N/A       N/A        Y       10453
Self-heal Daemon on 10.10.9.8               N/A       N/A        Y       21167
Self-heal Daemon on 10.10.9.7               N/A       N/A        Y       25331
Self-heal Daemon on 10.10.9.5               N/A       N/A        Y       26096
 
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

Note that the brick on 10.10.9.10 is offline (Online: N, no port or PID).


Expected results:

glusterfsd does not crash.


Additional info:

The core dump shows:

Core was generated by `/usr/sbin/glusterfsd -s 10.10.9.10 --volfile-id volume1.10.10.9.10.export-data1 -p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
106     ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007fd15b5aca63 in gf_strdup (src=<optimized out>) at ../../../../libglusterfs/src/mem-pool.h:185
#2  trash_truncate_stat_cbk (frame=0x7fd163965058, cookie=0x0, this=0x7fd15c0097a0, op_ret=0, op_errno=op_errno@entry=0, buf=0x7fd14180e930, xdata=0x7fd163113e14) at trash.c:1630
#3  0x00007fd15bdceac6 in posix_stat (frame=0x7fd163962d90, this=<optimized out>, loc=<optimized out>, xdata=<optimized out>) at posix.c:310
#4  0x00007fd15b5ad943 in trash_truncate (frame=0x7fd163965058, this=0x7fd15c0097a0, loc=0x7fd1631d7710, offset=140537295678864, xdata=0x0) at trash.c:1780
#5  0x00007fd15b384042 in ctr_truncate (frame=0x7fd16395ac60, this=0x7fd15c00b100, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at changetimerecorder.c:731
#6  0x00007fd15ac8be24 in changelog_truncate (frame=0x7fd163966438, this=0x7fd15c00de60, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at changelog.c:1753
#7  0x00007fd15aa6dbff in br_stub_truncate_resume (frame=0x7fd163955ce0, this=0x7fd15c00f680, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at bit-rot-stub.c:2051
#8  0x00007fd165e99a2d in call_resume (stub=0x7fd1631d76c0) at call-stub.c:2508
#9  0x00007fd15aa71036 in br_stub_fd_incversioning_cbk (frame=0x7fd163955ce0, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=<optimized out>, xdata=<optimized out>) at bit-rot-stub.c:613
#10 0x00007fd15ac86ff4 in changelog_fsetxattr_cbk (frame=0x7fd163966efc, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0) at changelog.c:1538
#11 0x00007fd15b38873c in ctr_fsetxattr_cbk (frame=0x7fd163960920, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0) at changetimerecorder.c:1294
#12 0x00007fd15bdd8813 in posix_fsetxattr (frame=frame@entry=0x7fd16396650c, this=this@entry=0x7fd15c006d00, fd=fd@entry=0x7fd158018484, dict=dict@entry=0x7fd163182dd8, flags=flags@entry=0, xdata=xdata@entry=0x7fd163104ce0) at posix.c:5036
#13 0x00007fd165eec3eb in default_fsetxattr (frame=0x7fd16396650c, this=<optimized out>, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at defaults.c:2328
#14 0x00007fd15b381f1d in ctr_fsetxattr (frame=0x7fd163960920, this=0x7fd15c00b100, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at changetimerecorder.c:1325
#15 0x00007fd15ac891d0 in changelog_fsetxattr (frame=0x7fd163966efc, this=0x7fd15c00de60, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at changelog.c:1571
#16 0x00007fd15aa7533b in br_stub_fd_versioning (this=0x7fd15c00f680, frame=0x7fd163955ce0, stub=0x7fd163966efc, dict=0x0, dict@entry=0x7fd163182dd8, fd=0x0, fd@entry=0x7fd158018484, callback=0x7fd15c00f680, memversion=6, versioningtype=2, durable=0)
    at bit-rot-stub.c:682
#17 0x00007fd15aa75508 in br_stub_perform_incversioning (this=0x7fd15c00f680, frame=0x7fd163955ce0, stub=0x7fd1631d76c0, fd=0x7fd158018484, ctx=<optimized out>) at bit-rot-stub.c:723
#18 0x00007fd15aa77284 in br_stub_truncate (frame=0x7fd163955ce0, this=0x7fd15c00f680, loc=0x7fd15c086610, offset=0, xdata=0x7fd1631327e0) at bit-rot-stub.c:2131
#19 0x00007fd15a85bfc8 in posix_acl_truncate (frame=0x7fd1639536c8, this=0x7fd15c010cf0, loc=0x7fd15c086610, off=0, xdata=0x7fd1631327e0) at posix-acl.c:1080
#20 0x00007fd15a641406 in truncate_stat_cbk (frame=0x7fd16395dfb8, cookie=0x0, this=0x7fd15c012260, op_ret=1543578208, op_errno=1670723272, buf=0x7fd14180e930, buf@entry=0x7fd14180fa50, xdata=0x0) at posix.c:795
#21 0x00007fd15bdceac6 in posix_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c006d00, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at posix.c:310
#22 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c0097a0, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at defaults.c:2647
#23 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c00b100, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at defaults.c:2647
#24 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#25 0x00007fd15aa721c7 in br_stub_stat (frame=0x7fd16396335c, this=0x7fd15c00de60, loc=0x7fd163221068, xdata=0x0) at bit-rot-stub.c:2818
#26 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#27 0x00007fd15a63a0f5 in pl_truncate (frame=0x7fd16395dfb8, this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd15c012260) at posix.c:855
#28 0x00007fd15a42de57 in worm_truncate (frame=0x7fd16395dfb8, this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at worm.c:185
#29 0x00007fd15a2206ec in ro_truncate (frame=0x7fd16395dfb8, this=0x7fd15c013680, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at read-only-common.c:175
#30 0x00007fd15a00fbca in leases_truncate (frame=0x7fd163958b40, this=0x7fd15c0162b0, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at leases.c:312
#31 0x00007fd159dfc2ce in up_truncate (frame=0x7fd163955640, this=0x7fd15c0177e0, loc=0x7fd163221068, offset=0, xdata=0x0) at upcall.c:301
#32 0x00007fd165f04d81 in default_truncate_resume (frame=0x7fd163967178, this=0x7fd15c018d90, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at defaults.c:1944
#33 0x00007fd165e99a2d in call_resume (stub=0x7fd163221018) at call-stub.c:2508
#34 0x00007fd159bee917 in iot_worker (data=0x7fd15c067480) at io-threads.c:220
#35 0x00007fd1650f6064 in start_thread (arg=0x7fd141810700) at pthread_create.c:309
#36 0x00007fd164a2f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment 1 Jeff Darcy 2017-03-08 13:21:08 UTC
The immediate problem seems to be that trash_truncate_stat_cbk assumes local->loc.path will be non-NULL, but that's not entirely guaranteed to be the case.  In general, a loc_t can be used to resolve an inode in several ways, only some (and the less preferred ones at that) involving the path/name fields.  Adding a NULL check should help, but it might also be interesting to find out why we're being called this way in case there are other implications of something the code clearly does not expect.
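
A minimal sketch of the NULL check suggested above, written against a trimmed, hypothetical stand-in for loc_t. The names sketch_loc_t, sketch_strdup and sketch_prepare_trash_copy are illustrative only, not GlusterFS APIs, and this is not the change that went upstream. It assumes the safest reaction to a missing path is to skip the trash copy and let the truncate proceed, rather than handing NULL to a strdup-style helper, which is what frames #0 and #1 of the backtrace show faulting in strlen().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL, trimmed stand-in for GlusterFS's loc_t. The real struct
 * also carries inode, parent and gfid fields; all that matters here is
 * that 'path' is optional and may be NULL when the inode was resolved
 * by GFID. */
typedef struct {
    const char *path;
} sketch_loc_t;

/* NULL-tolerant string duplicate: returns NULL for a NULL source
 * instead of passing it to strlen(). */
static char *sketch_strdup(const char *src)
{
    if (src == NULL)
        return NULL;
    size_t len = strlen(src) + 1;
    char *dup = malloc(len);
    if (dup != NULL)
        memcpy(dup, src, len);
    return dup;
}

/* HYPOTHETICAL decision helper for a truncate stat callback.
 * Returns 1 with *out_path set to a heap copy (caller frees) when a
 * trash copy can be attempted; returns 0 with *out_path == NULL when
 * there is no usable path (or allocation fails), meaning: skip the
 * trash copy and let the truncate continue. */
static int sketch_prepare_trash_copy(const sketch_loc_t *loc, char **out_path)
{
    assert(loc != NULL && out_path != NULL);
    *out_path = sketch_strdup(loc->path);
    if (*out_path == NULL) {
        fprintf(stderr, "trash: no usable path, skipping trash copy\n");
        return 0;
    }
    return 1;
}
```

Called with a loc whose path is "/dir/file.txt", the helper returns 1 and a copy of the path; called with a path of NULL, it returns 0 and leaves *out_path NULL. Skipping the copy trades a missing trash entry for a brick that stays up; it does not answer why the loc arrives without a path.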

Comment 2 Niels de Vos 2017-11-07 10:41:28 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

Comment 3 GCth 2017-11-20 21:45:52 UTC
Same issue with 3.10.7:

(gdb) bt full
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
No locals.
#1  0x00007fc629f0a3a3 in gf_strdup (src=<optimized out>) at ../../../../libglusterfs/src/mem-pool.h:185
        dup_str = 0x0
        len = 0
#2  trash_truncate_stat_cbk (frame=0x7fc5c82e74e0, cookie=0x0, this=0x7fc62400ab50, op_ret=0, op_errno=op_errno@entry=0, buf=0x7fc610c5b950, xdata=0x7fc5c82f3a60) at trash.c:1909
        priv = 0x7fc62407ada0
        local = 0x7fc5c8377860
        loc_newname = '\000' <repeats 4095 times>
        ret = 0
        __FUNCTION__ = "trash_truncate_stat_cbk"
#3  0x00007fc62a72ccc2 in posix_stat (frame=0x7fc5c82c44d0, this=<optimized out>, loc=<optimized out>, xdata=<optimized out>) at posix.c:356
        fn = 0x7fc629f09f40 <trash_truncate_stat_cbk>
        _parent = 0x7fc5c82e74e0
        old_THIS = 0x7fc624008350
        buf = {ia_ino = 11222377774609377881, ia_gfid = "fB\036v\345\267M[\233\275\345 \226\324\252Y", ia_dev = 64768, ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {read = 1 '\001', write = 0 '\000', 
              exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 1238, ia_gid = 0, ia_rdev = 0, ia_size = 124016, ia_blksize = 4096, ia_blocks = 243, ia_atime = 1510917960, ia_atime_nsec = 197765292, ia_mtime = 1510381517, ia_mtime_nsec = 551886884, 
          ia_ctime = 1511209751, ia_ctime_nsec = 823985111}
        op_ret = <optimized out>
        op_errno = <optimized out>
        priv = <optimized out>
        real_path = <optimized out>
        xattr_rsp = <optimized out>
        __FUNCTION__ = "posix_stat"
#4  0x00007fc629f0b2d5 in trash_truncate (frame=0x7fc5c82e74e0, this=0x7fc62400ab50, loc=0x7fc5c82daaa0, offset=140487443629264, xdata=0x0) at trash.c:2061
        _new = 0x7fc5c82c44d0
        old_THIS = 0x7fc62400ab50
        priv = 0x0
        local = 0x0
        match = 604023632
        pathbuf = 0x7fc5c8183390 "<gfid:66421e76-e5b7-4d5b-9bbd-e52096d4aa59>"
        ret = 0
        __FUNCTION__ = "trash_truncate"
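
The pathbuf value in frame #4 is a GFID-style path rather than a named path. A small sketch, assuming only the canonical 8-4-4-4-12 hex UUID layout, shows that the 16 bytes gdb printed as ia_gfid in frame #3 format to exactly that string. sketch_gfid_path is a hypothetical helper for illustration, not a GlusterFS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* HYPOTHETICAL helper (not a GlusterFS API): formats a 16-byte GFID as
 * "<gfid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx>", the textual form seen
 * in pathbuf. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if buf is too small. */
static int sketch_gfid_path(const unsigned char gfid[16], char *buf, size_t size)
{
    assert(gfid != NULL && buf != NULL);
    int n = snprintf(buf, size,
                     "<gfid:%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                     "%02x%02x%02x%02x%02x%02x>",
                     gfid[0], gfid[1], gfid[2], gfid[3],
                     gfid[4], gfid[5], gfid[6], gfid[7],
                     gfid[8], gfid[9], gfid[10], gfid[11],
                     gfid[12], gfid[13], gfid[14], gfid[15]);
    if (n < 0 || (size_t)n >= size)
        return -1; /* output would have been truncated */
    return n;
}
```

Decoding gdb's octal escapes gives the bytes 66 42 1e 76 e5 b7 4d 5b 9b bd e5 20 96 d4 aa 59; with a 64-byte buffer the helper writes <gfid:66421e76-e5b7-4d5b-9bbd-e52096d4aa59> and returns 43. The match is consistent with Comment 1: this truncate appears to have reached the trash translator addressed by GFID, with no name-based path available.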


There is also a message in the brick log:

pending frames:
frame : type(0) op(10)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-11-20 20:29:11
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.10.7
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xac)[0x7fc630435eec]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x324)[0x7fc63043f504]
/lib/x86_64-linux-gnu/libc.so.6(+0x350e0)[0x7fc62ef380e0]
/lib/x86_64-linux-gnu/libc.so.6(strlen+0x2a)[0x7fc62ef84c3a]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/trash.so(+0x113a3)[0x7fc629f0a3a3]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/storage/posix.so(+0x6cc2)[0x7fc62a72ccc2]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/trash.so(+0x122d5)[0x7fc629f0b2d5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/changetimerecorder.so(+0x72dc)[0x7fc629ce02dc]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/changelog.so(+0xb34e)[0x7fc6295e734e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/bitrot-stub.so(+0x4bd9)[0x7fc6293c8bd9]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x75)[0x7fc630457cf5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/bitrot-stub.so(+0x80f6)[0x7fc6293cc0f6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/changelog.so(+0x6386)[0x7fc6295e2386]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/changetimerecorder.so(+0xc163)[0x7fc629ce5163]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/storage/posix.so(+0xf2d5)[0x7fc62a7352d5]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7fc6304ab225]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/changetimerecorder.so(+0x51f7)[0x7fc629cde1f7]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/changelog.so(+0x86ba)[0x7fc6295e46ba]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/bitrot-stub.so(+0xc765)[0x7fc6293d0765]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/bitrot-stub.so(+0xc938)[0x7fc6293d0938]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/bitrot-stub.so(+0xe7e4)[0x7fc6293d27e4]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/access-control.so(+0x5a02)[0x7fc6291b6a02]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/locks.so(+0x178a0)[0x7fc628f968a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/storage/posix.so(+0x6cc2)[0x7fc62a72ccc2]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_stat+0xa1)[0x7fc6304ac1c1]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_stat+0xa1)[0x7fc6304ac1c1]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_stat+0xa1)[0x7fc6304ac1c1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/bitrot-stub.so(+0x9401)[0x7fc6293cd401]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_stat+0xa1)[0x7fc6304ac1c1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/locks.so(+0x740f)[0x7fc628f8640f]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/worm.so(+0x8311)[0x7fc628d79311]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/read-only.so(+0x277e)[0x7fc628b6b77e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/leases.so(+0x7184)[0x7fc62895b184]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/features/upcall.so(+0x122df)[0x7fc62874b2df]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_truncate_resume+0x19b)[0x7fc6304c5b5b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x75)[0x7fc630457cf5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.7/xlator/performance/io-threads.so(+0x4dd4)[0x7fc628531dd4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7fc62f6b2064]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc62efeb62d]

Comment 4 Amar Tumballi 2018-09-14 07:24:18 UTC
Please validate this on the latest master. We have not seen it for some time now.

Comment 5 Vijay Bellur 2018-11-20 04:17:41 UTC
*** Bug 1432043 has been marked as a duplicate of this bug. ***

Comment 6 Amar Tumballi 2019-06-18 10:31:33 UTC
Closing as WORKSFORME. Please try the latest releases (glusterfs-6.x and later) and check whether this is fixed for you.

