+++ This bug was initially created as a clone of Bug #1430360 +++
+++                                                           +++
+++ Use this bug to get a fix in the master branch before     +++
+++ backporting it to the maintained versions.                +++

Description of problem:
I'm experiencing random segmentation faults of glusterfsd process. The problem started appearing since the trash has been enabled.

Version-Release number of selected component (if applicable):
Debian Jessie, the packages are up to date:
ii  glusterfs-client  3.8.9-1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.9-1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-dbg     3.8.9-1  amd64  GlusterFS debugging symbols
ii  glusterfs-server  3.8.9-1  amd64  clustered file-system (server package)

How reproducible:
I've assembled the following cluster:

Volume Name: volume1
Type: Distributed-Replicate
Volume ID: c4c0e0a4-e705-472d-8d27-485619cc66db
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.10.9.7:/export/data1
Brick2: 10.10.9.8:/export/data1
Brick3: 10.10.9.9:/export/data1
Brick4: 10.10.9.10:/export/data1
Brick5: 10.10.9.5:/export/data1
Brick6: 10.10.9.6:/export/data1
Options Reconfigured:
features.trash: on
features.trash-eliminate-path: _REMOVED,_db_backup,*/private
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
server.allow-insecure: on
performance.read-ahead: on
cluster.min-free-disk: 5
performance.stat-prefetch: on
performance.quick-read: on
auth.allow: 10.*.*.*
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: ERROR
nfs.disable: on
features.trash-max-filesize: 100MB
performance.cache-size: 1GB
cluster.favorite-child-policy: mtime
cluster.server-quorum-ratio: 51%

Actual results:
Brick 10.10.9.10:/export/data1              N/A       N/A        N       N/A

Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.9.7:/export/data1               49154     0          Y       25300
Brick 10.10.9.8:/export/data1               49154     0          Y       18486
Brick 10.10.9.9:/export/data1               49156     0          Y       32131
Brick 10.10.9.10:/export/data1              N/A       N/A        N       N/A
Brick 10.10.9.5:/export/data1               49154     0          Y       8549
Brick 10.10.9.6:/export/data1               49154     0          Y       18783
Self-heal Daemon on localhost               N/A       N/A        Y       25574
Self-heal Daemon on 10.10.9.9               N/A       N/A        Y       23093
Self-heal Daemon on 10.10.9.6               N/A       N/A        Y       10453
Self-heal Daemon on 10.10.9.8               N/A       N/A        Y       21167
Self-heal Daemon on 10.10.9.7               N/A       N/A        Y       25331
Self-heal Daemon on 10.10.9.5               N/A       N/A        Y       26096

Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

Notice the unavailability of 10.10.9.10

Expected results:
glusterfsd not crashing.

Additional info:
The core dump shows:

Core was generated by `/usr/sbin/glusterfsd -s 10.10.9.10 --volfile-id volume1.10.10.9.10.export-data1 -p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
106     ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007fd15b5aca63 in gf_strdup (src=<optimized out>) at ../../../../libglusterfs/src/mem-pool.h:185
#2  trash_truncate_stat_cbk (frame=0x7fd163965058, cookie=0x0, this=0x7fd15c0097a0, op_ret=0, op_errno=op_errno@entry=0, buf=0x7fd14180e930, xdata=0x7fd163113e14) at trash.c:1630
#3  0x00007fd15bdceac6 in posix_stat (frame=0x7fd163962d90, this=<optimized out>, loc=<optimized out>, xdata=<optimized out>) at posix.c:310
#4  0x00007fd15b5ad943 in trash_truncate (frame=0x7fd163965058, this=0x7fd15c0097a0, loc=0x7fd1631d7710, offset=140537295678864, xdata=0x0) at trash.c:1780
#5  0x00007fd15b384042 in ctr_truncate (frame=0x7fd16395ac60, this=0x7fd15c00b100, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at changetimerecorder.c:731
#6  0x00007fd15ac8be24 in changelog_truncate (frame=0x7fd163966438, this=0x7fd15c00de60, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at changelog.c:1753
#7  0x00007fd15aa6dbff in br_stub_truncate_resume (frame=0x7fd163955ce0, this=0x7fd15c00f680, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at bit-rot-stub.c:2051
#8  0x00007fd165e99a2d in call_resume (stub=0x7fd1631d76c0) at call-stub.c:2508
#9  0x00007fd15aa71036 in br_stub_fd_incversioning_cbk (frame=0x7fd163955ce0, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=<optimized out>, xdata=<optimized out>) at bit-rot-stub.c:613
#10 0x00007fd15ac86ff4 in changelog_fsetxattr_cbk (frame=0x7fd163966efc, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0) at changelog.c:1538
#11 0x00007fd15b38873c in ctr_fsetxattr_cbk (frame=0x7fd163960920, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0) at changetimerecorder.c:1294
#12 0x00007fd15bdd8813 in posix_fsetxattr (frame=frame@entry=0x7fd16396650c, this=this@entry=0x7fd15c006d00, fd=fd@entry=0x7fd158018484, dict=dict@entry=0x7fd163182dd8, flags=flags@entry=0, xdata=xdata@entry=0x7fd163104ce0) at posix.c:5036
#13 0x00007fd165eec3eb in default_fsetxattr (frame=0x7fd16396650c, this=<optimized out>, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at defaults.c:2328
#14 0x00007fd15b381f1d in ctr_fsetxattr (frame=0x7fd163960920, this=0x7fd15c00b100, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at changetimerecorder.c:1325
#15 0x00007fd15ac891d0 in changelog_fsetxattr (frame=0x7fd163966efc, this=0x7fd15c00de60, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at changelog.c:1571
#16 0x00007fd15aa7533b in br_stub_fd_versioning (this=0x7fd15c00f680, frame=0x7fd163955ce0, stub=0x7fd163966efc, dict=0x0, dict@entry=0x7fd163182dd8, fd=0x0, fd@entry=0x7fd158018484, callback=0x7fd15c00f680, memversion=6, versioningtype=2, durable=0) at bit-rot-stub.c:682
#17 0x00007fd15aa75508 in br_stub_perform_incversioning (this=0x7fd15c00f680, frame=0x7fd163955ce0, stub=0x7fd1631d76c0, fd=0x7fd158018484, ctx=<optimized out>) at bit-rot-stub.c:723
#18 0x00007fd15aa77284 in br_stub_truncate (frame=0x7fd163955ce0, this=0x7fd15c00f680, loc=0x7fd15c086610, offset=0, xdata=0x7fd1631327e0) at bit-rot-stub.c:2131
#19 0x00007fd15a85bfc8 in posix_acl_truncate (frame=0x7fd1639536c8, this=0x7fd15c010cf0, loc=0x7fd15c086610, off=0, xdata=0x7fd1631327e0) at posix-acl.c:1080
#20 0x00007fd15a641406 in truncate_stat_cbk (frame=0x7fd16395dfb8, cookie=0x0, this=0x7fd15c012260, op_ret=1543578208, op_errno=1670723272, buf=0x7fd14180e930, buf@entry=0x7fd14180fa50, xdata=0x0) at posix.c:795
#21 0x00007fd15bdceac6 in posix_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c006d00, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at posix.c:310
#22 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c0097a0, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at defaults.c:2647
#23 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c00b100, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at defaults.c:2647
#24 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#25 0x00007fd15aa721c7 in br_stub_stat (frame=0x7fd16396335c, this=0x7fd15c00de60, loc=0x7fd163221068, xdata=0x0) at bit-rot-stub.c:2818
#26 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#27 0x00007fd15a63a0f5 in pl_truncate (frame=0x7fd16395dfb8, this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd15c012260) at posix.c:855
#28 0x00007fd15a42de57 in worm_truncate (frame=0x7fd16395dfb8, this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at worm.c:185
#29 0x00007fd15a2206ec in ro_truncate (frame=0x7fd16395dfb8, this=0x7fd15c013680, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at read-only-common.c:175
#30 0x00007fd15a00fbca in leases_truncate (frame=0x7fd163958b40, this=0x7fd15c0162b0, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at leases.c:312
#31 0x00007fd159dfc2ce in up_truncate (frame=0x7fd163955640, this=0x7fd15c0177e0, loc=0x7fd163221068, offset=0, xdata=0x0) at upcall.c:301
#32 0x00007fd165f04d81 in default_truncate_resume (frame=0x7fd163967178, this=0x7fd15c018d90, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at defaults.c:1944
#33 0x00007fd165e99a2d in call_resume (stub=0x7fd163221018) at call-stub.c:2508
#34 0x00007fd159bee917 in iot_worker (data=0x7fd15c067480) at io-threads.c:220
#35 0x00007fd1650f6064 in start_thread (arg=0x7fd141810700) at pthread_create.c:309
#36 0x00007fd164a2f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

--- Additional comment from Jeff Darcy on 2017-03-08 14:21:08 CET ---

The immediate problem seems to be that trash_truncate_stat_cbk assumes local->loc.path will be non-NULL, but that's not entirely guaranteed to be the case.
In general, a loc_t can be used to resolve an inode in several ways, only some (and the less preferred ones at that) involving the path/name fields. Adding a NULL check should help, but it might also be interesting to find out why we're being called this way in case there are other implications of something the code clearly does not expect.
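
For illustration only, the kind of NULL check suggested above could look roughly like the sketch below. This is not the actual trash.c code or the eventual patch: mock_loc_t and copy_loc_path are hypothetical stand-ins (the real loc_t has more fields, and the real code duplicates the path with gf_strdup), and returning -EINVAL is just an assumed way to report the missing path. The point is only that the path is tested before anything calls strlen on it, which is where frames #0 to #2 of the backtrace show the crash.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for GlusterFS's loc_t.
 * Only the field relevant to this bug is modelled. */
typedef struct {
    const char *path; /* may be NULL when the inode was resolved another way */
} mock_loc_t;

/* Duplicate loc->path into *out.
 * Returns 0 on success, -EINVAL if there is no path to copy,
 * or -ENOMEM if the allocation fails. *out is NULL on any failure. */
static int copy_loc_path(const mock_loc_t *loc, char **out)
{
    assert(out != NULL);
    *out = NULL;

    /* The guard: never hand a NULL pointer to strlen(). */
    if (loc == NULL || loc->path == NULL)
        return -EINVAL;

    size_t len = strlen(loc->path) + 1; /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy == NULL)
        return -ENOMEM;

    memcpy(copy, loc->path, len);
    *out = copy;
    return 0;
}
```

A caller would treat a non-zero return as "no path available" and take whatever fallback is appropriate (for example, skipping the trash copy) instead of dereferencing the result. That only addresses the crash itself; it does not answer the open question of why the callback is reached without a path.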
*** This bug has been marked as a duplicate of bug 1430360 ***