
Bug 1432043

Summary: glusterfsd segfault in trash_truncate_stat_cbk
Product: [Community] GlusterFS
Component: trash-xlator
Version: mainline
Status: CLOSED DUPLICATE
Severity: medium
Priority: low
Hardware: x86_64
OS: Linux
Keywords: EasyFix, Triaged
Reporter: Niels de Vos <ndevos>
Assignee: bugs <bugs>
CC: anoopcs, atumball, bugs, jthottan, rhb1, vbellur
Type: Bug
Clone Of: 1430360
Bug Blocks: 1430360
Last Closed: 2018-11-20 04:17:41 UTC

Description Niels de Vos 2017-03-14 12:11:03 UTC

+++ This bug was initially created as a clone of Bug #1430360 +++
+++                                                           +++
+++ Use this bug to get a fix in the master branch before     +++
+++ backporting it to the maintained versions.                +++

Description of problem:

I'm experiencing random segmentation faults of the glusterfsd process. The problem started appearing after the trash feature was enabled.

Version-Release number of selected component (if applicable):

Debian Jessie, the packages are up to date:

ii  glusterfs-client                3.8.9-1                     amd64        clustered file-system (client package)
ii  glusterfs-common                3.8.9-1                     amd64        GlusterFS common libraries and translator modules
ii  glusterfs-dbg                   3.8.9-1                     amd64        GlusterFS debugging symbols
ii  glusterfs-server                3.8.9-1                     amd64        clustered file-system (server package)


How reproducible:

I've assembled the following cluster:

Volume Name: volume1
Type: Distributed-Replicate
Volume ID: c4c0e0a4-e705-472d-8d27-485619cc66db
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.10.9.7:/export/data1
Brick2: 10.10.9.8:/export/data1
Brick3: 10.10.9.9:/export/data1
Brick4: 10.10.9.10:/export/data1
Brick5: 10.10.9.5:/export/data1
Brick6: 10.10.9.6:/export/data1
Options Reconfigured:
features.trash: on
features.trash-eliminate-path: _REMOVED,_db_backup,*/private
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
server.allow-insecure: on
performance.read-ahead: on
cluster.min-free-disk: 5
performance.stat-prefetch: on
performance.quick-read: on
auth.allow: 10.*.*.*
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: ERROR
nfs.disable: on
features.trash-max-filesize: 100MB
performance.cache-size: 1GB
cluster.favorite-child-policy: mtime
cluster.server-quorum-ratio: 51%


Actual results:

Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.9.7:/export/data1               49154     0          Y       25300
Brick 10.10.9.8:/export/data1               49154     0          Y       18486
Brick 10.10.9.9:/export/data1               49156     0          Y       32131
Brick 10.10.9.10:/export/data1              N/A       N/A        N       N/A  
Brick 10.10.9.5:/export/data1               49154     0          Y       8549 
Brick 10.10.9.6:/export/data1               49154     0          Y       18783
Self-heal Daemon on localhost               N/A       N/A        Y       25574
Self-heal Daemon on 10.10.9.9               N/A       N/A        Y       23093
Self-heal Daemon on 10.10.9.6               N/A       N/A        Y       10453
Self-heal Daemon on 10.10.9.8               N/A       N/A        Y       21167
Self-heal Daemon on 10.10.9.7               N/A       N/A        Y       25331
Self-heal Daemon on 10.10.9.5               N/A       N/A        Y       26096
 
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

Note that the brick on 10.10.9.10 is offline.


Expected results:

glusterfsd does not crash.


Additional info:

The core dump shows:

Core was generated by `/usr/sbin/glusterfsd -s 10.10.9.10 --volfile-id volume1.10.10.9.10.export-data1 -p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
106     ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007fd15b5aca63 in gf_strdup (src=<optimized out>) at ../../../../libglusterfs/src/mem-pool.h:185
#2  trash_truncate_stat_cbk (frame=0x7fd163965058, cookie=0x0, this=0x7fd15c0097a0, op_ret=0, op_errno=op_errno@entry=0, buf=0x7fd14180e930, xdata=0x7fd163113e14) at trash.c:1630
#3  0x00007fd15bdceac6 in posix_stat (frame=0x7fd163962d90, this=<optimized out>, loc=<optimized out>, xdata=<optimized out>) at posix.c:310
#4  0x00007fd15b5ad943 in trash_truncate (frame=0x7fd163965058, this=0x7fd15c0097a0, loc=0x7fd1631d7710, offset=140537295678864, xdata=0x0) at trash.c:1780
#5  0x00007fd15b384042 in ctr_truncate (frame=0x7fd16395ac60, this=0x7fd15c00b100, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at changetimerecorder.c:731
#6  0x00007fd15ac8be24 in changelog_truncate (frame=0x7fd163966438, this=0x7fd15c00de60, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at changelog.c:1753
#7  0x00007fd15aa6dbff in br_stub_truncate_resume (frame=0x7fd163955ce0, this=0x7fd15c00f680, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at bit-rot-stub.c:2051
#8  0x00007fd165e99a2d in call_resume (stub=0x7fd1631d76c0) at call-stub.c:2508
#9  0x00007fd15aa71036 in br_stub_fd_incversioning_cbk (frame=0x7fd163955ce0, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=<optimized out>, xdata=<optimized out>) at bit-rot-stub.c:613
#10 0x00007fd15ac86ff4 in changelog_fsetxattr_cbk (frame=0x7fd163966efc, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0) at changelog.c:1538
#11 0x00007fd15b38873c in ctr_fsetxattr_cbk (frame=0x7fd163960920, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0) at changetimerecorder.c:1294
#12 0x00007fd15bdd8813 in posix_fsetxattr (frame=frame@entry=0x7fd16396650c, this=this@entry=0x7fd15c006d00, fd=fd@entry=0x7fd158018484, dict=dict@entry=0x7fd163182dd8, flags=flags@entry=0, xdata=xdata@entry=0x7fd163104ce0) at posix.c:5036
#13 0x00007fd165eec3eb in default_fsetxattr (frame=0x7fd16396650c, this=<optimized out>, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at defaults.c:2328
#14 0x00007fd15b381f1d in ctr_fsetxattr (frame=0x7fd163960920, this=0x7fd15c00b100, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at changetimerecorder.c:1325
#15 0x00007fd15ac891d0 in changelog_fsetxattr (frame=0x7fd163966efc, this=0x7fd15c00de60, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0, xdata=0x7fd163104ce0) at changelog.c:1571
#16 0x00007fd15aa7533b in br_stub_fd_versioning (this=0x7fd15c00f680, frame=0x7fd163955ce0, stub=0x7fd163966efc, dict=0x0, dict@entry=0x7fd163182dd8, fd=0x0, fd@entry=0x7fd158018484, callback=0x7fd15c00f680, memversion=6, versioningtype=2, durable=0)
    at bit-rot-stub.c:682
#17 0x00007fd15aa75508 in br_stub_perform_incversioning (this=0x7fd15c00f680, frame=0x7fd163955ce0, stub=0x7fd1631d76c0, fd=0x7fd158018484, ctx=<optimized out>) at bit-rot-stub.c:723
#18 0x00007fd15aa77284 in br_stub_truncate (frame=0x7fd163955ce0, this=0x7fd15c00f680, loc=0x7fd15c086610, offset=0, xdata=0x7fd1631327e0) at bit-rot-stub.c:2131
#19 0x00007fd15a85bfc8 in posix_acl_truncate (frame=0x7fd1639536c8, this=0x7fd15c010cf0, loc=0x7fd15c086610, off=0, xdata=0x7fd1631327e0) at posix-acl.c:1080
#20 0x00007fd15a641406 in truncate_stat_cbk (frame=0x7fd16395dfb8, cookie=0x0, this=0x7fd15c012260, op_ret=1543578208, op_errno=1670723272, buf=0x7fd14180e930, buf@entry=0x7fd14180fa50, xdata=0x0) at posix.c:795
#21 0x00007fd15bdceac6 in posix_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c006d00, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at posix.c:310
#22 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c0097a0, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at defaults.c:2647
#23 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c, this=this@entry=0x7fd15c00b100, loc=loc@entry=0x7fd163221068, xdata=xdata@entry=0x0) at defaults.c:2647
#24 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#25 0x00007fd15aa721c7 in br_stub_stat (frame=0x7fd16396335c, this=0x7fd15c00de60, loc=0x7fd163221068, xdata=0x0) at bit-rot-stub.c:2818
#26 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#27 0x00007fd15a63a0f5 in pl_truncate (frame=0x7fd16395dfb8, this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd15c012260) at posix.c:855
#28 0x00007fd15a42de57 in worm_truncate (frame=0x7fd16395dfb8, this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at worm.c:185
#29 0x00007fd15a2206ec in ro_truncate (frame=0x7fd16395dfb8, this=0x7fd15c013680, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at read-only-common.c:175
#30 0x00007fd15a00fbca in leases_truncate (frame=0x7fd163958b40, this=0x7fd15c0162b0, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at leases.c:312
#31 0x00007fd159dfc2ce in up_truncate (frame=0x7fd163955640, this=0x7fd15c0177e0, loc=0x7fd163221068, offset=0, xdata=0x0) at upcall.c:301
#32 0x00007fd165f04d81 in default_truncate_resume (frame=0x7fd163967178, this=0x7fd15c018d90, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at defaults.c:1944
#33 0x00007fd165e99a2d in call_resume (stub=0x7fd163221018) at call-stub.c:2508
#34 0x00007fd159bee917 in iot_worker (data=0x7fd15c067480) at io-threads.c:220
#35 0x00007fd1650f6064 in start_thread (arg=0x7fd141810700) at pthread_create.c:309
#36 0x00007fd164a2f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

--- Additional comment from Jeff Darcy on 2017-03-08 14:21:08 CET ---

The immediate problem seems to be that trash_truncate_stat_cbk assumes local->loc.path will be non-NULL, but that's not entirely guaranteed to be the case.  In general, a loc_t can be used to resolve an inode in several ways, only some (and the less preferred ones at that) involving the path/name fields.  Adding a NULL check should help, but it might also be interesting to find out why we're being called this way in case there are other implications of something the code clearly does not expect.
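
For illustration only, a minimal sketch of the kind of NULL check suggested above. This is not the GlusterFS code or the eventual patch: `sketch_loc` and `sketch_copy_path` are hypothetical names, the struct is a one-field stand-in for loc_t, and plain malloc() is used where the real code calls gf_strdup().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for loc_t: only the field that matters here. */
struct sketch_loc {
    const char *path; /* may be NULL when the inode was resolved without a path */
};

/*
 * Copy loc->path into *out.
 * Returns 0 on success, -EINVAL if there is no path to copy,
 * -ENOMEM if the allocation fails. *out is NULL on any failure.
 */
int
sketch_copy_path(const struct sketch_loc *loc, char **out)
{
    assert(out != NULL);
    *out = NULL;

    /* Without this check, strlen(NULL) below would crash as in frame #0. */
    if (loc == NULL || loc->path == NULL)
        return -EINVAL;

    size_t len = strlen(loc->path) + 1; /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy == NULL)
        return -ENOMEM;

    memcpy(copy, loc->path, len);
    *out = copy;
    return 0;
}
```

A caller in the position of frame #2 would check the return value before using the copy. Whether it should then skip the trash handling or fail the truncate is a separate question that this sketch does not answer.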

Comment 1 Vijay Bellur 2018-11-20 04:17:41 UTC


*** This bug has been marked as a duplicate of bug 1430360 ***