Bug 1722708 - WORM: Segmentation Fault if bitrot stub do signature (backport to Gluster 5)
Summary: WORM: Segmentation Fault if bitrot stub do signature (backport to Gluster 5)
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: 5
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
Assignee: david.spisla
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-06-21 05:15 UTC by david.spisla
Modified: 2020-03-12 14:47 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1717757
Environment:
Last Closed: 2020-03-12 14:47:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)

Description david.spisla 2019-06-21 05:15:44 UTC
+++ This bug was initially created as a clone of Bug #1717757 +++

Description of problem:

Setup: 2-Node VM Cluster with a Replica 2 Volume
After doing several "wild" write and delete operations from a Win Client, one of the brick crashes. The crash report says:

[2019-06-05 09:05:05.137156] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-archive1-access-control: client: CTX_ID:fcab5e67-b9d9-4b72-8c15-f29de2084af3-GRAPH_ID:0-PID:189
16-HOST:fs-detlefh-c1-n2-PC_NAME:archive1-client-0-RECON_NO:-0, gfid: 494b42ad-7e40-4e27-8878-99387a80b5dc, req(uid:2000,gid:2000,perm:3,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:755,update
d-fop:LOOKUP, acl:-) [Permission denied]
pending frames:
frame : type(0) op(0)
frame : type(0) op(23)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-06-05 09:05:05
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.5
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f89faa7264c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f89faa7cd26]
/lib64/libc.so.6(+0x361a0)[0x7f89f9c391a0]
/usr/lib64/glusterfs/5.5/xlator/features/bitrot-stub.so(+0x13441)[0x7f89f22ae441]
/usr/lib64/libglusterfs.so.0(default_fsetxattr+0xce)[0x7f89faaf9f8e]
/usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x22636)[0x7f89f1e68636]
/usr/lib64/libglusterfs.so.0(default_fsetxattr+0xce)[0x7f89faaf9f8e]
/usr/lib64/libglusterfs.so.0(syncop_fsetxattr+0x26b)[0x7f89faab319b]
/usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xa901)[0x7f89f1c3d901]
/usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x11b66)[0x7f89f1e57b66]
/usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0xaebe)[0x7f89f208febe]
/usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb081)[0x7f89f1e51081]
/usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8c23)[0x7f89f1c3bc23]
/usr/lib64/glusterfs/5.5/xlator/features/read-only.so(+0x4e30)[0x7f89f1a2de30]
/usr/lib64/glusterfs/5.5/xlator/features/leases.so(+0xa444)[0x7f89f181b444]
/usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0x10a68)[0x7f89f1600a68]
/usr/lib64/libglusterfs.so.0(default_create_resume+0x212)[0x7f89fab10132]
/usr/lib64/libglusterfs.so.0(call_resume_wind+0x2cf)[0x7f89faa97e5f]
/usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f89faa983a5]
/usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f89f13e7088]
/lib64/libpthread.so.0(+0x7569)[0x7f89f9fc4569]
/lib64/libc.so.6(clone+0x3f)[0x7f89f9cfb9af]
---------

Version-Release number of selected component (if applicable): v5.5


Additional info:

The backtrace shows that there is a Nulllpointer for *fd in br_stub_fsetxattr:
Thread 1 (Thread 0x7f89f0099700 (LWP 2171)):
#0  br_stub_fsetxattr (frame=0x7f89b846a6e8, this=0x7f89ec015c00, fd=0x0, dict=0x7f89b84e9ad8, flags=0, xdata=0x0) at bit-rot-stub.c:1328
        ret = 0
        val = 0
        sign = 0x0
        priv = 0x7f89ec07ed60
        op_errno = 22
        __FUNCTION__ = "br_stub_fsetxattr"

This results in a segmentation fault in line 1328 of bit-rot_stub.c :

if (!IA_ISREG(fd->inode->ia_type))
        goto wind;

The bitrot-stub wants to signate a file but the corresponding fd is a Nullpointer.

The full backtrace is attached!!!

--- Additional comment from Amar Tumballi on 2019-06-06 06:57:25 UTC ---

Not sure why this happened, because, for bitrot, a fsetxattr() call shouldn't come at all if fd is NULL. It should have been prevented at higher level itself.


I found the reason after digging a bit. Ideally, in case of failure (here, worm_create_cbk() received -1, which means fd is NULL), one shouldn't consume fd and call fsetxattr(). If there is a need to do a xattr op in failure, then one should call setxattr with 'loc' passed in create() call. (you can store it in local).

----
#0  br_stub_fsetxattr (frame=0x7f89b846a6e8, this=0x7f89ec015c00, fd=0x0, dict=0x7f89b84e9ad8, flags=0, xdata=0x0) at bit-rot-stub.c:1328
        ret = 0
        val = 0
        sign = 0x0
        priv = 0x7f89ec07ed60
        op_errno = 22
        __FUNCTION__ = "br_stub_fsetxattr"
#1  0x00007f89faaf9f8e in default_fsetxattr () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007f89f1e68636 in pl_fsetxattr (frame=0x7f89b825ab48, this=0x7f89ec0194a0, fd=0x0, dict=0x7f89b84e9ad8, flags=0, xdata=0x0) at posix.c:1566
        _new = 0x7f89b846a6e8
        old_THIS = 0x7f89ec0194a0
        next_xl_fn = 0x7f89faaf9ec0 <default_fsetxattr>
        tmp_cbk = 0x7f89f1e56680 <pl_fsetxattr_cbk>
        op_ret = <optimized out>
        op_errno = 0
        lockinfo_buf = 0x0
        len = 0
        __FUNCTION__ = "pl_fsetxattr"
#3  0x00007f89faaf9f8e in default_fsetxattr () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#4  0x00007f89faab319b in syncop_fsetxattr () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#5  0x00007f89f1c3d901 in worm_create_cbk (frame=frame@entry=0x7f89b8302fe8, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=13, 
    fd=fd@entry=0x0, inode=0x0, buf=0x0, preparent=0x0, postparent=0x0, xdata=0x0) at worm.c:492
        ret = 0
        priv = 0x7f89ec074b38
        dict = 0x7f89b84e9ad8
        __FUNCTION__ = "worm_create_cbk"
----

Hopefully this helps.

--- Additional comment from Amar Tumballi on 2019-06-06 06:59:29 UTC ---

Can you check if below works?

diff --git a/xlators/features/read-only/src/worm.c b/xlators/features/read-only/src/worm.c
index cc3d15b8b2..6b44eae966 100644
--- a/xlators/features/read-only/src/worm.c
+++ b/xlators/features/read-only/src/worm.c
@@ -431,7 +431,7 @@ worm_create_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
 
     priv = this->private;
     GF_ASSERT(priv);
-    if (priv->worm_file) {
+    if (priv->worm_file && (op_ret >= 0)) {
         dict = dict_new();
         if (!dict) {
             gf_log(this->name, GF_LOG_ERROR,


----

Great if you can confirm this.

--- Additional comment from  on 2019-06-06 07:08:49 UTC ---

I will check it!

--- Additional comment from  on 2019-06-07 08:30:08 UTC ---

@Amar

I wrote a patch with debug logs and I will observe the bricks now. During this time I have some questions concerning your patch suggestion:

1. According to crash report from the brick locks, there was a failure in

[2019-06-05 09:05:05.137156] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-archive1-access-control: client: CTX_ID:fcab5e67-b9d9-4b72-8c15-f29de2084af3-GRAPH_ID:0-PID:189
16-HOST:fs-detlefh-c1-n2-PC_NAME:archive1-client-0-RECON_NO:-0, gfid: 494b42ad-7e40-4e27-8878-99387a80b5dc, req(uid:2000,gid:2000,perm:3,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:755,update
d-fop:LOOKUP, acl:-) [Permission denied]

just before the crash. What can be the reason for this?

2. If this LOOKUP for acls fails, is it problematic to do a setxattr with loc? If we skip setting xattr when fd is NULL the file on that brick won't have the necessary xattr like trusted.worm_file and other. See an example directly after the crash:
# file: gluster/brick3/glusterbrick/test/data/BC/storage.log
trusted.gfid=0sag3y6RuoTgqAw//fx3ZB1Q==
trusted.gfid2path.273f2255a25b2961="bd910b86-d51a-4006-a2c4-515ef5f1777a/storage.log"
trusted.pgfid.bd910b86-d51a-4006-a2c4-515ef5f1777a=0sAAAAAQ==

On the healthy brick I got:
# file: gluster/brick3/glusterbrick/test/data/BC/storage.log
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.test-client-0=0sAAAABAAAAAMAAAAA
trusted.bit-rot.version=0sAgAAAAAAAABc+P64AAEhGQ==
trusted.gfid=0sag3y6RuoTgqAw//fx3ZB1Q==
trusted.gfid2path.273f2255a25b2961="bd910b86-d51a-4006-a2c4-515ef5f1777a/storage.log"
trusted.glusterfs.mdata=0sAQAAAAAAAAAAAAAAAFz5AJEAAAAAMqdgMwAAAABcRwJEAAAAAAAAAAAAAAAAXPkAkQAAAAAAAAAA
trusted.pgfid.bd910b86-d51a-4006-a2c4-515ef5f1777a=0sAAAAAQ==
trusted.start_time="1559822481"
trusted.worm_file=0sMQA=

After restarting the faulty brick a heal was triggered and afterwards the file on the faulty brick is heal.It should be ensured that the broken file gets all necessary xattr. What is the better way? Triggering a setxattr with loc in worm_create_cbk or do a heal afterwards?

--- Additional comment from Amar Tumballi on 2019-06-07 08:35:33 UTC ---


1. permission denied is mostly probably a issue of missing permission (uid 2000, trying to create an entry in a directory with 755, owned by uid-0 (root)).


2. I think it is better to leave it to heal. If it is a create failure, we should anyways fail the operation is my opinion.

--- Additional comment from  on 2019-06-07 09:57:15 UTC ---

Allrigth, I will stress the system for a while and if everything is stable I will commit the patch to gerrit

--- Additional comment from Amar Tumballi on 2019-06-18 13:38:51 UTC ---

Looks like we need to check for 'op_ret' in most of the places in WORM code.

--- Additional comment from Worker Ant on 2019-06-19 11:10:56 UTC ---

REVIEW: https://review.gluster.org/22898 (WORM-Xlator: Avoid performing fsetxattr if fd is NULL) posted (#1) for review on master by David Spisla

--- Additional comment from  on 2019-06-19 11:26:35 UTC ---

I have send a patch to gerrit.
@Amar if there is any other place in the WORM Xlator whcih can cause an segfault please tell me. I will swrite some patches soon. At the moment the worm_create_cbk is the only one callback function in this xlator

--- Additional comment from Worker Ant on 2019-06-21 04:21:26 UTC ---

REVIEW: https://review.gluster.org/22898 (WORM-Xlator: Avoid performing fsetxattr if fd is NULL) merged (#5) on master by Amar Tumballi

Comment 1 Yaniv Kaul 2019-07-16 09:22:15 UTC
Above patch is merged to master. Do you plan to backport it?

Comment 2 david.spisla 2019-07-16 10:50:32 UTC
I think it should be backported to v5.x

Comment 3 david.spisla 2019-07-16 11:23:13 UTC
https://review.gluster.org/#/c/glusterfs/+/22915/

Comment 4 Worker Ant 2020-03-12 14:47:38 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/1062, and will be tracked there from now on. Visit GitHub issues URL for further details


Note You need to log in before you can comment on or make changes to this bug.