Bug 1402697 - glusterfsd crashed while taking snapshot using scheduler
Summary: glusterfsd crashed while taking snapshot using scheduler
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1401817 1401921
Blocks: 1402694
 
Reported: 2016-12-08 07:46 UTC by Atin Mukherjee
Modified: 2017-01-16 12:26 UTC
CC: 8 users

Fixed In Version: glusterfs-3.8.8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1401921
Environment:
Last Closed: 2017-01-16 12:26:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2016-12-08 07:46:24 UTC
+++ This bug was initially created as a clone of Bug #1401921 +++

+++ This bug was initially created as a clone of Bug #1401817 +++

Description of problem:

While taking a snapshot using the scheduler, one of the brick processes crashed.


Version-Release number of selected component (if applicable):

glusterfs-3.8.4-6.el7rhgs.x86_64

How reproducible:

1/1


Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Enable the snapshot scheduler.
3. Schedule a snapshot every minute.

Actual results:

One of the brick processes crashed.

Expected results:



Additional info:

bt
=======================

#0  0x00007f19a2a12394 in glusterfs_handle_barrier (req=0x7f19a30cffcc) at glusterfsd-mgmt.c:1348
        ret = <optimized out>
        brick_req = {name = 0x7f198c0008e0 "repvol", op = 10, input = {input_len = 1783, 
            input_val = 0x7f198c000900 ""}}
        brick_rsp = {op_ret = 0, op_errno = 0, output = {output_len = 0, output_val = 0x0}, op_errstr = 0x0}
        ctx = 0x7f19a3085010
        active = 0x0
        any = 0x0
        xlator = 0x0
        old_THIS = 0x0
        dict = 0x0
        name = '\000' <repeats 1023 times>
        barrier = _gf_true
        barrier_err = _gf_false
        __FUNCTION__ = "glusterfs_handle_barrier"
#1  0x00007f19a2550a92 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
        task = 0x7f1990002510
#2  0x00007f19a0c0fcf0 in ?? () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000000000000000 in ?? ()
No symbol table info available.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-12-06 03:07:10 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Anil Shah on 2016-12-06 03:08:16 EST ---

[root@rhs-client46 /]# gluster v info repvol
 
Volume Name: repvol
Type: Distributed-Replicate
Volume ID: cd9a174e-2c96-4d9f-a0f5-a342639c14b5
Status: Started
Snapshot Count: 35
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.70:/rhs/brick1/b1
Brick2: 10.70.36.71:/rhs/brick1/b2
Brick3: 10.70.36.46:/rhs/brick1/b3
Brick4: 10.70.44.7:/rhs/brick1/b4
Options Reconfigured:
features.inode-quota: on
features.quota: on
features.barrier: disable
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.uss: enable
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
storage.batch-fsync-delay-usec: 0
features.show-snapshot-directory: on
server.allow-insecure: on
features.quota-deem-statfs: on
cluster.enable-shared-storage: enable
[root@rhs-client46 /]#

--- Additional comment from Anil Shah on 2016-12-06 03:58:00 EST ---

sos reports uploaded @
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1401817/

--- Additional comment from Atin Mukherjee on 2016-12-06 07:03:04 EST ---

The function where this core was generated is glusterfs_handle_barrier(). From the core it looks like glusterfsd_ctx (the global context) in the brick process did not have ctx->active initialized, which happens during graph initialization. We also saw that the brick process had only just come up when GlusterD sent the barrier brick op. The hypothesis we have here is as follows:

T1. The brick process was in its init phase but had not yet finished graph generation.
T2. GlusterD sent a barrier brick op (as a trigger for the snapshot initiated by the snapshot scheduler) because it considered the brick to be connected, having received the RPC connect notification from the brick process.

The time gap between T1 and T2 is very small, and currently GlusterD does not know whether the brick process has finished all of its initialization, including graph generation.

One mitigation for this crash is to guard against the null pointer dereference, which can be addressed by a simple patch; even if the race is hit, the barrier request would then simply fail instead of crashing the brick. Fixing the race entirely requires a more concrete solution, which may not be feasible within the 3.2.0 timelines.
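
For reference, a minimal sketch of such a guard at the top of glusterfs_handle_barrier() (an illustration only, not the literal change merged through review 16043; the log message text is assumed):

        /* Illustrative sketch only -- not the exact patch.  ctx, active,
         * any, ret and the out: label are the function's existing locals,
         * as seen in the backtrace above. */
        ctx = glusterfsd_ctx;
        GF_ASSERT (ctx);

        if (ctx->active == NULL) {
                /* Graph generation has not completed yet; fail the
                 * barrier brick op instead of dereferencing NULL. */
                gf_log (THIS->name, GF_LOG_ERROR,
                        "Graph is not yet active, cannot barrier");
                ret = -1;
                goto out;
        }

        active = ctx->active;
        any    = active->first;    /* safe only after the check above */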

--- Additional comment from Atin Mukherjee on 2016-12-06 07:15:00 EST ---

Description of problem:

While taking a snapshot using the scheduler, one of the brick processes crashed.


Version-Release number of selected component (if applicable):
mainline

How reproducible:

1/1


Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Enable the snapshot scheduler.
3. Schedule a snapshot every minute.

Actual results:

One of the brick processes crashed.


Additional info:

bt
=======================

#0  0x00007f19a2a12394 in glusterfs_handle_barrier (req=0x7f19a30cffcc) at glusterfsd-mgmt.c:1348
        ret = <optimized out>
        brick_req = {name = 0x7f198c0008e0 "repvol", op = 10, input = {input_len = 1783, 
            input_val = 0x7f198c000900 ""}}
        brick_rsp = {op_ret = 0, op_errno = 0, output = {output_len = 0, output_val = 0x0}, op_errstr = 0x0}
        ctx = 0x7f19a3085010
        active = 0x0
        any = 0x0
        xlator = 0x0
        old_THIS = 0x0
        dict = 0x0
        name = '\000' <repeats 1023 times>
        barrier = _gf_true
        barrier_err = _gf_false
        __FUNCTION__ = "glusterfs_handle_barrier"
#1  0x00007f19a2550a92 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
        task = 0x7f1990002510
#2  0x00007f19a0c0fcf0 in ?? () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000000000000000 in ?? ()
No symbol table info available.

RCA: 
The function where this core was generated is glusterfs_handle_barrier(). From the core it looks like glusterfsd_ctx (the global context) in the brick process did not have ctx->active initialized, which happens during graph initialization. We also saw that the brick process had only just come up when GlusterD sent the barrier brick op. The hypothesis we have here is as follows:

T1. The brick process was in its init phase but had not yet finished graph generation.
T2. GlusterD sent a barrier brick op (as a trigger for the snapshot initiated by the snapshot scheduler) because it considered the brick to be connected, having received the RPC connect notification from the brick process.

The time gap between T1 and T2 is very small, and currently GlusterD does not know whether the brick process has finished all of its initialization, including graph generation.

One mitigation for this crash is to guard against the null pointer dereference, which can be addressed by a simple patch; even if the race is hit, the barrier request would then simply fail instead of crashing the brick. Fixing the race entirely requires a more concrete solution, which may not be feasible within the 3.2.0 timelines.

--- Additional comment from Worker Ant on 2016-12-06 07:18:23 EST ---

REVIEW: http://review.gluster.org/16043 (glusterfsd : fix null pointer dereference in glusterfs_handle_barrier) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2016-12-06 23:49:29 EST ---

REVIEW: http://review.gluster.org/16043 (glusterfsd : fix null pointer dereference in glusterfs_handle_barrier) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2016-12-08 02:38:28 EST ---

COMMIT: http://review.gluster.org/16043 committed in master by Vijay Bellur (vbellur) 
------
commit 369c619f946f9ec1cf86cc83a7dcb11c29f1f0c7
Author: Atin Mukherjee <amukherj>
Date:   Tue Dec 6 16:21:41 2016 +0530

    glusterfsd : fix null pointer dereference in glusterfs_handle_barrier
    
    Change-Id: Iab86a3c4970e54c22d3170e68708e0ea432a8ea4
    BUG: 1401921
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/16043
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 1 Worker Ant 2016-12-08 07:47:20 UTC
REVIEW: http://review.gluster.org/16066 (glusterfsd : fix null pointer dereference in glusterfs_handle_barrier) posted (#1) for review on release-3.8 by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2017-01-02 13:09:13 UTC
COMMIT: http://review.gluster.org/16066 committed in release-3.8 by Niels de Vos (ndevos) 
------
commit ed3fb30254af39e560d09466c6a755d6e0e4b32d
Author: Atin Mukherjee <amukherj>
Date:   Tue Dec 6 16:21:41 2016 +0530

    glusterfsd : fix null pointer dereference in glusterfs_handle_barrier
    
    >Reviewed-on: http://review.gluster.org/16043
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Reviewed-by: Vijay Bellur <vbellur>
    
    Change-Id: Iab86a3c4970e54c22d3170e68708e0ea432a8ea4
    BUG: 1402697
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/16066
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>

Comment 3 Niels de Vos 2017-01-16 12:26:48 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.8, please open a new bug report.

glusterfs-3.8.8 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-January/000064.html
[2] https://www.gluster.org/pipermail/gluster-users/

