Bug 1240603 - glusterfsd crashed after volume start force
Summary: glusterfsd crashed after volume start force
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: 3.7.2
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On: 1239280 1240161
Blocks:
 
Reported: 2015-07-07 10:46 UTC by Raghavendra G
Modified: 2015-07-30 09:50 UTC
4 users

Fixed In Version: glusterfs-3.7.3
Doc Type: Bug Fix
Doc Text:
Clone Of: 1240161
Environment:
Last Closed: 2015-07-30 09:50:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Raghavendra G 2015-07-07 10:46:56 UTC
+++ This bug was initially created as a clone of Bug #1240161 +++

+++ This bug was initially created as a clone of Bug #1239280 +++

Description of problem:
======================

While verifying another bug, in which an NFS mount hung after volume start force, glusterfsd crashed after the volume was started with 2 of the bricks in each distribute subvolume of a distributed-disperse volume brought down. One of the bricks failed to start.

Backtrace:
==========
(gdb) bt
#0  0x00007f49d5dc4c40 in gf_client_put (client=0x0, detached=0x0) at client_t.c:299
#1  0x00007f49c1609265 in server_setvolume (req=0x7f49c067406c) at server-handshake.c:715
#2  0x00007f49d5b2cee5 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7f49bc001800)
    at rpcsvc.c:703
#3  0x00007f49d5b2d123 in rpcsvc_notify (trans=0x7f49bc000920, mydata=<value optimized out>, event=<value optimized out>, 
    data=0x7f49bc001800) at rpcsvc.c:797
#4  0x00007f49d5b2ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#5  0x00007f49ca894255 in socket_event_poll_in (this=0x7f49bc000920) at socket.c:2290
#6  0x00007f49ca895e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f49bc000920, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2403
#7  0x00007f49d5dc7970 in event_dispatch_epoll_handler (data=0x7f49c4020260) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7f49c4020260) at event-epoll.c:678
#9  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f49d47b896d in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 14 (Thread 0x7f49cc8a2700 (LWP 41134)):
#0  0x00007f49d4e56535 in sigwait () from /lib64/libpthread.so.0
#1  0x00007f49d622d02b in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1989
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f49b17fb700 (LWP 41145)):
#0  0x00007f49d47b13e3 in select () from /lib64/libc.so.6
#1  0x00007f49c33260ba in changelog_ev_dispatch (data=0x7f49c40720d0) at changelog-ev-handle.c:335
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f49b35fe700 (LWP 41142)):
#0  0x00007f49d4e5263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c3326373 in changelog_ev_connector (data=0x7f49c40720d0) at changelog-ev-handle.c:193
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f49b2bfd700 (LWP 41143)):
#0  0x00007f49d47b13e3 in select () from /lib64/libc.so.6
#1  0x00007f49c33260ba in changelog_ev_dispatch (data=0x7f49c40720d0) at changelog-ev-handle.c:335
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f49b21fc700 (LWP 41144)):
#0  0x00007f49d47b13e3 in select () from /lib64/libc.so.6
#1  0x00007f49c33260ba in changelog_ev_dispatch (data=0x7f49c40720d0) at changelog-ev-handle.c:335
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f49b3fff700 (LWP 41141)):
#0  0x00007f49d4e5263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c2eef6f3 in br_stub_signth (arg=0x7f49c4064a60) at bit-rot-stub.c:649
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f49c0673700 (LWP 41139)):
#0  0x00007f49d4e5263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c2293e0b in index_worker (data=<value optimized out>) at index.c:71
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f49c9067700 (LWP 41137)):
#0  0x00007f49c353f6f0 in sqlite3_threadsafe () from /usr/lib64/libsqlite3.so.0
#1  0x00007f49c37c68a5 in gf_sqlite3_init (args=0x7f49d3318c74, db_conn=0x7f49c40203a0) at gfdb_sqlite3.c:408
#2  0x00007f49c37c061e in init_db (args=0x7f49d3318c74, gfdb_db_type=GFDB_SQLITE3) at gfdb_data_store.c:270
#3  0x00007f49c39cdd1a in init (this=0x7f49c400a780) at changetimerecorder.c:1506
#4  0x00007f49d5d60882 in __xlator_init (xl=0x7f49c400a780) at xlator.c:399
#5  xlator_init (xl=0x7f49c400a780) at xlator.c:423
#6  0x00007f49d5da7901 in glusterfs_graph_init (graph=<value optimized out>) at graph.c:322
#7  0x00007f49d5da7a65 in glusterfs_graph_activate (graph=0x7f49c4000af0, ctx=0x7f49d6697010) at graph.c:669
#8  0x00007f49d622cd4b in glusterfs_process_volfp (ctx=0x7f49d6697010, fp=0x7f49c4002610) at glusterfsd.c:2174
#9  0x00007f49d6234cd5 in mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7f49d37536d4) at glusterfsd-mgmt.c:1560
#10 0x00007f49d5b32445 in rpc_clnt_handle_reply (clnt=0x7f49d66fcff0, pollin=0x7f49c4001850) at rpc-clnt.c:766
#11 0x00007f49d5b338f2 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7f49d66fd020, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:894
#12 0x00007f49d5b2ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#13 0x00007f49ca894255 in socket_event_poll_in (this=0x7f49d66feb40) at socket.c:2290
#14 0x00007f49ca895e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f49d66feb40, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2403
#15 0x00007f49d5dc7970 in event_dispatch_epoll_handler (data=0x7f49d66ffd00) at event-epoll.c:575
#16 event_dispatch_epoll_worker (data=0x7f49d66ffd00) at event-epoll.c:678
#17 0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f49cbea1700 (LWP 41135)):
#0  0x00007f49d4e52a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49d5daacab in syncenv_task (proc=0x7f49d66c5440) at syncop.c:595
#2  0x00007f49d5dafba0 in syncenv_processor (thdata=0x7f49d66c5440) at syncop.c:687
#3  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f49c0572700 (LWP 41140)):
#0  0x00007f49d4e52a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c26aa4a0 in iot_worker (data=0x7f49c404fc30) at io-threads.c:181
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f49d6211740 (LWP 41132)):
#0  0x00007f49d4e4f2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f49d5dc741d in event_dispatch_epoll (event_pool=0x7f49d66b5c90) at event-epoll.c:762
#2  0x00007f49d622eef1 in main (argc=19, argv=0x7ffd324d6a48) at glusterfsd.c:2333

Thread 3 (Thread 0x7f49cd2a3700 (LWP 41133)):
#0  0x00007f49d4e55fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f49d5d855ca in gf_timer_proc (ctx=0x7f49d6697010) at timer.c:205
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f49cb4a0700 (LWP 41136)):
#0  0x00007f49d4e52a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49d5daacab in syncenv_task (proc=0x7f49d66c5800) at syncop.c:595
#2  0x00007f49d5dafba0 in syncenv_processor (thdata=0x7f49d66c5800) at syncop.c:687
#3  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f49c15dc700 (LWP 41138)):
#0  0x00007f49d5dc4c40 in gf_client_put (client=0x0, detached=0x0) at client_t.c:299
#1  0x00007f49c1609265 in server_setvolume (req=0x7f49c067406c) at server-handshake.c:715
#2  0x00007f49d5b2cee5 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7f49bc001800)
    at rpcsvc.c:703
#3  0x00007f49d5b2d123 in rpcsvc_notify (trans=0x7f49bc000920, mydata=<value optimized out>, event=<value optimized out>, 
    data=0x7f49bc001800) at rpcsvc.c:797
#4  0x00007f49d5b2ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#5  0x00007f49ca894255 in socket_event_poll_in (this=0x7f49bc000920) at socket.c:2290
#6  0x00007f49ca895e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f49bc000920, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2403
#7  0x00007f49d5dc7970 in event_dispatch_epoll_handler (data=0x7f49c4020260) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7f49c4020260) at event-epoll.c:678
#9  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f49d47b896d in clone () from /lib64/libc.so.6
(gdb) q



Version-Release number of selected component (if applicable):
=============================================================
[root@transformers bricks]# gluster --version
glusterfs 3.7.1 built on Jul  2 2015 21:01:51
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@transformers bricks]# 

How reproducible:
=================
seen once

Steps to Reproduce:
1. Create a distributed-disperse volume and NFS-mount it on the client.
2. Start IO (dd and mkdirs) from the client.
3. After some time, bring down 2 of the bricks in each distribute subvolume of the 2x(4+2) volume.
4. Bring the bricks back with volume start force and check for the crash.

Actual results:


Expected results:


Additional info:
================
Attaching the core file.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-07-05 06:33:28 EDT ---

This bug is automatically being proposed for Red Hat Gluster Storage 3.1.0 by setting the release flag 'rhgs-3.1.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Anand Avati on 2015-07-06 06:27:03 EDT ---

REVIEW: http://review.gluster.org/11550 (protocol/server: Add null check to gf_client_put) posted (#1) for review on master by Raghavendra G (rgowdapp@redhat.com)

--- Additional comment from Anand Avati on 2015-07-06 11:15:30 EDT ---

COMMIT: http://review.gluster.org/11550 committed in master by Raghavendra G (rgowdapp@redhat.com) 
------
commit 5547db849770ff79a11a8bc1260478c56e4ffa9c
Author: Raghavendra G <rgowdapp@redhat.com>
Date:   Mon Jul 6 15:45:45 2015 +0530

    protocol/server: Add null check to gf_client_put
    
    Change-Id: I8bab3cd7387f89743e15e7569f0bc83a7df3c754
    BUG: 1240161
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Reviewed-on: http://review.gluster.org/11550
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>

Comment 1 Anand Avati 2015-07-07 10:50:09 UTC
REVIEW: http://review.gluster.org/11562 (protocol/server: Add null check to gf_client_put) posted (#1) for review on release-3.7 by Raghavendra G (rgowdapp@redhat.com)

Comment 2 Anand Avati 2015-07-09 10:10:55 UTC
COMMIT: http://review.gluster.org/11562 committed in release-3.7 by Vijay Bellur (vbellur@redhat.com) 
------
commit 7fe55d6ffc73d614890c3fb9a3139cb7a6236423
Author: Raghavendra G <rgowdapp@redhat.com>
Date:   Mon Jul 6 15:45:45 2015 +0530

    protocol/server: Add null check to gf_client_put
    
    Change-Id: I8bab3cd7387f89743e15e7569f0bc83a7df3c754
    BUG: 1240603
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Reviewed-on: http://review.gluster.org/11550
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
    Reviewed-on: http://review.gluster.org/11562
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>

Comment 3 Kaushal 2015-07-30 09:50:36 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

