Bug 1239280 - glusterfsd crashed after volume start force
Summary: glusterfsd crashed after volume start force
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Raghavendra G
QA Contact: Bhaskarakiran
URL:
Whiteboard:
Duplicates: 1240957 (view as bug list)
Depends On:
Blocks: 1202842 1223636 1240161 1240603
 
Reported: 2015-07-05 10:33 UTC by Bhaskarakiran
Modified: 2016-11-23 23:11 UTC (History)
7 users

Fixed In Version: glusterfs-3.7.1-8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1240161 (view as bug list)
Environment:
Last Closed: 2015-07-29 05:09:27 UTC
Embargoed:


Attachments (Terms of Use)
core file (696.67 KB, application/zip), 2015-07-05 10:33 UTC, Bhaskarakiran


Links
Red Hat Product Errata RHSA-2015:1495 (SHIPPED_LIVE): Important: Red Hat Gluster Storage 3.1 update, 2015-07-29 08:26:26 UTC

Description Bhaskarakiran 2015-07-05 10:33:27 UTC
Created attachment 1046191 [details]
core file

Description of problem:
======================

While verifying a bug wherein the nfs mount hung after volume start force, saw this crash after the volume was started with 2 of the bricks in each distribute set of a distributed disperse volume brought down. One of the bricks failed to start.

Backtrace:
==========
(gdb) bt
#0  0x00007f49d5dc4c40 in gf_client_put (client=0x0, detached=0x0) at client_t.c:299
#1  0x00007f49c1609265 in server_setvolume (req=0x7f49c067406c) at server-handshake.c:715
#2  0x00007f49d5b2cee5 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7f49bc001800)
    at rpcsvc.c:703
#3  0x00007f49d5b2d123 in rpcsvc_notify (trans=0x7f49bc000920, mydata=<value optimized out>, event=<value optimized out>, 
    data=0x7f49bc001800) at rpcsvc.c:797
#4  0x00007f49d5b2ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#5  0x00007f49ca894255 in socket_event_poll_in (this=0x7f49bc000920) at socket.c:2290
#6  0x00007f49ca895e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f49bc000920, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2403
#7  0x00007f49d5dc7970 in event_dispatch_epoll_handler (data=0x7f49c4020260) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7f49c4020260) at event-epoll.c:678
#9  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f49d47b896d in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 14 (Thread 0x7f49cc8a2700 (LWP 41134)):
#0  0x00007f49d4e56535 in sigwait () from /lib64/libpthread.so.0
#1  0x00007f49d622d02b in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1989
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f49b17fb700 (LWP 41145)):
#0  0x00007f49d47b13e3 in select () from /lib64/libc.so.6
#1  0x00007f49c33260ba in changelog_ev_dispatch (data=0x7f49c40720d0) at changelog-ev-handle.c:335
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f49b35fe700 (LWP 41142)):
#0  0x00007f49d4e5263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c3326373 in changelog_ev_connector (data=0x7f49c40720d0) at changelog-ev-handle.c:193
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f49b2bfd700 (LWP 41143)):
#0  0x00007f49d47b13e3 in select () from /lib64/libc.so.6
#1  0x00007f49c33260ba in changelog_ev_dispatch (data=0x7f49c40720d0) at changelog-ev-handle.c:335
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f49b21fc700 (LWP 41144)):
#0  0x00007f49d47b13e3 in select () from /lib64/libc.so.6
#1  0x00007f49c33260ba in changelog_ev_dispatch (data=0x7f49c40720d0) at changelog-ev-handle.c:335
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f49b3fff700 (LWP 41141)):
#0  0x00007f49d4e5263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c2eef6f3 in br_stub_signth (arg=0x7f49c4064a60) at bit-rot-stub.c:649
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f49c0673700 (LWP 41139)):
#0  0x00007f49d4e5263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c2293e0b in index_worker (data=<value optimized out>) at index.c:71
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f49c9067700 (LWP 41137)):
#0  0x00007f49c353f6f0 in sqlite3_threadsafe () from /usr/lib64/libsqlite3.so.0
#1  0x00007f49c37c68a5 in gf_sqlite3_init (args=0x7f49d3318c74, db_conn=0x7f49c40203a0) at gfdb_sqlite3.c:408
#2  0x00007f49c37c061e in init_db (args=0x7f49d3318c74, gfdb_db_type=GFDB_SQLITE3) at gfdb_data_store.c:270
#3  0x00007f49c39cdd1a in init (this=0x7f49c400a780) at changetimerecorder.c:1506
#4  0x00007f49d5d60882 in __xlator_init (xl=0x7f49c400a780) at xlator.c:399
#5  xlator_init (xl=0x7f49c400a780) at xlator.c:423
#6  0x00007f49d5da7901 in glusterfs_graph_init (graph=<value optimized out>) at graph.c:322
#7  0x00007f49d5da7a65 in glusterfs_graph_activate (graph=0x7f49c4000af0, ctx=0x7f49d6697010) at graph.c:669
#8  0x00007f49d622cd4b in glusterfs_process_volfp (ctx=0x7f49d6697010, fp=0x7f49c4002610) at glusterfsd.c:2174
#9  0x00007f49d6234cd5 in mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7f49d37536d4) at glusterfsd-mgmt.c:1560
#10 0x00007f49d5b32445 in rpc_clnt_handle_reply (clnt=0x7f49d66fcff0, pollin=0x7f49c4001850) at rpc-clnt.c:766
#11 0x00007f49d5b338f2 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7f49d66fd020, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:894
#12 0x00007f49d5b2ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#13 0x00007f49ca894255 in socket_event_poll_in (this=0x7f49d66feb40) at socket.c:2290
#14 0x00007f49ca895e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f49d66feb40, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2403
#15 0x00007f49d5dc7970 in event_dispatch_epoll_handler (data=0x7f49d66ffd00) at event-epoll.c:575
#16 event_dispatch_epoll_worker (data=0x7f49d66ffd00) at event-epoll.c:678
#17 0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f49cbea1700 (LWP 41135)):
#0  0x00007f49d4e52a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49d5daacab in syncenv_task (proc=0x7f49d66c5440) at syncop.c:595
#2  0x00007f49d5dafba0 in syncenv_processor (thdata=0x7f49d66c5440) at syncop.c:687
#3  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f49c0572700 (LWP 41140)):
#0  0x00007f49d4e52a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49c26aa4a0 in iot_worker (data=0x7f49c404fc30) at io-threads.c:181
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f49d6211740 (LWP 41132)):
#0  0x00007f49d4e4f2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f49d5dc741d in event_dispatch_epoll (event_pool=0x7f49d66b5c90) at event-epoll.c:762
#2  0x00007f49d622eef1 in main (argc=19, argv=0x7ffd324d6a48) at glusterfsd.c:2333

Thread 3 (Thread 0x7f49cd2a3700 (LWP 41133)):
#0  0x00007f49d4e55fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f49d5d855ca in gf_timer_proc (ctx=0x7f49d6697010) at timer.c:205
#2  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f49cb4a0700 (LWP 41136)):
#0  0x00007f49d4e52a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49d5daacab in syncenv_task (proc=0x7f49d66c5800) at syncop.c:595
#2  0x00007f49d5dafba0 in syncenv_processor (thdata=0x7f49d66c5800) at syncop.c:687
#3  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f49d47b896d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f49c15dc700 (LWP 41138)):
#0  0x00007f49d5dc4c40 in gf_client_put (client=0x0, detached=0x0) at client_t.c:299
#1  0x00007f49c1609265 in server_setvolume (req=0x7f49c067406c) at server-handshake.c:715
#2  0x00007f49d5b2cee5 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7f49bc001800)
    at rpcsvc.c:703
#3  0x00007f49d5b2d123 in rpcsvc_notify (trans=0x7f49bc000920, mydata=<value optimized out>, event=<value optimized out>, 
    data=0x7f49bc001800) at rpcsvc.c:797
#4  0x00007f49d5b2ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#5  0x00007f49ca894255 in socket_event_poll_in (this=0x7f49bc000920) at socket.c:2290
#6  0x00007f49ca895e4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f49bc000920, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2403
#7  0x00007f49d5dc7970 in event_dispatch_epoll_handler (data=0x7f49c4020260) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7f49c4020260) at event-epoll.c:678
#9  0x00007f49d4e4ea51 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f49d47b896d in clone () from /lib64/libc.so.6
(gdb) q



Version-Release number of selected component (if applicable):
=============================================================
[root@transformers bricks]# gluster --version
glusterfs 3.7.1 built on Jul  2 2015 21:01:51
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@transformers bricks]# 

How reproducible:
=================
seen once

Steps to Reproduce:
1. Create a distributed disperse (2x(4+2)) volume and nfs mount it on the client
2. Start IO, dd and mkdirs, from the client
3. After some time bring down 2 of the bricks in each distribute set of the 2x(4+2) volume.
4. Bring the bricks back with volume start force and check for the crash.
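The steps above correspond roughly to the following commands; the volume name, hostnames, and brick paths are illustrative, not taken from the report:

```shell
# 1. Create a 2x(4+2) distributed disperse volume and nfs mount it
gluster volume create testvol disperse 6 redundancy 2 \
    host{1..12}:/bricks/brick1 force
gluster volume start testvol
mount -t nfs -o vers=3 host1:/testvol /mnt/testvol

# 2. Drive IO from the client
dd if=/dev/zero of=/mnt/testvol/file bs=1M count=1024 &
for i in $(seq 1 100); do mkdir /mnt/testvol/dir$i; done

# 3. Bring down 2 bricks in each distribute set; get the brick PIDs
#    from `gluster volume status testvol` and kill them
kill -9 <brick-pid-a> <brick-pid-b> <brick-pid-c> <brick-pid-d>

# 4. Bring the bricks back and watch for a glusterfsd crash / core
gluster volume start testvol force
gluster volume status testvol
```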

Actual results:
glusterfsd crashed in gf_client_put() (backtrace above) and dumped core.

Expected results:
The bricks restart cleanly after volume start force, with no crash.

Additional info:
================
Attaching the core file.

Comment 4 Raghavendra G 2015-07-09 10:44:35 UTC
*** Bug 1240957 has been marked as a duplicate of this bug. ***

Comment 5 Bhaskarakiran 2015-07-13 11:28:15 UTC
Verified on the 3.7.1-9 build and did not see the issue. While IO was running on the nfs mount, brought down 2 of the bricks and brought them back up with volume start force. Did not see the crash. Moving this to verified.

Comment 6 errata-xmlrpc 2015-07-29 05:09:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

