Bug 1283139 - glusterd crashed
Summary: glusterd crashed
Keywords:
Status: CLOSED DUPLICATE of bug 1238067
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Anoop
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-11-18 11:13 UTC by RajeshReddy
Modified: 2016-09-17 14:40 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 10:44:01 UTC
Target Upstream Version:



Description RajeshReddy 2015-11-18 11:13:49 UTC
Description of problem:
============
Glusterd crashed 


Version-Release number of selected component (if applicable):
=============
glusterfs-server-3.7.5-6.


How reproducible:


Steps to Reproduce:
===========
1. Create a distributed-replicate volume and then attach 4 hot tier bricks
2. While doing I/O on the volume, observed a glusterd crash with the following backtrace:

-rw-------. 1 root root 96M Nov 16 16:04 core.12053.1447670057.dump
[root@rhs-client19 core]# file core.12053.1447670057.dump
core.12053.1447670057.dump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'
[root@rhs-client19 core]# gdb /usr/sbin/glusterd core.12053.1447670057.dump
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 12228]
[New LWP 12054]
[New LWP 12227]
[New LWP 12055]
[New LWP 12056]
[New LWP 12057]
[New LWP 12053]

warning: .dynamic section for "/usr/lib64/glusterfs/3.7.5/rpc-transport/socket.so" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f36cf1bf0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install userspace-rcu-0.7.9-2.el7rhgs.x86_64
(gdb) bt
#0  0x00007f36cf1bf0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f36cf853342 in gd_add_friend_to_dict (friend=0x0, dict=0x0, prefix=0x0) at glusterd-peer-utils.c:578
#2  0x00007f36dac59081 in __glusterfs_this_location () at globals.c:155
#3  0x00007f36cf7a4cdc in glusterd_volume_brickinfo_get (uuid=<optimized out>, hostname=<optimized out>, 
    path=0x7f36bc0a8ca0 "rhs-clie\200Ci\331\066\177", volinfo=0x7f36cf853779 <gd_peerinfo_find_from_hostname+441>, 
    brickinfo=0x7f36cb2f8c28) at glusterd-utils.c:1324
#4  0x00007f36cf7a4d68 in glusterd_volume_brickinfo_get_by_brick (brick=0x3 <Address 0x3 out of bounds>, 
    volinfo=0x7f36dbf40be0, brickinfo=0x7f36cb2f9a10) at glusterd-utils.c:1345
#5  0x00007f36cf77f36f in get_brickinfo_from_brickid (brickinfo=0x7f36cb2f8c28, 
    brickid=0x7f36b41421d0 "ea4bd2c2-efd3-4d25-bbc1-8f6d9c75dafc:rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/tier")
    at glusterd-handler.c:4805
#6  __glusterd_brick_rpc_notify (rpc=<optimized out>, mydata=0x7f36b41421d0, event=RPC_CLNT_CONNECT, data=<optimized out>)
    at glusterd-handler.c:4842
#7  0x00007f36cf78246c in glusterd_big_locked_notify (rpc=0x3, mydata=0x7f36dbf40be0, event=3408894480, data=0x7f36bc001870, 
    notify_fn=0x7f36dbf40be0) at glusterd-handler.c:66
#8  0x00007f36b41478b0 in ?? ()
#9  0x00007f36dbf45220 in ?? ()
#10 0x00000000ffffffff in ?? ()
#11 0x0000000000000005 in ?? ()
#12 0x00007f36b41482b0 in ?? ()
#13 0x00007f36da9f0d84 in rpc_clnt_destroy (rpc=0x7f36cc2c0800) at rpc-clnt.c:1682
#14 rpc_clnt_notify (trans=<optimized out>, mydata=0x7f36b4147880, event=<optimized out>, data=0x7f36b4147880)
    at rpc-clnt.c:886
#15 0x00007f36da9ec883 in rpc_transport_unref (this=0x5) at rpc-transport.c:528
#16 0x0000000000000000 in ?? ()



Expected results:
=============
Glusterd should not crash


Additional info:
========
[root@rhs-client19 core]# gluster vol info disrep_tier 
 
Volume Name: disrep_tier
Type: Tier
Volume ID: ea4bd2c2-efd3-4d25-bbc1-8f6d9c75dafc
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick5/tier
Brick2: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick5/tier
Brick3: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/tier
Brick4: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/tier
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/disrep_teri
Brick6: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/disrep_teri
Brick7: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/disrep_teri
Brick8: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/disrep_teri
Options Reconfigured:
cluster.tier-demote-frequency: 600
performance.readdir-ahead: on
features.ctr-enabled: on

Comment 2 Atin Mukherjee 2015-11-19 04:34:56 UTC
Could you attach the sosreport? A core file is mandatory for debugging any crash.

Comment 3 RajeshReddy 2015-11-19 06:31:18 UTC
sosreport and core are available @ /home/repo/sosreports/bug.1283139 on rhsqe-repo.lab.eng.blr.redhat.com

Comment 4 Anand Nekkunti 2015-11-19 10:39:23 UTC
The backtrace in the description was displayed incorrectly due to a mismatch between the core file and the specified executable file. I got the setup from Rajesh Reddy; below is the backtrace for the crash:

#0  0x00007f3ab1d9b0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f3ab242f342 in gd_peerinfo_find_from_hostname (hoststr=hoststr@entry=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com") at glusterd-peer-utils.c:639
#2  0x00007f3ab242f81d in glusterd_peerinfo_find_by_hostname (hoststr=hoststr@entry=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com") at glusterd-peer-utils.c:111
#3  0x00007f3ab242fa09 in glusterd_hostname_to_uuid (hostname=hostname@entry=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com", uuid=uuid@entry=0x7f3aaded4ba0 "") at glusterd-peer-utils.c:155
#4  0x00007f3ab2380cdc in glusterd_volume_brickinfo_get (uuid=uuid@entry=0x0, hostname=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com", path=0x7f3aa000d170 "/rhs/brick4/afr2x2", volinfo=volinfo@entry=0x7f3abfb11ea0, 
    brickinfo=brickinfo@entry=0x7f3aaded4c88) at glusterd-utils.c:1310
#5  0x00007f3ab2380d68 in glusterd_volume_brickinfo_get_by_brick (brick=brick@entry=0x7f3aa0005b85 "rhs-client19.lab.eng.blr.redhat.com:/rhs/brick4/afr2x2", volinfo=0x7f3abfb11ea0, brickinfo=brickinfo@entry=0x7f3aaded4c88)
    at glusterd-utils.c:1354
#6  0x00007f3ab235b36f in get_brickinfo_from_brickid (brickinfo=0x7f3aaded4c88, brickid=0x7f3a9c026e90 "dbf7ab58-21a1-4951-b8ae-44e3aaa4c0ea:rhs-client19.lab.eng.blr.redhat.com:/rhs/brick4/afr2x2") at glusterd-handler.c:4816
#7  __glusterd_brick_rpc_notify (rpc=rpc@entry=0x7f3a9c026f40, mydata=mydata@entry=0x7f3a9c026e90, event=event@entry=RPC_CLNT_DISCONNECT, data=data@entry=0x0) at glusterd-handler.c:4842
#8  0x00007f3ab235e46c in glusterd_big_locked_notify (rpc=0x7f3a9c026f40, mydata=0x7f3a9c026e90, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7f3ab235b270 <__glusterd_brick_rpc_notify>) at glusterd-handler.c:71
#9  0x00007f3abd5ccc60 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f3a9c026f70, event=RPC_TRANSPORT_DISCONNECT, data=0x7f3a9c02a0e0) at rpc-clnt.c:874
#10 0x00007f3abd5c8883 in rpc_transport_notify (this=this@entry=0x7f3a9c02a0e0, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f3a9c02a0e0) at rpc-transport.c:545
#11 0x00007f3ab00b13a2 in socket_event_poll_err (this=0x7f3a9c02a0e0) at socket.c:1151
#12 socket_event_handler (fd=fd@entry=24, idx=idx@entry=15, data=0x7f3a9c02a0e0, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2356
#13 0x00007f3abd85f8ba in event_dispatch_epoll_handler (event=0x7f3aaded4e80, event_pool=0x7f3abfa8bd10) at event-epoll.c:575
#14 event_dispatch_epoll_worker (data=0x7f3abfb078b0) at event-epoll.c:678
#15 0x00007f3abc666df5 in start_thread (arg=0x7f3aaded5700) at pthread_create.c:308
#16 0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) t a a bt

Thread 7 (Thread 0x7f3aae6d6700 (LWP 25006)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f3ab2401133 in hooks_worker (args=<optimized out>) at glusterd-hooks.c:534
#2  0x00007f3abc666df5 in start_thread (arg=0x7f3aae6d6700) at pthread_create.c:308
#3  0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f3ab36a4700 (LWP 24879)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f3abd841f08 in syncenv_task (proc=proc@entry=0x7f3abfa9acb0) at syncop.c:607
#2  0x00007f3abd842c40 in syncenv_processor (thdata=0x7f3abfa9acb0) at syncop.c:699
#3  0x00007f3abc666df5 in start_thread (arg=0x7f3ab36a4700) at pthread_create.c:308
#4  0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f3ab2ea3700 (LWP 24880)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f3abd841f08 in syncenv_task (proc=proc@entry=0x7f3abfa9b070) at syncop.c:607
#2  0x00007f3abd842c40 in syncenv_processor (thdata=0x7f3abfa9b070) at syncop.c:699
#3  0x00007f3abc666df5 in start_thread (arg=0x7f3ab2ea3700) at pthread_create.c:308
#4  0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f3ab3ea5700 (LWP 24878)):
#0  0x00007f3ab1b94f04 in _fini () from /lib64/liburcu-cds.so.1
#1  0x00007f3abdab3b78 in _dl_fini () at dl-fini.c:258
#2  0x00007f3abbeefe49 in __run_exit_handlers (status=status@entry=0, listp=0x7f3abc2716c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#3  0x00007f3abbeefe95 in __GI_exit (status=status@entry=0) at exit.c:99
#4  0x00007f3abdcd1733 in cleanup_and_exit (signum=<optimized out>) at glusterfsd.c:1293
#5  0x00007f3abdcd1855 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2014
#6  0x00007f3abc666df5 in start_thread (arg=0x7f3ab3ea5700) at pthread_create.c:308
#7  0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f3ab46a6700 (LWP 24877)):
#0  0x00007f3abc66d99d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f3abd81d944 in gf_timer_proc (ctx=0x7f3abfa6d010) at timer.c:205
#2  0x00007f3abc666df5 in start_thread (arg=0x7f3ab46a6700) at pthread_create.c:308
#3  0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f3abdcb4780 (LWP 24876)):
#0  0x00007f3abc667f27 in pthread_join (threadid=139890002843392, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007f3abd85fc18 in event_dispatch_epoll (event_pool=0x7f3abfa8bd10) at event-epoll.c:762
#2  0x00007f3abdcce747 in main (argc=5, argv=0x7ffef53b2bd8) at glusterfsd.c:2350

Thread 1 (Thread 0x7f3aaded5700 (LWP 25007)):
#0  0x00007f3ab1d9b0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f3ab242f342 in gd_peerinfo_find_from_hostname (hoststr=hoststr@entry=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com") at glusterd-peer-utils.c:639
#2  0x00007f3ab242f81d in glusterd_peerinfo_find_by_hostname (hoststr=hoststr@entry=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com") at glusterd-peer-utils.c:111
#3  0x00007f3ab242fa09 in glusterd_hostname_to_uuid (hostname=hostname@entry=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com", uuid=uuid@entry=0x7f3aaded4ba0 "") at glusterd-peer-utils.c:155
#4  0x00007f3ab2380cdc in glusterd_volume_brickinfo_get (uuid=uuid@entry=0x0, hostname=0x7f3aa000cd70 "rhs-client19.lab.eng.blr.redhat.com", path=0x7f3aa000d170 "/rhs/brick4/afr2x2", volinfo=volinfo@entry=0x7f3abfb11ea0, 
    brickinfo=brickinfo@entry=0x7f3aaded4c88) at glusterd-utils.c:1310
#5  0x00007f3ab2380d68 in glusterd_volume_brickinfo_get_by_brick (brick=brick@entry=0x7f3aa0005b85 "rhs-client19.lab.eng.blr.redhat.com:/rhs/brick4/afr2x2", volinfo=0x7f3abfb11ea0, brickinfo=brickinfo@entry=0x7f3aaded4c88)
    at glusterd-utils.c:1354
#6  0x00007f3ab235b36f in get_brickinfo_from_brickid (brickinfo=0x7f3aaded4c88, brickid=0x7f3a9c026e90 "dbf7ab58-21a1-4951-b8ae-44e3aaa4c0ea:rhs-client19.lab.eng.blr.redhat.com:/rhs/brick4/afr2x2") at glusterd-handler.c:4816
#7  __glusterd_brick_rpc_notify (rpc=rpc@entry=0x7f3a9c026f40, mydata=mydata@entry=0x7f3a9c026e90, event=event@entry=RPC_CLNT_DISCONNECT, data=data@entry=0x0) at glusterd-handler.c:4842
#8  0x00007f3ab235e46c in glusterd_big_locked_notify (rpc=0x7f3a9c026f40, mydata=0x7f3a9c026e90, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7f3ab235b270 <__glusterd_brick_rpc_notify>) at glusterd-handler.c:71
#9  0x00007f3abd5ccc60 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f3a9c026f70, event=RPC_TRANSPORT_DISCONNECT, data=0x7f3a9c02a0e0) at rpc-clnt.c:874
#10 0x00007f3abd5c8883 in rpc_transport_notify (this=this@entry=0x7f3a9c02a0e0, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f3a9c02a0e0) at rpc-transport.c:545
#11 0x00007f3ab00b13a2 in socket_event_poll_err (this=0x7f3a9c02a0e0) at socket.c:1151
#12 socket_event_handler (fd=fd@entry=24, idx=idx@entry=15, data=0x7f3a9c02a0e0, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2356
#13 0x00007f3abd85f8ba in event_dispatch_epoll_handler (event=0x7f3aaded4e80, event_pool=0x7f3abfa8bd10) at event-epoll.c:575
#14 event_dispatch_epoll_worker (data=0x7f3abfb078b0) at event-epoll.c:678
#15 0x00007f3abc666df5 in start_thread (arg=0x7f3aaded5700) at pthread_create.c:308
#16 0x00007f3abbfad1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


The above backtrace looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1238067: the crash occurs because glusterd is going down while, at the same time, another thread is trying to access an RCU resource. This can be confirmed by _fini() and rcu_read_lock_bp() executing simultaneously (Thread 4 and Thread 1).

Comment 5 Anand Nekkunti 2015-11-19 10:44:01 UTC

*** This bug has been marked as a duplicate of bug 1238067 ***

