Bug 1461695 - glusterd crashed and core dumped, when the network interface is down
Status: POST
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Atin Mukherjee
QA Contact: SATHEESARAN
Keywords: ZStream
Depends On:
Blocks:
Reported: 2017-06-15 04:31 EDT by SATHEESARAN
Modified: 2017-07-31 02:34 EDT (History)

Type: Bug


Attachments
glusterd coredump from the node (333.13 KB, application/octet-stream)
2017-06-15 04:53 EDT, SATHEESARAN
Description SATHEESARAN 2017-06-15 04:31:42 EDT
Description of problem:
-----------------------
On a node that hosts bricks for gluster volumes, if the network interface is down when glusterd is restarted, glusterd crashes and dumps core.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHGS 3.1.3 ( glusterfs-3.7.9-12.elrhgs )
RHGS 3.2.0 ( glusterfs-3.8.4-18.el7rhgs )
RHGS 3.2.0 async ( glusterfs-3.8.4-18.4.el7rhgs )

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a Trusted Storage Pool ( gluster cluster )
2. Create a volume of any type
3. Select the node in the cluster that hosts the 'brick'
4. Using console access to the node, bring down the network interface on that node.
5. Restart glusterd on that node
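
The steps above could be scripted roughly as follows. This is only a sketch: the host names, volume name, brick path and interface name are hypothetical placeholders that do not come from this report, and the script only prints the commands unless DRY_RUN=0 is set. It has to be run from the node's console, since step 4 cuts network access.

```shell
#!/bin/sh
# Reproduction sketch. Every value below is an assumed placeholder.
LOCAL="${LOCAL:-server1.example.com}"     # hypothetical: node under test
PEER="${PEER:-server2.example.com}"       # hypothetical: second pool member
VOL="${VOL:-testvol}"                     # hypothetical volume name
BRICK="${BRICK:-/bricks/brick1/testvol}"  # hypothetical brick path
IFACE="${IFACE:-eth0}"                    # hypothetical interface name
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print only, 0 = execute

run() {
    # Echo the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

repro_steps() {
    run gluster peer probe "$PEER"                                  # step 1
    run gluster volume create "$VOL" "$LOCAL:$BRICK" "$PEER:$BRICK" # steps 2-3
    run ip link set dev "$IFACE" down                               # step 4
    run systemctl restart glusterd                                  # step 5
}

repro_steps
```

With the defaults this is a dry run that lists the four commands; the volume layout is arbitrary because the report says a volume of any type reproduces the crash.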

Actual results:
---------------
glusterd crashed and dumped core.

Expected results:
-----------------
glusterd should not crash when it is restarted while the network interface is down.
Comment 1 SATHEESARAN 2017-06-15 04:47:52 EDT
gdb backtrace
--------------

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  x86_64_fallback_frame_state (context=0x7f91c95eee00, context=0x7f91c95eee00, fs=0x7f91c95eeef0) at ./md-unwind-support.h:58
58	  if (*(unsigned char *)(pc+0) == 0x48


gdb backtrace from all threads
------------------------------
(gdb) t a a bt

Thread 7 (Thread 0x7f91d9e91780 (LWP 14715)):
#0  0x00007f91d882143d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f91d99e1475 in sys_write (fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at syscall.c:270
#2  0x00007f91d9eaf539 in glusterfs_process_volfp (ctx=ctx@entry=0x7f91dac5b010, fp=fp@entry=0x7f91daca52d0) at glusterfsd.c:2299
#3  0x00007f91d9eaf69d in glusterfs_volumes_init (ctx=ctx@entry=0x7f91dac5b010) at glusterfsd.c:2336
#4  0x00007f91d9eabace in main (argc=5, argv=<optimized out>) at glusterfsd.c:2448

Thread 6 (Thread 0x7f91cec83700 (LWP 14719)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f91d99f2538 in syncenv_task (proc=proc@entry=0x7f91daca1530) at syncop.c:603
#2  0x00007f91d99f3380 in syncenv_processor (thdata=0x7f91daca1530) at syncop.c:695
#3  0x00007f91d881adc5 in start_thread (arg=0x7f91cec83700) at pthread_create.c:308
#4  0x00007f91d815f76d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f91c9df3700 (LWP 14934)):
#0  0x00007f91d8154e2d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f91cb59dda9 in poll (__timeout=-1, __nfds=2, __fds=0x7f91c9df2e80) at /usr/include/bits/poll2.h:46
#2  socket_poller (ctx=0x7f91dad607d0) at socket.c:2500
#3  0x00007f91d881adc5 in start_thread (arg=0x7f91c9df3700) at pthread_create.c:308
#4  0x00007f91d815f76d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f91d0486700 (LWP 14716)):
#0  0x00007f91d8821bdd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f91d99c6fe6 in gf_timer_proc (data=0x7f91daca0b70) at timer.c:176
#2  0x00007f91d881adc5 in start_thread (arg=0x7f91d0486700) at pthread_create.c:308
#3  0x00007f91d815f76d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f91cfc85700 (LWP 14717)):
#0  0x00007f91d8822101 in do_sigwait (sig=0x7f91cfc84e1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61
#1  __sigwait (set=set@entry=0x7f91cfc84e20, sig=sig@entry=0x7f91cfc84e1c) at ../sysdeps/unix/sysv/linux/sigwait.c:99
#2  0x00007f91d9eaebfb in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2055
#3  0x00007f91d881adc5 in start_thread (arg=0x7f91cfc85700) at pthread_create.c:308
#4  0x00007f91d815f76d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f91cf484700 (LWP 14718)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f91d99f2538 in syncenv_task (proc=proc@entry=0x7f91daca1170) at syncop.c:603
#2  0x00007f91d99f3380 in syncenv_processor (thdata=0x7f91daca1170) at syncop.c:695
#3  0x00007f91d881adc5 in start_thread (arg=0x7f91cf484700) at pthread_create.c:308
#4  0x00007f91d815f76d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f91c95f2700 (LWP 15480)):
#0  x86_64_fallback_frame_state (context=0x7f91c95eee00, context=0x7f91c95eee00, fs=0x7f91c95eeef0) at ./md-unwind-support.h:58
#1  uw_frame_state_for (context=context@entry=0x7f91c95eee00, fs=fs@entry=0x7f91c95eeef0) at ../../../libgcc/unwind-dw2.c:1253
#2  0x00007f91cc50a019 in _Unwind_Backtrace (trace=0x7f91d81734f0 <backtrace_helper>, trace_argument=0x7f91c95ef0b0) at ../../../libgcc/unwind.inc:290
#3  0x00007f91d8173666 in __GI___backtrace (array=array@entry=0x7f91c95ef0f0, size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#4  0x00007f91d99b9ce2 in _gf_msg_backtrace_nomem (level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at logging.c:1094
#5  0x00007f91d99c3884 in gf_print_trace (signum=<optimized out>, ctx=<optimized out>) at common-utils.c:755
#6  <signal handler called>
#7  strchrnul () at ../sysdeps/x86_64/strchrnul.S:33
#8  0x00007f91d80af1c2 in __find_specmb (format=0x7f91ce232210 <Address 0x7f91ce232210 out of bounds>) at printf-parse.h:109
#9  _IO_vfprintf_internal (s=s@entry=0x7f91c95f07e0, format=format@entry=0x7f91ce232210 <Address 0x7f91ce232210 out of bounds>, ap=ap@entry=0x7f91c95f09d8) at vfprintf.c:1308
#10 0x00007f91d8176a45 in __GI___vasprintf_chk (result_ptr=result_ptr@entry=0x7f91c95f09b8, flags=flags@entry=1, 
    format=format@entry=0x7f91ce232210 <Address 0x7f91ce232210 out of bounds>, args=args@entry=0x7f91c95f09d8) at vasprintf_chk.c:66
#11 0x00007f91d99bad54 in vasprintf (__ap=0x7f91c95f09d8, __fmt=0x7f91ce232210 <Address 0x7f91ce232210 out of bounds>, __ptr=0x7f91c95f09b8) at /usr/include/bits/stdio2.h:210
#12 _gf_msg (domain=0x7f91dacaa4c0 "management", file=0x7f91ce253f3a <Address 0x7f91ce253f3a out of bounds>, function=0x7f91ce2543b0 <Address 0x7f91ce2543b0 out of bounds>, line=664, 
    level=GF_LOG_ERROR, errnum=22, trace=1, msgid=101172, fmt=0x7f91ce232210 <Address 0x7f91ce232210 out of bounds>) at logging.c:2069
#13 0x00007f91ce20f3ae in ?? ()
#14 0x00007f9100000001 in ?? ()
#15 0x0000000000018b34 in ?? ()
#16 0x00007f91ce232210 in ?? ()
#17 0x0000000000000000 in ?? ()
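
For anyone re-running this against the attached core: "t a a bt" is the abbreviated form of "thread apply all bt". A non-interactive invocation along the following lines should give the same output without the pager prompt, since gdb's batch mode disables pagination. The core path is a placeholder, not something taken from this report; the binary path is the one shown in the "Core was generated by" line.

```shell
#!/bin/sh
BIN="${BIN:-/usr/sbin/glusterd}"   # binary named in the core header above
CORE="${CORE:-/path/to/core}"      # placeholder: location of the core file

bt_command() {
    # Print the batch-mode gdb command for binary $1 and core file $2.
    printf 'gdb -batch -ex "thread apply all bt" %s %s\n' "$1" "$2"
}

# Show the command, and run it only if the core file is readable.
bt_command "$BIN" "$CORE"
if [ -r "$CORE" ]; then
    gdb -batch -ex "thread apply all bt" "$BIN" "$CORE"
fi
```

Matching debuginfo packages are presumably needed to resolve the "??" frames; that is an assumption, not something verified here.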
Comment 2 SATHEESARAN 2017-06-15 04:48:55 EDT
snip from glusterd logs:
------------------------
[2017-06-14 08:22:10.739716] I [MSGID: 106004] [glusterd-handler.c:5808:__glusterd_peer_rpc_notify] 0-management: Peer <10.70.36.74> (<0c2f8929-3a24-4b33-95ea-9810b98f0027>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-06-14 08:22:10.740199] C [MSGID: 106002] [glusterd-server-quorum.c:347:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume AppDisksVol. Stopping local bricks.
[2017-06-14 08:22:10.831748] C [MSGID: 106002] [glusterd-server-quorum.c:347:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume BootDisksVol. Stopping local bricks.
[2017-06-14 08:22:10.831815] E [MSGID: 106187] [glusterd-store.c:4417:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-06-14 08:22:10.831871] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-06-14 08:22:10.831885] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-management: initializing translator failed
[2017-06-14 08:22:10.831896] E [MSGID: 101176] [graph.c:673:glusterfs_graph_activate] 0-graph: init failed
[2017-06-14 08:22:10.831919] E [glusterd-peer-utils.c:153:glusterd_hostname_to_uuid] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x554a4) [0x7f3f69c9f4a4] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x43be0) [0x7f3f69c8dbe0] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x102ad6) [0x7f3f69d4cad6] ) 0-: Assertion failed: priv
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2017-06-14 08:22:10
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
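
To pull only the error and critical entries out of a longer glusterd log, a filter like the one below may help. It is a sketch that assumes the line layout seen in the snippet above ("[date time] LEVEL [...] ..."), so that the level letter is the third whitespace-separated field; the function name is made up for illustration. The sample input is three lines copied from the snippet.

```shell
#!/bin/sh
# Keep only E (error) and C (critical) entries from a glusterd log on stdin.
# Assumption: the level letter is field 3, as in the lines quoted above.
filter_errors() {
    awk '$3 == "E" || $3 == "C"'
}

# Sample run on three lines from the snippet; only the two E lines are printed.
filter_errors <<'EOF'
[2017-06-14 08:22:10.739716] I [MSGID: 106004] [glusterd-handler.c:5808:__glusterd_peer_rpc_notify] 0-management: Peer <10.70.36.74> (<0c2f8929-3a24-4b33-95ea-9810b98f0027>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-06-14 08:22:10.831815] E [MSGID: 106187] [glusterd-store.c:4417:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-06-14 08:22:10.831871] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
EOF
```

For a real log, redirect the file into the function instead of the here-document, for example `filter_errors < "$LOGFILE"`, where LOGFILE is wherever glusterd writes its log on the node.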
Comment 3 SATHEESARAN 2017-06-15 04:53:08 EDT
Created attachment 1287961
glusterd coredump from the node
Comment 6 Gaurav Yadav 2017-07-31 02:34:15 EDT
As Atin has already mentioned, glusterd is not able to resolve the bricks when it is restarted while the network interface is down.

A similar bug, 1472267, has already been addressed; the upstream patch for it is below.
Upstream patch: https://review.gluster.org/#/c/17813/
