Bug 1722829 - glusterd crashed while regaining quorum for the volume
Summary: glusterd crashed while regaining quorum for the volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Sanju
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks: 1696809
 
Reported: 2019-06-21 12:25 UTC by Kshithij Iyer
Modified: 2019-10-30 12:22 UTC
CC List: 8 users

Fixed In Version: glusterfs-6.0-7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:22:00 UTC
Embargoed:




Links:
Red Hat Product Errata RHEA-2019:3249 (last updated 2019-10-30 12:22:22 UTC)

Description Kshithij Iyer 2019-06-21 12:25:51 UTC
Description of problem:
glusterd crashed while regaining quorum for the volume.

● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Thu 2019-06-20 18:57:27 IST; 4min 3s ago
     Docs: man:glusterd(8)
  Process: 1817 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1827 (code=killed, signal=ABRT)
    Tasks: 0
   CGroup: /system.slice/glusterd.service

Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: setfsid 1
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: spinlock 1
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: epoll.h 1
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: xattr.h 1
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: st_atim....
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: package-...
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com glusterd[1827]: ---------
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: glusterd.ser...
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: Unit gluster...
Jun 20 18:57:27 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: glusterd.ser...
Hint: Some lines were ellipsized, use -l to show in full.
################################################################################
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 6, Aborted.
#0  0x00007f09052a1377 in raise () from /lib64/libc.so.6
################################################################################
(gdb) bt
#0  0x00007f09052a1377 in raise () from /lib64/libc.so.6
#1  0x00007f09052a2a68 in abort () from /lib64/libc.so.6
#2  0x00007f09052e3ec7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f09052ea804 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f09052ec8fb in _int_free () from /lib64/libc.so.6
#5  0x00007f08fada4a57 in glusterd_brickinfo_delete (
    brickinfo=0x7f08e46bad60) at glusterd-utils.c:1006
#6  0x00007f08fada4b15 in glusterd_volume_brickinfos_delete (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:1024
#7  0x00007f08fada4c3e in glusterd_volinfo_delete (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:1052
#8  0x00007f08fada4e38 in glusterd_volinfo_unref (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:640
#9  0x00007f08fada4e73 in glusterd_volinfo_remove (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:1038
#10 0x00007f08fadb6784 in glusterd_delete_volume (volinfo=0x7f08e469c440)
    at glusterd-utils.c:8324
#11 0x00007f08fae33ea6 in glusterd_op_delete_volume (
    dict=dict@entry=0x7f08ec4b1488) at glusterd-volume-ops.c:2948
#12 0x00007f08fad9684c in glusterd_op_commit_perform (
    op=GD_OP_DELETE_VOLUME, dict=dict@entry=0x7f08ec4b1488,
    op_errstr=op_errstr@entry=0x7f08e84028e8,
    rsp_dict=rsp_dict@entry=0x7f08ec453848) at glusterd-op-sm.c:6124
#13 0x00007f08fada07d2 in glusterd_op_ac_commit_op (event=0x7f08ec20a3c0,
    ctx=0x7f08ec02c470) at glusterd-op-sm.c:5860
#14 0x00007f08fad9d514 in glusterd_op_sm () at glusterd-op-sm.c:8210
#15 0x00007f08fad7619e in __glusterd_handle_commit_op (
    req=req@entry=0x7f08e800a6e8) at glusterd-handler.c:1176
#16 0x00007f08fad7ddce in glusterd_big_locked_handler (
    req=0x7f08e800a6e8,
    actor_fn=0x7f08fad76010 <__glusterd_handle_commit_op>)
    at glusterd-handler.c:83
#17 0x00007f0906ca4610 in synctask_wrap () at syncop.c:367
#18 0x00007f09052b3180 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()
################################################################################
(gdb) t a a bt

Thread 9 (Thread 0x7f08fda3f700 (LWP 1829)):
#0  0x00007f0905aab3c1 in sigwait () from /lib64/libpthread.so.0
#1  0x000055b3b926043b in glusterfs_sigwaiter (arg=<optimized out>)
    at glusterfsd.c:2370
#2  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f08f7d89700 (LWP 2153)):
#0  0x00007f0905aa7a35 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f08fae476ab in hooks_worker (args=<optimized out>)
    at glusterd-hooks.c:527
#2  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0907164780 (LWP 1827)):
#0  0x00007f0905aa5017 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f0906cca608 in event_dispatch_epoll (event_pool=0x55b3bb02a5b0)
    at event-epoll.c:846
#2  0x000055b3b925c9b5 in main (argc=5, argv=<optimized out>)
    at glusterfsd.c:2866

Thread 6 (Thread 0x7f08f6d87700 (LWP 4019)):
#0  0x00007f0905aa7de2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f0906ca69a0 in syncenv_task (proc=proc@entry=0x55b3bb033390)
    at syncop.c:612
#2  0x00007f0906ca7850 in syncenv_processor (thdata=0x55b3bb033390)
    at syncop.c:679
#3  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f08fe240700 (LWP 1828)):
#0  0x00007f0905aaae9d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f0906c74426 in gf_timer_proc (data=0x55b3bb032480)
    at timer.c:194
#2  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f08f7588700 (LWP 2154)):
#0  0x00007f0905369ea3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f0906ccb1f0 in event_dispatch_epoll_worker (
    data=0x55b3c02fb830) at event-epoll.c:751
#2  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f08fd23e700 (LWP 1830)):
#0  0x00007f090533084d in nanosleep () from /lib64/libc.so.6
#1  0x00007f09053306e4 in sleep () from /lib64/libc.so.6
#2  0x00007f0906c916ad in pool_sweeper (arg=<optimized out>)
    at mem-pool.c:454
#3  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f08fba3b700 (LWP 1833)):
#0  0x00007f0905360993 in select () from /lib64/libc.so.6
#1  0x00007f0906ce5994 in runner (arg=0x55b3bb036d70)
    at ../../contrib/timer-wheel/timer-wheel.c:186
#2  0x00007f0905aa3ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f09053698cd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f08fc23c700 (LWP 1832)):
#0  0x00007f09052a1377 in raise () from /lib64/libc.so.6
#1  0x00007f09052a2a68 in abort () from /lib64/libc.so.6
#2  0x00007f09052e3ec7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f09052ea804 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f09052ec8fb in _int_free () from /lib64/libc.so.6
#5  0x00007f08fada4a57 in glusterd_brickinfo_delete (
    brickinfo=0x7f08e46bad60) at glusterd-utils.c:1006
#6  0x00007f08fada4b15 in glusterd_volume_brickinfos_delete (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:1024
#7  0x00007f08fada4c3e in glusterd_volinfo_delete (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:1052
#8  0x00007f08fada4e38 in glusterd_volinfo_unref (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:640
#9  0x00007f08fada4e73 in glusterd_volinfo_remove (
    volinfo=volinfo@entry=0x7f08e469c440) at glusterd-utils.c:1038
#10 0x00007f08fadb6784 in glusterd_delete_volume (volinfo=0x7f08e469c440)
    at glusterd-utils.c:8324
#11 0x00007f08fae33ea6 in glusterd_op_delete_volume (
    dict=dict@entry=0x7f08ec4b1488) at glusterd-volume-ops.c:2948
#12 0x00007f08fad9684c in glusterd_op_commit_perform (
    op=GD_OP_DELETE_VOLUME, dict=dict@entry=0x7f08ec4b1488,
    op_errstr=op_errstr@entry=0x7f08e84028e8,
    rsp_dict=rsp_dict@entry=0x7f08ec453848) at glusterd-op-sm.c:6124
#13 0x00007f08fada07d2 in glusterd_op_ac_commit_op (event=0x7f08ec20a3c0,
    ctx=0x7f08ec02c470) at glusterd-op-sm.c:5860
#14 0x00007f08fad9d514 in glusterd_op_sm () at glusterd-op-sm.c:8210
#15 0x00007f08fad7619e in __glusterd_handle_commit_op (
    req=req@entry=0x7f08e800a6e8) at glusterd-handler.c:1176
#16 0x00007f08fad7ddce in glusterd_big_locked_handler (
    req=0x7f08e800a6e8,
    actor_fn=0x7f08fad76010 <__glusterd_handle_commit_op>)
    at glusterd-handler.c:83
#17 0x00007f0906ca4610 in synctask_wrap () at syncop.c:367
#18 0x00007f09052b3180 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()
################################################################################
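Analysis note: the top frames (raise -> abort -> __libc_message -> malloc_printerr -> _int_free) are the signature glibc produces when free() is handed a pointer it considers invalid, most commonly because the same allocation is freed twice. In this trace that happens while glusterd_brickinfo_delete (frame #5, glusterd-utils.c:1006) is freeing memory belonging to a brickinfo during the delete-volume commit. Below is a minimal, self-contained C sketch of that failure mode; it is a hypothetical illustration of the glibc behaviour only, not glusterd code, and the struct and field names are made up.

/* double_free.c - hypothetical illustration of the abort signature in the
 * backtrace above.
 * Build and run: gcc -g -o double_free double_free.c && ./double_free
 */
#include <stdlib.h>
#include <string.h>

struct brickinfo_like {
    char *hostname;        /* heap-allocated member freed during cleanup */
};

static void cleanup(struct brickinfo_like *b)
{
    free(b->hostname);     /* frees the member ... */
    /* ... but does not clear the pointer, so a second cleanup pass
     * hands the same address to free() again. */
}

int main(void)
{
    struct brickinfo_like b;
    b.hostname = strdup("server1");

    cleanup(&b);           /* first teardown path: fine */
    cleanup(&b);           /* second teardown path: glibc detects the invalid
                            * free and raises SIGABRT, producing the same
                            * _int_free -> malloc_printerr -> abort frames. */
    return 0;
}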
Version-Release number of selected component (if applicable):
glusterfs-6.0-6

How reproducible:
3/5

Steps to Reproduce:
https://github.com/gluster/glusto-tests/blob/master/tests/functional/glusterd/test_quorum_syslog.py

Actual results:
glusterd crashed on one node and generated a core dump.

Expected results:
glusterd should not crash.

Additional info:
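Frames #8 and #9 show the volume object being torn down through glusterd_volinfo_remove -> glusterd_volinfo_unref, i.e. a reference-counted object whose brickinfos are freed only once the last reference is dropped. For readers unfamiliar with that pattern, the following is a generic, hypothetical C sketch of refcount-guarded teardown (made-up names; not the actual glusterd implementation and not necessarily the glusterfs-6.0-7 fix). A crash like the one above is what typically results when two paths each drop what they believe is the last reference, or when a member is freed on a path that bypasses the guard.

/* refcount_teardown.c - hypothetical sketch of refcount-guarded cleanup,
 * loosely modeled on the glusterd_volinfo_ref/unref frames in the backtrace.
 * Build and run: gcc -g -o refcount_teardown refcount_teardown.c && ./refcount_teardown
 * (A real implementation would protect the counter with a lock or atomics.)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct volinfo_like {
    int refcount;
    char *volname;              /* heap-allocated member */
};

static struct volinfo_like *volinfo_new(const char *name)
{
    struct volinfo_like *v = calloc(1, sizeof(*v));
    v->refcount = 1;
    v->volname = strdup(name);
    return v;
}

static void volinfo_ref(struct volinfo_like *v)
{
    v->refcount++;
}

static void volinfo_unref(struct volinfo_like *v)
{
    if (--v->refcount > 0)
        return;                 /* another holder still owns the object */
    free(v->volname);           /* members are freed exactly once ... */
    v->volname = NULL;          /* ... and cleared defensively before the struct goes */
    free(v);
}

int main(void)
{
    struct volinfo_like *v = volinfo_new("testvol");
    volinfo_ref(v);             /* e.g. an in-flight delete-volume commit */
    volinfo_unref(v);           /* e.g. quorum handling dropping its reference */
    volinfo_unref(v);           /* last reference: members freed once, no abort */
    printf("clean teardown\n");
    return 0;
}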

Comment 9 SATHEESARAN 2019-06-28 02:05:44 UTC
@Kshithij, Good that the sosreports are working.
Thanks for collecting all the sosreports.

I see there are around 11 sosreports. As this is a lot of information, it would be easier (and quicker) for anyone
if you directly attach the glusterd.log from the machine on which glusterd crashed.
This is in addition to the sosreports that you have already made available.

Please also point to the exact machine (with details) and the sosreport collected from that machine
where glusterd has crashed.

Comment 10 Atin Mukherjee 2019-06-28 04:41:18 UTC
Sanju mentioned to me yesterday that this is no longer reproducible with the latest RHGS 3.5.0 HEAD. Does that still stand, Sanju? If so, I'd request that this test be rerun with the latest glusterfs-6.0-7 build.

Comment 18 errata-xmlrpc 2019-10-30 12:22:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

