Bug 1598345 - gluster get-state command is crashing glusterd process when geo-replication is configured
Summary: gluster get-state command is crashing glusterd process when geo-replication is configured
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Duplicates: 1626043
Depends On: 1578716
Blocks: 1517422 1518276
 
Reported: 2018-07-05 07:46 UTC by Sanju
Modified: 2020-01-09 17:42 UTC (History)
CC List: 13 users

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1578716
Environment:
Last Closed: 2018-10-23 15:13:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Sanju 2018-07-05 07:50:04 UTC
Description of problem:
  If I try to import a Gluster cluster with geo-replication configured into
  RHGS WA, glusterd (on the master side of the geo-replication session)
  immediately crashes.
How reproducible:
  100%


Steps to Reproduce:
1. Prepare two Gluster clusters with 6 storage nodes per cluster
    (usm1* and usm2* clusters in my case)
2. Create one Distributed-Replicated volume on each cluster.
    (named volume_alpha_distrep_6x2 in my case, see gdeploy config [1])
3. Configure geo-replication between the volumes.
    I've used the following gdeploy config:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
  # cat geo-replication.conf
    [hosts]
    usm1-gl1.example.com

    [geo-replication]
    action=create
    mastervol=usm1-gl1.example.com:volume_alpha_distrep_6x2
    slavevol=usm2-gl1.example.com:volume_alpha_distrep_6x2
    slavenodes=usm2-gl1.example.com,usm2-gl2.example.com,usm2-gl3.example.com,usm2-gl4.example.com,usm2-gl5.example.com,usm2-gl6.example.com
    force=yes

  # gdeploy -c geo-replication.conf 
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  

4. Install and configure RHGS WA (aka Tendrl) Server.
5. Start the import process for the first Gluster cluster into RHGS WA.
6. Check glusterd state on the storage servers.

Actual results:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
  # systemctl status glusterd
    ● glusterd.service - GlusterFS, a clustered file-system server
       Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
       Active: failed (Result: signal) since Wed 2018-05-16 04:26:06 EDT; 35min ago
     Main PID: 12035 (code=killed, signal=ABRT)
       CGroup: /system.slice/glusterd.service
               ├─15243 /usr/sbin/glusterfsd -s usm1-gl1.example.com --volfile-id volume_alpha_distrep_6x2.usm1-gl1.u...
               ├─18072 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glu...
               ├─18280 /usr/bin/python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha...
               ├─18341 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18342 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18345 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18346 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/mnt/brick_alpha_distrep_1/1 --path=/mnt/brick_alpha_distrep_...
               ├─18359 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMast...
               ├─18381 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMast...
               ├─18395 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-replication/volume_alpha_distrep_6x2/ssh%3A%2F%2F...
               └─18396 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-replication/volume_alpha_distrep_6x2/ssh%3A%2F%2F...

    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: setfsid 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: spinlock 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: epoll.h 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: xattr.h 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: st_atim.tv_nsec 1
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: package-string: glusterfs 3.12.2
    May 16 04:26:06 usm1-gl1.example.com glusterd[12035]: ---------
    May 16 04:26:06 usm1-gl1.example.com systemd[1]: glusterd.service: main process exited, code=killed, status=6/ABRT
    May 16 04:26:06 usm1-gl1.example.com systemd[1]: Unit glusterd.service entered failed state.
    May 16 04:26:06 usm1-gl1.example.com systemd[1]: glusterd.service failed.

  # file /core.*
    /core.12035: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterd', platform: 'x86_64'

  # gdb /usr/sbin/glusterfsd /core.12035 
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
    done.
    
    warning: core file may not match specified executable file.
    [New LWP 12039]
    [New LWP 12037]
    [New LWP 12038]
    [New LWP 12216]
    [New LWP 12217]
    [New LWP 12036]
    [New LWP 12040]
    [New LWP 12035]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
    Program terminated with signal 6, Aborted.
    #0  0x00007fc3d425c207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
    Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 device-mapper-event-libs-1.02.146-4.el7.x86_64 device-mapper-libs-1.02.146-4.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-12.el7.x86_64 libsepol-2.5-8.1.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lvm2-libs-2.02.177-4.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-57.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64
    (gdb) t a a bt
    
    Thread 8 (Thread 0x7fc3d60e3780 (LWP 12035)):
    #0  0x00007fc3d4a5cf47 in pthread_join (threadid=140478813824768, thread_return=thread_return@entry=0x0) at pthread_join.c:92
    #1  0x00007fc3d5c5bb38 in event_dispatch_epoll (event_pool=0x563d5085fa30) at event-epoll.c:746
    #2  0x0000563d5023d2a7 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2550
    
    Thread 7 (Thread 0x7fc3cb1f8700 (LWP 12040)):
    #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
    #1  0x00007fc3d5c38e88 in syncenv_task (proc=proc@entry=0x563d50867e50) at syncop.c:603
    #2  0x00007fc3d5c39d50 in syncenv_processor (thdata=0x563d50867e50) at syncop.c:695
    #3  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cb1f8700) at pthread_create.c:308
    #4  0x00007fc3d4324b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    
    Thread 6 (Thread 0x7fc3cd1fc700 (LWP 12036)):
    #0  0x00007fc3d4a62eed in nanosleep () at ../sysdeps/unix/syscall-template.S:81
    #1  0x00007fc3d5c0b9f6 in gf_timer_proc (data=0x563d50867270) at timer.c:174
    #2  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cd1fc700) at pthread_create.c:308
    #3  0x00007fc3d4324b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    
    Thread 5 (Thread 0x7fc3c5cbe700 (LWP 12217)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x00007fc3d5c369b3 in __synclock_lock (lock=lock@entry=0x7fc3d5f7b838) at syncop.c:935
    #2  0x00007fc3d5c3a066 in synclock_lock (lock=lock@entry=0x7fc3d5f7b838) at syncop.c:961
    #3  0x00007fc3ca686a4b in glusterd_big_locked_notify (rpc=0x7fc3c0004910, mydata=0x7fc3c0003810, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7fc3ca690830 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:69
    #4  0x00007fc3d59c542b in rpc_clnt_handle_disconnect (conn=0x7fc3c0004940, clnt=0x7fc3c0004910) at rpc-clnt.c:876
    #5  rpc_clnt_notify (trans=<optimized out>, mydata=0x7fc3c0004940, event=<optimized out>, data=0x7fc3c0004b40) at rpc-clnt.c:939
    #6  0x00007fc3d59c1393 in rpc_transport_notify (this=this@entry=0x7fc3c0004b40, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7fc3c0004b40) at rpc-transport.c:538
    #7  0x00007fc3c78d2bdf in socket_event_poll_err (idx=<optimized out>, gen=<optimized out>, this=0x7fc3c0004b40) at socket.c:1206
    #8  socket_event_handler (fd=8, idx=<optimized out>, gen=<optimized out>, data=0x7fc3c0004b40, poll_in=<optimized out>, poll_out=0, poll_err=0) at socket.c:2476
    #9  0x00007fc3d5c5b504 in event_dispatch_epoll_handler (event=0x7fc3c5cbde80, event_pool=0x563d5085fa30) at event-epoll.c:583
    #10 event_dispatch_epoll_worker (data=0x563d508bb3f0) at event-epoll.c:659
    #11 0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3c5cbe700) at pthread_create.c:308
    #12 0x00007fc3d4324b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    
    Thread 4 (Thread 0x7fc3c64bf700 (LWP 12216)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x00007fc3ca74604b in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
    #2  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3c64bf700) at pthread_create.c:308
    #3  0x00007fc3d4324b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    
    Thread 3 (Thread 0x7fc3cc1fa700 (LWP 12038)):
    #0  0x00007fc3d42eb4fd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
    #1  0x00007fc3d42eb394 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
    #2  0x00007fc3d5c2622d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
    #3  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cc1fa700) at pthread_create.c:308
    #4  0x00007fc3d4324b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    
    Thread 2 (Thread 0x7fc3cc9fb700 (LWP 12037)):
    #0  0x00007fc3d4a63411 in do_sigwait (sig=0x7fc3cc9fae1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61
    #1  __sigwait (set=set@entry=0x7fc3cc9fae20, sig=sig@entry=0x7fc3cc9fae1c) at ../sysdeps/unix/sysv/linux/sigwait.c:99
    #2  0x0000563d5024058b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2137
    #3  0x00007fc3d4a5bdd5 in start_thread (arg=0x7fc3cc9fb700) at pthread_create.c:308
    #4  0x00007fc3d4324b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    
    ---Type <return> to continue, or q <return> to quit---
    Thread 1 (Thread 0x7fc3cb9f9700 (LWP 12039)):
    #0  0x00007fc3d425c207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #1  0x00007fc3d425d8f8 in __GI_abort () at abort.c:90
    #2  0x00007fc3d429ecc7 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fc3d43b0cf8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
    #3  0x00007fc3d42a7429 in malloc_printerr (ar_ptr=0x7fc3d45ec760 <main_arena>, ptr=<optimized out>, str=0x7fc3d43b0e00 "double free or corruption (out)", action=3) at malloc.c:5025
    #4  _int_free (av=0x7fc3d45ec760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3847
    #5  0x00007fc3d5bf47dd in data_destroy (data=<optimized out>) at dict.c:227
    #6  0x00007fc3d5bf5220 in dict_destroy (this=<optimized out>) at dict.c:589
    #7  0x00007fc3d5bf54fc in dict_unref (this=<optimized out>) at dict.c:648
    #8  0x00007fc3ca6849a6 in glusterd_print_gsync_status_by_vol (volinfo=<optimized out>, fp=0x7fc3c0013300) at glusterd-handler.c:5188
    #9  glusterd_get_state (dict=0x7fc3c000ab00, req=0x7fc3bc0018e0) at glusterd-handler.c:5883
    #10 __glusterd_handle_get_state (req=req@entry=0x7fc3bc0018e0) at glusterd-handler.c:5997
    #11 0x00007fc3ca686abe in glusterd_big_locked_handler (req=0x7fc3bc0018e0, actor_fn=0x7fc3ca683180 <__glusterd_handle_get_state>) at glusterd-handler.c:82
    #12 0x00007fc3d5c368b0 in synctask_wrap () at syncop.c:375
    #13 0x00007fc3d426dfc0 in ?? () from /lib64/libc.so.6
    #14 0x0000000000000000 in ?? ()
    (gdb) 
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  

Expected results:
  glusterd shouldn't crash

Comment 2 Sanju 2018-07-05 07:53:06 UTC
upstream patch: https://review.gluster.org/20461

Comment 3 Worker Ant 2018-07-06 06:10:39 UTC
COMMIT: https://review.gluster.org/20461 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: Fix glusterd crash

Problem: The gluster get-state command crashes the glusterd process when
a geo-replication session is configured.

Cause: The crash happens due to a double free of memory. In
glusterd_print_gsync_status_by_vol we call dict_unref(), which
frees all the keys and values in the dictionary. Before calling
dict_unref(), glusterd_print_gsync_status_by_vol calls
glusterd_print_gsync_status(), which frees up values in the
dictionary; when dict_unref() is called afterwards, it tries to
free values that have already been freed.

Solution: Remove the code that frees the memory in the
glusterd_print_gsync_status function.

Fixes: bz#1598345
Change-Id: Id3d8aae109f377b462bbbdb96a8e3c5f6b0be752
Signed-off-by: Sanju Rakonde <srakonde>
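
To illustrate the double free described in the commit message, here is a minimal, self-contained C sketch. It uses a toy key/value holder rather than glusterd's real dict_t API, so names like toy_dict and print_status are hypothetical stand-ins for the dictionary and for glusterd_print_gsync_status():

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for glusterd's dict_t: the dict owns its values. */
struct toy_dict {
    char *value;             /* heap-allocated, owned by the dict */
};

static struct toy_dict *toy_dict_new(const char *v) {
    struct toy_dict *d = malloc(sizeof(*d));
    d->value = strdup(v);
    return d;
}

/* Analogous to dict_unref()/dict_destroy(): frees keys and values. */
static void toy_dict_unref(struct toy_dict *d) {
    free(d->value);
    free(d);
}

/* Analogous to glusterd_print_gsync_status(): reads values out of the
 * dict and prints them. The buggy variant also frees the value it
 * looked up, even though the dict still owns it. */
static void print_status(struct toy_dict *d, int buggy) {
    printf("status: %s\n", d->value);
    if (buggy)
        free(d->value);      /* BUG: caller does not own this memory */
}

int main(void) {
    struct toy_dict *d = toy_dict_new("Active");

    /* Fixed behaviour: the helper only reads; the dict frees once. */
    print_status(d, 0);
    toy_dict_unref(d);

    /* With buggy == 1 the value would be freed here and again inside
     * toy_dict_unref(), reproducing the "double free or corruption
     * (out)" abort seen in the backtrace above. */
    return 0;
}
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The upstream fix follows the same pattern: glusterd_print_gsync_status() no longer frees dictionary values, leaving the single dict_unref() in glusterd_print_gsync_status_by_vol() to release them.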

Comment 4 Worker Ant 2018-10-07 10:29:13 UTC
REVIEW: https://review.gluster.org/21359 (tests: add get-state command to test) posted (#1) for review on master by Sanju Rakonde

Comment 5 Worker Ant 2018-10-08 03:26:14 UTC
COMMIT: https://review.gluster.org/21359 committed in master by "Sanju Rakonde" <srakonde> with a commit message- tests: add get-state command to test

When a geo-replication session is running, run the
"gluster get-state" command to test it.

The patch at https://review.gluster.org/#/c/glusterfs/+/20461/
fixes the glusterd crash that occurs when we run the get-state
command with a geo-rep session configured.
Adding the test now.

Fixes: bz#1598345
Change-Id: I56283fba2c782f83669923ddfa4af3400255fed6
Signed-off-by: Sanju Rakonde <srakonde>

Comment 6 Shyamsundar 2018-10-23 15:13:25 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 7 Shyamsundar 2019-03-25 16:30:27 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 8 Sanju 2019-09-26 12:37:40 UTC
*** Bug 1626043 has been marked as a duplicate of this bug. ***

