Bug 1655827 - [Glusterd]: Glusterd crash while expanding volumes using heketi
Summary: [Glusterd]: Glusterd crash while expanding volumes using heketi
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On: 1652466
Blocks:
 
Reported: 2018-12-04 03:24 UTC by Sanju
Modified: 2020-01-09 17:21 UTC
CC List: 12 users

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1652466
Environment:
Last Closed: 2019-03-25 16:32:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21762 0 None Merged glusterd: migrating rebalance commands to mgmt_v3 framework 2018-12-18 04:02:41 UTC

Description Sanju 2018-12-04 03:24:41 UTC
+++ This bug was initially created as a clone of Bug #1652466 +++

Description of problem:
=======================
Core generated:
--------------
-libs-2.02.180-10.el7_6.2.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 8 (Thread 0x7f52674e5700 (LWP 4136)):
#0  0x00007f5270d4dd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5271f26178 in syncenv_task (proc=proc@entry=0x55c43087f630) at syncop.c:603
#2  0x00007f5271f27040 in syncenv_processor (thdata=0x55c43087f630) at syncop.c:695
#3  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5262210700 (LWP 4618)):
#0  0x00007f5270d4d965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5266a329bb in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5267ce6700 (LWP 4135)):
#0  0x00007f5270d4dd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5271f26178 in syncenv_task (proc=proc@entry=0x55c43087f270) at syncop.c:603
#2  0x00007f5271f27040 in syncenv_processor (thdata=0x55c43087f270) at syncop.c:695
#3  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5268ce8700 (LWP 4133)):
#0  0x00007f5270d51361 in sigwait () from /lib64/libpthread.so.0
#1  0x000055c42f15052b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2137
#2  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f52684e7700 (LWP 4134)):
#0  0x00007f52705d8e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f52705d8cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f5271f1350d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f52694e9700 (LWP 4132)):
#0  0x00007f5270d50e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5271ef8c96 in gf_timer_proc (data=0x55c43087ea50) at timer.c:174
#2  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f52723d0780 (LWP 4130)):
#0  0x00007f5270d4af47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5271f48e78 in event_dispatch_epoll (event_pool=0x55c430877210) at event-epoll.c:746
#2  0x000055c42f14d247 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2550

Thread 1 (Thread 0x7f5261a0f700 (LWP 4619)):
#0  0x00007f5271f1318d in __gf_free (free_ptr=0x7f524c00efe0) at mem-pool.c:315
#1  0x00007f5271ee18cd in data_destroy (data=<optimized out>) at dict.c:227
#2  0x00007f5271ee51a9 in dict_get_str (this=<optimized out>, key=<optimized out>, str=<optimized out>) at dict.c:2398
#3  0x00007f52669b0b3e in glusterd_volume_rebalance_use_rsp_dict (aggr=0x7f524c01b2b0, rsp_dict=0x7f525400b6b0) at glusterd-utils.c:10951
#4  0x00007f52669c3c0c in __glusterd_commit_op_cbk (req=req@entry=0x7f5254008160, iov=iov@entry=0x7f52540081a0, count=count@entry=1, myframe=myframe@entry=0x7f525400c200) at glusterd-rpc-ops.c:1443
#5  0x00007f52669c560a in glusterd_big_locked_cbk (req=0x7f5254008160, iov=0x7f52540081a0, count=1, myframe=0x7f525400c200, fn=0x7f52669c3590 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:223
#6  0x00007f5271cb2960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x55c43094a9f0, pollin=pollin@entry=0x7f525400e210) at rpc-clnt.c:778
#7  0x00007f5271cb2d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55c43094aa20, event=<optimized out>, data=0x7f525400e210) at rpc-clnt.c:971
#8  0x00007f5271caea73 in rpc_transport_notify (this=this@entry=0x55c43094ac20, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f525400e210) at rpc-transport.c:538
#9  0x00007f52639a6576 in socket_event_poll_in (this=this@entry=0x55c43094ac20, notify_handled=<optimized out>) at socket.c:2322
#10 0x00007f52639a8b1c in socket_event_handler (fd=14, idx=3, gen=1, data=0x55c43094ac20, poll_in=1, poll_out=0, poll_err=0) at socket.c:2474
#11 0x00007f5271f48844 in event_dispatch_epoll_handler (event=0x7f5261a0ee80, event_pool=0x55c430877210) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x55c43094a350) at event-epoll.c:659
#13 0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f5270611ead in clone () from /lib64/libc.so.6
(gdb) quit



glusterd log file:
------------------
[2018-11-22 07:27:47.174562] W [glusterd-locks.c:622:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xeb67f) [0x7f5266a3967f] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xeacbd) [0x7f5266a38cbd] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe9d06) [0x7f5266a37d06] ) 0-management: Lock for vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 held by 8fb5f147-402f-4e20-9d62-2dd35c48ae59
[2018-11-22 07:27:47.174583] E [MSGID: 106119] [glusterd-locks.c:430:glusterd_mgmt_v3_lock_entity] 0-management: Failed to acquire lock for vol vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 on behalf of f76f399b-cc3a-4f12-80b8-39b1794836e9.
[2018-11-22 07:27:47.174597] E [MSGID: 106146] [glusterd-locks.c:524:glusterd_multiple_mgmt_v3_lock] 0-management: Unable to lock all vol
[2018-11-22 07:27:47.174610] E [MSGID: 106119] [glusterd-mgmt.c:721:glusterd_mgmt_v3_initiate_lockdown] 0-management: Failed to acquire mgmt_v3 locks on localhost
[2018-11-22 07:27:47.174624] E [MSGID: 106120] [glusterd-mgmt.c:2188:glusterd_mgmt_v3_initiate_all_phases] 0-management: mgmt_v3 lockdown failed.
[2018-11-22 07:27:52.069017] I [glusterd-locks.c:732:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
The message "I [MSGID: 106495] [glusterd-handler.c:3152:__glusterd_handle_getwd] 0-glusterd: Received getwd req" repeated 31 times between [2018-11-22 07:27:35.357344] and [2018-11-22 07:29:28.920574]
[2018-11-22 07:29:34.611260] I [MSGID: 106495] [glusterd-handler.c:3152:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2018-11-22 07:29:46.927667] I [MSGID: 106482] [glusterd-brick-ops.c:448:__glusterd_handle_add_brick] 0-management: Received add brick req
[2018-11-22 07:29:47.065273] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a445) [0x7f5266988445] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe44bd) [0x7f5266a324bd] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f5271f3a225] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2018-11-22 07:29:47.065375] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2018-11-22 07:29:47.088692] I [glusterd-utils.c:6327:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_d0dc6c236d9c156b03a03b103a2796f9/brick_fb0c0d7df83567feb320da855cef32dd/brick
[2018-11-22 07:29:47.131032] I [MSGID: 106143] [glusterd-pmap.c:282:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_d0dc6c236d9c156b03a03b103a2796f9/brick_fb0c0d7df83567feb320da855cef32dd/brick on port 49156



Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp35-226 /]# rpm -qa | grep gluster
glusterfs-libs-3.12.2-27.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-27.el7rhgs.x86_64
gluster-block-0.2.1-28.el7rhgs.x86_64
glusterfs-cli-3.12.2-27.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-27.el7rhgs.x86_64
glusterfs-server-3.12.2-27.el7rhgs.x86_64
glusterfs-rdma-3.12.2-27.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-fuse-3.12.2-27.el7rhgs.x86_64
python2-gluster-3.12.2-27.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-27.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.2.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-api-3.12.2-27.el7rhgs.x86_64
glusterfs-3.12.2-27.el7rhgs.x86_64




Steps to Reproduce:
===================
1. Create a pvc - 1x3
2. Edit the pvc to expand it from 1x3 to 2x3
3. Edit the pvc again to expand it from 2x3 to 3x3 --> failed


Actual results:
==============
Glusterd crashed on one node, volume expansion failed, core was generated


Expected results:
================
There should be no crash and volume expansion should succeed 


Will attach sosreports.


--- Additional comment from Sanju on 2018-11-22 15:37:49 IST ---

Rochelle,

Can you please let us know the time of the crash? That will really help us match it with the logs.

Thanks,
Sanju

--- Additional comment from Rochelle on 2018-11-22 15:56:29 IST ---

Hi Sanju,

The time of the crash was : Nov 22 13:00

Additional info :
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f5271f1318d in __gf_free (free_ptr=0x7f524c00efe0) at mem-pool.c:315
315	        GF_ASSERT (GF_MEM_TRAILER_MAGIC ==


Thanks,
Rochelle

--- Additional comment from Atin Mukherjee on 2018-11-22 17:54:12 IST ---

Nithya has agreed to take a look at the core and we have asked the system details to be passed over an email.

--- Additional comment from Sweta Anandpara on 2018-11-22 18:25:42 IST ---

Hey Karthick, have we seen/hit something like this in 3.11.1 testing so far?

--- Additional comment from krishnaram Karthick on 2018-11-22 23:33:11 IST ---

(In reply to Sweta Anandpara from comment #6)
> Hey Karthick, have we seen/hit something like this in 3.11.1 testing so far?

The expansion tests have not yet been run on 3.11.1. If we had hit it, there would be a bug already :)

--- Additional comment from Nithya Balachandran on 2018-11-23 09:37:11 IST ---

<snip>

Steps to Reproduce:
===================
1. Create a pvc - 1x3
2. Edit the pvc to expand it from 1x3 to 2x3
3. Edit the pvc again to expand it from 2x3 to 3x3 --> failed

</snip>


What were the exact commands run?
What errors do you see on the terminal?

--- Additional comment from Rochelle on 2018-11-23 11:01:41 IST ---

Hi Nithya, 

Yes these were the commands that were run. 

I executed these with oc create commands. 
The pvc was getting created and the first expansion happened without an issue. (1x3 -> 2x3)
There were no errors seen on the terminal. 
When I checked gluster peer status I saw that glusterd crashed on one of the nodes in use.

Also Nithya, I will need my system as early as possible since I have only one CRS set up and will be blocked until then. 

Regards,
Rochelle

--- Additional comment from Nithya Balachandran on 2018-11-23 11:59:16 IST ---

(In reply to Rochelle from comment #9)
> Hi Nithya, 
> 
> Yes these were the commands that were run. 
> 
> I executed these with oc create commands. 
> The pvc was getting created and the first expansion happened without an
> issue. (1x3 -> 2x3)
> There were no errors seen on the terminal. 
> When I checked gluster peer status I saw that glusterd crashed on one of the
> nodes in use.
> 
> Also Nithya, I will need my system as early as possible since I have only
> one CRS set up and will be blocked until then. 
> 
> Regards,
> Rochelle

I mean what were the exact commands run - the description only states what was done.

--- Additional comment from Nithya Balachandran on 2018-11-23 12:01:19 IST ---


> Also Nithya, I will need my system as early as possible since I have only
> one CRS set up and will be blocked until then. 
> 


As long as the server is not updated with a newer build and the core is not deleted, you can continue your tests. My looking at the core should not prevent the tests from being run.

--- Additional comment from Rochelle on 2018-11-23 12:44:49 IST ---

(In reply to Nithya Balachandran from comment #10)
> (In reply to Rochelle from comment #9)
> > Hi Nithya, 
> > 
> > Yes these were the commands that were run. 
> > 
> > I executed these with oc create commands. 
> > The pvc was getting created and the first expansion happened without an
> > issue. (1x3 -> 2x3)
> > There were no errors seen on the terminal. 
> > When I checked gluster peer status I saw that glusterd crashed on one of the
> > nodes in use.
> > 
> > Also Nithya, I will need my system as early as possible since I have only
> > one CRS set up and will be blocked until then. 
> > 
> > Regards,
> > Rochelle
> 
> I mean what were the exact commands run - the description only states what
> was done.


For a pvc we need a storage class. There was already a storage class present in my system that I was using. (I will add the attachment - glusterfs-storageClass.sh script)

For creating a pvc, there's a script (attached - pvc-create.sh)


For expanding a pvc, some prerequisites :
1. get into /etc/origin/master/master-config on the OpenShift master
2. If the feature gates ExpandPersistentVolumes, and the admissionconfig PersistentVolumeClaimResize are not enabled, then edit the master.conf file 
(I will attach this file too)
3. restart Openshift master: 

 /usr/local/bin/master-restart api; /usr/local/bin/master-restart controllers


Steps 1-3 are done only once in the beginning and don't have to be repeated every time the volume is to be expanded.



I ran this : sh pvc-create.sh slave 10

This created a 10 GB volume. 
For expansion I have to edit the pvc: (changed the storage to 15Gi)

oc edit pvc slave --> opens the file and I edit the storage field under spec as shown:
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi 

When I edited the file again, the edit was successful but the volume failed to expand.

--- Additional comment from Rochelle on 2018-11-23 12:46 IST ---



--- Additional comment from Rochelle on 2018-11-23 12:47 IST ---



--- Additional comment from Rochelle on 2018-11-23 12:47 IST ---



--- Additional comment from Rochelle on 2018-11-23 12:48:52 IST ---

(In reply to Nithya Balachandran from comment #11)
> > Also Nithya, I will need my system as early as possible since I have only
> > one CRS set up and will be blocked until then. 
> > 
> 
> 
> As long as the server is not updated with a newer build and the core is not
> deleted, you can continue your tests. My looking at the core should not
> prevent the tests from being run.

Noted Nithya, I'll proceed then.
Thank you.

--- Additional comment from Nithya Balachandran on 2018-11-23 14:42:26 IST ---

Initial RCA based on my understanding of glusterd:
--------------------------------------------------

(gdb) bt
#0  0x00007f5271f1318d in __gf_free (free_ptr=0x7f524c00efe0) at mem-pool.c:315
#1  0x00007f5271ee18cd in data_destroy (data=<optimized out>) at dict.c:227
#2  0x00007f5271ee51a9 in dict_get_str (this=<optimized out>, key=<optimized out>, str=<optimized out>) at dict.c:2398
#3  0x00007f52669b0b3e in glusterd_volume_rebalance_use_rsp_dict (aggr=0x7f524c01b2b0, rsp_dict=0x7f525400b6b0) at glusterd-utils.c:10951
#4  0x00007f52669c3c0c in __glusterd_commit_op_cbk (req=req@entry=0x7f5254008160, iov=iov@entry=0x7f52540081a0, count=count@entry=1, 
    myframe=myframe@entry=0x7f525400c200) at glusterd-rpc-ops.c:1443
#5  0x00007f52669c560a in glusterd_big_locked_cbk (req=0x7f5254008160, iov=0x7f52540081a0, count=1, myframe=0x7f525400c200, 
    fn=0x7f52669c3590 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:223
#6  0x00007f5271cb2960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x55c43094a9f0, pollin=pollin@entry=0x7f525400e210) at rpc-clnt.c:778
#7  0x00007f5271cb2d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55c43094aa20, event=<optimized out>, data=0x7f525400e210) at rpc-clnt.c:971
#8  0x00007f5271caea73 in rpc_transport_notify (this=this@entry=0x55c43094ac20, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f525400e210)
    at rpc-transport.c:538
#9  0x00007f52639a6576 in socket_event_poll_in (this=this@entry=0x55c43094ac20, notify_handled=<optimized out>) at socket.c:2322
#10 0x00007f52639a8b1c in socket_event_handler (fd=14, idx=3, gen=1, data=0x55c43094ac20, poll_in=1, poll_out=0, poll_err=0) at socket.c:2474
#11 0x00007f5271f48844 in event_dispatch_epoll_handler (event=0x7f5261a0ee80, event_pool=0x55c430877210) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x55c43094a350) at event-epoll.c:659
#13 0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f5270611ead in clone () from /lib64/libc.so.6


(gdb) f 0
#0  0x00007f5271f1318d in __gf_free (free_ptr=0x7f524c00efe0) at mem-pool.c:315
315	        GF_ASSERT (GF_MEM_TRAILER_MAGIC ==
(gdb) 


(gdb) p  *(uint32_t *)((char *)free_ptr + header->size)
Cannot access memory at address 0xfea49800fb10

(gdb) p/x *header
$15 = {type = 0x4c01ab70, size = 0x7f524c000b30, mem_acct = 0x55c430893d60, magic = 0xcafebabe, padding = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
    0x0}}
(gdb) 

Size and type are wrong. mem_acct seems ok. The incorrect value for size causes the process to try to access invalid memory and crash.


(gdb) f 2
#2  0x00007f5271ee51a9 in dict_get_str (this=<optimized out>, key=<optimized out>, str=<optimized out>) at dict.c:2398
2398	                data_unref (data);
(gdb) p/x *data
$35 = {is_static = 0x0, is_const = 0x0, len = 0xbabababa, data = 0x7f524c00efe0, refcount = 0x0, lock = {spinlock = 0x0, mutex = {__data = {__lock = 0x0, __count = 0x0, __owner = 0x0, __nusers = 0x0, __kind = 0xffffffff, __spins = 0x0, __elision = 0x0, __list = {__prev = 0x0, __next = 0x0}}, __size = {  0x0 <repeats 16 times>, 0xff, 0xff, 0xff, 0xff, 0x0 <repeats 20 times>}, __align = 0x0}}}
(gdb) 

data->len is 0xbabababa and data->refcount is 0. This means that data has already been freed. As this was accessed after taking a ref, this should not happen unless a ref was taken on the data after it was already destroyed.
As dict_get_str does not use data->len, it crashes only when trying to free data.

(gdb) x/s data->data
0x7f524c00efe0:	"vol_app-storage_slave_0eff73a0-ed6d-11e8-b813-52540018d110"

It looks like data originally held a volume name. However, the latest messages in glusterd.log refer to a different volume (vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110). I do not know if they map to the pvc expansion operations performed.


[2018-11-22 07:29:49.339801] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe49fa) [0x7f5266a329fa] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe44bd) [0x7f5266a324bd] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f5271f3a225] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/post/S13create-subdir-mounts.sh --volname=vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2018-11-22 07:29:50.171157] I [MSGID: 106539] [glusterd-utils.c:12168:glusterd_generate_and_set_task_id] 0-management: Generated task-id 7d94b8da-9b2c-4430-a8fd-6e1c0117865c for key rebalance-id
[2018-11-22 07:29:55.185936] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
The message "I [MSGID: 106495] [glusterd-handler.c:3152:__glusterd_handle_getwd] 0-glusterd: Received getwd req" repeated 4 times between [2018-11-22 07:29:34.611260] and [2018-11-22 07:29:55.195696]



Getting back to the core:

 

(gdb) f 3
#3  0x00007f52669b0b3e in glusterd_volume_rebalance_use_rsp_dict (aggr=0x7f524c01b2b0, rsp_dict=0x7f525400b6b0) at glusterd-utils.c:10951
10951	        ret = dict_get_str (ctx_dict, "volname", &volname);
(gdb) p *ctx_dict
$37 = {is_static = 0 '\000', hash_size = 1, count = 6, refcount = 0, members = 0x7f524c01b308, members_list = 0x7f524c050f30, extra_free = 0x0, extra_stdfree = 0x0,   lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0,  __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, members_internal = 0x7f524c050f30, 
  free_pair = {hash_next = 0x0, prev = 0x7f524c01da40, next = 0x0, value = 0x7f524c00f660, key = 0x7f524c004d20 "originator_uuid", key_hash = 1606637616}, 
  free_pair_in_use = _gf_true, max_count = 6}



It looks like ctx_dict has already been freed (refcount is 0) which is what caused the process to crash. 


When I checked on my test setup, the ctx_dict (same as aggr) for a rebalance start command shows me the following:


(gdb) pdict aggr
[1](rebalance-id::e63b846e-e6f0-4050-a589-958c202383dd)
[2](vol-id::03a6c664-85f9-4d28-a412-278cf172f9be)
[3](commit-hash::3753705314)
[4](originator_uuid::#ced6d6f6e648a290f9cbf2caed)
[5](transaction_id::98478d0364045c494d878ad57f8a2)
[6](node-uuid::#ced6d6f6e648a290f9cbf2caed)
[7](volname::vol1)
[8](rebalance-command::1)
[9](cmd-str::v rebalance vol1 start)
Done



Checking ctx_dict for clues just in case the data has not been overwritten:

(gdb) pdict ctx_dict
[1](cmd-str::)
[2](cmd::)
[3](volname::)
[4](sync-mgmt-operation::)
[5](transaction_id::)
[6](originator_uuid::)
Done

Checking the cmd-str:


(gdb) p *ctx_dict->members_list
$96 = {hash_next = 0x7f524c0128e0, prev = 0x0, next = 0x7f524c0128e0, value = 0x7f524c00f480, key = 0x7f524c04e110 "cmd-str", key_hash = 3482475205}
(gdb) p *(data_t*)0x7f524c00f480
$97 = {is_static = 0 '\000', is_const = 0 '\000', len = -1162167622, 
  data = 0x7f524c01abc0 "volume status vol_app-storage_slave_0eff73a0-ed6d-11e8-b813-52540018d110 detail", refcount = 0, lock = {spinlock = 0, mutex = {__data = {
        __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}}
(gdb) 

This maps to the command:
"volume status vol_app-storage_slave_0eff73a0-ed6d-11e8-b813-52540018d110 detail"

So the ctx_dict does not map to a rebalance command.


From the glusterd.log just before the crash:
[2018-11-22 07:30:00.241009] E [glusterd-rpc-ops.c:1431:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90) [0x7f5271cb2960] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x7760a) [0x7f52669c560a] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x759d3) [0x7f52669c39d3] ) 0-: Assertion failed: rsp.op == txn_op_info.op


Checking the rsp.op and txn_op_info.op:


(gdb) p/d GD_OP_REBALANCE
$75 = 19
(gdb) p rsp.op
$76 = 19
(gdb) p txn_op_info.op
$77 = 18
(gdb) p/d GD_OP_STATUS_VOLUME
$81 = 18


The txn_op_info clearly does not match the request being handled. This confirms that glusterd is operating on the wrong txn info.


(gdb) f 4
#4  0x00007f52669c3c0c in __glusterd_commit_op_cbk (req=req@entry=0x7f5254008160, iov=iov@entry=0x7f52540081a0, count=count@entry=1, 
    myframe=myframe@entry=0x7f525400c200) at glusterd-rpc-ops.c:1443
1443	
(gdb) p/x *txn_id
$39 = {0x44, 0xb6, 0x69, 0xfd, 0x53, 0x55, 0x4a, 0x49, 0xbd, 0xba, 0x98, 0x55, 0x15, 0x6e, 0x1e, 0x9}
(gdb) 



Checking the value of dict:


(gdb) f 4
#4  0x00007f52669c3c0c in __glusterd_commit_op_cbk (req=req@entry=0x7f5254008160, iov=iov@entry=0x7f52540081a0, count=count@entry=1, 
    myframe=myframe@entry=0x7f525400c200) at glusterd-rpc-ops.c:1443
1443	                        ret = glusterd_volume_rebalance_use_rsp_dict (txn_op_info.op_ctx, dict);

(gdb) p *dict
$71 = {is_static = 0 '\000', hash_size = 1, count = 1, refcount = 1, members = 0x7f525400b708, members_list = 0x7f525400b710, extra_free = 0x0, 
  extra_stdfree = 0x7f5254004a70 "", lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
        __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, members_internal = 0x7f525400b710, free_pair = {
    hash_next = 0x0, prev = 0x0, next = 0x0, value = 0x7f5254025e60, key = 0x7f5254011430 "transaction_id", key_hash = 3395357142}, free_pair_in_use = _gf_true, 
  max_count = 1}

The only data is the transaction_id.

(gdb) p/x *(uuid_t*)(dict->members_list->value->data)
$82 = {0x44, 0xb6, 0x69, 0xfd, 0x53, 0x55, 0x4a, 0x49, 0xbd, 0xba, 0x98, 0x55, 0x15, 0x6e, 0x1e, 0x9}


This matches the txn_id from the frame->cookie in __glusterd_commit_op_cbk.




Checking the value of transaction-id in ctx_dict just in case (this may not be correct if the data struct was reused, so please don't pay too much attention to this):


(gdb) f 3
#3  0x00007f52669b0b3e in glusterd_volume_rebalance_use_rsp_dict (aggr=0x7f524c01b2b0, rsp_dict=0x7f525400b6b0) at glusterd-utils.c:10951
10951	        ret = dict_get_str (ctx_dict, "volname", &volname);
(gdb) p * aggr->members_internal->next->next->next->next
$61 = {hash_next = 0x7f524c01b310, prev = 0x7f524c053d20, next = 0x7f524c01b310, value = 0x7f524c024ee0, key = 0x7f524c00e280 "transaction_id", key_hash = 3395357142}
(gdb) p/x *(uuid_t*)0x7f524c024ee0
$62 = {0x0, 0x0, 0x0, 0x0, 0xba, 0xba, 0xba, 0xba, 0xa0, 0x71, 0x1, 0x4c, 0x52, 0x7f, 0x0, 0x0}
(gdb) p/x *(data_t*)0x7f524c024ee0
$63 = {is_static = 0x0, is_const = 0x0, len = 0xbabababa, data = 0x7f524c0171a0, refcount = 0x0, lock = {spinlock = 0x0, mutex = {__data = {__lock = 0x0, 
        __count = 0x0, __owner = 0x0, __nusers = 0x0, __kind = 0xffffffff, __spins = 0x0, __elision = 0x0, __list = {__prev = 0x0, __next = 0x0}}, __size = {
        0x0 <repeats 16 times>, 0xff, 0xff, 0xff, 0xff, 0x0 <repeats 20 times>}, __align = 0x0}}}
(gdb) p/x *(uuid_t*)0x7f524c0171a0
$64 = {0x3d, 0x52, 0x6f, 0xb2, 0x40, 0xbb, 0x41, 0x13, 0xb6, 0xc9, 0xae, 0x77, 0x5, 0x55, 0x56, 0xa8}


The value in ctx_dict does not match the txn_id for the rebalance operation.


The glusterd team needs to see why glusterd is using the wrong txn info here. Assigning this to Atin for now.

--- Additional comment from Nithya Balachandran on 2018-11-23 19:09:23 IST ---

The following code in glusterd_clear_txn_opinfo () looks problematic :

        if (txn_op_info.op_ctx)                                                 
                dict_unref (txn_op_info.op_ctx);     <--- At this point the txn_op_info with txn_id is still present and accessible.                           
                                                                                
        dict_del(priv->glusterd_txn_opinfo, uuid_utoa (*txn_id));               
    


This means that a thread can access the txn_op_info but the op_ctx will have been freed by this time.


There is also no ref taken on txn_op_info or txn_op_info.op_ctx to prevent it from being deleted while it is being accessed.

--- Additional comment from Sweta Anandpara on 2018-11-28 10:51:14 IST ---

Subsequent test runs have hit this issue only once so far, and hence it was decided not to propose it as a blocker for the current release - BU2.

@Atin, could someone please take a look at why this is happening? An initial RCA would help us take an informed call.

--- Additional comment from Atin Mukherjee on 2018-11-28 12:04:31 IST ---

The initial RCA has already been captured, thanks to Nithya for that work. When two op-sm transactions collide, for some reason one transaction overrides the other, and we end up processing a response under the wrong transaction: a response meant for rebalance was handled against a volume status transaction. I'm still doing a thorough code reading to understand where this bug is; as of now there is no conclusive evidence. We'd definitely like to address this in the next release, either by (a) fixing this bug or (b) moving both the rebalance and profile commands to the mgmt_v3 framework.

Comment 1 Worker Ant 2018-12-04 03:28:37 UTC
REVIEW: https://review.gluster.org/21762 (glusterd: migrating rebalance commands to mgmt_v3 framework) posted (#2) for review on master by Sanju Rakonde

Comment 2 Worker Ant 2018-12-18 04:02:40 UTC
REVIEW: https://review.gluster.org/21762 (glusterd: migrating rebalance commands to mgmt_v3 framework) posted (#11) for review on master by Atin Mukherjee

Comment 3 Shyamsundar 2019-03-25 16:32:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

