Bug 1652466 - [Glusterd]: Glusterd crash while expanding volumes using heketi
Summary: [Glusterd]: Glusterd crash while expanding volumes using heketi
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 3
Assignee: Atin Mukherjee
QA Contact: Rochelle
URL:
Whiteboard:
Depends On:
Blocks: 1655827
 
Reported: 2018-11-22 07:47 UTC by Rochelle
Modified: 2019-03-12 09:17 UTC
CC: 12 users

Fixed In Version: glusterfs-3.12.2-33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1655827
Environment:
Last Closed: 2019-02-04 07:41:26 UTC
Target Upstream Version:


Attachments
Glusterfs-storage class (763 bytes, text/plain), 2018-11-23 07:16 UTC, Rochelle
To create a pvc (421 bytes, text/plain), 2018-11-23 07:17 UTC, Rochelle
master-config.yaml (5.85 KB, text/plain), 2018-11-23 07:17 UTC, Rochelle


Links
Red Hat Bugzilla 1651547 (last updated 2021-01-20 06:05:38 UTC)
Red Hat Product Errata RHBA-2019:0263 (last updated 2019-02-04 07:41:37 UTC)

Internal Links: 1651547

Description Rochelle 2018-11-22 07:47:25 UTC
Description of problem:
=======================
Core generated:
--------------
-libs-2.02.180-10.el7_6.2.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 8 (Thread 0x7f52674e5700 (LWP 4136)):
#0  0x00007f5270d4dd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5271f26178 in syncenv_task (proc=proc@entry=0x55c43087f630) at syncop.c:603
#2  0x00007f5271f27040 in syncenv_processor (thdata=0x55c43087f630) at syncop.c:695
#3  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5262210700 (LWP 4618)):
#0  0x00007f5270d4d965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5266a329bb in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5267ce6700 (LWP 4135)):
#0  0x00007f5270d4dd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5271f26178 in syncenv_task (proc=proc@entry=0x55c43087f270) at syncop.c:603
#2  0x00007f5271f27040 in syncenv_processor (thdata=0x55c43087f270) at syncop.c:695
#3  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5268ce8700 (LWP 4133)):
#0  0x00007f5270d51361 in sigwait () from /lib64/libpthread.so.0
#1  0x000055c42f15052b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2137
#2  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f52684e7700 (LWP 4134)):
#0  0x00007f52705d8e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f52705d8cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f5271f1350d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f52694e9700 (LWP 4132)):
#0  0x00007f5270d50e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5271ef8c96 in gf_timer_proc (data=0x55c43087ea50) at timer.c:174
#2  0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5270611ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f52723d0780 (LWP 4130)):
#0  0x00007f5270d4af47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5271f48e78 in event_dispatch_epoll (event_pool=0x55c430877210) at event-epoll.c:746
#2  0x000055c42f14d247 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2550

Thread 1 (Thread 0x7f5261a0f700 (LWP 4619)):
#0  0x00007f5271f1318d in __gf_free (free_ptr=0x7f524c00efe0) at mem-pool.c:315
#1  0x00007f5271ee18cd in data_destroy (data=<optimized out>) at dict.c:227
#2  0x00007f5271ee51a9 in dict_get_str (this=<optimized out>, key=<optimized out>, str=<optimized out>) at dict.c:2398
#3  0x00007f52669b0b3e in glusterd_volume_rebalance_use_rsp_dict (aggr=0x7f524c01b2b0, rsp_dict=0x7f525400b6b0) at glusterd-utils.c:10951
#4  0x00007f52669c3c0c in __glusterd_commit_op_cbk (req=req@entry=0x7f5254008160, iov=iov@entry=0x7f52540081a0, count=count@entry=1, myframe=myframe@entry=0x7f525400c200) at glusterd-rpc-ops.c:1443
#5  0x00007f52669c560a in glusterd_big_locked_cbk (req=0x7f5254008160, iov=0x7f52540081a0, count=1, myframe=0x7f525400c200, fn=0x7f52669c3590 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:223
#6  0x00007f5271cb2960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x55c43094a9f0, pollin=pollin@entry=0x7f525400e210) at rpc-clnt.c:778
#7  0x00007f5271cb2d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55c43094aa20, event=<optimized out>, data=0x7f525400e210) at rpc-clnt.c:971
#8  0x00007f5271caea73 in rpc_transport_notify (this=this@entry=0x55c43094ac20, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f525400e210) at rpc-transport.c:538
#9  0x00007f52639a6576 in socket_event_poll_in (this=this@entry=0x55c43094ac20, notify_handled=<optimized out>) at socket.c:2322
#10 0x00007f52639a8b1c in socket_event_handler (fd=14, idx=3, gen=1, data=0x55c43094ac20, poll_in=1, poll_out=0, poll_err=0) at socket.c:2474
#11 0x00007f5271f48844 in event_dispatch_epoll_handler (event=0x7f5261a0ee80, event_pool=0x55c430877210) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x55c43094a350) at event-epoll.c:659
#13 0x00007f5270d49dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f5270611ead in clone () from /lib64/libc.so.6
(gdb) quit



glusterd log file:
------------------
[2018-11-22 07:27:47.174562] W [glusterd-locks.c:622:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xeb67f) [0x7f5266a3967f] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xeacbd) [0x7f5266a38cbd] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe9d06) [0x7f5266a37d06] ) 0-management: Lock for vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 held by 8fb5f147-402f-4e20-9d62-2dd35c48ae59
[2018-11-22 07:27:47.174583] E [MSGID: 106119] [glusterd-locks.c:430:glusterd_mgmt_v3_lock_entity] 0-management: Failed to acquire lock for vol vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 on behalf of f76f399b-cc3a-4f12-80b8-39b1794836e9.
[2018-11-22 07:27:47.174597] E [MSGID: 106146] [glusterd-locks.c:524:glusterd_multiple_mgmt_v3_lock] 0-management: Unable to lock all vol
[2018-11-22 07:27:47.174610] E [MSGID: 106119] [glusterd-mgmt.c:721:glusterd_mgmt_v3_initiate_lockdown] 0-management: Failed to acquire mgmt_v3 locks on localhost
[2018-11-22 07:27:47.174624] E [MSGID: 106120] [glusterd-mgmt.c:2188:glusterd_mgmt_v3_initiate_all_phases] 0-management: mgmt_v3 lockdown failed.
[2018-11-22 07:27:52.069017] I [glusterd-locks.c:732:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
The message "I [MSGID: 106495] [glusterd-handler.c:3152:__glusterd_handle_getwd] 0-glusterd: Received getwd req" repeated 31 times between [2018-11-22 07:27:35.357344] and [2018-11-22 07:29:28.920574]
[2018-11-22 07:29:34.611260] I [MSGID: 106495] [glusterd-handler.c:3152:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2018-11-22 07:29:46.927667] I [MSGID: 106482] [glusterd-brick-ops.c:448:__glusterd_handle_add_brick] 0-management: Received add brick req
[2018-11-22 07:29:47.065273] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a445) [0x7f5266988445] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe44bd) [0x7f5266a324bd] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f5271f3a225] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=vol_app-storage_newslave_7c882699-ee27-11e8-b813-52540018d110 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2018-11-22 07:29:47.065375] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2018-11-22 07:29:47.088692] I [glusterd-utils.c:6327:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_d0dc6c236d9c156b03a03b103a2796f9/brick_fb0c0d7df83567feb320da855cef32dd/brick
[2018-11-22 07:29:47.131032] I [MSGID: 106143] [glusterd-pmap.c:282:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_d0dc6c236d9c156b03a03b103a2796f9/brick_fb0c0d7df83567feb320da855cef32dd/brick on port 49156



Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp35-226 /]# rpm -qa | grep gluster
glusterfs-libs-3.12.2-27.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-27.el7rhgs.x86_64
gluster-block-0.2.1-28.el7rhgs.x86_64
glusterfs-cli-3.12.2-27.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-27.el7rhgs.x86_64
glusterfs-server-3.12.2-27.el7rhgs.x86_64
glusterfs-rdma-3.12.2-27.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-fuse-3.12.2-27.el7rhgs.x86_64
python2-gluster-3.12.2-27.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-27.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.2.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-api-3.12.2-27.el7rhgs.x86_64
glusterfs-3.12.2-27.el7rhgs.x86_64




Steps to Reproduce:
===================
1. Create a PVC backed by a 1x3 gluster volume.
2. Edit the PVC to expand the volume from 1x3 to 2x3.
3. Edit the PVC again to expand it from 2x3 to 3x3 --> this step failed.
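For reference, the steps above can be driven from the OpenShift CLI roughly as follows. This is a sketch only: the PVC name and sizes are illustrative placeholders, not taken from this report; the actual storage class and PVC definitions used are in the attachments.

```shell
# Assumes a gluster-backed storage class with volume expansion enabled;
# "app-pvc" and the requested sizes are made-up placeholders.
oc get pvc app-pvc   # initially provisioned as a 1x3 volume
oc patch pvc app-pvc -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'  # 1x3 -> 2x3
oc patch pvc app-pvc -p '{"spec":{"resources":{"requests":{"storage":"3Gi"}}}}'  # 2x3 -> 3x3, glusterd crashed here
```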


Actual results:
==============
Glusterd crashed on one node, the volume expansion failed, and a core was generated.


Expected results:
================
Glusterd should not crash, and the volume expansion should succeed.


Will attach sosreports.

Comment 13 Rochelle 2018-11-23 07:16:52 UTC
Created attachment 1508217 [details]
Glusterfs-storage class

Comment 14 Rochelle 2018-11-23 07:17:20 UTC
Created attachment 1508218 [details]
To create a pvc

Comment 15 Rochelle 2018-11-23 07:17:56 UTC
Created attachment 1508219 [details]
master-config.yaml

Comment 22 Sanju 2018-12-04 11:19:44 UTC
upstream patch: https://review.gluster.org/#/c/glusterfs/+/21762/

Comment 26 Sanju 2018-12-18 09:36:01 UTC
downstream patch: https://code.engineering.redhat.com/gerrit/158917

Comment 27 Atin Mukherjee 2018-12-18 12:31:24 UTC
(In reply to Sanju from comment #26)
> downstream patch: https://code.engineering.redhat.com/gerrit/158917

One more downstream only patch https://code.engineering.redhat.com/gerrit/#/c/158943/ required.

Comment 50 errata-xmlrpc 2019-02-04 07:41:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263

