Bug 1459895
| Field | Value |
|---|---|
| Summary | Brick Multiplexing: Gluster volume start force complains with command "Error : Request timed out" when there are multiple volumes |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | glusterd |
| Version | rhgs-3.3 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | unspecified |
| Reporter | Nag Pavan Chilakam <nchilaka> |
| Assignee | Sanju <srakonde> |
| QA Contact | Rajesh Madaka <rmadaka> |
| CC | amukherj, rhinduja, rhs-bugs, rmadaka, sheggodu, storage-qa-internal, vbellur |
| Keywords | ZStream |
| Target Release | RHGS 3.4.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | brick-multiplexing |
| Fixed In Version | glusterfs-3.12.2-1 |
| Doc Type | If docs needed, set a value |
| Clones | 1499509, 1526373 (view as bug list) |
| Last Closed | 2018-09-04 06:32:21 UTC |
| Type | Bug |
| Bug Depends On | 1499509 |
| Bug Blocks | 1503134, 1526373 |
**Description** (Nag Pavan Chilakam, 2017-06-08 13:20:15 UTC)
This only happens when the brick process is killed with SIGKILL, not SIGTERM. My guess is that because the brick process was killed with SIGKILL, the signal handler was never invoked, so the subsequent cleanup never ran; we ended up with a stale socket file, and that is why we see a constant series of stale disconnects (filed as a separate issue, bug 1459900). Looking at the process stacks below, thread 8 captures the stale disconnect, whereas thread 6 tries to attach to a brick, which is also wrong: we should have spawned a new brick process, given that no bricks are running. This happened because glusterd never received a pmap_signout event (the brick was killed abruptly), and since glusterd does not receive disconnect events for multiplexed bricks, brickinfo->status was never set to stopped.

```
Thread 8 (Thread 0x7f435db30700 (LWP 2934)):
#0  0x00007f4365ae1bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4366cb5e13 in __synclock_lock (lock=lock@entry=0x7f4366fdd7f8) at syncop.c:927
#2  0x00007f4366cb89d6 in synclock_lock (lock=lock@entry=0x7f4366fdd7f8) at syncop.c:953
#3  0x00007f4361c4967b in glusterd_big_locked_notify (rpc=0x7f4358062f90, mydata=0x7f435800e6f0, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7f4361c45f90 <__glusterd_brick_rpc_notify>) at glusterd-handler.c:68
#4  0x00007f4366a4cd2b in rpc_clnt_handle_disconnect (conn=0x7f4358062fc0, clnt=0x7f4358062f90) at rpc-clnt.c:876
#5  rpc_clnt_notify (trans=<optimized out>, mydata=0x7f4358062fc0, event=RPC_TRANSPORT_DISCONNECT, data=<optimized out>) at rpc-clnt.c:939
#6  0x00007f4366a49433 in rpc_transport_notify (this=this@entry=0x7f43581610a0, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f43581610a0) at rpc-transport.c:538
#7  0x00007f435f52942f in socket_event_poll_err (idx=<optimized out>, gen=<optimized out>, this=0x7f43581610a0) at socket.c:1206
#8  socket_event_handler (fd=6, idx=<optimized out>, gen=<optimized out>, data=0x7f43581610a0, poll_in=<optimized out>, poll_out=4, poll_err=16) at socket.c:2476
#9  0x00007f4366cd4a9d in event_dispatch_epoll_handler (event=0x7f435db2fe90, event_pool=0x13c7080) at event-epoll.c:572
#10 event_dispatch_epoll_worker (data=0x141f850) at event-epoll.c:648
#11 0x00007f4365adc5ba in start_thread () from /lib64/libpthread.so.0
#12 0x00007f43653b57cd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f435e331700 (LWP 2933)):
#0  0x00007f4365ae1bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4361cf0deb in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f4365adc5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f43653b57cd in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f43627ad700 (LWP 2805)):
#0  0x00007f436537b14d in nanosleep () from /lib64/libc.so.6
#1  0x00007f436537b09a in sleep () from /lib64/libc.so.6
#2  0x00007f4361c689ef in find_compat_brick_in_vol (conf=conf@entry=0x7f4366fb4050, srch_vol=srch_vol@entry=0x7f435801ab10, comp_vol=comp_vol@entry=0x7f4358133a30, brickinfo=brickinfo@entry=0x7f4358144230) at glusterd-utils.c:5236
#3  0x00007f4361c72e33 in find_compatible_brick (other_vol_p=<synthetic pointer>, brickinfo=0x7f4358144230, volinfo=0x7f4358133a30, conf=0x7f4366fb4050) at glusterd-utils.c:5302
#4  glusterd_brick_start (volinfo=volinfo@entry=0x7f4358133a30, brickinfo=brickinfo@entry=0x7f4358144230, wait=wait@entry=_gf_true) at glusterd-utils.c:5486
#5  0x00007f4361cdfce1 in glusterd_start_volume (volinfo=0x7f4358133a30, flags=<optimized out>, wait=wait@entry=_gf_true) at glusterd-volume-ops.c:2500
#6  0x00007f4361ce00b6 in glusterd_op_start_volume (dict=dict@entry=0x7f434c1810d0, op_errstr=op_errstr@entry=0x7f4350204e08) at glusterd-volume-ops.c:2604
#7  0x00007f4361d16a7b in gd_mgmt_v3_commit_fn (op=op@entry=GD_OP_START_VOLUME, dict=dict@entry=0x7f434c1810d0, op_errstr=op_errstr@entry=0x7f4350204e08, op_errno=op_errno@entry=0x7f4350204dfc, rsp_dict=rsp_dict@entry=0x7f434c092960) at glusterd-mgmt.c:319
#8  0x00007f4361d19854 in glusterd_mgmt_v3_commit (op=op@entry=GD_OP_START_VOLUME, op_ctx=op_ctx@entry=0x7f434c180570, req_dict=0x7f434c1810d0, op_errstr=op_errstr@entry=0x7f4350204e08, op_errno=op_errno@entry=0x7f4350204dfc, txn_generation=txn_generation@entry=0) at glusterd-mgmt.c:1524
#9  0x00007f4361d1b3c0 in glusterd_mgmt_v3_initiate_all_phases (req=req@entry=0x7f43500031d0, op=op@entry=GD_OP_START_VOLUME, dict=0x7f434c180570) at glusterd-mgmt.c:2141
#10 0x00007f4361cd9362 in __glusterd_handle_cli_start_volume (req=req@entry=0x7f43500031d0) at glusterd-volume-ops.c:551
#11 0x00007f4361c496ed in glusterd_big_locked_handler (req=0x7f43500031d0, actor_fn=0x7f4361cd9100 <__glusterd_handle_cli_start_volume>) at glusterd-handler.c:81
#12 0x00007f4366cb5d20 in synctask_wrap () at syncop.c:375
#13 0x00007f43652fb2c0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
```

We'd need to see how we can handle the disconnect path better for multiplexed bricks so that this scenario is covered, but certainly not in this release, as per our agreement.

**Comment #4** (Atin Mukherjee)

The timeout is seen because all the brickinfos are still marked as started, and while finding a compatible brick we wait for one of the brick processes to come up, which never happens here: we are only starting a single volume, and all the other bricks are down. This is because when a brick process goes down, only the brick for which the process was originally spawned gets an RPC disconnect notification and has its brickinfo->status marked as stopped; the other bricks which were attached to the same process remain in the started state.

While processing the disconnect event, if we can identify the list of bricks which are attached to the same process (probably through glusterd_brick_proc_t?) and mark the respective bricks as stopped, this issue can be avoided.
Samikshan - can this be implemented?

(In reply to Atin Mukherjee from comment #4)

> The time out is seen because of all the brickinfo's are still marked in started state and while finding a compatible brick, we're waiting for one of the brick process to come up which is never the case here as we are just starting a single volume and rest all bricks are down. This is because when a brick process goes down the brick for which the process came up for the first time gets a rpc disconnect notification and the brickinfo->status is marked as stopped, however for the other bricks which were attached to the same process continue to remain in the started state.
>
> While processing the disconnect event, if we can identify the list of bricks which are attached to the same process (probably through glusterd_brick_proc_t?) and mark the respective bricks as stopped, this issue can be avoided.
>
> Samikshan - can this be implemented?

This can be implemented.

Upstream patch: https://review.gluster.org/#/c/18444/

**Verification** (glusterfs-3.12.2-4):

- Tried the scenario as described above.
- Created 3 volumes, all 1x3, on a 3-node cluster.
- Killed a brick pid using kill -9 on one of the nodes.
- Then started one volume with the force option.
- Did not encounter "Error : Request timed out".
- Able to start the volume using "gluster vol start replica3_18 force".
- No "cleaning of dead brick" message appears in the logs.
- Moving this bug to the verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607