1898784 – Optimize friend handshake code to avoid call_bail in brick_mux environment

Summary: Optimize friend handshake code to avoid call_bail in brick_mux environment

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.5
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.5.z Batch Update 4
Assignee:	Mohit Agrawal
QA Contact:	milind

Reported:	2020-11-18 04:14 UTC by Mohit Agrawal
Modified:	2021-04-29 07:21 UTC
CC List:	10 users
Fixed In Version:	glusterfs-6.0-50
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-04-29 07:21:03 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2021:1462	0	None	None	None	2021-04-29 07:21:19 UTC

Description Mohit Agrawal 2020-11-18 04:14:32 UTC

During the glusterd handshake, glusterd receives a volume dictionary
from the peer and compares it with its own volume dictionary data. If the
options differ, it sets a key to mark that the volume options have changed
and calls the import synctask to delete/start the volume. In a brick_mux
environment with a high number of volumes (5k), the dict API calls in the
function glusterd_compare_friend_volume take a long time because the function
glusterd_handle_friend_req saves all peer volume data in a single dictionary.
Because of the time taken in glusterd_handle_friend, RPC requests receive
a call_bail from the peer end, and the gluster CLI is then unable to show
volume status.
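
The comparison step described above can be pictured with a small Python
sketch. It is an illustration only, not glusterd code: the nested-dict
model, the VolumeOptions alias and the changed_volumes helper are all
hypothetical names chosen for the example. It only shows the idea of
comparing the peer's per-volume options with the local ones and flagging
the volumes that would need an import.

```python
from typing import Dict, List

# Hypothetical in-memory model: volume name -> {option name -> value}.
# glusterd itself uses its own C dictionary type, not Python dicts.
VolumeOptions = Dict[str, Dict[str, str]]


def changed_volumes(local: VolumeOptions, peer: VolumeOptions) -> List[str]:
    """Return names of volumes whose options differ between local and peer."""
    changed = []
    for name, peer_opts in peer.items():
        # A volume that is unknown locally, or whose options differ,
        # is flagged so that it can be imported afterwards.
        if local.get(name) != peer_opts:
            changed.append(name)
    return changed


local = {"vol1": {"performance.open-behind": "on"},
         "vol2": {"performance.open-behind": "off"}}
peer = {"vol1": {"performance.open-behind": "off"},
        "vol2": {"performance.open-behind": "off"}}
print(changed_volumes(local, peer))  # ['vol1']
```

With thousands of volumes the cost of each lookup matters, which is why
storing everything in one large dictionary becomes the bottleneck described
in this report.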

Reproducer
1) Set up 5100 distributed volumes (3x1)
2) Enable brick_mux
3) Start all the volumes
4) Kill all gluster processes on the 3rd node
5) Run a loop to update a volume option on the 1st node
   for i in {1..5100}; do gluster v set vol$i performance.open-behind off; done
6) Start the glusterd process on the 3rd node
   Wait for the handshake to finish and check that there are no call_bail
   messages in the logs
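
The log check in step 6 could be scripted roughly as below. This is a
minimal sketch: find_call_bail and check_log are hypothetical helper names,
and the code simply searches for the string "call_bail" without assuming
any particular log line format.

```python
from typing import Iterable, List


def find_call_bail(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention call_bail."""
    return [line.rstrip("\n") for line in lines if "call_bail" in line]


def check_log(path: str) -> List[str]:
    """Scan one log file and return every line that mentions call_bail."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_call_bail(fh)
```

For example, check_log("/var/log/glusterfs/glusterd.log") should return an
empty list after a clean handshake; the log path here is an assumption and
may differ on a given installation.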

Comment 1 Mohit Agrawal 2020-11-18 04:16:19 UTC

The upstream patch is under review:
https://github.com/gluster/glusterfs/issues/1613

Comment 17 errata-xmlrpc 2021-04-29 07:21:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462
