Bug 1194166

Summary: epoll: configuration default for {client,server}.event-threads option not honoured
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: krishnan parthasarathi <kparthas>
Component: core
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: SATHEESARAN <sasundar>
Severity: unspecified
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: annair, kparthas, nsathyan, rcyriac, rhs-bugs, rjoseph, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.0.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.6.0.46-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-26 06:36:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1104462
Bug Blocks: 1182947

Description krishnan parthasarathi 2015-02-19 09:07:32 UTC
Description of problem:
Users can configure the number of poller worker threads for a volume via the following volume options.

client.event-threads - for GlusterFS native mount, gluster-NFS server and other glusterfs daemons that load the protocol/client translator.

server.event-threads - for brick processes
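
Both are ordinary volume options, so they can be tuned with the gluster CLI. A minimal example, where the volume name "testvol" is only a placeholder:

# gluster volume set testvol client.event-threads 4
# gluster volume set testvol server.event-threads 4

The configured values then show up under "Options Reconfigured" in the output of "gluster volume info testvol".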

Sub-problem 1:
--------------
The default for both of the above options is 2. In this release, we can observe that only one poller worker thread is waiting for network events by default.

Using the following bash function, we can verify the above claim. It doubles up as a means to verify the fix.

# Prints a line containing "SyS_epoll_wait+0xb5/0xe0"
# for every epoll thread running in a process.
# Usage: nepoll <pid>
function nepoll ()
{
    local pid=$1;
    for i in $(ls /proc/$pid/task);
    do
        grep epoll_wait /proc/$pid/task/$i/stack;
    done
}


With this helper you can count the number of epoll threads in a process
at any given time by running "# nepoll <pid> | wc -l".

Sample output:
[root@trantor codebase]# nepoll 3685
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0

Sub-problem 2:
--------------
After setting client.event-threads to 5 (for example), a volume reset does not bring the count back down to 2 (the default); it remains at 5.

Sample output:
[root@trantor codebase]# nepoll 3685
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
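
For reference, a minimal command sequence that exhibits this behaviour (the volume name "testvol" and <client-pid> are placeholders):

# gluster volume set testvol client.event-threads 5
# nepoll <client-pid> | wc -l
5
# gluster volume reset testvol client.event-threads
# nepoll <client-pid> | wc -l      # expected 2, but the count stays at 5
5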

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.44-1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 krishnan parthasarathi 2015-02-19 10:34:09 UTC
NB: The version number in the description should have been glusterfs-3.6.0.45-1.

Comment 4 SATHEESARAN 2015-03-05 11:23:14 UTC
By default, server.event-threads and client.event-threads are set to 2. That resolves sub-problem 1 as described in comment 0.

For sub-problem 2, as described in comment 0, resetting client.event-threads or server.event-threads does not reduce the number of epoll threads to the default of 2 unless I/O is performed on that mount.
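
A hedged illustration of that behaviour (the volume name "testvol", the mount point /mnt/testvol and <client-pid> are placeholders):

# gluster volume reset testvol client.event-threads
# nepoll <client-pid> | wc -l       # unchanged immediately after the reset
5
# dd if=/dev/zero of=/mnt/testvol/file bs=1M count=10
# nepoll <client-pid> | wc -l       # back to the default once some I/O has run
2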

Comment 5 SATHEESARAN 2015-03-05 11:24:22 UTC
Removing the needinfo flag, as it was set mistakenly.

Comment 6 SATHEESARAN 2015-03-11 07:56:17 UTC
Verified with glusterfs-3.6.0.48-1.el6rhs using the following steps:

1. Created a replicate volume and started it
2. FUSE-mounted the volume on a RHEL 6.5 client
3. Executed the script mentioned in comment 0 with the PID of the FUSE mount process
4. Executed the script mentioned in comment 0 with the PID of the brick process

Observed:
There are 2 epoll_wait threads in each process, which matches the default (a sketch of the check is shown below).
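
A sketch of what the check looks like in practice (the PID lookups shown are illustrative; PIDs can also be taken from "gluster volume status" or "ps"):

# nepoll $(pgrep -o glusterfsd) | wc -l     # oldest brick process
2
# nepoll <fuse-mount-pid> | wc -l           # native (FUSE) client
2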

Also executed the test cases in https://tcms.engineering.redhat.com/plan/17096/rhs-glusterfs-epoll-configuration

One problem was found: resetting {server,client}.event-threads does not reduce the epoll thread count back to 2 unless I/O is performed from the mount.
I have raised a separate bug - https://bugzilla.redhat.com/show_bug.cgi?id=1200679 - to track that issue.

Marking this bug as VERIFIED based on the above observations.

Comment 8 errata-xmlrpc 2015-03-26 06:36:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html