Bug 1194166

Summary: epoll: configuration default for {client,server}.event-threads option not honoured
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: krishnan parthasarathi <kparthas>
Component: core
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: SATHEESARAN <sasundar>
Severity: unspecified
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: annair, kparthas, nsathyan, rcyriac, rhs-bugs, rjoseph, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.0.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.6.0.46-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-26 06:36:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1104462
Bug Blocks: 1182947

Description krishnan parthasarathi 2015-02-19 09:07:32 UTC
Description of problem:
Users can configure the number of poller worker threads for a volume via the following volume options.

client.event-threads - for GlusterFS native mount, gluster-NFS server and other glusterfs daemons that load the protocol/client translator.

server.event-threads - for brick processes
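
Both are ordinary volume options, so they can be tuned with the gluster CLI. A minimal example, where the volume name "testvol" is only a placeholder:

# gluster volume set testvol client.event-threads 4
# gluster volume set testvol server.event-threads 4

The configured values then show up under "Options Reconfigured" in the output of "gluster volume info testvol".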

Sub-problem 1:
--------------
The default for both of the above options is 2. In this release, we can observe that only one poller worker thread is waiting for network events by default.

Using the following bash function, we can verify the above claim. It doubles up as a means to verify the fix.

# Prints a line containing "SyS_epoll_wait+0xb5/0xe0"
# for every epoll thread running in a process.
# Usage: nepoll <pid>
function nepoll ()
{
    local pid=$1;
    for i in $(ls /proc/$pid/task);
    do
        grep epoll_wait /proc/$pid/task/$i/stack;
    done
}


With this helper you can count the number of epoll threads in a process
at any given time by running "# nepoll <pid> | wc -l".

Sample output:
[root@trantor codebase]# nepoll 3685
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0

Sub-problem 2:
--------------
After setting client.event-threads to 5 (for example), a volume reset does not bring the count back down to 2 (the default); it remains at 5.

Sample output:
[root@trantor codebase]# nepoll 3685
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
[<ffffffff81208c25>] SyS_epoll_wait+0xb5/0xe0
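
For reference, a minimal command sequence that exhibits this behaviour (the volume name "testvol" and <client-pid> are placeholders):

# gluster volume set testvol client.event-threads 5
# nepoll <client-pid> | wc -l
5
# gluster volume reset testvol client.event-threads
# nepoll <client-pid> | wc -l      # expected 2, but the count stays at 5
5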

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.44-1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 krishnan parthasarathi 2015-02-19 10:34:09 UTC
NB: The version number in the description should have been glusterfs-3.6.0.45-1.

Comment 4 SATHEESARAN 2015-03-05 11:23:14 UTC
By default, server.event-threads and client.event-threads are set to 2. That resolves sub-problem 1 as described in comment 0.

For sub-problem 2, as described in comment 0, resetting client.event-threads or server.event-threads does not reduce the number of epoll threads to the default of 2 unless I/O is performed on that mount.
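
A hedged illustration of that behaviour (the volume name "testvol", the mount point /mnt/testvol and <client-pid> are placeholders):

# gluster volume reset testvol client.event-threads
# nepoll <client-pid> | wc -l       # unchanged immediately after the reset
5
# dd if=/dev/zero of=/mnt/testvol/file bs=1M count=10
# nepoll <client-pid> | wc -l       # back to the default once some I/O has run
2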

Comment 5 SATHEESARAN 2015-03-05 11:24:22 UTC
Removing the needinfo flag, as it was set mistakenly.

Comment 6 SATHEESARAN 2015-03-11 07:56:17 UTC
Verified with glusterfs-3.6.0.48-1.el6rhs using the following steps:

1. Created a replicate volume and started it
2. FUSE-mounted the volume on a RHEL 6.5 client
3. Executed the script mentioned in comment 0 with the PID of the FUSE mount process
4. Executed the script mentioned in comment 0 with the PID of the brick process

Observed:
There are 2 epoll_wait threads in each process, which matches the default (a sketch of the check is shown below).
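
A sketch of what the check looks like in practice (the PID lookups shown are illustrative; PIDs can also be taken from "gluster volume status" or "ps"):

# nepoll $(pgrep -o glusterfsd) | wc -l     # oldest brick process
2
# nepoll <fuse-mount-pid> | wc -l           # native (FUSE) client
2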

Also executed the test cases in https://tcms.engineering.redhat.com/plan/17096/rhs-glusterfs-epoll-configuration

One problem was found: resetting {server,client}.event-threads does not reduce the epoll thread count back to 2 unless I/O is performed from the mount.
I have raised a separate bug - https://bugzilla.redhat.com/show_bug.cgi?id=1200679 - to track that issue.

Marking this bug as VERIFIED based on the above observations.

Comment 8 errata-xmlrpc 2015-03-26 06:36:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html