Bug 1327751
| Summary: | glusterd memory overcommit | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Dustin Black <dblack> |
| Component: | glusterd | Assignee: | Kaushal <kaushal> |
| Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.1 | CC: | amukherj, annair, asrivast, bkunal, bordas.csaba, bugs, kaushal, rcyriac, rhinduja, rhs-bugs, rnalakka, ryanlee, sankarshan, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | RHGS 3.1.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.9-4 | Doc Type: | Bug Fix |
| Doc Text: | When encrypted connections are enabled, a separate thread is used for each connection. These threads were not being cleaned up correctly after the connections ended, which led to a gradual increase in GlusterFS memory consumption over time, especially in the glusterd process. These disconnected threads are now cleaned up once a minute, avoiding the problem with memory usage. (See the sketch after this table for an illustration of the leaky pattern.) | | |
| Story Points: | --- | | |
| Clone Of: | 1268125 | Environment: | |
| Last Closed: | 2016-06-23 05:18:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1268125, 1331289 | | |
| Bug Blocks: | 1311817 | | |
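The Doc Text above describes the cause in prose. As a rough illustration only (a minimal standalone sketch, not GlusterFS source; every name in it is hypothetical), the problematic pattern is a joinable thread spawned per connection that nobody ever joins, so each dead thread's stack mapping stays alive and VIRT keeps growing:

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* One short-lived "connection" thread; exits almost immediately. */
static void *handle_connection(void *arg)
{
    (void)arg;
    /* ... serve one short-lived request, then disconnect ... */
    return NULL; /* the thread is dead, but not reaped until joined */
}

int main(void)
{
    /* Simulate 300 short-lived client connections (cf. the 300
     * mount/umount cycles used for verification below). */
    for (int i = 0; i < 300; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, handle_connection, NULL) != 0)
            break;
        /* BUG (the leak): neither pthread_join(t, NULL) nor
         * pthread_detach(t) is called, so the exited thread's stack
         * (8 MiB of VIRT by default on Linux) is never released. */
        usleep(10000);
    }
    printf("pid %d: inspect VIRT in top; it stays inflated\n",
           (int)getpid());
    pause();
    return 0;
}
```

Built with `cc -pthread`, watching this process in top shows VIRT growing by roughly one default stack size per iteration and never shrinking, which mirrors the inflated glusterd VIRT shown in the verification output below.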
Description
Dustin Black 2016-04-15 20:20:25 UTC
When using TLS, glusterfs launches a new thread for each connection. I suspect that the threads are not being reaped correctly, causing the memory allocation to balloon, particularly VIRT.

Testing has confirmed that the threads launched for TLS connections are not being reaped, leading to the gradual increase in memory used by GlusterD. This should happen with bricks as well, when using I/O encryption, but it is not as noticeable as in GlusterD. GlusterD gets a lot of short-lived connections from clients for volfile and portmap requests. This leaves a lot of dead threads in GlusterD waiting to be reaped (pthread_join'ed). We are still working out how to solve the thread-reaping problem correctly, as we do not see a straightforward method right away. We also have an ongoing discussion on the upstream mailing lists about TLS and threads in GlusterFS [1]. It covers the existing issues we found in the TLS implementation in GlusterFS, including the issue of thread reaping.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-April/049116.html

Upstream patch http://review.gluster.org/14101 posted for review.

Upstream mainline patch: http://review.gluster.org/14101
Release-3.7 patch: http://review.gluster.org/14143
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/73681

Verified this bug using the build "glusterfs-3.7.9-4". After mounting and unmounting a volume 300 times, the memory footprint with and without the fix is below.

Without fix:

```
PID   USER  PR  NI  VIRT     RES     SHR   S  %CPU  %MEM  TIME+    COMMAND
5257  root  20  0   41.496g  153400  4480  S  0.0   8.1   0:30.08  glusterd
```

With fix:

```
PID   USER  PR  NI  VIRT     RES    SHR   S  %CPU  %MEM  TIME+    COMMAND
4245  root  20  0   1459040  74808  4752  S  0.0   4.0   0:40.21  glusterd
```

With the fix, memory overcommit is not happening; moving to the verified state.

Changing Status to ON_QA for hotfix testing.

Moving this to the verified state as the issue is fixed in the downstream 3.1.3 build. Please use needinfo flags for tracking the progress of the hotfix.

I am not sure of the right candidate for needinfo. Setting it for Anoop. @Anoop: Please change it to the appropriate person if needed.

The customer has agreed to wait a few weeks until we release 3.1.3, so we do not require a hotfix.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
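For completeness, here is a minimal sketch of the periodic-reaping pattern the fix describes ("cleaned up once a minute"). This is not the actual GlusterFS change (that is the patch at http://review.gluster.org/14101), and all identifiers here are hypothetical: exiting connection threads put themselves on a dead list, and a single reaper thread joins everything on that list once a minute.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* A finished thread is queued here until the reaper joins it. */
struct dead_thread {
    pthread_t tid;
    struct dead_thread *next;
};

static pthread_mutex_t dead_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dead_thread *dead_list;

/* Called by a connection thread just before it returns. */
static void mark_self_dead(void)
{
    struct dead_thread *d = malloc(sizeof(*d));
    if (!d)
        return;
    d->tid = pthread_self();
    pthread_mutex_lock(&dead_lock);
    d->next = dead_list;
    dead_list = d;
    pthread_mutex_unlock(&dead_lock);
}

static void *connection_thread(void *arg)
{
    (void)arg;
    /* ... handle one encrypted connection until the peer disconnects ... */
    mark_self_dead();
    return NULL;
}

/* Wakes once a minute and joins (reaps) every finished thread,
 * releasing the stacks that previously accumulated indefinitely. */
static void *reaper_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(60);
        pthread_mutex_lock(&dead_lock);
        struct dead_thread *batch = dead_list;
        dead_list = NULL;
        pthread_mutex_unlock(&dead_lock);
        while (batch) {
            struct dead_thread *d = batch;
            batch = batch->next;
            pthread_join(d->tid, NULL); /* frees the dead thread's stack */
            free(d);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t reaper;
    if (pthread_create(&reaper, NULL, reaper_thread, NULL) != 0)
        return 1;
    for (int i = 0; i < 300; i++) { /* simulate short-lived connections */
        pthread_t t;
        if (pthread_create(&t, NULL, connection_thread, NULL) != 0)
            break;
    }
    pause(); /* within a minute, VIRT settles as dead threads are reaped */
    return 0;
}
```

The real patch applies this idea inside GlusterFS's transport code rather than in a main loop, but the ownership rule is the same: exactly one thread is responsible for eventually pthread_join()ing every per-connection thread that was created joinable.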