
Bug 742431

Summary: modclusterd memory footprint is growing over time
Product: Red Hat Enterprise Linux 6
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: clustermon
Assignee: Jan Pokorný [poki] <jpokorny>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.1
CC: bbrock, c.handel, cluster-maint, edamato, james.brown, jpokorny, jwest, kabbott, rmunilla, rsteiger, sbradley, tao, uwe.knop
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: modcluster-0.16.2-16.el6
Doc Type: Bug Fix
Doc Text:
Cause
* trigger unknown, presumably uncommon event/attribute of the environment
Consequence
* outgoing queues in inter-nodes communication are growing over time
Fix
* better balanced inter-nodes communication + restriction of the queues
Result
* resources utilization kept at reasonable level
* possible queues interventions logged in /var/log/clumond.log
Clone Of: 618321
Last Closed: 2012-06-20 11:57:14 UTC
Bug Depends On: 618321
Bug Blocks: 756082
Attachments (all with flags: none):
[PATCH 1/6] fix bz742431: clarify recv/read_restart+send/write_restart
[PATCH 2/6] fix bz742431: introduce per-peer outgoing queue pruning
[PATCH 3/6] fix bz742431: limit peer's send() to one message only
[PATCH 4/6] fix bz742431: read all available with peer's receive()
[PATCH 5/6] fix bz742431: split+restructure poll handling in communicator
[PATCH 6/6] fix bz742431: turn off Nagle's alg. in peers' communication
bz742431: additional performance improvement patch [1/2]
bz742431: additional performance improvement patch [2/2]
bz742431: additional fix for a minor memory leak

Comment 1 Fabio Massimo Di Nitto 2011-09-30 06:00:25 UTC
According to:

https://www.redhat.com/archives/linux-cluster/2011-September/msg00067.html

this issue exists in RHEL6 too.

Comment 5 Jan Pokorný [poki] 2011-11-24 15:37:19 UTC
Created attachment 535954 [details]
[PATCH 1/6] fix bz742431: clarify recv/read_restart+send/write_restart

Comment 6 Jan Pokorný [poki] 2011-11-24 15:39:45 UTC
Created attachment 535955 [details]
[PATCH 2/6] fix bz742431: introduce per-peer outgoing queue pruning

Comment 7 Jan Pokorný [poki] 2011-11-24 15:40:45 UTC
Created attachment 535956 [details]
[PATCH 3/6] fix bz742431: limit peer's send() to one message only
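
The attachment itself is not reproduced here. As a rough illustration of what "limit peer's send() to one message only" could mean, the sketch below sends at most the message at the head of a peer's outgoing queue per call, so that one busy peer cannot monopolize a poll round. The Peer class, out_q and send_one names are hypothetical and are not taken from the modclusterd sources.

```python
import socket
from collections import deque


class Peer:
    """Hypothetical peer wrapper (assumption, not modclusterd's class)."""

    def __init__(self, sock):
        self.sock = sock        # expected to be a non-blocking socket
        self.out_q = deque()    # pending outgoing messages (bytes)

    def send_one(self):
        """Try to send only the head message; True if it went out whole."""
        if not self.out_q:
            return False
        msg = self.out_q[0]
        try:
            sent = self.sock.send(msg)
        except BlockingIOError:
            return False        # socket not writable right now
        if sent < len(msg):
            # Partial write: keep the unsent tail at the head of the queue.
            self.out_q[0] = msg[sent:]
            return False
        self.out_q.popleft()
        return True
```

A caller would invoke send_one() once per peer each time poll reports the peer's socket as writable, leaving the remaining messages for later rounds.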

Comment 8 Jan Pokorný [poki] 2011-11-24 15:42:05 UTC
Created attachment 535957 [details]
[PATCH 4/6] fix bz742431: read all available with peer's receive()
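
The counterpart idea in this patch title, reading everything that is available on the receiving side, can be sketched as a loop that drains a non-blocking socket until the kernel reports there is nothing left. This is only an illustration under that assumption; recv_all_available and its chunk parameter are made-up names, not the daemon's API.

```python
import socket


def recv_all_available(sock, chunk=4096):
    """Drain a non-blocking socket.

    Returns (data, peer_closed): all bytes readable right now, and
    whether the other end has closed the connection.
    """
    parts = []
    peer_closed = False
    while True:
        try:
            data = sock.recv(chunk)
        except BlockingIOError:
            break               # nothing more to read for now
        if not data:
            peer_closed = True  # orderly shutdown by the peer
            break
        parts.append(data)
    return b"".join(parts), peer_closed
```

Draining the input in one go keeps the sender's queue from backing up just because the receiver handled a single chunk per poll round.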

Comment 9 Jan Pokorný [poki] 2011-11-24 15:43:10 UTC
Created attachment 535959 [details]
[PATCH 5/6] fix bz742431: split+restructure poll handling in communicator

Comment 10 Jan Pokorný [poki] 2011-11-24 15:44:15 UTC
Created attachment 535961 [details]
[PATCH 6/6] fix bz742431: turn off Nagle's alg. in peers' communication
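
For reference, turning off Nagle's algorithm on a TCP socket is done with the standard TCP_NODELAY socket option. The snippet below is a minimal sketch of that one call in Python, not the patch itself; the disable_nagle helper name is an assumption made for illustration.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY so small writes are sent without coalescing delay."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


# The option can be set before the socket is connected.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(s)
nodelay_enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```

With Nagle's algorithm off, small inter-node messages go out immediately instead of waiting to be batched, which trades a few more packets for lower latency.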

Comment 11 Jan Pokorný [poki] 2011-11-24 15:46:20 UTC
Created attachment 535963 [details]
bz742431: additional performance improvement patch [1/2]

Comment 12 Jan Pokorný [poki] 2011-11-24 15:47:10 UTC
Created attachment 535964 [details]
bz742431: additional performance improvement patch [2/2]

Comment 14 Radek Steiger 2011-11-28 18:17:07 UTC
As per https://bugzilla.redhat.com/show_bug.cgi?id=618321#c75, acking this for QA using an artificial test as described there.

Comment 15 Jan Pokorný [poki] 2011-12-06 21:06:05 UTC
Created attachment 541598 [details]
bz742431: additional fix for a minor memory leak

Original patch (attachment 529083, accidentally posted to bug 618321
when it should have been posted here), revisited.

Recap: the leak was triggered by connections to /var/run/clumond.sock
       (2 B per connection IIRC, negligible next to the big memory issue)

Comment 18 Jan Pokorný [poki] 2012-04-27 13:58:05 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
* trigger unknown, presumably uncommon event/attribute of the environment
Consequence
* outgoing queues in inter-nodes communication are growing over time
Fix
* better balanced inter-nodes communication + restriction of the queues
Result
* resources utilization kept at reasonable level
* possible queues interventions logged in /var/log/clumond.log
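
The "restriction of the queues" and the logging of queue interventions described in the note above can be pictured roughly as follows. This is a sketch under stated assumptions only: the size limit, the drop-oldest policy, the logger name and the log message wording are all invented for illustration and do not reflect the actual modclusterd code or the real contents of /var/log/clumond.log.

```python
import logging
from collections import deque

# Hypothetical logger name; the real daemon logs to /var/log/clumond.log.
log = logging.getLogger("clumond-sketch")


class BoundedOutQueue:
    """Outgoing queue that never holds more than max_msgs messages."""

    def __init__(self, max_msgs=100):   # limit is an assumed value
        self.max_msgs = max_msgs
        self.q = deque()
        self.dropped = 0                # total messages pruned so far

    def push(self, msg):
        """Append msg, pruning the oldest entries if the limit is exceeded.

        Returns the number of messages pruned by this call.
        """
        self.q.append(msg)
        pruned = 0
        while len(self.q) > self.max_msgs:
            self.q.popleft()            # assumed policy: drop oldest first
            pruned += 1
        if pruned:
            self.dropped += pruned
            log.warning("outgoing queue pruned: %d message(s) dropped", pruned)
        return pruned
```

Capping the queue bounds the memory a stalled peer can consume, and logging each intervention leaves a trace that pruning took place.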

Comment 25 errata-xmlrpc 2012-06-20 11:57:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0750.html