Bug 618321
Summary: | modclusterd memory footprint is growing over time | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Shane Bradley <sbradley> | |
Component: | clustermon | Assignee: | Jan Pokorný [poki] <jpokorny> | |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 5.5 | CC: | bbrock, c.handel, cluster-maint, djansa, edamato, james.brown, jwest, kabbott, rdassen, rmunilla, rsteiger, tao, uwe.knop, vleduc | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | modcluster-0.12.1-7.el5 | Doc Type: | Bug Fix | |
Doc Text: |
Cause
* trigger unknown; presumably an uncommon event/attribute of the environment
Consequence
* outgoing queues in inter-node communication grow over time
Fix
* better-balanced inter-node communication + restriction of the queue lengths
Result
* resource utilization kept at a reasonable level
* possible queue interventions logged in /var/log/clumond.log
|
Story Points: | --- | |
Clone Of: | ||||
: | 742431 (view as bug list) | Environment: | ||
Last Closed: | 2012-02-21 06:49:32 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 742431, 758797 | |||
Attachments: |
Description
Shane Bradley
2010-07-26 16:42:02 UTC
*** Bug 607602 has been marked as a duplicate of this bug. ***

Created attachment 458835 [details]
sosreport
Anything of interest from the tests that were left running over the weekend?

Cheers,
Karl Abbott, RHCE
Technical Account Manager

No. The engineering team was not able to reproduce any memory leakage. I've asked eng to work with SEG to see if they can reproduce this, but as of now we can't find where the problem might be.

One possible cause is a scheduling issue. The modclusterd program runs in the "Other" (SCHED_OTHER) scheduling class; if there is a scheduling issue, this may well improve or go away by moving the process into one of the other scheduling queues. A program I wrote a long time ago can be used to switch the priority queue of modclusterd. It is located here:

    http://people.redhat.com/lhh/prio.tar.gz

Compile it (cd prio; make), and you can then change modclusterd's scheduling queue to RR by running:

    ./prio set `pidof modclusterd` rr 1

or (for SCHED_FIFO):

    ./prio set `pidof modclusterd` fifo 1

To check the priority of modclusterd:

    ./prio `pidof modclusterd`

To set it back to normal, run:

    ./prio set `pidof modclusterd` other 0

Created attachment 534468 [details]
[PATCH 2/6] fix bz618321: introduce per-peer outgoing queue pruning

There is a new "_prune_peer_queues" attribute serving as a flag that marks per-peer outgoing queues as eligible for pruning [*]. It is _set_ in the "update" method called by the Monitor every ca. 5 seconds (its intentional iteration period). The flag is _cleared_ after every iteration of appending a particular XML status update ("message") from the global queue to the peer-local queues, i.e., the first such iteration may prune the queues if the flag was previously set. This (at least partially) ensures the queues do not accumulate indefinitely under jarring conditions.
As the bool value is switched "atomically" from our point of view and, in addition, we do not require absolute synchronization, accesses to the flag are not guarded by a mutex (pro: no blocking), leading to "react immediately (in the mentioned loops)" behavior. Additionally, merge "update_peers" and "send" into a single method, as they are always used together (this also avoids splitting the mutex usage).

[*] by pruning, I mean "keep a possibly half-processed XML status update in, but drop any subsequent ones"

Created attachment 534469 [details]
[PATCH 3/6] fix bz618321: limit peer's send() to one message only

The commented-out assert is meant as a note that such a condition should always hold (contrary to the previous explicit check, which was a no-op anyway).

Created attachment 534470 [details]
[PATCH 4/6] fix bz618321: read all available with peer's receive()

The preprocessor conditional is kept for an easy switch when needed.

Created attachment 534471 [details]
[PATCH 5/6] fix bz618321: split+restructure poll handling in communicator

The old structure:
1. server socket
   - POLLIN
   - POLLERR | POLLHUP | POLLNVAL
2. client sockets
   - POLLIN or POLLERR | POLLHUP | POLLNVAL or POLLOUT

The new structure:
1. server socket -> handle_server_socket()
   - POLLIN (accept)
   - POLLERR | POLLHUP | POLLNVAL
2. client sockets -> handle_client_socket()
   - POLLERR | POLLNVAL
   - POLLIN
   - POLLOUT
   - POLLHUP

Now it is worthwhile to change "poll_data[i].events = POLLOUT" to "poll_data[i].events |= POLLOUT", as these events are no longer treated as mutually exclusive (one pass can cover them both). Also, for client sockets' POLLIN, add an optimization so that _delivery_point.msg_arrived() is called only for the last received message (if any) rather than for all of them (see the in-code comment). The preprocessor conditional is kept for an easy switch when needed. Additionally, use "const" for method arguments where desirable and handle previously suppressed exceptions.

Created attachment 534472 [details]
[PATCH 6/6] fix bz618321: turn off Nagle's alg. in peers' communication

The reason behind this is that we send whole messages (cluster XML updates) and want to achieve immediate transport to the other peer, which conversely wants to read the whole message. Also expose the respective Socket methods, as in the "nonblocking" case, and remove a duplicate "nonblocking" setting (it will be set in Peer's constructor anyway).

Created attachment 534482 [details]
[PATCH 1/6] fix bz618321: clarify recv/read_restart+send/write_restart

In fact, {read,write}_restart will never return -EAGAIN/-EWOULDBLOCK (and never did before).

Patches from the cloned bug 742431 (RHEL 6) also apply:
* attachment 535963 [details]: bz742431: additional performance improvement patch [1/2]
* attachment 535964 [details]: bz742431: additional performance improvement patch [2/2]

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
* trigger unknown; presumably an uncommon event/attribute of the environment
Consequence
* outgoing queues in inter-node communication grow over time
Fix
* better-balanced inter-node communication + restriction of the queue lengths
Result
* resource utilization kept at a reasonable level
* possible queue interventions logged in /var/log/clumond.log

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0292.html

*** Bug 1219866 has been marked as a duplicate of this bug. ***
|