Red Hat Bugzilla – Bug 618321
modclusterd memory footprint is growing over time
Last modified: 2016-04-26 09:58:21 EDT
Description of problem:
On certain machines the customer is seeing the modclusterd process consume and hold on to large portions of memory. This appears to be a memory leak of some sort. The customer has seen this on a couple of machines and uses modclusterd for SNMP monitoring. Here is example output of what they are seeing:

$ cat uptime
 15:25:46 up 69 days, 9:53, 3 users, load average: 1.05, 0.25, 0.08
$ grep modclusterd ps
USER  PID  %CPU %MEM     VSZ    RSS TTY STAT START   TIME COMMAND
root 7422   0.4 41.3 4394244 720792  ?  S<sl May11 487:50 modclusterd

The only way to release this memory is a hard kill (-9) of the modclusterd process. This should be done before using the init scripts, because the init scripts fail to shut down the process.

Version-Release number of selected component (if applicable):
modcluster-0.12.1-2.el5-x86_64

How reproducible:
Not easily. I do believe that SNMP might be involved in this.

Steps to Reproduce:
1. Enable modclusterd
2. Monitor the memory footprint of modclusterd

Actual results:
The amount of memory that modclusterd holds grows very large over time.

Expected results:
The amount of memory that modclusterd holds should be small and stable.

Additional info:
Reviewed some graphs for this modclusterd process and it does show a linear consumption of memory.
*** Bug 607602 has been marked as a duplicate of this bug. ***
Created attachment 458835 [details] sosreport
Anything of interest from the tests that were left running over the weekend? Cheers, Karl Abbott, RHCE Technical Account Manager
No. The engineering team was not able to reproduce any memory leakage. I've asked eng to work with SEG to see if they can reproduce this, but as of now we can't find where the problem might be.
One possible problem is a scheduling issue. The modclusterd program runs in the default SCHED_OTHER scheduling class; if there is a scheduling issue, this may well improve or go away after moving the process into one of the other scheduling classes. A program I wrote a long time ago can be used to switch the scheduling class of modclusterd. It is located here:

http://people.redhat.com/lhh/prio.tar.gz

Compile it (cd prio; make), and you can then change modclusterd's scheduling class to RR by running:

./prio set `pidof modclusterd` rr 1

or (for SCHED_FIFO):

./prio set `pidof modclusterd` fifo 1

To check the priority of modclusterd:

./prio `pidof modclusterd`

To set it back to normal, run:

./prio set `pidof modclusterd` other 0
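For reference, here is a minimal stand-alone sketch of what such a switch boils down to underneath, via the sched_setscheduler(2) syscall. This is only an illustrative assumption of what prio does, not its actual source:

// Illustrative sketch (assumed behaviour of the prio tool, not its source):
// move an already-running process, e.g. modclusterd, into SCHED_RR priority 1.
// Requires root. Build with: g++ -o setrr setrr.cpp
#include <sched.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char **argv)
{
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t pid = static_cast<pid_t>(std::atol(argv[1]));

    sched_param sp;
    std::memset(&sp, 0, sizeof(sp));
    sp.sched_priority = 1;                               // lowest real-time priority

    if (sched_setscheduler(pid, SCHED_RR, &sp) != 0) {   // SCHED_FIFO works the same way
        std::perror("sched_setscheduler");
        return 1;
    }
    std::printf("pid %ld moved to SCHED_RR/1\n", static_cast<long>(pid));
    return 0;
}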
Created attachment 534468 [details] [PATCH 2/6] fix bz618321: introduce per-peer outgoing queue pruning

There is a new "_prune_peer_queues" attribute serving as a flag that marks the per-peer outgoing queues as eligible for pruning [*]. It is _set_ in the "update" method called by the Monitor every ca. 5 seconds (its intentional iteration period). The flag is _cleared_ after every iteration of appending a particular XML status update ("message") from the global queue to the peer-local queues, i.e., the first such iteration may lead to pruning the queues if the flag was previously set. This (at least partially) ensures the queues do not accumulate infinitely under jarring conditions.

As the bool value is switched "atomically" from our point of view and, in addition, we do not require absolute synchronization, accesses to the flag are not guarded by a mutex (pro: no blocking), leading to a "react immediately (in the mentioned loops)" behavior.

Additionally, merge "update_peers" and "send" into a single method as these are always used together (this also avoids splitting the mutex usage).

[*] by pruning, I mean "keep a possibly half-processed XML status update in, but drop any subsequent ones"
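For illustration, a minimal sketch of the pruning idea; the Communicator/Peer layout, the distribute() method and the queue types are assumptions made for this example and do not match the modcluster sources one-to-one:

// Sketch of per-peer outgoing queue pruning via a plain bool flag.
// Names and structure are illustrative, not taken from the modcluster code base.
#include <deque>
#include <map>
#include <string>

struct Peer {
    std::deque<std::string> out_queue;   // per-peer outgoing XML status updates
};

class Communicator {
public:
    // called by the Monitor roughly every 5 seconds
    void update(const std::string& msg) {
        _global_queue.push_back(msg);
        _prune_peer_queues = true;       // mark peer queues eligible for pruning
    }

    // called when distributing the global queue to the per-peer queues
    void distribute() {
        for (auto& kv : _peers) {
            Peer& p = kv.second;
            if (_prune_peer_queues && p.out_queue.size() > 1) {
                // keep a possibly half-processed message, drop the backlog behind it
                p.out_queue.erase(p.out_queue.begin() + 1, p.out_queue.end());
            }
            for (const auto& msg : _global_queue)
                p.out_queue.push_back(msg);
        }
        _global_queue.clear();
        _prune_peer_queues = false;      // cleared after one distribution pass
    }

private:
    std::map<std::string, Peer> _peers;
    std::deque<std::string> _global_queue;
    // intentionally not mutex-guarded: the flag flips "atomically" for our
    // purposes and absolute synchronization is not required
    volatile bool _prune_peer_queues = false;
};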
Created attachment 534469 [details] [PATCH 3/6] fix bz618321: limit peer's send() to one message only

The commented-out assert is meant as a note that such a condition should always hold (contrary to the previous explicit check, which was a no-op anyway).
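To illustrate the intent, a rough sketch of a send step limited to one queued message per call; send_one() and the deque-based queue are made up for this example, the real Peer::send() differs:

// Illustrative sketch: transmit at most one queued message per call so a
// single peer cannot monopolize the communicator loop.
#include <sys/socket.h>
#include <deque>
#include <string>

ssize_t send_one(int fd, std::deque<std::string>& out_queue)
{
    if (out_queue.empty())
        return 0;                               // nothing queued for this peer

    std::string& msg = out_queue.front();
    ssize_t ret = ::send(fd, msg.data(), msg.size(), MSG_NOSIGNAL);

    if (ret == static_cast<ssize_t>(msg.size()))
        out_queue.pop_front();                  // fully sent, drop it
    else if (ret > 0)
        msg.erase(0, static_cast<size_t>(ret)); // short write: keep the remainder
    // on ret < 0 the caller inspects errno (EAGAIN means "try on next POLLOUT")
    return ret;
}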
Created attachment 534470 [details] [PATCH 4/6] fix bz618321: read all available with peer's receive()

The preprocessor conditional is kept so the behavior can easily be switched back when needed.
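A sketch of the "read everything currently available" behavior on a nonblocking socket; receive_all() is a made-up helper for this example, not the actual Peer::receive():

// Illustrative sketch: drain everything currently readable from a nonblocking
// peer socket in one go.
#include <sys/socket.h>
#include <cerrno>
#include <string>

// Appends all readable bytes to 'buf'; returns false if the peer hung up
// or a hard error occurred, true otherwise.
bool receive_all(int fd, std::string& buf)
{
    char chunk[4096];
    for (;;) {
        ssize_t n = ::recv(fd, chunk, sizeof(chunk), 0);
        if (n > 0) {
            buf.append(chunk, static_cast<size_t>(n));
            continue;                           // more may still be pending
        }
        if (n == 0)
            return false;                       // orderly shutdown by the peer
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return true;                        // drained everything for now
        if (errno == EINTR)
            continue;                           // interrupted, just retry
        return false;                           // genuine error
    }
}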
Created attachment 534471 [details] [PATCH 5/6] fix bz618321: split+restructure poll handling in communicator

The old structure:
1. server socket
   - POLLIN
   - POLLERR | POLLHUP | POLLNVAL
2. client sockets
   - POLLIN or POLLERR | POLLHUP | POLLNVAL or POLLOUT

The new structure:
1. server socket -> handle_server_socket()
   - POLLIN (accept)
   - POLLERR | POLLHUP | POLLNVAL
2. client sockets -> handle_client_socket()
   - POLLERR | POLLNVAL
   - POLLIN
   - POLLOUT
   - POLLHUP

Now it is worth changing "poll_data[i].events = POLLOUT" to "poll_data[i].events |= POLLOUT", as these events are no longer treated as mutually exclusive (one pass can cover them both).

Also, for the client sockets' POLLIN handling, add an optimization so that _delivery_point.msg_arrived() is called only for the last received message (if any) and not for all of them (see the in-code comment).

The preprocessor conditional is kept so the behavior can easily be switched back when needed.

Additionally, use "const" for method arguments where desirable and handle previously suppressed exceptions.
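Roughly, the new dispatch looks like this. The handler names follow the description above, but the bodies and the surrounding loop are illustrative stubs, not the actual communicator code:

// Illustrative sketch of the restructured poll dispatch.
#include <poll.h>
#include <vector>

static void handle_server_socket(pollfd& pfd)
{
    if (pfd.revents & POLLIN) {
        // accept() the new peer connection here
    }
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
        // recover the listening socket
    }
}

static void handle_client_socket(pollfd& pfd)
{
    if (pfd.revents & (POLLERR | POLLNVAL)) { /* drop the peer */ return; }
    if (pfd.revents & POLLIN)  { /* read all available, deliver last message */ }
    if (pfd.revents & POLLOUT) { /* send one message from the peer's queue */ }
    if (pfd.revents & POLLHUP) { /* peer hung up, drop it */ }
}

void communicator_iteration(std::vector<pollfd>& poll_data, bool want_write)
{
    // request write readiness without clobbering POLLIN and friends
    for (size_t i = 1; i < poll_data.size(); ++i)
        if (want_write)
            poll_data[i].events |= POLLOUT;   // was: poll_data[i].events = POLLOUT

    if (poll(poll_data.data(), poll_data.size(), 500) <= 0)
        return;                               // timeout or poll error

    handle_server_socket(poll_data[0]);       // slot 0: listening socket
    for (size_t i = 1; i < poll_data.size(); ++i)
        handle_client_socket(poll_data[i]);   // slots 1..n: connected peers
}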
Created attachment 534472 [details] [PATCH 6/6] fix bz618321: turn off Nagle's alg. in peers' communication

The reason behind this is that we send whole messages (cluster XML updates) and want them transported to the other peer immediately, as that peer conversely wants to read the whole message.

Also expose the respective Socket methods, as in the "nonblocking" case. Also remove the duplicate "nonblocking" setting (it will be set in Peer's constructor anyway).
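For reference, disabling Nagle's algorithm on a connected TCP socket is a single setsockopt() call (minimal sketch; the actual Socket method in modcluster wraps this differently):

// Illustrative sketch: push each whole XML status update to the wire
// immediately instead of letting the kernel coalesce small writes.
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

bool set_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}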
Created attachment 534482 [details] [PATCH 1/6] fix bz618321: clarify recv/read_restart+send/write_restart

In fact, {read,write}_restart will never return -EAGAIN/-EWOULDBLOCK (and never did before).
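A minimal sketch of the {read,write}_restart semantics as clarified here, assuming the helpers simply retry on EINTR and hand any other failure back as a negative errno (signatures are illustrative, not copied from the sources):

// Illustrative sketch: restart the syscall on EINTR only.
// With blocking descriptors, -EAGAIN/-EWOULDBLOCK cannot occur here,
// matching the note above.
#include <unistd.h>
#include <cerrno>

ssize_t read_restart(int fd, void* buf, size_t len)
{
    for (;;) {
        ssize_t n = ::read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno != EINTR)
            return -errno;
    }
}

ssize_t write_restart(int fd, const void* buf, size_t len)
{
    for (;;) {
        ssize_t n = ::write(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno != EINTR)
            return -errno;
    }
}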
Patches with cloned bug 742431 (RHEL 6) also apply: * attachment 535963 [details]: bz742431: additional performance improvement patch [1/2] * attachment 535964 [details]: bz742431: additional performance improvement patch [2/2]
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
* trigger unknown, presumably an uncommon event/attribute of the environment
Consequence
* outgoing queues in inter-node communication grow over time
Fix
* better balanced inter-node communication + restriction of the queues
Result
* resource utilization kept at a reasonable level
* possible queue interventions logged in /var/log/clumond.log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0292.html
*** Bug 1219866 has been marked as a duplicate of this bug. ***