| Summary: | corosync memory footprint increases on every node rejoin | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jaroslav Kortus <jkortus> |
| Component: | corosync | Assignee: | Jan Friesse <jfriesse> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Priority: | medium |
| Version: | 6.8 | CC: | ccaulfie, cluster-maint, jkortus, jruemker |
| Target Milestone: | rc | Doc Type: | Bug Fix |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | corosync-1.4.7-5.el6 | Type: | Bug |
| Last Closed: | 2016-05-10 19:43:04 UTC | Bug Blocks: | 1306349 |

Doc Text:
Cause: A user rejoins a node to the cluster.
Consequence: Some buffers in corosync are not freed, so memory consumption grows with every rejoin.
Fix: Make sure all buffers are freed.
Result: No memory is leaked.

Description
Jaroslav Kortus
2016-02-05 17:46:12 UTC
Jaroslav, can you please attach the config file and corosync.log from both nodes? Is this behavior new in 1.4.7-4, or was it also present in 1.4.7-2? Can you try corosync without cman so we can reduce the scope to corosync only (and not cman)?

Created attachment 1122175: cluster logs (crm_report)

I've attached crm_report, which I hope has all the useful info in one package. The same behaviour can be observed using corosync-1.4.7-2.el6.x86_64 (RHEL 6.7). Diff after 10 iterations (1.4.7-2):
VmRSS: 59844 kB
VmRSS: 80404 kB

OK, so it looks like a pretty minimal two-node cluster. Can you please try corosync without cman so we can reduce the scope to corosync only (and not cman)? Also, pcsd was hitting the following glibc bug: https://bugzilla.redhat.com/show_bug.cgi?id=1102739 So maybe it's the same problem.

Created attachment 1122820: Proposed patch
totempg: Fix memory leak
Previously there were two free lists, one for the operational state and one
for the transitional state. Because every node starts in the transitional
state and always ends in the operational state, an assembly was always put
on the operational (normal) state free list and never on the transitional
free list, so a new assembly structure was always allocated after a new
node connected.
The solution is to have only one free list.
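
For readers following the patch description, here is a minimal toy model of the free-list mismatch, written in Python rather than corosync's C. It is only a sketch under stated assumptions: the class and function names (`TwoListPool`, `SingleListPool`, `simulate_rejoins`) are hypothetical and do not come from the totempg source, and a "rejoin" is reduced to one acquire in the transitional state followed by one release in the operational state.

```python
"""Toy model of the free-list mismatch described in the patch notes.

All names are hypothetical; this is not corosync's totempg code.
"""


class TwoListPool:
    """Pool with one free list per state (the layout that leaks)."""

    def __init__(self):
        self.free_operational = []
        self.free_transitional = []
        self.allocated = 0  # number of assembly objects ever created

    def acquire(self, transitional):
        # Only the free list matching the current state is consulted.
        free = self.free_transitional if transitional else self.free_operational
        if free:
            return free.pop()
        self.allocated += 1
        return object()  # stand-in for a freshly allocated assembly

    def release(self, assembly):
        # Release happens in the operational state, so the object always
        # lands on the operational list, where a transitional-state
        # acquire never looks for it.
        self.free_operational.append(assembly)


class SingleListPool:
    """Pool with a single free list (the layout after the fix)."""

    def __init__(self):
        self.free = []
        self.allocated = 0

    def acquire(self, transitional):
        # The state is ignored: every released object can be reused.
        if self.free:
            return self.free.pop()
        self.allocated += 1
        return object()

    def release(self, assembly):
        self.free.append(assembly)


def simulate_rejoins(pool, rejoins):
    """Acquire in the transitional state, release, and repeat."""
    for _ in range(rejoins):
        assembly = pool.acquire(transitional=True)
        pool.release(assembly)
    return pool.allocated


if __name__ == "__main__":
    print("two free lists:", simulate_rejoins(TwoListPool(), 10))
    print("one free list: ", simulate_rejoins(SingleListPool(), 10))
```

In this model the two-list pool creates a new object on every simulated rejoin, while the single-list pool keeps reusing the first one. That mirrors the growth pattern reported above, though the real allocation sizes and code paths in corosync differ.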
*** Bug 1309809 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0753.html