Bug 788924 - When entering recovery repeatedly there is a memory leak
Summary: When entering recovery repeatedly there is a memory leak
Keywords:
Status: CLOSED EOL
Alias: None
Product: Corosync Cluster Engine
Classification: Retired
Component: totem
Version: 1.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jan Friesse
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-09 09:33 UTC by John Thompson
Modified: 2020-03-27 19:13 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-27 19:13:00 UTC



Description John Thompson 2012-02-09 09:33:09 UTC
Description of problem:
If corosync runs over a lossy network, it can enter recovery mode repeatedly.

It can also fail to recover properly, re-entering gather/commit/recovery from recovery mode. After entering recovery a couple of hundred times, corosync has used up a lot of memory.

If recovery is entered again (without reaching OPERATIONAL) before all of retrans_message_queue has been transmitted, memory is leaked when the queues are reinitialized in memb_state_recovery_enter().
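The pattern is roughly the following (a minimal sketch only; the queue type and helper names are hypothetical stand-ins, not the actual totemsrp code). Reinitializing the queue resets the indices but does not release the payloads still queued, so every extra pass through recovery loses whatever was left untransmitted:

    #include <stdlib.h>
    #include <string.h>

    struct msg {
            void   *data;   /* heap-allocated message payload */
            size_t  len;
    };

    struct msg_queue {
            struct msg *items;
            int head;
            int tail;
            int size;
    };

    /* Reinit only resets the indices; anything still queued is lost. */
    static void queue_reinit(struct msg_queue *q)
    {
            q->head = 0;
            q->tail = 0;
            memset(q->items, 0, sizeof(struct msg) * q->size);
    }

    /*
     * On re-entering recovery, the payloads still sitting in the queue
     * must be released first, otherwise they leak.
     */
    static void queue_drain_and_reinit(struct msg_queue *q)
    {
            int i;

            for (i = q->head; i != q->tail; i = (i + 1) % q->size) {
                    free(q->items[i].data);
                    q->items[i].data = NULL;
            }
            queue_reinit(q);
    }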

deliver_messages_from_recovery_to_regular() also has a small memory leak when a message is not added to the regular_sort_queue.
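Again as a hedged sketch (hypothetical names, not the real corosync sources): when a recovery message is copied for the regular sort queue but the insert does not happen, for example because an entry with the same sequence number is already present, the copy has to be freed on that path or it is leaked:

    #include <stdlib.h>
    #include <string.h>

    #define SORT_QUEUE_SIZE 1024

    struct sort_entry {
            int     used;
            void   *msg;
            size_t  len;
    };

    struct sort_queue {
            struct sort_entry entries[SORT_QUEUE_SIZE];
    };

    /* Returns 0 if inserted, -1 if the slot is already occupied
     * (in which case the caller keeps ownership of msg). */
    static int sort_queue_add(struct sort_queue *q, unsigned int seq,
                              void *msg, size_t len)
    {
            struct sort_entry *e = &q->entries[seq % SORT_QUEUE_SIZE];

            if (e->used) {
                    return -1;
            }
            e->used = 1;
            e->msg = msg;
            e->len = len;
            return 0;
    }

    /*
     * Copy one recovery message into the regular sort queue.  The point
     * is the free() on the "not added" path; without it the copy leaks
     * every time a duplicate arrives.
     */
    static int move_to_regular(struct sort_queue *regular, unsigned int seq,
                               const void *recovery_msg, size_t len)
    {
            void *copy = malloc(len);

            if (copy == NULL) {
                    return -1;
            }
            memcpy(copy, recovery_msg, len);

            if (sort_queue_add(regular, seq, copy, len) != 0) {
                    free(copy); /* not added to regular_sort_queue */
                    return -1;
            }
            return 0;
    }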

Version-Release number of selected component (if applicable):
1.3.4

How reproducible:
Readily reproducible on a lossy network.

Steps to Reproduce:
1. Set up a 4-node cluster.
2. Make the network it operates on lossy (either via the corosync parameters or some other mechanism).
3. Wait overnight - 8 hours.
  
Actual results:
Corosync memory usage will have increased substantially.

Expected results:
Corosync memory usage should be relatively static.

Additional info:
A patch that resolves both problems for us has been sent to the mailing list.

