Bug 788924

Summary: When entering recovery repeatedly there is a memory leak
Product: [Retired] Corosync Cluster Engine
Component: totem
Version: 1.3
Reporter: John Thompson <thompa26>
Assignee: Jan Friesse <jfriesse>
CC: jfriesse
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2020-03-27 19:13:00 UTC

Description John Thompson 2012-02-09 09:33:09 UTC
Description of problem:
If corosync runs over a lossy network it can enter recovery mode repeatedly.

It can also fail to recover properly, re-entering gather/commit/recovery from recovery mode. After entering recovery a couple of hundred times, corosync has used up a lot of memory.

If recovery is entered again (without reaching OPERATIONAL) before all of the retrans_message_queue has been transmitted, memory is leaked when the queues are reinitialized in memb_state_recovery_enter().

deliver_messages_from_recovery_to_regular() also has a small memory leak when a message is not added to the regular_sort_queue.

Version-Release number of selected component (if applicable):
1.3.4

How reproducible:
Readily reproducible on a lossy network.

Steps to Reproduce:
1. Setup a 4 node cluster
2. Make the network it operates on lossy (either via the corosync parameters or some other mechanism)
3. Wait overnight (~8 hours).
Actual results:
Corosync memory usage will have increased significantly.

Expected results:
Corosync memory usage should be relatively static.

Additional info:
A patch that resolves both of these problems for us has been sent to the mailing list.