Description of problem:
When corosync runs over a lossy network it can enter recovery mode repeatedly. It can also fail to recover properly, re-entering the gather/commit/recovery cycle from recovery mode. After entering recovery a few hundred times, corosync has consumed a large amount of memory. If recovery is entered again (without reaching OPERATIONAL) before all of retrans_message_queue has been transmitted, memory is lost when the queues are reinitialized in memb_state_recovery_enter(). In addition, deliver_messages_from_recovery_to_regular() has a small memory leak when a message is not added to the regular_sort_queue.

Version-Release number of selected component (if applicable):
1.3.4

How reproducible:
Reliably, on a lossy network.

Steps to Reproduce:
1. Set up a 4-node cluster.
2. Make the network it operates over lossy (either via corosync parameters or some other mechanism).
3. Wait roughly 8 hours (e.g. overnight).

Actual results:
Corosync memory usage has increased substantially.

Expected results:
Corosync memory usage should remain roughly static.

Additional info:
I have sent a patch to the mailing list that resolves both of these problems for us.