Bug 809577 - Memory leak in vgremove
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64 Linux
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Zdenek Kabelac
QA Contact: Cluster QE
Depends On:
Blocks:
 
Reported: 2012-04-03 12:58 EDT by Nenad Peric
Modified: 2012-04-20 12:45 EDT
CC: 11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-20 08:24:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Nenad Peric 2012-04-03 12:58:58 EDT
Description of problem:

While removing a VG containing many LVs, vgremove runs very slowly and its memory usage grows steadily.


Version-Release number of selected component (if applicable):

  cmirror-2.02.96-0.92.el6                                                    
  device-mapper-1.02.75-0.92.el6                                              
  device-mapper-event-1.02.75-0.92.el6                                        
  device-mapper-event-libs-1.02.75-0.92.el6                                   
  device-mapper-libs-1.02.75-0.92.el6                                         
  lvm2-2.02.96-0.92.el6                                                       
  lvm2-cluster-2.02.96-0.92.el6                                               
  lvm2-libs-2.02.96-0.92.el6           



How reproducible:

With a large number of LVs, every time.

Steps to Reproduce:
1. Create a VG and a large number of LVs or snapshots inside it:

SCENARIO - [many_snaps]
Create 500 snapshots of an origin volume
Recreating VG and PVs to increase metadata size
Making origin volume
Making 500 snapshots of origin volume

Only 350 snapshots were created before the VG ran out of space on the system I was testing on.

2. Remove the VG with -ff

vgremove -ff snapper
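The steps above can be collapsed into a single script. This is a sketch only: the PV path (/dev/sdb1), origin size, and snapshot size are assumptions, not values from the original test harness; it needs root and the lvm2 tools, and it deliberately creates hundreds of old-style snapshots.

```shell
#!/bin/sh
# Reproduction sketch -- assumes a scratch disk at /dev/sdb1 (adjust to
# your system), root privileges, and the lvm2 tools. Destroys any data
# on that device.
pvcreate /dev/sdb1
vgcreate snapper /dev/sdb1

# Origin volume plus as many old-style snapshots as the VG can hold
lvcreate -L 1G -n origin snapper
for i in $(seq 1 500); do
    lvcreate -s -L 8M -n "500_${i}" snapper/origin || break  # stop when VG is full
done

# Step 2: remove the whole VG; watch vgremove's RSS climb in top(1)
vgremove -ff snapper
```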

  
Actual results:

The vgremove process is very slow and the memory increase is substantial:

Logical volume "500_36" successfully removed


Cpu(s): 15.0%us, 68.0%sy,  0.0%ni,  0.9%id,  3.4%wa,  0.0%hi, 12.7%si,  0.0%st
Mem:   5861712k total,  2642804k used,  3218908k free,   143680k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1073084k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
20123  2 -18  735m 623m 3512 S 37.6 10.9   2:19.64 vgremove                   
 1371  0 -20     0    0    0 S  6.7  0.0   6:08.84 iscsi_q_9                  
 3811  2 -18  705m 187m  97m S  6.5  3.3   2:43.24 dmeventd       
...

 Logical volume "500_210" successfully removed

Cpu(s): 11.9%us, 34.0%sy,  0.0%ni, 12.3%id, 33.4%wa,  0.0%hi,  8.5%si,  0.0%st
Mem:   5861712k total,  4431296k used,  1430416k free,   143740k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1108032k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
20123  2 -18 2602m 2.4g 3512 S 14.3 43.5   7:02.78 vgremove                  
 1371  0 -20     0    0    0 S  6.0  0.0   7:24.99 iscsi_q_9                 
 3811  2 -18  748m 225m  97m S  1.4  3.9   3:16.58 dmeventd                  

Memory usage kept increasing until all the LVs were removed.

Luckily 6 GB of RAM was enough to remove the 350 LVs which were created previously.
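The top snapshots above were captured by hand. A small helper like the following (a sketch; the function name is mine, and it works for any PID, not just vgremove) prints timestamped VmRSS samples from /proc so the growth can be logged or graphed:

```shell
# Print one timestamped VmRSS sample (in kB) for a PID, read from
# /proc/<pid>/status. Exits non-zero once the process is gone and its
# /proc entry has vanished, so it can drive a polling loop.
rss_sample() {
    awk -v ts="$(date +%s)" '/^VmRSS:/ {print ts, $2, $3}' "/proc/$1/status"
}
```

Usage against a running vgremove might look like:

```shell
pid=$(pidof vgremove)
while rss_sample "$pid"; do sleep 1; done >> vgremove_rss.log
```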


Expected results:

Be less hungry.
Comment 2 Nenad Peric 2012-04-03 13:44:38 EDT
Reproduced with:

lvm2-libs-2.02.95-3.el6.x86_64
lvm2-cluster-2.02.95-3.el6.x86_64
lvm2-2.02.95-3.el6.x86_64
cmirror-2.02.95-3.el6.x86_64
device-mapper-1.02.74-3.el6.x86_64
device-mapper-libs-1.02.74-3.el6.x86_64
device-mapper-event-1.02.74-3.el6.x86_64
device-mapper-event-libs-1.02.74-3.el6.x86_64

Created only 200 snapshots of the origin this time.
Tried deleting the VG with

vgremove -ff snapper

Removal went a bit faster, but memory consumption still increased with every LV
removed.

Here is the report around mid-way:

Logical volume "500_130" successfully removed

Cpu(s): 15.5%us, 45.5%sy,  0.0%ni,  6.1%id, 21.9%wa,  0.0%hi, 11.1%si,  0.0%st
Mem:   5861712k total,  1763340k used,  4098372k free,   142620k buffers
Swap:  2064376k total,        0k used,  2064376k free,   231816k cached

PID   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
10547  2 -18 1032m 920m 3512 S 15.9 16.1   1:33.69 vgremove                  
 1324  0 -20     0    0    0 S  7.6  0.0   1:18.31 iscsi_q_8                 
 2874  2 -18  683m 151m  97m S  2.0  2.6   0:44.83 dmeventd
Comment 3 Zdenek Kabelac 2012-04-03 15:05:42 EDT
The problem is probably down to the assumption that something like 200 old-style snapshots is 'well' supported by lvm2. In fact that is only theoretically usable: constructing the device-mapper tables needed to process such a beast is very ugly, and no time has been spent optimizing this impractical case. I think even 20 old-style snapshots of the same origin are well beyond any practical use.

A separate issue is optimizing the removal of multiple devices at once; that is being considered for 6.4.

For now every device is removed individually, which is very slow with hundreds or even thousands of devices, and extremely slow for old snapshots.
It is also quite annoying when we want to drop e.g. a whole thin pool: that should ideally deactivate all thin volumes and remove all their entries from the metadata in one step, but for now it produces a large series of disk writes and table updates.
Comment 4 Milan Broz 2012-04-20 08:20:50 EDT
So it is not a leak; it is just an extreme case which will not work with the old snapshot implementation anyway (or will be terribly slow).
Comment 5 RHEL Product and Program Management 2012-04-20 08:24:52 EDT
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
Comment 6 Alasdair Kergon 2012-04-20 09:41:58 EDT
This bugzilla is just confirming known limitations of the tools.
Both problems are already being tracked and solved elsewhere: many snapshots of one origin are now handled by thin provisioning, and tool speed when handling multiple LVs at once is being improved.
