Bug 809577 - Memory leak in vgremove
Summary: Memory leak in vgremove
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Target Milestone: rc
Assignee: Zdenek Kabelac
QA Contact: Cluster QE
Depends On:
Reported: 2012-04-03 16:58 UTC by Nenad Peric
Modified: 2012-04-20 16:45 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2012-04-20 12:24:52 UTC
Target Upstream Version:


Description Nenad Peric 2012-04-03 16:58:58 UTC
Description of problem:

While removing a VG with many LVs, vgremove runs very slowly and consumes a large amount of RAM.

Version-Release number of selected component (if applicable):


How reproducible:

With a large number of LVs, every time.

Steps to Reproduce:
1. Create a VG and a large number of LVs or snapshots inside it:

SCENARIO - [many_snaps]
Create 500 snapshots of an origin volume
Recreating VG and PVs to increase metadata size
Making origin volume
Making 500 snapshots of origin volume

Only 350 snapshots were created before we ran out of space in the VG on the system I was testing on.

2. Remove the VG with -ff

vgremove -ff snapper
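
For reference, a minimal shell sketch of the reproduction; the PV path /dev/sdb1 and the volume sizes are placeholders of mine, not taken from the original test harness:

# Hypothetical reproduction; /dev/sdb1 and all sizes are illustrative.
pvcreate /dev/sdb1
vgcreate snapper /dev/sdb1

# Origin volume for the snapshots.
lvcreate -L 1G -n origin snapper

# Many old-style (non-thin) snapshots of the same origin.
for i in $(seq 1 500); do
    lvcreate -s -L 8M -n "500_${i}" snapper/origin || break
done

# Remove the whole VG, forcing removal of all LVs.
vgremove -ff snapper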

Actual results:

The vgremove process is very slow and the memory increase is substantial:

Logical volume "500_36" successfully removed

Cpu(s): 15.0%us, 68.0%sy,  0.0%ni,  0.9%id,  3.4%wa,  0.0%hi, 12.7%si,  0.0%st
Mem:   5861712k total,  2642804k used,  3218908k free,   143680k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1073084k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
20123  2 -18  735m 623m 3512 S 37.6 10.9   2:19.64 vgremove                   
 1371  0 -20     0    0    0 S  6.7  0.0   6:08.84 iscsi_q_9                  
 3811  2 -18  705m 187m  97m S  6.5  3.3   2:43.24 dmeventd       

 Logical volume "500_210" successfully removed

Cpu(s): 11.9%us, 34.0%sy,  0.0%ni, 12.3%id, 33.4%wa,  0.0%hi,  8.5%si,  0.0%st
Mem:   5861712k total,  4431296k used,  1430416k free,   143740k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1108032k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
20123  2 -18 2602m 2.4g 3512 S 14.3 43.5   7:02.78 vgremove                  
 1371  0 -20     0    0    0 S  6.0  0.0   7:24.99 iscsi_q_9                 
 3811  2 -18  748m 225m  97m S  1.4  3.9   3:16.58 dmeventd                  

Memory usage kept increasing until all the LVs were removed.

Luckily, 6 GB of RAM was enough to remove the 350 LVs created previously.
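
To put numbers on the growth, a simple polling loop over /proc can log the resident set size of the running vgremove. This is a sketch of mine (the one-second interval and output format are arbitrary, and it assumes a single vgremove process):

# Sample vgremove's RSS once per second while it runs.
pid=$(pgrep -x vgremove)
while kill -0 "$pid" 2>/dev/null; do
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    echo "$(date +%T) VmRSS=${rss} kB"
    sleep 1
done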

Expected results:

vgremove should be far less memory-hungry.

Comment 2 Nenad Peric 2012-04-03 17:44:38 UTC
Reproduced with:


Created only 200 snapshots of the origin this time.
Tried deleting the VG with:

vgremove -ff snapper

It went a bit faster, but memory consumption was still increasing with every removed LV.

Here is the report from around midway:

Logical volume "500_130" successfully removed

Cpu(s): 15.5%us, 45.5%sy,  0.0%ni,  6.1%id, 21.9%wa,  0.0%hi, 11.1%si,  0.0%st
Mem:   5861712k total,  1763340k used,  4098372k free,   142620k buffers
Swap:  2064376k total,        0k used,  2064376k free,   231816k cached

PID   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
10547  2 -18 1032m 920m 3512 S 15.9 16.1   1:33.69 vgremove                  
 1324  0 -20     0    0    0 S  7.6  0.0   1:18.31 iscsi_q_8                 
 2874  2 -18  683m 151m  97m S  2.0  2.6   0:44.83 dmeventd

Comment 3 Zdenek Kabelac 2012-04-03 19:05:42 UTC
The problem is probably related to the assumption that using something like 200 old-style snapshots is 'well' supported by lvm2. In fact, that is only theoretically usable: the table construction required to process such a beast is rather ugly, and no time has been spent optimizing this barely usable case. I think even 20 snapshots of the same origin are well beyond any practical use when old-style snapshots are used.

Another issue is optimizing the removal of multiple devices at once; this is being considered for 6.4.

For now, every device is removed individually, which is very slow when there are hundreds or even thousands of devices, and extremely slow for old-style snapshots.
It is also quite annoying when we want to drop e.g. a whole thin pool, which should ideally deactivate all thin volumes and remove all of their entries from the metadata in one step; for now there is instead a large set of disk writes and table updates.
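
As an aside, a reader might wonder whether deactivating everything in one pass before removal helps. That is purely an assumption on my part; nothing in this bug confirms it makes a difference on this lvm2 version:

# Speculative workaround: tear down all device-mapper tables first,
# then remove the (now inactive) volume group.
vgchange -an snapper
vgremove -ff snapper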

Comment 4 Milan Broz 2012-04-20 12:20:50 UTC
So it is not a leak; it is just an extreme case that will not work anyway with the old snapshot implementation (or will be terribly slow).

Comment 5 RHEL Program Management 2012-04-20 12:24:52 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 6 Alasdair Kergon 2012-04-20 13:41:58 UTC
This bugzilla is just confirming known limitations of the tools.
Both problems are already being tracked and solved elsewhere: multiple snapshots are now handled by thin provisioning, and tool speed when handling multiple LVs at once is being improved.
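
For context on the thin-provisioning direction mentioned here, a rough sketch of the equivalent setup on lvm2 versions with thin-pool support (RHEL 6.4 era and later); the pool name, sizes, and snapshot names are illustrative:

# Thin pool and a thin origin volume inside VG "snapper".
lvcreate -L 10G -T snapper/pool
lvcreate -V 1G -T snapper/pool -n origin

# Thin snapshots need no per-snapshot COW size; they share the pool.
# (Some lvm2 versions create thin snapshots deactivated by default.)
for i in $(seq 1 500); do
    lvcreate -s -n "snap_${i}" snapper/origin
done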
