Bug 809577 - Memory leak in vgremove
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64 Linux
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Zdenek Kabelac
QA Contact: Cluster QE
Depends On:
Blocks:
 
Reported: 2012-04-03 12:58 EDT by Nenad Peric
Modified: 2012-04-20 12:45 EDT
CC: 11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-20 08:24:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Nenad Peric 2012-04-03 12:58:58 EDT
Description of problem:

While removing a VG containing many LVs, vgremove runs very slowly and its memory usage grows steadily.


Version-Release number of selected component (if applicable):

  cmirror-2.02.96-0.92.el6                                                    
  device-mapper-1.02.75-0.92.el6                                              
  device-mapper-event-1.02.75-0.92.el6                                        
  device-mapper-event-libs-1.02.75-0.92.el6                                   
  device-mapper-libs-1.02.75-0.92.el6                                         
  lvm2-2.02.96-0.92.el6                                                       
  lvm2-cluster-2.02.96-0.92.el6                                               
  lvm2-libs-2.02.96-0.92.el6           



How reproducible:

With a large number of LVs, every time.

Steps to Reproduce:
1. Create a VG and a large number of LVs or snapshots inside it:

SCENARIO - [many_snaps]
Create 500 snapshots of an origin volume
Recreating VG and PVs to increase metadata size
Making origin volume
Making 500 snapshots of origin volume

Only 350 snapshots were created before the VG ran out of space on the system I was testing on.

2. Remove the VG with -ff

vgremove -ff snapper
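The steps above can be collapsed into a single script. This is a sketch only: the PV path (/dev/sdb1), origin size, and snapshot size are assumptions, not values from the original test harness; it needs root and the lvm2 tools, and it deliberately creates hundreds of old-style snapshots.

```shell
#!/bin/sh
# Reproduction sketch -- assumes a scratch disk at /dev/sdb1 (adjust to
# your system), root privileges, and the lvm2 tools. Destroys any data
# on that device.
pvcreate /dev/sdb1
vgcreate snapper /dev/sdb1

# Origin volume plus as many old-style snapshots as the VG can hold
lvcreate -L 1G -n origin snapper
for i in $(seq 1 500); do
    lvcreate -s -L 8M -n "500_${i}" snapper/origin || break  # stop when VG is full
done

# Step 2: remove the whole VG; watch vgremove's RSS climb in top(1)
vgremove -ff snapper
```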

  
Actual results:

The vgremove process is very slow and the memory increase is substantial:

Logical volume "500_36" successfully removed


Cpu(s): 15.0%us, 68.0%sy,  0.0%ni,  0.9%id,  3.4%wa,  0.0%hi, 12.7%si,  0.0%st
Mem:   5861712k total,  2642804k used,  3218908k free,   143680k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1073084k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
20123  2 -18  735m 623m 3512 S 37.6 10.9   2:19.64 vgremove                   
 1371  0 -20     0    0    0 S  6.7  0.0   6:08.84 iscsi_q_9                  
 3811  2 -18  705m 187m  97m S  6.5  3.3   2:43.24 dmeventd       
...

 Logical volume "500_210" successfully removed

Cpu(s): 11.9%us, 34.0%sy,  0.0%ni, 12.3%id, 33.4%wa,  0.0%hi,  8.5%si,  0.0%st
Mem:   5861712k total,  4431296k used,  1430416k free,   143740k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1108032k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
20123  2 -18 2602m 2.4g 3512 S 14.3 43.5   7:02.78 vgremove                  
 1371  0 -20     0    0    0 S  6.0  0.0   7:24.99 iscsi_q_9                 
 3811  2 -18  748m 225m  97m S  1.4  3.9   3:16.58 dmeventd                  

Memory usage kept increasing until all the LVs were removed.

Luckily 6 GB of RAM was enough to remove the 350 LVs which were created previously.
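The top snapshots above were captured by hand. A small helper like the following (a sketch; the function name is mine, and it works for any PID, not just vgremove) prints timestamped VmRSS samples from /proc so the growth can be logged or graphed:

```shell
# Print one timestamped VmRSS sample (in kB) for a PID, read from
# /proc/<pid>/status. Exits non-zero once the process is gone and its
# /proc entry has vanished, so it can drive a polling loop.
rss_sample() {
    awk -v ts="$(date +%s)" '/^VmRSS:/ {print ts, $2, $3}' "/proc/$1/status"
}
```

Usage against a running vgremove might look like:

```shell
pid=$(pidof vgremove)
while rss_sample "$pid"; do sleep 1; done >> vgremove_rss.log
```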


Expected results:

Be less hungry.
Comment 2 Nenad Peric 2012-04-03 13:44:38 EDT
Reproduced with:

lvm2-libs-2.02.95-3.el6.x86_64
lvm2-cluster-2.02.95-3.el6.x86_64
lvm2-2.02.95-3.el6.x86_64
cmirror-2.02.95-3.el6.x86_64
device-mapper-1.02.74-3.el6.x86_64
device-mapper-libs-1.02.74-3.el6.x86_64
device-mapper-event-1.02.74-3.el6.x86_64
device-mapper-event-libs-1.02.74-3.el6.x86_64

Created only 200 snapshots of the origin this time.
Tried deleting the VG with

vgremove -ff snapper

Removal went a bit faster, but memory consumption still increased with every LV
removed.

Here is the report around mid-way:

Logical volume "500_130" successfully removed

Cpu(s): 15.5%us, 45.5%sy,  0.0%ni,  6.1%id, 21.9%wa,  0.0%hi, 11.1%si,  0.0%st
Mem:   5861712k total,  1763340k used,  4098372k free,   142620k buffers
Swap:  2064376k total,        0k used,  2064376k free,   231816k cached

PID   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
10547  2 -18 1032m 920m 3512 S 15.9 16.1   1:33.69 vgremove                  
 1324  0 -20     0    0    0 S  7.6  0.0   1:18.31 iscsi_q_8                 
 2874  2 -18  683m 151m  97m S  2.0  2.6   0:44.83 dmeventd
Comment 3 Zdenek Kabelac 2012-04-03 15:05:42 EDT
The problem is probably down to the assumption that something like 200 old-style snapshots is 'well' supported by lvm2. In fact that is only theoretically usable: constructing the device-mapper tables needed to process such a beast is very ugly, and no time has been spent optimizing this impractical case. I think even 20 old-style snapshots of the same origin are well beyond any practical use.

A separate issue is optimizing the removal of multiple devices at once; that is being considered for 6.4.

For now every device is removed individually, which is very slow with hundreds or even thousands of devices, and extremely slow for old snapshots.
It is also quite annoying when we want to drop e.g. a whole thin pool: that should ideally deactivate all thin volumes and remove all their entries from the metadata in one step, but for now it produces a large series of disk writes and table updates.
Comment 4 Milan Broz 2012-04-20 08:20:50 EDT
So it is not a leak; it is just an extreme case which will not work with the old snapshot implementation anyway (or will be terribly slow).
Comment 5 RHEL Product and Program Management 2012-04-20 08:24:52 EDT
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
Comment 6 Alasdair Kergon 2012-04-20 09:41:58 EDT
This bugzilla is just confirming known limitations of the tools.
Both problems are already being tracked and solved elsewhere: many snapshots of one origin are now handled by thin provisioning, and tool speed when handling multiple LVs at once is being improved.
