Bug 809577 - Memory leak in vgremove
Summary: Memory leak in vgremove
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Target Milestone: rc
Assignee: Zdenek Kabelac
QA Contact: Cluster QE
Depends On:
Reported: 2012-04-03 16:58 UTC by Nenad Peric
Modified: 2012-04-20 16:45 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2012-04-20 12:24:52 UTC
Target Upstream Version:


Description Nenad Peric 2012-04-03 16:58:58 UTC
Description of problem:

While removing a VG with many LVs, vgremove runs very slowly and consumes a large amount of RAM.

Version-Release number of selected component (if applicable):


How reproducible:

With a large number of LVs, every time.

Steps to Reproduce:
1. Create a VG and a large number of LVs or snapshots inside it:

SCENARIO - [many_snaps]
Create 500 snapshots of an origin volume
Recreating VG and PVs to increase metadata size
Making origin volume
Making 500 snapshots of origin volume

Only 350 snapshots were created before we ran out of space in the VG on the system I was testing on.

2. Remove the VG with -ff

vgremove -ff snapper
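
For reference, a minimal shell sketch of the reproduction; the PV path /dev/sdb1 and the volume sizes are placeholders of mine, not taken from the original test harness:

# Hypothetical reproduction; /dev/sdb1 and all sizes are illustrative.
pvcreate /dev/sdb1
vgcreate snapper /dev/sdb1

# Origin volume for the snapshots.
lvcreate -L 1G -n origin snapper

# Many old-style (non-thin) snapshots of the same origin.
for i in $(seq 1 500); do
    lvcreate -s -L 8M -n "500_${i}" snapper/origin || break
done

# Remove the whole VG, forcing removal of all LVs.
vgremove -ff snapper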

Actual results:

The vgremove process is very slow and the memory increase is substantial:

Logical volume "500_36" successfully removed

Cpu(s): 15.0%us, 68.0%sy,  0.0%ni,  0.9%id,  3.4%wa,  0.0%hi, 12.7%si,  0.0%st
Mem:   5861712k total,  2642804k used,  3218908k free,   143680k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1073084k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
20123  2 -18  735m 623m 3512 S 37.6 10.9   2:19.64 vgremove                   
 1371  0 -20     0    0    0 S  6.7  0.0   6:08.84 iscsi_q_9                  
 3811  2 -18  705m 187m  97m S  6.5  3.3   2:43.24 dmeventd       

 Logical volume "500_210" successfully removed

Cpu(s): 11.9%us, 34.0%sy,  0.0%ni, 12.3%id, 33.4%wa,  0.0%hi,  8.5%si,  0.0%st
Mem:   5861712k total,  4431296k used,  1430416k free,   143740k buffers
Swap:  2064376k total,        0k used,  2064376k free,  1108032k cached

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
20123  2 -18 2602m 2.4g 3512 S 14.3 43.5   7:02.78 vgremove                  
 1371  0 -20     0    0    0 S  6.0  0.0   7:24.99 iscsi_q_9                 
 3811  2 -18  748m 225m  97m S  1.4  3.9   3:16.58 dmeventd                  

Memory usage kept increasing until all the LVs were removed.

Luckily, 6 GB of RAM was enough to remove the 350 LVs created previously.
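
To put numbers on the growth, a simple polling loop over /proc can log the resident set size of the running vgremove. This is a sketch of mine (the one-second interval and output format are arbitrary, and it assumes a single vgremove process):

# Sample vgremove's RSS once per second while it runs.
pid=$(pgrep -x vgremove)
while kill -0 "$pid" 2>/dev/null; do
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    echo "$(date +%T) VmRSS=${rss} kB"
    sleep 1
done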

Expected results:

vgremove should be far less memory-hungry.

Comment 2 Nenad Peric 2012-04-03 17:44:38 UTC
Reproduced with:


Created only 200 snapshots of the origin this time.
Tried deleting the VG with:

vgremove -ff snapper

It went a bit faster, but memory consumption was still increasing with every removed LV.

Here is the report from around midway:

Logical volume "500_130" successfully removed

Cpu(s): 15.5%us, 45.5%sy,  0.0%ni,  6.1%id, 21.9%wa,  0.0%hi, 11.1%si,  0.0%st
Mem:   5861712k total,  1763340k used,  4098372k free,   142620k buffers
Swap:  2064376k total,        0k used,  2064376k free,   231816k cached

PID   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                   
10547  2 -18 1032m 920m 3512 S 15.9 16.1   1:33.69 vgremove                  
 1324  0 -20     0    0    0 S  7.6  0.0   1:18.31 iscsi_q_8                 
 2874  2 -18  683m 151m  97m S  2.0  2.6   0:44.83 dmeventd

Comment 3 Zdenek Kabelac 2012-04-03 19:05:42 UTC
The problem is probably related to the assumption that using something like 200 old-style snapshots is 'well' supported by lvm2. In fact, that is only theoretically usable: the table construction required to process such a beast is rather ugly, and no time has been spent optimizing this barely usable case. I think even 20 snapshots of the same origin are well beyond any practical use when old-style snapshots are used.

Another issue is optimizing the removal of multiple devices at once; this is being considered for 6.4.

For now, every device is removed individually, which is very slow when there are hundreds or even thousands of devices, and extremely slow for old-style snapshots.
It is also quite annoying when we want to drop e.g. a whole thin pool, which should ideally deactivate all thin volumes and remove all of their entries from the metadata in one step; for now there is instead a large set of disk writes and table updates.
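
As an aside, a reader might wonder whether deactivating everything in one pass before removal helps. That is purely an assumption on my part; nothing in this bug confirms it makes a difference on this lvm2 version:

# Speculative workaround: tear down all device-mapper tables first,
# then remove the (now inactive) volume group.
vgchange -an snapper
vgremove -ff snapper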

Comment 4 Milan Broz 2012-04-20 12:20:50 UTC
So it is not a leak; it is just an extreme case that will not work anyway with the old snapshot implementation (or will be terribly slow).

Comment 5 RHEL Program Management 2012-04-20 12:24:52 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 6 Alasdair Kergon 2012-04-20 13:41:58 UTC
This bugzilla is just confirming known limitations of the tools.
Both problems are already being tracked and solved elsewhere: multiple snapshots are now handled by thin provisioning, and tool speed when handling multiple LVs at once is being improved.
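
For context on the thin-provisioning direction mentioned here, a rough sketch of the equivalent setup on lvm2 versions with thin-pool support (RHEL 6.4 era and later); the pool name, sizes, and snapshot names are illustrative:

# Thin pool and a thin origin volume inside VG "snapper".
lvcreate -L 10G -T snapper/pool
lvcreate -V 1G -T snapper/pool -n origin

# Thin snapshots need no per-snapshot COW size; they share the pool.
# (Some lvm2 versions create thin snapshots deactivated by default.)
for i in $(seq 1 500); do
    lvcreate -s -n "snap_${i}" snapper/origin
done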
