Description of problem:

flow: live storage migration of a 100g disk (migration back to the original domain)

effect on user: the disk's space usage is doubled because the snapshot is not removed

repro steps:

1) run a VM
2) install an OS
3) migrate the disk from the source domain to the destination domain
4) migrate the disk from the destination domain back to the source domain
5) check the disk size - 202g

Since the disk is basically an LV, I checked the LV size and noticed 2 LVs:

[root@puma31 tmp]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  00d2102e-fd5e-4dd1-adad-0a6b2bf3949f 08221c4d-f68b-4085-ab47-9c1d4b1b70d0 -wi------ 100.00g
  2bfb1a73-4546-40f6-a5cc-17763e4c159f 08221c4d-f68b-4085-ab47-9c1d4b1b70d0 -wi------ 102.00g

I followed the live migration thread in the vdsm log and noticed an lvremove command issued for the specific LV:

Thread-1592::DEBUG::2013-03-20 21:09:24,396::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm lvremove --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%3514f0c542a000602|3514f0c542a000603|3514f0c542a000604|3514f0c542a000605|3514f0c542a000606|3514f0c542a000607|3514f0c542a000608|3514f0c542a000609|3514f0c542a00060a|3514f0c542a00060b%\', \'r%.*%\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " -f --autobackup n 4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7/2bfb1a73-4546-40f6-a5cc-17763e4c159f 4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7/00d2102e-fd5e-4dd1-adad-0a6b2bf3949f' (cwd None)
Thread-1592::DEBUG::2013-03-20 21:09:24,632::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = '  WARNING: This metadata update is NOT backed up\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0

For some unknown reason, the LV was not removed.

Note that during the live storage migration the VM was hibernated and restored.
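For anyone re-checking this, a minimal shell sketch of how the leftover LV can be spotted; the VG UUIDs below are the ones from this report and would need to be substituted on another setup:

# Sketch: list the image LVs and their sizes on the source and destination VGs
# (VG UUIDs taken from this report; substitute your own).
SRC_VG=4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7
DST_VG=08221c4d-f68b-4085-ab47-9c1d4b1b70d0
lvs --units g -o vg_name,lv_name,lv_size "$SRC_VG" "$DST_VG"
# After a successful round-trip migration only the disk's ~100g base LV should
# remain; here an extra ~102g LV (the migration snapshot) is still present.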
Created attachment 713446 [details] engine, vdsm and lvm
(In reply to comment #0)
> I followed the live migration thread in the vdsm log and noticed an lvremove
> command issued for the specific LV:
> ...
> -f --autobackup n 4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7/2bfb1a73-4546-40f6-a5cc-17763e4c159f
> 4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7/00d2102e-fd5e-4dd1-adad-0a6b2bf3949f' (cwd None)

1. This lvremove targets the 2 LVs on the source VG (4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7), while the LVs you list above are on the destination VG (08221c4d-f68b-4085-ab47-9c1d4b1b70d0), so I'm not sure what you're saying didn't work here.

2. Of the 2 LVs listed, I'm guessing the first one is the raw disk and the second one is the COW layer created for the migration? We need to check why the COW layer is so large. Fede?
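To remove the ambiguity in point 1, something like the sketch below can confirm which VG each of the two LVs actually lives on and what it contains; the IU_/PU_ image/parent tag names used here are an assumption about vdsm's LV tagging and may differ between versions:

# Sketch: locate the two LVs from comment #0 and show their VG, size and tags.
lvs --noheadings -o vg_name,lv_name,lv_size,lv_tags | \
  grep -E '00d2102e-fd5e-4dd1-adad-0a6b2bf3949f|2bfb1a73-4546-40f6-a5cc-17763e4c159f'
# If vdsm's usual IU_<image-uuid>/PU_<parent-uuid> tags are present, the LV
# whose PU_ tag points at the other LV is the COW layer created for the
# migration, and the other one is the raw base disk.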
(In reply to comment #2)
> 2. Of the 2 LVs listed, I'm guessing the first one is the raw disk and the
> second one is the COW layer created for the migration? We need to check why
> the COW layer is so large. Fede?

It looks like there's a race with the volume statistics. Basically, if a volume extension request arrives while the VmStatsThread has not yet updated the apparentsize, we request an extension on top of the original raw volume size. Note the apparentsize reported (107374182400 bytes == 100 GB) and the extension request (newSize 104448 == 102 GB):

Thread-13987::DEBUG::2013-03-20 18:56:29,992::BindingXMLRPC::913::vds::(wrapper) client [10.35.161.6]::call vmSnapshot with ('9cd58a26-9188-4894-a7ae-0c8ea767e043', [{'baseVolumeID': '00d2102e-fd5e-4dd1-adad-0a6b2bf3949f', 'domainID': '4b14d3b3-7da4-4ed5-88d0-bfb31f3b8df7', 'volumeID': '2bfb1a73-4546-40f6-a5cc-17763e4c159f', 'imageID': '195c2ec0-b5bb-4278-9978-f66de0ba2063'}]) {}
...
Thread-13978::DEBUG::2013-03-20 18:56:06,271::BindingXMLRPC::920::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Up', 'username': 'Unknown', 'memUsage': '0', 'acpiEnable': 'true', 'pid': '14274', 'displayIp': '0', 'displayPort': u'5900', 'session': 'Unknown', 'displaySecurePort': u'5901', 'timeOffset': '-43200', 'hash': '-3545512217363422887', 'balloonInfo': {'balloon_max': 4194304, 'balloon_cur': 4194304}, 'pauseCode': 'NOERR', 'clientIp': '10.35.200.31', 'kvmEnable': 'true', 'network': {u'vnet0': {'macAddr': '00:1a:4a:23:a1:1e', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet0'}}, 'vmId': '9cd58a26-9188-4894-a7ae-0c8ea767e043', 'displayType': 'qxl', 'cpuUser': '43.22', 'disks': {u'vda': {'readLatency': '243008', 'apparentsize': '107374182400', 'writeLatency': '73893423', 'imageID': '195c2ec0-b5bb-4278-9978-f66de0ba2063', 'flushLatency': '100700', 'readRate': '176523.01', 'truesize': '107374182400', 'writeRate': '13688918.23'}, u'hdc': {'readLatency': '6835', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '7.24', 'truesize': '0', 'writeRate': '0.00'}}, 'monitorResponse': '0', 'statsAge': '0.54', 'elapsedTime': '61', 'vmType': 'kvm', 'cpuSys': '6.82', 'appsList': [], 'guestIPs': ''}]}

VM Channels Listener::DEBUG::2013-03-20 18:56:06,431::vmChannels::60::vds::(_handle_timeouts) Timeout on fileno 17.

...

Thread-13987::DEBUG::2013-03-20 18:56:40,273::BindingXMLRPC::920::vds::(wrapper) return vmSnapshot with {'status': {'message': 'Done', 'code': 0}, 'quiesce': False}

...

Thread-13998::DEBUG::2013-03-20 18:56:50,528::vm::830::vm.Vm::(extendDriveVolume) vmId=`9cd58a26-9188-4894-a7ae-0c8ea767e043`::Requesting an extension for the volume replication: {'newSize': 104448, 'domainID': '08221c4d-f68b-4085-ab47-9c1d4b1b70d0', 'name': u'vda', 'volumeID': '2bfb1a73-4546-40f6-a5cc-17763e4c159f', 'imageID': '195c2ec0-b5bb-4278-9978-f66de0ba2063', 'poolID': 'a31dbbd1-d491-4420-9747-85bbbe15cc25'}
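For reference, a small shell sketch converting the two values quoted above, assuming (as the analysis above does) that newSize is expressed in megabytes; it only restates the numbers from the logs, not the actual extension policy:

# Illustration only: unit conversion of the values quoted from the logs above.
apparentsize_bytes=107374182400   # stale apparentsize reported for the raw base volume (vda)
newsize_mb=104448                 # newSize requested by extendDriveVolume
echo "apparentsize : $(( apparentsize_bytes / 1024 / 1024 )) MB = $(( apparentsize_bytes / 1024 / 1024 / 1024 )) GB"
echo "requested    : ${newsize_mb} MB = $(( newsize_mb / 1024 )) GB"
echo "delta        : $(( newsize_mb - apparentsize_bytes / 1024 / 1024 )) MB"
# Prints 102400 MB = 100 GB, 104448 MB = 102 GB, delta 2048 MB: the extension
# was computed on top of the full 100 GB raw size (plus 2 GB), which matches
# the oversized 102.00g COW LV seen in comment #0.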
Verified on sf16 with vdsm-4.10.2-18.0.el6ev.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0886.html