Description of problem:
During VM installation, migrate the guest from the src host to the dst host. Before the migration finishes, qemu runs into ENOSPC on the src host; installation continues after enlarging the image by 1G, but 'info migrate' in the src monitor shows the migration failed. Launch the dst qemu-kvm again and migrate from src to dst a second time; qemu hits ENOSPC on the src host again. Enlarge the image by 20G and issue 'c'; this time qemu claims the migration completed, but the dst qemu-kvm stays in paused status, and issuing 'c' there makes qemu claim 'No space left on device (28)'. In fact, the image is already more than 20G.

Version-Release number of selected component (if applicable):
# rpm -qa|grep qemu-kvm
qemu-kvm-debuginfo-0.12.1.2-2.148.el6.x86_64
qemu-kvm-0.12.1.2-2.148.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.148.el6.x86_64
# uname -r
2.6.32-118.el6.x86_64

CLI:
# /usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 4096 -smp 2 -cpu cpu64-rhel6 -name rhel -uuid `uuidgen` -rtc base=localtime,clock=vm,driftfix=slew -no-kvm-pit-reinjection -boot dc -drive file=/dev/s2/test-enospac,if=none,id=drive-ide0-0-0,media=disk,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet1,vhost=on -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:40:11:67:13 -usb -device usb-tablet,id=input1 -spice port=8800,disable-ticketing -vga qxl -monitor stdio -balloon none -cdrom /mnt/image/Windows_Server_2008_R2_GA_EN.iso -incoming tcp:0:5890

How reproducible:
2/2

Steps to Reproduce:
1.
1.1 On the src host, create a 512m LV:
# lvcreate -n test-enospac -L 512m s2
1.2 On the dst host, issue lvscan:
# lvscan
  ACTIVE   '/dev/s2/test-enospac' [512.00 MiB] inherit
1.3 Create an image on the src host:
# qemu-img create -f qcow2 test-enospac 20G
Formatting 'test-enospac', fmt=qcow2 size=21474836480 encryption=off cluster_size=0

2. Install the guest on the src host and hit ENOSPC:
(qemu) info status
VM status: running
(qemu) block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop

3.
3.1 Extend the image by 2G on the src host:
# lvextend -L +2G /dev/s2/test-enospac
  Found duplicate PV L2iTsxVknfTDmPqMGPXo0ujp6uomRePl: using /dev/sdc not /dev/sdb
  Extending logical volume test-enospac to 2.50 GiB
  Logical volume test-enospac successfully resized
3.2 Issue lvscan on the dst host:
  ACTIVE   '/dev/s2/test-enospac' [2.50 GiB] inherit
3.3 Issue cont in the src monitor and continue the installation:
(qemu) c
handle_dev_input: start

4. Start migration from src to dst:
(qemu) migrate -d tcp:xxxx:5880
(qemu) info migrate
Migration status: active
transferred ram: 116872 kbytes
remaining ram: 4208988 kbytes
total ram: 4325768 kbytes

5. During migration, hit ENOSPC again, then enlarge the image by 15G on the src host and run lvscan on the dst host:
(qemu) info migrate
Migration status: active
transferred ram: 2456208 kbytes
remaining ram: 2397944 kbytes
total ram: 4325768 kbytes
(qemu) block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop
# lvextend -L +15G /dev/s2/test-enospac
  Found duplicate PV L2iTsxVknfTDmPqMGPXo0ujp6uomRePl: using /dev/sdc not /dev/sdb
  Extending logical volume test-enospac to 17.50 GiB
  Logical volume test-enospac successfully resized
# lvscan
  Found duplicate PV L2iTsxVknfTDmPqMGPXo0ujp6uomRePl: using /dev/sdc not /dev/sdb
  ACTIVE   '/dev/s2/test-enospac' [17.50 GiB] inherit

6. Issue c in the src monitor and check 'info migrate': qemu claims the migration completed, but on the dst host qemu reports that the VM is paused instead of running.
***In src host***:
(qemu) info migrate
Migration status: active
transferred ram: 2456208 kbytes
remaining ram: 2397944 kbytes
total ram: 4325768 kbytes
(qemu) block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop
(qemu) c
handle_dev_input: start
(qemu) info migrate
Migration status: completed

***In dst host***:
(qemu) handle_dev_destroy_surfaces:
handle_dev_input: start
block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop
(qemu) info status
VM status: paused
(qemu)

7. Issue cont in the dst monitor; we get:
(qemu) c
handle_dev_input: start
(qemu) block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop

Actual results:
qemu-kvm on the dst host ignores the enlarged size and still claims ENOSPC. In fact, the image is already 17.5 GiB.

Expected results:
qemu-kvm on the dst host finishes the installation without claiming an ENOSPC condition that does not actually exist.

Additional info:
Please **note**: after step 7 above, if I extend the image again on the dst host and then issue c in the monitor, the installation can be continued, but the qcow2 image gets corrupted.
(qemu) block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop
(qemu) c
handle_dev_input: start
(qemu) spice_server_add_interface: SPICE_INTERFACE_TABLET
(qemu) info status
VM status: running
(qemu)
# qemu-img check /dev/s2/test-enospac
...
ERROR cluster 96880 refcount=0 reference=1
ERROR cluster 96881 refcount=0 reference=1
ERROR cluster 96882 refcount=0 reference=1
ERROR cluster 96883 refcount=0 reference=1
ERROR cluster 96884 refcount=0 reference=1
ERROR cluster 96885 refcount=0 reference=1
ERROR cluster 96886 refcount=0 reference=1
ERROR cluster 96887 refcount=0 reference=1
ERROR cluster 96888 refcount=0 reference=1
ERROR cluster 96889 refcount=0 reference=1
ERROR cluster 96890 refcount=0 reference=1
...
764 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
13070 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
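To cross-check whether the dst host actually exposes the enlarged size, the LVM metadata view can be compared with the size of the active device node that qemu-kvm reads. A minimal sketch, assuming the LV path from the steps above and standard LVM/util-linux tools:

# lvs -o lv_name,lv_size s2                   # size according to LVM metadata
# blockdev --getsize64 /dev/s2/test-enospac   # size of the active device-mapper node
# fdisk -l /dev/s2/test-enospac               # size as seen by a reader of the block device

If blockdev/fdisk still report the old size while lvs shows the enlarged one, the dst qemu-kvm is working against a stale device mapping rather than a genuinely full image.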
What will happen if you enlarge the disk again (although it is not needed)? It might be an iSCSI/FC limitation, since at the moment the image got enlarged, the dest was already running.
Maybe as a temporary workaround, mgmt shouldn't complete live migration in case we hit any storage error in between.
What happens if you do an lvscan on the destination host?
Copying Kevin to see if he has any good idea about how this should work. My guess is that the "continue" command after enlarging the disk should do "something" also on the destination.
(In reply to comment #4)
> What happens if you do an lvscan on the destination host?
> Copying Kevin to see if he has any good idea about how this should work. My
> guess is that the "continue" command after enlarging the disk should do
> "something" also on the destination.

Below is the lvscan result on the destination host:
# lvscan
  ACTIVE   '/dev/vgtest1/lvtest1' [200.00 MiB] inherit
  ACTIVE   '/dev/vgtest1/test-enospac' [17.50 GiB] inherit

(In reply to comment #2)
> What will happen if you enlarge the disk again (although it is not needed)?
> It might be an iSCSI/FC limitation, since at the moment the image got
> enlarged, the dest was already running.

After the disk has been enlarged to 17.5G, the result on the destination host is still the same:
(qemu) c
(qemu) handle_dev_input: start
block I/O error in device 'drive-ide0-0-0': No space left on device (28)
handle_dev_input: stop
It looks like an issue with LVM caching of the device size. (If you run fdisk -l <lv path> on the dest, the size is still 512M.) If you run "lvchange --refresh <LV path>" and then continue the dest VM, everything runs smoothly. VDSM needs to add the call to lvchange --refresh on the migration destination after the LV enlargement.
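For illustration, a minimal sketch of that workaround on the destination host, assuming the LV path used in this report (the exact path and monitor access depend on the setup):

# lvchange --refresh /dev/s2/test-enospac
# blockdev --getsize64 /dev/s2/test-enospac
(blockdev should now report the enlarged size)

Then resume the guest from the dst monitor:
(qemu) c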
Was vdsm even involved in this 7-month-old bug? Which version? May I see the vdsm.log? Vdsm had a glitch of not calling lvchange --refresh a few months ago, but it was promptly fixed.
Please reopen the bug if it reproduces with a modern Vdsm version.