engine: LSM is reported as successful, which allows us to put the src domain in maintenance while vm keeps writing to it; after ovirt-engine restart LSM is reported as failed and the mapping is changed back to src
Product:
Red Hat Enterprise Virtualization Manager
Reporter:
Dafna Ron <dron>
Component:
ovirt-engine
Assignee:
Nobody's working on this, feel free to take it <nobody>
Created attachment 753865 (logs)
Description of problem:
LSM which failed in the DeleteImageGroupCommand step is reported as successful, but after I restarted engine to clear an ArrayOutOfBound error, engine detects the LSM as failed and rolls back.
The vm's mapping in the db is also rolled back; in vdsm, however, the mapping has changed and we are writing to the target domain.
I put the domain in maintenance and the vm keeps writing to a domain which is not online.
I'm opening this bug for engine because the mapping issue should not happen.
Version-Release number of selected component (if applicable):
sf17.2
How reproducible:
100%
Steps to Reproduce:
1. On iscsi storage with two domains and two hosts, create and run a vm from a template on the hsm.
2. Live storage migrate the vm disk.
3. When engine prints DeleteImage in the log, block connectivity to the domains from the spm.
4. After the host becomes non-operational we can see that the vm disk is pointing to the target domain; put the src domain in maintenance.
5. Restart the ovirt-engine service.
Actual results:
Before the ovirt-engine restart the disk is shown on the target domain and no failure in the LSM is reported.
After the restart the mapping changes to the src domain.
The vm pid is showing the dst domain as the drive file.
You can put the domain in maintenance, which means that we are writing to a domain which no longer exists, and all data written to it will be lost if we merge.
Expected results:
If engine rolls back, then an update should be sent to vdsm.
Additional info:
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
1683b62f-973f-44dd-89d2-31df8d80833a 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 3.00g
6282217b-4ef1-4c1e-9586-19154b41fb50 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 1.00g
f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 2.00g
ids 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m
inbox 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
leases 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 2.00g
master 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 1.00g
metadata 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m
outbox 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
lv_root vg0 -wi-ao--- 457.71g
lv_swap vg0 -wi-ao--- 7.85g
[root@cougar02 ~]# ps -elf |grep 25415
6 S qemu 25415 1 3 80 0 - 269437 poll_s 13:39 ? 00:00:32 /usr/libexec/qemu-kvm -name testtt -S -M rhel6.4.0 -cpu Opteron_G3 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -uuid 098ef05d-c346-4006-98cf-0f0371c4a82a -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=cfeccbf6-77c5-46b5-9367-7386e6a08831,uuid=098ef05d-c346-4006-98cf-0f0371c4a82a -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/testtt.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-05-28T10:39:05,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/81ef11d0-4c0c-47b4-8953-d61a6af442d8/images/a567b2f9-9f19-4302-83e6-ec7de7d7734a/6282217b-4ef1-4c1e-9586-19154b41fb50,if=none,id=drive-ide0-0-0,format=qcow2,serial=a567b2f9-9f19-4302-83e6-ec7de7d7734a,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:23:a1:1e,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/testtt.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/testtt.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 
-spice port=5900,tls-port=5901,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
1 S root 25417 2 0 80 0 - 0 vhost_ 13:39 ? 00:00:00 [vhost-25415]
0 S root 29130 13837 0 80 0 - 25811 pipe_w 13:53 pts/2 00:00:00 grep 25415
[root@cougar02 ~]#
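To compare what qemu is actually using with what engine has recorded, the storage domain can be read out of the `-drive file=` path in the command line above. The following is only a minimal sketch: it assumes the path layout is `/rhev/data-center/<pool>/<domain>/images/<image group>/<volume>`, and the names `drive_domains` and `is_consistent` are hypothetical helpers, not part of engine or vdsm.

```python
import re

# Assumed layout, inferred from the ps output above:
# /rhev/data-center/<pool>/<domain>/images/<image group>/<volume>
DRIVE_RE = re.compile(
    r"file=/rhev/data-center/"
    r"(?P<pool>[0-9a-f-]{36})/(?P<domain>[0-9a-f-]{36})/images/"
    r"(?P<image>[0-9a-f-]{36})/(?P<volume>[0-9a-f-]{36})"
)


def drive_domains(cmdline):
    """Return (image group, storage domain) pairs for every -drive file= path."""
    return [(m.group("image"), m.group("domain"))
            for m in DRIVE_RE.finditer(cmdline)]


def is_consistent(cmdline, image_id, engine_domain_id):
    """True if qemu opens image_id from the domain that engine has recorded.

    engine_domain_id is whatever the engine db reports for the disk;
    how it is obtained is outside this sketch.
    """
    return (image_id, engine_domain_id) in drive_domains(cmdline)
```

Feeding it the output of `ps -elf` for the qemu pid and the domain id shown by engine for the disk would flag the mismatch described in this bug.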
before ovirt-engine restart:
2013-05-28 13:47:06,165 INFO [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-48) [11ca4bc1] Ending command successfully: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
After ovirt-engine restart:
2013-05-28 13:49:12,237 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-20) [11ca4bc1] Command org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive image file %s could not be found
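Both log lines carry the same correlation id, `[11ca4bc1]`, so the contradiction can be spotted mechanically. The sketch below is one possible way to do it, assuming engine.log lines follow the `timestamp LEVEL [class] (thread) [correlation id] message` shape of the two lines quoted here; `conflicting_outcomes` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Line shape assumed from the two engine.log excerpts in this report.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<cls>[\w.]+)\] \((?P<thread>[^)]+)\) "
    r"\[(?P<corr>[0-9a-f]+)\] (?P<msg>.*)$"
)


def conflicting_outcomes(lines):
    """Return correlation ids logged both as ended successfully and as ERROR."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped or unrelated line, ignore it
        if "Ending command successfully" in m.group("msg"):
            seen[m.group("corr")].add("success")
        elif m.group("level") == "ERROR":
            seen[m.group("corr")].add("error")
    return sorted(c for c, outcomes in seen.items()
                  if outcomes == {"success", "error"})
```

Run over the log from before and after the restart, any id returned is a command that engine first reported as successful and later treated as failed.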
To add to this bug: I had a vm running and I kept writing to it while the domain was in maintenance.
After I powered down the vm and started the domain which was in maintenance, I merged the snapshot and all the data that I had written on the vm was gone.
So this scenario can cause loss of user data.
Changes to the recovery flow in 3.3 should have handled this issue too.
Nonetheless, I tried to reproduce and was unable to.
Moving to ON_QA to verify.