Created attachment 499378 [details]
logs

Description of problem:
starting VM migration from SPM host to HSM host and blocking SD connectivity during migration causes libvirt to kill the VM.

Version-Release number of selected component (if applicable):
ic117
vdsm-4.9-65.el6.x86_64
libvirt-0.8.7-18.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. start VM migration
2. block SD connectivity in destination host using iptables
3.

Actual results:
libvirt will kill VM

Expected results:
VM should not be killed.

Additional info:
logs
Created attachment 499380 [details]
logs

Accidentally attached the wrong logs. The correct logs are attached now.
The logs provided aren't usable since they contain data from many different operations and guests, and the log level excludes all QEMU driver info.

Please edit /etc/libvirt/libvirtd.conf and set the following:

  log_filters="1:qemu 3:event 1:util 1:security"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"

Do this on both the source and destination hosts used for migration, and then run

  rm -f /var/log/libvirt/libvirtd.log
  service libvirtd restart

and execute *1* single migration attempt demonstrating the problem, and then

  service libvirtd stop

and attach the resulting /var/log/libvirt/libvirtd.log from both source and destination to this bug, so that we have a log with only the information for 1 guest and 1 migration attempt.

Also please provide the XML for the guest, and the /var/log/libvirt/qemu/$GUESTNAME.log file from both source and dest hosts.
Created attachment 499818 [details] logs requested
According to those logs everything is working normally.

libvirt.log.source shows migration starting, and finally completing without error:

  14:45:55.650: 19216: debug : virJSONValueToString:1062 : result={"execute":"query-migrate"}
  14:45:56.441: 19212: debug : virJSONValueFromString:933 : string={"return": {"status": "completed"}}

libvirt.log.dest shows that the target VM started up and accepted the incoming migration, which completed, resulting in a running guest:

  14:46:53.579: 21160: debug : qemuMonitorJSONIOProcessLine:116 : Line [{"timestamp": {"seconds": 1305805613, "microseconds": 579538}, "event": "RESUME"}]
  14:46:53.579: 21160: debug : virJSONValueFromString:933 : string={"timestamp": {"seconds": 1305805613, "microseconds": 579538}, "event": "RESUME"}

Please provide more details of where the actual problem is.
> the problem is that a guest migration from 1 host which has connectivity to
> storage to a 2nd host which loses connectivity to storage (mid migration) will
> cause the guest to shut down instead of the migration to fail.
>
> 1) we can see in the RHEVM GUI that the guest becomes non-responsive before it
> stops - this usually means that the vdsm lost connectivity to libvirt.
> 2) the libvirt seems to lose connection to the kvm process and kills the vm.
> I spoke to vdsm who said that it looks like a libvirt or kvm issue - but since
> libvirt is the one that kills the guest, I thought we should check with you
> first.

As outlined in comment #5, the libvirtd logs you provided show no evidence that the VM on the destination host shut down. The migration completed successfully and the VM is running on the destination. The source VM has of course shut down, as is normal when migration completes.

Please provide updated logs which actually demonstrate the problem you're describing.
So the interesting part of the logs (from the target machine) is:

  10:12:09.386: 13166: debug : virDomainMigrateFinish2:4136 : dconn=0x7f3f94007cf0, dname=omri_xp, cookie=0x7f3f98003bc0, cookielen=333, uri=tcp:blond-vdsg.qa.lab.tlv.redhat.com:49157, flags=3, retcode=0
  10:12:09.936: 13165: error : virStorageFileGetMetadataFromFD:832 : cannot read header '/rhev/data-center/91e7a658-5f50-40bc-8ccd-004f8f3de868/6f747221-9351-4fc5-87b6-9294257b7c0b/images/98753b08-b818-4995-94ec-f94197241a7c/a9e00ce5-0658-4c17-a2d8-ae8708479aa2': Input/output error
  10:12:09.936: 13165: debug : virDomainFree:2294 : dom=0x7f3f8c09e410, (VM: name=omri_xp, uuid=723a6e62-772f-4a42-9896-7c31cb7a4976),
  10:12:09.936: 13166: debug : qemuMonitorStartCPUs:954 : mon=0x7f3f8c0da160
  10:12:09.937: 13165: warning : virEventUpdateHandleImpl:139 : Ignoring invalid update watch -1
  10:12:09.937: 13161: debug : virConnectClose:1570 : conn=0x7f3f9010e030
  10:12:10.087: 13161: debug : qemuMonitorIO:601 : Triggering EOF callback error? 1
  10:12:10.087: 13161: debug : qemuHandleMonitorEOF:741 : Received EOF on 0x7f3f8c0a08b0 'omri_xp'
  10:12:10.087: 13161: debug : qemudShutdownVMDaemon:3460 : Shutting down VM 'omri_xp' pid=31721 migrated=0
  10:12:10.091: 13166: error : qemuMonitorJSONCommandWithFd:243 : cannot send monitor command '{"execute":"cont"}': Connection reset by peer
  10:12:10.094: 13161: error : qemudShutdownVMDaemon:3517 : Failed to send SIGTERM to omri_xp (31721): No such process

So while the target libvirtd is in the Finish2 phase (which means the domain has already been destroyed on the source) and tries to resume the guest, the qemu process is no longer there, so the resume fails. This is fixed by the migration v3 protocol, which ensures that the domain on the source is not killed until the target has confirmed that the domain was successfully resumed there.
The reason why the qemu process vanished can be seen in /var/log/libvirt/qemu/omri_xp.log:

  qemu: could not open disk image /rhev/data-center/91e7a658-5f50-40bc-8ccd-004f8f3de868/6f747221-9351-4fc5-87b6-9294257b7c0b/images/98753b08-b818-4995-94ec-f94197241a7c/a9e00ce5-0658-4c17-a2d8-ae8708479aa2: Input/output error
  qemu: re-open of /rhev/data-center/91e7a658-5f50-40bc-8ccd-004f8f3de868/6f747221-9351-4fc5-87b6-9294257b7c0b/images/98753b08-b818-4995-94ec-f94197241a7c/a9e00ce5-0658-4c17-a2d8-ae8708479aa2 failed wth error -1
  reopening of drives failed
  2011-05-24 10:12:10.088: shutting down
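For anyone triaging a similar libvirtd.log, here is a minimal Python sketch that pulls out the lines that mattered in the analysis above: every error or warning, plus anything mentioning the guest. The regular expression is an assumption based only on the line layout visible in this bug (time, thread id, level, function:line, message); the function name `interesting` is made up for illustration, and the sample lines are copied from the log excerpt above.

```python
import re

# Assumed layout, inferred from the excerpts in this bug:
#   HH:MM:SS.mmm: <thread>: <level> : <function>:<line> : <message>
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): (?P<thread>\d+): "
    r"(?P<level>\w+) : (?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


def interesting(lines, guest):
    """Return (time, level, func, msg) for errors, warnings and guest mentions."""
    hits = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # not a libvirtd log line in the assumed layout
        if m["level"] in ("error", "warning") or guest in m["msg"]:
            hits.append((m["time"], m["level"], m["func"], m["msg"]))
    return hits


# Three lines taken verbatim from the target-host log quoted in this bug.
SAMPLE = [
    "10:12:09.937: 13161: debug : virConnectClose:1570 : conn=0x7f3f9010e030",
    "10:12:10.087: 13161: debug : qemuHandleMonitorEOF:741 : "
    "Received EOF on 0x7f3f8c0a08b0 'omri_xp'",
    "10:12:10.094: 13161: error : qemudShutdownVMDaemon:3517 : "
    "Failed to send SIGTERM to omri_xp (31721): No such process",
]

if __name__ == "__main__":
    for hit in interesting(SAMPLE, "omri_xp"):
        print(*hit, sep=" | ")
```

On a real log, pass the open file object instead of `SAMPLE`; lines that do not match the assumed layout are skipped rather than reported.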
*** Bug 707164 has been marked as a duplicate of this bug. ***
This will be fixed by rebasing libvirt to 0.9.2 since it contains migration v3 patches. I won't close this bug as a duplicate of migration v3 BZ so that this can be tested and verified as fixed separately.
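To make the ordering difference concrete, here is a toy Python model of the two protocols as described in the analysis earlier in this bug. It is a deliberately simplified sketch, not libvirt code: the function `migrate`, the state strings, and the `dest_resume_ok` flag are all invented for illustration. The only facts it encodes are that with the older protocol the source domain is already gone when the destination tries to resume, while with v3 the source waits for the destination's confirmation.

```python
def migrate(protocol, dest_resume_ok):
    """Toy model of a live migration; returns (source_state, dest_state).

    protocol        -- "v2" or "v3" (labels used only in this sketch)
    dest_resume_ok  -- whether the destination manages to resume the guest,
                       e.g. False when qemu there cannot re-open its disks
    """
    if protocol not in ("v2", "v3"):
        raise ValueError("unknown protocol: %r" % (protocol,))

    # Prepare (destination): a paused qemu waits for incoming data.
    dest = "paused"
    # Perform (source): guest state is transferred, source CPUs are stopped.
    source = "paused"

    if protocol == "v2":
        # Older protocol: the source domain is destroyed as soon as the
        # transfer completes, before the destination has tried to resume.
        source = "shutoff"

    # Finish (destination): try to resume the guest CPUs.
    dest = "running" if dest_resume_ok else "shutoff"

    if protocol == "v3":
        # Confirm (source): the source only gives up the guest once the
        # destination has reported a successful resume.
        source = "shutoff" if dest_resume_ok else "running"

    return source, dest


if __name__ == "__main__":
    for proto in ("v2", "v3"):
        print(proto, migrate(proto, dest_resume_ok=False))
```

With `dest_resume_ok=False`, which corresponds to the storage failure in this bug, the v2 case ends with the guest shut off on both hosts, whereas the v3 case leaves it running on the source.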
This should be fixed by the libvirt-0.9.2-1.el6 rebase
Verification passed with:
kernel-2.6.32-156.el6.x86_64
libvirt-0.9.2-1.el6.x86_64
qemu-kvm-0.12.1.2-2.165.el6.x86_64

Steps:
1. Prepare a separate NFS server that is neither the source nor the destination host.
2. Mount the NFS export on both the source and destination hosts.
3. Start a guest whose storage is located on the shared directory.
4. Run "setenforce 1" && "setsebool virt_use_nfs 1" on both sides.
5. Start the migration on the source:
   # virsh migrate --live vr-rhel6-i386-kvm qemu+ssh://10.66.83.175/system
6. At the same time, on the destination, run:
   # iptables -A OUTPUT -d {nfs_ip} -p tcp --dport 2049 -j DROP
7. On the source host, check the guest status.

With libvirt-0.9.1-1.el6.x86_64, the guest is shut off while the migration is still not finished.
With libvirt-0.9.2-1.el6.x86_64, the guest always stays in the running state.
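The DROP rule from step 6 stays in place after the test unless it is removed again. As a small convenience, the sketch below builds the argument list for that rule and for the matching delete (`iptables -D` with the same rule specification). The helper name `nfs_block_rule` and the address 192.0.2.10 are placeholders invented for this example; substitute the real NFS server address. The script only prints the commands, so it is safe to run anywhere.

```python
def nfs_block_rule(nfs_ip, action="-A"):
    """Build the iptables argument list used in step 6.

    action -- "-A" to append the DROP rule, "-D" to delete the same rule.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    return [
        "iptables", action, "OUTPUT",
        "-d", nfs_ip,
        "-p", "tcp", "--dport", "2049",
        "-j", "DROP",
    ]


if __name__ == "__main__":
    nfs_ip = "192.0.2.10"  # placeholder address, not from this bug
    print("block:  ", " ".join(nfs_block_rule(nfs_ip, "-A")))
    print("unblock:", " ".join(nfs_block_rule(nfs_ip, "-D")))
```

To actually apply the rules, the returned list can be handed to `subprocess.run(..., check=True)` as root on the destination host.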
Setting this to VERIFIED per comment 12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1513.html