| Summary: | Libvirt: libvirt kills VM when SD connectivity is blocked on destination during VM migration | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> | ||||||||
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||||
| Severity: | urgent | Docs Contact: | |||||||||
| Priority: | urgent | ||||||||||
| Version: | 6.1 | CC: | ajia, berrange, dallan, danken, dyuan, gren, mgoldboi, mzhan, ohochman, rwu, syeghiay, weizhan | ||||||||
| Target Milestone: | rc | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | Unspecified | ||||||||||
| OS: | Unspecified | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | libvirt-0.9.2-1.el6 | Doc Type: | Bug Fix | ||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2011-12-06 11:09:21 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Attachments: | | | | | | | | | | | |
Created attachment 499380 [details]
logs
Accidentally attached the wrong logs; the correct logs are attached now.
The logs provided aren't usable since they contain data from many different operations and guests, and the log level is excluding all QEMU driver info. Please edit /etc/libvirt/libvirtd.conf and set the following:

log_filters="1:qemu 3:event 1:util 1:security"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

Do this on both the source and destination hosts used for migration, then run:

rm -f /var/log/libvirt/libvirtd.log
service libvirtd restart

Execute *1* single migration attempt demonstrating the problem, and then run:

service libvirtd stop

and attach the resulting /var/log/libvirt/libvirtd.log from both source and destination to this bug, so that we have a log with only the information for one guest and one migration attempt. Also please provide the XML for the guest, and the /var/log/libvirt/qemu/$GUESTNAME.log file from both source and destination hosts.

Created attachment 499818 [details]
logs requested
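The log-collection procedure requested above can be sketched as a small script. This is an illustration only: the guest name is taken from later comments in this bug, and the DRY_RUN guard and output file names are my additions so the sequence can be reviewed before running it on real hosts.

```shell
#!/bin/sh
# Sketch of the requested log-collection procedure (run on BOTH hosts).
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to
# actually execute them on a real migration host.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "+ $*" || "$@"; }

GUESTNAME=omri_xp            # placeholder: guest name from this bug

# 1. Add to /etc/libvirt/libvirtd.conf on both hosts:
#      log_filters="1:qemu 3:event 1:util 1:security"
#      log_outputs="1:file:/var/log/libvirt/libvirtd.log"

# 2. Start from a clean log and restart the daemon
run rm -f /var/log/libvirt/libvirtd.log
run service libvirtd restart

# 3. ...perform exactly ONE migration attempt here...

# 4. Freeze the log and gather the artifacts to attach to the bug
run service libvirtd stop
run cp /var/log/libvirt/libvirtd.log "libvirtd-$(hostname).log"
run cp "/var/log/libvirt/qemu/$GUESTNAME.log" .
```

With DRY_RUN left at 1 the script is a safe checklist; each line it prints is the command to run manually on the source and destination hosts.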
According to those logs everything is working normally.
libvirt.log.source shows migration starting, and finally completing without error:
14:45:55.650: 19216: debug : virJSONValueToString:1062 : result={"execute":"query-migrate"}
14:45:56.441: 19212: debug : virJSONValueFromString:933 : string={"return": {"status": "completed"}}
libvirt.log.dest shows that the target VM started up and accepted the incoming migration, which completed, resulting in a running guest
14:46:53.579: 21160: debug : qemuMonitorJSONIOProcessLine:116 : Line [{"timestamp": {"seconds": 1305805613, "microseconds": 579538}, "event": "RESUME"}]
14:46:53.579: 21160: debug : virJSONValueFromString:933 : string={"timestamp": {"seconds": 1305805613, "microseconds": 579538}, "event": "RESUME"}
Please provide more details of where the actual problem is.
> The problem is that a guest migration from one host which has connectivity
> to storage to a second host which loses connectivity to storage
> (mid-migration) will cause the guest to shut down instead of the migration
> failing.
>
> 1) We can see in the RHEVM GUI that the guest becomes non-responsive before
> it stops - this usually means that vdsm lost connectivity to libvirt.
> 2) libvirt seems to lose connection to the kvm process and kills the VM.
>
> I spoke to vdsm who said that it looks like a libvirt or kvm issue - but
> since libvirt is the one that kills the guest, I thought we should check
> with you first.

As outlined in comment #5, the libvirtd logs you provided show no evidence that the VM on the destination host shut down. The migration completed successfully and the VM is running on the destination. The source VM has of course shut down, as is normal when migration completes. Please provide updated logs which actually demonstrate the problem you're describing.

So the interesting part of the logs (from the target machine) is:
10:12:09.386: 13166: debug : virDomainMigrateFinish2:4136 :
dconn=0x7f3f94007cf0, dname=omri_xp, cookie=0x7f3f98003bc0,
cookielen=333, uri=tcp:blond-vdsg.qa.lab.tlv.redhat.com:49157,
flags=3, retcode=0
10:12:09.936: 13165: error : virStorageFileGetMetadataFromFD:832 :
cannot read header '/rhev/data-center/91e7a658-5f50-40bc-8ccd-004f8f3de868/6f747221-9351-4fc5-87b6-9294257b7c0b/images/98753b08-b818-4995-94ec-f94197241a7c/a9e00ce5-0658-4c17-a2d8-ae8708479aa2': Input/output error
10:12:09.936: 13165: debug : virDomainFree:2294 :
dom=0x7f3f8c09e410, (VM: name=omri_xp,
uuid=723a6e62-772f-4a42-9896-7c31cb7a4976),
10:12:09.936: 13166: debug : qemuMonitorStartCPUs:954 : mon=0x7f3f8c0da160
10:12:09.937: 13165: warning : virEventUpdateHandleImpl:139 :
Ignoring invalid update watch -1
10:12:09.937: 13161: debug : virConnectClose:1570 : conn=0x7f3f9010e030
10:12:10.087: 13161: debug : qemuMonitorIO:601 :
Triggering EOF callback error? 1
10:12:10.087: 13161: debug : qemuHandleMonitorEOF:741 :
Received EOF on 0x7f3f8c0a08b0 'omri_xp'
10:12:10.087: 13161: debug : qemudShutdownVMDaemon:3460 :
Shutting down VM 'omri_xp' pid=31721 migrated=0
10:12:10.091: 13166: error : qemuMonitorJSONCommandWithFd:243 :
cannot send monitor command '{"execute":"cont"}': Connection reset by peer
10:12:10.094: 13161: error : qemudShutdownVMDaemon:3517 :
Failed to send SIGTERM to omri_xp (31721): No such process
So while the target libvirtd is in the Finish2 phase (which means the domain is
already destroyed on the source) and tries to resume the domain, the qemu
process is no longer there, so resuming fails. This is fixed by the migration
v3 protocol, which ensures that the domain on the source is not killed until
the target has confirmed that the domain was successfully resumed there.
The reason why qemu process vanished can be seen in
/var/log/libvirt/qemu/omri_xp.log:
qemu: could not open disk image
/rhev/data-center/91e7a658-5f50-40bc-8ccd-004f8f3de868/6f747221-9351-4fc5-87b6-9294257b7c0b/images/98753b08-b818-4995-94ec-f94197241a7c/a9e00ce5-0658-4c17-a2d8-ae8708479aa2:
Input/output error
qemu: re-open of
/rhev/data-center/91e7a658-5f50-40bc-8ccd-004f8f3de868/6f747221-9351-4fc5-87b6-9294257b7c0b/images/98753b08-b818-4995-94ec-f94197241a7c/a9e00ce5-0658-4c17-a2d8-ae8708479aa2
failed with error -1
reopening of drives failed
2011-05-24 10:12:10.088: shutting down
*** Bug 707164 has been marked as a duplicate of this bug. ***

This will be fixed by rebasing libvirt to 0.9.2 since it contains the migration v3 patches. I won't close this bug as a duplicate of the migration v3 BZ so that this can be tested and verified as fixed separately.

This should be fixed by the libvirt-0.9.2-1.el6 rebase.

Verified: pass on
kernel-2.6.32-156.el6.x86_64
libvirt-0.9.2-1.el6.x86_64
qemu-kvm-0.12.1.2-2.165.el6.x86_64
steps:
1. Prepare a standalone NFS server that is neither the source nor the destination host
2. Mount the NFS share on both the source and destination hosts
3. Start a guest whose storage is located on the shared directory
4. Run "setenforce 1" and "setsebool virt_use_nfs 1" on both sides
5. Start the migration on the source:
# virsh migrate --live vr-rhel6-i386-kvm qemu+ssh://10.66.83.175/system
6. At the same time, on the destination, run:
# iptables -A OUTPUT -d {nfs_ip} -p tcp --dport 2049 -j DROP
7. On the source host, check the guest status:
on libvirt-0.9.1-1.el6.x86_64, the guest is shut off even though the migration has not finished
on libvirt-0.9.2-1.el6.x86_64, the guest always stays in the running state
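The verification steps above can be sketched as one sequence. This is a non-authoritative illustration: the NFS server IP and export path are placeholders (only the destination host 10.66.83.175 and guest name come from step 5), and the DRY_RUN guard is my addition so nothing runs by accident.

```shell
#!/bin/sh
# Sketch of verification steps 2-7 above. DRY_RUN=1 (the default) only
# prints each command; set DRY_RUN=0 to execute on real hosts.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "+ $*" || "$@"; }

NFS_IP=192.168.0.10                  # placeholder: standalone NFS server
DEST=10.66.83.175                    # destination host from step 5
GUEST=vr-rhel6-i386-kvm              # guest from step 5

# Steps 2-4: on BOTH hosts - mount the share, allow qemu to use NFS
run mount "$NFS_IP:/export/images" /var/lib/libvirt/images
run setenforce 1
run setsebool virt_use_nfs 1

# Step 5: on the source host, start a live migration
run virsh migrate --live "$GUEST" "qemu+ssh://$DEST/system" &

# Step 6: on the DESTINATION host, cut NFS traffic mid-migration
run iptables -A OUTPUT -d "$NFS_IP" -p tcp --dport 2049 -j DROP

# Step 7: on the source, the guest should stay 'running' on the fixed build
run virsh domstate "$GUEST"
wait
```

Note that steps 5 and 6 run on different hosts; the script simply shows the ordering, with the iptables DROP injected while the migration is still in flight.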
Set to VERIFIED per comment 12.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html
Created attachment 499378 [details]
logs

Description of problem:
Starting VM migration from the SPM host to an HSM host and blocking SD connectivity during the migration causes libvirt to kill the VM.

Version-Release number of selected component (if applicable):
ic117
vdsm-4.9-65.el6.x86_64
libvirt-0.8.7-18.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. start VM migration
2. block SD connectivity on the destination host using iptables

Actual results:
libvirt kills the VM

Expected results:
the VM should not be killed

Additional info:
logs