Bug 1410314
Summary: | VM shutoff when migration
---|---
Product: | [oVirt] ovirt-engine
Reporter: | Han Han <hhan>
Component: | General
Assignee: | Arik <ahadas>
Status: | CLOSED DEFERRED
Severity: | medium
Priority: | unspecified
Version: | 4.0.6.3
CC: | ahadas, bugs, chhu, dyuan, hhan, rbarry, tjelinek, xuzhang, yanqzhan
Target Milestone: | ---
Target Release: | ---
Keywords: | FutureFeature
Flags: | tjelinek: ovirt-future? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack?
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | If docs needed, set a value
Story Points: | ---
Last Closed: | 2020-04-01 14:44:15 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | Virt
Cloudforms Team: | ---
Description
Han Han 2017-01-05 06:05:02 UTC

> Missing severity. Where are the libvirt logs? Did anything happen to the VM,
> or did it just fail migration and stay alive on the source? Anything in the
> vdsm log, or did you just attach it? What migration policy did you use?

Hi Yaniv,

First of all, regarding severity: I think it is at least medium, because it can power off a running VM and it can occasionally be reproduced from the oVirt web UI. I originally noticed the issue while migrating a VM from the web UI, so our customers could hit it as well. To make it easier to reproduce, I wrote the python script in comment 0.

The libvirt logs are too large (about 1.2G) to upload after multiple migrations, and I did not find any migration-related errors in them. If you need them, I can give you access to the test environment.

As for what happened to the VM: the VM powers off when the migration fails. It is completely powered off, not kept alive on the source.

For the vdsm logs, yes, I just attached them. I am not familiar with vdsm features, so I did not analyze them. Maybe we need help from RHEV QE :)

For the migration policy, I used 'Select Host Automatically' both in the web UI and in the script.

Let me explain a bit more about what I did. Originally I just ran a simple VM migration from the oVirt web UI with the 'Select Host Automatically' policy, but the VM unexpectedly powered off. I tried again from the web UI and found the issue hard to reproduce, so I wrote a script that migrates the VM in a loop until it detects the VM powering off. Additionally, I deployed my RHEV host with abrt so it could catch coredumps, but no coredump was caught. So I am sure it is not a crash in qemu or libvirt, and I tend to think the problem is in vdsm or oVirt. Finally, I have to say it is hard to reproduce: you may need to let the script run for half an hour or more to catch the bug :)

Hi,
I can reproduce this bug on RHEVM 4.1.
Rhevm-4.1.0-0.3.beta2.el7.noarch
Host:
libvirt-2.5.0-1.el7.x86_64
qemu-kvm-rhev-2.8.0-1.el7.x86_64
vdsm-4.19.1-1.el7ev.x86_64

The phenomenon is a little different: the VM shows a shutdown status while migrating, but it automatically becomes running again immediately.

Steps to Reproduce:
1. Prepare 2 hosts (yan-A, yan-B) in one datacenter. Create a VM with an OS (yan-vm-2).
2. Use the same python script as in comment#1 to migrate the VM to the other host. Sometimes the VM will be shut down, and then the script will stop.

There are 3 kinds of output when the script stops:
(1) ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
(2) down Bugs! 1484533548.02
(3) ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM. VM is not running.]". HTTP response code is 409.

Please refer to the attachment "rhevm4.1-mig-logs", which includes:
- pyError-[].txt, pyError-bugsDown.txt, pyError-[cannot].txt: the script outputs for the above 3 situations
- ovirt-engine.log
- yan-vm-2.log: the guest log
- libvirtd-*.log
- vdsm-*.log
- rhvm-event-*.txt: some event info from the RHEVM page

Tips: Please ignore messages like "profiling:/mnt/coverage/BUILD/libvirt-2.5.0/src/util/.libs/libvirt_util_la-vireventpoll.gcda:Cannot open" in yan-vm-2.log. That is because we used libvirt-2.5.0-1.virtcov.el7.x86_64; we use virtcov to check code coverage but did not modify the libvirt code, so it does not affect the results.

Log errors:
1. There seems to be no obvious migration-related error message in libvirtd.log.
2.
The errors in the vdsm log on yan-B:

[root@hostb host-yanB]# cat vdsm-*|grep ERROR
2017-01-15 21:24:54,811 ERROR (migsrc/01593538) [virt.vm] (vmId='01593538-a124-4958-bbf1-f6c83aaea600') migration destination error: Virtual machine already exists (migration:265)
2017-01-15 21:25:02,428 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
2017-01-15 21:25:17,450 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
2017-01-15 21:25:32,476 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
2017-01-15 21:32:17,983 ERROR (jsonrpc/1) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
2017-01-15 21:32:48,020 ERROR (jsonrpc/2) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
2017-01-15 21:11:23,585 ERROR (jsonrpc/0) [virt.api] FINISH create error=Virtual machine already exists (api:69)
2017-01-15 21:12:11,031 ERROR (periodic/3) [virt.periodic.Operation] <vdsm.virt.sampling.VMBulkSampler object at 0x3792790> operation failed (periodic:192)
2017-01-15 21:12:21,763 ERROR (vm/01593538) [virt.vm] (vmId='01593538-a124-4958-bbf1-f6c83aaea600') Error fetching vm stats (vm:1320)
2017-01-15 21:12:25,691 ERROR (periodic/1) [root] VM metrics collection failed (vmstats:264)
2017-01-15 21:12:27,973 ERROR (migsrc/01593538) [virt.vm] (vmId='01593538-a124-4958-bbf1-f6c83aaea600') migration destination error: Virtual machine already exists (migration:265)
2017-01-15 21:12:31,461 ERROR (jsonrpc/6) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
2017-01-15 21:12:46,485 ERROR (jsonrpc/6) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)

> The phenomenon is a little different, there is a shutdown status when
> migrating, but it will automatically be running again immediately.

Do I understand correctly that the VM is not actually down, that it is just reported as down for a short while and then again correctly reported as up? E.g.
if you put the same sleep at the end of the script as here: https://bugzilla.redhat.com/show_bug.cgi?id=1409033 , does it work correctly?

Hi Tomas,

Please see my answers:

1. Do I understand correctly that the VM is not actually down, that it is just reported as down for a short while and then again correctly reported as up?
- Yes, I think so, since the job in the guest keeps running and the guest does not reboot.

2. If you put the same sleep at the end of the script as in https://bugzilla.redhat.com/show_bug.cgi?id=1409033 , does it work correctly?
- I used the same script as https://bugzilla.redhat.com/show_bug.cgi?id=1409033#c6, with sleep(2) before migrate.

A fix for that was done back on Dec 14 (https://gerrit.ovirt.org/#/c/67917/), but it seems a new version of vdsm-jsonrpc-java was built only on Jan 9, so I am not sure whether the engine you used included the fix. Could you please try to reproduce it with the latest 4.1 version?

Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1420718; cannot start a normal VM now.

I've just seen this happening now in o-s-t. Need to reproduce with logs.

This bug didn't get any attention for a while, and we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

OK, closing. Please reopen if it is still relevant or you want to work on it.
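For reference, the reproducer described in the comments (migrate the VM in a loop, sleep(2) before each migration, stop when the VM is seen down) can be sketched roughly as below. This is a minimal sketch, not the reporter's actual script: `migrate` and `get_status` are placeholder callables standing in for the ovirtsdk4 calls, so the loop logic can be shown without a live engine.

```python
import time

def migrate_until_down(migrate, get_status, max_rounds=100,
                       settle_delay=2.0, poll_interval=1.0):
    """Trigger migrations in a loop until the VM is reported 'down'.

    `migrate` starts one migration; `get_status` returns the current VM
    status string ('up', 'migrating', 'down', ...).  Returns the number
    of migrations performed when 'down' was first seen, or None if the
    VM stayed up for all rounds.
    """
    for round_no in range(1, max_rounds + 1):
        # Let the previous migration settle before re-migrating
        # (the sleep(2) workaround mentioned for bug 1409033).
        time.sleep(settle_delay)
        migrate()
        status = get_status()
        while status == "migrating":   # poll until this migration finishes
            time.sleep(poll_interval)
            status = get_status()
        if status == "down":           # the reported bug: VM powered off
            return round_no
    return None
```

With ovirtsdk4 the two callables would wrap the SDK's migrate operation and a VM status query; the exact wiring (service lookup, host selection policy) is left out here and would follow the script in comment 0, which is not reproduced on this page.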