Bug 1142879 - Live migration of VM running from live CD not working, guest enters into paused state and stays there.
Summary: Live migration of VM running from live CD not working, guest enters into paused state and stays there.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Francesco Romani
QA Contact: Nikolai Sednev
URL:
Whiteboard: virt
Depends On: 1154631 1154769
Blocks: 1084930
 
Reported: 2014-09-17 14:41 UTC by Nikolai Sednev
Modified: 2014-12-21 12:54 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-30 15:44:45 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
logs (1.54 MB, application/octet-stream)
2014-09-17 14:41 UTC, Nikolai Sednev
logs (1.83 MB, application/octet-stream)
2014-09-17 15:00 UTC, Nikolai Sednev
logs from 12_10_14 (2.09 MB, application/x-gzip)
2014-10-12 06:30 UTC, Nikolai Sednev
logs from engine and blue-vdsc (RHEL 6.6 host) (2.33 MB, application/x-gzip)
2014-10-30 08:01 UTC, Nikolai Sednev

Description Nikolai Sednev 2014-09-17 14:41:50 UTC
Created attachment 938502 [details]
logs

Description of problem:
Live migration of a VM running from a live CD is not working; the guest enters the paused state and stays there.

Version-Release number of selected component (if applicable):
sanlock-3.1.0-2.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.2.x86_64
vdsm-4.16.4-0.el7.x86_64
libvirt-daemon-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-secret-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-storage-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-qemu-1.1.1-29.el7_0.1.x86_64
libvirt-python-1.1.1-29.el7_0.1.x86_64
libvirt-lock-sanlock-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-kvm-1.1.1-29.el7_0.1.x86_64
libvirt-client-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-interface-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-nodedev-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-config-nwfilter-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-network-1.1.1-29.el7_0.1.x86_64
libvirt-daemon-driver-nwfilter-1.1.1-29.el7_0.1.x86_64
rhevm-3.5.0-0.12.beta.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with the boot option set to run from CD-ROM, using a tiny Linux ISO image, for example "stresslinux_64bit_11.4.x86_64-0.7.106.iso" from http://www.stresslinux.org/sl/downloads.
2. Start the VM, with the ISO being served from the ISO domain.
3. Try to migrate the VM to another host; you will see that it enters the paused state, in which it stays until powered off (a scripted equivalent of these steps is sketched below).
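For reference, the same flow can be driven programmatically instead of through the webadmin UI. The sketch below is only illustrative: it assumes the oVirt 3.x Python SDK (ovirtsdk), a hypothetical engine URL and credentials, and the VM/host names from this report; the exact start()/migrate() signatures should be double-checked against the SDK version in use.

from ovirtsdk.api import API
from ovirtsdk.xml import params

# Hypothetical engine endpoint and credentials.
api = API(url='https://engine.example.com/api',
          username='admin@internal', password='secret', insecure=True)

vm = api.vms.get(name='VM1_stress')   # the VM booted from the live CD
vm.start()
# ... wait until the engine reports the VM as Up, then ask it to migrate.
vm.migrate(params.Action(host=params.Host(name='alma04.qa.lab.tlv.redhat.com')))

api.disconnect()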

Actual results:
The VM enters the paused state and the migration fails.

Expected results:
The migration should succeed and the VM should return to the Up state.

Additional info:
logs

Comment 1 Nikolai Sednev 2014-09-17 15:00:48 UTC
Created attachment 938505 [details]
logs

Comment 2 Francesco Romani 2014-09-19 07:06:32 UTC
This is the reason why the VM got paused once migrated:

Thread-4670::DEBUG::2014-09-17 17:44:20,886::BindingXMLRPC::1132::vds::(wrapper) client [10.35.163.77]::call vmGetStats with ('7154e0ff-a1c6-4fe0-a4e8-9756d83e1529',) {} flowID [4be7acb3]
Thread-4670::DEBUG::2014-09-17 17:44:20,886::BindingXMLRPC::1139::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Migration Destination', 'displayInfo': [{'tlsPort': '5909', 'ipAddress': '10.35.117.26', 'type': 'spice', 'port': '-1'}], 'hash': '0', 'acpiEnable': 'true', 'pid': '0', 'displayIp': '10.35.117.26', 'displayPort': '-1', 'displaySecurePort': '-1', 'timeOffset': '0', 'pauseCode': 'NOERR', 'kvmEnable': 'true', 'network': {}, 'vmId': '7154e0ff-a1c6-4fe0-a4e8-9756d83e1529', 'vmType': 'kvm', 'disks': {}, 'monitorResponse': '0', 'elapsedTime': '1258', 'displayType': 'qxl', 'clientIp': ''}]}
libvirtEventLoop::DEBUG::2014-09-17 17:44:21,798::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Resumed detail 0 opaque None
libvirtEventLoop::DEBUG::2014-09-17 17:44:21,798::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Resumed detail 1 opaque None
libvirtEventLoop::DEBUG::2014-09-17 17:44:21,799::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Suspended detail 0 opaque None
Thread-9727::DEBUG::2014-09-17 17:44:21,830::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Looking for drive with attributes {'name': u'hdc', 'bootOrder': u'1', 'boot': [<DOM Element: boot at 0x7fda001d8368>], 'readonly': True, 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, 'path': u'/rhev/data-center/mnt/10.35.160.108:_RHEV_nsednev-iso-domain/de096187-36e2-44e5-b8db-a9b67c3960de/images/11111111-1111-1111-1111-111111111111/stresslinux_64bit_11.4.x86_64-0.7.106.iso', 'type': u'cdrom'}
Thread-9727::DEBUG::2014-09-17 17:44:21,830::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Matched {'name': (u'hdc', u'hdc'), 'bootOrder': (u'1', u'1'), 'boot': ([<DOM Element: boot at 0x7fda001d8368>], None), 'readonly': (True, True), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}), 'path': (u'/rhev/data-center/mnt/10.35.160.108:_RHEV_nsednev-iso-domain/de096187-36e2-44e5-b8db-a9b67c3960de/images/11111111-1111-1111-1111-111111111111/stresslinux_64bit_11.4.x86_64-0.7.106.iso', '/rhev/data-center/mnt/10.35.160.108:_RHEV_nsednev-iso-domain/de096187-36e2-44e5-b8db-a9b67c3960de/images/11111111-1111-1111-1111-111111111111/stresslinux_64bit_11.4.x86_64-0.7.106.iso'), 'type': (u'cdrom', u'cdrom')}
Thread-9727::DEBUG::2014-09-17 17:44:21,830::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Matched {'name': (u'hdc', None), 'bootOrder': (u'1', None), 'boot': ([<DOM Element: boot at 0x7fda001d8368>], None), 'readonly': (True, None), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, None), 'path': (u'/rhev/data-center/mnt/10.35.160.108:_RHEV_nsednev-iso-domain/de096187-36e2-44e5-b8db-a9b67c3960de/images/11111111-1111-1111-1111-111111111111/stresslinux_64bit_11.4.x86_64-0.7.106.iso', None), 'type': (u'cdrom', None)}
Thread-9727::DEBUG::2014-09-17 17:44:21,831::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Looking for drive with attributes {'name': u'vda', 'bootOrder': '', 'boot': [], 'readonly': False, 'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'path': u'/rhev/data-center/00000002-0002-0002-0002-00000000002d/7e6871fb-7f14-410f-a7a9-de0167aa8778/images/57e8b2e8-1b59-45b0-82de-621004fa3c36/d8142647-8448-4dc7-968b-58b5b0faff4d', 'type': u'disk'}
Thread-9727::DEBUG::2014-09-17 17:44:21,831::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Matched {'name': (u'vda', u'vda'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (False, False), 'address': ({u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}), 'path': (u'/rhev/data-center/00000002-0002-0002-0002-00000000002d/7e6871fb-7f14-410f-a7a9-de0167aa8778/images/57e8b2e8-1b59-45b0-82de-621004fa3c36/d8142647-8448-4dc7-968b-58b5b0faff4d', u'/rhev/data-center/00000002-0002-0002-0002-00000000002d/7e6871fb-7f14-410f-a7a9-de0167aa8778/images/57e8b2e8-1b59-45b0-82de-621004fa3c36/d8142647-8448-4dc7-968b-58b5b0faff4d'), 'type': (u'disk', u'disk')}
Thread-9727::DEBUG::2014-09-17 17:44:21,831::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Matched {'name': (u'vda', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (False, None), 'address': ({u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, None), 'path': (u'/rhev/data-center/00000002-0002-0002-0002-00000000002d/7e6871fb-7f14-410f-a7a9-de0167aa8778/images/57e8b2e8-1b59-45b0-82de-621004fa3c36/d8142647-8448-4dc7-968b-58b5b0faff4d', None), 'type': (u'disk', None)}
Thread-9727::DEBUG::2014-09-17 17:44:21,898::sampling::405::vm.Vm::(start) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Start statistics collection
Thread-9729::DEBUG::2014-09-17 17:44:21,898::sampling::434::vm.Vm::(run) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::Stats thread started

The most important line is:

libvirtEventLoop::DEBUG::2014-09-17 17:44:21,799::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Suspended detail 0 opaque None

libvirt reported the VM as paused on the destination. VDSM has no choice here but to comply.

Can you please share the libvirt and qemu logs from the destination host?

Moreover, I will see if I can reproduce this locally. I have two questions on this topic:
1. I understand you are migrating from one EL7 host to another EL7 host, is that right?
2. By "Start the VM, with the ISO being served from the ISO domain" you mean the VM is booting, and thus reading data from the ISO domain, right?

Comment 3 Francesco Romani 2014-09-19 12:27:51 UTC
More details from further investigation.

Near the end of a successful live migration, libvirt automatically issues a 'cont'(inue) to the migrated VM in order to resume execution.
This triggers this event:

libvirtEventLoop::DEBUG::2014-09-17 17:44:21,798::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Resumed detail 1 opaque None

which is part of the regular flow.
After that, libvirt checks the actual state of QEMU and, if it is (still) found paused, bubbles up the event, which results in the line

libvirtEventLoop::DEBUG::2014-09-17 17:44:21,799::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Suspended detail 0 opaque None

that we saw before.

That said, there is another suspicious thing here:

libvirtEventLoop::DEBUG::2014-09-17 17:44:21,798::vm::5471::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`7154e0ff-a1c6-4fe0-a4e8-9756d83e1529`::event Resumed detail 0 opaque None

Detail 0 is VIR_DOMAIN_EVENT_RESUMED_UNPAUSED. This event is expected in two main cases (a decoding sketch follows this list):
1. after some client code explicitly called resume(); but this is not part of the live migration flow - that's what we do for dehibernation
2. in the case of a fakereboot, including when the guest OS crashes.
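For reference, here is a minimal standalone sketch of how these (event, detail) pairs decode, assuming the standard libvirt-python bindings. The constants and registration calls are from the public libvirt API; this is not VDSM's actual _onLibvirtLifecycleEvent, only an illustration:

import libvirt

def lifecycle_cb(conn, dom, event, detail, opaque):
    # Decode only the two lifecycle events relevant to this bug.
    if event == libvirt.VIR_DOMAIN_EVENT_RESUMED:
        names = {libvirt.VIR_DOMAIN_EVENT_RESUMED_UNPAUSED: 'UNPAUSED',   # detail 0
                 libvirt.VIR_DOMAIN_EVENT_RESUMED_MIGRATED: 'MIGRATED'}   # detail 1
        print('%s Resumed %s' % (dom.name(), names.get(detail, detail)))
    elif event == libvirt.VIR_DOMAIN_EVENT_SUSPENDED:
        if detail == libvirt.VIR_DOMAIN_EVENT_SUSPENDED_PAUSED:           # detail 0
            print('%s Suspended PAUSED' % dom.name())

libvirt.virEventRegisterDefaultImpl()   # must be called before opening the connection
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle_cb, None)
while True:
    libvirt.virEventRunDefaultImpl()    # pump the default event loop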

Comment 4 Francesco Romani 2014-09-19 12:59:52 UTC
Tried to reproduce locally, with no luck, with VDSM 4.16.4 running on CentOS 7, due to the lack of a RHEL 7 environment at hand.

- configured a VM to boot from CDROM
- attached a livecd image: tried Fedora19 and Fedora20
- booted the VM
- using the webadmin UI, migrated the VM as soon as the Engine allowed it

Tried the above on the order of 5-6 times.

Admittedly, the scenario is not the same as reported, but it is close enough. Does this really reproduce 100% of the time?

Comment 5 Nikolai Sednev 2014-09-21 05:56:20 UTC
(In reply to Francesco Romani from comment #4)
> Tried to reproduce locally with no luck with VDSM 4.16.4 running on Centos7
> due the lack of a RHEL7 environment at hand.
> 
> - configured a VM to boot from CDROM
> - attached a livecd image: tried Fedora19 and Fedora20
> - booted the VM
> - using webadmin UI, migrated the VM as soon as the Engine allows it
> 
> Tried the above in the order of 5-6 times.
> 
> Admittedly, the scenario is not the same as reported but it is close enough.
> Does this really reproduces 100% of times?

Yes, in my environment.
I used RHEL 7.0 as the source and migrated from it to RHEL 6.5; maybe that's the issue...

Comment 6 Francesco Romani 2014-09-23 09:53:19 UTC
(In reply to Nikolai Sednev from comment #5)
> (In reply to Francesco Romani from comment #4)
> > Tried to reproduce locally with no luck with VDSM 4.16.4 running on Centos7
> > due the lack of a RHEL7 environment at hand.
> > 
> > - configured a VM to boot from CDROM
> > - attached a livecd image: tried Fedora19 and Fedora20
> > - booted the VM
> > - using webadmin UI, migrated the VM as soon as the Engine allows it
> > 
> > Tried the above in the order of 5-6 times.
> > 
> > Admittedly, the scenario is not the same as reported but it is close enough.
> > Does this really reproduces 100% of times?
> 
> Yes, on my environment.
> Used RHEL7.0 to do so and migrated from it to RHEL6.5, may be that's the
> issue...

Sorry, from the original report I understood that the pausing happens in live migrations from RHEL 7 to RHEL 7.
Migrating from RHEL 7 to RHEL 6.5 is a completely different scenario - which I haven't investigated, but which of course I will.

So does the VM pause when you migrate from RHEL 7 to RHEL 6.5, or from RHEL 7 to RHEL 7?

Comment 7 Nikolai Sednev 2014-09-23 10:54:55 UTC
6.5 -> 7.0.
(In reply to Francesco Romani from comment #6)
> (In reply to Nikolai Sednev from comment #5)
> > (In reply to Francesco Romani from comment #4)
> > > Tried to reproduce locally with no luck with VDSM 4.16.4 running on Centos7
> > > due the lack of a RHEL7 environment at hand.
> > > 
> > > - configured a VM to boot from CDROM
> > > - attached a livecd image: tried Fedora19 and Fedora20
> > > - booted the VM
> > > - using webadmin UI, migrated the VM as soon as the Engine allows it
> > > 
> > > Tried the above in the order of 5-6 times.
> > > 
> > > Admittedly, the scenario is not the same as reported but it is close enough.
> > > Does this really reproduces 100% of times?
> > 
> > Yes, on my environment.
> > Used RHEL7.0 to do so and migrated from it to RHEL6.5, may be that's the
> > issue...
> 
> Sorry, from original report I understood the pausing happens in live
> migrations from RHEL7 to RHEL7.
> Migrating from RHEL7 to RHEL6.5 is a completely different scenario - which I
> haven't investigated, but which of courese I will.
> 
> So does the VM pause when you migrate
> from RHEL7 to RHEL6.5 or
> from RHEL7 to RHEL7
> ?

6.5 -> 7.0.

Comment 8 Francesco Romani 2014-10-03 08:58:53 UTC
Didn't reproduce locally on RHEL 6.5 -> RHEL 7.0.

From the logs you posted, VDSM is doing nothing but complying with what libvirt is reporting. So, to move ahead, we need to figure out what libvirt is doing.

Please provide libvirt debug logs (http://wiki.libvirt.org/page/DebugLogs)
of a reproduction of the incident.
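(For reference, the settings on that wiki page boil down to something like the following in /etc/libvirt/libvirtd.conf on both hosts, followed by a restart of libvirtd; the filter list here is only illustrative, please follow the wiki for the exact values:)

log_level = 1
log_filters = "1:qemu 1:libvirt 3:event 3:json 3:remote"
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"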

Comment 9 Nikolai Sednev 2014-10-05 04:14:40 UTC
(In reply to Francesco Romani from comment #8)
> Didn't reproduce locally on RHEL 6.5 -> RHEL 7.0.
> 
> From the logs you posted, VDSM is doing nothing else but comply to what
> libvirt is reporting. So, to go ahead, we need to figure out what libvirt is
> doing.
> 
> Please provide libvirt debug logs (http://wiki.libvirt.org/page/DebugLogs)
> of a reproduction of the incident.

https://bugzilla.redhat.com/show_bug.cgi?id=1141763

Comment 10 Francesco Romani 2014-10-06 14:39:22 UTC
(In reply to Nikolai Sednev from comment #9)
> (In reply to Francesco Romani from comment #8)

> > Please provide libvirt debug logs (http://wiki.libvirt.org/page/DebugLogs)
> > of a reproduction of the incident.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1141763

(also commented on that bug)

Yes, I'm aware of that. But we can re-enable libvirt debug logs on QE boxes, can't we?

I'm happy to help if you need assistance.

Comment 11 Michal Skrivanek 2014-10-07 09:10:08 UTC
Nikolai,
so which way did it fail for you? There are contradicting statements: in comment #5 you said 7.0->6.5, in comment #7 you say the opposite.

It doesn't reproduce for us either way, though.
We need the libvirt & qemu logs from when it happens to you.

Comment 12 Francesco Romani 2014-10-07 14:15:03 UTC
Eventually I managed to reproduce the symptoms described.
It does not happen every time - it seems pretty random, actually, or I haven't yet figured out how to reproduce it systematically.

A RHEL 7.0 -> RHEL 6.5 migration completes, but on the destination host the VM gets paused.

The only meaningful info seems to be in the QEMU logs and is:

=== cut here ===
kvm: unhandled exit 80000021
kvm_run returned -22
=== cut here ===

Comment 13 Francesco Romani 2014-10-07 14:51:24 UTC
(In reply to Francesco Romani from comment #12)
> Eventually I managed to reproduce the symptoms described.
> It is not happening all the times - seems pretty random actually, or I
> haven't yet figured out how to reproduce systematically.

https://bugzilla.redhat.com/show_bug.cgi?id=1150163

Comment 14 Francesco Romani 2014-10-10 12:37:40 UTC
(In reply to Francesco Romani from comment #13)
> (In reply to Francesco Romani from comment #12)
> > Eventually I managed to reproduce the symptoms described.
> > It is not happening all the times - seems pretty random actually, or I
> > haven't yet figured out how to reproduce systematically.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1150163

For the other direction of the migration (6.5->7.0 , https://bugzilla.redhat.com/show_bug.cgi?id=1142879#c12), see https://bugzilla.redhat.com/show_bug.cgi?id=1035184


Does this match the scenario?

Comment 15 Nikolai Sednev 2014-10-12 06:27:12 UTC
(In reply to Michal Skrivanek from comment #11)
> Nikolai,
> so which way it failed for you? There are contradicting statements in
> comment #5 you said 7.0->6.5, in comment 7 you say vice versa.
> 
> it doesn't reproduce for us either way though.
> We need the libvirt&qemu logs when it happens to you

To simplify things, I reran the test scenario using a couple of RHEL 7.0 hosts:
2014-Oct-12, 09:22  Migration failed due to Error: Fatal error during migration (VM: VM1_stress, Source: alma03.qa.lab.tlv.redhat.com, Destination: alma04.qa.lab.tlv.redhat.com).
2014-Oct-12, 09:22  Migration started (VM: VM1_stress, Source: alma03.qa.lab.tlv.redhat.com, Destination: alma04.qa.lab.tlv.redhat.com, User: admin).
Logs attached.

Comment 16 Nikolai Sednev 2014-10-12 06:30:41 UTC
Created attachment 946073 [details]
logs from 12_10_14

Comment 17 Nikolai Sednev 2014-10-12 06:37:18 UTC
(In reply to Michal Skrivanek from comment #11)
> Nikolai,
> so which way it failed for you? There are contradicting statements in
> comment #5 you said 7.0->6.5, in comment 7 you say vice versa.
> 
> it doesn't reproduce for us either way though.
> We need the libvirt&qemu logs when it happens to you

1. I think it was 6.5->7.0.
2. This time (7.0->7.0) the migration did not even enter the paused state; it just failed.

Comment 18 Michal Skrivanek 2014-10-12 07:25:10 UTC
- please see comment #14
- the last attachment doesn't contain the destination vdsm log, where the error most likely would be
- we haven't reproduced any 6.5->7.0 problem
- 7.0->6.x must be avoided; if you do that, further behavior of the VM is undefined

Comment 19 Nikolai Sednev 2014-10-13 13:44:00 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1152136 is blocking me from moving on and retesting this bug.
7.0->6.6 should be available: in mixed environments, rollback is required during a migration for upgrade, for cases where the upgrade doesn't meet customer expectations and they decide to move back to 6.6.

Comment 20 Michal Skrivanek 2014-10-13 16:32:16 UTC
(In reply to Nikolai Sednev from comment #19)
> https://bugzilla.redhat.com/show_bug.cgi?id=1152136 blocking me from moving
> on and retesting this bug. 
> 7.0->6.6 should be available, as in mixed environments during migration for
> upgrade, rollback is required, for reasons where upgrade doesn't fits with
> customer expectations from it and they decide to move back to 6.6.

Well, unfortunately that's not going to be supported. It will be blocked or documented soon.
Anyway, that's not really the point here... we need the destination logs and a clear case of a 7.0->7.0 migration, IIUC.

Comment 21 Michal Skrivanek 2014-10-20 14:07:56 UTC
see bug 1154631 for blocking mixing 6.6 and 7.0 in the same cluster

Comment 22 Michal Skrivanek 2014-10-22 16:50:34 UTC
It's an invalid flow which is going to be blocked by bug 1154631, hence there is no need to keep tracking this one unless you reproduce it on purely 6.6 or purely 7.0. Please close if that's not the case...

Comment 23 Nikolai Sednev 2014-10-22 17:25:42 UTC
6.6<->6.6 worked for me, with severe problems in the HE environment (https://bugzilla.redhat.com/show_bug.cgi?id=1155705); 7.0<->7.0 was not tested, as it is blocked by another issue with 7.0 deployment onto the HE environment (https://bugzilla.redhat.com/show_bug.cgi?id=1154769).

Comment 24 Omer Frenkel 2014-10-29 15:31:26 UTC
Nikolai, setting needinfo on you again to test once it's possible and to update this bug with what you discover, thanks.

Comment 25 Nikolai Sednev 2014-10-30 07:45:18 UTC
6.6<->6.6 migration is working fine, and 7.0<->7.0 is working fine too, but 7.0->6.6 gets stuck in the paused state. I am aware that this type of migration will be prohibited, which should be solved by 1154631, so please decide whether this bug should be closed or not; either way it depends on 1154631.

Comment 26 Omer Frenkel 2014-10-30 07:57:54 UTC
Thanks, closing, please re-open if this happens on the valid migration flows.

Comment 27 Nikolai Sednev 2014-10-30 07:58:19 UTC
Logs attached.
I also saw these in /var/log/messages:
Oct 30 03:37:59 brown-vdsc journal: vdsm vm.Vm WARNING vmId=`418b65c9-028b-4e18-ab4f-68b435342aa4`::_readPauseCode unsupported by libvirt vm
Oct 30 03:37:59 brown-vdsc journal: vdsm jsonrpc.JsonRpcServer ERROR Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 486, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 264, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 338, in destroy
    res = v.destroy()
  File "/usr/share/vdsm/virt/vm.py", line 4919, in destroy
    response = self.doDestroy()
  File "/usr/share/vdsm/virt/vm.py", line 4936, in doDestroy
    return self.releaseVm()
  File "/usr/share/vdsm/virt/vm.py", line 4837, in releaseVm
    supervdsm.getProxy().removeFs(drive.path)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeFs
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
Exception: Cannot remove Fs that does not exists in: /var/run/vdsm/payload
Oct 30 03:38:01 brown-vdsc ntpd[6667]: Listen normally on 18 vnet0 fe80::fc1a:4aff:fe19:d403 UDP 123
Oct 30 03:38:02 brown-vdsc journal: vdsm jsonrpc.JsonRpcServer ERROR Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 486, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 264, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 338, in destroy
    res = v.destroy()
  File "/usr/share/vdsm/virt/vm.py", line 4919, in destroy
    response = self.doDestroy()

Comment 28 Nikolai Sednev 2014-10-30 08:01:30 UTC
Created attachment 951987 [details]
logs from engine and blue-vdsc (RHEL 6.6 host)

Comment 29 Nikolai Sednev 2014-10-30 08:03:56 UTC
Please don't close this until 1154631 is solved.

Comment 30 Michal Skrivanek 2014-10-30 15:44:45 UTC
OK, the blocking is finally implemented.

Comment 31 Francesco Romani 2014-10-30 16:07:25 UTC
(In reply to Nikolai Sednev from comment #27)
> Logs attached.
> I also saw these in /var/log/messages:
> Oct 30 03:37:59 brown-vdsc journal: vdsm vm.Vm WARNING
> vmId=`418b65c9-028b-4e18-ab4f-68b435342aa4`::_readPauseCode unsupported by
>  libvirt vm
> Oct 30 03:37:59 brown-vdsc journal: vdsm jsonrpc.JsonRpcServer ERROR
> Internal server error
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 486,
> in _serveRequest
>     res = method(**params)
>   File "/usr/share/vdsm/rpc/Bridge.py", line 264, in _dynamicMethod
>     result = fn(*methodArgs)
>   File "/usr/share/vdsm/API.py", line 338, in destroy
>     res = v.destroy()
>   File "/usr/share/vdsm/virt/vm.py", line 4919, in destroy
>     response = self.doDestroy()
>   File "/usr/share/vdsm/virt/vm.py", line 4936, in doDestroy
>     return self.releaseVm()
>   File "/usr/share/vdsm/virt/vm.py", line 4837, in releaseVm
>     supervdsm.getProxy().removeFs(drive.path)
>   File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
>     return callMethod()
>   File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
>     **kwargs)
>   File "<string>", line 2, in removeFs
>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in
> _callmethod
>     raise convert_to_error(kind, result)
> Exception: Cannot remove Fs that does not exists in: /var/run/vdsm/payload
> Oct 30 03:38:01 brown-vdsc ntpd[6667]: Listen normally on 18 vnet0
> fe80::fc1a:4aff:fe19:d403 UDP 123
> Oct 30 03:38:02 brown-vdsc journal: vdsm jsonrpc.JsonRpcServer ERROR
> Internal server error
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 486,
> in _serveRequest
>     res = method(**params)
>   File "/usr/share/vdsm/rpc/Bridge.py", line 264, in _dynamicMethod
>     result = fn(*methodArgs)
>   File "/usr/share/vdsm/API.py", line 338, in destroy
>     res = v.destroy()
>   File "/usr/share/vdsm/virt/vm.py", line 4919, in destroy
>     response = self.doDestroy()


This is a known side effect of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1145636

If the affected VM was created and migrated using a bugged engine, you'll get that error when you try to shut it down.
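For illustration only, a cleanup tolerant of an already-removed payload would look roughly like the sketch below. This is a hypothetical example, not the actual VDSM fix for bug 1145636; remove_payload_fs and the hard-coded payload root are made up for this sketch.

import errno
import os

PAYLOAD_ROOT = '/var/run/vdsm/payload'  # hypothetical constant for this sketch

def remove_payload_fs(path):
    # Refuse to touch anything outside the payload directory.
    if not os.path.abspath(path).startswith(PAYLOAD_ROOT):
        raise ValueError('not a payload path: %s' % path)
    try:
        os.unlink(path)
    except OSError as e:
        # Tolerate a payload file that is already gone, e.g. for a VM
        # created and migrated by an engine affected by bug 1145636.
        if e.errno != errno.ENOENT:
            raise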

