Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 783977

Summary:

[ovirt] [vdsm] call destroy is sent to vm when migration is canceled

Product:

[Retired] oVirt

Reporter:

Haim <hateya>

Component:

vdsm

Assignee:

Shahar Havivi <shavivi>

Status:

CLOSED WORKSFORME

QA Contact:

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

unspecified

CC:

abaron, acathrow, bazulay, iheim, mgoldboi, michal.skrivanek, yeylon, ykaul

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

virt

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2012-10-18 07:43:13 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
vdsm log	none

Description Haim 2012-01-23 12:30:00 UTC

Description of problem:

scenario:

- start a migration using web-admin
- using vdsClient, send the command that cancels the migration thread
- the migration is stopped; however, the backend then sends a 'call destroy' command, killing the vm.

flow: 

- migration is called from backend
- migration is started
- from client, abort the migration
- migration is aborted
- backend sends getVmStats, and vdsm reports the vm as if the migration had succeeded
- vm gets killed by backend
- vm moves to unknown on backend - and down after several minutes 

why - it smells like a nasty race:

let's take the following case:

- vmId = 75f8b814-8a85-4bb0-a428-523e0ec6875c

Thread-2429::DEBUG::2012-01-23 04:36:14,256::clientIF::76::vds::(wrapper) [10.16.144.104]::call migrate with ({'src': '10.16.144.166', 'dst': '10.16.144.164:54321', 'vmId': '75f8b814-8a85-4bb0-a428-523e0ec6875c', 'method': 'online'},) {}

- migration starts, and runs in different thread:

Thread-2430::DEBUG::2012-01-23 04:36:14,261::vm::120::vm.Vm::(_setupVdsConnection) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::Initiating connection with destination

- now I sent the migration cancel command:

Thread-2432::DEBUG::2012-01-23 04:36:16,688::clientIF::76::vds::(wrapper) [10.16.144.166]::call migrateCancel with ('75f8b814-8a85-4bb0-a428-523e0ec6875c',) {}

- the migration nevertheless finished successfully:

Thread-2438::DEBUG::2012-01-23 04:36:22,063::libvirtvm::317::vm.Vm::(run) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::migration downtime thread started
Thread-2439::DEBUG::2012-01-23 04:36:22,065::libvirtvm::345::vm.Vm::(run) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::starting migration monitor thread
Thread-2430::DEBUG::2012-01-23 04:36:22,065::libvirtvm::332::vm.Vm::(cancel) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::canceling migration downtime thread
Thread-2430::DEBUG::2012-01-23 04:36:22,079::libvirtvm::382::vm.Vm::(stop) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::stopping migration monitor thread
Thread-2438::DEBUG::2012-01-23 04:36:22,080::libvirtvm::329::vm.Vm::(run) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::migration downtime thread exiting
Thread-2430::DEBUG::2012-01-23 04:36:22,187::vm::898::vm.Vm::(setDownStatus) vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::Changed state to Down: Migration succeeded
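
To make the ordering in the excerpt above easier to check, here is a minimal sketch that parses three of the quoted vdsm log lines and measures how long after the migrateCancel call the vm was set to Down. It assumes the `::`-separated layout seen in these lines (thread, level, timestamp, module, line number, logger, message); `parse_line` is a hypothetical helper written for this report, not part of vdsm.

```python
from datetime import datetime


def parse_line(line):
    """Split one vdsm log line into its fields (layout assumed from the excerpt)."""
    thread, level, stamp, module, lineno, logger, rest = line.split("::", 6)
    func, _, message = rest.partition(" ")
    return {
        "thread": thread,
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        "func": func.strip("()"),
        "message": message,
    }


# Lines copied from the description of this bug.
LOG = [
    "Thread-2429::DEBUG::2012-01-23 04:36:14,256::clientIF::76::vds::(wrapper) "
    "[10.16.144.104]::call migrate with ({'src': '10.16.144.166', "
    "'dst': '10.16.144.164:54321', "
    "'vmId': '75f8b814-8a85-4bb0-a428-523e0ec6875c', 'method': 'online'},) {}",
    "Thread-2432::DEBUG::2012-01-23 04:36:16,688::clientIF::76::vds::(wrapper) "
    "[10.16.144.166]::call migrateCancel with "
    "('75f8b814-8a85-4bb0-a428-523e0ec6875c',) {}",
    "Thread-2430::DEBUG::2012-01-23 04:36:22,187::vm::898::vm.Vm::(setDownStatus) "
    "vmId=`75f8b814-8a85-4bb0-a428-523e0ec6875c`::Changed state to Down: "
    "Migration succeeded",
]

events = [parse_line(line) for line in LOG]
cancel = next(e for e in events if "call migrateCancel" in e["message"])
down = next(e for e in events if "Migration succeeded" in e["message"])

# Seconds between the cancel request and the vm being reported Down.
gap = (down["time"] - cancel["time"]).total_seconds()
print("cancel on %s, Down on %s, %.3f s later"
      % (cancel["thread"], down["thread"], gap))
```

In this excerpt the cancel request is logged on Thread-2432 about 5.5 seconds before Thread-2430 reports "Migration succeeded", so the cancel arrived well before the migration completed.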

notes: 

- we need a way to avoid such races (add locking?)
- we need to define a point after which migration cancel returns an error, saying that at this point we can't abort.
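
The two notes above could be sketched roughly as follows. This is only an illustration of the idea, not vdsm code: `MigrationState`, `MigrationCancelError`, and the method names are all hypothetical, and the sketch only serializes decisions on the vdsm side; it does not by itself close any race inside libvirt.

```python
import threading


class MigrationCancelError(Exception):
    """Raised when a cancel request arrives too late to be honored."""


class MigrationState(object):
    """Hypothetical lock-guarded state shared by the migrate and cancel paths."""

    PENDING = "pending"
    RUNNING = "running"
    COMMITTED = "committed"   # point of no return
    CANCELED = "canceled"

    def __init__(self):
        self._lock = threading.Lock()
        self._state = self.PENDING

    def begin(self):
        """Called by the migration thread right before it starts the transfer.

        Returns False if a cancel already arrived, so the transfer is skipped.
        """
        with self._lock:
            if self._state == self.CANCELED:
                return False
            self._state = self.RUNNING
            return True

    def commit(self):
        """Called before the vm is reported Down; refuses if canceled."""
        with self._lock:
            if self._state != self.RUNNING:
                return False
            self._state = self.COMMITTED
            return True

    def cancel(self):
        """Called by the cancel path; fails once the migration is committed."""
        with self._lock:
            if self._state == self.COMMITTED:
                raise MigrationCancelError("too late to abort the migration")
            self._state = self.CANCELED
```

With something like this, the migration thread would only report "Migration succeeded" when `commit()` returns True, and a cancel request that loses the race gets an explicit error instead of a silent success.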

attached logs.

Comment 1 Haim 2012-01-23 12:32:44 UTC

Created attachment 556957 [details]
vdsm log

Comment 2 Dan Kenigsberg 2012-01-23 12:44:07 UTC

Shahar, please see if this is the documented problem of

                #FIXME: there still a race here with libvirt,
                # if we call stop() and libvirt migrateToURI2 didn't start
                # we may return migration stop but it will start at libvirt
                # side

Comment 3 Shahar Havivi 2012-02-02 14:21:25 UTC

it looks like vdsm is moving the VM status to 'down',
and when the engine sees the VM in status 'down' it calls destroy().

this happens when starting a migration on the source and, from vdsClient, running an endless loop that tries to stop the migration:

# while true; do vdsClient -s 0 migrateCancel <vmid>; done

Comment 4 Shahar Havivi 2012-06-06 08:28:27 UTC

patch sent:
http://gerrit.ovirt.org/#/c/2533/

Comment 5 Shahar Havivi 2012-08-30 10:24:03 UTC

please check if still relevant

Comment 6 Haim 2012-08-30 10:29:29 UTC

(In reply to comment #5)
> please check if still relevant

no capacity to test the flow again.

Comment 7 Michal Skrivanek 2012-10-19 10:51:33 UTC

different, but perhaps related bug: 867439