Bug 967751

Summary: vdsm: can't resume a vm that migrated due to EIO
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: bazulay, dron, hateya, iheim, jkt, lpeer, michal.skrivanek
Target Milestone: ---
Keywords: Triaged
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-26 12:25:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 961154
Bug Blocks:
Attachments:
Description: logs; Flags: none

Description Dafna Ron 2013-05-28 08:54:11 UTC
Created attachment 753801
logs

Description of problem:

I blocked storage access on the SPM host, and the VM migrated to the second host.
It arrived on the target host paused on EIO.
When I tried to resume the VM, it remained paused and no error was reported.

Version-Release number of selected component (if applicable):

sf17.2
vdsm-4.10.2-22.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Set up a two-host cluster with 2 iSCSI domains and 1 export domain, all located on the same storage server.
2. Create a VM from a template (as a thin clone).
3. Run the VM on the SPM host and start an LSM for the VM disk, then block connectivity to the storage from the SPM host only, using iptables.
4. After the VM migrates, try to resume it.
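
Step 3 can be scripted roughly as follows. This is only a sketch: the report does not say which iptables rule was used, so dropping outgoing traffic in the OUTPUT chain is an assumption, and the address 192.0.2.10 is a placeholder rather than the reporter's storage server. The helper names are hypothetical.

```python
import subprocess


def storage_block_rule(storage_ip, action="-A"):
    """Build an iptables command that drops outgoing traffic to storage_ip.

    action is "-A" to append the rule or "-D" to delete it again.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be -A or -D")
    return ["iptables", action, "OUTPUT", "-d", storage_ip, "-j", "DROP"]


def apply_rule(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (needs root)."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)


# Placeholder address from the documentation range, not taken from the report.
apply_rule(storage_block_rule("192.0.2.10"))        # block storage from this host
apply_rule(storage_block_rule("192.0.2.10", "-D"))  # remove the rule afterwards
```

Run it on the SPM host only, with dry_run=False, so that the second host keeps its storage access.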

Actual results:

The VM fails to resume, but there is no error or any other indication of why.

Expected results:

We should either be able to resume the VM, or report why we cannot and ask the user to restart it.

Additional info: logs

Thread-562::DEBUG::2013-05-28 11:42:50,682::BindingXMLRPC::913::vds::(wrapper) client [10.35.161.49]::call vmCont with ('156b2ec2-cb34-40cb-8e74-d0f76d9db74a',) {} flowID [5e891b87]
Thread-562::DEBUG::2013-05-28 11:42:50,724::BindingXMLRPC::920::vds::(wrapper) return vmCont with {'status': {'message': 'Done', 'code': 0}, 'output': ['']}
Dummy-335::DEBUG::2013-05-28 11:42:51,464::misc::83::Storage.Misc.excCmd::(<lambda>) 'dd if=/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
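
The vmCont return line above reports code 0 even though the VM stayed paused, so it helps to pull the status out of the log explicitly. The sketch below assumes the "(wrapper) return <verb> with {...}" layout seen in this excerpt. The regular expression and the parse_return helper are hypothetical and are not part of vdsm.

```python
import ast
import re

# Matches the BindingXMLRPC "return" lines as they appear in the excerpt above.
RETURN_RE = re.compile(
    r"\(wrapper\) return (?P<verb>\w+) with (?P<payload>\{.*\})\s*$"
)


def parse_return(line):
    """Return (verb, status code, status message), or None if the line differs."""
    match = RETURN_RE.search(line)
    if match is None:
        return None
    # The payload is printed as a Python literal, so literal_eval can read it.
    payload = ast.literal_eval(match.group("payload"))
    status = payload.get("status", {})
    return match.group("verb"), status.get("code"), status.get("message")


line = (
    "Thread-562::DEBUG::2013-05-28 11:42:50,724::BindingXMLRPC::920::vds::"
    "(wrapper) return vmCont with {'status': {'message': 'Done', 'code': 0}, "
    "'output': ['']}"
)
print(parse_return(line))  # ('vmCont', 0, 'Done')
```

A zero code here only shows what vdsm returned to the caller. It says nothing about whether the guest actually left the paused state.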

Comment 1 Dafna Ron 2013-05-28 14:11:29 UTC
To clarify, the VM migration (not the LSM) was performed by the engine as part of the cluster migration policy.

Comment 2 Ayal Baron 2013-07-08 21:40:33 UTC
vmCont reported that it succeeded:

Thread-562::DEBUG::2013-05-28 11:42:50,724::BindingXMLRPC::920::vds::(wrapper) return vmCont with {'status': {'message': 'Done', 'code': 0}, 'output': ['']}

Unfortunately the libvirt log ends before this, so we cannot see what happened there. The last entry is a few hours earlier:
2013-05-28 08:45:50.329+0000: 10198: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x10221e0

Comment 3 Ayal Baron 2013-07-08 21:41:04 UTC
Haim,

Please reproduce and attach full logs

Comment 4 Michal Skrivanek 2013-07-12 11:51:20 UTC
as per current behavior the VM should not get migrated, see bug 961154 for related info

Comment 5 Dafna Ron 2013-07-14 08:22:05 UTC
(In reply to Michal Skrivanek from comment #4)
> as per current behavior the VM should not get migrated, see bug 961154 for
> related info

But VMs do get migrated.
If a VM pauses during migration, the migration started while the VM was still up and the VM status changed mid-migration.
Depending on how far the migration has progressed, the VM can either finish migrating and end up paused on the destination host, or fail migration and be returned to the source host.
Will we be able to start the VM if it migrates and changes status mid-migration?

Comment 6 Michal Skrivanek 2013-07-16 13:00:57 UTC
(In reply to Dafna Ron from comment #5)
> (In reply to Michal Skrivanek from comment #4)
> > as per current behavior the VM should not get migrated, see bug 961154 for
> > related info
> 
> but vms do get migrated. 
> if a vm pauses during migration we start migrating the vm while its still up
> and vm status is changed mid migration. 
yeah, and that's what bug 961154 is about

> will we be able to start the vm if it migrates and change status mid
> migration?
probably yes, but you're risking corruption. So we do want to block it or fail the migration if it happens in the middle
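
The intended behavior described in this comment can be summarized as a small decision table. This is an illustrative sketch only: Decision and migration_decision are hypothetical names, and it is not the vdsm or engine code from bug 961154.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # VM is not paused on EIO: migrate as usual
    BLOCK = "block"  # VM is already paused on EIO: do not start migrating
    FAIL = "fail"    # VM paused on EIO mid-migration: abort the migration


def migration_decision(paused_on_eio, migration_in_progress):
    """Hypothetical policy: block before the start, fail in the middle."""
    if not paused_on_eio:
        return Decision.ALLOW
    return Decision.FAIL if migration_in_progress else Decision.BLOCK


print(migration_decision(paused_on_eio=True, migration_in_progress=True))
```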

Comment 7 Michal Skrivanek 2013-07-26 12:25:16 UTC
This should be CLOSED INSUFFICIENT_DATA, but since bug 961154 should take care of the original problem of EIO migration, closing as DUPLICATE.

*** This bug has been marked as a duplicate of bug 961154 ***