Bug 982968

Summary: The VM in a RHEV-M environment should be paused when the NFS server is blocked
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.3.0
Reporter: Xuesong Zhang <xuzhang>
Assignee: Nobody's working on this, feel free to take it <nobody>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
CC: abaron, bazulay, bili, dron, dyuan, hateya, honzhang, iheim, jkt, lpeer, shyu, smizrahi, whuang
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2013-07-10 09:00:20 UTC

Description Xuesong Zhang 2013-07-10 08:45:24 UTC
Description of problem:
A VM in a RHEV-M environment should be paused while the NFS server is blocked; this is for security considerations.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-23.0.el6ev
libvirt-0.10.2-19.el6
qemu-kvm-rhev-0.12.1.2-2.377.el6 
spice-server-0.12.3-1.el6
kernel-2.6.32-396.el6

How reproducible:
100%

Steps to Reproduce:
1. Prepare a RHEV-M environment and register a host to it.
2. Prepare a healthy VM on this host; the storage type is NFS.
3. While the VM is running, block the NFS storage on the host (one possible approach is sketched after this list).
4. After 5 minutes, the VM in the RHEV-M environment is still running.
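
One way to block the NFS storage for step 3, as a minimal sketch: drop traffic to the NFS server with iptables. The address 192.0.2.10 is a placeholder for the actual NFS server; the report does not say which method was used.

# Hypothetical: drop all traffic to/from the NFS server so NFS I/O on the
# host starts failing. 192.0.2.10 is a placeholder address.
iptables -A OUTPUT -d 192.0.2.10 -j DROP
iptables -A INPUT -s 192.0.2.10 -j DROP

# Undo after the test:
iptables -D OUTPUT -d 192.0.2.10 -j DROP
iptables -D INPUT -s 192.0.2.10 -j DROP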


Actual results:
As in step 4: the VM is still running.

Expected results:
The VM in the RHEV-M environment should be paused, for security considerations.

Additional info:
If iSCSI storage is used in step 2, the VM is paused in step 4. The behavior in step 4 should be kept consistent between NFS and iSCSI storage.

Comment 1 Dafna Ron 2013-07-10 09:00:20 UTC
The VMs should only pause if they are actively writing.
Once they try to actively write, they will be paused by qemu with an EIO error (not by vdsm), and the status should be reported through libvirt -> vdsm -> engine -> UI.
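
A quick way to observe this pause at the libvirt layer, as a minimal sketch (the domain name vm1 is a placeholder, not from the original comment):

# Hypothetical check on the host: ask libvirt for the state and its reason.
virsh domstate vm1 --reason
# Once qemu has suspended the guest on EIO, this should report
# something like: paused (I/O error)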

I do not see a security issue here, and as long as the VM is not writing there should be no data corruption.

Also, if a VM is paused it cannot migrate to a different host (this is blocked to prevent data corruption). So if a VM is not writing it will be migrated; if the VM is writing it will be paused and not migrated, to prevent possible data corruption.

Closing this as not a bug, since this is the design and vdsm is not responsible for changing the VM status at all.

Comment 2 Xuesong Zhang 2013-07-10 09:13:36 UTC
(In reply to Dafna Ron from comment #1)

Have you seen the additional info in the bug description?
If I use iSCSI storage, the VM is paused in step 4, and I didn't do anything to the VM.
If I use NFS storage, the VM keeps running in step 4.

The behavior should be kept the same whether we use NFS or iSCSI storage.

Comment 3 Dafna Ron 2013-07-10 13:00:44 UTC
I did see it.
VMs on block devices are more sensitive; this is why they pause sooner.
Even when there is no active writing there is always some I/O, and since LVM is more sensitive, qemu will detect the issue much faster and will then pause the VMs.

The fact that you mentioned the VMs pause on iSCSI but not on NFS simply tells me that there is no issue: the VMs were not actively writing, so qemu did not detect the problem and did not pause them.

Hence, we have the same behaviour no matter what the storage type: VMs are paused by qemu once EIO issues are detected.
 
Here is the flow for any storage type:

qemu detects an issue -> pauses the VM with EIO
libvirt updates the VM status -> issues an event to vdsm
vdsm updates the VM status -> issues an event to the engine
the engine updates the VM status in the DB -> the UI is updated with the change in status.
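
The first hop in this flow can also be checked directly at the qemu layer through the monitor. A minimal sketch, again assuming a placeholder domain name vm1:

# Hypothetical: query qemu's own run state via the human monitor protocol.
virsh qemu-monitor-command vm1 --hmp 'info status'
# Once qemu has stopped the guest on an I/O error, this should print
# something like: VM status: paused (io-error)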

To properly test this scenario you need to create VMs with an OS installed that will be actively writing, and you need to open bugs against the correct component following the flow above.
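
For example, a host-side loop like the following (an illustrative sketch; vm1 is again a placeholder domain name) makes the running -> paused transition easy to catch while the guest writes:

# Hypothetical: poll the domain state every 10 seconds during the test so
# the moment the VM flips to paused is recorded with a timestamp.
while true; do
    date
    virsh domstate vm1 --reason
    sleep 10
done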

Comment 4 Xuesong Zhang 2013-07-11 05:45:35 UTC
(In reply to Dafna Ron from comment #3)

Hi Dafna Ron,

I tried writing in the VM, but the VM was still in running status in the end. The result seems different from your description; would you please help to check it?


Here are my test steps:

1. Write in the VM, e.g. with dd as follows:
dd if=/dev/zero of=/mnt/test.img bs=1K count=10000000

2. Block the NFS storage on the host while the VM is still writing.

3. After 5 minutes, the VM is in the "funnel" status.

4. The "funnel" status lasts for 3 minutes, then the VM turns back to running status.

Comment 6 Saggi Mizrahi 2013-08-06 13:14:07 UTC
If you used qcow2 as your image format, the fact that you wrote zeroes could mean that no I/O was issued. If you try to write all zeroes to an unallocated section of a qcow2 image, nothing will be written to disk.
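
A hedged variation of the dd test from comment 4 that avoids this pitfall: write non-zero data and bypass the guest page cache, so every block must actually reach the NFS-backed image and qemu gets a chance to hit EIO once the storage is blocked. The path and sizes are illustrative, not from the original comments.

# Inside the guest: non-zero data defeats the qcow2 all-zero optimization,
# and oflag=direct bypasses the page cache so writes hit the disk at once.
dd if=/dev/urandom of=/mnt/test.img bs=1M count=10000 oflag=direct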