Bug 1497173

Summary: VM marked as non responsive if it has ISO from an inaccessible ISO domain
Product: Red Hat Enterprise Virtualization Manager Reporter: nijin ashok <nashok>
Component: vdsm Assignee: Dan Kenigsberg <danken>
Status: CLOSED DUPLICATE QA Contact: Raz Tamir <ratamir>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.6 CC: bazulay, fromani, gveitmic, lsurette, mavital, srevivo, tjelinek, ycui, ykaul
Target Milestone: ovirt-4.2.0   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-03 07:34:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description nijin ashok 2017-09-29 11:39:42 UTC
Description of problem:

RHV VMs are marked as non-responsive if they have an ISO attached from an inaccessible ISO domain. Vdsm is not able to get any statistics from the VM and logs "monitor become unresponsive" in the vdsm log. All monitoring calls get discarded or blocked. Since the VM is in a non-responsive state, we are not able to detach the CD from the portal. The only option available is to shut down the VM. We cannot even use virsh commands to detach the CD, as the call is blocked in GetAllDomainStats. Also, powering off the VM sometimes fails and we have to kill the qemu-kvm process manually. The qemu-kvm process of the VM will be in D state.
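
To confirm the "D state" symptom on a host, something like the following sketch could be used. It is only an illustration, not part of vdsm: the function names and the default "qemu-kvm" name filter are assumptions, and it relies solely on the documented layout of /proc/<pid>/stat, where the state letter follows the parenthesised command name.

```python
from pathlib import Path


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def find_d_state(name_fragment: str = "qemu-kvm", proc_root: str = "/proc"):
    """Yield (pid, comm) for processes in state D whose name contains name_fragment.

    name_fragment and proc_root are illustrative parameters (assumptions),
    not anything defined by vdsm or libvirt.
    """
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a per-process directory
        try:
            stat_line = (entry / "stat").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        comm = stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]
        if name_fragment in comm and parse_state(stat_line) == "D":
            yield int(entry.name), comm
```

Calling list(find_d_state()) on the affected host would list any qemu-kvm processes currently in uninterruptible sleep. An empty list only means none were in D state at the moment of the scan.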

Version-Release number of selected component (if applicable):

rhevm-4.1.6.2-0.1.el7.noarch


How reproducible:

100%

Steps to Reproduce:
 
1. Run a VM with an ISO attached from the ISO domain.
2. Block the connection to the ISO domain from the host.


Actual results:

VMs go into a non-responding state when they have an inaccessible ISO attached from an ISO domain.

Expected results:

VMs should work fine or at least should put into  

Additional info:

Comment 2 Francesco Romani 2017-10-02 09:21:04 UTC
(In reply to nijin ashok from comment #0)
> Description of problem:
> 
> RHV VMs are marked as non-responsive if it's having an ISO from an
> inaccessible ISO domain. The vdsm will not be able to get any statistics
> from the VM and it will log in the vdsm log as "monitor become
> unresponsive". All monitoring calls will get discarded or blocked. Since
> it's in non-responsive state, we will not be able to detach the CD from the
> portal. The only option available is to shutdown the VM. Even we will not be
> able to use virsh commands to detach the CD as the call is blocked in
> GetAllDomainStats.
> Also Powering off the VM sometimes fails and we have to
> kill the qemu-kvm process manually. The qemu-kvm process of the VM will be
> in D state.

This last bit of information about libvirt/qemu is important. It means that QEMU and/or libvirt are stuck, so Vdsm is reacting to this state accordingly and, as far as I can understand, handling it well.

The problem is that we should not end up in this state in the first place. QEMU/libvirt should tolerate the unavailability of the ISO domain.

What we can do is:
1. review the configuration we use for cdrom devices and check whether RHV is compliant with best practices
2. once #1 is correct, if we still see this behaviour, file a bug against libvirt/qemu
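
As a starting point for step 1, a sketch like the one below could list the cdrom devices in a libvirt domain XML together with the libvirt attributes that govern missing or failing media (error_policy on <driver>, startupPolicy on <source>). The sample XML, the path in it, and the function name are hypothetical. This is not the XML that Vdsm actually generates, and the sketch takes no position on which values count as best practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML for illustration only; a real one could be
# taken from the host, for example with "virsh dumpxml".
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' error_policy='report'/>
      <source file='/path/to/iso-domain/example.iso' startupPolicy='optional'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/path/to/disk.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def cdrom_settings(domain_xml: str):
    """Return one dict per cdrom device with its source and policy attributes."""
    root = ET.fromstring(domain_xml)
    settings = []
    for disk in root.findall("./devices/disk[@device='cdrom']"):
        driver = disk.find("driver")
        source = disk.find("source")
        target = disk.find("target")
        settings.append({
            "target": target.get("dev") if target is not None else None,
            "source": source.get("file") if source is not None else None,
            # None means the attribute is not set, so libvirt's default applies.
            "error_policy": driver.get("error_policy") if driver is not None else None,
            "startup_policy": source.get("startupPolicy") if source is not None else None,
        })
    return settings


for cdrom in cdrom_settings(SAMPLE_DOMAIN_XML):
    print(cdrom)
```

Comparing this output across affected and unaffected VMs may help decide whether the cdrom configuration is a factor before filing anything against libvirt/qemu.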

> Expected results:
> 
> VMs should work fine or at least should put into  

The VM should keep working, with I/O errors reported on the inaccessible cdrom devices. This is what we need to check with the two steps above; I think it is the best behaviour we can get in this scenario.

Comment 4 Tomas Jelinek 2017-10-02 10:21:05 UTC
(In reply to Francesco Romani from comment #2)
> (In reply to nijin ashok from comment #0)
> > Description of problem:
> > 
> > RHV VMs are marked as non-responsive if it's having an ISO from an
> > inaccessible ISO domain. The vdsm will not be able to get any statistics
> > from the VM and it will log in the vdsm log as "monitor become
> > unresponsive". All monitoring calls will get discarded or blocked. Since
> > it's in non-responsive state, we will not be able to detach the CD from the
> > portal. The only option available is to shutdown the VM. Even we will not be
> > able to use virsh commands to detach the CD as the call is blocked in
> > GetAllDomainStats.
> > Also Powering off the VM sometimes fails and we have to
> > kill the qemu-kvm process manually. The qemu-kvm process of the VM will be
> > in D state.
> 
> This last bit of information about libvirt/qemu is important. It means that
> QEMU and/or libvirt are stuck, so Vdsm is reacting accordingly to this state
> and -as far as I can understand- handling it well.
> 
> Problem is: we should not end up in this state in the first place.
> QEMU/libvirt should tolerate the unavailability of ISO domain.
> 
> What we can do is:
> 1. review the configuration we use for cdrom devices, check if RHV is
> compliant with best practices
> Once #1 is correct, if we still see this behaviour, we need to

So I am setting the target to 4.2 for now, and we can decide later based on the result of this investigation.

> 2. file a bug against libvirt/qemu
> 
> > Expected results:
> > 
> > VMs should work fine or at least should put into  
> 
> VM should keep working with I/O error on the inaccessible cdrom devices, and
> this is what we need to check with the two steps above, but I think this is
> the best behaviour we can get in this scenario.

Comment 5 Germano Veit Michel 2017-10-02 23:12:31 UTC
(In reply to Francesco Romani from comment #2)
> What we can do is:
> 1. review the configuration we use for cdrom devices, check if RHV is
> compliant with best practices

Isn't this the same as BZ1207992?