Bug 1497173 - VM marked as non responsive if it has ISO from an inaccessible ISO domain
Summary: VM marked as non responsive if it has ISO from an inaccessible ISO domain
Keywords:
Status: CLOSED DUPLICATE of bug 1207992
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-29 11:39 UTC by nijin ashok
Modified: 2021-05-01 16:53 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-03 07:34:26 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:



Links
Red Hat Bugzilla 1207992 | Private: 0 | Priority: medium | Status: CLOSED | Summary: [RFE] Report IO errors to guests if the device is a CDROM | Last Updated: 2021-09-09 11:42:55 UTC

Internal Links: 1207992

Description nijin ashok 2017-09-29 11:39:42 UTC
Description of problem:

RHV VMs are marked as non-responsive if they have an ISO attached from an inaccessible ISO domain. VDSM cannot get any statistics from the VM and logs "monitor become unresponsive" in the vdsm log. All monitoring calls are discarded or blocked. Because the VM is in the non-responsive state, the CD cannot be detached from the portal, and the only available option is to shut down the VM. The CD cannot be detached with virsh commands either, because the call is blocked in GetAllDomainStats. Powering off the VM also sometimes fails, and the qemu-kvm process has to be killed manually. The qemu-kvm process of the VM is in D state (uninterruptible sleep).
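One way to confirm the D state on the host is to read the state field of the process's /proc/<pid>/stat entry (Linux procfs). The following is a minimal Python sketch, not part of the original report; the helper names are hypothetical, and the PID of the VM's qemu-kvm process is assumed to be known.

```python
def parse_proc_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The second field (comm) is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' instead of on whitespace.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    # The first token after the comm field is the state letter (R, S, D, Z, ...).
    return after_comm.split()[0]


def is_uninterruptible(pid: int) -> bool:
    """True if the process is in D state (uninterruptible sleep), typically blocked on I/O."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_state(f.read()) == "D"
```

For example, `is_uninterruptible(<qemu-kvm PID>)` returning True while the ISO domain is blocked would match the behaviour described above.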

Version-Release number of selected component (if applicable):

rhevm-4.1.6.2-0.1.el7.noarch


How reproducible:

100%

Steps to Reproduce:
 
Block the connection from the host to the ISO domain while a VM with an ISO from that domain attached is running.


Actual results:

VMs that have an ISO attached from an inaccessible ISO domain go into the non-responsive state.

Expected results:

VMs should keep working, or at least should be put into

Additional info:

Comment 2 Francesco Romani 2017-10-02 09:21:04 UTC
(In reply to nijin ashok from comment #0)
> Description of problem:
> 
> RHV VMs are marked as non-responsive if it's having an ISO from an
> inaccessible ISO domain. The vdsm will not be able to get any statistics
> from the VM and it will log in the vdsm log as "monitor become
> unresponsive". All monitoring calls will get discarded or blocked. Since
> it's in non-responsive state, we will not be able to detach the CD from the
> portal. The only option available is to shutdown the VM. Even we will not be
> able to use virsh commands to detach the CD as the call is blocked in
> GetAllDomainStats.
> Also Powering off the VM sometimes fails and we have to
> kill the qemu-kvm process manually. The qemu-kvm process of the VM will be
> in D state.

This last bit of information about libvirt/qemu is important. It means that QEMU and/or libvirt are stuck, so Vdsm is reacting accordingly to this state and, as far as I can understand, handling it well.

The problem is that we should not end up in this state in the first place: QEMU/libvirt should tolerate the unavailability of the ISO domain.

What we can do is:
1. Review the configuration we use for cdrom devices and check whether RHV complies with best practices.
2. Once #1 is correct, if we still see this behaviour, file a bug against libvirt/qemu.

> Expected results:
> 
> VMs should work fine or at least should put into  

The VM should keep working, with I/O errors reported on the inaccessible cdrom devices. This is what we need to check with the two steps above, but I think it is the best behaviour we can get in this scenario.

Comment 4 Tomas Jelinek 2017-10-02 10:21:05 UTC
(In reply to Francesco Romani from comment #2)
> What we can do is:
> 1. review the configuration we use for cdrom devices, check if RHV is
> compliant with best practices
> Once #1 is correct, if we still see this behaviour, we need to

So I am targeting this to 4.2 for now; we can decide later based on the result of this investigation.

> 2. file a bug against libvirt/qemu

Comment 5 Germano Veit Michel 2017-10-02 23:12:31 UTC
(In reply to Francesco Romani from comment #2)
> What we can do is:
> 1. review the configuration we use for cdrom devices, check if RHV is
> compliant with best practices

Isn't this the same as BZ1207992?

