Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 881743

Summary: vdsm: VMs are stuck in non-responsive state when there is no connectivity to NFS storage from the host
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: unspecified
Status: CLOSED NOTABUG
Severity: high
Priority: high
Reporter: Dafna Ron <dron>
Assignee: Barak <bazulay>
QA Contact: Haim <hateya>
CC: abaron, bazulay, dyasny, hateya, iheim, lpeer, yeylon, ykaul
Target Milestone: ---
Target Release: 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2013-03-28 08:37:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Infra
Cloudforms Team: ---
Attachments: logs

Description Dafna Ron 2012-11-29 13:57:48 UTC
Created attachment 654291: logs

Description of problem:

In a 3-host cluster with 1 NFS storage domain, I blocked connectivity to the master storage domain from 2 of the 3 hosts.
VMs on only one host are reported as non-responsive.

Looking at libvirt, we are getting a response (the VMs are reported as paused; see the virsh output below).

Version-Release number of selected component (if applicable):

vdsm-4.9.6-44.0.el6_3.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6_3.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64

How reproducible:

100%

Steps to Reproduce:
1. In a 3-host cluster with 1 NFS storage domain, block connectivity to storage from 2 of the hosts using iptables.
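The reproduction step above can be sketched as a small helper that generates the iptables rules for blocking traffic to the NFS server. The storage address used below (10.0.0.5) is a placeholder, not taken from this bug; the function only prints the commands so they can be reviewed before being run as root on 2 of the 3 hosts.

```shell
# Hedged sketch: emit iptables commands to block (and later unblock) all
# traffic between this host and the NFS storage server. The address is a
# placeholder; nothing is executed, the commands are only printed.
block_storage_cmds() {
    ip=$1
    # Drop outgoing traffic to the NFS server and incoming traffic from it.
    printf 'iptables -A OUTPUT -d %s -j DROP\n' "$ip"
    printf 'iptables -A INPUT -s %s -j DROP\n' "$ip"
    # Matching delete rules to restore connectivity afterwards.
    printf 'iptables -D OUTPUT -d %s -j DROP\n' "$ip"
    printf 'iptables -D INPUT -s %s -j DROP\n' "$ip"
}

block_storage_cmds 10.0.0.5
```

Piping the first two printed lines into a root shell on two of the hosts reproduces the blocked-storage condition; the `-D` lines undo it.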
  
Actual results:

VMs on one of the hosts are stuck in the non-responsive state for as long as storage connectivity is down.

Expected results:

VMs should not be non-responsive.

Additional info: logs



    [root@gold-vdsc ~]# virsh -r list
    Id Name State
    ----------------------------------------------------
    22 NFS1-11 paused
    23 NFS1-5 paused
    24 NFS1-18 paused
    25 NFS1-20 paused
    26 NFS1-1 paused
    27 NFS1-4 paused
    28 NFS1-6 paused
    29 NFS1-9 paused
    30 NFS1-16 paused
    31 NFS1-17 paused
    32 NFS1-12 paused
     
     
    [root@gold-vdsc ~]# vdsClient -s 0 list table
    8276b727-a821-487a-bb32-754c1c4bc1a1 24348 NFS1-9 Paused*
    a28d3c72-60bd-4400-820b-c65db44c407c 18242 NFS1-20 Paused
    450583cb-10ec-4807-919a-dcdde8471173 24530 NFS1-16 Paused*
    82ab834c-0bb0-4bbd-803f-bf7d359930c8 17732 NFS1-11 Paused*
    6dec3ce3-035b-4dfb-be19-108518f51749 24697 NFS1-17 Paused
    7ddb10ba-c284-477a-92b5-31acb868df81 24176 NFS1-6 Paused*
    8cc2afdd-b255-4c34-93c5-1bf70c823148 17874 NFS1-5 Paused*
    975f6346-4772-4446-848e-58b57b7f7b9b 18384 NFS1-1 Paused*
    5259613e-5b2c-441e-acdd-7f95680343a1 18051 NFS1-18 Paused*
    905f6cad-1571-4c31-bb29-a2fd6cb3e15c 18579 NFS1-4 Paused*
    4964332c-ad1e-4efa-85d7-23c0b19f21fe 24863 NFS1-12 Paused*
     
     
     
    [root@gold-vdsc ~]# virsh -r capabilities
    <capabilities>
     
    <host>
    <uuid>e4868e64-9e93-48a0-83a7-79d79c518d67</uuid>
    <cpu>
    <arch>x86_64</arch>
    <model>Nehalem</model>
    <vendor>Intel</vendor>
    <topology sockets='1' cores='4' threads='1'/>
    <feature name='rdtscp'/>
    <feature name='dca'/>
    <feature name='pdcm'/>
    <feature name='xtpr'/>
    <feature name='tm2'/>
    <feature name='est'/>
    <feature name='vmx'/>
    <feature name='ds_cpl'/>
    <feature name='monitor'/>
    <feature name='dtes64'/>
    <feature name='pbe'/>
    <feature name='tm'/>
    <feature name='ht'/>
    <feature name='ss'/>
    <feature name='acpi'/>
    <feature name='ds'/>
    <feature name='vme'/>
    </cpu>
    <power_management>
    <suspend_disk/>
    </power_management>
    <migration_features>
    <live/>
    <uri_transports>
    <uri_transport>tcp</uri_transport>
    </uri_transports>
    </migration_features>
    <topology>
    <cells num='2'>
    <cell id='0'>
    <cpus num='4'>
    <cpu id='0'/>
    <cpu id='1'/>
    <cpu id='2'/>
    <cpu id='3'/>
    </cpus>
    </cell>
    <cell id='1'>
    <cpus num='4'>
    <cpu id='4'/>
    <cpu id='5'/>
    <cpu id='6'/>
    <cpu id='7'/>
    </cpus>
    </cell>
    </cells>
    </topology>
    <secmodel>
    <model>selinux</model>
    <doi>0</doi>
    </secmodel>
    </host>
     
    <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
    <wordsize>32</wordsize>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <machine>rhel6.3.0</machine>
    <machine canonical='rhel6.3.0'>pc</machine>
    <machine>rhel6.2.0</machine>
    <machine>rhel6.1.0</machine>
    <machine>rhel6.0.0</machine>
    <machine>rhel5.5.0</machine>
    <machine>rhel5.4.4</machine>
    <machine>rhel5.4.0</machine>
    <domain type='qemu'>
    </domain>
    <domain type='kvm'>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    </domain>
    </arch>
    <features>
    <cpuselection/>
    <deviceboot/>
    <pae/>
    <nonpae/>
    <acpi default='on' toggle='yes'/>
    <apic default='on' toggle='no'/>
    </features>
    </guest>
     
    <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
    <wordsize>64</wordsize>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <machine>rhel6.3.0</machine>
    <machine canonical='rhel6.3.0'>pc</machine>
    <machine>rhel6.2.0</machine>
    <machine>rhel6.1.0</machine>
    <machine>rhel6.0.0</machine>
    <machine>rhel5.5.0</machine>
    <machine>rhel5.4.4</machine>
    <machine>rhel5.4.0</machine>
    <domain type='qemu'>
    </domain>
    <domain type='kvm'>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    </domain>
    </arch>
    <features>
    <cpuselection/>
    <deviceboot/>
    <acpi default='on' toggle='yes'/>
    <apic default='on' toggle='no'/>
    </features>
    </guest>
     
    </capabilities>

Comment 1 RHEL Program Management 2012-12-14 07:50:22 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 2 Dan Yasny 2013-03-01 12:06:34 UTC
VMs should be paused, as they are, not non-responsive

Comment 3 Barak 2013-03-07 12:52:35 UTC
Omer,

Does the above happen due to monitorResponse = -1 being returned for these VMs?

Comment 4 Omer Frenkel 2013-03-07 15:26:07 UTC
(In reply to comment #3)
> Omer,
> 
> Does the above happen due to monitorResponse = -1 being returned for these VMs?

yes

Comment 5 Barak 2013-03-24 13:09:12 UTC
Dafna, per the above response,
why do you think it is a bug?

Comment 6 Dafna Ron 2013-03-24 13:33:28 UTC
The VMs are reported as paused in vdsm but as non-responsive in the engine; why do you think it's not a bug?

Comment 7 Omer Frenkel 2013-03-24 13:57:19 UTC
The problem is that vdsm reports both to the engine: "the VM is paused and it is not responding to monitor requests".
The engine considers both Paused and NonResponsive to be VM statuses. Since only one can be shown to the user, we need to think and agree on which status is more important/relevant to show, or come up with another alternative.

(BTW, usually when a user sees a VM as non-responsive, its status in vdsm/libvirt will be up/running.)
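The design question in the comment above — vdsm reporting both a concrete VM state and monitorResponse = -1 while the engine can display only one status — can be sketched as a simple precedence rule. This is a hypothetical illustration only (the function and status names are invented, not the actual ovirt-engine logic): it prefers the more specific Paused state over the generic NonResponsive flag, which matches the behavior comment 2 asks for.

```shell
# Hypothetical sketch: pick a single display status when vdsm reports both
# a VM state and a monitor response. monitorResponse = -1 means libvirt did
# not answer monitor requests. Not the real engine code; names are invented.
display_status() {
    state=$1             # VM state as reported by vdsm, e.g. Paused, Up
    monitor_response=$2  # -1 = monitor not responding

    if [ "$state" = "Paused" ]; then
        # Prefer the concrete Paused state (e.g. paused on I/O error)
        # over the generic non-responsive flag.
        echo "Paused"
    elif [ "$monitor_response" = "-1" ]; then
        echo "NonResponsive"
    else
        echo "$state"
    fi
}

display_status Paused -1   # the case in this bug: shown as Paused
display_status Up -1       # no concrete state to prefer: NonResponsive
```

Under this rule the VMs in this bug would be shown as Paused, which is consistent with the closure of the bug as NOTABUG.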

Comment 8 Omer Frenkel 2013-05-08 08:59:02 UTC
*** Bug 960537 has been marked as a duplicate of this bug. ***