Bug 881743
| Summary: | vdsm: vm's are stuck in non-responsive state when there is no connectivity to NFS storage from vds | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> |
| Component: | vdsm | Assignee: | Barak <bazulay> |
| Status: | CLOSED NOTABUG | QA Contact: | Haim <hateya> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | unspecified | CC: | abaron, bazulay, dyasny, hateya, iheim, lpeer, yeylon, ykaul |
| Target Milestone: | --- | | |
| Target Release: | 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-03-28 08:37:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

VMs should be paused, as they are, not non-responsive.

Omer, does the above happen due to monitorResponse = -1 being returned on these VMs?

(In reply to comment #3)
> Omer,
>
> Does the above happen due to monitorResponse = -1 being returned on these VMs?

yes

Dafna, per the above response, why do you think it is a bug?

The VMs are reported in vds as paused and in the engine as non-responsive. Why do you think that it's not a bug?

The problem is that vdsm reports both to the engine: "vm is paused and it's not responding to the monitor requests". The engine treats both paused and non-responsive as VM statuses. Since only one can be shown to the user, we need to think and agree on which status is more important/relevant to show the user, or come up with another alternative. (BTW, usually when a user sees a VM as non-responsive, if you look in vdsm/libvirt its status will be up/running.)

*** Bug 960537 has been marked as a duplicate of this bug. ***
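The choice discussed above (which single status to show when a VM is both paused and not answering monitor requests) can be sketched as a small precedence rule. This is only an illustration, not the engine's actual logic: the function name `display_status`, the `prefer_paused` switch, and the `"NotResponding"` label are hypothetical. The only facts taken from the thread are that vdsm reports a status such as Paused together with monitorResponse = -1.

```python
def display_status(reported_status, monitor_response, prefer_paused=True):
    """Pick the one status to show for a VM (hypothetical sketch).

    reported_status:  status string reported by the host, e.g. "Up" or "Paused".
    monitor_response: -1 when the monitor did not answer, as in the thread above.
    prefer_paused:    hypothetical policy switch; when True, a paused VM is
                      shown as paused even if its monitor is unresponsive.
    """
    unresponsive = monitor_response == -1
    paused = reported_status == "Paused"
    # An unresponsive monitor wins unless the VM is paused and the policy
    # says that the paused state is the more relevant one to show.
    if unresponsive and not (paused and prefer_paused):
        return "NotResponding"
    return reported_status


if __name__ == "__main__":
    print(display_status("Paused", -1))                       # paused wins
    print(display_status("Paused", -1, prefer_paused=False))  # monitor wins
    print(display_status("Up", -1))
```

With `prefer_paused=True` the rule matches the first comment's expectation (paused VMs are shown as paused); with `prefer_paused=False` it matches the behaviour described in the report.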
Created attachment 654291 (logs)

Description of problem:
In a 3-host cluster with 1 NFS storage domain, I blocked connectivity to the master storage domain from 2 of the 3 hosts. VMs on only one host are reported as non-responsive. Looking at libvirt, we do get a response.

Version-Release number of selected component (if applicable):
vdsm-4.9.6-44.0.el6_3.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6_3.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In a 3-host cluster with 1 NFS storage domain, block connectivity to storage from 2 of the hosts using iptables.

Actual results:
VMs are stuck in a non-responsive state on one of the hosts for as long as storage connectivity is down.

Expected results:
VMs should not be non-responsive.

Additional info: logs

[root@gold-vdsc ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 22    NFS1-11                        paused
 23    NFS1-5                         paused
 24    NFS1-18                        paused
 25    NFS1-20                        paused
 26    NFS1-1                         paused
 27    NFS1-4                         paused
 28    NFS1-6                         paused
 29    NFS1-9                         paused
 30    NFS1-16                        paused
 31    NFS1-17                        paused
 32    NFS1-12                        paused

[root@gold-vdsc ~]# vdsClient -s 0 list table
8276b727-a821-487a-bb32-754c1c4bc1a1  24348  NFS1-9   Paused*
a28d3c72-60bd-4400-820b-c65db44c407c  18242  NFS1-20  Paused
450583cb-10ec-4807-919a-dcdde8471173  24530  NFS1-16  Paused*
82ab834c-0bb0-4bbd-803f-bf7d359930c8  17732  NFS1-11  Paused*
6dec3ce3-035b-4dfb-be19-108518f51749  24697  NFS1-17  Paused
7ddb10ba-c284-477a-92b5-31acb868df81  24176  NFS1-6   Paused*
8cc2afdd-b255-4c34-93c5-1bf70c823148  17874  NFS1-5   Paused*
975f6346-4772-4446-848e-58b57b7f7b9b  18384  NFS1-1   Paused*
5259613e-5b2c-441e-acdd-7f95680343a1  18051  NFS1-18  Paused*
905f6cad-1571-4c31-bb29-a2fd6cb3e15c  18579  NFS1-4   Paused*
4964332c-ad1e-4efa-85d7-23c0b19f21fe  24863  NFS1-12  Paused*

[root@gold-vdsc ~]# virsh -r capabilities
<capabilities>
  <host>
    <uuid>e4868e64-9e93-48a0-83a7-79d79c518d67</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Nehalem</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='4' threads='1'/>
      <feature name='rdtscp'/>
      <feature name='dca'/>
      <feature name='pdcm'/>
      <feature name='xtpr'/>
      <feature name='tm2'/>
      <feature name='est'/>
      <feature name='vmx'/>
      <feature name='ds_cpl'/>
      <feature name='monitor'/>
      <feature name='dtes64'/>
      <feature name='pbe'/>
      <feature name='tm'/>
      <feature name='ht'/>
      <feature name='ss'/>
      <feature name='acpi'/>
      <feature name='ds'/>
      <feature name='vme'/>
    </cpu>
    <power_management>
      <suspend_disk/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
    </secmodel>
  </host>
  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <pae/>
      <nonpae/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>
</capabilities>
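In the `vdsClient -s 0 list table` output in this report, some rows show `Paused*` and others plain `Paused`, while `virsh -r list` shows every VM as paused. A quick way to separate the two groups is sketched below. It assumes four whitespace-separated columns (VM id, pid, name, status), as in the pasted output. Reading the trailing `*` as the non-responsive-monitor marker is an assumption made for illustration and is not confirmed in this report. The helper name `split_by_monitor_flag` is hypothetical, and `SAMPLE` holds three rows copied from the output above.

```python
def split_by_monitor_flag(table_text):
    """Return (flagged, unflagged) VM names from `vdsClient list table` text.

    A row counts as flagged when its status column ends with '*'.
    """
    flagged, unflagged = [], []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        name, status = fields[2], fields[3]
        if status.endswith("*"):
            flagged.append(name)
        else:
            unflagged.append(name)
    return flagged, unflagged


# Three rows copied from the output in this report.
SAMPLE = """\
8276b727-a821-487a-bb32-754c1c4bc1a1  24348  NFS1-9   Paused*
a28d3c72-60bd-4400-820b-c65db44c407c  18242  NFS1-20  Paused
450583cb-10ec-4807-919a-dcdde8471173  24530  NFS1-16  Paused*
"""

if __name__ == "__main__":
    flagged, unflagged = split_by_monitor_flag(SAMPLE)
    print("flagged:", flagged)
    print("unflagged:", unflagged)
```

Run against the full table from the report, the same split would give nine flagged VMs and two unflagged ones (NFS1-20 and NFS1-17).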