Bug 927143

Summary: [vdsm] ShutdownVM fails after plugging shared disk to 2 vms at once due to 'Bad File Descriptor' in vdsm
Product: Red Hat Enterprise Virtualization Manager Reporter: Gadi Ickowicz <gickowic>
Component: vdsm Assignee: Saveliev Peter <peet>
Status: CLOSED ERRATA QA Contact: Gadi Ickowicz <gickowic>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: abaron, bazulay, danken, hateya, iheim, lpeer, michal.skrivanek, nlevinki, sgrinber, ykaul, zdover
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0 Flags: sgrinber: Triaged+
Hardware: x86_64   
OS: Linux   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, hot unplugging disks caused VDSM to stop communicating with guest agents. The logic of the virtual-machine-cleanup code and the hot-unplugging code has now been separated. Now when disks are hot-unplugged, VDSM does not touch guest-agent communication channels. When disks are hot-unplugged, VDSM removes only the detached disk from the virtual machine. This allows VDSM to continue communicating with guest agents after hot-unplugging of virtual disks.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-10 20:46:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm, engine, libvirt logs none

Description Gadi Ickowicz 2013-03-25 08:00:16 UTC
Created attachment 715892
vdsm, engine, libvirt logs

Description of problem:
Trying to shut down a VM using the ShutdownVM command fails due to:
Thread-188981::ERROR::2013-03-25 09:21:05,057::guestIF::269::vm.Vm::(desktopShutdown) vmId=`3bfd8c39-acc2-4cb3-9e6a-36b4b386e2f4`::desktopShutdown failed
Traceback (most recent call last):
  File "/usr/share/vdsm/guestIF.py", line 267, in desktopShutdown
    self._forward('shutdown', {'timeout': timeout, 'message': msg})
  File "/usr/share/vdsm/guestIF.py", line 135, in _forward
    self._sock.send(message)
  File "/usr/lib64/python2.6/socket.py", line 167, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

in vdsm.
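
The error at the bottom of the traceback is what Python reports when code sends on a socket object that has already been closed. A minimal sketch of that behaviour, assuming Python 3 (the report itself is from Python 2.6) and using a local socket pair in place of the guest agent channel; the function name `send_after_close` is hypothetical and not part of vdsm:

```python
import errno
import socket


def send_after_close():
    """Return the errno raised when sending on an already-closed socket."""
    # A connected pair of local sockets stands in for the agent channel.
    left, right = socket.socketpair()
    left.close()  # the channel is closed behind the caller's back
    try:
        left.send(b"shutdown")
    except OSError as exc:
        return exc.errno
    finally:
        right.close()
    return None


if __name__ == "__main__":
    # Compare against the symbolic constant rather than a literal number.
    print(send_after_close() == errno.EBADF)
```

This only illustrates the symptom: the sender still holds a reference to the socket object, but the underlying descriptor is gone.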

The scenario used is described below.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-11.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have 2 VMs with guest-agent and RHEL installed
2. Plug in the same shareable disk (in addition to each VM's OS disk) to both VMs
3. Power the VMs on
4. Deactivate the disk from one of the VMs
5. Attempt to shut down the VM that still has the disk attached
  
Actual results:
Shutdown fails - command is never sent to guest-agent

Expected results:
VM should shut down gracefully

Additional info:

Comment 3 Dan Kenigsberg 2013-04-02 10:27:31 UTC
This sounds related to the funny call to Vm._cleanup() on hotunplug: http://gerrit.ovirt.org/12564

Could you comment out that call in vdsm, and retry the scenario?

Comment 4 Saveliev Peter 2013-04-04 07:57:52 UTC
On a disk hot unplug, VDSM closes the communication channel to the guest agent, and this is definitely a bug (reproduced and will be fixed).

But Gadi, are you sure that VDSM also closes the channel to the agent on the VM from which a disk is *not* detached? I cannot reproduce this particular case in either setup (2 VMs on the same hypervisor, and 2 VMs on 2 hypervisors): detach the disk from one VM and shut down the other VM, and all works OK for the second VM.
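
The failure mode discussed in comments 3 and 4, and the separation described in the Doc Text, can be sketched roughly as follows. This is a simplified model and not the vdsm code: the classes `GuestAgentChannel` and `Vm` and the methods `hotunplug_disk_old`, `hotunplug_disk_new`, and `shutdown` are hypothetical names chosen for illustration only.

```python
import errno


class GuestAgentChannel:
    """Hypothetical stand-in for the socket to the guest agent."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def send(self, command):
        if not self.is_open:
            raise OSError(errno.EBADF, "Bad file descriptor")
        return command


class Vm:
    """Hypothetical, heavily simplified VM object."""

    def __init__(self, disks):
        self.disks = list(disks)
        self.agent = GuestAgentChannel()

    def cleanup(self):
        """Whole-VM teardown: drops every disk and the agent channel."""
        self.disks = []
        self.agent.close()

    def hotunplug_disk_old(self, disk):
        """Pre-fix shape: hot unplug reuses the whole-VM cleanup path."""
        remaining = [d for d in self.disks if d != disk]
        self.cleanup()  # side effect: the agent channel is closed too
        self.disks = remaining

    def hotunplug_disk_new(self, disk):
        """Post-fix shape: only the detached disk is removed."""
        self.disks.remove(disk)

    def shutdown(self):
        return self.agent.send("shutdown")


if __name__ == "__main__":
    vm = Vm(["os", "shared"])
    vm.hotunplug_disk_new("shared")
    print(vm.shutdown(), vm.disks)
```

In this model, a VM that went through `hotunplug_disk_old` raises an `OSError` with `EBADF` on `shutdown()`, while one that went through `hotunplug_disk_new` can still reach its agent.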

Comment 5 Gadi Ickowicz 2013-04-04 08:45:15 UTC
(In reply to comment #4)
> On a disk hot unplug VDSM closes the communication channel to the guest
> agent and this is a bug, that's for sure (reproduced and will be fixed)
> 
> But Gadi, are you sure that VDSM also closes the channel to the agent on the
> VM, from where a disk is *not* detached? I can not reproduce this particular
> case in both cases (2 VMs on the same hypervisor and 2 VMs on 2 hypervisors,
> detach disk from one VM and shut down other VM; all works OK for the second
> VM)

You are right Peter - the scenario actually shuts down the VM that had the disk unplugged first.

Comment 6 Saveliev Peter 2013-04-04 08:46:41 UTC
Great, thanks.

Comment 7 Saveliev Peter 2013-04-10 10:16:38 UTC
Merged as ovirt:435568c89df238f24e39b74d2441f925581d3c87

Preparing the backport

Comment 8 Saveliev Peter 2013-04-10 13:51:34 UTC
Merged as rhev:4b7a7b882d965c3b35f7f44cb42422ca0be671d6

Comment 10 Gadi Ickowicz 2013-04-14 11:30:00 UTC
Verified on SI13.1 -  vdsm-4.10.2-15.0.el6ev.x86_64 - ran automated test that reproduced scenario and shutdownVm worked successfully.

Comment 11 Gadi Ickowicz 2013-04-14 11:47:14 UTC
(In reply to comment #10)
> Verified on SF13.1 -  vdsm-4.10.2-15.0.el6ev.x86_64 - ran automated test
> that reproduced scenario and shutdownVm worked successfully.
** SF13.1

Comment 13 Saveliev Peter 2013-04-15 08:14:12 UTC
Doc text provided, see the field

Comment 15 errata-xmlrpc 2013-06-10 20:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html