Bug 707202 - Hang up with virsh destroy a VM when the connection of nfs storage is blocked
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Michal Privoznik
QA Contact: Virtualization Bugs
Reported: 2011-05-24 11:47 UTC by Nan Zhang
Modified: 2011-07-12 06:03 UTC (History)

Doc Type: Bug Fix
Last Closed: 2011-05-25 03:37:42 UTC

Description Nan Zhang 2011-05-24 11:47:50 UTC
Description of problem:
Attach an NFS volume to a VM, then block the connection to the NFS server with iptables. Destroying the VM then causes libvirt to hang.

Version-Release number of selected component (if applicable):
libvirt-0.9.1-1.el6.x86_64

How reproducible:
always

Steps to Reproduce:
# virsh start vm1
Domain vm1 started

# virsh list
 Id Name                 State
----------------------------------
  2 vm1                  running

# iptables -A OUTPUT -d 10.66.90.115 -p tcp --dport 2049 -j DROP

# LIBVIRT_DEBUG=1 virsh destroy vm1
...
06:54:07.385: 3261: debug : virDomainLookupByName:1995 : conn=0x14f34e0, name=vm1
06:54:07.385: 3261: debug : remoteIO:10761 : Do proc=23 serial=3 length=36 wait=(nil)
06:54:07.385: 3261: debug : remoteIO:10833 : We have the buck 23 0x14f37f0 0x14f37f0
06:54:07.385: 3261: debug : virEventPollUpdateHandle:144 : Update handle w=2 e=0
06:54:07.385: 3261: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 0
06:54:07.386: 3261: debug : remoteIODecodeMessageLength:10153 : Got length, now need 84 total (80 more)
06:54:07.386: 3261: debug : remoteIOEventLoop:10687 : Giving up the buck 23 0x14f37f0 (nil)
06:54:07.386: 3261: debug : virEventPollUpdateHandle:144 : Update handle w=2 e=1
06:54:07.386: 3261: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 0
06:54:07.386: 3261: debug : remoteIO:10861 : All done with our call 23 (nil) 0x14f37f0
06:54:07.386: 3261: debug : virDomainDestroy:2040 : dom=0x14f3330, (VM: name=vm1, uuid=4ab7ea78-40b0-475d-9f32-9397d87a76d5), 
06:54:07.386: 3261: debug : remoteIO:10761 : Do proc=12 serial=4 length=56 wait=(nil)
06:54:07.386: 3261: debug : remoteIO:10833 : We have the buck 12 0x14f37f0 0x14f37f0
06:54:07.386: 3261: debug : virEventPollUpdateHandle:144 : Update handle w=2 e=0
06:54:07.386: 3261: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 0

# iptables -D OUTPUT -d 10.66.90.115 -p tcp --dport 2049 -j DROP

# virsh domstate vm1
shut off
  
Actual results:
The virsh destroy command hangs.

Expected results:
The destroy should fail, and the VM should remain in the running state.

Additional info:

Comment 2 Daniel Berrangé 2011-05-24 14:30:50 UTC
> Attach an NFS volume to a VM, then block the connection to the NFS server
> with iptables. Destroying the VM then causes libvirt to hang.

This is expected behaviour with NFS by default. NFS mounts default to the 'hard' flag. This means that if the NFS server goes away (e.g. because network connectivity is lost or blocked), the client will retry indefinitely. The application (such as libvirt) has no say in this matter: the kernel retries forever and never returns an error to userspace.

If the NFS mount uses the 'soft' flag, then the kernel will time out after N retries (controlled by the 'retrans' mount option) and return an error to userspace.
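
For illustration, a soft mount along these lines might be set up as sketched below. This is only a dry run that prints the mount command. The export path, mount point, and the timeo/retrans values are assumptions chosen for the example, not values taken from this bug. The helper name nfs_soft_opts is also hypothetical.

```shell
#!/bin/sh
# Sketch only: server address is from the report; the export path and
# mount point are hypothetical placeholders.
NFS_SERVER=10.66.90.115
NFS_EXPORT=/export/images   # assumption
MOUNT_POINT=/mnt/nfs        # assumption

# Build the NFS option string for a soft mount.
#   soft    - fail the request with an error once retries are exhausted
#   timeo   - time to wait for a reply before retrying, in tenths of a second
#   retrans - number of retries before the soft mount gives up
nfs_soft_opts() {
    printf 'soft,timeo=%s,retrans=%s\n' "$1" "$2"
}

# Print the command instead of running it, so nothing is mounted here.
echo mount -t nfs -o "$(nfs_soft_opts 100 3)" "$NFS_SERVER:$NFS_EXPORT" "$MOUNT_POINT"
```

Dropping the leading echo would run the mount for real (as root). Larger timeo or retrans values make the client wait longer before giving up.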

When you kill a guest with libvirt, one of the things it has to do is restore security labelling. This obviously involves I/O operations, so if the NFS server is blocked or dead and the 'hard' mount option is set, then libvirtd will "hang" until the NFS server recovers. If you mount with 'soft', then libvirt shouldn't "hang", but this introduces some risk to data integrity.
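
To see which behaviour a host is likely to get, one option is to check whether its NFS mounts are 'hard' or 'soft'. The sketch below reads lines in /proc/mounts format and prints each NFS mount point with its mode. It assumes that a mount without an explicit 'soft' option is hard, which matches the default described above. The function name nfs_mount_modes is hypothetical.

```shell
#!/bin/sh
# Read /proc/mounts-style lines on stdin and print "<mountpoint> <hard|soft>"
# for every NFS mount. Fields: device, mount point, fs type, options, ...
nfs_mount_modes() {
    while read -r dev mnt fstype opts rest; do
        # Only NFS file systems are of interest here.
        case "$fstype" in
            nfs|nfs4) ;;
            *) continue ;;
        esac
        # Wrap the option list in commas so "soft" matches only as a whole option.
        case ",$opts," in
            *,soft,*) mode=soft ;;
            *)        mode=hard ;;   # assumption: no explicit 'soft' means hard
        esac
        printf '%s %s\n' "$mnt" "$mode"
    done
}

# On a Linux host, report the modes of the currently mounted NFS file systems.
if [ -r /proc/mounts ]; then
    nfs_mount_modes < /proc/mounts
fi
```

A mount reported as hard is one where destroying a guest may block until the NFS server comes back.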

