Bug 707202

Summary: virsh destroy hangs when the connection to NFS storage is blocked
Product: Red Hat Enterprise Linux 6
Reporter: Nan Zhang <nzhang>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: unspecified
Version: 6.2
CC: berrange, dallan, dyuan
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-05-25 03:37:42 UTC

Description Nan Zhang 2011-05-24 11:47:50 UTC
Description of problem:
Attach an NFS volume to a VM, then block the connection to the NFS server using iptables. Destroying the VM then causes libvirt to hang.

Version-Release number of selected component (if applicable):
libvirt-0.9.1-1.el6.x86_64

How reproducible:
always

Steps to Reproduce:
# virsh start vm1
Domain vm1 started

# virsh list
 Id Name                 State
----------------------------------
  2 vm1                  running

# iptables -A OUTPUT -d 10.66.90.115 -p tcp --dport 2049 -j DROP

# LIBVIRT_DEBUG=1 virsh destroy vm1
...
06:54:07.385: 3261: debug : virDomainLookupByName:1995 : conn=0x14f34e0, name=vm1
06:54:07.385: 3261: debug : remoteIO:10761 : Do proc=23 serial=3 length=36 wait=(nil)
06:54:07.385: 3261: debug : remoteIO:10833 : We have the buck 23 0x14f37f0 0x14f37f0
06:54:07.385: 3261: debug : virEventPollUpdateHandle:144 : Update handle w=2 e=0
06:54:07.385: 3261: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 0
06:54:07.386: 3261: debug : remoteIODecodeMessageLength:10153 : Got length, now need 84 total (80 more)
06:54:07.386: 3261: debug : remoteIOEventLoop:10687 : Giving up the buck 23 0x14f37f0 (nil)
06:54:07.386: 3261: debug : virEventPollUpdateHandle:144 : Update handle w=2 e=1
06:54:07.386: 3261: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 0
06:54:07.386: 3261: debug : remoteIO:10861 : All done with our call 23 (nil) 0x14f37f0
06:54:07.386: 3261: debug : virDomainDestroy:2040 : dom=0x14f3330, (VM: name=vm1, uuid=4ab7ea78-40b0-475d-9f32-9397d87a76d5), 
06:54:07.386: 3261: debug : remoteIO:10761 : Do proc=12 serial=4 length=56 wait=(nil)
06:54:07.386: 3261: debug : remoteIO:10833 : We have the buck 12 0x14f37f0 0x14f37f0
06:54:07.386: 3261: debug : virEventPollUpdateHandle:144 : Update handle w=2 e=0
06:54:07.386: 3261: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 0

# iptables -D OUTPUT -d 10.66.90.115 -p tcp --dport 2049 -j DROP

# virsh domstate vm1
shut off
  
Actual results:
The virsh destroy command hangs.

Expected results:
Destroy should fail, and the VM should remain in the running state.

Additional info:

Comment 2 Daniel Berrangé 2011-05-24 14:30:50 UTC
> Attach an NFS volume to a VM, then block the connection to the NFS server
> using iptables. Destroying the VM then causes libvirt to hang.

This is expected behaviour with NFS by default. NFS mounts default to the 'hard' option. This means that if the NFS server goes away (e.g. because network connectivity is lost or blocked), the client retries indefinitely. The application (libvirt in this case) has no say in the matter: the kernel retries forever and never returns an error to userspace.

If the NFS mount uses the 'soft' option, the kernel will time out after N retries (controlled by the 'retrans' mount option) and return an error to userspace.
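
To see which behaviour applies on a given host, the mount options can be read back from the kernel's mount table. The following is a minimal sketch, assuming a Linux host where /proc/mounts lists 'hard' or 'soft' among the options of each NFS mount; the function name nfs_mount_modes is made up for this example and is not part of libvirt or nfs-utils.

```shell
#!/bin/sh
# Print "<mountpoint> <hard|soft|unknown>" for every NFS mount found in a
# mounts-format file (default: /proc/mounts).
# nfs_mount_modes is a hypothetical helper name used only in this sketch.
nfs_mount_modes() {
    mounts_file="${1:-/proc/mounts}"
    # Fields in /proc/mounts: device, mountpoint, fstype, options, dump, pass
    awk '$3 == "nfs" || $3 == "nfs4" {
        mode = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "hard" || opts[i] == "soft") mode = opts[i]
        }
        print $2, mode
    }' "$mounts_file"
}
```

Calling nfs_mount_modes with no argument reads /proc/mounts. Any mount reported as 'hard' will block I/O indefinitely while the server is unreachable, as described above.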

When you kill a guest with libvirt, one of the things it has to do is restore security labelling. This obviously involves I/O operations, so if the NFS server is blocked or dead and the 'hard' mount option is set, libvirtd will "hang" until the NFS server recovers. If you mount with 'soft', libvirt shouldn't "hang", but this introduces some risk to data integrity.
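
For scripts that call virsh and must not block forever on a hard mount, one option is to bound how long the client waits. This is only a sketch, assuming GNU coreutils `timeout` is installed; run_with_deadline is a made-up helper name. It stops the virsh client from waiting but does nothing about libvirtd itself, which stays blocked on the NFS I/O until the server recovers, and the destroy request may still complete later.

```shell
#!/bin/sh
# Run a command but stop waiting for it after a deadline.
# Usage: run_with_deadline SECONDS COMMAND [ARG...]
# Returns the command's own exit status, or 124 if the deadline expired
# (124 is the status GNU coreutils `timeout` reports for a timeout).
# run_with_deadline is a hypothetical helper name used only in this sketch.
run_with_deadline() {
    deadline="$1"
    shift
    timeout "$deadline" "$@"
}

# Example (commented out so the sketch has no side effects):
# run_with_deadline 30 virsh destroy vm1 || echo "destroy failed or timed out" >&2
```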