Bug 788471 - [libvirt] libvirt process stuck in D status when blocking the connection to the storage
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 16
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Michal Privoznik
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-02-08 04:20 EST by Kiril Nesenko
Modified: 2014-07-10 20:08 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2012-10-20 19:01:01 EDT

Attachments
logs (648.37 KB, application/x-bzip)
2012-02-08 04:22 EST, Kiril Nesenko

Description Kiril Nesenko 2012-02-08 04:20:53 EST
Description of problem:
I have a host with NFS storage, and I am blocking the connection to the storage. After a while, libvirtd gets stuck in D state.

root     10367  0.0  0.1 1044980 17872 ?       DLl  Feb06   0:54 libvirtd --daemon --listen
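(For reference, a hedged sketch, not part of the original report: one way to see where a D-state process is blocked is to look at its kernel wait channel and stack. The PID is taken from the ps line above; reading /proc/<pid>/stack requires root and kernel stack-trace support.)

# Show the kernel function the process is sleeping in (wait channel):
ps -o pid,stat,wchan:32,cmd -p 10367
# Dump the kernel stack of the blocked process (root only):
cat /proc/10367/stack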

Logs:

2012-02-08 08:27:47.748+0000: 10367: debug : qemuProcessKill:3227 : vm=rhel_ha pid=26459 gracefully=0
2012-02-08 08:27:47.748+0000: 10367: debug : qemuProcessAutoDestroyRemove:3749 : vm=rhel_ha uuid=1b3db5d8-5baf-46c7-b993-db3f7fc98482
2012-02-08 08:53:48.449+0000: 10367: warning : SELinuxRestoreSecurityFileLabel:519 : cannot resolve symlink /rhev/data-center/cd84d709-d762-4df6-9667-a7d0981bd8ed/3041dbba-225f-400f-ad32-09314284553e/images/9d934674-d74f-476e-8bdc-5131c0763dc0/9f8070a4-0cb5-457d-9e9c-390317a04f40: Input/output error
2012-02-08 08:53:48.450+0000: 10367: debug : virCgroupNew:602 : New group /libvirt/qemu/rhel_ha

Version-Release number of selected component (if applicable):
libvirt-0.9.6-4.fc16.x86_64
vdsm-4.9.3.2-0.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run VMs on the host and block the connection to the storage (one way to do this is sketched below).
  
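A minimal sketch of one way to block the storage connection for step 1 (the NFS server address is a placeholder, not taken from this report):

# Drop all outgoing traffic to the NFS server (10.0.0.5 is illustrative):
iptables -A OUTPUT -d 10.0.0.5 -j DROP
# Remove the rule again to restore the connection:
iptables -D OUTPUT -d 10.0.0.5 -j DROP
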
Actual results:
The libvirtd process gets stuck in D state when the connection to the storage is blocked.

Expected results:
libvirt should not get stuck.



Additional info:
Comment 1 Kiril Nesenko 2012-02-08 04:22:15 EST
Created attachment 560196
logs
Comment 2 Kiril Nesenko 2012-02-09 07:41:47 EST
Tested the issue downstream with the same scenario. libvirtd stops logging, but the libvirtd process stays in S state.

root      2497  0.0  0.0 925324 15516 ?        SLsl Feb06   0:02 /usr/sbin/libvirtd --listen
qemu     25491  1.4  1.7 1050600 286912 ?      Sl   14:06   0:25 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -cpu Conroe -enable-kvm -m 500 -smp 1,sockets=1,cores=1,threads=1 -name pin_to_host -uuid da6438b3-0ee4-457b-9ca1-07b7e54b458a -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.2.0.3.el6,serial=0BACEBFC-EE0D-11DF-89EA-E41F13CC3360_00:10:18:53:D5:94,uuid=da6438b3-0ee4-457b-9ca1-07b7e54b458a -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/pin_to_host.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2012-02-09T12:06:51,driftfix=slew -no-shutdown -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/7e1e7286-d526-4a5d-9068-bec94ea32665/5b99aaf3-cb8e-421e-8d28-c028c082d913/images/e5303dfd-d4c5-423a-9861-be93589e349e/7a223e92-8ffa-4a2a-ae88-69db5192c811,if=none,id=drive-ide0-0-0,format=qcow2,serial=3a-9861-be93589e349e,cache=none,werror=stop,rerror=stop,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:23:71:07,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/pin_to_host.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -usb -spice port=5900,tls-port=5901,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=inputs -k en-us -vga qxl -global qxl-vga.vram_size=67108864
qemu     25622  1.5  1.7 1050600 289972 ?      Sl   14:06   0:27 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -cpu Conroe -enable-kvm -m 500 -smp 1,sockets=1,cores=1,threads=1 -name rhel_ha -uuid 1b3db5d8-5baf-46c7-b993-db3f7fc98482 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.2.0.3.el6,serial=0BACEBFC-EE0D-11DF-89EA-E41F13CC3360_00:10:18:53:D5:94,uuid=1b3db5d8-5baf-46c7-b993-db3f7fc98482 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel_ha.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2012-02-09T12:06:52,driftfix=slew -no-shutdown -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/7e1e7286-d526-4a5d-9068-bec94ea32665/5b99aaf3-cb8e-421e-8d28-c028c082d913/images/9d934674-d74f-476e-8bdc-5131c0763dc0/9f8070a4-0cb5-457d-9e9c-390317a04f40,if=none,id=drive-ide0-0-0,format=qcow2,serial=6e-8bdc-5131c0763dc0,cache=none,werror=stop,rerror=stop,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:23:71:03,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/rhel_ha.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -usb -spice port=5904,tls-port=5905,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=inputs -k en-us -vga qxl -global qxl-vga.vram_size=67108864
root     28755  0.0  0.0 103304   884 pts/2    S+   14:37   0:00 grep libvirt

Versions:
libvirt-python-0.9.4-23.el6.x86_64
libvirt-client-0.9.4-23.el6.x86_64
libvirt-0.9.4-23.el6.x86_64
Comment 3 Michal Privoznik 2012-02-13 12:54:24 EST
Kiril, from the attached logs I can see that at 8:27:47.563 your machine 'rhel_ha' died (we received EOF on the monitor). Libvirt reacts to this and, among other things, tries to restore the SELinux security labels on the disks used by the domain.
That is, we try to access the NFS share you have just cut off. AFAIK, the NFS share is mounted with the 'soft' option, so NFS should time out after a while (2 minutes by default), unless it is mounted over TCP, which has its own (much longer) timeouts.

During this, the process accessing the dead NFS mount is put into D state; there is no way for a process to prevent that.

IIUC, after ~26 minutes libvirt became responsive again. Am I right?

On the other hand, libvirt doesn't need to restore SELinux labels on NFS. But to be able to tell whether a file is on NFS, we need to stat() it, and I am afraid calling stat() on a dead NFS mount will put us in the D state as well.
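
For illustration, a hedged sketch of the kind of check involved (coreutils' stat is assumed; the path is abbreviated from the log above): determining whether a path is on NFS needs a statfs()-style call, and that call itself hangs on a dead mount.

# Print the filesystem type of the path (e.g. "nfs"); on a dead NFS mount this
# command itself blocks in D state until the NFS timeouts expire:
stat --file-system --format=%T /rhev/data-center/.../9f8070a4-0cb5-457d-9e9c-390317a04f40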
Comment 4 Michal Privoznik 2012-02-13 12:57:05 EST
What can be done, however, is tuning the NFS mount options: timeo and retrans; and most of all, switching to proto=udp instead of TCP.
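
For example, a minimal sketch only (the server name, export, mount point, and option values are illustrative, not a recommendation from this bug):

# Soft NFS mount over UDP with shorter timeouts: timeo is in tenths of a second,
# retrans is how many retries are made before the soft mount gives up with an error.
mount -t nfs -o soft,proto=udp,timeo=100,retrans=2 nfsserver:/export /rhev/data-center/mnt/example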

Does this help?
Comment 5 Kiril Nesenko 2012-02-14 07:53:38 EST
(In reply to comment #3)
> Kiril, from the attached logs I can see that at 8:27:47.563 your machine
> 'rhel_ha' died (we received EOF on the monitor). Libvirt reacts to this and,
> among other things, tries to restore the SELinux security labels on the disks
> used by the domain.
> That is, we try to access the NFS share you have just cut off. AFAIK, the NFS
> share is mounted with the 'soft' option, so NFS should time out after a while
> (2 minutes by default), unless it is mounted over TCP, which has its own
> (much longer) timeouts.
> 
> During this, the process accessing the dead NFS mount is put into D state;
> there is no way for a process to prevent that.
> 
> IIUC, after ~26 minutes libvirt became responsive again. Am I right?
>

Right, it became responsive.

> On the other hand, libvirt doesn't need to restore SELinux labels on NFS. But
> to be able to tell whether a file is on NFS, we need to stat() it, and I am
> afraid calling stat() on a dead NFS mount will put us in the D state as well.

After unblocking the connection, libvirt returns to its normal process state.
Comment 7 Cole Robinson 2012-10-20 19:01:01 EDT
Reading over this, it doesn't sound like there's anything to do here on the libvirt side; we are acting in accordance with the NFS mount options. Closing as NOTABUG, but please reopen if I've missed something.
