Bug 707197 - takes between 1-3 minutes to start a VM non-bootable disk
Summary: takes between 1-3 minutes to start a VM non-bootable disk
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.2
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Osier Yang
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 682015
TreeView+ depends on / blocked
 
Reported: 2011-05-24 11:37 UTC by Haim
Modified: 2014-01-13 00:49 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-07 14:00:06 UTC
Target Upstream Version:


Attachments (Terms of Use)
vdsm logs. (990.57 KB, application/x-gzip)
2011-05-24 11:37 UTC, Haim
no flags Details
libvirt logs (179.41 KB, application/x-gzip)
2011-05-31 13:51 UTC, Haim
no flags Details

Description Haim 2011-05-24 11:37:21 UTC
Created attachment 500587 [details]
vdsm logs.

Description of problem:

noticed that it takes more then usual to start a new VM, it differs between 1-3 minutes, on a very high performance machines. 
from investigation I did, it appears that thread goes to sleep for a minute after performing memTestAndCommit check, and after almost a minute, continues to VM wrapper thread, which prepares the volume, creates the XML, and sends to libvirt.

also, performed some code modification on order to isolate this issue and following are the results:

1) disabled resource locking in memTestAndCommit function under /usr/share/vdsm/clientIf.py


 def memTestAndCommit(self, newVm):
        """
        Test if enough memory is available for new VM (currently unused)
        """
        #self._memLock.acquire()

using script, started bunch of disk-less VMs, it took a minute to start each VM.

pleas refer to 'vdsm-disable_lock.log' - attached. 

2) disabled relative 'if' that call this function in /usr/share/vdsm/vm.py


    def _startUnderlyingVm(self):
        try:        
            if 'recover' not in self.conf:
               #if not self.cif.memTestAndCommit(self):
               #     self.setDownStatus(ERROR,
               #                        'Out of memory - machine not created')
               #     return
            
                rc, err = self._prepostVmScript('pre_vm')

using script, started a bunch of disk-less VMs, it took a minute to start 10 vms. 

please refer to 'vdsm-disable_if.log' - attached. 

not sure if its a regression.

Comment 3 Haim 2011-05-31 13:51:07 UTC
Created attachment 502004 [details]
libvirt logs

Comment 5 Michal Privoznik 2011-06-07 09:47:30 UTC
Haim, which version of libvirt you were running? Log inspection shows, we wait too long on file descriptors to became available for I/O, although they are, e.g.:

17:45:48.577: 1961: debug : virExecWithHook:761 : LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root /usr/libexec/qemu-kvm -device ? -device pci-assign,? -device virtio-blk-pci,? -device virtio-net-pci,?
17:45:48.578: 1961: debug : virCommandRunAsync:1214 : Command result 0, with PID 5486
....
17:46:01.303: 1961: debug : virCommandRun:1084 : Result status 0, stdout: '' stderr: 'name "pci-bridge", bus PCI


17:46:01.316: 1961: debug : virExecWithHook:761 : LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root /usr/libexec/qemu-kvm -device ? -device pci-assign,? -device virtio-blk-pci,? -device virtio-net-pci,?
17:46:01.316: 1961: debug : virCommandRunAsync:1214 : Command result 0, with PID 5694
...
17:46:19.330: 1961: debug : virCommandRun:1084 : Result status 0, stdout: '' stderr: 'name "pci-bridge", bus PCI


17:46:19.626: 1961: debug : virJSONValueToString:1062 : result={"execute":"qmp_capabilities"}
17:46:19.626: 1961: debug : virEventPollUpdateHandle:144 : Update handle w=31 e=15
17:46:19.626: 1961: debug : virEventPollInterruptLocked:690 : Interrupting
...
17:46:27.300: 1961: debug : virEventPollInterruptLocked:690 : Interrupting
17:46:27.300: 1961: debug : virJSONValueFromString:933 : string={"return": {}}


Moreover, notice (it is not shown here, but in attached logs) when qemu outputs on stdout we pick output immediately.

Comment 6 Haim 2011-06-07 20:36:45 UTC
(In reply to comment #5)
> Haim, which version of libvirt you were running? Log inspection shows, we wait
> too long on file descriptors to became available for I/O, although they are,
> e.g.:
> 
> 17:45:48.577: 1961: debug : virExecWithHook:761 : LC_ALL=C
> PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root
> /usr/libexec/qemu-kvm -device ? -device pci-assign,? -device virtio-blk-pci,?
> -device virtio-net-pci,?
> 17:45:48.578: 1961: debug : virCommandRunAsync:1214 : Command result 0, with
> PID 5486
> ....
> 17:46:01.303: 1961: debug : virCommandRun:1084 : Result status 0, stdout: ''
> stderr: 'name "pci-bridge", bus PCI
> 
> 
> 17:46:01.316: 1961: debug : virExecWithHook:761 : LC_ALL=C
> PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root
> /usr/libexec/qemu-kvm -device ? -device pci-assign,? -device virtio-blk-pci,?
> -device virtio-net-pci,?
> 17:46:01.316: 1961: debug : virCommandRunAsync:1214 : Command result 0, with
> PID 5694
> ...
> 17:46:19.330: 1961: debug : virCommandRun:1084 : Result status 0, stdout: ''
> stderr: 'name "pci-bridge", bus PCI
> 
> 
> 17:46:19.626: 1961: debug : virJSONValueToString:1062 :
> result={"execute":"qmp_capabilities"}
> 17:46:19.626: 1961: debug : virEventPollUpdateHandle:144 : Update handle w=31
> e=15
> 17:46:19.626: 1961: debug : virEventPollInterruptLocked:690 : Interrupting
> ...
> 17:46:27.300: 1961: debug : virEventPollInterruptLocked:690 : Interrupting
> 17:46:27.300: 1961: debug : virJSONValueFromString:933 : string={"return": {}}
> 
> 
> Moreover, notice (it is not shown here, but in attached logs) when qemu outputs
> on stdout we pick output immediately.


libvirt-0.9.1-1.el6.x86_64

Comment 7 Michal Privoznik 2011-06-08 09:19:38 UTC
Well, I can't see any problem in the code, but I might be wrong. What I still think of is qemu might be slow. I mean - it can, under some circumstances, take long qemu to write output and quit, which result in slow libvirt. So Haim, would you mind reproduce this, and when you reach the point, where everything is so lazy, just run qemu manually:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root /usr/libexec/qemu-kvm -device ? -device pci-assign,? -device virtio-blk-pci,? -device virtio-net-pci,?

This write something on stderr. If this command runs more than, say 5 seconds, I believe this is qemu issue. Thanks.


Note You need to log in before you can comment on or make changes to this bug.