Bug 970179 - After hypervisor upgrade creation time of virtual machines is longer than expected.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.2
Hardware: x86_64 Linux
Priority: urgent  Severity: high
Target Release: 3.2.0
Assigned To: Nobody's working on this, feel free to take it
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-06-03 11:40 EDT by Josh Carter
Modified: 2016-02-10 13:07 EST (History)
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-13 14:35:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
spm-vdsm-log (12.07 MB, text/x-log), 2013-06-03 17:44 EDT, Josh Carter
rhevm-engine (3.62 MB, text/x-log), 2013-06-03 17:45 EDT, Josh Carter
iostat output (2.91 MB, text/plain), 2013-06-04 12:49 EDT, Marina
watcher script output (16.97 MB, application/x-gzip), 2013-06-05 10:51 EDT, Marina

Description Josh Carter 2013-06-03 11:40:34 EDT
Description of problem:

The customer upgraded the RHEV hypervisors from:
[root@up-rhevh-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130318.0.bz918541.el6_4)
[root@up-rhevh-2 ~]# rpm -qa|grep vdsm
vdsm-cli-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-4.10.2-1.8.el6_4.bz918541.x86_64
vdsm-hook-vhostmd-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-python-4.10.2-1.8.el6_4.bz918541.x86_64
vdsm-xmlrpc-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-reg-4.10.2-1.8.el6_4.bz918541.noarch
[root@up-rhevh-2 ~]# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.10.2-18.el6.x86_64
libvirt-client-0.10.2-18.el6.x86_64
libvirt-python-0.10.2-18.el6.x86_64
libvirt-cim-0.6.1-4.el6.x86_64
libvirt-0.10.2-18.el6.x86_64

To the current level:

[root@up-rhevh-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130501.0.el6_4)
[root@up-rhevh-01 ~]# rpm -qa|grep vdsm
vdsm-cli-4.10.2-1.13.el6ev.noarch
vdsm-4.10.2-1.13.el6ev.x86_64
vdsm-hook-vhostmd-4.10.2-1.13.el6ev.noarch
vdsm-xmlrpc-4.10.2-1.13.el6ev.noarch
vdsm-reg-4.10.2-1.13.el6ev.noarch
vdsm-python-4.10.2-1.13.el6ev.x86_64
[root@up-rhevh-01 ~]# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.10.2-18.el6_4.4.x86_64
libvirt-client-0.10.2-18.el6_4.4.x86_64
libvirt-python-0.10.2-18.el6_4.4.x86_64
libvirt-cim-0.6.1-4.el6.x86_64
libvirt-0.10.2-18.el6_4.4.x86_64

The customer upgraded his RHEV hypervisors to the levels posted in the case description and is unable to deploy virtual machines in a timely manner. We ruled out CloudForms 2.0 by using the following script, which calls the REST API directly with curl to create the VMs.

#!/bin/bash
# Create NUM_OF_VMS VMs in parallel through the RHEV-M REST API,
# each one from the vm1-test-template template in the Default cluster.

NUM_OF_VMS=8

for i in $(seq 1 $NUM_OF_VMS); do
  ID=$(uuidgen)     # pre-generate the VM id so it can be tracked afterwards
  echo "$ID"
  curl -X POST -H "Accept: application/xml" -H "Content-Type: application/xml" \
   -u admin@internal:redhat \
   --cacert /etc/pki/ovirt-engine/ca.pem \
   -d "<vm id='$ID'><name>start-test-$i</name><cluster><name>Default</name></cluster><template><name>vm1-test-template</name></template></vm>" \
    https://rhevdm31.rhev.gsslab.rdu.redhat.com/api/vms &
done

It took 11 minutes before the VMs were "startable" from the admin portal. The template I used is probably the worst-case test: it is the template we use for creating our database machines, and it provisions two 50GB disks with 1 core and 3GB of memory.

I ran a second test with a template that is more "standard" in our environment. It has one 50GB disk, 6GB of memory and 1 core. That run took only 6 minutes.
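For reference, a rough way to watch the same thing from the API instead of the admin portal would be to poll one of the created VMs until its disks finish copying. This is only a sketch; the <state> element and the image_locked/down values are assumptions about what the 3.1 REST API XML returns, not confirmed field names.

# Sketch only: poll a created VM until its status leaves image_locked.
ID=<uuid-from-the-loop-above>     # paste one of the ids echoed by the creation script
while true; do
  STATE=$(curl -s -H "Accept: application/xml" \
       -u admin@internal:redhat \
       --cacert /etc/pki/ovirt-engine/ca.pem \
       "https://rhevdm31.rhev.gsslab.rdu.redhat.com/api/vms/$ID" \
       | grep -o '<state>[^<]*</state>')
  echo "$(date +%T) $STATE"                      # timestamped status line
  [ "$STATE" = "<state>down</state>" ] && break  # "down" = template copy done, VM is startable
  sleep 10
done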

All of the virtual machine templates use thin-provisioned disks, and after discussing this case with other RHEV engineers we agree the reported times are slow.

The customer has also stated that he is seeing a lot of events saying that VMs are not responding.



The customer is also seeing the following failing DB connection requests, but after talking to Derrick this appears to be resolved in the .50 release that I am currently trying to get the customer to upgrade to.


2013-05-30 10:40:41,871 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-56) Timer update runtimeinfo failed. Exception:: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/ENGINEDataSource



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Josh Carter 2013-06-03 17:44:18 EDT
Created attachment 756543 [details]
spm-vdsm-log
Comment 2 Josh Carter 2013-06-03 17:45:25 EDT
Created attachment 756544 [details]
rhevm-engine
Comment 4 Michal Skrivanek 2013-06-04 02:15:01 EDT
There seems to be some storage responsiveness issue; pending customer test results. Other than that, VM creation seems to work OK.
Moving to storage.
Comment 5 Marina 2013-06-04 12:49:02 EDT
Created attachment 756885 [details]
iostat output
Comment 6 Marina 2013-06-04 12:51:54 EDT
Looking at iostat, we can see that svctm (the average service time, in ms) gets high. The following awk filter prints the header lines plus every device line whose svctm (column 11) exceeds 5 ms:
[mku@mku-ws]$ awk '($11 > 5) {print $0}; /util/ {print $0}' iostat.out
...  some output ....
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-380            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.53   5.34   0.00
dm-381            0.00     0.00    0.00    0.00     0.00     0.00     8.24     0.00   12.16  10.57   0.00
dm-382            0.00     0.00    6.70   10.65   127.03   135.52    30.27     0.32   18.65   5.51   9.56
dm-385            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00   11.13  10.95   0.00
dm-416            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.39   5.15   0.00
dm-417            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.20   5.04   0.00
dm-418            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.99   6.80   0.00
dm-419            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.32   6.05   0.00
dm-421            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.37   6.13   0.00
dm-422            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    7.17   6.74   0.00
dm-424            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.18   5.94   0.00
dm-425            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.29   5.10   0.00
dm-17             0.00     0.00    0.30    0.00     0.50     0.00     3.33     0.01   25.00  25.00   0.75
dm-20             0.00     0.00    0.80    0.00   200.20     0.00   500.00     0.03   35.38  12.75   1.02
.....
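For reference, the dm-NNN kernel names above can be mapped back to the LV or multipath names on the hypervisor; a generic sketch, not taken from the customer's data:

# /dev/mapper entries are symlinks to their ../dm-N nodes, so a long listing
# shows which LV or multipath map each dm-N in the iostat output belongs to.
ls -l /dev/mapper/
# dmsetup prints the same mapping as name plus a (major, minor) pair;
# the minor number is the N in dm-N.
dmsetup ls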
Comment 7 Marina 2013-06-05 09:34:08 EDT
After working with the Storage team, we confirmed this is not a storage issue.
In addition, the vdsm flow captured at http://pastebin.test.redhat.com/145578 shows that all the lvcreate and qemu-img create commands themselves complete really fast.

The question is why vdsm itself takes so much time.
We asked the customer to run the script introduced in this article, https://access.redhat.com/site/articles/279063, while he runs the API create script, and we are waiting for his results.

We are now checking whether their DNS setup is configured correctly and whether vdsClient and sudo commands run fast enough when executed outside of vdsm.
https://access.redhat.com/site/solutions/35304
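As a sketch of the kind of checks meant above (the exact verbs and commands are only examples, not necessarily what the customer will run):

# Hypothetical timing checks on the hypervisor, run as root; slow results here
# would point at the host environment rather than at vdsm's own code.
time vdsClient -s 0 getVdsCaps > /dev/null        # local vdsm XML-RPC round trip
time sudo -n /sbin/lvs --noheadings > /dev/null   # LVM scan similar to what vdsm runs via sudo
time host "$(hostname -f)"                        # quick DNS lookup sanity check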
Comment 8 Marina 2013-06-05 10:50:38 EDT
DNS is confirmed to be working properly.
Attaching the script results from the performance tests.
Comment 9 Marina 2013-06-05 10:51:33 EDT
Created attachment 757251 [details]
watcher script output
Comment 10 Michal Skrivanek 2013-06-07 04:37:46 EDT
I don't see anything exceptional in the setup. Maybe the memory consumption of vdsm is high; I don't know if that's normal, but even if not, the CPU usage is fine, so there doesn't seem to be an issue in the vdsm code.

Did you try to run those storage-related external commands separately, and did they indeed return quickly enough?

Can we attach strace to the vdsm process and check whether it is blocked in some syscall?
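A minimal sketch of what that strace check could look like (the way of picking the vdsm PID and the 60-second window are assumptions):

# Attach to the main vdsm process and all of its threads, timestamping each
# syscall (-tt) and recording how long it took (-T).
strace -f -tt -T -o /tmp/vdsm-strace.out -p "$(pgrep -of vdsm)" &
STRACE_PID=$!
sleep 60                 # sample for about a minute while VMs are being created
kill "$STRACE_PID"       # detach strace from vdsm
# Syscalls with large <...> times in /tmp/vdsm-strace.out (blocking reads,
# flock/ioctl on storage paths, etc.) would show where vdsm is stuck.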
Comment 12 Marina 2013-06-13 14:35:38 EDT
After upgrading to 3.2 GA, all the issues were resolved.
Closing the case.
Comment 13 Marina 2013-06-27 17:42:49 EDT
Unfortunately, the issue was not resolved.
However, it seems we have narrowed it down to BZ#979193, so I decided to leave this one closed and open a new one.
