Bug 970179 - After hypervisor upgrade, creation time of virtual machines is longer than expected.
Summary: After hypervisor upgrade, creation time of virtual machines is longer than expected.
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.2
Hardware: x86_64 Linux
Importance: urgent high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Nobody's working on this, feel free to take it
QA Contact:
URL:
Whiteboard: storage
Keywords:
Depends On:
Blocks:
 
Reported: 2013-06-03 15:40 UTC by Josh Carter
Modified: 2018-12-01 14:37 UTC (History)
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-13 18:35:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
spm-vdsm-log (12.07 MB, text/x-log)
2013-06-03 21:44 UTC, Josh Carter
rhevm-engine (3.62 MB, text/x-log)
2013-06-03 21:45 UTC, Josh Carter
iostat output (2.91 MB, text/plain)
2013-06-04 16:49 UTC, Marina
watcher script output (16.97 MB, application/x-gzip)
2013-06-05 14:51 UTC, Marina

Description Josh Carter 2013-06-03 15:40:34 UTC
Description of problem:

The customer upgraded the RHEV hypervisors from:
[root@up-rhevh-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130318.0.bz918541.el6_4)
[root@up-rhevh-2 ~]# rpm -qa|grep vdsm
vdsm-cli-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-4.10.2-1.8.el6_4.bz918541.x86_64
vdsm-hook-vhostmd-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-python-4.10.2-1.8.el6_4.bz918541.x86_64
vdsm-xmlrpc-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-reg-4.10.2-1.8.el6_4.bz918541.noarch
[root@up-rhevh-2 ~]# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.10.2-18.el6.x86_64
libvirt-client-0.10.2-18.el6.x86_64
libvirt-python-0.10.2-18.el6.x86_64
libvirt-cim-0.6.1-4.el6.x86_64
libvirt-0.10.2-18.el6.x86_64

to the current level:

[root@up-rhevh-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130501.0.el6_4)
[root@up-rhevh-01 ~]# rpm -qa|grep vdsm
vdsm-cli-4.10.2-1.13.el6ev.noarch
vdsm-4.10.2-1.13.el6ev.x86_64
vdsm-hook-vhostmd-4.10.2-1.13.el6ev.noarch
vdsm-xmlrpc-4.10.2-1.13.el6ev.noarch
vdsm-reg-4.10.2-1.13.el6ev.noarch
vdsm-python-4.10.2-1.13.el6ev.x86_64
[root@up-rhevh-01 ~]# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.10.2-18.el6_4.4.x86_64
libvirt-client-0.10.2-18.el6_4.4.x86_64
libvirt-python-0.10.2-18.el6_4.4.x86_64
libvirt-cim-0.6.1-4.el6.x86_64
libvirt-0.10.2-18.el6_4.4.x86_64

The customer upgraded the RHEV hypervisors to the levels posted in the case description and is now unable to deploy virtual machines in a timely manner. We ruled out CloudForms 2.0 as the cause by creating VMs directly against the REST API with curl, using the following script:

#!/bin/bash

NUM_OF_VMS=8

for i in `seq 1 $NUM_OF_VMS`;
do
  ID=$(uuidgen)
  echo $ID
  curl -X POST -H "Accept: application/xml" -H "Content-Type: application/xml" \
   -u admin@internal:redhat \
   --cacert /etc/pki/ovirt-engine/ca.pem \
   -d "<vm id='$ID'><name>start-test-$i</name><cluster><name>Default</name></cluster><template><name>vm1-test-template</name></template></vm>" \
    https://rhevdm31.rhev.gsslab.rdu.redhat.com/api/vms &
done

It took 11 minutes before the VMs were "startable" from the admin portal. The template I used is probably the worst-case test: it is the template we use for creating our database machines, and it provisions two 50GB disks with 1 core and 3GB of memory.

I ran a second test with a template that is more "standard" in our environment: one 50GB disk, 6GB of memory, and 1 core. That run took only 6 minutes.
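
For reference, a minimal sketch of how the "time until startable" can be measured automatically, by polling the REST API until a newly created VM leaves the image_locked state (the state name and the grep-based XML parsing are assumptions on my part; the engine URL and credentials are the same as in the creation script above):

#!/bin/bash
# Sketch: poll a newly created VM over the REST API until its disks are no
# longer locked (state != image_locked), and report how long that took.
# The state name "image_locked" and the XML parsing are assumptions.
VM_ID=$1                      # the uuid printed by the creation script
START=$(date +%s)
while true; do
  STATE=$(curl -s -H "Accept: application/xml" \
            -u admin@internal:redhat \
            --cacert /etc/pki/ovirt-engine/ca.pem \
            "https://rhevdm31.rhev.gsslab.rdu.redhat.com/api/vms/$VM_ID" \
          | grep -oP '(?<=<state>)[^<]+' | head -n 1)
  if [ -n "$STATE" ] && [ "$STATE" != "image_locked" ]; then
    echo "VM $VM_ID reached state '$STATE' after $(( $(date +%s) - START ))s"
    break
  fi
  sleep 10
done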

All of the virtual machine templates use thin provisioned disks, and after discussing this case with other RHEV engineers, the reported times are considered slow.

The customer has also stated that he is seeing a lot of events saying that VMs are not responding.



The customer is also seeing the following failing database connection requests, but after talking to Derrick, this appears to be resolved in the .50 release, which I am currently trying to get the customer to upgrade to.


2013-05-30 10:40:41,871 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-56) Timer update runtimeinfo failed. Exception:: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/ENGINEDataSource
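
If the connection pool is being exhausted during bulk VM creation, one quick check is to count open connections to the engine database while the creation script runs. A minimal sketch (the database name "engine" is an assumption and may differ per install):

# Sketch: count connections to the engine database every few seconds while
# the creation script is running (run on the RHEV-M host as root; the
# database name "engine" is an assumption).
while true; do
  su - postgres -c "psql -d engine -t -c \"SELECT now(), count(*) FROM pg_stat_activity WHERE datname = 'engine';\""
  sleep 5
done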



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Josh Carter 2013-06-03 21:44:18 UTC
Created attachment 756543 [details]
spm-vdsm-log

Comment 2 Josh Carter 2013-06-03 21:45:25 UTC
Created attachment 756544 [details]
rhevm-engine

Comment 4 Michal Skrivanek 2013-06-04 06:15:01 UTC
There seems to be a storage responsiveness issue; pending customer test results. Other than that, VM creation seems to work OK.
Moving to Storage.

Comment 5 Marina 2013-06-04 16:49:02 UTC
Created attachment 756885 [details]
iostat output

Comment 6 Marina 2013-06-04 16:51:54 UTC
Looking at iostat, we can see svctm gets high.
[mku@mku-ws]$ awk '($11 > 5) {print $0}; /util/ {print $0}' iostat.out
...  some output ....
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-380            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.53   5.34   0.00
dm-381            0.00     0.00    0.00    0.00     0.00     0.00     8.24     0.00   12.16  10.57   0.00
dm-382            0.00     0.00    6.70   10.65   127.03   135.52    30.27     0.32   18.65   5.51   9.56
dm-385            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00   11.13  10.95   0.00
dm-416            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.39   5.15   0.00
dm-417            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.20   5.04   0.00
dm-418            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.99   6.80   0.00
dm-419            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.32   6.05   0.00
dm-421            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.37   6.13   0.00
dm-422            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    7.17   6.74   0.00
dm-424            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.18   5.94   0.00
dm-425            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.29   5.10   0.00
dm-17             0.00     0.00    0.30    0.00     0.50     0.00     3.33     0.01   25.00  25.00   0.75
dm-20             0.00     0.00    0.80    0.00   200.20     0.00   500.00     0.03   35.38  12.75   1.02
.....
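
For reference, a small sketch of the same kind of filter run continuously, so that service-time spikes are timestamped and can be correlated with the VM creation runs (the 5 ms threshold and the column positions match the iostat -x output above):

#!/bin/bash
# Sketch: run extended iostat continuously and print only device-mapper
# devices whose service time (svctm, column 11 above) exceeds 5 ms, together
# with a timestamp, so spikes can be matched against the VM creation times.
iostat -xk 5 | awk '
  /^Device:/ { "date" | getline ts; close("date") }
  ($1 ~ /^dm-/ && $11 > 5) { print ts, $0 }'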

Comment 7 Marina 2013-06-05 13:34:08 UTC
After working with the Storage team, we confirmed this is not a storage issue.
Also, looking at the vdsm flow here: http://pastebin.test.redhat.com/145578 all of the lvcreate and qemu-img create commands appear to complete very quickly.

The question is: why does vdsm take so much time?
We asked the customer to run the script introduced in this article: https://access.redhat.com/site/articles/279063 while running the API create script, and we are waiting for his results.

We are now checking whether their DNS is set up correctly and whether vdsClient and sudo commands run fast enough outside of vdsm (see the sketch below).
https://access.redhat.com/site/solutions/35304
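
A rough sketch of those manual checks, timing the kind of external commands vdsm depends on when run by hand on the SPM host (the exact commands are illustrative, not the ones from the article):

# Sketch: time vdsm's API round trip and the external commands it shells out
# to, outside of vdsm, on the SPM host. The exact targets are illustrative.
time vdsClient -s 0 getVdsCaps > /dev/null   # vdsm XML-RPC responsiveness
time sudo lvs > /dev/null                    # LVM metadata scan
time sudo vgs > /dev/null
time host "$(hostname -f)"                   # forward DNS lookup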

Comment 8 Marina 2013-06-05 14:50:38 UTC
DNS confirmed to be working properly.
Attaching the script output from the performance tests.

Comment 9 Marina 2013-06-05 14:51:33 UTC
Created attachment 757251 [details]
watcher script output

Comment 10 Michal Skrivanek 2013-06-07 08:37:46 UTC
I don't see anything exceptional in the setup. Perhaps the memory consumption of vdsm is high; I don't know whether that is normal, but even if it isn't, the CPU usage is fine, so this does not appear to be an issue in the vdsm code.

Did you try running those storage-related external commands separately, and did they indeed return quickly enough?

Can we attach strace to the vdsm process and check whether it is blocked in some syscall?
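
A sketch of how that could be done (the pgrep pattern is an assumption; verify the vdsm daemon PID first):

# Sketch: attach strace to the running vdsm process (following threads) while
# the creation script runs; -tt timestamps each syscall and -T prints the time
# spent in it, so long blocking calls stand out. The pgrep pattern is a guess.
VDSM_PID=$(pgrep -f 'vdsm/vdsm' | head -n 1)
strace -f -tt -T -o /tmp/vdsm-strace.log -p "$VDSM_PID" &
STRACE_PID=$!
sleep 300                                    # trace for five minutes
kill "$STRACE_PID"
# List syscalls that took more than one second:
awk -F'<' '/</ && $NF + 0 > 1' /tmp/vdsm-strace.log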

Comment 12 Marina 2013-06-13 18:35:38 UTC
After upgrading to 3.2 GA, all the issues were resolved.
Closing the case.

Comment 13 Marina 2013-06-27 21:42:49 UTC
Unfortunately, the issue was not resolved.
However, it seems we have narrowed it down to BZ#979193, so I decided to leave this bug closed and open a new one.

