Bug 970179 - After hypervisor upgrade, creation time of virtual machines is longer than expected.
Summary: After hypervisor upgrade, creation time of virtual machines is longer than expected.
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.2
Hardware: x86_64 Linux
Importance: urgent high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Nobody's working on this, feel free to take it
QA Contact:
URL:
Whiteboard: storage
Keywords:
Depends On:
Blocks:
 
Reported: 2013-06-03 15:40 UTC by Josh Carter
Modified: 2018-12-01 14:37 UTC (History)
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-13 18:35:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
spm-vdsm-log (12.07 MB, text/x-log)
2013-06-03 21:44 UTC, Josh Carter
rhevm-engine (3.62 MB, text/x-log)
2013-06-03 21:45 UTC, Josh Carter
iostat output (2.91 MB, text/plain)
2013-06-04 16:49 UTC, Marina
watcher script output (16.97 MB, application/x-gzip)
2013-06-05 14:51 UTC, Marina

Description Josh Carter 2013-06-03 15:40:34 UTC
Description of problem:

The customer upgraded the RHEV hypervisors from:
[root@up-rhevh-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130318.0.bz918541.el6_4)
[root@up-rhevh-2 ~]# rpm -qa|grep vdsm
vdsm-cli-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-4.10.2-1.8.el6_4.bz918541.x86_64
vdsm-hook-vhostmd-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-python-4.10.2-1.8.el6_4.bz918541.x86_64
vdsm-xmlrpc-4.10.2-1.8.el6_4.bz918541.noarch
vdsm-reg-4.10.2-1.8.el6_4.bz918541.noarch
[root@up-rhevh-2 ~]# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.10.2-18.el6.x86_64
libvirt-client-0.10.2-18.el6.x86_64
libvirt-python-0.10.2-18.el6.x86_64
libvirt-cim-0.6.1-4.el6.x86_64
libvirt-0.10.2-18.el6.x86_64

to the current level:

[root@up-rhevh-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130501.0.el6_4)
[root@up-rhevh-01 ~]# rpm -qa|grep vdsm
vdsm-cli-4.10.2-1.13.el6ev.noarch
vdsm-4.10.2-1.13.el6ev.x86_64
vdsm-hook-vhostmd-4.10.2-1.13.el6ev.noarch
vdsm-xmlrpc-4.10.2-1.13.el6ev.noarch
vdsm-reg-4.10.2-1.13.el6ev.noarch
vdsm-python-4.10.2-1.13.el6ev.x86_64
[root@up-rhevh-01 ~]# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.10.2-18.el6_4.4.x86_64
libvirt-client-0.10.2-18.el6_4.4.x86_64
libvirt-python-0.10.2-18.el6_4.4.x86_64
libvirt-cim-0.6.1-4.el6.x86_64
libvirt-0.10.2-18.el6_4.4.x86_64

The customer upgraded the RHEV hypervisors to the levels posted in the case description and is now unable to deploy virtual machines in a timely manner. We ruled out CloudForms 2.0 as the cause by creating VMs directly against the REST API with curl, using the following script:

#!/bin/bash

NUM_OF_VMS=8

for i in `seq 1 $NUM_OF_VMS`;
do
  ID=$(uuidgen)
  echo $ID
  curl -X POST -H "Accept: application/xml" -H "Content-Type: application/xml" \
   -u admin@internal:redhat \
   --cacert /etc/pki/ovirt-engine/ca.pem \
   -d "<vm id='$ID'><name>start-test-$i</name><cluster><name>Default</name></cluster><template><name>vm1-test-template</name></template></vm>" \
    https://rhevdm31.rhev.gsslab.rdu.redhat.com/api/vms &
done

It took 11 minutes before the VMs were "startable" from the admin portal. The template I used is probably the worst-case test: it is the template we use for creating our database machines, and it provisions two 50GB disks with 1 core and 3GB of memory.

I ran a second test with a template that is more "standard" in our environment: one 50GB disk, 6GB of memory, and 1 core. That run took only 6 minutes.
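
For reference, a minimal sketch of how the "time until startable" can be measured automatically, by polling the REST API until a newly created VM leaves the image_locked state (the state name and the grep-based XML parsing are assumptions on my part; the engine URL and credentials are the same as in the creation script above):

#!/bin/bash
# Sketch: poll a newly created VM over the REST API until its disks are no
# longer locked (state != image_locked), and report how long that took.
# The state name "image_locked" and the XML parsing are assumptions.
VM_ID=$1                      # the uuid printed by the creation script
START=$(date +%s)
while true; do
  STATE=$(curl -s -H "Accept: application/xml" \
            -u admin@internal:redhat \
            --cacert /etc/pki/ovirt-engine/ca.pem \
            "https://rhevdm31.rhev.gsslab.rdu.redhat.com/api/vms/$VM_ID" \
          | grep -oP '(?<=<state>)[^<]+' | head -n 1)
  if [ -n "$STATE" ] && [ "$STATE" != "image_locked" ]; then
    echo "VM $VM_ID reached state '$STATE' after $(( $(date +%s) - START ))s"
    break
  fi
  sleep 10
done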

All of the virtual machine templates use thin provisioned disks, and after discussing this case with other RHEV engineers, the reported times are considered slow.

The customer has also stated that he is seeing a lot of events saying that VMs are not responding.



The customer is also seeing the following failing database connection requests, but after talking to Derrick, this appears to be resolved in the .50 release, which I am currently trying to get the customer to upgrade to.


2013-05-30 10:40:41,871 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-56) Timer update runtimeinfo failed. Exception:: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/ENGINEDataSource
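
If the connection pool is being exhausted during bulk VM creation, one quick check is to count open connections to the engine database while the creation script runs. A minimal sketch (the database name "engine" is an assumption and may differ per install):

# Sketch: count connections to the engine database every few seconds while
# the creation script is running (run on the RHEV-M host as root; the
# database name "engine" is an assumption).
while true; do
  su - postgres -c "psql -d engine -t -c \"SELECT now(), count(*) FROM pg_stat_activity WHERE datname = 'engine';\""
  sleep 5
done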



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Josh Carter 2013-06-03 21:44:18 UTC
Created attachment 756543 [details]
spm-vdsm-log

Comment 2 Josh Carter 2013-06-03 21:45:25 UTC
Created attachment 756544 [details]
rhevm-engine

Comment 4 Michal Skrivanek 2013-06-04 06:15:01 UTC
There seems to be a storage responsiveness issue; pending customer test results. Other than that, VM creation seems to work OK.
Moving to Storage.

Comment 5 Marina 2013-06-04 16:49:02 UTC
Created attachment 756885 [details]
iostat output

Comment 6 Marina 2013-06-04 16:51:54 UTC
Looking at iostat, we can see svctm gets high.
[mku@mku-ws]$ awk '($11 > 5) {print $0}; /util/ {print $0}' iostat.out
...  some output ....
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-380            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.53   5.34   0.00
dm-381            0.00     0.00    0.00    0.00     0.00     0.00     8.24     0.00   12.16  10.57   0.00
dm-382            0.00     0.00    6.70   10.65   127.03   135.52    30.27     0.32   18.65   5.51   9.56
dm-385            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00   11.13  10.95   0.00
dm-416            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.39   5.15   0.00
dm-417            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.20   5.04   0.00
dm-418            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.99   6.80   0.00
dm-419            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.32   6.05   0.00
dm-421            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.37   6.13   0.00
dm-422            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    7.17   6.74   0.00
dm-424            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    6.18   5.94   0.00
dm-425            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.29   5.10   0.00
dm-17             0.00     0.00    0.30    0.00     0.50     0.00     3.33     0.01   25.00  25.00   0.75
dm-20             0.00     0.00    0.80    0.00   200.20     0.00   500.00     0.03   35.38  12.75   1.02
.....
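
For reference, a small sketch of the same kind of filter run continuously, so that service-time spikes are timestamped and can be correlated with the VM creation runs (the 5 ms threshold and the column positions match the iostat -x output above):

#!/bin/bash
# Sketch: run extended iostat continuously and print only device-mapper
# devices whose service time (svctm, column 11 above) exceeds 5 ms, together
# with a timestamp, so spikes can be matched against the VM creation times.
iostat -xk 5 | awk '
  /^Device:/ { "date" | getline ts; close("date") }
  ($1 ~ /^dm-/ && $11 > 5) { print ts, $0 }'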

Comment 7 Marina 2013-06-05 13:34:08 UTC
After working with the Storage team, we confirmed this is not a storage issue.
Also, looking at the vdsm flow here: http://pastebin.test.redhat.com/145578 all of the lvcreate and qemu-img create commands appear to complete very quickly.

The question is: why does vdsm take so much time?
We asked the customer to run the script introduced in this article: https://access.redhat.com/site/articles/279063 while running the API create script, and we are waiting for his results.

We are now checking whether their DNS is set up correctly and whether vdsClient and sudo commands run fast enough outside of vdsm (see the sketch below).
https://access.redhat.com/site/solutions/35304
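
A rough sketch of those manual checks, timing the kind of external commands vdsm depends on when run by hand on the SPM host (the exact commands are illustrative, not the ones from the article):

# Sketch: time vdsm's API round trip and the external commands it shells out
# to, outside of vdsm, on the SPM host. The exact targets are illustrative.
time vdsClient -s 0 getVdsCaps > /dev/null   # vdsm XML-RPC responsiveness
time sudo lvs > /dev/null                    # LVM metadata scan
time sudo vgs > /dev/null
time host "$(hostname -f)"                   # forward DNS lookup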

Comment 8 Marina 2013-06-05 14:50:38 UTC
DNS confirmed to be working properly.
Attaching the script output from the performance tests.

Comment 9 Marina 2013-06-05 14:51:33 UTC
Created attachment 757251 [details]
watcher script output

Comment 10 Michal Skrivanek 2013-06-07 08:37:46 UTC
I don't see anything exceptional in the setup. Perhaps the memory consumption of vdsm is high; I don't know whether that is normal, but even if it isn't, the CPU usage is fine, so this does not appear to be an issue in the vdsm code.

Did you try running those storage-related external commands separately, and did they indeed return quickly enough?

Can we attach strace to the vdsm process and check whether it is blocked in some syscall?
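
A sketch of how that could be done (the pgrep pattern is an assumption; verify the vdsm daemon PID first):

# Sketch: attach strace to the running vdsm process (following threads) while
# the creation script runs; -tt timestamps each syscall and -T prints the time
# spent in it, so long blocking calls stand out. The pgrep pattern is a guess.
VDSM_PID=$(pgrep -f 'vdsm/vdsm' | head -n 1)
strace -f -tt -T -o /tmp/vdsm-strace.log -p "$VDSM_PID" &
STRACE_PID=$!
sleep 300                                    # trace for five minutes
kill "$STRACE_PID"
# List syscalls that took more than one second:
awk -F'<' '/</ && $NF + 0 > 1' /tmp/vdsm-strace.log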

Comment 12 Marina 2013-06-13 18:35:38 UTC
After upgrading to 3.2 GA, all the issues were resolved.
Closing the case.

Comment 13 Marina 2013-06-27 21:42:49 UTC
Unfortunately, the issue was not resolved.
However, it seems we have narrowed it down to BZ#979193, so I decided to leave this bug closed and open a new one.

