Bug 673070

Summary: Guest VM bad performance and softlocks on a 5.6 host compared to a 5.5 host
Product: Red Hat Enterprise Linux 5 Reporter: Dan Yasny <dyasny>
Component: kvm Assignee: Glauber Costa <gcosta>
Status: CLOSED WORKSFORME QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: urgent    
Version: 5.6 CC: acathrow, apevec, bcao, cpelland, dmair, iheim, jbrier, kwolf, mkenneth, ovirt-maint, plyons, roland.friedwagner, tburke, virt-maint
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-02-03 09:54:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 568128, 580948    
Attachments:
Description Flags
sosreport from VM none

Description Dan Yasny 2011-01-27 09:49:19 UTC
Created attachment 475572 [details]
sosreport from VM

Description of problem:
A RHEL5.5 guest running on a RHEL5.5 KVM host was performing fine until one of the hosts was upgraded to 5.6.
After that, the guest continuously produces soft lockup messages and is generally slow. Installation of new VMs is also much slower on 5.6 than on 5.5.
Tested with a single vCPU - same results.
No evidence of time drift is visible in the guest (running while /bin/true; do date; sleep 1; done  shows no deviation from the expected time flow)
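
The loop above relies on eyeballing the date output. A minimal sketch of a slightly more quantitative check, assuming a POSIX shell and GNU date in the guest. The function name elapsed_across_sleep is hypothetical and not part of the original report:

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): report how many
# wall-clock seconds the guest thinks have passed across a sleep of N seconds.
elapsed_across_sleep() {
    n=${1:-1}
    start=$(date +%s)     # seconds since the epoch before sleeping (GNU date)
    sleep "$n"
    end=$(date +%s)       # seconds since the epoch after sleeping
    echo $((end - start))
}

# Example: elapsed_across_sleep 60
# A result far from 60 would hint at a guest clock problem.
```

Like the original loop, this only checks the guest clock against the guest's own timers. Detecting real drift would need an external reference such as the host clock or an NTP server.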


Version-Release number of selected component (if applicable):
RHEL5.6 on host
RHEL5.5.z latest on VM


How reproducible:
Always at the customer site

Steps to Reproduce:
1. Run RHEV with two hosts in a cluster, one host on 5.5.z and the second on 5.6, migrate a Linux VM between the hosts, and compare performance
  
Actual results:
softlockups and slowness when VM is migrated to, or started on, a 5.6 host

Expected results:
same or better performance on 5.6, no softlocks

Additional info:
sosreports from VM attached
can this be a regression?
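
To compare the two hosts with a number rather than an impression, the soft lockup reports in the guest log can be counted. A minimal sketch, assuming the guest kernel log goes to /var/log/messages and that each report contains the phrase "soft lockup". The function name count_softlockups is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count soft lockup reports in a kernel log file.
# Assumption: the log lives in /var/log/messages unless a path is given,
# and every report line contains the text "soft lockup".
count_softlockups() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so the exit status is masked with "|| true".
    grep -c 'soft lockup' "${1:-/var/log/messages}" || true
}

# Example, run inside the guest after a test period on each host:
#   count_softlockups
```

Running it after the same workload on the 5.5.z host and on the 5.6 host would give two counts that can be compared directly.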

Comment 12 Mike Cao 2011-01-31 10:22:26 UTC
Tried the following scenarios; cannot reproduce.

Host A:
# uname -r
2.6.18-194.32.1.el5
# rpm -q kvm
kvm-83-164.el5_5.25
Host B:
# uname -r
2.6.18-237.el5
# rpm -q kvm
kvm-83-224.el5
Guest info:
rhel5.5 x86_64 guest.

steps:
1. Create VM on host A(rhel5.5.z).
2. Start VM on host A(rhel5.5.z).
eg :/usr/libexec/qemu-kvm -m 2G -smp 1 -name rhel5 -uuid
12bb419b-8730-cbbd-b0d9-168fa4225b6d -monitor stdio -boot c -drive
file=/opt/rhel5.225,if=ide,boot=on,format=raw,cache=none,werror=stop -net
nic,macaddr=44:52:00:0e:bf:a1,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0
-serial pty -parallel none -usb -vnc :1 -vga cirrus -balloon virtio
3. Start listening on a port on host B (rhel5.6).
4. Migrate from host A to host B

Actual Results:
Guest works fine, no soft lockup occurs in the guest
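
Steps 3 and 4 of the scenario above are not spelled out. A minimal sketch of what they usually look like with qemu-kvm, assuming TCP migration. The port number, the host name and the helper names incoming_arg and migrate_cmd are assumptions for illustration, not taken from the test run:

```shell
#!/bin/sh
# Hypothetical helpers that only print the pieces needed for a TCP migration.

# Step 3: on host B, the same qemu-kvm command line as on host A is started
# with one extra argument so that it waits for an incoming migration.
incoming_arg() {
    # $1 = TCP port to listen on (assumed value chosen by the tester)
    printf '%s\n' "-incoming tcp:0:$1"
}

# Step 4: on host A, this line is typed into the qemu monitor
# (the VM above was started with "-monitor stdio").
migrate_cmd() {
    # $1 = destination host name or address, $2 = port used in step 3
    printf 'migrate -d tcp:%s:%s\n' "$1" "$2"
}

# Example:
#   incoming_arg 5800          # append the output to the command on host B
#   migrate_cmd hostB 5800     # enter the output in the monitor on host A
```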


steps:
1. Create VM on host A(rhel5.5.z).
2. Start it on host B(rhel5.6).
eg:/usr/libexec/qemu-kvm -m 2G -smp 1 -name rhel5 -uuid
12bb419b-8730-cbbd-b0d9-168fa4225b6d -monitor stdio -boot c -drive
file=/opt/rhel5.225,if=ide,boot=on,format=raw,cache=none,werror=stop -net
nic,macaddr=44:52:00:0e:bf:a1,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0
-serial pty -parallel none -usb -vnc :1 -vga cirrus -balloon virtio

Actual Results:
Guest works fine, no soft lockup occurs in the guest


Additional info:
1. Updated host A's kvm version to kvm-83-164.el5_5.30 and repeated the steps above; the guest
works fine, no soft lockup occurs.

2.
cpuinfo of host A:
processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9600B Quad-Core Processor
stepping        : 3
cpu MHz         : 1200.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm
cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch
osvw
bogomips        : 4587.41
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]


 cpuinfo of host B:

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9600B Quad-Core Processor
stepping        : 3
cpu MHz         : 1200.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm
cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch
osvw
bogomips        : 4587.48
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

Comment 14 John Brier 2011-01-31 15:37:39 UTC
Mike, in comment #12 I noticed you never used kernel 2.6.18-194.26.1.el5 which is what the customer used in all cases. I know Alan said kernel doesn't seem to be related, but if you can maybe it would be worth retesting with that kernel.

Also the kvm version you used on Host A (kvm-83-164.el5_5.25) was a version the customer never reproduced the issue with. They did have the problem with kvm-83-164.el5_5.30, though

kvm-83-224.el5 also had the problem, but only with kernel 2.6.18-194.26.1.el5 and here you were using 2.6.18-237.el5.

Comment 15 Mike Cao 2011-01-31 15:53:47 UTC
(In reply to comment #14)
> Mike, in comment #12 I noticed you never used kernel 2.6.18-194.26.1.el5 which
> is what the customer used in all cases. I know Alan said kernel doesn't seem to
> be related, but if you can maybe it would be worth retesting with that kernel.
> 
will try it tomorrow.

> Also the kvm version you used on Host A (kvm-83-164.el5_5.25) was a version the
> customer never reproduced the issue with. They did have the problem with
> kvm-83-164.el5_5.30, though
> 

I tried host A with kvm-83-164.el5_5.30 ,referring to "additional info" in comment #12.


> kvm-83-224.el5 also had the problem, but only with kernel 2.6.18-194.26.1.el5
> and here you were using 2.6.18-237.el5.

Let me confirm whether I fully understand the issue,
Install a rhel5.5 x86_64 guest on host A(rhel5.5.z),
migrate or start it on host B(rhel5.6) host.
Is that right?
If so, do you mean I need to use a RHEL5.5 kernel (2.6.18-194.26.1.el5) and the RHEL5.6 kvm package on host B (rhel5.6)?

Comment 16 John Brier 2011-01-31 16:07:59 UTC
(In reply to comment #15)
> > Also the kvm version you used on Host A (kvm-83-164.el5_5.25) was a version the
> > customer never reproduced the issue with. They did have the problem with
> > kvm-83-164.el5_5.30, though
> > 
> 
> I tried host A with kvm-83-164.el5_5.30 ,referring to "additional info" in
> comment #12.

Oh, I missed that.

> > kvm-83-224.el5 also had the problem, but only with kernel 2.6.18-194.26.1.el5
> > and here you were using 2.6.18-237.el5.
> 
> Let me confirm whether I fully understand the issue,
> Install a rhel5.5 x86_64 guest on host A(rhel5.5.z),
> migrate or start it on host B(rhel5.6) host.
> Is that right?
> If so, do you mean I need to use a RHEL5.5 kernel (2.6.18-194.26.1.el5) and the
> RHEL5.6 kvm package on host B (rhel5.6)?

I don't think where you install the guest really matters. We just know that they had problems on one host with a certain combination of kvm/kernel/vdsm and not on another combination. The issue isn't really specific to migrating, they just used that to troubleshoot since they had no issues on the one host that was never upgraded, but did have issues on the host that *was* upgraded.

And yes, you are correct, RHEL 5.5 kernel and RHEL 5.6 kvm packages. See comment #4 also comment #7 adds some detail to the story.

Comment 17 Mike Cao 2011-02-01 05:34:16 UTC
(In reply to comment #16)

> And yes, you are correct, RHEL 5.5 kernel and RHEL 5.6 kvm packages. See
> comment #4 also comment #7 adds some detail to the story.

Tried the following configuration:
# uname -r
2.6.18-194.26.1.el5
# rpm -q kvm
kvm-83-224.el5

Actual Results:

Cannot load the kvm, ksm and kvm_amd modules on kernel 2.6.18-194.26.1.el5.
# insmod kvm.ko 
insmod: error inserting 'kvm.ko': -1 Unknown symbol in module
# insmod ksm.ko 
insmod: error inserting 'ksm.ko': -1 Unknown symbol in module
# insmod kvm-amd.ko 
insmod: error inserting 'kvm-amd.ko': -1 Unknown symbol in module

Without the kvm module, the guest will certainly perform very badly.
Updating the kernel to the rhel5.6 one should solve it.

bcao--->jbrier

John, could you check whether the modules exist after updating the kvm packages?
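
A minimal sketch of how this check could be done on the affected host, assuming the module list is read from /proc/modules (the same data lsmod shows). The function name loaded_kvm_modules is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the kvm-related modules that are currently loaded.
# Assumption: the input has the /proc/modules layout, where the first field
# of every line is the module name.
loaded_kvm_modules() {
    awk '$1 ~ /^(kvm|kvm_amd|kvm_intel|ksm)$/ { print $1 }' "${1:-/proc/modules}"
}

# Example on the host:
#   loaded_kvm_modules
# No output means none of the modules is loaded, in which case qemu-kvm
# cannot use hardware virtualization.
#
# After a failed insmod, the kernel log normally names the missing symbols:
#   dmesg | grep 'Unknown symbol'
```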

Comment 18 Mike Cao 2011-02-01 06:06:09 UTC
(In reply to comment #17)

> 
> bcao--->jbrier
> 
> John, could you check whether the modules exist after updating the kvm packages?

Hi, John.

I also tried the following scenario:
host A: 
#uname -r
2.6.18-194.26.1.el5
#rpm -q kvm
kvm-83-164.el5_5.30.x86_64

then updated kvm to kvm-83-224.el5 without rebooting the host.

This means the guest uses the kvm-83-224 qemu userspace with the kvm-83-164.el5 kvm modules still loaded in the kernel. I started a VM and the guest works well; no soft lockup occurs. (I don't think this is a scenario we support.)

If the host is rebooted,
no kvm, kvm-amd or ksm modules are loaded in the kernel, which is the condition I described in comment #17.
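
One way to see which kernel a given kvm module was built against is its vermagic string, as printed by modinfo. A minimal sketch, assuming the kernel release is the first field of that string. The function name built_for_kernel is hypothetical, and a mismatch is only a hint, not proof that the module will fail to load:

```shell
#!/bin/sh
# Hypothetical helper: extract the kernel release from a vermagic string
# such as the one printed by "modinfo -F vermagic <module.ko>".
built_for_kernel() {
    # The kernel release is the first whitespace-separated field.
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Example on the host, using the kvm.ko file from the insmod test:
#   built_for_kernel "$(modinfo -F vermagic kvm.ko)"
#   uname -r
# If the two releases differ, the module was built for another kernel,
# which would fit the "Unknown symbol" errors seen in comment #17.
```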