Bug 1026808 - qemu-kvm slow RHEL-7 guest
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
7.0
x86_64 Linux
Priority: unspecified  Severity: high
: rc
: 7.0
Assigned To: Vlad Yasevich
Virtualization Bugs
: Reopened
Depends On: 1002621
Blocks:
Reported: 2013-11-05 08:30 EST by Heinz Mauelshagen
Modified: 2017-01-30 12:04 EST (History)
28 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1002621
Environment:
Last Closed: 2017-01-30 12:04:10 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Heinz Mauelshagen 2013-11-05 08:30:35 EST
Description of problem:
Running a "make -j25" upstream kernel compilation takes roughly 10x longer in a RHEL-7 guest than on its Fedora-19 host on an Intel hexa-core i7-3930K.
Running 12 busy-loop processes (i.e. no I/O) saturates the guest's VCPUs but
not the host's cores, which remain ~50% idle.

Version-Release number of selected component (if applicable):
qemu-kvm-1.4.2-12.fc19.x86_64

How reproducible:
Create a RHEL-7 guest on an Intel hexa-core with 12 VCPUs pinned to 12 hyperthreads (mapped to 6 cores in one socket) and 16G memory for the guest (32G physical RAM on the host), with a qcow2 backing file as the virtio system disk for the guest.

Steps to Reproduce:
1. Make the upstream kernel source tree accessible to the guest
   (NFS or copy to /tmp; doesn't matter in my case!)
2. In the kernel source top-level directory: make distclean; make localmodconfig; make -j 25
3. Check with top in the guest and the host
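(As an aside, the "-j 25" above matches the common heuristic of roughly twice the CPU count plus one, given the guest's 12 VCPUs. A quick sketch; the hard-coded count is just this report's guest:)

```shell
# Pick a parallel-make job count of ~2x the CPU count + 1.
ncpu=12                      # on a live system: ncpu=$(nproc)
jobs=$((2 * ncpu + 1))
echo "make -j $jobs"         # prints: make -j 25
```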

Actual results:
Build is slow, with the guest ~80% idle and the host ~77% idle.

Expected results:
Guest VCPUs _and_ host cores all saturated; the build only a single-digit percentage slower than on the host.

Additional info:
Comment 1 Gleb Natapov 2013-11-05 09:11:42 EST
There was a lot of investigation on original BZ1002621. Can you update this one with the result? After all we found that original bug report is misleading (the reason for tmpfs compile is ccache misconfiguration) and network is the real bottleneck, but you copied original bug report verbatim.
Comment 2 Heinz Mauelshagen 2013-11-05 09:37:45 EST
(In reply to Gleb Natapov from comment #1)
> There was a lot of investigation on original BZ1002621. Can you update this
> one with the result? After all we found that original bug report is
> misleading (the reason for tmpfs compile is ccache misconfiguration) and
> network is the real bottleneck, but you copied original bug report verbatim.


https://bugzilla.redhat.com/show_bug.cgi?id=1002621#c44 shows this best:

The evidence is that networking constraints are at the core of the issue:

I disabled ccache completely in a test series:

1. on the host, with the kernel build tree on local ext4
2. build tree shared via NFS, mounted locally (i.e. NFS client and server both on the host)
3. build tree shared via NFS, mounted in the VM (NFS server on the host as before)

The NFSv4 mount options were the same in cases 2 and 3.

Results of a "make clean ; make -j 25":

1. 2m8s
2. 2m38s
3. 6m39s

BTW: putting the working set in tmpfs on the VM: 2m24s (good vs. case 1 above)

So virtio networking (with 64K jumbo frames) seems to be the bottleneck, with a factor _10_ difference in ops (per nfsiostat) between cases 2 and 3.

How can the virtio bottleneck be analyzed further and eased?
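(Note the factor 10 is in NFS operation counts; in wall-clock time the slowdown works out smaller. A quick sketch from the timings above:)

```shell
# Build times from the three cases above, converted to seconds
t1=$((2*60 + 8))    # 1. host, local ext4:       2m8s  = 128s
t2=$((2*60 + 38))   # 2. host, NFS loopback:     2m38s = 158s
t3=$((6*60 + 39))   # 3. guest, NFS over virtio: 6m39s = 399s
awk -v a="$t1" -v b="$t2" -v c="$t3" 'BEGIN {
    printf "guest NFS vs host ext4: %.1fx\n", c/a   # 3.1x
    printf "guest NFS vs host NFS:  %.1fx\n", c/b   # 2.5x
}'
```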
Comment 3 Ronen Hod 2014-02-25 12:43:24 EST
Postponed to 7.1, as this is not about a regression, and we are too late.
Comment 6 Vlad Yasevich 2015-08-26 20:06:59 EDT
Moving to 7.3.  I don't think I'll get to it in the 7.2 timeframe.
Comment 13 Yanhui Ma 2016-07-05 06:01:35 EDT
Here are the latest RHEL 7.3 results and test steps:

Host:
qemu-kvm-rhev-2.6.0-10.el7.x86_64
kernel-3.10.0-461.el7.x86_64

MemTotal:       32727644 kB

CPU(s):                64
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             4
Vendor ID:             GenuineIntel
Model name:            Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz

guest:
kernel-3.10.0-461.el7.x86_64
16G memory
16 VCPUS

Steps:
1. Boot a RHEL 7.3 guest:
/usr/libexec/qemu-kvm -machine accel=kvm -name RHEL-7.3 -S \
    -machine pc,accel=kvm,usb=off -cpu SandyBridge -m 16384 \
    -smp 16,maxcpus=16,sockets=16,cores=1,threads=1 \
    -uuid 3bf7f3da-8b6f-eb49-6086-f63952b29ed1 \
    -no-user-config -nodefaults -rtc base=utc -no-shutdown \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -drive file=/home/RHEL-Server-7.3-64-virtio.raw,if=none,id=drive-virtio-disk0,format=raw \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:19:af:71,bus=pci.0,addr=0x3 \
    -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
    -vnc :0 -vga cirrus \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -monitor stdio

2. git clone the RHEL 7.3 kernel to /home in the guest:
git clone git://git.app.eng.bos.redhat.com/rhel7.git
git checkout kernel-3.10.0-461.el7

3. Copy /boot/config-3.10.0-461.el7.x86_64 from the host into the kernel source dir
4. make distclean
   time make -j 16   (and again with -j 64)

Results:
time make -j 16 in guest and host respectively

        | time   | guest cpu idle | host cpu idle
  ------+--------+----------------+--------------
  guest | 10m34s | ~1%            | ~75%
  host  | 8m45s  | -              | ~71%



time make -j 64 in guest and host respectively

        | time   | guest cpu idle | host cpu idle
  ------+--------+----------------+--------------
  guest | 11m13s | ~0%            | ~75%
  host  | 4m21s  | -              | ~0%
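(For reference, the guest-vs-host slowdown factors implied by those timings, as a quick sketch; seconds are computed from the m/s values above:)

```shell
# Guest and host build times in seconds, from the tables above
g16=$((10*60 + 34)); h16=$((8*60 + 45))   # -j16: guest 10m34s, host 8m45s
g64=$((11*60 + 13)); h64=$((4*60 + 21))   # -j64: guest 11m13s, host 4m21s
awk -v g="$g16" -v h="$h16" 'BEGIN { printf "-j16 guest/host: %.2fx\n", g/h }'  # 1.21x
awk -v g="$g64" -v h="$h64" 'BEGIN { printf "-j64 guest/host: %.2fx\n", g/h }'  # 2.58x
```

So the gap widens sharply once the host is given enough jobs to saturate all 64 threads, which the 16-VCPU guest cannot do.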
Comment 15 Vlad Yasevich 2016-07-06 13:01:17 EDT
The issue was that an NFS-mounted source tree was slow to build.

The original bug describes a host system acting as an NFS server to the guest running on it.  The guest then mounts the source tree and builds it.

A suggested config (the one I used to reproduce) was a private bridge with a jumbo MTU configured on it, as well as on the tap devices and virtio devices.

What I just thought of, and didn't consider before, is whether NFSv4 or v3 was used, and whether TCP or UDP was the transport.
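(One way to answer that: the vers= and proto= options recorded in /proc/mounts, or shown by `nfsstat -m`, give the negotiated NFS version and transport. The mount entry below is a hypothetical example; on a live system use `grep ' nfs' /proc/mounts` instead:)

```shell
# Sample /proc/mounts entry for an NFS mount (server and path are made up)
line='host:/export/src /mnt/src nfs4 rw,relatime,vers=4.0,proto=tcp,rsize=65536,wsize=65536 0 0'
# Extract the negotiated NFS version and transport from the options field
echo "$line" | grep -o 'vers=[^,]*\|proto=[^,]*'
# prints:
#   vers=4.0
#   proto=tcp
```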

-vlad
