Description of problem:
Running a "make -j25" upstream kernel compilation takes about a factor of 10 more time in a RHEL-7 guest than on its Fedora-19 host on an Intel hexacore i7-3930K. Running 12 processes with busy loops (i.e. no IO) saturates the guest's VCPUs but not the host's cores, which are still idling at ~50%.

Version-Release number of selected component (if applicable):
qemu-kvm-1.4.2-12.fc19.x86_64

How reproducible:
Create a RHEL-7 guest on an Intel hexacore with 12 VCPUs pinned to 12 hyperthreads (mapped to 6 cores in one socket), 16G memory for the guest (32G physical RAM on the host), with a qcow2 backing file as virtio system disk for the guest.

Steps to Reproduce:
1. Make the upstream kernel source tree accessible to the guest (NFS or copy to /tmp, doesn't matter in my case!)
2. In the kernel source topdir: make distclean; make localmodconfig; make -j 25
3. Check with top in guest and host

Actual results:
Build is slow, with the guest idling at ~80% and the host at ~77%.

Expected results:
Guest VCPUs _and_ host cores all saturated. Build only some one-digit percentage slower than on the host.

Additional info:
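The pinning described above can be sketched with libvirt, assuming a libvirt-managed guest with the hypothetical domain name "rhel7" and a 1:1 mapping of VCPU n to host hyperthread n (the actual CPU numbering depends on the host topology):

```shell
# Pin each of the 12 VCPUs to the matching host CPU (hyperthread).
# The 1:1 mapping and the domain name "rhel7" are assumptions; pick the
# 12 hyperthreads that belong to the 6 cores of one socket on the real host.
for v in $(seq 0 11); do
    virsh vcpupin rhel7 "$v" "$v"
done
```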
There was a lot of investigation on the original BZ1002621. Can you update this one with the result? After all, we found that the original bug report is misleading (the reason for the tmpfs compile result is a ccache misconfiguration) and the network is the real bottleneck, but you copied the original bug report verbatim.
(In reply to Gleb Natapov from comment #1)
> There was a lot of investigation on original BZ1002621. Can you update this
> one with the result? After all we found that original bug report is
> misleading (the reason for tmpfs compile is ccache misconfiguration) and
> network is the real bottleneck, but you copied original bug report verbatim.

https://bugzilla.redhat.com/show_bug.cgi?id=1002621#c44 shows this best:

Evidence is that networking constraints are the core of the issue. I disabled ccache completely in a test series:

1. on the host, with the kernel build tree on local ext4
2. build tree shared via NFS locally (i.e. NFS client and server both on the host)
3. build tree shared via NFS to the VM (NFS server on the host as before)

The NFSv4 mount options have been the same in cases 2 and 3.

Results of a "make clean; make -j 25":

1. 2m8s
2. 2m38s
3. 6m39s

BTW: putting the workset in tmpfs on the VM: 2m24s (good vs. 1 above)

So virtio networking (with 64K jumbo frames) seems to be the bottleneck, with a factor of _10_ difference in ops as per nfsiostat in case 2 compared to 3. How can the virtio bottleneck be analyzed further and eased?
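The timing ratios above can be checked with a small shell helper (a sketch; the numbers are the wall-clock times reported in this comment):

```shell
# Convert a "XmYs" wall-clock time to seconds.
to_s() { m=${1%%m*}; s=${1#*m}; s=${s%s}; echo $((m * 60 + s)); }

local_ext4=$(to_s 2m8s)    # case 1: build tree on local ext4 (host)
nfs_on_host=$(to_s 2m38s)  # case 2: NFS loopback on the host
nfs_in_vm=$(to_s 6m39s)    # case 3: NFS mount inside the VM

# Slowdown of the in-VM NFS build relative to the loopback NFS build:
awk "BEGIN { printf \"%.1fx\\n\", $nfs_in_vm / $nfs_on_host }"
```

This works out to about 2.5x in wall-clock time; note the factor 10 cited above is in NFS ops per nfsiostat, not elapsed build time.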
Postponed to 7.1, as this is not about a regression, and we are too late.
Moving to 7.3. I don't think I'll get to it in 7.2 timeframe.
Here are the latest RHEL7.3 results and test steps.

Host:
qemu-kvm-rhev-2.6.0-10.el7.x86_64
kernel-3.10.0-461.el7.x86_64
MemTotal: 32727644 kB
CPU(s): 64
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz

Guest:
kernel-3.10.0-461.el7.x86_64
16G memory
16 VCPUs

Steps:
1. Boot a RHEL7.3 guest:
/usr/libexec/qemu-kvm -machine accel=kvm -name RHEL-7.3 -S -machine pc,accel=kvm,usb=off -cpu SandyBridge -m 16384 -smp 16,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 3bf7f3da-8b6f-eb49-6086-f63952b29ed1 -no-user-config -nodefaults -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/RHEL-Server-7.3-64-virtio.raw,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:19:af:71,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc :0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -monitor stdio
2. Git clone the RHEL7.3 kernel to /home in the guest:
git clone git://git.app.eng.bos.redhat.com/rhel7.git
git checkout kernel-3.10.0-461.el7
3. Copy /boot/config-3.10.0-461.el7.x86_64 from the host to the kernel source code dir
4. make distclean
time make -j 16/64

Results:

time make -j 16 in guest and host respectively:
----------------------------------------------
      | time   | guest cpu idle | host cpu idle
----------------------------------------------
guest | 10m34s | ~1%            | ~75%
host  | 8m45s  | x              | ~71%

time make -j 64 in guest and host respectively:
----------------------------------------------
      | time   | guest cpu idle | host cpu idle
----------------------------------------------
guest | 11m13s | ~0%            | ~75%
host  | 4m21s  | x              | ~0%
The issue was that an NFS-mounted source tree was slow to build. The original bug describes a host system acting as an NFS server to the guest running on it. The guest then mounts the source tree and builds it. A suggested config (the one I used to reproduce) was a private bridge with a jumbo MTU configured on it as well as on the tap devices and virtio devices. What I just thought of, and didn't consider before, was whether NFSv4 or v3 was used and whether TCP or UDP was the transport. -vlad
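The jumbo-MTU setup and the NFS version/transport question can be checked along these lines (a sketch; the interface names br0, tap0 and eth0 are assumptions, adjust to the actual config):

```shell
# On the host: the bridge and the tap device must both carry the jumbo MTU,
# or packets get fragmented along the path and the gain is lost.
ip link set dev br0 mtu 9000
ip link set dev tap0 mtu 9000

# In the guest: raise the virtio NIC's MTU to match.
ip link set dev eth0 mtu 9000

# On the NFS client (the guest): show what each mount actually negotiated --
# look for vers= (3 vs 4.x) and proto= (tcp vs udp) in the output.
nfsstat -m
```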