Bug 1373075 - [CGROUP]STEAL time doesn't work on POWER
Summary: [CGROUP]STEAL time doesn't work on POWER
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: David Gibson
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-05 06:03 UTC by Min Deng
Modified: 2016-11-18 02:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-18 02:54:56 UTC
Target Upstream Version:



Description Min Deng 2016-09-05 06:03:10 UTC
Description of problem:
STEAL time doesn't work on POWER
Version-Release number of selected component (if applicable):
kernel-3.10.0-495.el7.ppc64le
qemu-kvm-rhev-2.6.0-22.el7.ppc64le
RHEL7.3 - with kernel-3.10.0-495.el7.ppc64le
How reproducible:
5/5
Steps to Reproduce:
Settings
#mount -t cgroup -o cpuset cpuset /cgroup
#cd /cgroup
1. Create cgroups
# mkdir cpuset1
2. set cpus/mems
# echo 0 > cpuset1/cpuset.cpus  [0 means host CPU 0]
# echo 0 > cpuset1/cpuset.mems  [0 means host NUMA node 0]
or
# echo 120 > cpuset1/cpuset.cpus    
# echo 1 > cpuset1/cpuset.mems
Host NUMA topology:
available: 2 nodes (0-1)
node 0 cpus: 0 8 16 24 32 40 48 56
node 0 size: 131072 MB
node 0 free: 121412 MB
node 1 cpus: 64 72 80 88 96 104 112 120
node 1 size: 131072 MB
node 1 free: 126452 MB
node distances:
node   0   1 
  0:  10  40 
  1:  40  10 
3. Boot two guests
/usr/libexec/qemu-kvm -smp 1...
root       6252 41.2  0.6 2446592 1845056 pts/2 SLl+ 01:18   5:06 /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries -nodefaults -vga std -serial unix:/tmp/socket-mazhang,server,nowait -qmp tcp:0:2221,server,nowait -m 2G -smp 1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06,disable-legacy=off,disable-modern=on -drive id=drive_disk1,if=none,snapshot=off,aio=threads,file=/home/RHEL2.qcow2 -device scsi-hd,id=disk1,drive=drive_disk1,bootindex=0 -vnc :0 -rtc base=utc,clock=host -boot menu=on -enable-kvm -monitor stdio -device virtio-mouse-pci,id=mouse0 -device virtio-keyboard-pci,id=kbd0 -chardev pty,id=pty0
root       6274 43.1  0.6 2406528 1842496 pts/0 SLl+ 01:18   5:13 /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries -nodefaults -vga std -serial unix:/tmp/socket-mazhang,server,nowait -qmp tcp:0:6661,server,nowait -m 2G -smp 1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06,disable-legacy=off,disable-modern=on -drive id=drive_disk1,if=none,snapshot=off,aio=threads,file=/home/RHEL.qcow2 -device scsi-hd,id=disk1,drive=drive_disk1,bootindex=0 -vnc :1 -rtc base=utc,clock=host -boot menu=on -enable-kvm -monitor stdio -device virtio-mouse-pci,id=mouse0 -device virtio-keyboard-pci,id=kbd0 -chardev pty,id=pty0
4. Write both guests' PIDs to the cpuset's tasks file
# echo xxx > cpuset1/tasks  [xxx is a guest's qemu-kvm PID; include its threads]
5. Run a CPU stress loop in both guests,
e.g. for((;;)); do x=1; done
6. Check the steal time inside both guests
#top
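The step that writes guest PIDs to cpuset1/tasks notes that the threads must be included. In cgroup v1, a write to the tasks file moves only the single thread whose ID is written, so the vCPU threads of qemu-kvm have to be moved one by one. A minimal sketch of that loop follows; the function name move_to_cpuset and the PROC_ROOT override are illustrative assumptions, not part of the original report.

```shell
#!/bin/bash
# Move every thread of a process into a cgroup v1 cpuset directory.
# move_to_cpuset and PROC_ROOT are hypothetical names used for this sketch;
# PROC_ROOT exists only so the function can be tried against a fake /proc.
move_to_cpuset() {
    local cgdir=$1 pid=$2 proc=${PROC_ROOT:-/proc} task
    for task in "$proc/$pid"/task/*; do
        [ -d "$task" ] || continue
        # One thread ID per write: each write to "tasks" moves one thread.
        echo "${task##*/}" > "$cgdir/tasks" || return 1
    done
}
```

On the host this would be called once per guest, for example move_to_cpuset /cgroup/cpuset1 6252 and move_to_cpuset /cgroup/cpuset1 6274 for the two qemu-kvm processes listed above.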

Actual results:
st is 0% in both guests.

Expected results:
st is about 50% in each guest.
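For a number that is easier to compare than the st column in top, the steal percentage can be computed inside a guest from two samples of the aggregate cpu line in /proc/stat. This is only a sketch: the function names steal_percent and sample_steal are assumptions made for illustration, and the result is truncated to an integer.

```shell
#!/bin/bash
# Steal percentage between two samples of the "cpu" line from /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal.
# (guest and guest_nice are already included in user and nice.)
steal_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total[NR] = 0
          for (i = 2; i <= 9; i++) total[NR] += $i
          steal[NR] = $9 }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print 0; exit }
            printf "%d\n", 100 * (steal[2] - steal[1]) / dt
        }'
}

# Take two samples "interval" seconds apart (default 5) and print the result.
sample_steal() {
    local interval=${1:-5} before after
    before=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    after=$(grep '^cpu ' /proc/stat)
    echo "steal: $(steal_percent "$before" "$after")%"
}
```

Running sample_steal 10 in each guest while the stress loop is active gives one figure per guest to compare against the expected value.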

Additional info:
I checked with the x86 team; they cannot reproduce this on the x86 platform. If anything is unclear, please let me know.

Comment 3 David Gibson 2016-11-18 02:54:56 UTC
I've discussed this with Paul Mackerras at IBM, and I believe we've determined the cause.  This is a side effect of the fact that, in normal operation, a POWER host can run more guest threads than host threads: hardware-level threads can be used in the guest, but not in the host, due to restrictions of the virtualization hardware.

More specifically, the dynamic multithreading code we include means that although both VMs are bound to the same host thread, they can actually run on different "subcores" of the host core.  When that happens, the time is not accounted as stolen time.  (The two VMs may still affect each other's performance, but how much depends on whether the workloads on each are using the same functional units in the CPU, so it can't be measured as an amount of time.)

The test case will need to be adjusted for POWER.  There are two obvious ways to do this:

1) Disable dynamic multi-threading:

    echo 0 >/sys/module/kvm_hv/parameters/dynamic_mt_modes

With this executed before performing the test, the stolen time results should be as expected.
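Since this setting changes host-wide behaviour, a test harness may want to restore the previous value afterwards. A rough sketch, assuming the parameter file is readable and writable as root; the wrapper name run_with_dynamic_mt_disabled and the PARAM variable are illustrative, not something from this report:

```shell
#!/bin/bash
# Path taken from the comment above; PARAM is overridable for dry runs.
PARAM=${PARAM:-/sys/module/kvm_hv/parameters/dynamic_mt_modes}

# Save the current value, write 0, run the given command, then restore.
# The command's exit status is returned to the caller.
run_with_dynamic_mt_disabled() {
    local saved rc
    saved=$(cat "$PARAM") || return 1
    echo 0 > "$PARAM" || return 1
    "$@"
    rc=$?
    echo "$saved" > "$PARAM"
    return "$rc"
}
```

The steal-time test itself would then be passed as the command, e.g. run_with_dynamic_mt_disabled ./steal_test.sh, where steal_test.sh is a placeholder for whatever script drives the guests.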

2) Increase each VM to 8 threads, and run 8 stress threads on each VM

This ensures that each VM occupies a whole host core, so they can't be packed onto the same core at the same time.
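For the second option, the single for((;;)) loop from the original steps has to become eight concurrent loops per guest. One possible way to start and later stop them from a bash prompt in the guest is sketched below; start_busy_loops is a made-up helper name, and the guest is assumed to have already been booted with 8 threads.

```shell
#!/bin/bash
# Start N busy loops in the background and print their PIDs, one per line.
# Output of the loops is redirected so that $(start_busy_loops N) returns
# immediately instead of waiting on the background processes.
start_busy_loops() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        ( while :; do :; done ) >/dev/null 2>&1 &
        echo "$!"
    done
}
```

In each guest: pids=$(start_busy_loops 8), check st in top, then kill $pids when finished.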

