
Bug 1373075

Summary: [CGROUP]STEAL time doesn't work on POWER
Product: Red Hat Enterprise Linux 7
Reporter: Min Deng <mdeng>
Component: qemu-kvm-rhev
Assignee: David Gibson <dgibson>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: knoel, qzhang, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-18 02:54:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Min Deng 2016-09-05 06:03:10 UTC
Description of problem:
STEAL time doesn't work on POWER
Version-Release number of selected component (if applicable):
kernel-3.10.0-495.el7.ppc64le
qemu-kvm-rhev-2.6.0-22.el7.ppc64le
RHEL7.3 - with kernel-3.10.0-495.el7.ppc64le
How reproducible:
5/5
Steps to Reproduce:
Settings
#mount -t cgroup -o cpuset cpuset /cgroup
#cd /cgroup
1. Create cgroups
# mkdir cpuset1
2. Set cpus/mems
# echo 0 > cpuset1/cpuset.cpus  [0 means host CPU 0]
# echo 0 > cpuset1/cpuset.mems  [0 means host NUMA node 0]
or
# echo 120 > cpuset1/cpuset.cpus
# echo 1 > cpuset1/cpuset.mems
My host:
available: 2 nodes (0-1)
node 0 cpus: 0 8 16 24 32 40 48 56
node 0 size: 131072 MB
node 0 free: 121412 MB
node 1 cpus: 64 72 80 88 96 104 112 120
node 1 size: 131072 MB
node 1 free: 126452 MB
node distances:
node   0   1 
  0:  10  40 
  1:  40  10 
3. Boot two guests
/usr/libexec/qemu-kvm -smp 1...
root       6252 41.2  0.6 2446592 1845056 pts/2 SLl+ 01:18   5:06 /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries -nodefaults -vga std -serial unix:/tmp/socket-mazhang,server,nowait -qmp tcp:0:2221,server,nowait -m 2G -smp 1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06,disable-legacy=off,disable-modern=on -drive id=drive_disk1,if=none,snapshot=off,aio=threads,file=/home/RHEL2.qcow2 -device scsi-hd,id=disk1,drive=drive_disk1,bootindex=0 -vnc :0 -rtc base=utc,clock=host -boot menu=on -enable-kvm -monitor stdio -device virtio-mouse-pci,id=mouse0 -device virtio-keyboard-pci,id=kbd0 -chardev pty,id=pty0
root       6274 43.1  0.6 2406528 1842496 pts/0 SLl+ 01:18   5:13 /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries -nodefaults -vga std -serial unix:/tmp/socket-mazhang,server,nowait -qmp tcp:0:6661,server,nowait -m 2G -smp 1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06,disable-legacy=off,disable-modern=on -drive id=drive_disk1,if=none,snapshot=off,aio=threads,file=/home/RHEL.qcow2 -device scsi-hd,id=disk1,drive=drive_disk1,bootindex=0 -vnc :1 -rtc base=utc,clock=host -boot menu=on -enable-kvm -monitor stdio -device virtio-mouse-pci,id=mouse0 -device virtio-keyboard-pci,id=kbd0 -chardev pty,id=pty0
4. Echo the two guests' PIDs to tasks
# echo xxx > cpuset1/tasks (including all threads)
5. Run stress in both guests.
e.g.: for((;;));do x=1;done
6. Check the steal time inside both guests
# top
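
The step that writes the guest PIDs to cpuset1/tasks notes that the threads must be included. A minimal sketch of one way to do that is below. The function name move_threads and its optional third argument are made up for illustration, and it assumes the cgroup v1 layout used above, where the tasks file takes one thread ID per write.

```shell
# move_threads PID TASKS_FILE [PROC_ROOT]
# Append every thread ID of process PID to TASKS_FILE, one write per thread.
# PROC_ROOT defaults to /proc; it is only a parameter so the function can be
# tried against a fake directory tree.
move_threads() {
    mt_pid=$1
    mt_tasks=$2
    mt_proc=${3:-/proc}
    for mt_dir in "$mt_proc/$mt_pid"/task/*; do
        # Skip the unexpanded glob if the process has already exited.
        [ -d "$mt_dir" ] || continue
        # The directory name under task/ is the thread ID.
        echo "${mt_dir##*/}" >> "$mt_tasks"
    done
}

# Usage on the host, with the qemu-kvm PIDs from the ps output:
# move_threads 6252 /cgroup/cpuset1/tasks
# move_threads 6274 /cgroup/cpuset1/tasks
```

On cgroup v1, writing the PID to cpuset1/cgroup.procs should also move the whole thread group in one step; treat that as an alternative to check rather than something this report used.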

Actual results:
st is 0% in both guests.

Expected results:
st should be about 50% in each of the two guests.
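
To get a number instead of reading the st column in top, the steal share can be computed from two samples of the aggregate "cpu" line in the guest's /proc/stat. This is a sketch, not part of the original test; the function name steal_pct is made up, and it assumes the usual field order (user nice system idle iowait irq softirq steal guest guest_nice).

```shell
# steal_pct LINE1 LINE2
# Print the steal percentage between two samples of the "cpu" line
# from /proc/stat.
steal_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, " ")
        split(b, y, " ")
        # Fields 2..9: user nice system idle iowait irq softirq steal.
        # guest and guest_nice are left out because they are already
        # counted in user and nice.
        for (i = 2; i <= 9; i++) total += y[i] - x[i]
        steal = y[9] - x[9]
        if (total <= 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * steal / total
    }'
}

# Usage inside a guest: sample twice, five seconds apart.
# first=$(grep '^cpu ' /proc/stat); sleep 5; second=$(grep '^cpu ' /proc/stat)
# steal_pct "$first" "$second"
```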

Additional info:
I checked with the x86 QE team; they cannot reproduce this on the x86 platform. If there are any issues, please let me know.

Comment 3 David Gibson 2016-11-18 02:54:56 UTC
I've discussed this with Paul Mackerras at IBM, and I believe we've determined the cause.  This is a side effect of the fact that, in normal operation, a POWER host can run more guest threads than host threads.  That's because hardware-level threads can be used in the guest, but not in the host, due to restrictions of the virtualization hardware.

More specifically, the dynamic multithreading code we include means that although both VMs are bound to the same host thread, they could actually run on different "subcores" of the host core.  When this is the case, it won't get accounted as stolen time (the two VMs may still affect each other's performance, but how much depends on whether the workloads on each are using the same functional units in the CPU, so it can't be measured as an amount of time).

The test case will need to be adjusted for Power.  There are two obvious ways to do this:

1) Disable dynamic multi-threading:

    echo 0 >/sys/module/kvm_hv/parameters/dynamic_mt_modes

With this executed before performing the test, the stolen time results should be as expected.
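
If the test host is shared, it may be worth saving the current value before writing 0 and restoring it afterwards. A small sketch follows; the function names and the optional path argument are made up for illustration, and only the sysfs path comes from the command above.

```shell
# disable_dynamic_mt [PARAM_FILE]
# Remember the current dynamic_mt_modes value, then write 0 to disable it.
disable_dynamic_mt() {
    dmt_file=${1:-/sys/module/kvm_hv/parameters/dynamic_mt_modes}
    dmt_saved=$(cat "$dmt_file") || return 1
    echo 0 > "$dmt_file"
}

# restore_dynamic_mt
# Write back the value remembered by disable_dynamic_mt.
restore_dynamic_mt() {
    [ -n "$dmt_saved" ] || return 1
    echo "$dmt_saved" > "$dmt_file"
}

# Usage on the host, as root:
# disable_dynamic_mt
# ... run the steal time test ...
# restore_dynamic_mt
```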

2) Increase each VM to 8 threads, and run 8 stress threads on each VM.

This ensures that each VM occupies a whole host core, so they can't be packed onto the same core at the same time.
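
For the second option, the single busy loop from the reproduction steps has to become eight. A sketch of a helper for that is below. The function names are made up, and it assumes the guest has already been started with 8 vCPUs online; how the guest is given those threads is not shown in this report.

```shell
# start_busy_loops N
# Start N background busy loops and remember their PIDs in busy_pids.
start_busy_loops() {
    busy_pids=""
    busy_i=0
    while [ "$busy_i" -lt "$1" ]; do
        # Each subshell spins forever, like the for((;;)) loop in the steps.
        ( while :; do :; done ) &
        busy_pids="$busy_pids $!"
        busy_i=$((busy_i + 1))
    done
}

# stop_busy_loops
# Kill the loops started by start_busy_loops.
stop_busy_loops() {
    # Word splitting of $busy_pids is intended here.
    kill $busy_pids 2>/dev/null
    busy_pids=""
}

# Usage inside each guest:
# start_busy_loops 8
# top            # watch the st column
# stop_busy_loops
```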