Description of problem:
I can observe that the vCPU thread of the guest run by run-tscdeadline-latency.sh is being interrupted by QEMU's I/O thread. This increases the latency measured when trying to find the best timer advancement value for the host profile.
The problem is that all QEMU's threads are pinned to the same isolated host CPU in run-tscdeadline-latency.sh:
chrt -f 1 taskset -c $1 $QEMU [...]
We need some way to move the vCPU thread to its own isolated CPU.
Version-Release number of selected component (if applicable): tuned-2.9.0-1.el7.noarch
I observed this issue on a system that already had many small issues affecting run-tscdeadline-latency.sh fixed; I'm not sure it will be easy to observe on an unfixed system where tscdeadline_latency.flat runs quickly. But the procedure would be:
1. Pick an isolated CPU in the host (e.g. CPU 1)
2. Change run-tscdeadline-latency.sh so that it runs only once (instead of going from 1000 to 7000 in increments of 500)
3. Start tracing sched_switch on that CPU:
# trace-cmd record -e sched_switch -M 2
4. Run the script:
# ./run-tscdeadline-latency.sh 1
5. When run-tscdeadline-latency.sh finishes, run:
# trace-cmd report | grep qemu
When the issue triggers, you will see two qemu PIDs preempting each other. If the problem doesn't show up, try a few more times, or change tscdeadline_latency.flat to run for a longer duration (then I'm sure it will show up).
I don't know the best way to fix this. As a workaround, I wrote a Python script that gets the vCPU thread ID from QEMU via QMP and moves it to another isolated core. But this seems overly complex, and QEMU has to be started with a QMP socket in paused mode (so that the measurement doesn't start before the vCPU thread is moved).
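For reference, a minimal sketch of that workaround could look like the following (the socket path, CPU number, and helper names are illustrative, and it assumes QEMU was started paused with a QMP UNIX socket, e.g. `-S -qmp unix:/path/to/qmp.sock,server,nowait`). It performs the QMP handshake, reads the vCPU thread IDs from `query-cpus-fast`, pins each one with `sched_setaffinity`, and only then resumes the guest:

```python
import json
import os
import socket
import sys

def vcpu_tids(reply):
    """Extract vCPU thread IDs from a query-cpus-fast QMP reply."""
    return [cpu["thread-id"] for cpu in reply["return"]]

def qmp_command(sock_file, cmd):
    """Send one argument-less QMP command, return its reply (skipping async events)."""
    sock_file.write(json.dumps({"execute": cmd}) + "\n")
    sock_file.flush()
    while True:
        reply = json.loads(sock_file.readline())
        if "return" in reply or "error" in reply:
            return reply

def pin_vcpus(qmp_path, target_cpu):
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(qmp_path)
    f = sock.makefile("rw")
    f.readline()                           # consume the QMP greeting banner
    qmp_command(f, "qmp_capabilities")     # leave capabilities-negotiation mode
    for tid in vcpu_tids(qmp_command(f, "query-cpus-fast")):
        os.sched_setaffinity(tid, {target_cpu})
    qmp_command(f, "cont")                 # resume the guest paused by -S
    sock.close()

if __name__ == "__main__":
    pin_vcpus(sys.argv[1], int(sys.argv[2]))
```

Usage would be something like `python pin_vcpus.py /path/to/qmp.sock 2`, run after QEMU has been started with `-S` so the guest cannot begin measuring before the pinning takes effect.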
Two other ideas that may be much simpler:
1. Check whether it's possible to tell from /proc if a QEMU thread is a vCPU thread. This removes the need for the Python script, but QEMU still has to be started with -S
2. Carry a simple xml definition in the realtime-virtual-host profile and define the guest in libvirt on the spot, as libvirt supports everything we need
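Regarding idea 1: QEMU does name its threads, and on recent versions vCPU threads show up with a comm like "CPU 0/KVM" (the exact naming is version-dependent, so treat the pattern below as an assumption). A sketch that scans `/proc/<pid>/task/*/comm` might look like:

```python
import os
import re

# Assumed naming convention: recent QEMU names vCPU threads "CPU <n>/KVM".
VCPU_COMM = re.compile(r"^CPU \d+/KVM$")

def is_vcpu_comm(name):
    """Return True if a thread comm string matches the assumed vCPU pattern."""
    return bool(VCPU_COMM.match(name.strip()))

def find_vcpu_tids(qemu_pid):
    """Return TIDs of qemu_pid's threads whose comm matches the vCPU pattern."""
    task_dir = "/proc/%d/task" % qemu_pid
    tids = []
    for tid in os.listdir(task_dir):
        with open(os.path.join(task_dir, tid, "comm")) as f:
            if is_vcpu_comm(f.read()):
                tids.append(int(tid))
    return sorted(tids)
```

The drawback is exactly that this relies on an undocumented thread-naming convention rather than an official interface, which is the objection raised below.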
(In reply to Luiz Capitulino from comment #2)
> I don't know what's the best way to fix this. As a workaround, I wrote a
> python script that gets the vCPU thread id from QEMU via QMP and move it to
> another isolated core. But this seems overly complex, and QEMU has to be
> started with a QMP socket in paused mode (so that it doesn't start the
> measurement before the vCPU thread is moved).
> Another two other ideas that may be much simpler:
> 1. Check if it's possible to tell if a qemu thread is a vCPU thread from
> /proc. This removes the need of having the Python script, but qemu still has
> to be started with -S
Better to use the official interface (QMP).
> 2. Carry a simple xml definition in the realtime-virtual-host profile and
> define the guest in libvirt on the spot, as libvirt supports everything we need
That's much more complex.
To me, your workaround is the proper fix (I would fix it the same way).
OK, I'll polish my script, integrate it into the profile, and post it. I'll do this soon (i.e. not right now).
Patch posted upstream, reassigning to Jaroslav for the RHEL tuned integration.
*** Bug 1679007 has been marked as a duplicate of this bug. ***
*** Bug 1670275 has been marked as a duplicate of this bug. ***
We need this fix for 7.6.z.
In the latest 7.6.z testing, the command below did not return successfully; it hung. After applying the patch from this bug, tuned works well.
# tuned-adm profile realtime-virtual-host
As Luiz suggested, I added the "7.6.z ?" flag here.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.