Bug 2143840 - prio_workers in virtqemud.conf does not take effect
Summary: prio_workers in virtqemud.conf does not take effect
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: yafu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-11-18 03:37 UTC by yafu
Modified: 2023-05-09 08:09 UTC
CC List: 4 users

Fixed In Version: libvirt-8.10.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:27:15 UTC
Type: Bug
Target Upstream Version: 8.10.0
Embargoed:


Attachments: none

Links
System                   ID               Last Updated
Red Hat Issue Tracker    RHELPLAN-139880  2022-11-18 03:43:17 UTC
Red Hat Product Errata   RHBA-2023:2171   2023-05-09 07:27:30 UTC

Description yafu 2022-11-18 03:37:57 UTC
Description of problem:
prio_workers in virtqemud.conf does not take effect

Version-Release number of selected component (if applicable):
libvirt-8.9.0-2.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Edit virtqemud.conf with the following settings and restart virtqemud:
#cat /etc/libvirt/virtqemud.conf
max_clients = 500
min_workers = 2
max_workers = 4
prio_workers = 5

#systemctl restart virtqemud

2. Start the guest:
#virsh start avocado-vt-vm1

3. Open 4 terminals and, in each one, switch to a non-root user and connect to 'qemu:///system':
#su - test
$ virsh -c qemu:///system
==== AUTHENTICATING FOR org.libvirt.unix.manage ====
System policy prevents management of local virtualized systems
Authenticating as: root
Password: 

4. Destroy the guest in another terminal:
#virsh destroy avocado-vt-vm1


Actual results:
The destroy command in step 4 hangs until one of the connections to 'qemu:///system' exits after a timeout.

Expected results:
Destroying the guest should not hang, according to the description of 'prio_workers' in virtqemud.conf:
# The number of priority workers. If all workers from above
# pool are stuck, some calls marked as high priority
# (notably domainDestroy) can be executed in this pool.
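
For reference, the configured pool sizes can be checked at runtime with virt-admin's srv-threadpool-info command; a minimal sketch, assuming the admin URI 'virtqemud:///system' and the server name 'virtqemud' (the prioWorkers field of the output should then report the configured value):

#virt-admin -c virtqemud:///system srv-threadpool-info virtqemud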

Additional info:
1. A warning appears in virtlogd.log when executing step 4:
2022-11-18 03:13:47.227+0000: 1659694: warning : virNetServerClientDispatchRead:1269 : Client hit max requests limit 1. This may result in keep-alive timeouts. Consider tuning the max_client_requests server parameter
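
The max_client_requests parameter mentioned in the warning is a standard libvirt daemon server setting; as a hedged sketch (shown for virtqemud.conf, with an arbitrary example value; whether the daemon that emitted this particular warning exposes the same knob is not verified here), raising it would look like:

#cat /etc/libvirt/virtqemud.conf
max_client_requests = 10

#systemctl restart virtqemud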

Comment 1 Michal Privoznik 2022-11-23 08:47:30 UTC
The problem here is not the destroy command itself but how it's invoked, because 'virsh destroy' does a bit more than just destroying the domain: it also needs to open a connection, and opening a new connection consists of multiple RPC calls. Most of them are marked as high priority (meaning a priority worker can process them), except for one: REMOTE_PROC_CONNECT_REGISTER_CLOSE_CALLBACK. This is the point where virsh gets stuck, long before it's even able to call virDomainDestroy(). Therefore, if you started virsh interactively, destroy would work just fine. I need to investigate whether the aforementioned RPC call can be marked as high priority and what the implications would be.
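
To make the distinction above concrete (a hedged sketch, reusing the domain name from the reproducer): a one-shot invocation such as

#virsh destroy avocado-vt-vm1

must open a new connection first and therefore stalls on the non-priority CONNECT_REGISTER_CLOSE_CALLBACK call, whereas running

virsh # destroy avocado-vt-vm1

from an interactive session whose connection was already established (presumably before the regular workers became stuck) is handled by a priority worker and completes.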

Comment 2 Michal Privoznik 2022-11-23 09:38:34 UTC
Patch posted to the list:

https://listman.redhat.com/archives/libvir-list/2022-November/235887.html

Comment 3 Michal Privoznik 2022-11-23 11:19:00 UTC
Merged upstream as:

commit a6d3717e7f28bfb574f3122d9b99703a7d722033
Author:     Michal Prívozník <mprivozn>
AuthorDate: Wed Nov 23 09:50:29 2022 +0100
Commit:     Michal Prívozník <mprivozn>
CommitDate: Wed Nov 23 12:13:10 2022 +0100

    rpc: Mark close callback (un-)register as high priority
    
    Our RPC calls can be divided into two groups: regular and high
    priority. The latter can be then processed by so called high
    priority worker threads. This is our way of defeating a
    'deadlock' and allowing some RPCs to be processed even when all
    (regular) worker threads are stuck. For instance: if all regular
    worker threads get stuck when talking to QEMU on monitor, the
    virDomainDestroy() can be processed by a high priority worker
    thread(s) and thus unstuck those threads.
    
    Now, this is all fine, except if users want to use virsh
    non interactively:
    
      virsh destroy $dom
    
    This does a bit more - it needs to open a connection. And that
    consists of multiple RPC calls: AUTH_LIST,
    CONNECT_SUPPORTS_FEATURE, CONNECT_OPEN, and finally
    CONNECT_REGISTER_CLOSE_CALLBACK. All of them are marked as high
    priority except the last one. Therefore, virsh just sits there
    with a partially open connection.
    
    There's one requirement for high priority calls though: they can
    not get stuck. Hopefully, the reason is obvious by now. And
    looking into the server side implementation the
    CONNECT_REGISTER_CLOSE_CALLBACK processing can't ever get stuck.
    The only driver that implements the callback for public API is
    Parallels (vz). And that can't block really.
    
    And for virConnectUnregisterCloseCallback() it's the same story.
    
    Therefore, both can be marked as high priority.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143840
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>

v8.9.0-281-ga6d3717e7f
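
In practice, the priority of an RPC procedure is driven by an annotation on its entry in the protocol definition, which the code generator turns into the priority flag consulted when handing calls to the priority worker pool. A rough sketch of the kind of change the commit describes in src/remote/remote_protocol.x, not the verbatim patch (the procedure's existing annotations and numeric value are left elided):

    /**
     * ...existing annotations unchanged...
     * @priority: high
     */
    REMOTE_PROC_CONNECT_REGISTER_CLOSE_CALLBACK = ...,

and likewise for REMOTE_PROC_CONNECT_UNREGISTER_CLOSE_CALLBACK.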

Comment 4 yafu 2022-12-07 11:01:20 UTC
Test passed with libvirt-8.10.0-2.el9.x86_64.

Comment 7 yafu 2022-12-08 09:19:32 UTC
Verified with libvirt-8.10.0-2.el9.x86_64.

Test steps:
1. Edit virtqemud.conf with the following settings and restart virtqemud:
#cat /etc/libvirt/virtqemud.conf
max_clients = 500
min_workers = 2
max_workers = 4
prio_workers = 5

#systemctl restart virtqemud

2. Start the guest:
#virsh start avocado-vt-vm1

3. Open 4 terminals and, in each one, switch to a non-root user and connect to 'qemu:///system':
#su - test
$ virsh -c qemu:///system
==== AUTHENTICATING FOR org.libvirt.unix.manage ====
System policy prevents management of local virtualized systems
Authenticating as: root
Password: 

4. Destroy the guest in another terminal:
#virsh destroy avocado-vt-vm1
Domain 'avocado-vt-vm1' destroyed

Comment 9 errata-xmlrpc 2023-05-09 07:27:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

