Bug 2143840

Summary: prio_workers in virtqemud.conf does not take effect
Product: Red Hat Enterprise Linux 9
Component: libvirt (sub component: General)
Version: 9.2
Reporter: yafu <yafu>
Assignee: Michal Privoznik <mprivozn>
QA Contact: yafu <yafu>
CC: jdenemar, lmen, mprivozn, virt-maint
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Keywords: AutomationTriaged, Triaged, Upstream
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-8.10.0-1.el9
Target Upstream Version: 8.10.0
Type: Bug
Last Closed: 2023-05-09 07:27:15 UTC

Description yafu 2022-11-18 03:37:57 UTC
Description of problem:
prio_workers in virtqemud.conf does not take effect

Version-Release number of selected component (if applicable):
libvirt-8.9.0-2.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Edit virtqemud.conf with the following settings and restart virtqemud (a way to confirm the resulting thread-pool values is sketched after these steps):
#cat /etc/libvirt/virtqemud.conf
max_clients = 500
min_workers = 2
max_workers = 4
prio_workers = 5

#systemctl restart virtqemud

2. Start the guest:
#virsh start avocado-vt-vm1

3. Open four terminals; in each, switch to a non-root user and connect to 'qemu:///system', leaving the polkit password prompt unanswered so the connection stays pending:
#su - test
$ virsh -c qemu:///system
==== AUTHENTICATING FOR org.libvirt.unix.manage ====
System policy prevents management of local virtualized systems
Authenticating as: root
Password: 

4. Destroy the guest from another terminal:
#virsh destroy avocado-vt-vm1


Actual results:
The destroy command in step 4 hangs until one of the connections to 'qemu:///system' exits after a timeout.

Expected results:
Destroying the guest should not hang, according to the description of 'prio_workers' in the config file:
# The number of priority workers. If all workers from above
# pool are stuck, some calls marked as high priority
# (notably domainDestroy) can be executed in this pool.

Additional info:
1. A warning is logged in virtlogd.log when executing step 4:
2022-11-18 03:13:47.227+0000: 1659694: warning : virNetServerClientDispatchRead:1269 : Client hit max requests limit 1. This may result in keep-alive timeouts. Consider tuning the max_client_requests server parameter
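
The warning only points at the keep-alive symptom, not at the hang itself. The parameter it names can be raised if needed; a sketch against virtqemud.conf, where max_client_requests is a documented knob (whether virtlogd.conf, where this particular warning was logged, exposes the same parameter is not verified here, and raising it does not address the priority-worker problem analyzed below):

#cat /etc/libvirt/virtqemud.conf
max_client_requests = 10

#systemctl restart virtqemud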

Comment 1 Michal Privoznik 2022-11-23 08:47:30 UTC
The problem here is not the destroy command itself but how it's invoked, because 'virsh destroy' does a bit more than just destroying the domain. It also needs to open a connection, and opening a new connection consists of multiple RPC calls. Most of them are marked as high priority (meaning a priority worker can process them), except for one: REMOTE_PROC_CONNECT_REGISTER_CLOSE_CALLBACK. This is the point where virsh gets stuck (long before it's even able to call virDomainDestroy()). Therefore, if you started virsh interactively, destroy would work just fine. I need to investigate whether the aforementioned RPC call can be marked as high priority and what the implications would be.
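
A concrete sketch of that observation (assuming the interactive session connects before the regular workers become saturated by the pending connections from step 3):

#virsh
virsh # list --all

Keep this session open, reproduce step 3 so that all regular workers are stuck, and then:

virsh # destroy avocado-vt-vm1
Domain 'avocado-vt-vm1' destroyed

Only the destroy RPC is issued at that point, and since it is marked as high priority it can be served by a priority worker even though every regular worker is blocked.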

Comment 2 Michal Privoznik 2022-11-23 09:38:34 UTC
Patch posted onto the list:

https://listman.redhat.com/archives/libvir-list/2022-November/235887.html

Comment 3 Michal Privoznik 2022-11-23 11:19:00 UTC
Merged upstream as:

commit a6d3717e7f28bfb574f3122d9b99703a7d722033
Author:     Michal Prívozník <mprivozn>
AuthorDate: Wed Nov 23 09:50:29 2022 +0100
Commit:     Michal Prívozník <mprivozn>
CommitDate: Wed Nov 23 12:13:10 2022 +0100

    rpc: Mark close callback (un-)register as high priority
    
    Our RPC calls can be divided into two groups: regular and high
    priority. The latter can be then processed by so called high
    priority worker threads. This is our way of defeating a
    'deadlock' and allowing some RPCs to be processed even when all
    (regular) worker threads are stuck. For instance: if all regular
    worker threads get stuck when talking to QEMU on monitor, the
    virDomainDestroy() can be processed by a high priority worker
    thread(s) and thus unstuck those threads.
    
    Now, this is all fine, except if users want to use virsh
    non interactively:
    
      virsh destroy $dom
    
    This does a bit more - it needs to open a connection. And that
    consists of multiple RPC calls: AUTH_LIST,
    CONNECT_SUPPORTS_FEATURE, CONNECT_OPEN, and finally
    CONNECT_REGISTER_CLOSE_CALLBACK. All of them are marked as high
    priority except the last one. Therefore, virsh just sits there
    with a partially open connection.
    
    There's one requirement for high priority calls though: they can
    not get stuck. Hopefully, the reason is obvious by now. And
    looking into the server side implementation the
    CONNECT_REGISTER_CLOSE_CALLBACK processing can't ever get stuck.
    The only driver that implements the callback for public API is
    Parallels (vz). And that can't block really.
    
    And for virConnectUnregisterCloseCallback() it's the same story.
    
    Therefore, both can be marked as high priority.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143840
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>

v8.9.0-281-ga6d3717e7f
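
For reference, the per-procedure priority marking lives in annotation comments in the RPC protocol definition (src/remote/remote_protocol.x in the upstream tree), which the dispatch generator reads; in essence the fix adds '@priority: high' to the annotations of REMOTE_PROC_CONNECT_REGISTER_CLOSE_CALLBACK and REMOTE_PROC_CONNECT_UNREGISTER_CLOSE_CALLBACK (paraphrasing the commit's intent; the exact hunks are best checked in the diff). With an upstream checkout, the change can be inspected with:

#git show a6d3717e7f -- src/remote/remote_protocol.x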

Comment 4 yafu 2022-12-07 11:01:20 UTC
Test passed with libvirt-8.10.0-2.el9.x86_64.

Comment 7 yafu 2022-12-08 09:19:32 UTC
Verified with libvirt-8.10.0-2.el9.x86_64.

Test steps:
1. Edit virtqemud.conf with the following settings and restart virtqemud:
#cat /etc/libvirt/virtqemud.conf
max_clients = 500
min_workers = 2
max_workers = 4
prio_workers = 5

#systemctl restart virtqemud

2. Start the guest:
#virsh start avocado-vt-vm1

3. Open four terminals; in each, switch to a non-root user and connect to 'qemu:///system', leaving the polkit password prompt unanswered so the connection stays pending:
#su - test
$ virsh -c qemu:///system
==== AUTHENTICATING FOR org.libvirt.unix.manage ====
System policy prevents management of local virtualized systems
Authenticating as: root
Password: 

4. Destroy the guest from another terminal:
#virsh destroy avocado-vt-vm1
Domain 'avocado-vt-vm1' destroyed

Comment 9 errata-xmlrpc 2023-05-09 07:27:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171