Bug 2222217 - virtiofsd stops responding after pausing and resuming VM
Summary: virtiofsd stops responding after pausing and resuming VM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: virtiofsd
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: German Maglione
QA Contact: xiagao
URL:
Whiteboard:
Depends On: 2222221
Blocks:
 
Reported: 2023-07-12 09:39 UTC by German Maglione
Modified: 2023-11-30 06:18 UTC (History)
4 users

Fixed In Version: virtiofsd-1.7.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:36:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments: none


Links
- Github rust-vmm vhost pull 154 (Merged): Fix return value of `GET_VRING_BASE` message - 2023-07-12 09:42:40 UTC
- Github rust-vmm vhost pull 161 (Merged): get_vring_base should not reset the queue - 2023-07-12 09:42:57 UTC
- Gitlab virtio-fs virtiofsd issue 110 (closed): virtio-fs stops responding after pausing and resuming VM - 2023-07-12 09:42:01 UTC
- Gitlab virtio-fs virtiofsd merge request 175 (merged): Upgrade rust-vmm dependencies - 2023-07-12 09:42:21 UTC
- Red Hat Issue Tracker RHELPLAN-162095 - 2023-07-12 09:41:24 UTC
- Red Hat Product Errata RHBA-2023:6522 - 2023-11-07 08:36:29 UTC

Description German Maglione 2023-07-12 09:39:53 UTC
Description of problem:

It was reported upstream that virtiofsd stops responding when pausing and resuming the VM:
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/110

This is a regression; it works in the C version of virtiofsd.

How reproducible: 100%

Steps to Reproduce:
1. Create a VM with virtiofs device
2. Boot the VM and mount virtiofs
3. virsh suspend vm
4. virsh resume vm
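
For reference, here are the steps expanded into concrete commands. This is only a sketch: the virtiofsd invocation and the qemu device options reuse the values shown later in this report (shared dir /home/test, socket /tmp/sock1, tag=myfs), while the libvirt domain name "vm1" and the guest mount point /mnt are placeholders.

# on the host: start virtiofsd
/usr/libexec/virtiofsd --shared-dir /home/test --socket-path /tmp/sock1 --log-level debug &

# qemu command-line fragment that attaches the device to the guest
-chardev socket,id=char_virtiofs_fs,path=/tmp/sock1 \
-device vhost-user-fs-pci,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs \

# inside the guest: mount the shared filesystem
mount -t virtiofs myfs /mnt

# on the host: pause and resume the guest
virsh suspend vm1
virsh resume vm1

# inside the guest: with the bug present, further access to the mount hangs
ls /mnt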

Actual results:
virtiofsd stops responding

Expected results:
virtiofsd should continue working

Additional info:
This is not a bug in virtiofsd itself; it was an error in one of our dependencies, the vhost-user-backend crate (rust-vmm):

When the VM is stopped, GET_VRING_BASE is issued, and when it is resumed, SET_VRING_BASE restores the retrieved value. Because GET_VRING_BASE was also resetting the state of the VQ, the device failed to resume operation. This is already fixed upstream:
- https://github.com/rust-vmm/vhost/pull/154
- https://github.com/rust-vmm/vhost/pull/161
- https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/175

Comment 1 xiagao 2023-07-13 02:52:10 UTC

The Windows driver also hits this issue.

Comment 2 xiagao 2023-07-13 04:49:45 UTC
I couldn't reproduce this issue in the following environment.

Guest: rhel9.3 5.14.0-333.el9.x86_64
host: rhel9.3  5.14.0-324.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64
virtio-win-prewhql-0.1-239
kernel-5.14.0-324.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-5.el9.noarch

Steps, as in comment 0:

1. Create a VM with virtiofs device
# /usr/libexec/virtiofsd --shared-dir /home/test --socket-path /tmp/sock1 --log-level debug

2. Boot the VM and mount virtiofs
 -chardev socket,id=char_virtiofs_fs,path=/tmp/sock1 \
 -device vhost-user-fs-pci,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs,bus=pcie-root-port-3,addr=0x0 \

3. stop and cont vm
(qemu) stop
(qemu) cont

4. check virtiofs in guest.

Results:
It works well; I can read and write in virtiofs inside the guest.
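A minimal guest-side read/write check, assuming the virtiofs tag is mounted at /mnt (adjust the path to your mount point):

# inside the guest
echo ok > /mnt/bz_check && cat /mnt/bz_check && rm /mnt/bz_check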


Comment 3 xiagao 2023-07-16 14:16:50 UTC
Hi German, could you help check the steps above when you're available? Thanks.

Comment 4 German Maglione 2023-07-17 14:57:38 UTC
(In reply to xiagao from comment #3)
> Hi German, could you help to check the steps above if you're available,
> thanks.

The steps are OK, but if you check the debug output of virtiofsd, you will see that
virtiofsd repeats the first operation in the VQ until it "catches up" with the
entries in the guest. In my tests below, the unique value should be incremented
with each operation, but here it keeps using the number 20:

[2023-07-17T14:49:21Z DEBUG virtiofsd] QUEUE_EVENT
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
...
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }


To make it fail, you should set a small queue-size in qemu, like:
-device vhost-user-fs-pci,queue-size=16,... \
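
For instance, reusing the device options from comment 2, the full line would look like the following (the id, chardev, bus and addr values depend on the rest of your command line):

-device vhost-user-fs-pci,queue-size=16,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs,bus=pcie-root-port-3,addr=0x0 \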

Comment 5 xiagao 2023-07-18 04:39:10 UTC
(In reply to German Maglione from comment #4)
> (In reply to xiagao from comment #3)
> > Hi German, could you help to check the steps above if you're available,
> > thanks.
> 
> The steps are ok, but if you check the debug output of virtiofsd, you will
> see 
> the virtiofsd will repeat the first operation in the VQ, until it "catch-up"
> with
> the entry in the guest, like in my tests: the uniq value is incremented with
> each
> operation, but here we keep using the number 20
> 
> [2023-07-17T14:49:21Z DEBUG virtiofsd] QUEUE_EVENT
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request:
> opcode=Getattr (3), inode=1, unique=20, pid=847
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header:
> OutHeader { len: 120, error: 0, unique: 20 }
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request:
> opcode=Getattr (3), inode=1, unique=20, pid=847
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header:
> OutHeader { len: 120, error: 0, unique: 20 }
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request:
> opcode=Getattr (3), inode=1, unique=20, pid=847
> ...
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header:
> OutHeader { len: 120, error: 0, unique: 20 }
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request:
> opcode=Getattr (3), inode=1, unique=20, pid=847
> [2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header:
> OutHeader { len: 120, error: 0, unique: 20 }
> 
> 
> to make it failing, you should set a small queue-size in qemu, like:
> -device vhost-user-fs-pci,queue-size=16,... \

Thank you. With queue-size=16, I can reproduce the problem.

Comment 6 xiagao 2023-07-21 05:50:26 UTC
Pre-verifying this bz, as it works with the virtiofsd-1.7 version.

Comment 9 xiagao 2023-07-28 10:10:49 UTC
It works with virtiofsd-1.7, so verifying it.
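
One quick way to confirm the installed build before verifying; according to this bug's "Fixed In Version" field, the fix is in virtiofsd-1.7.0-1.el9:

# rpm -q virtiofsd
(expected: virtiofsd-1.7.0-1.el9 or newer)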

Comment 11 errata-xmlrpc 2023-11-07 08:36:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virtiofsd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6522

