Bug 2222217 - virtiofsd stops responding after pausing and resuming VM
Summary: virtiofsd stops responding after pausing and resuming VM
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: virtiofsd
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: German Maglione
QA Contact: xiagao
URL:
Whiteboard:
Depends On: 2222221
Blocks:
 
Reported: 2023-07-12 09:39 UTC by German Maglione
Modified: 2023-07-28 10:13 UTC
CC: 4 users

Fixed In Version: virtiofsd-1.7.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links:
- GitHub rust-vmm/vhost pull 154 (merged): Fix return value of `GET_VRING_BASE` message - 2023-07-12 09:42:40 UTC
- GitHub rust-vmm/vhost pull 161 (merged): get_vring_base should not reset the queue - 2023-07-12 09:42:57 UTC
- GitLab virtio-fs/virtiofsd issue 110 (closed): virtio-fs stops responding after pausing and resuming VM - 2023-07-12 09:42:01 UTC
- GitLab virtio-fs/virtiofsd merge request 175 (merged): Upgrade rust-vmm dependencies - 2023-07-12 09:42:21 UTC
- Red Hat Issue Tracker RHELPLAN-162095 - 2023-07-12 09:41:24 UTC

Description German Maglione 2023-07-12 09:39:53 UTC
Description of problem:

It was reported upstream that virtiofsd stops responding after pausing and resuming the VM:
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/110

This is a regression; it works in the C version of virtiofsd.

How reproducible: 100%

Steps to Reproduce:
1. Create a VM with virtiofs device
2. Boot the VM and mount virtiofs
3. virsh suspend vm
4. virsh resume vm
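
A concrete session looks like this (the VM name "vm" and the tag "myfs" are placeholders):

guest# mount -t virtiofs myfs /mnt
host#  virsh suspend vm
host#  virsh resume vm
guest# ls /mnt        # hangs once the bug triggers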

Actual results:
virtiofsd stops responding

Expected results:
virtiofsd should continue working

Additional info:
This is not a bug in virtiofsd itself; it was an error in one of our dependencies, the vhost-user-backend crate (rust-vmm):

When the VM is stopped, GET_VRING_BASE is issued, and when it is resumed, SET_VRING_BASE sets the retrieved value. Because GET_VRING_BASE was resetting the state of the VQ, the device failed to resume operation. This is already fixed upstream:
- https://github.com/rust-vmm/vhost/pull/154
- https://github.com/rust-vmm/vhost/pull/161
- https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/175
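
For illustration only, here is a minimal Rust sketch of the failure mode, assuming a vring reduced to just its next-available index (the Vring type and method names are invented for this sketch, not the actual vhost-user-backend internals):

// Illustrative sketch only; the real vring state in the
// vhost-user-backend crate is more complex than this.
#[derive(Default)]
struct Vring {
    next_avail: u16, // next available-ring index to process
}

impl Vring {
    // Buggy behavior: GET_VRING_BASE reset the queue as a side effect,
    // so the index handed back to the VMM no longer matched the guest's
    // actual ring position.
    fn get_vring_base_buggy(&mut self) -> u16 {
        *self = Vring::default(); // queue reset: real position is lost
        self.next_avail           // now 0 instead of the saved position
    }

    // Fixed behavior, per the PRs above: only report the current index,
    // without touching the queue state.
    fn get_vring_base_fixed(&self) -> u16 {
        self.next_avail
    }

    // On resume, the VMM restores the saved index with SET_VRING_BASE.
    fn set_vring_base(&mut self, base: u16) {
        self.next_avail = base;
    }
}

fn main() {
    // Pause/resume with the bug: the backend restarts at a stale position
    // and replays descriptors it had already completed (hence the stuck
    // "unique" value seen later in the log in comment 4).
    let mut vq = Vring { next_avail: 20 };
    let saved = vq.get_vring_base_buggy();
    vq.set_vring_base(saved);
    assert_eq!(vq.next_avail, 0);

    // Pause/resume with the fix: the position survives.
    let mut vq = Vring { next_avail: 20 };
    let saved = vq.get_vring_base_fixed();
    vq.set_vring_base(saved);
    assert_eq!(vq.next_avail, 20);
}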

Comment 1 xiagao 2023-07-13 02:52:10 UTC

The Windows driver also hits this issue.

Comment 2 xiagao 2023-07-13 04:49:45 UTC
I could not reproduce this issue in the following environment.

Guest: rhel9.3 5.14.0-333.el9.x86_64
Host: rhel9.3 5.14.0-324.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64
virtio-win-prewhql-0.1-239
kernel-5.14.0-324.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-5.el9.noarch

Steps, as in comment 0:

1. Create a VM with virtiofs device
# /usr/libexec/virtiofsd --shared-dir /home/test --socket-path /tmp/sock1 --log-level debug

2. Boot the VM and mount virtiofs
 -chardev socket,id=char_virtiofs_fs,path=/tmp/sock1 \
 -device vhost-user-fs-pci,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs,bus=pcie-root-port-3,addr=0x0 \
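
The matching guest-side mount, using the tag from the device line above:
# mount -t virtiofs myfs /mnt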

3. stop and cont vm
(qemu) stop
(qemu) cont

4. check virtiofs in guest.
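
For step 4, assuming the share is mounted at /mnt in the guest (the file name is an example):
# echo test > /mnt/testfile && cat /mnt/testfile   # should complete without hanging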

Results:
It works well; I can read and write in virtiofs inside the guest.


Comment 3 xiagao 2023-07-16 14:16:50 UTC
Hi German, could you check the steps above when you're available? Thanks.

Comment 4 German Maglione 2023-07-17 14:57:38 UTC
(In reply to xiagao from comment #3)
> Hi German, could you help to check the steps above if you're available,
> thanks.

The steps are fine, but if you check the debug output of virtiofsd, you will see that virtiofsd repeats the first operation in the VQ until it "catches up" with the guest's entry. In my tests the unique value (the FUSE request ID) increments with each operation, but here it stays stuck at 20:

[2023-07-17T14:49:21Z DEBUG virtiofsd] QUEUE_EVENT
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
...
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }


To make it fail, set a small queue-size in QEMU, for example:
-device vhost-user-fs-pci,queue-size=16,... \
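
Combined with the device line from comment 2, the full option would look like this (id, chardev, bus, and addr values as in that setup):
-device vhost-user-fs-pci,queue-size=16,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs,bus=pcie-root-port-3,addr=0x0 \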

Comment 5 xiagao 2023-07-18 04:39:10 UTC
(In reply to German Maglione from comment #4)
> [...]
> To make it fail, set a small queue-size in QEMU, for example:
> -device vhost-user-fs-pci,queue-size=16,... \

Thank you. With queue-size=16, I can reproduce the problem.

Comment 6 xiagao 2023-07-21 05:50:26 UTC
Pre-verifying this bz, as it works with the virtiofsd-1.7 version.

Comment 9 xiagao 2023-07-28 10:10:49 UTC
It works with virtiofsd-1.7, so moving to VERIFIED.

