Bug 2222217

Summary: virtiofsd stops responding after pausing and resuming VM
Product: Red Hat Enterprise Linux 9
Component: virtiofsd
Version: 9.3
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: German Maglione <gmaglione>
Assignee: German Maglione <gmaglione>
QA Contact: xiagao
CC: jinzhao, juzhang, virt-maint, xiagao
Target Milestone: rc
Keywords: Triaged
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: virtiofsd-1.7.0-1.el9
Last Closed: 2023-11-07 08:36:17 UTC
Type: Bug
Bug Depends On: 2222221

Description German Maglione 2023-07-12 09:39:53 UTC
Description of problem:

In upstream, it was reported that virtiofsd stops responding when pausing and resuming the VM:
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/110

This is a regression; it works in the C version of virtiofsd.

How reproducible: 100%

Steps to Reproduce:
1. Create a VM with virtiofs device
2. Boot the VM and mount virtiofs
3. virsh suspend vm
4. virsh resume vm
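
The steps above can be sketched as a shell session (the shared directory, socket path, and VM name are placeholders):

```shell
# Host: start virtiofsd sharing a directory over a vhost-user socket
/usr/libexec/virtiofsd --shared-dir /home/test \
    --socket-path /tmp/sock1 --log-level debug &

# Guest: mount the virtiofs filesystem by its tag
mount -t virtiofs myfs /mnt

# Host: pause and resume the VM
virsh suspend myvm
virsh resume myvm

# Guest: any further I/O on /mnt hangs once virtiofsd stops responding
ls /mnt
```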

Actual results:
virtiofsd stops responding

Expected results:
virtiofsd should continue working

Additional info:
This is not a bug in virtiofsd itself; it was an error in one of our dependencies, the vhost-user-backend crate (rust-vmm):

When the VM is stopped, GET_VRING_BASE is issued, and when it is resumed, SET_VRING_BASE sets the retrieved value. Because GET_VRING_BASE was also resetting the state of the VQ, the device failed to resume operation. This is already fixed upstream:
- https://github.com/rust-vmm/vhost/pull/154
- https://github.com/rust-vmm/vhost/pull/161
- https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/175
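
As an illustrative sketch (this is a toy model, not the actual vhost crate code; the `Vring` type and method names are invented for illustration), the failure mode is that the GET_VRING_BASE handler wiped the rest of the vring state while fetching the index, so a later SET_VRING_BASE could not actually resume processing:

```rust
// Toy model of the vhost-user pause/resume handshake.
// GET_VRING_BASE should only report the last-processed index and stop the
// ring; the bug was that it also reset the rest of the vring state.

#[derive(Default)]
struct Vring {
    last_avail_idx: u16, // next avail-ring entry the device will process
    ready: bool,         // whether the queue is usable
}

impl Vring {
    // Buggy behavior: wipes the whole vring while fetching the base.
    fn get_vring_base_buggy(&mut self) -> u16 {
        let base = self.last_avail_idx;
        *self = Vring::default(); // also clears `ready`: queue unusable
        base
    }

    // Fixed behavior: report the base, keep the rest of the state.
    fn get_vring_base_fixed(&mut self) -> u16 {
        self.last_avail_idx
    }

    fn set_vring_base(&mut self, base: u16) {
        self.last_avail_idx = base;
    }
}

fn main() {
    // Buggy path: VM pause issues GET_VRING_BASE, resume restores the base,
    // but the queue state was lost, so the device stalls.
    let mut vq = Vring { last_avail_idx: 20, ready: true };
    let base = vq.get_vring_base_buggy();
    vq.set_vring_base(base);
    assert_eq!(vq.last_avail_idx, 20);
    assert!(!vq.ready, "buggy path: queue state was wiped");

    // Fixed path: the queue stays usable across pause/resume.
    let mut vq = Vring { last_avail_idx: 20, ready: true };
    let base = vq.get_vring_base_fixed();
    vq.set_vring_base(base);
    assert!(vq.ready, "fixed path: queue survives resume");
}
```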

Comment 1 xiagao 2023-07-13 02:52:10 UTC

The Windows driver also hits this issue.

Comment 2 xiagao 2023-07-13 04:49:45 UTC
I couldn't reproduce this issue in the following environment:

Guest: rhel9.3 5.14.0-333.el9.x86_64
host: rhel9.3  5.14.0-324.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64
virtio-win-prewhql-0.1-239
kernel-5.14.0-324.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-5.el9.noarch

Steps, as in comment 0:

1. Create a VM with virtiofs device
# /usr/libexec/virtiofsd --shared-dir /home/test --socket-path /tmp/sock1 --log-level debug

2. Boot the VM and mount virtiofs
 -chardev socket,id=char_virtiofs_fs,path=/tmp/sock1 \
 -device vhost-user-fs-pci,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs,bus=pcie-root-port-3,addr=0x0 \

3. stop and cont vm
(qemu) stop
(qemu) cont

4. check virtiofs in guest.

Results:
It works well; I can read/write in virtiofs inside the guest.

Comment 3 xiagao 2023-07-16 14:16:50 UTC
Hi German, could you help check the steps above when you're available? Thanks.

Comment 4 German Maglione 2023-07-17 14:57:38 UTC
(In reply to xiagao from comment #3)
> Hi German, could you help to check the steps above if you're available,
> thanks.

The steps are ok, but if you check the debug output of virtiofsd, you will see
that virtiofsd repeats the first operation in the VQ until it "catches up" with
the entries in the guest. Normally the unique value is incremented with each
operation, but here it keeps using the number 20, as in my tests:

[2023-07-17T14:49:21Z DEBUG virtiofsd] QUEUE_EVENT
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
...
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }


To make it fail, you should set a small queue-size in qemu, like:
-device vhost-user-fs-pci,queue-size=16,... \
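
A toy model (illustrative only; the `pending` helper is invented, not virtiofsd code) of why a small queue-size turns the silent replay into a hang: after the buggy reset the device's index restarts at 0, so everything the guest has ever submitted looks pending again. With a large ring the device just re-walks stale slots (the repeated unique=20 replies above); once the apparent backlog exceeds the ring size, the state is invalid and the device stops responding.

```rust
// Virtio split-ring indices are free-running 16-bit counters that wrap;
// the number of pending entries is the wrapping difference between the
// guest's avail index and the device's last-processed index. A sane
// device never sees more pending entries than the queue size.

fn pending(guest_avail_idx: u16, device_last_avail_idx: u16) -> u16 {
    guest_avail_idx.wrapping_sub(device_last_avail_idx)
}

fn main() {
    let guest_avail_idx: u16 = 20; // guest submitted 20 requests so far

    // Normal resume: the device remembers where it was, nothing to replay.
    assert_eq!(pending(guest_avail_idx, 20), 0);

    // Buggy reset, large queue (e.g. 1024): 20 "pending" entries still fit
    // in the ring, so the device re-walks stale slots and merely replays
    // old requests until it catches up.
    assert!(pending(guest_avail_idx, 0) <= 1024);

    // Buggy reset, queue-size=16: 20 pending entries exceed the ring,
    // an invalid state, and the device hangs.
    assert!(pending(guest_avail_idx, 0) > 16);
}
```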

Comment 5 xiagao 2023-07-18 04:39:10 UTC
(In reply to German Maglione from comment #4)
> The steps are ok, but if you check the debug output of virtiofsd, you will
> see the virtiofsd will repeat the first operation in the VQ, until it
> "catch-up" with the entry in the guest [...]
> 
> to make it failing, you should set a small queue-size in qemu, like:
> -device vhost-user-fs-pci,queue-size=16,... \

Thank you. With queue-size=16, I can reproduce the problem.

Comment 6 xiagao 2023-07-21 05:50:26 UTC
Pre-verifying this bz, as it works with virtiofsd-1.7.

Comment 9 xiagao 2023-07-28 10:10:49 UTC
It works with virtiofsd-1.7, so marking it verified.

Comment 11 errata-xmlrpc 2023-11-07 08:36:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virtiofsd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6522