Bug 2222217

Summary: virtiofsd stops responding after pausing and resuming VM
Product: Red Hat Enterprise Linux 9
Reporter: German Maglione <gmaglione>
Component: virtiofsd
Assignee: German Maglione <gmaglione>
Status: VERIFIED
QA Contact: xiagao
Severity: medium
Priority: medium
Version: 9.3
CC: jinzhao, juzhang, virt-maint, xiagao
Target Milestone: rc
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: virtiofsd-1.7.0-1.el9
Doc Type: If docs needed, set a value
Type: Bug
Bug Depends On: 2222221

Description German Maglione 2023-07-12 09:39:53 UTC
Description of problem:

It was reported upstream that virtiofsd stops responding when pausing and resuming the VM:
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/110

This is a regression; it works in the C version of virtiofsd.

How reproducible: 100%

Steps to Reproduce:
1. Create a VM with virtiofs device
2. Boot the VM and mount virtiofs
3. virsh suspend vm
4. virsh resume vm
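
The steps above as plain commands, assuming a libvirt domain named "vm1" and the virtiofs tag "myfs" (both names are only examples):

Inside the guest (step 2):
# mount -t virtiofs myfs /mnt
On the host (steps 3-4):
# virsh suspend vm1
# virsh resume vm1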

Actual results:
virtiofsd stops responding

Expected results:
virtiofsd should continue working

Additional info:
This is not a bug in virtiofsd itself; it was an error in one of our dependencies, the vhost-user-backend crate (rust-vmm):

When the VM is stopped, GET_VRING_BASE is issued, and when it is resumed, SET_VRING_BASE sets the retrieved value. Because GET_VRING_BASE was resetting the state of the VQ, the device failed to resume operation. This is already fixed upstream:
- https://github.com/rust-vmm/vhost/pull/154
- https://github.com/rust-vmm/vhost/pull/161
- https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/175
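
As an illustration of the failure mode, below is a toy model in Rust, not the actual vhost-user-backend code (the Vring type and its field are invented for this sketch): if the GET_VRING_BASE handler resets the virtqueue state, the base the VMM saves and restores no longer matches how far the guest's ring has advanced, so the device replays stale entries after resume.

// Toy model of the pause/resume handshake; NOT the real
// vhost-user-backend code. `Vring` and its field are invented.
#[derive(Default)]
struct Vring {
    // Shadow copy of the next available-ring index to process.
    next_avail: u16,
}

impl Vring {
    // Buggy handler: resetting the queue wipes the position before
    // it is reported, so the VMM saves a stale base (always 0 here).
    fn get_vring_base_buggy(&mut self) -> u16 {
        *self = Vring::default(); // queue reset -- the bug
        self.next_avail
    }

    // Fixed handler: only report the position and leave the state
    // intact, so SET_VRING_BASE can resume where processing stopped.
    fn get_vring_base_fixed(&mut self) -> u16 {
        self.next_avail
    }

    fn set_vring_base(&mut self, base: u16) {
        self.next_avail = base;
    }
}

fn main() {
    // The guest has submitted 20 requests when the VM is paused.
    let mut vq = Vring { next_avail: 20 };
    let saved = vq.get_vring_base_buggy(); // pause
    vq.set_vring_base(saved);              // resume
    // Device is at 0 while the guest's ring is at 20: it will
    // re-process stale entries until it catches up.
    assert_eq!(vq.next_avail, 0);

    let mut vq = Vring { next_avail: 20 };
    let saved = vq.get_vring_base_fixed(); // pause
    vq.set_vring_base(saved);              // resume
    assert_eq!(vq.next_avail, 20);         // picks up where it left off
}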

Comment 1 xiagao 2023-07-13 02:52:10 UTC

The Windows driver also hits this issue.

Comment 2 xiagao 2023-07-13 04:49:45 UTC
I could not reproduce this issue in the following environment.

Guest: RHEL 9.3, kernel 5.14.0-333.el9.x86_64
Host: RHEL 9.3, kernel 5.14.0-324.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64
virtio-win-prewhql-0.1-239
kernel-5.14.0-324.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-5.el9.noarch

Steps as in comment 0:

1. Create a VM with virtiofs device
# /usr/libexec/virtiofsd --shared-dir /home/test --socket-path /tmp/sock1 --log-level debug

2. Boot the VM and mount virtiofs
 -chardev socket,id=char_virtiofs_fs,path=/tmp/sock1 \
 -device vhost-user-fs-pci,id=vufs_virtiofs_fs,chardev=char_virtiofs_fs,tag=myfs,bus=pcie-root-port-3,addr=0x0 \

3. stop and cont vm
(qemu) stop
(qemu) cont

4. Check virtiofs in the guest.
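
For step 4, a quick read/write check inside the guest (mount point and file name are only examples):
# echo ok > /mnt/probe && cat /mnt/probe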

Results:
It works well; I can read/write in virtiofs inside the guest.

Comment 3 xiagao 2023-07-16 14:16:50 UTC
Hi German, could you help check the steps above when you're available? Thanks.

Comment 4 German Maglione 2023-07-17 14:57:38 UTC
(In reply to xiagao from comment #3)
> Hi German, could you help check the steps above when you're available?
> Thanks.

The steps are OK, but if you check the debug output of virtiofsd, you will
see that virtiofsd repeats the first operation in the VQ until it catches up
with the entries in the guest. Normally the unique value is incremented with
each operation, but here it keeps using the number 20:

[2023-07-17T14:49:21Z DEBUG virtiofsd] QUEUE_EVENT
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
...
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Received request: opcode=Getattr (3), inode=1, unique=20, pid=847
[2023-07-17T14:49:21Z DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 120, error: 0, unique: 20 }


To make it fail, set a small queue-size in qemu, like:
-device vhost-user-fs-pci,queue-size=16,... \
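
For reference, a minimal set of qemu options with the reduced queue size (vhost-user-fs needs a shared memory backend; the memory size and socket path below are only examples):

-object memory-backend-memfd,id=mem,size=4G,share=on \
-numa node,memdev=mem \
-chardev socket,id=char_virtiofs_fs,path=/tmp/sock1 \
-device vhost-user-fs-pci,queue-size=16,chardev=char_virtiofs_fs,tag=myfs \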

Comment 5 xiagao 2023-07-18 04:39:10 UTC
(In reply to German Maglione from comment #4)
> The steps are OK, but if you check the debug output of virtiofsd, you
> will see that virtiofsd repeats the first operation in the VQ until it
> catches up with the entries in the guest. [...]
>
> To make it fail, set a small queue-size in qemu, like:
> -device vhost-user-fs-pci,queue-size=16,... \

Thank you. With queue-size=16, I can reproduce the problem.

Comment 6 xiagao 2023-07-21 05:50:26 UTC
Pre-verifying this bz, as it works with the virtiofsd-1.7 version.

Comment 9 xiagao 2023-07-28 10:10:49 UTC
It works with virtiofsd-1.7, so marking it verified.