Bug 1967496

Summary: [virtio-fs] nfs/xfstest generic/089 generic/478 failed
Product: Red Hat Enterprise Linux 8
Reporter: xiagao
Component: qemu-kvm
Assignee: Hanna Czenczek <hreitz>
qemu-kvm sub component: virtio-fs
QA Contact: xiagao
Status: CLOSED ERRATA
Docs Contact:
Severity: low
Priority: low
CC: kkiwi, qzhang, virt-maint, xinma, yidliu, yihyu, yimsong, zhenyzha
Version: 8.5
Keywords: Triaged
Target Milestone: beta
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-4.2.0-58.module+el8.5.0+12272+74ace547
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-09 18:01:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description xiagao 2021-06-03 08:59:23 UTC
Description of problem:
As in the summary: xfstests generic/089, generic/478, and generic/632 fail when run on virtiofs.

Version-Release number of selected component (if applicable):
guest: 
kernel-4.18.0-310.el8.x86_64

host:
kernel-4.18.0-310.el8.x86_64
qemu-kvm-4.2.0-51.module+el8.5.0+11141+9dff516f.x86_64
RHEL-8.5.0-20210531.n.0

How reproducible:
100%

Steps to Reproduce:
1. Start a RHEL 8.5.0 guest with virtiofs.
# /usr/libexec/virtiofsd --socket-path=/tmp/socket1 -o source=/home/test1 -o cache=auto &
# /usr/libexec/virtiofsd --socket-path=/tmp/socket2 -o source=/home/test2 -o cache=auto &

vhost-user-fs device in qemu cmd line:
    -chardev socket,id=char_virtiofs_fs1,path=/tmp/socket1 \
    -device vhost-user-fs-pci,id=vufs_virtiofs_fs1,chardev=char_virtiofs_fs1,tag=myfs1,queue-size=1024,bus=pcie.0,addr=0x3 \
    -chardev socket,id=char_virtiofs_fs2,path=/tmp/socket2 \
    -device vhost-user-fs-pci,id=vufs_virtiofs_fs2,chardev=char_virtiofs_fs2,tag=myfs2,queue-size=1024,bus=pcie.0,addr=0x4 \

2. Build and install xfstests in the guest.
  mkdir -p /mnt/myfs1
  mkdir -p /mnt/myfs2
  yum install -y git acl attr automake bc dump e2fsprogs fio gawk gcc libtool lvm2 make psmisc quota sed xfsdump xfsprogs libacl-devel libattr-devel libaio-devel libuuid-devel xfsprogs-devel python3 sqlite
  cd /home && rm -rf xfstests-dev && git clone https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
  cd /home/xfstests-dev/ && make && make install
  export TEST_DEV=myfs1 && export TEST_DIR=/mnt/myfs1 && export SCRATCH_DEV=myfs2 && export SCRATCH_MNT=/mnt/myfs2 && export FSTYP=virtiofs && export FSX_AVOID="-E"
  echo -e 'TEST_DEV=myfs1\nTEST_DIR=/mnt/myfs1\nSCRATCH_DEV=myfs2\nSCRATCH_MNT=/mnt/myfs2\nFSTYP=virtiofs\nFSX_AVOID="-E"' > configs/localhost.config
  echo "generic/003 generic/120 generic/426 generic/467 generic/477 generic/551" > blacklist
  useradd fsgqa && useradd 123456-fsgqa && useradd fsgqa2

3. Run the generic/089, generic/478, and generic/632 tests.
# ./check -virtiofs generic/089 generic/478 generic/632


Actual results:
generic/089	- output mismatch (see /home/xfstests-dev/results//generic/089.out.bad)
    --- tests/generic/089.out	2021-06-03 15:00:08.545291315 +0800
    +++ /home/xfstests-dev/results//generic/089.out.bad	2021-06-03 16:32:49.738609634 +0800
    @@ -1,18 +1,18 @@
     QA output created by 089
    -completed 50 iterations
    -completed 50 iterations
    +can't lock lock file t_mtab~: Operation not supported
    +can't lock lock file t_mtab~: Operation not supported
     completed 50 iterations
     completed 10000 iterations
    ...
    (Run 'diff -u /home/xfstests-dev/tests/generic/089.out /home/xfstests-dev/results//generic/089.out.bad'  to see the entire diff)
generic/478	- output mismatch (see /home/xfstests-dev/results//generic/478.out.bad)
    --- tests/generic/478.out	2021-06-03 15:00:08.679289881 +0800
    +++ /home/xfstests-dev/results//generic/478.out.bad	2021-06-03 16:33:51.424970772 +0800
    @@ -1,91 +1,259 @@
     QA output created by 478
    -get wrlck
    -lock could be placed
    -get wrlck
    -get wrlck
    -lock could be placed
    -get wrlck
    ...
    (Run 'diff -u /home/xfstests-dev/tests/generic/478.out /home/xfstests-dev/results//generic/478.out.bad'  to see the entire diff)
generic/632	[failed, exit status 1]- output mismatch (see /home/xfstests-dev/results//generic/632.out.bad)
    --- tests/generic/632.out	2021-06-03 15:00:08.732289314 +0800
    +++ /home/xfstests-dev/results//generic/632.out.bad	2021-06-03 16:33:52.734955964 +0800
    @@ -1,2 +1,2 @@
     QA output created by 632
    -silence is golden
    +No space left on device - Buggy mount countingsilence is golden
    ...
    (Run 'diff -u /home/xfstests-dev/tests/generic/632.out /home/xfstests-dev/results//generic/632.out.bad'  to see the entire diff)
Ran: generic/089 generic/478 generic/632
Failures: generic/089 generic/478 generic/632
Failed 3 of 3 tests
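
To separate the lock-related failures from the rest, one option is to search the `.out.bad` files for the error string seen in the 089 diff. The following is only a sketch: the function name `list_lock_failures` is made up for illustration and is not part of xfstests, and the default directory is the results path shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of xfstests): print the names of tests whose
# .out.bad file contains the "Operation not supported" error from the 089 diff.
list_lock_failures() {
    results_dir=$1
    # Nothing to report if the results directory does not exist.
    [ -d "$results_dir" ] || return 0
    (
        cd "$results_dir" &&
        find . -name '*.out.bad' -exec grep -l 'Operation not supported' {} + |
            sed -e 's|^\./||' -e 's|\.out\.bad$||' |
            sort
    )
}

# Default to the results directory used in this report.
list_lock_failures "${1:-/home/xfstests-dev/results}"
```

A test listed here hit the same error string as 089; a test that is not listed failed for some other reason and needs a separate look.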

Expected results:
Test pass.

Additional info:
1. With a RHEL 8.5.0-AV host and a RHEL 8.5.0 guest, only generic/632 fails.
host version:
kernel-4.18.0-310.el8.x86_64
qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb.x86_64

guest kernel: kernel-4.18.0-310.el8.x86_64

  
2. qemu cmd line
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 32768 \
    -object memory-backend-file,size=32G,mem-path=/dev/shm,share=yes,id=mem-mem1  \
    -smp 20,maxcpus=20,cores=10,threads=1,dies=1,sockets=2  \
    -numa node,memdev=mem-mem1,nodeid=0  \
    -cpu 'IvyBridge',+kvm_pv_unhalt \
    -device pvpanic,ioport=0x505,id=idHBdrQm \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/var/lib/avocado/data/avocado-vt/vl_avocado-vt-vm1_image1.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -chardev socket,id=char_virtiofs_fs1,path=/tmp/socket1 \
    -device vhost-user-fs-pci,id=vufs_virtiofs_fs1,chardev=char_virtiofs_fs1,tag=myfs1,queue-size=1024,bus=pcie.0,addr=0x3 \
    -chardev socket,id=char_virtiofs_fs2,path=/tmp/socket2 \
    -device vhost-user-fs-pci,id=vufs_virtiofs_fs2,chardev=char_virtiofs_fs2,tag=myfs2,queue-size=1024,bus=pcie.0,addr=0x4 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:e1:64:ad:96:fb,id=id0Bu4MK,netdev=idaPhKXd,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idaPhKXd  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm -monitor stdio \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x5,chassis=5

Comment 1 Yiding Liu (Fujitsu) 2021-06-04 02:04:02 UTC
Hi xiagao

a) I can't reproduce the generic/089 and generic/478 failures (I tested on aarch64).
```
[root@localhost xfstests]# cat local.config.nfs 
export TEST_DEV=nfsmyfs0
export TEST_DIR=/mnt/test
export SCRATCH_DEV=nfsmyfs1
export SCRATCH_MNT=/mnt/scratch
export FSX_AVOID="-E"

[root@localhost xfstests]# ./check -virtiofs generic/089 generic/478 generic/632
FSTYP         -- virtiofs
PLATFORM      -- Linux/aarch64 localhost 4.18.0-310.el8.aarch64 #1 SMP Thu May 27 14:52:00 EDT 2021
MKFS_OPTIONS  -- nfsmyfs1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 nfsmyfs1 /mnt/scratch

generic/089	 96s
generic/478	 4s
generic/632	[failed, exit status 1]- output mismatch (see /root/xfstests/results//generic/632.out.bad)
    --- tests/generic/632.out	2021-06-04 09:50:17.829040740 +0800
    +++ /root/xfstests/results//generic/632.out.bad	2021-06-04 09:56:10.199040740 +0800
    @@ -1,2 +1,2 @@
     QA output created by 632
    -silence is golden
    +No space left on device - Buggy mount countingsilence is golden
    ...
    (Run 'diff -u /root/xfstests/tests/generic/632.out /root/xfstests/results//generic/632.out.bad'  to see the entire diff)
Ran: generic/089 generic/478 generic/632
Failures: generic/632
Failed 1 of 3 tests
```

My virtiofsd info
```
[root@fujitsu-fx700-01-n01 ~]# ps aux | grep virtiofsd
root       70386  0.0  0.0  87808  4736 ?        Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/share0,no_flock,no_posix_lock
root       70390  0.0  0.0  87808  4736 ?        Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/share1,no_flock,no_posix_lock
root       70394  0.0  0.0  87808  4736 ?        Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/nfs/share0,no_flock,no_posix_lock
root       70398  0.0  0.0  87808  4736 ?        Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/nfs/share1,no_flock,no_posix_lock
root       70458  0.0  0.0 29464448 2880 ?       Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/share0,no_flock,no_posix_lock
root       70459  0.0  0.0 29464448 2880 ?       Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/share1,no_flock,no_posix_lock
root       70460  0.0  0.0 29529984 5184 ?       Sl   21:41   0:00 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/nfs/share1,no_flock,no_posix_lock
root       70463  3.5  0.0 29595520 19840 ?      Sl   21:41   0:46 /usr/libexec/virtiofsd --fd=37 -o source=/mnt/nfs/share0,no_flock,no_posix_lock
```

Env:
Host
kernel: 4.18.0-310.el8.aarch64
libvirt: libvirt-7.4.0-1.module+el8.5.0+11218+83343022.src.rpm
qemu-kvm: qemu-kvm-6.0.0-18.module+el8.5.0+11243+5269aaa1.src.rpm

Guest
kernel: 4.18.0-310.el8.aarch64


b) generic/632 only passes on kernels that include the following commit:
ee2e3f50629f ("mount: fix mounting of detached mounts onto targets that reside on shared mounts")

Comment 2 Klaus Heinrich Kiwi 2021-06-04 17:21:39 UTC
Max,

 can you take this one? Thanks.

Comment 3 Hanna Czenczek 2021-06-09 10:37:40 UTC
My findings so far:
089 and 478 fail because of -o posix_lock.  This is the default in RHEL 8, but upstream changed the default with qemu commit 88fc107956a5812649e5918e0c092d3f78bb28ad (“virtiofsd: Disable remote posix locks by default”).  That is why both tests passed in comment 1: virtiofsd was started with -o no_posix_lock there.

I can’t think of a good reason not to backport this commit.
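
On builds without that backport, the same effect should be obtainable by passing -o no_posix_lock explicitly, as in comment 1. A minimal sketch that prints the two virtiofsd command lines from the description with that option appended; the helper name `virtiofsd_cmd` is hypothetical, and the commands are printed rather than executed so they can be checked before use:

```shell
#!/bin/sh
# Hypothetical helper: print a virtiofsd command line with remote POSIX locks
# disabled. Socket paths and source directories are those from the description.
virtiofsd_cmd() {
    socket=$1
    source_dir=$2
    printf '%s\n' "/usr/libexec/virtiofsd --socket-path=$socket -o source=$source_dir -o cache=auto -o no_posix_lock"
}

virtiofsd_cmd /tmp/socket1 /home/test1
virtiofsd_cmd /tmp/socket2 /home/test2
```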


632 fails only with the 4.18.0-310.el8 kernel in the guest, not with a current upstream kernel.  The qemu/virtiofsd version doesn’t matter.  I’ll have to investigate/bisect this one further.

Comment 4 Hanna Czenczek 2021-06-09 12:19:08 UTC
So turns out 632 is fixed by the kernel commit ee2e3f50629f17b0752b55b2566c15ce8dafb557, which in hindsight I guess is kind of obvious, because according to the commit that added 632 to the xfstests, it was added as a regression test for said commit ee2e3f50629f17b0752b55b2566c15ce8dafb557.

I don’t think its failure has anything to do with virtio-fs.

Comment 5 Hanna Czenczek 2021-06-09 12:23:49 UTC
(In reply to Max Reitz from comment #4)
> So turns out 632 is fixed by the kernel commit
> ee2e3f50629f17b0752b55b2566c15ce8dafb557, which in hindsight I guess is kind
> of obvious, because according to the commit that added 632 to the xfstests,
> it was added as a regression test for said commit
> ee2e3f50629f17b0752b55b2566c15ce8dafb557.
> 
> I don’t think its failure has anything to do with virtio-fs.

Maybe I should be a bit more verbose: ee2e3f50629f17b0752b55b2566c15ce8dafb557 is “mount: fix mounting of detached mounts onto targets that reside on shared mounts”, and it’s a general vfs commit that has nothing to do with virtio-fs in particular.  It’s about mounts in general.

So if we want 632 fixed, we would need a separate BZ, and that wouldn’t have anything to do with virtio-fs.  If we don’t particularly care, then 632 just shouldn’t be run with a RHEL guest kernel.
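
One way to keep 632 out of runs against a RHEL guest kernel is an exclude file. This is a sketch under assumptions: the helper name `write_exclude_file` is hypothetical, and the `-E` option is assumed to behave as described in the xfstests `check` usage text (exclude the tests named in the file). The `check` invocation is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: write one test name per line to an exclude file.
write_exclude_file() {
    exclude_file=$1
    shift
    printf '%s\n' "$@" > "$exclude_file"
}

exclude_file=$(mktemp)
write_exclude_file "$exclude_file" generic/632

# Print the resulting invocation for review; -E is assumed to exclude the
# tests listed in the file, as per the check usage text.
echo "./check -virtiofs -E $exclude_file generic/089 generic/478 generic/632"
```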

Comment 7 John Ferlan 2021-08-10 12:52:08 UTC
Can we get a qa_ack+ please?  I also set ITM=26 since that's +2 on DTM and the last ITM before exceptions required.  Series has 1 downstream ack and we'll push to get the last 2, so bz should be on_qa soon.  thanks!

Comment 8 yimsong 2021-08-16 03:34:00 UTC
Hit the same issue on RHEL 8.5.0 when running the virtiofs test loop.
case:
virtio_fs_share_data.run_stress.with_xfstest.with_local_source.with_cache.auto.default.default
virtio_fs_share_data.run_stress.with_xfstest.with_nfs_source.with_cache.auto.default.default
pkg:
qemu-kvm-4.2.0-56.module+el8.5.0+12039+0434c559.x86_64
kernel-4.18.0-325.el8.x86_64
seabios-1.13.0-2.module+el8.3.0+7353+9de0a3cc.x86_64

Comment 13 Yanan Fu 2021-08-19 07:44:44 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 14 Hanna Czenczek 2021-08-19 08:34:26 UTC
Changing the title to reflect that the 632 failure has nothing to do with virtiofs.

Comment 15 xiagao 2021-08-19 09:44:20 UTC
(In reply to Hanna Reitz from comment #14)
> Changing the title to reflect that the 632 failure has nothing to do with
> virtiofs.

Yes, I can only reproduce the generic/089 and generic/478 failures on qemu-kvm-4.2.0-57.module+el8.5.0+12118+4998563d.x86_64.
Thanks.

Comment 16 xiagao 2021-08-19 09:45:47 UTC
generic/089 and generic/478 pass on qemu-kvm-4.2.0-58.module+el8.5.0+12272+74ace547.x86_64.
Based on this and comment 15, marking this bug as verified.

Comment 18 errata-xmlrpc 2021-11-09 18:01:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4191