Bug 1808697 - virtiofsd doesn't get killed with 'virsh destroy'
Keywords:
Status: CLOSED DUPLICATE of bug 1897105
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: 8.2
Assignee: Ján Tomko
QA Contact: yafu
URL:
Whiteboard:
Depends On:
Blocks: 1885655
 
Reported: 2020-02-29 13:33 UTC by Andrew Jones
Modified: 2021-03-18 15:45 UTC (History)
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-08 10:58:11 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Andrew Jones 2020-02-29 13:33:11 UTC
When 'virsh destroy'ing a guest that has a virtiofs file system configured, the virtiofsd daemon isn't killed. The next attempt to start the guest then fails with

error: Failed to start domain DOMAIN
error: Cannot open log file: '/var/log/libvirt/qemu/DOMAIN-fs1-virtiofsd.log': Device or resource busy


To start the guest again, one must first do something like

# kill -9 $(pidof virtiofsd)


This issue was found on an AArch64 machine while testing virtiofs in the guest with xfstests. xfstests hung and the guest had to be destroyed. Maybe the fact that virtiofsd was still in use at the time of the 'virsh destroy' also plays a part.
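
A slightly more targeted variant of the kill workaround can be sketched as follows. This is only an illustration under the assumption that virtiofsd's command line contains a `source=<dir>` option for the export directory; `kill_virtiofsd_for_source` and the paths are made-up names, not anything libvirt ships:

```shell
# Kill only virtiofsd processes whose command line mentions the given
# exported directory, leaving other guests' daemons alone.
kill_virtiofsd_for_source() {
    src=$1
    for pid in $(pgrep -x virtiofsd); do
        # /proc/<pid>/cmdline is NUL-separated; flatten it for grep
        if tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null |
                grep -q "source=$src"; then
            kill -9 "$pid"
        fi
    done
}

kill_virtiofsd_for_source /path1
```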

Comment 1 Stefan Hajnoczi 2020-03-03 08:00:22 UTC
virtiofsd terminates when the vhost-user UNIX domain socket connection is closed.  This should happen automatically when the QEMU process terminates.

I wonder what is happening.  Does libvirtd still have the fd open?
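
The lifecycle described above can be modelled with a toy script (nothing here is virtiofsd code; a FIFO stands in for the vhost-user UNIX domain socket):

```shell
# Toy model: a background reader ("the daemon") blocks on a channel
# and exits on EOF, i.e. as soon as the peer ("QEMU") closes its end.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/sock"              # stands in for the vhost-user socket

cat "$tmpdir/sock" > /dev/null &   # "virtiofsd": blocks reading the channel
daemon_pid=$!

exec 3> "$tmpdir/sock"             # "QEMU" opens the peer end
sleep 0.2
exec 3>&-                          # "QEMU" terminates: peer end is closed

wait "$daemon_pid"                 # the reader sees EOF and exits on its own
rm -r "$tmpdir"
```

If something (here, another process holding the write end open) keeps the channel alive, the reader never sees EOF and lingers, which is the failure mode being discussed.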

Comment 2 Ján Tomko 2020-03-03 11:39:59 UTC
That fd is held open by virtlogd "on behalf" of virtiofsd while virtiofsd is running.

A possible problem on the libvirtd side could be that libvirtd sends SIGTERM (and later SIGKILL)
only to the process it created, not to the child process that virtiofsd forks.

Any mentions of ProcessKill in libvirtd log? Was there just one leftover virtiofsd or two?

Comment 3 Andrew Jones 2020-03-03 13:06:01 UTC
(In reply to Ján Tomko from comment #2)
> Any mentions of ProcessKill in libvirtd log? Was there just one leftover
> virtiofsd or two?

I don't have the log any more. I only had to kill a single virtiofsd.

Comment 4 Luiz Capitulino 2020-03-23 13:44:51 UTC
Hi Jano,

What are your plans for this BZ? Do you need help from
Virt/ARM to make progress?

We (the Virt/ARM team) think that this may turn out to be
more serious than it seems. I think the worst case would
be a customer having several VMs that are destroyed and
restarted relatively often. This could leave several dozen
virtiofsd processes lying around.

We're also not sure how easily the workaround can be
applied in this case, since we would have to match each
virtiofsd PID to the VM being destroyed.

Comment 5 Ján Tomko 2020-03-23 14:20:46 UTC
A reliable reproducer or at least logs would be nice.

The workaround of killing virtiofsd's whole process group could help,
but it won't really solve the underlying bug, namely why virtiofsd
did not exit when QEMU did.

Comment 6 Luiz Capitulino 2020-03-24 20:50:53 UTC
Drew,

Do you think it's worth spending time on trying
to reproduce?

I mean, if it's too much of a corner case, then we
may treat this as very low priority.

Comment 7 Andrew Jones 2020-03-25 12:25:30 UTC
(In reply to Luiz Capitulino from comment #6)
> Drew,
> 
> Do you think it's worth spending time on trying
> to reproduce?
> 
> I mean, if it's too much of a corner case, then we
> may treat this as very low priority.

I tried to reproduce, but no luck. I was on a different host, but I don't know how that would matter. No xfstest caused the guest to hang this time, although a few were either very slow or looping (generic/069, generic/074, generic/089). I gave up testing after attempting to destroy the guest during each of those long-running tests and not getting any orphaned virtiofsd processes. Before trying xfstests I also tried just doing a dd in the virtiofs file system and destroying the guest, which didn't reproduce the issue either. If we're dealing with a race then it might be tricky to reproduce reliably, and, since it's possible to manually kill the virtiofsd processes as a workaround, the priority of this bug can probably be lowered.

Comment 8 Luiz Capitulino 2020-03-25 20:50:25 UTC
Thanks, Drew. Lowering the priority.

Comment 9 yafu 2020-06-11 12:07:58 UTC
Hi Ján,

I can reproduce the issue with the following steps:

1. Prepare a guest with two filesystem devices:
# virsh edit vm1
    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs' queue='1024'/>
      <binary path='/usr/libexec/virtiofsd' xattr='on'>
        <lock posix='on' flock='on'/>
      </binary>
      <source dir='/path1'/>
      <target dir='mount_tag'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </filesystem>
    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs'/>
      <binary path='/usr/libexec/virtiofsd' xattr='off'>
        <cache mode='none'/>
        <lock posix='on' flock='on'/>
      </binary>
      <source dir='/path2'/>
      <target dir='mount_tag1'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </filesystem>

2. Create a directory only for the first filesystem device:
# mkdir /path1
# ls /path2
ls: cannot access '/path2': No such file or directory

3. Start the guest:
# virsh start vm1
error: Failed to start domain vm1
error: internal error: the virtiofs export directory '/path2' does not exist

4. Check the virtiofsd process:
# ps aux | grep -i virtiofsd
root        3857  0.0  0.0 142192  5696 ?        Sl   08:05   0:00 /usr/libexec/virtiofsd --fd=23 -o source=/path1,xattr,flock,posix_lock

5. Create the directory for the second filesystem device:
# mkdir /path2

6. Start the guest again:
# virsh start vm1
error: Failed to start domain vm1
error: Cannot open log file: '/var/log/libvirt/qemu/vm1-fs0-virtiofsd.log': Device or resource busy

Comment 10 yafu 2020-06-29 07:25:32 UTC
Forgot to add needinfo for comment 9.

Comment 12 Ján Tomko 2020-12-08 10:55:47 UTC
Fixed upstream by

commit 5cde9dee8c70b17c458d031ab6cf71dce476eea2
Refs: v6.9.0-186-g5cde9dee8c
Author:     Masayoshi Mizuma <m.mizuma.com>
AuthorDate: Wed Nov 11 08:35:24 2020 -0500
Commit:     Michal Prívozník <mprivozn>
CommitDate: Wed Nov 11 15:20:12 2020 +0100

    qemu: Move qemuExtDevicesStop() before removing the pidfiles

    A qemu guest which has virtiofs config fails to start if the previous
    starting failed because of invalid option or something.

    That's because the virtiofsd isn't killed by virPidFileForceCleanupPath()
    on the former failure because the pidfile was already removed by
    virFileDeleteTree(priv->libDir) in qemuProcessStop(), so
    virPidFileForceCleanupPath() just returned.

    Move qemuExtDevicesStop() before virFileDeleteTree(priv->libDir) so that
    virPidFileForceCleanupPath() can kill virtiofsd correctly.

    For example of the reproduction:

      # virsh start guest
      error: Failed to start domain guest
      error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -foo: invalid option

      ... fix the option ...

      # virsh start guest
      error: Failed to start domain guest
      error: Cannot open log file: '/var/log/libvirt/qemu/guest-fs0-virtiofsd.log': Device or resource busy
      #

    Signed-off-by: Masayoshi Mizuma <m.mizuma.com>
    Reviewed-by: Michal Privoznik <mprivozn>
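
The ordering the patch restores can be mimicked with a toy pidfile; the paths and the `fs0.pid` name below are made up for illustration, and none of this is libvirt code:

```shell
statedir=$(mktemp -d)          # stands in for priv->libDir
sleep 60 &                     # stands in for virtiofsd
helper=$!
echo "$helper" > "$statedir/fs0.pid"   # the per-device pidfile

# Buggy order (old code): deleting the state directory first removes
# the pidfile, so the subsequent pidfile-based cleanup finds nothing
# and the helper process leaks:
#   rm -r "$statedir"          # pidfile gone
#   cat "$statedir/fs0.pid"    # fails -> helper never killed

# Fixed order: kill via the pidfile first, then delete the directory.
kill "$(cat "$statedir/fs0.pid")"
rm -r "$statedir"
```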

Comment 13 Ján Tomko 2020-12-08 10:58:11 UTC

*** This bug has been marked as a duplicate of bug 1897105 ***

Comment 14 Yiding Liu (Fujitsu) 2021-03-17 07:01:14 UTC
Hi, all.

This issue still exists.
And this bugzilla is different from BZ#1897105.

BZ#1897105: fail to start a guest with virtiofsd ==> correct the error and start the guest again ==> hit the error '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'
It is not related to 'virsh destroy', and that error was fixed.

As the title of BZ#1808697 says, 'virtiofsd doesn't get killed with 'virsh destroy'': something is wrong with virsh destroy.
Log in to a guest with virtiofs ==> run an xfstests case ==> interrupt that case ==> virsh destroy the guest ==> virsh start the guest ==> hit the error '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'
I can reproduce this error on RHEL8.4 and RHEL9. For more details, please refer to BZ#1938936.

Comment 15 Luiz Capitulino 2021-03-17 17:49:58 UTC
(In reply to Yiding Liu (Fujitsu) from comment #14)

> As BZ#1808697 title said 'virtiofsd doesn't get killed with 'virsh
> destroy'', something wrong with virsh destroy.
> Login a guest with virtiofs ==> Run a xfstest cases ==> Interupt that case
> ==> virsh destroy guest ==> virsh start guest ==> Hit error
> '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'
> I can reproduce this error on RHEL8.4 and RHEL9. More details please refer
> to BZ#1938936

Yiding,

Since the reproduction steps seem a bit different, would you open
a new BZ? And would you add me to the CC list?

Thanks!

Comment 16 Yiding Liu (Fujitsu) 2021-03-18 02:30:19 UTC
(In reply to Luiz Capitulino from comment #15)
> Yiding,
> 
> Since the reproduction steps seem a bit different, would you open
> a new BZ? Would add me to the CC list?
> 
> Thanks!

Hi, Luiz.

The new BZ is BZ#1940276.
Since I could reproduce the error on both x86 and aarch64, I set HARDWARE to all. Not sure whether it is correct :)

Comment 17 Luiz Capitulino 2021-03-18 15:45:09 UTC
(In reply to Yiding Liu (Fujitsu) from comment #16)
> Hi, Luiz.
> 
> The new BZ is BZ#1940276.
> Since i could reproduce error on both x86 and aarch64, so i set HARDWARE as
> all. Not sure whether it is correct :)

This is correct. Thanks a lot for all this work, Yiding!

