When 'virsh destroy'ing a guest that has a virtiofs filesystem configured, the virtiofsd daemon isn't killed. The next attempt to start the guest fails with:

error: Failed to start domain DOMAIN
error: Cannot open log file: '/var/log/libvirt/qemu/DOMAIN-fs1-virtiofsd.log': Device or resource busy

To start the guest again one must first do something like:

# kill -9 $(pidof virtiofsd)

This issue was found on an AArch64 machine while testing virtiofs in the guest with xfstests. xfstests hung and the guest had to be destroyed. Maybe the fact that virtiofsd was still in use at the time of the 'virsh destroy' also plays a part.
virtiofsd terminates when the vhost-user UNIX domain socket connection is closed. This should happen automatically when the QEMU process terminates. I wonder what is happening. Does libvirtd still have the fd open?
That fd is held open by virtlogd "on behalf" of virtiofsd while virtiofsd is running. A possible problem on the libvirtd side could be that libvirtd sends SIGTERM (and later SIGKILL) only to the process it creates, not to the fork virtiofsd creates. Any mentions of ProcessKill in libvirtd log? Was there just one leftover virtiofsd or two?
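To see who still holds the log file open, one can scan /proc for file descriptors pointing at it. Below is a minimal, self-contained sketch (not libvirt code): a background `tail -f` stands in for virtlogd holding the virtiofsd log fd. On a real host you would point `target` at the actual log, e.g. /var/log/libvirt/qemu/DOMAIN-fs1-virtiofsd.log, or use `fuser`/`lsof` if available.

```shell
# Sketch: find which process holds a given file open by scanning /proc
# (works without fuser/lsof). The tail process is a stand-in for
# virtlogd; on a real host set $target to the virtiofsd log path.
target=$(mktemp)
tail -f "$target" & holder=$!
sleep 1                                   # give tail time to open the file
found=""
for fd in /proc/[0-9]*/fd/*; do
  if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
    pid=${fd#/proc/}; pid=${pid%%/*}      # extract PID from /proc/PID/fd/N
    echo "held open by PID $pid"
    found=$pid
  fi
done
kill "$holder"
rm -f "$target"
```

If the reported PID turns out to be virtlogd rather than a leftover virtiofsd, that would confirm the fd is held "on behalf" of the daemon as described above.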
(In reply to Ján Tomko from comment #2)
> Any mentions of ProcessKill in libvirtd log? Was there just one leftover
> virtiofsd or two?

I don't have the log any more. I only had to kill a single virtiofsd.
Hi Jano,

What are your plans for this BZ? Do you need help from Virt/ARM to make progress?

We (the Virt/ARM team) think that this may turn out to be more serious than it seems. I think the worst case would be a customer having several VMs that are destroyed & restarted relatively often. This could leave several dozen virtiofsd processes lying around. We're also not sure how easily the workaround can be applied in this case, since we would have to match each virtiofsd PID to the VM being destroyed.
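One hedged way to match a leftover virtiofsd to a particular guest is its command line: as the ps output later in this thread shows, the daemon's cmdline carries the per-guest export directory ("-o source=/path1,..."). The helper below (`kill_matching` is a hypothetical name, not a libvirt tool) kills only processes of a given name whose cmdline contains a given pattern:

```shell
# Hypothetical cleanup helper: kill processes named $1 whose command
# line contains $2. For leftover virtiofsd daemons the export directory
# ("source=/path1") is a per-guest match key visible in /proc/PID/cmdline.
kill_matching() {
  name=$1; pattern=$2
  for pid in $(pgrep -x "$name"); do
    # cmdline is NUL-separated; turn it into one space-separated line
    if tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null | grep -q -- "$pattern"; then
      kill "$pid"
    fi
  done
}

# Example usage for the destroyed guest's export dir (taken from its
# <source dir='...'/> element):
# kill_matching virtiofsd "source=/path1"
```

This avoids the blanket `kill -9 $(pidof virtiofsd)`, which would also take down virtiofsd daemons serving healthy guests.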
A reliable reproducer, or at least logs, would be nice.

The workaround of killing the whole process group of virtiofsd could help, but it won't really solve the underlying bug, which is: why didn't virtiofsd exit when QEMU did?
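The process-group idea can be sketched with a toy demo (plain shell, not libvirt code): the subshell stands in for the helper process libvirtd spawned, the inner sleep for the fork virtiofsd creates, and a negative PID argument to kill signals the entire group so the fork dies too.

```shell
#!/bin/bash
# Toy demo: signalling a whole process group kills forked children too.
set -m                          # job control: background jobs get their own pgid
tmp=$(mktemp)
( sleep 300 & echo $! > "$tmp"; wait ) &   # group leader + its forked child
pgid=$!                         # leader PID == process-group ID
sleep 1
child=$(cat "$tmp")
kill -TERM -- "-$pgid"          # negative target signals the entire group
wait "$pgid" 2>/dev/null        # reap the leader
sleep 1
# the child is gone (or at most an unreaped zombie) if the group kill worked
if ps -p "$child" -o stat= 2>/dev/null | grep -qv Z; then
  echo "child survived"
else
  echo "whole group terminated"
fi
rm -f "$tmp"
```

As noted above, this is only a mitigation for the leak; it doesn't explain why virtiofsd failed to notice the closed vhost-user socket in the first place.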
Drew,

Do you think it's worth spending time on trying to reproduce?

I mean, if it's too much of a corner case, then we may treat this as very low priority.
(In reply to Luiz Capitulino from comment #6)
> Drew,
>
> Do you think it's worth spending time on trying
> to reproduce?
>
> I mean, if it's too of a corner case, they we
> may treat this as very low priority.

I tried to reproduce, but no luck. I was on a different host, but I don't know how that would matter. No xfstest caused the guest to hang this time, although a few were either very slow or looping (generic/069, generic/074, generic/089). I gave up testing after attempting to destroy the guest during each of those long-running tests and not getting any orphaned virtiofsd's.

Before trying xfstests I also tried just doing a dd in the virtiofs file system and destroying the guest, which didn't reproduce either.

If we're dealing with a race then it might be tricky to reproduce reliably, and, as it's possible to manually kill the virtiofsds as a workaround, the priority of this bug can probably be lowered.
Thanks, Drew. Lowering the priority.
Hi Ján,

I can reproduce the issue with the following steps:

1. Prepare a guest with two filesystem devices:

# virsh edit vm1

    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs' queue='1024'/>
      <binary path='/usr/libexec/virtiofsd' xattr='on'>
        <lock posix='on' flock='on'/>
      </binary>
      <source dir='/path1'/>
      <target dir='mount_tag'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </filesystem>
    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs'/>
      <binary path='/usr/libexec/virtiofsd' xattr='off'>
        <cache mode='none'/>
        <lock posix='on' flock='on'/>
      </binary>
      <source dir='/path2'/>
      <target dir='mount_tag1'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </filesystem>

2. Create the directory for the first filesystem device only:

# mkdir /path1
# ls /path2
ls: cannot access '/path2': No such file or directory

3. Start the guest:

# virsh start vm1
error: Failed to start domain vm1
error: internal error: the virtiofs export directory '/path2' does not exist

4. Check for a leftover virtiofsd process:

# ps aux | grep -i virtiofsd
root      3857  0.0  0.0 142192  5696 ?  Sl  08:05  0:00 /usr/libexec/virtiofsd --fd=23 -o source=/path1,xattr,flock,posix_lock

5. Create the directory for the second filesystem device:

# mkdir /path2

6. Start the guest again:

# virsh start vm1
error: Failed to start domain vm1
error: Cannot open log file: '/var/log/libvirt/qemu/vm1-fs0-virtiofsd.log': Device or resource busy
Forgot to add needinfo for comment 9.
Fixed upstream by commit 5cde9dee8c70b17c458d031ab6cf71dce476eea2:

commit 5cde9dee8c70b17c458d031ab6cf71dce476eea2
Refs: v6.9.0-186-g5cde9dee8c
Author:     Masayoshi Mizuma <m.mizuma.com>
AuthorDate: Wed Nov 11 08:35:24 2020 -0500
Commit:     Michal Prívozník <mprivozn>
CommitDate: Wed Nov 11 15:20:12 2020 +0100

    qemu: Move qemuExtDevicesStop() before removing the pidfiles

    A qemu guest which has virtiofs config fails to start if the previous
    starting failed because of invalid option or something. That's because
    the virtiofsd isn't killed by virPidFileForceCleanupPath() on the
    former failure because the pidfile was already removed by
    virFileDeleteTree(priv->libDir) in qemuProcessStop(), so
    virPidFileForceCleanupPath() just returned.

    Move qemuExtDevicesStop() before virFileDeleteTree(priv->libDir) so
    that virPidFileForceCleanupPath() can kill virtiofsd correctly.

    For example of the reproduction:

    # virsh start guest
    error: Failed to start domain guest
    error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -foo: invalid option

    ... fix the option ...

    # virsh start guest
    error: Failed to start domain guest
    error: Cannot open log file: '/var/log/libvirt/qemu/guest-fs0-virtiofsd.log': Device or resource busy
    #

    Signed-off-by: Masayoshi Mizuma <m.mizuma.com>
    Reviewed-by: Michal Privoznik <mprivozn>
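The ordering problem the commit describes can be illustrated with a toy sketch (plain shell, not libvirt code): a pidfile-based cleanup becomes a no-op once the pidfile has been deleted, so the daemon it tracked keeps running.

```shell
# Toy illustration of the bug: delete the pidfile first (analogous to
# virFileDeleteTree(priv->libDir)), then attempt pidfile-based cleanup
# (analogous to virPidFileForceCleanupPath). The cleanup no-ops and the
# stand-in daemon leaks.
pidfile=$(mktemp -u)            # path only; stands in for the per-domain pidfile
sleep 300 &                     # stand-in for virtiofsd
echo $! > "$pidfile"

rm -f "$pidfile"                # step 1: pidfile removed with the state dir

if [ -f "$pidfile" ]; then      # step 2: cleanup finds no pidfile...
  kill "$(cat "$pidfile")"
  echo "daemon killed"
else
  echo "pidfile already gone; daemon leaked"
fi

kill %1                         # tidy up the stand-in daemon
```

The fix reorders the real equivalents of these two steps, so the cleanup still sees the pidfile and can kill virtiofsd.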
*** This bug has been marked as a duplicate of bug 1897105 ***
Hi, all.

This issue still exists, and this bugzilla is different from BZ#1897105.

BZ#1897105: Failed to start a guest with virtiofsd ==> Correct error and start guest again ==> Hit error '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'

That path is not related to 'virsh destroy', and that error was fixed.

As the BZ#1808697 title says ('virtiofsd doesn't get killed with 'virsh destroy''), something is wrong with virsh destroy:

Log in to a guest with virtiofs ==> Run an xfstests case ==> Interrupt that case ==> virsh destroy guest ==> virsh start guest ==> Hit error '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'

I can reproduce this error on RHEL 8.4 and RHEL 9. For more details, please refer to BZ#1938936.
(In reply to Yiding Liu (Fujitsu) from comment #14)
> As BZ#1808697 title said 'virtiofsd doesn't get killed with 'virsh
> destroy'', something wrong with virsh destroy.
> Login a guest with virtiofs ==> Run a xfstest cases ==> Interupt that case
> ==> virsh destroy guest ==> virsh start guest ==> Hit error
> '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'
> I can reproduce this error on RHEL8.4 and RHEL9. More details please refer
> to BZ#1938936

Yiding,

Since the reproduction steps seem a bit different, would you open a new BZ? Would you add me to the CC list?

Thanks!
(In reply to Luiz Capitulino from comment #15)
> (In reply to Yiding Liu (Fujitsu) from comment #14)
>
> > As BZ#1808697 title said 'virtiofsd doesn't get killed with 'virsh
> > destroy'', something wrong with virsh destroy.
> > Login a guest with virtiofs ==> Run a xfstest cases ==> Interupt that case
> > ==> virsh destroy guest ==> virsh start guest ==> Hit error
> > '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'
> > I can reproduce this error on RHEL8.4 and RHEL9. More details please refer
> > to BZ#1938936
>
> Yiding,
>
> Since the reproduction steps seem a bit different, would you open
> a new BZ? Would add me to the CC list?
>
> Thanks!

Hi, Luiz.

The new BZ is BZ#1940276.

Since I could reproduce the error on both x86 and aarch64, I set HARDWARE to "all". Not sure whether that is correct :)
(In reply to Yiding Liu (Fujitsu) from comment #16)
> (In reply to Luiz Capitulino from comment #15)
> > (In reply to Yiding Liu (Fujitsu) from comment #14)
> >
> > > As BZ#1808697 title said 'virtiofsd doesn't get killed with 'virsh
> > > destroy'', something wrong with virsh destroy.
> > > Login a guest with virtiofs ==> Run a xfstest cases ==> Interupt that case
> > > ==> virsh destroy guest ==> virsh start guest ==> Hit error
> > > '/var/log/libvirt/qemu/XXX-virtiofsd.log': Device or resource busy'
> > > I can reproduce this error on RHEL8.4 and RHEL9. More details please refer
> > > to BZ#1938936
> >
> > Yiding,
> >
> > Since the reproduction steps seem a bit different, would you open
> > a new BZ? Would add me to the CC list?
> >
> > Thanks!
>
> Hi, Luiz.
>
> The new BZ is BZ#1940276.
> Since i could reproduce error on both x86 and aarch64, so i set HARDWARE as
> all. Not sure whether it is correct :)

This is correct. Thanks a lot for all this work, Yiding!