Bug 2167302

Summary: Instance spawn failing with "Failed to build and run instance: libvirt.libvirtError: internal error: Process exited prior to exec: libvirt: QEMU Driver error : failed to umount devfs on /dev: Device or resource busy"
Product: Red Hat Enterprise Linux 9
Component: libvirt
Sub component: General
Version: 9.2
Hardware: x86_64
OS: Linux
Reporter: Sandeep Yadav <sandyada>
Assignee: Michal Privoznik <mprivozn>
QA Contact: zhentang <zhetang>
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Triaged, Upstream
Target Milestone: rc
Fixed In Version: libvirt-9.0.0-4.el9
Last Closed: 2023-05-09 07:27:45 UTC
Type: Bug
CC: ailan, chhu, dasmith, dsmigiel, dzheng, eglynn, jdenemar, jhakimra, juzhou, kchamart, kthakre, lhuang, lmen, mprivozn, mxie, pgrist, pvlasin, rlandy, sbauza, sgordon, smooney, tyan, virt-maint, vromanso, wznoinsk, yicui, zhetang

Comment 8 Michal Privoznik 2023-02-07 12:58:54 UTC
This is a genuine libvirt bug. Fairly easy to reproduce:

  mount -t tmpfs tmpfs /dev/shm && mount -t tmpfs tmpfs /dev/shm

Let me move it over to libvirt.
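
For context on why that reproducer bites: while building the domain's private mount namespace, libvirt moves everything mounted under /dev aside, populates a new /dev, and umount()s the original one. With two tmpfs instances stacked on the same mount point, a single MS_MOVE relocates only the topmost one, leaving a mount behind under /dev. Here is a hypothetical standalone demo of that kernel behavior (not libvirt code; the paths are made up, and it is best run as root under "unshare -m" so the host mount table stays clean):

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mount.h>
  #include <sys/stat.h>

  int main(void)
  {
      const char *target = "/tmp/demo-shm";   /* stand-in for /dev/shm */
      const char *moved  = "/tmp/demo-moved"; /* temporary relocation point */

      mkdir(target, 0700);                    /* ignore EEXIST */
      mkdir(moved, 0700);

      /* Stack two tmpfs instances on the same mount point, mirroring
       * the reproducer above. */
      if (mount("tmpfs", target, "tmpfs", 0, NULL) < 0 ||
          mount("tmpfs", target, "tmpfs", 0, NULL) < 0) {
          perror("mount");
          return 1;
      }

      /* Move the mount aside, the way libvirt relocates file systems
       * found under /dev. Only the topmost mount is moved. */
      if (mount(target, moved, NULL, MS_MOVE, NULL) < 0) {
          perror("MS_MOVE");
          return 1;
      }

      /* If both tmpfs instances had moved, this umount() would fail
       * with EINVAL ("not a mount point"). Instead it succeeds,
       * proving the lower mount stayed behind. */
      if (umount(target) == 0)
          printf("a second tmpfs was still mounted on %s\n", target);
      else
          printf("umount(%s): %s\n", target, strerror(errno));

      umount(moved);                          /* clean up the moved mount */
      return 0;
  }

The final umount() succeeding is the tell: a second tmpfs was still mounted on the target after the move, and that leftover is exactly what makes the later umount of /dev fail with EBUSY.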

Comment 10 Michal Privoznik 2023-02-07 15:03:15 UTC
Patch posted on the list:

https://listman.redhat.com/archives/libvir-list/2023-February/237603.html

Comment 12 Michal Privoznik 2023-02-08 07:49:29 UTC
Aaaand I just pushed it as:

commit 5155ab4b2a704285505dfea6ffee8b980fdaa29e
Author:     Michal Prívozník <mprivozn>
AuthorDate: Tue Feb 7 15:06:32 2023 +0100
Commit:     Michal Prívozník <mprivozn>
CommitDate: Wed Feb 8 08:39:17 2023 +0100

    qemu_namespace: Deal with nested mounts when umount()-ing /dev
    
    In one of the recent commits (v9.0.0-rc1~106) I've made our QEMU
    namespace code umount the original /dev. One of the reasons was
    enhanced security, because previously we just mounted a tmpfs
    over the original /dev. Thus a malicious QEMU could just
    umount("/dev") and it would get to the original /dev with all
    nodes.
    
    Now, on some systems this introduced a regression:
    
       failed to umount devfs on /dev: Device or resource busy
    
    But how could this be? We've moved all file systems mounted under
    /dev to a temporary location. Or have we? As it turns out, not
    quite. If there are two file systems mounted on the same target,
    e.g. like this:
    
      mount -t tmpfs tmpfs /dev/shm/ && mount -t tmpfs tmpfs /dev/shm/
    
    then only the topmost (i.e. the last one) is moved. See
    qemuDomainUnshareNamespace() for more info.
    
    Now, we could enhance our code to deal with these "doubled" mount
    points. Or, since it is the topmost file system that is
    accessible anyway (and this one is preserved), we can
    umount("/dev") in a recursive fashion.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2167302
    Fixes: 379c0ce4bfed8733dfbde557c359eecc5474ce38
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Jim Fehlig <jfehlig>

v9.0.0-184-g5155ab4b2a
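
The gist of the change, per the commit message: stop treating /dev as a single mount to umount() and detach the whole subtree at once instead. A minimal sketch of that approach, assuming Linux's umount2() with MNT_DETACH is the recursive umount the message refers to (the helper name is mine; this is not the verbatim patch):

  #include <stdio.h>
  #include <sys/mount.h>

  /* Hypothetical helper name; in libvirt the real logic lives in
   * qemuDomainUnshareNamespace(). It is meant to run inside the
   * domain's freshly unshared mount namespace, never on the host. */
  static int demo_umount_old_dev(void)
  {
  #if defined(__linux__)
      /* Lazy, recursive detach: takes out /dev together with anything
       * still mounted (or stacked, as in this bug) below it. */
      if (umount2("/dev", MNT_DETACH) < 0) {
  #else
      /* Elsewhere, fall back to a plain single-mount umount(). */
      if (umount("/dev") < 0) {
  #endif
          perror("failed to umount devfs on /dev");
          return -1;
      }
      return 0;
  }

  int main(void)
  {
      /* Only safe to try in a scratch namespace, e.g. under "unshare -m". */
      return demo_umount_old_dev() != 0;
  }

The detach sidesteps the bookkeeping entirely: the topmost /dev/shm has already been preserved by the move, and whatever is left under the old /dev goes away with it.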

Comment 14 zhentang 2023-02-08 09:46:39 UTC
Tested on libvirt-9.0.0-3.el9; the bug can be reproduced.

Test steps:

[root@zhetang-mig ~]# mount -t tmpfs tmpfs /dev/shm && mount -t tmpfs tmpfs /dev/shm

[root@zhetang-mig ~]# virsh start agent-test
error: Failed to start domain 'agent-test'
error: internal error: Process exited prior to exec: libvirt: QEMU Driver error : failed to umount devfs on /dev: Device or resource busy

Comment 19 zhentang 2023-02-10 06:35:51 UTC
Pre-verified on libvirt-9.0.0-4.el9; the VM can start successfully.

[root@zhetang-zbug ~]# mount -t tmpfs tmpfs /dev/shm && mount -t tmpfs tmpfs /dev/shm
[root@zhetang-zbug ~]# virsh start agent-test
Domain 'agent-test' started

Comment 22 Waldemar Znoinski 2023-02-15 11:41:44 UTC
*** Bug 2170011 has been marked as a duplicate of this bug. ***

Comment 23 zhoujunqin 2023-02-16 05:51:49 UTC
Adding additional test results from the libvirt_osp integration test: PASS

1. I hit this bug in the RHEL9.2_OSP17.1 pre-integration test last week.

Package version: libvirt-9.0.0-3.el9
OSP puddle tag: 17.1_20230130.1

Test result: Failed to boot up the instance.
The following error appears in /var/log/libvirt/qemu/instance-00000001.log.1 on compute-0:

...
2023-02-09 06:57:50.681+0000: starting up libvirt version: 9.0.0, package: 3.el9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2023-02-01-08:06:20, ), qemu version: 7.2.0qemu-kvm-7.2.0-7.el9, kernel: 5.14.0-244.el9.x86_64, hostname: compute-0.localdomain
LC_ALL=C \
...
-msg timestamp=on
libvirt: QEMU Driver error : failed to umount devfs on /dev: Device or resource busy
2023-02-09 06:57:50.685+0000: shutting down, reason=failed

2. Verification: PASS
After this bug was fixed, I updated libvirt on the compute node to libvirt-daemon-9.0.0-4.el9.x86_64.

Test results: The instance can be created and boots up successfully.
Automation job: PASS
[1]https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/INT%20Runtest/view/Libvirt-OSP/job/RHEL-9.2-OSP-17.1-runtest-libvirt-hotplug/1/
[2]https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/INT%20Runtest/view/Libvirt-OSP/job/RHEL-9.2-OSP-17.1-runtest-libvirt-life-cycle/
[3]https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/INT%20Runtest/view/Libvirt-OSP/job/RHEL-9.2-OSP-17.1-runtest-libvirt-snapshot/

Comment 24 zhentang 2023-02-16 06:02:42 UTC
Verified on libvirt-9.0.0-5.el9; the guest can start correctly.

1. Mount tmpfs twice on /dev/shm:

[root@zhetang-zbug ~]# mount -t tmpfs tmpfs /dev/shm/ && mount -t tmpfs tmpfs /dev/shm/
[root@zhetang-zbug ~]# mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=4096k,nr_inodes=1048576,mode=755,inode64)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel,inode64)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=1574480k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=787236k,nr_inodes=196809,mode=700,inode64)
tmpfs on /dev/shm type tmpfs (rw,relatime,seclabel,inode64)
tmpfs on /dev/shm type tmpfs (rw,relatime,seclabel,inode64)

2. Start the guest VM:

[root@zhetang-zbug ~]# virsh start agent-test
Domain 'agent-test' started

[root@zhetang-zbug ~]# virsh list 
 Id   Name         State
----------------------------
 2    agent-test   running


3. Check the mount table inside the QEMU process's mount namespace:

[root@zhetang-zbug ~]# cat /proc/5127/mounts | grep tmpfs
tmpfs /dev/shm tmpfs rw,seclabel,relatime,inode64 0 0
tmpfs /run tmpfs rw,seclabel,nosuid,nodev,size=1574480k,nr_inodes=819200,mode=755,inode64 0 0
tmpfs /run/user/0 tmpfs rw,seclabel,nosuid,nodev,relatime,size=787236k,nr_inodes=196809,mode=700,inode64 0 0
devfs /dev tmpfs rw,seclabel,nosuid,relatime,size=64k,mode=755,inode64 0 0

Comment 26 errata-xmlrpc 2023-05-09 07:27:45 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171