Bug 2151869
Summary: libvirt kills virtual machine on restart when 2M and 1G hugepages are mounted

Product: Red Hat Enterprise Linux 9
Component: libvirt (sub component: General)
Version: 9.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Jaroslav Suchanek <jsuchane>
Assignee: Michal Privoznik <mprivozn>
QA Contact: liang cong <lcong>
CC: ailan, duclee, dzheng, gveitmic, haizhao, jdenemar, jsuchane, lcong, lmen, mprivozn, virt-maint, yafu, yalzhang, ymankad
Keywords: Triaged, Upstream, ZStream
Flags: lcong: needinfo+
Target Milestone: rc
Target Release: 9.0
Target Upstream Version: 9.0.0
Fixed In Version: libvirt-9.0.0-1.el9
Clone Of: 2123196
Clones: 2152083, 2152084, 2155189 (view as bug list)
Bug Depends On: 2123196, 2155189
Bug Blocks: 2132176, 2132177, 2132178, 2152083, 2152084
Type: Bug
Last Closed: 2023-05-09 07:27:43 UTC
Description (Jaroslav Suchanek, 2022-12-08 11:53:48 UTC)
Merged upstream:

0377177c78 qemu_process.c: Propagate hugetlbfs mounts on reconnect
5853d70718 qemu_namespace: Introduce qemuDomainNamespaceSetupPath()
46b03819ae qemu_namespace: Fix a corner case in qemuDomainGetPreservedMounts()
687374959e qemu_namespace: Tolerate missing ACLs when creating a path in namespace

v8.7.0-134-g0377177c78

And one follow-up:

3478cca80e (tag: v8.8.0-rc2) qemuProcessReconnect: Don't build memory paths
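For context on what "propagate hugetlbfs mounts" means here: libvirt normally runs each QEMU process in a private mount namespace, so a hugetlbfs mounted on the host after the guest has started only becomes usable by the guest once libvirt replicates the mount into that namespace. A minimal way to compare the two views, assuming a single running qemu-kvm process and that findmnt and nsenter are available (an illustrative sketch, not commands from the original report):

# findmnt -t hugetlbfs
# nsenter --target "$(pidof qemu-kvm)" --mount findmnt -t hugetlbfs

A hugetlbfs mount that shows up in the first listing but not in the second has not been propagated into the guest's namespace.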
Hi Michal,

I found an issue when pre-verifying on upstream build libvirt v8.10.0-126-g8908615ef3. Since the current upstream build is two months later than your commit, I am not sure whether some other patch affects the code; please help clarify, thanks.

Verify steps:

1. Prepare huge page memory:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2. Define a guest with the memoryBacking XML below:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>

3. Start the VM and stop virtqemud:
# virsh start vm1 && systemctl stop virtqemud
Domain 'vm1' started
Warning: Stopping virtqemud.service, but it can still be activated by:
 virtqemud-admin.socket
 virtqemud.socket
 virtqemud-ro.socket

4. Mount a 1G hugepage path:
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

5. Run virsh list; the guest is still in the running state:
# virsh -r list --all
 Id   Name   State
----------------------
 1    vm1    running

6. Prepare a memory device hotplug XML like below:
# cat dimm1G.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>1048576</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

7. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm1G.xml
Device attached successfully

8. Prepare a memory device with a 2M hugepage source, with hotplug XML like below:
# cat dimm2M.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>2048</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

9. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm1G.xml
error: Failed to attach device from dimm1G.xml
error: internal error: unable to execute QEMU command 'object-add': can't open backing store /dev/hugepages1G/libvirt/qemu/1-vm1 for guest RAM: No such file or directory

Hi Michal,

I found an issue when pre-verifying on scratch build libvirt-8.10.0-3.el9_rc.655748269d.x86_64.

Verify steps:

1. Prepare huge page memory:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# echo 3072 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2. Define a guest with the memoryBacking XML below:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>

3. Start the VM and stop virtqemud:
# virsh start vm1 && systemctl stop virtqemud
Domain 'vm1' started
Warning: Stopping virtqemud.service, but it can still be activated by:
 virtqemud-admin.socket
 virtqemud.socket
 virtqemud-ro.socket

4. Mount a 1G hugepage path:
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

5. Run virsh list; the guest is still in the running state:
# virsh -r list --all
 Id   Name   State
----------------------
 1    vm1    running

6. Prepare a memory device hotplug XML like below:
# cat dimm1G.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>1048576</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

7. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm1G.xml
Device attached successfully

8. Prepare a memory device with a 2M hugepage source, with hotplug XML like below:
# cat dimm2M.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>2048</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

9. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm2M.xml
Device attached successfully

10. Shut off the guest VM:
# virsh destroy vm1
Domain 'vm1' destroyed

11. Restart virtqemud:
# systemctl restart virtqemud

12. Start the guest VM again:
# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: Process exited prior to exec: libvirt: QEMU Driver error : failed to umount devfs on /dev: Device or resource busy

Additional info: I didn't find this issue during the pre-verification of bug 2152083 and bug 2152084.
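The "Device or resource busy" in step 12 is an EBUSY returned by umount while libvirt's child process rebuilds its mount namespace, so the failing state lives in a namespace that exits immediately and is hard to inspect after the fact. For a busy mount that persists, generic diagnostics would be (illustrative commands, not taken from the original thread):

# fuser -vm /dev
# findmnt -R /dev

fuser -vm lists processes holding files open on the mount; findmnt -R lists submounts, which also keep a mount point busy.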
Comment #9 (Michal Privoznik):

> Additional info: I didn't find this issue during the pre-verification of
> bug 2152083 and bug 2152084

Yeah, that is older RHEL (9.0.0 and 9.1.0). Similarly, this worked for RHEL-8 (bug 2123196). I wonder what has changed. Let me debug. Meanwhile, what's your kernel version and the output of 'mount', please?

Comment #10 (liang cong), in reply to Michal Privoznik from comment #9:

Kernel version: 5.14.0-200.el9.x86_64

I retested on build libvirt-8.10.0-3.el9_rc.655748269d.x86_64 with qemu-kvm-7.1.0-6.el9.x86_64; the mount info is below:

1. Prepare huge page memory:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# echo 3072 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2. Define a guest with the memoryBacking XML below:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>

3. Start the VM and stop virtqemud:
# virsh start vm1 && systemctl stop virtqemud
Domain 'vm1' started
Warning: Stopping virtqemud.service, but it can still be activated by:
 virtqemud-admin.socket
 virtqemud.socket
 virtqemud-ro.socket

4. Mount a 1G hugepage path:
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

5. Run virsh list; the guest is still in the running state:
# virsh -r list --all
 Id   Name   State
----------------------
 1    vm1    running

6. Check mount info:
# mount | grep dev
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=4096k,nr_inodes=1048576,mode=755,inode64)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=13059680k,nr_inodes=819200,mode=755,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
/dev/mapper/rhel_dell--per740xd--16-root on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,seclabel)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2M)
none on /run/credentials/systemd-sysusers.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
/dev/mapper/rhel_dell--per740xd--16-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/sda1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=6529836k,nr_inodes=1632459,mode=700,inode64)
hugetlbfs on /dev/hugepages1G type hugetlbfs (rw,relatime,seclabel,pagesize=1024M)

7. Check the mount namespace of qemu:
# cat /proc/`pidof qemu-kvm`/mountinfo | grep dev
436 435 253:0 / / rw,relatime master:1 - xfs /dev/mapper/rhel_dell--per740xd--16-root rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
437 436 0:22 / /sys rw,nosuid,nodev,noexec,relatime master:2 - sysfs sysfs rw,seclabel
438 437 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime master:3 - securityfs securityfs rw
439 437 0:26 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime master:4 - cgroup2 cgroup2 rw,seclabel,nsdelegate,memory_recursiveprot
440 437 0:27 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime master:5 - pstore pstore rw,seclabel
441 437 0:28 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime master:6 - bpf bpf rw,mode=700
443 437 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime master:14 - debugfs debugfs rw,seclabel
444 437 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime master:17 - tracefs tracefs rw,seclabel
445 437 0:32 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime master:18 - fusectl fusectl rw
446 437 0:33 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime master:19 - configfs configfs rw
448 460 0:23 / /dev/shm rw,nosuid,nodev master:9 - tmpfs tmpfs rw,seclabel,inode64
449 460 0:24 / /dev/pts rw,nosuid,noexec,relatime master:10 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
450 460 0:19 / /dev/mqueue rw,nosuid,nodev,noexec,relatime master:15 - mqueue mqueue rw,seclabel
451 460 0:31 / /dev/hugepages rw,relatime master:16 - hugetlbfs hugetlbfs rw,seclabel,pagesize=2M
452 436 0:25 / /run rw,nosuid,nodev master:11 - tmpfs tmpfs rw,seclabel,size=13059680k,nr_inodes=819200,mode=755,inode64
453 452 0:34 / /run/credentials/systemd-sysusers.service ro,nosuid,nodev,noexec,relatime master:20 - ramfs none rw,mode=700
454 452 0:39 / /run/user/0 rw,nosuid,nodev,relatime master:192 - tmpfs tmpfs rw,seclabel,size=6529836k,nr_inodes=1632459,mode=700,inode64
455 436 0:21 / /proc rw,nosuid,nodev,noexec,relatime master:12 - proc proc rw
457 436 253:2 / /home rw,relatime master:39 - xfs /dev/mapper/rhel_dell--per740xd--16-home rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
458 436 8:1 / /boot rw,relatime master:45 - xfs /dev/sda1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
460 436 0:41 / /dev rw,nosuid,relatime - tmpfs devfs rw,seclabel,size=64k,mode=755,inode64

8. Prepare a memory device hotplug XML like below:
# cat dimm1G.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>1048576</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

9. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm1G.xml
error: Failed to attach device from dimm1G.xml
error: internal error: unable to execute QEMU command 'object-add': can't open backing store /dev/hugepages1G/libvirt/qemu/1-vm1 for guest RAM: No such file or directory

It seems the mount is not propagated to the qemu process. I will try on the latest RHEL 9.2 to see if anything changes.
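A hedged reading of the mountinfo dump above: every entry that carries a master:N field is a slave of a host peer group and therefore receives mount events from the host, but the /dev entry (the tmpfs named "devfs", entry 460) carries no master: field at all. In other words, the /dev that libvirt constructed inside the namespace is private, so a hugetlbfs mounted on the host under /dev after the namespace was built cannot appear beneath it. The propagation mode can also be queried directly, for example (an illustrative command, not from the original thread):

# nsenter --target "$(pidof qemu-kvm)" --mount findmnt -o TARGET,FSTYPE,PROPAGATION /dev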
I tested on build:
# rpm -q libvirt qemu-kvm
libvirt-8.10.0-3.el9_rc.655748269d.x86_64
qemu-kvm-7.1.0-6.el9.x86_64
Kernel: 5.14.0-212.el9.x86_64

I still get the same issue as in comment #10.

Indeed. There's something terribly broken (in the kernel, perhaps?):

kernel-5.14.0-212.el9.x86_64
systemd-252-2.el9.x86_64

1) Just to make sure mount events are being propagated:
# mount --make-rshared /

2) Check hugetlbfs mounts:
# mount | grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2M)

3) In another terminal, unshare the mount namespace:
# unshare -m

4) Verify that the hugetlbfs is mounted (from inside the NS):
# mount | grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2M)

5) Now mount another hugetlbfs (from the parent NS, NOT the one just unshared):
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

6) Verify whether the mount got propagated (from the NS):
# mount | grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2M)

The new mount does not appear. Therefore, this is completely independent of libvirt and should be reported against the kernel for further investigation. Then we can revisit our bug.
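One caveat about this reproduction (an observation the thread does not make): since util-linux 2.27, unshare creates the new mount namespace with --propagation private by default, which recursively marks its mounts private and would by itself keep the mount from step 5 out of step 6's output, regardless of kernel behaviour. To observe propagation as the kernel actually applies it, the default has to be overridden:

# unshare -m --propagation unchanged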
Comment #15 (Michal Privoznik):

Alright, after a thorough investigation I've merged two commits that fix the problem:

4a91324b61 qemu_namespace: Fix detection of nested mount points
379c0ce4bf qemu_namespace: Umount the original /dev before replacing it with tmpfs

v8.10.0-174-g4a91324b61

Pre-verified on upstream build libvirt v8.10.0-176-g6cd2b4e101, with:
# rpm -q qemu-kvm
qemu-kvm-7.2.0-1.fc38.x86_64

Verify steps:

1. Prepare huge page memory:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# echo 3072 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2. Define a guest with the memoryBacking XML below:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>

3. Start the VM and stop virtqemud:
# virsh start vm1 && systemctl stop virtqemud
Domain 'vm1' started
Warning: Stopping virtqemud.service, but it can still be activated by:
 virtqemud-ro.socket
 virtqemud-admin.socket
 virtqemud.socket

4. Mount a 1G hugepage path:
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

5. Run virsh list; the guest is still in the running state:
# virsh -r list --all
 Id   Name   State
----------------------
 1    vm1    running

6. Prepare a memory device hotplug XML like below:
# cat dimm1G.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>1048576</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

7. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm1G.xml
Device attached successfully

8. Prepare a memory device with a 2M hugepage source, with hotplug XML like below:
# cat dimm2M.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>2048</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

9. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm2M.xml
Device attached successfully

10. Shut off the VM:
# virsh destroy vm1
Domain 'vm1' destroyed

11. Restart virtqemud:
# systemctl restart virtqemud

12. Start the VM:
# virsh start vm1
Domain 'vm1' started

Also checked the scenarios below:

Steps:
1. memory-backing-2M guest VM start -> stop virtqemud -> mount 1G path -> start virtqemud -> hotplug 1G dimm -> restart VM -> restart virtqemud -> hotplug 1G dimm
2. mount 1G path -> memory-backing-2M guest VM start -> restart virtqemud -> hotplug 1G dimm -> restart virtqemud -> restart VM -> hotplug 1G dimm

Tested with these settings: remember_owner=1 or 0, memfd memory backing, default memory backing, 1G hugepage memory backing, 1G hugepage path as /mnt/hugepages1G.

Comment #17 (liang cong), in reply to Michal Privoznik from comment #15:

> Alright, after a thorough investigation I've merged two commits that fix
> the problem:
>
> 4a91324b61 qemu_namespace: Fix detection of nested mount points
> 379c0ce4bf qemu_namespace: Umount the original /dev before replacing it
> with tmpfs
>
> v8.10.0-174-g4a91324b61

Hi Michal,
Should we backport these two patches to the RHEL 9.1 and 9.0 or the RHEL 8.6, 8.7, and 8.8 builds? And could you help clarify why I did not see the issue fixed by these two patches on the RHEL 9.1 and 9.0 z-stream builds? Thanks a lot.

Comment #18 (Michal Privoznik), in reply to liang cong from comment #17:

> Should we backport these two patches to the RHEL 9.1 and 9.0 or the
> RHEL 8.6, 8.7, and 8.8 builds?

I'd rather avoid that. This was a very marginal scenario to begin with, and the bug that those two commits fix is just a portion of that scenario. I think it's safe to assume nobody will run into that situation.

> And could you help clarify why I did not see the issue fixed by these two
> patches on the RHEL 9.1 and 9.0 z-stream builds?

Because of how things work under the hood. What the first patch fixes is the following: previously, libvirt would fail to see that /dev/hugepages and /dev/hugepages1G are two different directories, because it used a plain prefix comparison, and the former is indeed a prefix of the latter. If you used different directories, e.g. /dev/hugepages and /dev/myhugepages1G, then everything would work even without those two patches.

Having said all of this, I think we can agree that this is a very rare situation (and we even document that users should set up their mount points upfront). Therefore, I think we don't need any more backports.
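To illustrate the pitfall described above (a minimal shell sketch of the comparison logic, not the actual libvirt code): a plain string-prefix test treats /dev/hugepages1G as nested under /dev/hugepages, whereas a correct nesting test must also require a path separator, or the end of the string, immediately after the prefix.

A naive prefix test matches, wrongly treating the 1G mount as nested:
# p=/dev/hugepages; q=/dev/hugepages1G
# case $q in "$p"*) echo nested;; *) echo separate;; esac
nested

A component-aware test requires '/' (or end of string) after the prefix:
# case $q in "$p"/*|"$p") echo nested;; *) echo separate;; esac
separate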
Verified on build libvirt-9.0.0-3.el9.x86_64.

Verify steps:

1. Prepare huge page memory:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# echo 3072 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2. Define a guest with the memoryBacking XML below:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>

3. Start the VM and stop virtqemud:
# virsh start vm1 && systemctl stop virtqemud
Domain 'vm1' started
Warning: Stopping virtqemud.service, but it can still be activated by:
 virtqemud-admin.socket
 virtqemud.socket
 virtqemud-ro.socket

4. Mount a 1G hugepage path:
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

5. Run virsh list; the guest is still in the running state:
# virsh -r list --all
 Id   Name   State
----------------------
 1    vm1    running

6. Prepare a memory device hotplug XML like below:
# cat dimm1G.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>1048576</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

7. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm1G.xml
Device attached successfully

8. Prepare a memory device with a 2M hugepage source, with hotplug XML like below:
# cat dimm2M.xml
<memory model='dimm'>
  <source>
    <pagesize unit='KiB'>2048</pagesize>
    <nodemask>0-1</nodemask>
  </source>
  <target>
    <size unit='KiB'>1048576</size>
    <node>0</node>
  </target>
</memory>

9. Hotplug the dimm memory device:
# virsh attach-device vm1 dimm2M.xml
Device attached successfully

10. Shut off the VM:
# virsh destroy vm1
Domain 'vm1' destroyed

11. Restart virtqemud:
# systemctl restart virtqemud

12. Start the VM:
# virsh start vm1
Domain 'vm1' started

Also checked the scenarios below:

Steps:
1. memory-backing-2M guest VM start -> stop virtqemud -> mount 1G path -> start virtqemud -> hotplug 1G dimm -> restart VM -> restart virtqemud -> hotplug 1G dimm
2. mount 1G path -> memory-backing-2M guest VM start -> restart virtqemud -> hotplug 1G dimm -> restart virtqemud -> restart VM -> hotplug 1G dimm

Tested with these settings: remember_owner=1 or 0, memfd memory backing, default memory backing, 1G hugepage memory backing, 1G hugepage path as /mnt/hugepages1G.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171