Bug 1495511

| Field | Value |
|---|---|
| Summary | Guest could not start with hugepages |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Junxiang Li <junli> |
| Component | libvirt |
| Assignee | Michal Privoznik <mprivozn> |
| Status | CLOSED ERRATA |
| QA Contact | chhu |
| Severity | medium |
| Priority | medium |
| Version | 7.5 |
| CC | dyuan, dzheng, gsun, junli, lhuang, mprivozn, rbalakri, xuzhang, yalzhang, yisun |
| Target Milestone | rc |
| Keywords | Automation, Regression, Upstream |
| Hardware | All |
| OS | Linux |
| Fixed In Version | libvirt-3.9.0-1.el7 |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2018-04-10 10:57:19 UTC |
Description (Junxiang Li, 2017-09-26 08:44:46 UTC)

Comment #3 (Michal Privoznik):

I'm unable to reproduce this issue. Can you please attach full debug logs? http://wiki.libvirt.org/page/DebugLogs

Comment:

I also tried to reproduce this bug with the same versions on x86_64:

```
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
libvirt-3.7.0-2.el7.x86_64
```

But I can't reproduce it on my two x86_64 servers.

Created attachment 1336200 [details]
failed log
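The full-debug-logs request earlier in the thread points at the libvirt DebugLogs wiki page; a typical /etc/libvirt/libvirtd.conf fragment for that (a sketch based on the linked page — adjust the filter list and log path to your setup) looks like:

```
# Capture detailed messages from the qemu driver and core libvirt code,
# and write them to a dedicated log file instead of the journal.
log_filters="1:libvirt 1:util 1:qemu"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

After editing, restart libvirtd so the settings take effect.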
Reply to comment #3:

Sorry — between step 4 and step 5, the libvirtd service needs to be restarted:

```
4.5 systemctl restart libvirtd
```

Comment #7 (Jing Qi):

I ran more tests on x86_64. If I appended "intel_iommu=on default_hugepagesz=2M hugepagesz=2M hugepages=2000" to the GRUB_CMDLINE_LINUX line in /etc/default/grub and then did the configuration described in the bug, it worked correctly.

But if nothing was added to /etc/default/grub, or "intel_iommu=on default_hugepagesz=1G hugepages=20 hugepagesz=1G hugepagesz=2M" was appended (note: the default hugepage size is then 1G), an error was reported:

```
[root@intel-wildcatpass-03 ~]# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: Unable to find any usable hugetlbfs mount for 2048 KiB
```

I forgot to mention: /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages does exist when "intel_iommu=on default_hugepagesz=1G hugepages=20 hugepagesz=1G hugepagesz=2M" is appended, but the error still occurs.

Comment #9 (Michal Privoznik), in reply to comment #7:

> [quotes comment #7]

This is expected. You don't have any hugetlbfs mounted for handling 2MB huge pages.
If you mount it:

```
mount -t hugetlbfs hugetlbfs /path/to/hugepages2M -o pagesize=2M
```

and restart libvirt, you should be able to start the domain. Well, as long as you allow it to use 2MB hugepages (because the domain XML requires 16MB).

I wonder if this is a namespace bug. I mean, libvirt is creating the path, as can be seen in the log:

```
2017-10-09 08:30:40.680+0000: 85039: debug : virFileMakePathHelper:2912 : path=/dev/hugepages16M/libvirt/qemu/2-junli1495511 mode=0700
2017-10-09 08:30:40.680+0000: 85039: debug : virFileMakePathHelper:2912 : path=/dev/hugepages16M/libvirt/qemu mode=0700
2017-10-09 08:30:40.680+0000: 85039: info : virSecuritySELinuxSetFileconHelper:1156 : Setting SELinux context on '/dev/hugepages16M/libvirt/qemu/2-junli1495511' to 'system_u:object_r:svirt_image_t:s0:c21,c878'
2017-10-09 08:30:40.681+0000: 85039: info : virSecurityDACSetOwnershipInternal:556 : Setting DAC user and group on '/dev/hugepages16M/libvirt/qemu/2-junli1495511' to '107:107'
```

However, by default, qemu runs with a limited copy of /dev. So perhaps libvirt is not preserving the /dev/hugepages16M/ path? You can check this by disabling the mount namespace: just set namespaces = [] in qemu.conf (and don't forget to restart libvirt).

Reply to comment #9 (Jing Qi):

> [quotes comment #9]

I need to clarify: I created a new hugetlbfs path, /dev/hugepages2M, during my test with "intel_iommu=on default_hugepagesz=1G hugepages=20 hugepagesz=1G hugepagesz=2M" configured. I also ran the mount command:

```
mount -t hugetlbfs hugetlbfs /dev/hugepages2M
```

and restarted libvirt. The domain still failed to start, as described in comment 7. That's the issue I meant. Do you think it's expected?

Reply:

Please ignore my last comment. I tried with 16M hugepages and reproduced the bug. I also checked by disabling the mount namespace (setting namespaces = [] in qemu.conf) and restarting libvirtd. Trying to start the domain still hits the error:

```
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: Unable to find any usable hugetlbfs mount for 15625 KiB
```

On checking, no /sys/devices/system/node/nodeX/hugepages/hugepages-15625kB directory exists. (Uploading the log file currently hits some service issues.)

Reply:

I learned that 16M pages are only supported on PPC, so the failure on x86 is expected. Please ignore comments 12 & 13. Thanks!

Reply to comment #9:

> [quotes comment #9]

Yes, the guest will start after setting namespaces = [] in qemu.conf.

Comment (Michal Privoznik):

I've found the root cause of this problem.
Here are the steps to reproduce it:

```
# mkdir /dev/hugepages
# mount -t tmpfs tmpfs /dev/hugepages/
# mkdir /dev/hugepages2M
# mount -t hugetlbfs hugetlbfs /dev/hugepages2M/
# echo 'hugetlbfs_mount=["/dev/hugepages2M"]' >> /etc/libvirt/qemu.conf
# systemctl restart libvirtd
# virsh start $guest
```

where $guest is any domain with hugepages. BTW, this is not ppc-specific: the bug reproduces on other arches too, and with these steps I even got it to reproduce on my devel x86_64 machine.

The problem is that when libvirt starts a domain, it creates a separate mount namespace for it so that it can construct a private /dev there. While it tries to preserve every mount point under /dev, it also applies a heuristic: if /dev/foo/bar and /dev/foo are both mount points, only /dev/foo needs to be preserved — /dev/foo/bar gets preserved with it, so libvirt need not touch it. And this is where the problem lies: we are doing plain prefix matching to identify such cases, which works for [/dev/foo/bar, /dev/foo] but doesn't for [/dev/hugepages2M, /dev/hugepages]. Working on the patch.

Comment (Michal Privoznik):

Patch proposed on the upstream list: https://www.redhat.com/archives/libvir-list/2017-October/msg00851.html

Comment (Michal Privoznik):

And I've just pushed the patch upstream:

```
commit 4f1570720218302b749dd4bad243509c0f5c45a5
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Oct 19 15:23:15 2017 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Oct 19 17:33:31 2017 +0200

    qemu-ns: Detect /dev/* mount point duplicates better

    https://bugzilla.redhat.com/show_bug.cgi?id=1495511

    When creating new /dev for domain ran in namespace we try to
    preserve all sub-mounts of /dev. Well, not quite all. For
    instance if /dev/foo/bar and /dev/foo are both mount points,
    only /dev/foo needs preserving. /dev/foo/bar is preserved with
    it too. Now, to identify such cases like this one STRPREFIX()
    is used. That is not good enough. While it works for
    [/dev/foo/bar; /dev/foo] case, it fails for
    [/dev/prefix; /dev/prefix2] where the strings share the same
    prefix but are in fact two different paths. The solution is to
    use STRSKIP().

    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Erik Skultety <eskultet>

v3.8.0-200-g4f1570720
```

Verification:

1. For x86_64:

Reproduced the issue on packages:

```
libvirt-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
```

Verified on packages:

```
libvirt-3.9.0-6.el7.x86_64
qemu-kvm-rhev-2.10.0-14.el7.x86_64
```

Test steps:

a. 2M hugepages:

1) Check that /dev/hugepages exists, and mount it:

```
# mount -t tmpfs tmpfs /dev/hugepages/
```

2) Create the directory /dev/hugepages2M and mount it:

```
# mkdir /dev/hugepages2M
# mount -t hugetlbfs hugetlbfs /dev/hugepages2M/
```

3) Edit qemu.conf to include the line:

```
hugetlbfs_mount=["/dev/hugepages2M"]
```

4) Restart the libvirtd service:

```
# systemctl restart libvirtd
```

5) Start a guest with hugepages successfully, and check on the qemu command line that the mem-path is under /dev/hugepages2M/libvirt/:

```
# virsh start r7
Domain r7 started

# virsh dumpxml r7
<domain type='kvm' id='1'>
  <name>r7</name>
  <uuid>fb80bcc5-5b7b-4cbf-951f-1463e922c218</uuid>
  <memory unit='KiB'>513024</memory>
  <currentMemory unit='KiB'>512576</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='2'/>
    </hugepages>
    <nosharepages/>
    <locked/>
    <source type='file'/>
    <access mode='shared'/>
    <allocation mode='immediate'/>
  </memoryBacking>
  <vcpu placement='static'>1</vcpu>
  <resource>
  ......
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.5.0'>hvm</type>
    <boot dev='hd'/>
  </os>

# ps -ef | grep r7 | grep huge
qemu 43125 1 0 Jan10 ? 00:00:31 /usr/libexec/qemu-kvm -name guest=r7,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-r7/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -cpu SandyBridge,vme=on,ss=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on -m 501 -mem-prealloc -mem-path /dev/hugepages2M/libvirt/qemu/1-r7 -realtime mlock=on -smp 1,sockets=1,cores=1,threads=1 ......

# virsh freepages --all
Node 0:
4KiB: 1949038
2048KiB: 773
1048576KiB: 0
...
```

b. 1G hugepages:

1) Environment settings:

```
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G
```

Edit qemu.conf:

```
hugetlbfs_mount=["/dev/hugepages1G"]
```

```
# systemctl restart libvirtd
```

2) Start a guest with 1G hugepages successfully. Check the qemu command line; the mem-path is under /dev/hugepages1G:

```
# virsh start r7
Domain r7 started

# ps -ef | grep qemu | grep hugep
qemu 4322 1 99 04:27 ? 00:00:29 /usr/libexec/qemu-kvm -name guest=r7,...... -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages1G/libvirt/qemu/1-r7,size=1073741824,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=1073741824,host-nodes=0,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 ......

# virsh dumpxml r7
<domain type='kvm' id='1'>
  <name>r7</name>
  <uuid>21a8301a-0ede-4f14-ad94-e7250be67120</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  ......
```

c. 2M + 1G hugepages:

1) Mount for both 2M and 1G page sizes:
```
# mount | grep hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages1G type hugetlbfs (rw,relatime,seclabel,pagesize=1G)
hugetlbfs on /dev/hugepages2M type hugetlbfs (rw,relatime,seclabel,pagesize=2M)
```

Edit qemu.conf:

```
hugetlbfs_mount=["/dev/hugepages2M", "/dev/hugepages1G"]
```

```
# systemctl restart libvirtd
# virsh freepages --all
Node 0:
4KiB: 1249440
2048KiB: 2048
1048576KiB: 2
Node 1:
4KiB: 2460758
2048KiB: 0
1048576KiB: 2
```

2) Start a guest with 2M/1G hugepages successfully.

2. For ppc64:

Reproduced on package: libvirt-3.7.0-1.el7.ppc64le
Passed on package: libvirt-3.9.0-7.el7.ppc64le

Ran script:

```
rhev.guest_numa.possitive_test.hugepage.per_node.16M.no_topo.no_numatune_memnode.no_numatune_mem
```

According to the test results above, setting the bug status to "VERIFIED".

Comment:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704