Bug 1300680
| Field | Value |
|---|---|
| Summary | OVS-DPDK failed to boot more than 1 instance on OVS-DPDK setup |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 8.0 (Liberty) |
| Target Release | 8.0 (Liberty) |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Whiteboard | hot |
| Keywords | TechPreview, ZStream |
| Reporter | Eran Kuris <ekuris> |
| Assignee | Sahid Ferdjaoui <sferdjao> |
| QA Contact | Prasanth Anbalagan <panbalag> |
| CC | amuller, berrange, chrisw, dasmith, dshaks, edannon, editucci, eglynn, ekuris, fpan, jdonohue, jean-mickael.guerin, joycej, jschluet, kchamart, mlopes, nyechiel, pcm, samuel.gauthier, sbauza, sferdjao, sgordon, srevivo, twilson, vincent.jardin, vromanso |
| Fixed In Version | openstack-nova-12.0.4-5.el7ost |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-08-31 17:36:13 UTC |
| Bug Blocks | 1194008, 1295530, 1300693 |
Description (Eran Kuris, 2016-01-21 12:50:16 UTC)
"os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM" seems like a pretty straightforward error message. Can you paste the output of `cat /proc/meminfo | grep HugePages`?

```
[root@puma48 ~]# cat /proc/meminfo | grep HugePages
AnonHugePages:    376832 kB
HugePages_Total:       8
HugePages_Free:        5
HugePages_Rsvd:        0
HugePages_Surp:        0
```

If you are trying to boot 4 instances, each with 2 GB of RAM, and (at least) 1 GB reserved for OVS, that would definitely fail if you only have 8 x 1 GB hugepages to begin with. I would expect more than 1 to work, though.

I logged into the test VM and can verify that it fails to boot the second VM, whether booted separately or via --num-instances=2. Since this is a memory-related issue, it isn't neutron-related, and I'm not sure what the cause is. Moving to openstack-nova.

Terry, with a regular flavor that does not use hugepages I can create a few VMs. When I used the "dpdk-flavor" I could not create more than 1 VM. I tried both booting separately and via --num-instances=2.

In order to boot additional VMs of 2 GB each, the blueprint will need to configure more than just 8 GB of hugepages on the host. The output shows that 5 GB of hugepages were in use, leaving only 3 GB of hugepages left. `numastat -c qemu` will dynamically show the number of hugepages in use by KVM.

I am not sure that's exact. First of all, each VM is 1 GB, and I tried to extend the memory usage and I see the same issue.

(In reply to Eran Kuris from comment #5)
> Terry, with a regular flavor that does not use hugepages I can create a few VMs. When I used the "dpdk-flavor" I could not create more than 1 VM. I tried both booting separately and via --num-instances=2.

What you can do with regular VMs isn't really directly relevant here: when you aren't using huge pages for the VM, overcommit is allowed, and the default ratio is 16:1, so to the nova-scheduler your 8 GB machine actually looks like a 128 GB machine in that case.
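The accounting difference described in that reply can be sketched as follows (a minimal illustration with hypothetical function names, not Nova's scheduler code):

```python
def apparent_ram_mb(physical_mb, ram_allocation_ratio=16.0):
    """Non-hugepage guests allow overcommit: the scheduler sees
    physical RAM multiplied by the allocation ratio."""
    return physical_mb * ram_allocation_ratio

def hugepage_guests_fit(total_1g_pages, pages_for_ovs, guest_sizes_gb):
    """Hugepage-backed guests cannot overcommit: every guest needs
    whole 1 GB pages, and pages held by OVS are unavailable."""
    free = total_1g_pages - pages_for_ovs
    for size in guest_sizes_gb:
        if size > free:
            return False
        free -= size
    return True

# Under 16:1 overcommit an 8 GB host looks like 128 GB to the scheduler,
# which is why regular flavors boot without trouble...
print(apparent_ram_mb(8192))                    # 131072.0
# ...while with 8 x 1 GB hugepages and 1 page held by OVS, four 2 GB
# hugepage-backed guests cannot fit.
print(hugepage_guests_fit(8, 1, [2, 2, 2, 2]))  # False
```

This is why the same host can boot "a few" regular VMs but only a handful of hugepage-backed ones.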
(In reply to Eran Kuris from comment #7)
> I am not sure that's exact. First of all, each VM is 1 GB, and I tried to extend the memory usage and I see the same issue.

This is very confusing: you say each VM is 1 GB, but the VMs in your reproducer steps are 2 GB with 2 vCPUs. That is what Terry, Shak, and I are referring to, and as Shak said in comment #6 there are only 3 x 1 GB pages free based on the output you provided in comment #2. That is not enough to boot the 2 x 2 GB hugepage-backed VMs requested. If you want to try to reproduce again, it would be good to have the /proc/meminfo content before and after each boot request.

Stephen: when I logged into his machine, it showed 5 x 1 GB hugepages free out of 8. 1 GB was used by OVS and 2 GB was used by the first VM. Booting a *single* extra VM (which should be 2 GB of memory, bringing the total to 2) failed despite 5 GB being available.

Eoghan, I would like to set up a session so we can debug the setup and progress this task. :-)

I can confirm that the flavor is 1 GB and not 2 GB.

```
[root@puma53 ~(keystone_admin)]# nova flavor-show m1.medium_dpdk
+----------------------------+--------------------------------------+
| Property                   | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 15                                   |
| extra_specs                | {"hw:mem_page_size": "large"}        |
| id                         | 2dde415a-267c-4fdc-a244-08f9a634ade5 |
| name                       | m1.medium_dpdk                       |
| os-flavor-access:is_public | True                                 |
| ram                        | 1024                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 2                                    |
+----------------------------+--------------------------------------+
```

We have a bug in our current implementation of hugepages in Nova: it does not take into account which NUMA node should back the guest's memory and always uses NUMA node 0. This compute host provides 8 hugepages shared between 2 NUMA nodes: on NUMA node 0, 3 pages are available (1 is already used by OVS), and on NUMA node 1, 4 pages are available.
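The node-0-only placement described above can be sketched as follows (hypothetical names; this only illustrates the failure mode, not Nova's actual code):

```python
def place_guest_buggy(free_pages, pages_needed):
    """Buggy behaviour: guest memory is always backed from NUMA
    node 0, regardless of what other nodes have free."""
    if free_pages[0] >= pages_needed:
        free_pages[0] -= pages_needed
        return 0
    raise MemoryError("no free 1 GB hugepages on node 0")

def place_guest_fixed(free_pages, pages_needed):
    """Fixed behaviour: any NUMA node with enough free pages may
    back the guest."""
    for node in sorted(free_pages):
        if free_pages[node] >= pages_needed:
            free_pages[node] -= pages_needed
            return node
    raise MemoryError("no NUMA node has enough free hugepages")

# Node 0 has 3 free 1 GiB pages (one is held by OVS), node 1 has 4.
pages = {0: 3, 1: 4}
for _ in range(3):
    place_guest_buggy(pages, 1)
# A fourth 1 GB guest now fails on node 0, even though node 1 still
# has 4 free pages: exactly the ERROR state seen below for instance i4.
```

With the fixed placement all seven 1 GB guests would fit, three on node 0 and four on node 1.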
```
[root@puma48 ~]# virsh freepages --all
Node 0:
4KiB: 6907982
1048576KiB: 3

Node 1:
4KiB: 7054079
1048576KiB: 4
```

Unfortunately, because of that bug we can only boot 3 guests configured with 'm1.medium_dpdk', and they all have memory backed on node 0. (It's easy to confirm by looking at the XML.)

```
[root@puma48 ~]# virsh dumpxml 7 | grep page
<hugepages>
  <page size='1048576' unit='KiB' nodeset='0'/>
</hugepages>
```

```
[root@puma53 ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+--------------------+
| ID                                   | Name | Status | Task State | Power State | Networks           |
+--------------------------------------+------+--------+------------+-------------+--------------------+
| 9b6f268f-a373-41ee-9dc0-040b2dd2b401 | i1   | ACTIVE | -          | Running     | net1=192.168.99.57 |
| db2fbacd-4521-4620-8fa0-97232fe464ba | i2   | ACTIVE | -          | Running     | net1=192.168.99.58 |
| 7438518b-1e91-4151-a302-ca8e5d150847 | i3   | ACTIVE | -          | Running     | net1=192.168.99.59 |
| 7f64fa95-39f2-46be-affd-b0af01b7af8d | i4   | ERROR  | -          | NOSTATE     | net1=192.168.99.60 |
+--------------------------------------+------+--------+------------+-------------+--------------------+

[root@puma48 ~]# virsh freepages --all
Node 0:
4KiB: 6885635
1048576KiB: 0

Node 1:
4KiB: 7055161
1048576KiB: 4
```

(In reply to Sahid Ferdjaoui from comment #13)
> [root@puma48 ~]# virsh dumpxml 7 | grep page
> <hugepages>
>   <page size='1048576' unit='KiB' nodeset='0'/>
> </hugepages>

I made a mistake: the 'nodeset' attribute here indicates on which *guest* nodes the pages are backed. We should instead configure the memnode element of numatune, probably to the union of host NUMA nodes when the guest has no specific NUMA requirement.
```
<numatune>
  <memory mode='strict' nodeset='0'/>
  <memnode cellid='0' mode='strict' nodeset='0-1'/>
</numatune>
```

(In reply to Sahid Ferdjaoui from comment #14)
> I made a mistake: the 'nodeset' attribute here indicates on which *guest* nodes the pages are backed. We should instead configure the memnode element of numatune, probably to the union of host NUMA nodes when the guest has no specific NUMA requirement.

OK, I made a wrong assumption: the code is handling NUMA node placement correctly, but the computation of available resources is wrong. The current code computes the available pages of a given size as the total allocated minus those used by instances, so it does not take into account the 1 page used by OVS and continues to think it can fit an instance on NUMA node 0.

I provided an upstream patch [1] to fix this issue. The change provides a new option, 'reserved_memory_pages', which reserves, from Nova's point of view, an amount of pages for third-party components. In our use case that will be 1 page for OVS. This fixes how nova-compute computes free pages per host NUMA node.

[1] https://review.openstack.org/277422

I tried to apply the fix manually but I didn't find the files on my setup. Can you explain how I can verify the fix?

(In reply to Eran Kuris from comment #17)
> I tried to apply the fix manually but I didn't find the files on my setup. Can you explain how I can verify the fix?

I had a review from Daniel Berrangé who asked me to change something related to the option. Let me update this upstream; then, once I get his ACK, I will backport it for OSP 8 and provide test packages to you. We can expect to have these test packages tomorrow or the day after. Does that sound good to you?
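The reserved_memory_pages entries follow a node:pagesize:count pattern ("0:1G:1" reserves one 1 GB page on NUMA node 0, the page held by OVS). A minimal sketch of how such a reservation changes the free-page computation (hypothetical helper names and size-suffix handling; this illustrates the idea, not Nova's actual implementation):

```python
def parse_reserved_memory_pages(entry):
    """Parse a 'node:pagesize:count' entry such as '0:1G:1' into
    (numa_node, page_size_kib, count). The suffix handling here is
    an assumption for illustration."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    node, size, count = entry.split(":")
    suffix = size[-1].upper()
    if suffix in units:
        size_kib = int(size[:-1]) * units[suffix]
    else:
        size_kib = int(size)  # value already in KiB
    return int(node), size_kib, int(count)

def free_pages_on_node(total, used_by_guests, reserved):
    """Free pages on one NUMA node. Before the fix the 'reserved'
    term was missing, so the page held by OVS looked free."""
    return max(total - used_by_guests - reserved, 0)

node, size_kib, count = parse_reserved_memory_pages("0:1G:1")
# Node 0 on the host above: 4 x 1 GiB pages total, none used by
# guests yet, 1 reserved for OVS, leaving 3 genuinely free pages.
print(node, size_kib, count)            # 0 1048576 1
print(free_pages_on_node(4, 0, count))  # 3
```

The reservation is per NUMA node, which is what makes it usable when OVS pins its memory to one node only.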
Yes, that sounds OK.

(In reply to Eran Kuris from comment #19)
> Yes, that sounds OK.

You can find a scratch build [1]. Please restart the OpenStack services after installing the packages; you will have to configure the compute nodes and services with this new option:

reserved_memory_pages = ["0:1G:1"]

Please let me know any feedback.

[1] https://brewweb.devel.redhat.com/taskinfo?taskID=10470071

Sahid, I would like more information on how to install those packages so I can test this fix. When I run yum install of those packages I get an error:

```
Error: Package: 1:openstack-nova-scheduler-12.0.1-2bz1300680v1.el7ost.noarch
           (/openstack-nova-scheduler-12.0.1-2bz1300680v1.el7ost.noarch)
       Requires: openstack-nova-common = 1:12.0.1-2bz1300680v1.el7ost
       Installed: 1:openstack-nova-common-12.0.1-2.el7ost.noarch (@rhelosp-8.0-puddle)
           openstack-nova-common = 1:12.0.1-2.el7ost
Error: Package: 1:openstack-nova-objectstore-12.0.1-2bz1300680v1.el7ost.noarch
           (/openstack-nova-objectstore-12.0.1-2bz1300680v1.el7ost.noarch)
       Requires: openstack-nova-common = 1:12.0.1-2bz1300680v1.el7ost
       Installed: 1:openstack-nova-common-12.0.1-2.el7ost.noarch (@rhelosp-8.0-puddle)
           openstack-nova-common = 1:12.0.1-2.el7ost
You could try using --skip-broken to work around the problem
** Found 1 pre-existing rpmdb problem(s), 'yum check' output follows:
openvswitch-dpdk-2.4.0-0.10346.git97bab959.2.el7.x86_64 has installed conflicts openvswitch: openvswitch-dpdk-2.4.0-0.10346.git97bab959.2.el7.x86_64
```

(In reply to Eran Kuris from comment #21)
> Sahid, I would like more information on how to install those packages so I can test this fix. When I run yum install of those packages I get an error: [...]

It's because the version number of the packages was not incrementally updated. You can use:

```
rpm -ivh --force *.rpm
```

The fix resolves my issue. Tested on a VLAN environment.

The current fix has been reverted upstream; rather than carry a forked patch that may be incompatible with the ultimate upstream solution, we must also revert the fix from RHOSP 8. When the final upstream fix is available we will re-assess backportability. Sahid, please process the revert under this BZ. We will need to create a clone for the long-term resolution.

New packages with the memory-page reservation option reverted on compute nodes: openstack-nova-12.0.2-3.el7ost

Dropping from advisory.

(In reply to Stephen Gordon from comment #27)
> The current fix has been reverted upstream; rather than carry a forked patch that may be incompatible with the ultimate upstream solution, we must also revert the fix from RHOSP 8. When the final upstream fix is available we will re-assess backportability.

The changes have been reverted but this bug is still open and dropped from the advisory. Can we push it out to a z-stream candidate, since upstream hasn't reached a solution yet?

*** Bug 1348732 has been marked as a duplicate of this bug. ***

There has been no meaningful activity on this bug in almost 3 months. Can someone provide an update on when this is targeted to be fixed?

(In reply to Sahid Ferdjaoui from comment #4 of bug 1300680)
> One possible solution (the easy one) is to have the option reserved_huge_pages set for all services, so the scheduler will know about the number of pages reserved. But the problem there is that all compute nodes would share the same number of reserved pages.
>
> Another solution (probably better) would be a fix specific to the libvirt driver, so that the reserved_huge_pages option is read when the driver computes available resources, and the number of available pages stored is reduced by the number of reserved pages.
>
> I'm closing this one as a duplicate since we do not want to track 2 BZs for the same problem.
>
> *** This bug has been marked as a duplicate of bug 1300680 ***

Private build provided to Cisco for testing; waiting for feedback.

Tested the private build and was able to reserve 256 huge pages for two NUMA nodes using a 2048 kB page size. Worked just fine for our application needs. Please advise on how we move forward so that we can get this integrated into the Nova RPMs. Thanks!

Paul provided the update on the Cisco side, but I guess I also need to reply so the bug system knows it doesn't need info from me anymore.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-1794.html

Note: the Google Doc guide mentioned in the description has now been published here: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/network_functions_virtualization_configuration_guide/