1335667 – cannot scale more than 25+ VMs at a time on the same compute reliably with OSP8

Summary: cannot scale more than 25+ VMs at a time on the same compute reliably with OSP8

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-neutron
Sub Component:
Version:	8.0 (Liberty)
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	8.0 (Liberty)
Assignee:	Nir Magnezi
QA Contact:	Toni Freger
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-05-12 19:47 UTC by John Wu
Modified:	2019-09-10 14:07 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-05-13 01:46:44 UTC
Target Upstream Version:
Embargoed:
Flags:	nmagnezi: needinfo-

Description John Wu 2016-05-12 19:47:33 UTC

Description of problem:

We need the following Liberty neutron patches ported to Red Hat OSP8 for scalability reasons. Without the must-have patch, we cannot reliably scale beyond roughly 25 VMs at a time on the same compute node.

Must have
=========
https://review.openstack.org/#/c/293286/

Good to have
============
https://review.openstack.org/#/c/271804/
https://review.openstack.org/#/c/280092/

We have tested the must-have patch both on its own and together with the good-to-have patches; in both scenarios, performance improved.


Version-Release number of selected component (if applicable):

openstack-neutron.noarch     1:7.0.1-15.el7ost
openstack-neutron-ml2.noarch 1:7.0.1-15.el7ost


How reproducible:

Reproducible by running the OpenStack Rally nova boot scenario (NovaServers.boot_server) multiple times.

Steps to Reproduce:

1. Bring up a working Liberty OpenStack cloud with two compute nodes
2. Install OpenStack Rally
3. Upload the cirros image
4. Run the "./rally/samples/tasks/scenarios/nova/boot.json" scenario with the following modifications: "auto_assign_nic" set to true to enable networking, "times" increased from 10 to 50, and "concurrency" increased from 2 to 5:
{
    "NovaServers.boot_server": [
        {
            "args": {
                "flavor": {
                    "name": "ram64"
                },
                "image": {
                    "name": "^cirros-0.3.4-x86_64.*"
                },
                "auto_assign_nic": true
            },
            "runner": {
                "type": "constant",
                "times": 50,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                },
                "network": {
                    "start_cidr": "10.0.0.0/16"
                },
                "flavors": [
                    {
                        "name": "ram64",
                        "ram": 64
                    }
                ],
                "quotas": {
                    "nova": {
                        "instances": -1,
                        "cores": -1,
                        "ram": -1,
                        "metadata_items": -1
                    },
                    "neutron": {
                        "network": -1,
                        "subnet": -1,
                        "port": -1,
                        "router": -1,
                        "floatingip": -1,
                        "security_group": -1,
                        "security_group_rule": -1
                    }
                }
            }
        }
    ]
}
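For repeatable runs, the step 4 edits can be applied to the sample task with a short script instead of by hand. This is a minimal sketch and not part of Rally: the function names are hypothetical, and it applies only the three changes listed in step 4 (the network, flavors, and quotas context entries shown above still need to match).

```python
import json
from pathlib import Path


def apply_modifications(task: dict) -> dict:
    """Apply the step 4 changes to every NovaServers.boot_server entry in place."""
    for entry in task["NovaServers.boot_server"]:
        entry.setdefault("args", {})["auto_assign_nic"] = True  # enable networking
        runner = entry.setdefault("runner", {})
        runner["type"] = "constant"
        runner["times"] = 50       # increased from 10 to 50
        runner["concurrency"] = 5  # increased from 2 to 5
    return task


def write_modified_task(sample_path: str, output_path: str) -> None:
    """Read a Rally JSON task file, apply the changes, and write the result."""
    task = json.loads(Path(sample_path).read_text())
    Path(output_path).write_text(json.dumps(apply_modifications(task), indent=4) + "\n")
```

For example, call write_modified_task("rally/samples/tasks/scenarios/nova/boot.json", "boot-50x5.json") and then start the task with `rally task start boot-50x5.json`; the output file name is arbitrary.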

5. The first run is fast; nova boot time is around 15-20 seconds
6. Repeat the same test a few more times; with each run, the nova boot time degrades further
7. Eventually the boot time exceeds 60 seconds, which is the vif_plugging_timeout value
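Steps 5-7 describe per-run degradation measured against vif_plugging_timeout, the number of seconds nova waits for neutron's network-vif-plugged event. A minimal sketch for tracking that across repeated runs, assuming the per-boot durations of each run have already been extracted from the Rally results into plain lists of seconds; the input format and the function name are hypothetical.

```python
from __future__ import annotations

from statistics import mean

VIF_PLUGGING_TIMEOUT = 60  # seconds, the value cited in step 7


def summarize_runs(runs: list[list[float]], timeout: float = VIF_PLUGGING_TIMEOUT) -> list[dict]:
    """For each Rally run, report the mean boot time and how many boots exceeded the timeout."""
    summary = []
    for index, durations in enumerate(runs, start=1):
        summary.append({
            "run": index,
            "mean_s": round(mean(durations), 1),
            "over_timeout": sum(1 for d in durations if d > timeout),
        })
    return summary
```

A run whose over_timeout count is greater than zero corresponds to the failure described in step 7.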
 

Actual results:

neutron-server.log
===================
<..snip..>
2016-05-10 19:21:32.736 38 ERROR neutron.notifiers.nova [-] Failed to notify nova on events: [{'status': 'completed', 'tag': u'1698baa3-9fe0-4539-ba48-1eb4f5683b8d', 'name': 'network-vif-plugged', 'server_uuid': u'e3f13bb1-da01-4337-abf6-ad672461de2a'}]
<..snip..>

The VM times out waiting for the network and the metadata service.
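To count how often this happens during a run, the failed notifications can be pulled out of neutron-server.log. A minimal sketch, assuming log lines have the exact format quoted above; the function name and the log path mentioned below are assumptions. The event list is logged as a Python literal (note the u'' strings), so ast.literal_eval parses it without executing anything. In these events, tag carries the Neutron port ID and server_uuid identifies the instance.

```python
import ast
import re

# Matches the neutron.notifiers.nova error line quoted in the report.
FAILED_NOTIFY = re.compile(
    r"^(?P<timestamp>\S+ \S+) \d+ ERROR neutron\.notifiers\.nova \[[^\]]*\] "
    r"Failed to notify nova on events: (?P<events>\[.*\])\s*$"
)


def failed_vif_events(lines):
    """Yield (timestamp, server_uuid, port_id) for each failed network-vif-plugged notification."""
    for line in lines:
        match = FAILED_NOTIFY.match(line)
        if not match:
            continue
        # The event list is a Python literal; literal_eval only evaluates literals.
        for event in ast.literal_eval(match.group("events")):
            if event.get("name") == "network-vif-plugged":
                yield match.group("timestamp"), event["server_uuid"], event["tag"]
```

For example, iterate over an open log file (on these hosts, possibly /var/log/neutron/server.log, which is an assumption) with `for ts, server, port in failed_vif_events(log_file): print(ts, server, port)`.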


Expected results:

The above error message should not appear, and the VM should boot successfully with a DHCP-assigned address.


Additional info:

Comment 2 Assaf Muller 2016-05-12 20:12:45 UTC

@Nir, all three requested patches are already merged in stable/liberty. Can you say when the next OSP 8 release that includes those patches will ship?

Comment 3 Chris Ricker 2016-05-13 01:46:21 UTC

These are all already in the shipped OSP8 update: openstack-neutron-7.0.4-2.el7ost.src.rpm

Sorry for the noise.
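To check whether a node already carries the fix, compare the installed build (1:7.0.1-15.el7ost in the report) with the shipped one (7.0.4-2.el7ost). The sketch below is a simplified rpmvercmp-style comparison that ignores the `~` and `^` rules; on a real host, `rpmdev-vercmp` or `rpm.labelCompare` from the rpm Python bindings are the authoritative tools. The src.rpm file name omits the epoch, so the example assumes epoch 1, as in the installed package listing.

```python
import re
from itertools import zip_longest


def _segments(text):
    """Split into digit and letter runs, dropping separators such as '.'."""
    return re.findall(r"\d+|[A-Za-z]+", text)


def _compare_part(a, b):
    """Simplified rpmvercmp on one version or release string: returns -1, 0, or 1."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:  # a ran out of segments first, so it is older
            return -1
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric segments sort newer than alpha
        elif x != y:
            return -1 if x < y else 1
    return 0


def parse_evr(evr):
    """Split 'epoch:version-release'; a missing epoch counts as 0."""
    epoch, _, rest = evr.rpartition(":")
    version, _, release = rest.partition("-")
    return int(epoch or 0), version, release


def compare_evr(a, b):
    """Return -1 if a is older than b, 0 if equal, 1 if newer."""
    ea, va, ra = parse_evr(a)
    eb, vb, rb = parse_evr(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(va, vb) or _compare_part(ra, rb)
```

With these helpers, compare_evr("1:7.0.1-15.el7ost", "1:7.0.4-2.el7ost") returns -1, meaning the reported installation predates the shipped update.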
