Bug 1310745 - During scale test, error "Failed to allocate the network(s), not rescheduling" prevents launching instances
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assignee: anil venkata
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks: 1393437
 
Reported: 2016-02-22 15:25 UTC by Yuri Obshansky
Modified: 2019-09-09 14:58 UTC (History)
18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1393437 (view as bug list)
Environment:
Last Closed: 2018-06-26 07:07:42 UTC
Target Upstream Version:
Embargoed:


Attachments
- nova-compute.log file (4.11 MB, text/plain), 2016-02-22 15:25 UTC, Yuri Obshansky
- ospd node neutron-server log file (2.52 MB, application/x-gzip), 2016-02-22 15:28 UTC, Yuri Obshansky
- ospd node neutron-openswitch log file (1.51 MB, application/x-gzip), 2016-02-22 15:28 UTC, Yuri Obshansky
- openvswitch-agent.log file from overcloud-compute-2 (9.64 MB, application/x-gzip), 2016-06-16 11:47 UTC, Yuri Obshansky
- neutron server log contr-0 after disable firewall driver (14.22 MB, application/x-gzip), 2016-06-21 12:31 UTC, Yuri Obshansky
- neutron server log contr-1 after disable firewall driver (14.25 MB, application/x-gzip), 2016-06-21 12:32 UTC, Yuri Obshansky
- neutron server log contr-2 after disable firewall driver (215.79 KB, application/x-gzip), 2016-06-21 12:33 UTC, Yuri Obshansky
- openvswitch-agent.log compute 2 - part aa (16.00 MB, application/x-gzip), 2016-06-21 12:36 UTC, Yuri Obshansky
- openvswitch-agent.log compute 2 - part ab (16.00 MB, application/octet-stream), 2016-06-21 12:38 UTC, Yuri Obshansky
- openvswitch-agent.log compute 2 - part ac (16.00 MB, application/octet-stream), 2016-06-21 12:40 UTC, Yuri Obshansky
- openvswitch-agent.log compute 2 - part ad (12.01 MB, application/octet-stream), 2016-06-21 12:42 UTC, Yuri Obshansky
- error (7.69 KB, text/plain), 2017-05-03 20:21 UTC, Yuri Obshansky
- openvswitch-agent.log and ovs-vswitchd.log (13.68 KB, text/plain), 2017-05-04 14:08 UTC, Yuri Obshansky
- sosreport controller part aa (15.00 MB, application/x-xz), 2017-05-04 16:35 UTC, Yuri Obshansky
- sosreport controller part ab (15.00 MB, application/octet-stream), 2017-05-04 16:36 UTC, Yuri Obshansky
- sosreport controller part ac (12.84 MB, application/octet-stream), 2017-05-04 16:37 UTC, Yuri Obshansky


Links
- OpenStack gerrit 337064 (MERGED): Avoid duplicate ipset processing for security groups, last updated 2021-02-12 06:28:37 UTC
- OpenStack gerrit 351773 (ABANDONED): Avoid apply firewall if remote sg members update, last updated 2021-02-12 06:28:38 UTC

Description Yuri Obshansky 2016-02-22 15:25:22 UTC
Created attachment 1129364 [details]
nova-compute.log file

Description of problem:
During the scale test, 3217 instances/VMs were launched successfully; all subsequent launches failed with the error: "Build of instance 3899eea1-1367-4932-9207-0474fbaaafd5 aborted: Failed to allocate the network(s), not rescheduling"

The following exceptions were found in nova-compute.log:
2016-02-18 10:38:12.474 17443 WARNING nova.virt.libvirt.driver
[req-03e59ac6-649c-4571-951b-ef496440a4e9
ea29bb495c1f43f89e8dc4f812f8f180 178100f38c934c29824db1a6ee40d7e2 - - -]
Timeout waiting for vif plugging callback for instance
9e39ceb1-1177-4267-b7ed-299d116927fc
2016-02-18 10:38:15.919 17443 ERROR nova.compute.manager
[req-03e59ac6-649c-4571-951b-ef496440a4e9
ea29bb495c1f43f89e8dc4f812f8f180 178100f38c934c29824db1a6ee40d7e2 - - -]
[instance: 9e39ceb1-1177-4267-b7ed-299d116927fc] Instance failed to spawn
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc] Traceback (most recent call last):
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2486,
in _build_resources
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]     yield resources
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2358,
in _build_and_run_instance
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]    
block_device_info=block_device_info)
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line
2492, in spawn
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]    
block_device_info=block_device_info)
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line
4592, in _create_domain_and_network
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]     raise
exception.VirtualInterfaceCreateException()
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc] VirtualInterfaceCreateException:
Virtual Interface creation failed
2016-02-18 10:38:15.919 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]
2016-02-18 10:38:16.873 17443 ERROR nova.compute.manager
[req-03e59ac6-649c-4571-951b-ef496440a4e9
ea29bb495c1f43f89e8dc4f812f8f180 178100f38c934c29824db1a6ee40d7e2 - - -]
[instance: 9e39ceb1-1177-4267-b7ed-299d116927fc] Failed to allocate
network(s)
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc] Traceback (most recent call last):
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2358,
in _build_and_run_instance
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]    
block_device_info=block_device_info)
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line
2492, in spawn
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]    
block_device_info=block_device_info)
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line
4592, in _create_domain_and_network
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]     raise
exception.VirtualInterfaceCreateException()
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc] VirtualInterfaceCreateException:
Virtual Interface creation failed
2016-02-18 10:38:16.873 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]
2016-02-18 10:38:16.876 17443 ERROR nova.compute.manager
[req-03e59ac6-649c-4571-951b-ef496440a4e9
ea29bb495c1f43f89e8dc4f812f8f180 178100f38c934c29824db1a6ee40d7e2 - - -]
[instance: 9e39ceb1-1177-4267-b7ed-299d116927fc] Build of instance
9e39ceb1-1177-4267-b7ed-299d116927fc aborted: Failed to allocate the
network(s), not rescheduling.
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc] Traceback (most recent call last):
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2261,
in _do_build_and_run_instance
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]     filter_properties)
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2393,
in _build_and_run_instance
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]     reason=msg)
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc] BuildAbortException: Build of
instance 9e39ceb1-1177-4267-b7ed-299d116927fc aborted: Failed to
allocate the network(s), not rescheduling.
2016-02-18 10:38:16.876 17443 TRACE nova.compute.manager [instance:
9e39ceb1-1177-4267-b7ed-299d116927fc]
2016-02-18 10:38:17.536 17443 ERROR nova.network.neutronv2.api
[req-03e59ac6-649c-4571-951b-ef496440a4e9
ea29bb495c1f43f89e8dc4f812f8f180 178100f38c934c29824db1a6ee40d7e2 - - -]
Unable to clear device ID for port 'None'
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api Traceback
(most recent call last):
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line
366, in _unbind_ports
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
port_client.update_port(port_id, port_req_body)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
102, in with_params
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api     ret =
self.function(instance, *args, **kwargs)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
549, in update_port
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
return self.put(self.port_path % (port), body=body)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
302, in put
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
headers=headers, params=params)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
270, in retry_request
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
headers=headers, params=params)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
211, in do_request
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
self._handle_fault_response(status_code, replybody)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
185, in _handle_fault_response
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
exception_handler_v20(status_code, des_error_body)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api   File
"/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line
83, in exception_handler_v20
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api    
message=message)
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api
NeutronClientException: 404 Not Found
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api
2016-02-18 10:38:17.536 17443 TRACE nova.network.neutronv2.api The
resource could not be found.

The OpenStack environment is:
3 controller nodes and 6 compute nodes, HA,
where each node has 24 CPUs and 64 GB RAM.
Instances/vms were created using:
- image: cirros-0.3.2-sc (12.6 MB)
- flavor: m1.nano (1VCPU, 64MB RAM, 1GB Disk)

Version-Release number of selected component (if applicable):
rhos-release 7 -p 2015-12-16.1
# rpm -qa | grep nova
openstack-nova-console-2015.1.2-7.el7ost.noarch
openstack-nova-novncproxy-2015.1.2-7.el7ost.noarch
python-nova-2015.1.2-7.el7ost.noarch
openstack-nova-api-2015.1.2-7.el7ost.noarch
openstack-nova-common-2015.1.2-7.el7ost.noarch
python-novaclient-2.23.0-2.el7ost.noarch
openstack-nova-compute-2015.1.2-7.el7ost.noarch
openstack-nova-conductor-2015.1.2-7.el7ost.noarch
openstack-nova-scheduler-2015.1.2-7.el7ost.noarch
openstack-nova-cert-2015.1.2-7.el7ost.noarch

How reproducible:
Run a scale test that creates 50 tenants, 50 networks per tenant,
and as many instances/VMs as possible.

Steps to Reproduce:
1.
2.
3.

Actual results:
Failed to create more than 3217 instances.
Exceptions raised in nova-compute.log.

Expected results:


Additional info:

Comment 2 Yuri Obshansky 2016-02-22 15:27:10 UTC
See test results
https://mojo.redhat.com/docs/DOC-1067371

Comment 3 Yuri Obshansky 2016-02-22 15:28:04 UTC
Created attachment 1129365 [details]
ospd node neutron-server log file

Comment 4 Yuri Obshansky 2016-02-22 15:28:46 UTC
Created attachment 1129366 [details]
ospd node neutron-openswitch log file

Comment 5 Dan Smith 2016-02-24 19:59:10 UTC
The problem is that neutron has not called to nova to deliver the vif-plugged event, which causes nova to time out. I don't see any errors in the neutron log that indicate that something failed which prevented this call, but after nova fails the request, it isn't able to talk to the neutron API to deallocate the resources either.

Suggest we point some neutron people at this to take a look.

Comment 7 Yuri Obshansky 2016-03-07 07:29:20 UTC
Bug reproduced (OSP 7.3) on rhos-release 7 -p 2016-02-24.1

Comment 8 Yuri Obshansky 2016-03-07 07:30:27 UTC
See test results on 
https://mojo.redhat.com/docs/DOC-1070135

Comment 9 Assaf Muller 2016-06-07 23:23:09 UTC
Would it be possible to gain access to a setup where the issue reproduces?

Comment 10 Yuri Obshansky 2016-06-08 10:57:44 UTC
I don't have an OSP 7 setup right now;
the bug was reported 4-5 months ago.
I have a setup of OSP 8 -p 2016-05-31.1
and I'm going to rerun the same scale test soon.
If you would like to take a look at the issue on OSP 8, let me know,
because I applied a workaround to the setup for the issue
"Failed to allocate the network(s), not rescheduling":
update the nova settings on the compute nodes to vif_plugging_is_fatal=False
(instead of vif_plugging_is_fatal=True) in /etc/nova/nova.conf.
I'll update you when the test starts.
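
For reference, the workaround described above amounts to the following change on each compute node. This is a sketch of the relevant /etc/nova/nova.conf section; it only masks the symptom, since nova will then treat a missing vif-plugged event as non-fatal rather than aborting the build:

```ini
[DEFAULT]
# Workaround only: do not abort the instance build when the
# network-vif-plugged notification from neutron times out.
vif_plugging_is_fatal = False
# The wait itself is governed by this option (nova's default is 300 s);
# shortening it here is an optional, additional tweak.
vif_plugging_timeout = 300
```

After editing the file, the nova-compute service must be restarted for the change to take effect.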

Comment 11 Yuri Obshansky 2016-06-14 07:05:12 UTC
The issue reproduced on OSP 8 -p 2016-05-31.1
Tonight I executed the same scale test and only 3661 instances were created;
all the others ended up in Error status.
I provided Assaf with setup details and access.

Comment 12 Yuri Obshansky 2016-06-16 11:47:44 UTC
Created attachment 1168705 [details]
openvswitch-agent.log file from overcloud-compute-2

Comment 13 anil venkata 2016-06-16 13:40:24 UTC
Here, nova is able to create the port for the VM and bind it to the compute node (checked through the neutron server log).
It then plugs the VIF (confirmed through the ovsdb log message in the OVS agent log) and waits 5 minutes for the network-vif-plugged event from the neutron server.
(When the timeout happens, it deletes the VIF, also checked through the ovsdb log message in the OVS agent.)

When nova plugged the VIF, the OVS agent received information about the new port through ovsdb, but it was unable to process this port because it was busy in rpc_loop handling existing ports (>600 ports on the compute node). While processing these ports, the agent called the update_device_list RPC method, which failed with an RPC timeout error. I see the response for this RPC arriving after 2 minutes (the default RPC timeout is 1 minute). Because of the timeout, the agent starts processing these 600 ports all over again.
Since the OVS agent is busy processing these ports the whole time, it never wires up the new port added by nova; hence nova does not receive the network-vif-plugged notification and fails the instance build with "Timeout waiting for vif plugging callback for instance".

One possible solution is to increase the RPC timeout. But there is an upstream patch https://review.openstack.org/#/c/314319/ which adds an exponential backoff mechanism for the timeout values on RPC calls. Yuri will test with this patch on his OSP 8 -p 2016-05-31.1 setup.
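
The exponential backoff idea referenced in that upstream patch can be sketched as follows. This is an illustrative Python sketch only, not the actual oslo.messaging implementation; the function name and the cap value are hypothetical:

```python
def backoff_timeouts(base=60, cap=600, factor=2):
    """Yield successive RPC timeout values: after each expired timeout,
    the next attempt waits twice as long, up to a fixed cap."""
    timeout = base
    while True:
        yield timeout
        timeout = min(timeout * factor, cap)

# With a 60 s default timeout, a call that keeps timing out would wait:
gen = backoff_timeouts()
first_four = [next(gen) for _ in range(4)]  # [60, 120, 240, 480]
```

A backoff like this lets slow replies from an overloaded server (for example the 2-minute update_device_list response seen above) eventually fit inside the timeout window, instead of the agent retrying the full resync forever.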

Comment 14 Yuri Obshansky 2016-06-21 12:31:34 UTC
Created attachment 1170248 [details]
neutron server log contr-0 after disable firewall driver

Comment 15 Yuri Obshansky 2016-06-21 12:32:59 UTC
Created attachment 1170249 [details]
neutron server log contr-1 after disable firewall driver

Comment 16 Yuri Obshansky 2016-06-21 12:33:42 UTC
Created attachment 1170251 [details]
neutron server log contr-2 after disable firewall driver

Comment 17 Yuri Obshansky 2016-06-21 12:36:17 UTC
Created attachment 1170254 [details]
openvswitch-agent.log compute 2 - part aa

Comment 18 Yuri Obshansky 2016-06-21 12:38:25 UTC
Created attachment 1170255 [details]
openvswitch-agent.log compute 2 - part ab

Comment 19 Yuri Obshansky 2016-06-21 12:40:02 UTC
Created attachment 1170257 [details]
openvswitch-agent.log compute 2 - part ac

Comment 20 Yuri Obshansky 2016-06-21 12:42:09 UTC
Created attachment 1170258 [details]
openvswitch-agent.log compute 2 - part ad

Comment 21 anil venkata 2016-07-26 14:53:57 UTC
We need to try with this change: https://review.openstack.org/#/c/337064/

Comment 22 anil venkata 2016-08-29 13:56:53 UTC
@Yuri

Any update on getting a setup to try the fix https://review.openstack.org/#/c/337064/ ?

Thanks
Anil

Comment 23 Yuri Obshansky 2016-08-30 05:41:35 UTC
Hi Anil, 

I still don't have hardware for reproducing the bug.
But, hopefully, I'll get the servers next week.
I'll update you with the status.

Thanks,
Yuri

Comment 24 anil venkata 2016-11-21 15:33:46 UTC
These changes improve the performance of the OVS agent:

[1] https://review.openstack.org/#/c/337064/
[2] https://review.openstack.org/#/c/352440/
[3] https://review.openstack.org/#/c/351773/

Change [1] is already merged upstream and is also part of OSP 9. Changes [2] and [3] are waiting for review upstream. These changes avoid unnecessary duplicate ipset calls, making the OVS agent run faster when many ports are added.

Comment 26 Yuri Obshansky 2016-11-23 14:11:57 UTC
Hi Toni, 

Not yet; I'll reproduce it as soon as I have a working setup.
I've opened a ticket with eng-ops:
https://engineering.redhat.com/rt/Ticket/Display.html?id=427891
Waiting for the fix.

Yuri

Comment 28 anil venkata 2017-02-09 10:14:47 UTC
Thanks. Can you please provide me the access details to investigate further?

Comment 30 Yuri Obshansky 2017-05-01 02:01:22 UTC
Hi, Anil
Finally, I've got hardware for reproducing the bug.
It's now reproduced on RHOS 8:
[root@overcloud-compute-0 nova]#  rpm -qa | grep nova
openstack-nova-novncproxy-12.0.5-9.el7ost.noarch
openstack-nova-compute-12.0.5-9.el7ost.noarch
openstack-nova-api-12.0.5-9.el7ost.noarch
python-novaclient-3.1.0-2.el7ost.noarch
python-nova-12.0.5-9.el7ost.noarch
openstack-nova-conductor-12.0.5-9.el7ost.noarch
openstack-nova-common-12.0.5-9.el7ost.noarch
openstack-nova-scheduler-12.0.5-9.el7ost.noarch
openstack-nova-console-12.0.5-9.el7ost.noarch
openstack-nova-cert-12.0.5-9.el7ost.noarch
with the same error:
2017-05-01 01:56:40.239 33560 ERROR nova.compute.manager [instance: ef090168-e140-45ba-894b-7bad74172217] BuildAbortException: Build of instance ef090168-e140-45ba-894b-7bad74172217 aborted: Failed to allocate the network(s), not rescheduling.
Now, how can we apply your fix and rerun the test?
Yuri

Comment 31 Yuri Obshansky 2017-05-01 20:26:03 UTC
Forgot to add the neutron versions:
[stack@c02-h02-r620 ~]$ rpm -qa | grep neutron
openstack-neutron-7.2.0-5.el7ost.noarch
openstack-neutron-common-7.2.0-5.el7ost.noarch
openstack-neutron-ml2-7.2.0-5.el7ost.noarch
openstack-neutron-openvswitch-7.2.0-5.el7ost.noarch
python-neutron-7.2.0-5.el7ost.noarch
python-neutronclient-3.1.0-3.el7ost.noarch

Comment 32 Yuri Obshansky 2017-05-03 20:20:36 UTC
Hi Anil,
Unfortunately, the bug also reproduced on rhos-release 9 -p 2017-04-07.5.
[root@c02-h02-r620 ~]# rpm -qa | grep neutron
python-neutron-lib-0.0.2-1.el7ost.noarch
openstack-neutron-8.3.0-5.el7ost.noarch
python-neutronclient-4.1.1-2.el7ost.noarch
python-neutron-vpnaas-tests-8.0.0-1.el7ost.noarch
python-neutron-tests-8.3.0-5.el7ost.noarch
python-neutron-8.3.0-5.el7ost.noarch
python-neutron-fwaas-tests-8.0.0-3.el7ost.noarch
openstack-neutron-ml2-8.3.0-5.el7ost.noarch
python-neutron-lbaas-tests-8.1.0-2.el7ost.noarch
python-neutron-vpnaas-8.0.0-1.el7ost.noarch
openstack-neutron-openvswitch-8.3.0-5.el7ost.noarch
python-neutron-lbaas-8.1.0-2.el7ost.noarch
openstack-neutron-common-8.3.0-5.el7ost.noarch
python-neutron-fwaas-8.0.0-3.el7ost.noarch

See the attached txt file with the error.
I didn't clean the stack; it's waiting for you to take a look.

Yuri

Comment 33 Yuri Obshansky 2017-05-03 20:21:35 UTC
Created attachment 1275991 [details]
error

Comment 34 Sahid Ferdjaoui 2017-05-04 11:59:51 UTC
(In reply to Yuri Obshansky from comment #33)
> Created attachment 1275991 [details]
> error

Can you provide the logs of neutron and the OVS agent related to that error?

Also can you ensure that "/etc/neutron/neutron.conf" is well configured with:

 notify_nova_on_port_status_changes=True
 notify_nova_on_port_data_changes=True
 nova_url=http://<IP>/v2.1

Thanks

Comment 35 Yuri Obshansky 2017-05-04 14:07:07 UTC
Hi, 
I checked /etc/neutron/neutron.conf on the compute node:
...
# Send notification to nova when port status changes (boolean value)
#notify_nova_on_port_status_changes = true

# Send notification to nova when port data (fixed_ips/floatingip) changes so
# nova can update its cache. (boolean value)
#notify_nova_on_port_data_changes = true
....

There is no nova_url=http://<IP>/v2.1 setting in it.

Attached are parts of openvswitch-agent.log and ovs-vswitchd.log
from the same time frame.

Thanks

Comment 36 Yuri Obshansky 2017-05-04 14:08:01 UTC
Created attachment 1276361 [details]
openvswitch-agent.log and ovs-vswitchd.log

Comment 37 Sahid Ferdjaoui 2017-05-04 14:36:45 UTC
(In reply to Yuri Obshansky from comment #35)
> Hi, 
> I checked /etc/neutron/neutron.conf on compute node
> ...
> # Send notification to nova when port status changes (boolean value)
> #notify_nova_on_port_status_changes = true
> 
> # Send notification to nova when port data (fixed_ips/floatingip) changes so
> # nova can update its cache. (boolean value)
> #notify_nova_on_port_data_changes = true
> ....
> 
> There is not  nova_url=http://<IP>/v2.1 setting in it.

OK, can you check against what is indicated here [0] and let us know the result?

[0] https://access.redhat.com/solutions/1212643

Comment 38 Yuri Obshansky 2017-05-04 14:49:03 UTC
Hi, 
I cannot see the solution: it is subscriber-exclusive content!
Ping me on IRC (yobshans) on the neutron/rhos-dev/rhos-qe channels.
Thanks

Comment 39 Yuri Obshansky 2017-05-04 16:21:26 UTC
Hi, 
We double-checked the configuration on the controller node:
[root@overcloud-controller-0 ~]# egrep '(^nova_|^notify_nova)' /etc/neutron/neutron.conf
notify_nova_on_port_status_changes=True
notify_nova_on_port_data_changes=True
nova_url=http://172.16.0.10:8774/v2.1/%(tenant_id)s 
It looks OK.
I'll generate sosreports for the controller and compute nodes.
Thanks

Comment 40 Yuri Obshansky 2017-05-04 16:35:04 UTC
Created attachment 1276394 [details]
sosreport controller part aa

The original file is large, about 44 MB.
It was split using:
split -b 15M sosreport-overcloud-controller-0.localdomain-20170504153651.tar.xz "sos-contr-"
The parts can be rejoined with:
cat sos-contr-* > sosreport-contr.tar.xz

Comment 41 Yuri Obshansky 2017-05-04 16:36:11 UTC
Created attachment 1276395 [details]
sosreport controller part ab

Comment 42 Yuri Obshansky 2017-05-04 16:37:28 UTC
Created attachment 1276396 [details]
sosreport controller part ac

Comment 43 anil venkata 2018-06-26 07:07:42 UTC
This needs a fresh setup to understand the performance issues, and then a couple of patches to fix them. It is better to do scale testing on later branches than on OSP 7.

