Bug 1280487

Summary: Nova compute manager fails to connect to Neutron; errors from nova-ceph due to file descriptor limits
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: openstack-nova Assignee: Eoghan Glynn <eglynn>
Status: CLOSED NOTABUG QA Contact: nlevinki <nlevinki>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 (Juno) CC: berrange, dasmith, eglynn, jmelvin, kchamart, lyarwood, ndipanov, pbrady, sbauza, sferdjao, sgordon, vromanso, yeylon
Target Milestone: --- Keywords: Unconfirmed, ZStream
Target Release: 6.0 (Juno)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-17 20:55:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jeremy 2015-11-11 20:53:47 UTC
Description of problem:

We are seeing the following errors in /var/log/nova-compute.log on all of our compute nodes and need to determine what is causing them:

2015-11-11 07:41:50.239 20900 ERROR nova.compute.manager [-] [instance: 2ae6a0ea-dacf-4944-a915-038547533667] An error occurred while refreshing the network cache.
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667] Traceback (most recent call last):
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5432, in _heal_instance_info_cache
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     self._get_instance_nw_info(context, instance, use_slave=True)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1292, in _get_instance_nw_info
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     instance)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 599, in get_instance_nw_info
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     port_ids)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 613, in _get_instance_nw_info
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     port_ids)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1349, in _build_network_info_model
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     data = client.list_ports(**search_opts)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/__init__.py", line 84, in wrapper
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     ret = obj(*args, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 98, in with_params
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     ret = self.function(instance, *args, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 312, in list_ports
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     **_params)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/__init__.py", line 84, in wrapper
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     ret = obj(*args, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1334, in list
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     for r in self._pagination(collection, path, **params):
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1347, in _pagination
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     res = self.get(path, params=params)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/__init__.py", line 84, in wrapper
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     ret = obj(*args, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1320, in get
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     headers=headers, params=params)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/__init__.py", line 84, in wrapper
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     ret = obj(*args, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1297, in retry_request
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     headers=headers, params=params)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/__init__.py", line 84, in wrapper
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     ret = obj(*args, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1240, in do_request
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     content_type=self.content_type())
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 187, in do_request
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     self.endpoint_url + url, method, **kwargs)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 133, in _cs_request
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]     raise exceptions.ConnectionFailed(reason=e)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667] ConnectionFailed: Connection to neutron failed: HTTPConnectionPool(host='10.29.1.30', port=9696): Max retries exceeded with url: /v2.0/ports.json?tenant_id=6a8d91c2ae924da8b3f2fdb2ec7bd2f9&device_id=2ae6a0ea-dacf-4944-a915-038547533667 (Caused by <class 'socket.error'>: [Errno 24] Too many open files)
2015-11-11 07:41:50.239 20900 TRACE nova.compute.manager [instance: 2ae6a0ea-dacf-4944-a915-038547533667]
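
For reference, the final `[Errno 24] Too many open files` is EMFILE: the process has reached its per-process open file descriptor limit (RLIMIT_NOFILE), so it cannot create the socket needed to reach Neutron. The following is a minimal sketch, assuming Linux and Python 3, that reproduces the same errno in a throwaway process. The function name `provoke_emfile` and the limit of 64 are made up for illustration and are not part of Nova or neutronclient.

```python
import errno
import resource
import socket


def provoke_emfile(soft_limit=64):
    """Lower this process's soft RLIMIT_NOFILE, then open sockets until one fails.

    Returns the errno of the failure, which is expected to be errno.EMFILE (24).
    The original limits are restored before returning.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))

    held = []  # keep references so the sockets stay open
    try:
        while True:
            held.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        return exc.errno
    finally:
        for sock in held:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    print(provoke_emfile() == errno.EMFILE)
```

The point of the sketch is that the failure depends only on the limit and on how many descriptors the process already holds, not on whether Neutron itself is reachable.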

Comment 4 Kashyap Chamarthy 2015-11-13 14:48:28 UTC
I'm no Ceph expert, but for the "Too many open files" error in 2015-11-06-14xxxx/lshkvm211a/var/log/nova/rbd-nova.log, it seems you may have to tune your `ulimit` settings.

The following commands list your soft and hard limits, respectively:

    ulimit -Sa
    ulimit -Ha

If those limits are not sufficient for Ceph, you may have to raise them.
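
Note that `ulimit` reports the limits of the shell it runs in, which may differ from those of an already running service. As a rough sketch (assuming Linux with `/proc` mounted and Python 3), the effective limits and current descriptor count of a specific process can be read like this. The helper name `nofile_usage` is hypothetical, and inspecting another user's process generally requires root.

```python
import os


def nofile_usage(pid="self"):
    """Return (open_fds, soft_limit, hard_limit) for a process, read from /proc.

    The limits are returned as strings because the kernel may report "unlimited".
    For pid="self" the count includes the descriptor used to list the directory.
    """
    open_fds = len(os.listdir("/proc/{}/fd".format(pid)))
    soft = hard = None
    with open("/proc/{}/limits".format(pid)) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                fields = line.split()
                soft, hard = fields[3], fields[4]
                break
    return open_fds, soft, hard


if __name__ == "__main__":
    print(nofile_usage())
```

Passing the PID of the affected process instead of the default `"self"` shows how close it is to its limit. If the count is near the soft limit, that would be consistent with the Errno 24 failures.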

See also related thread on Ceph users list:

    http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040630.html