Bug 1002178

Summary: Instances can't access neutron-metadata-agent when installing it on a different machine using packstack
Product: Red Hat OpenStack
Reporter: Rami Vaknin <rvaknin>
Component: openstack-packstack
Assignee: Martin Magr <mmagr>
Status: CLOSED NOTABUG
QA Contact: Ofer Blaut <oblaut>
Severity: medium
Priority: medium
Version: 3.0
CC: amuller, aortega, derekh, lpeer, mmagr, nyechiel, oblaut, twilson, yeylon, yfried
Target Milestone: Upstream M2
Target Release: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-06-19 11:03:42 UTC
Type: Bug

Description Rami Vaknin 2013-08-28 14:52:55 UTC
Version:
Grizzly on rhel6.4
openstack-quantum-2013.1.3-1.el6ost.noarch
puddle 2013-08-27.1

Description:
I've installed openstack on 3 bare-metal machines using packstack:
1. Services node, including quantum-server
2. L3+DHCP+Compute (running quantum-ns-metadata-proxy)
3. quantum-metadata-agent

CONFIG_QUANTUM_INSTALL=y
CONFIG_QUANTUM_SERVER_HOST=10.35.160.29
CONFIG_QUANTUM_USE_NAMESPACES=y
CONFIG_QUANTUM_L3_HOSTS=10.35.160.27
CONFIG_QUANTUM_DHCP_HOSTS=10.35.160.27
CONFIG_QUANTUM_L2_PLUGIN=openvswitch
CONFIG_QUANTUM_METADATA_HOSTS=10.35.160.23

Instances can't access the quantum-metadata-agent.

It looks like packstack has not configured the quantum-metadata-agent properly: the quantum-metadata-agent still listens on a unix domain socket in order to receive requests from the quantum-ns-metadata-proxy, instead of listening on a regular TCP socket.

Sniffing the quantum-ns-metadata-proxy while sending "curl http://169.254.169.254/openstack" shows that the quantum-ns-metadata-proxy answers the instance with "500 Internal Server Error", probably because it can't communicate with the quantum-metadata-agent.
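
For reference, the failure can be reproduced without an instance by querying the proxy from inside the router namespace on the L3 host. This is only a sketch: 9697 is the default metadata_port the l3 agent passes to the proxy, and the namespace ID is assumed to match the UUID in the proxy log name further down, so both may differ in other environments:

# ip netns exec qrouter-bf5221f4-f797-4f2c-a429-cf84e388f15b curl -i http://127.0.0.1:9697/openstack

This should return the same 500 error, while the socket path the proxy tries to connect to (/var/lib/quantum/metadata_proxy by default) does not exist on this host at all, which matches the ENOENT traceback below.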

# ps -elf | grep metadata
0 S quantum   5618     1  0  80   0 - 65352 ep_pol 13:17 ?        00:00:00 python /usr/bin/quantum-metadata-agent --log-file /var/log/quantum/metadata-agent.log --config-file /usr/share/quantum/quantum-dist.conf --config-file /etc/quantum/quantum.conf --config-file /etc/quantum/metadata_agent.ini
0 S root     28953 28289  0  80   0 - 25812 pipe_w 17:21 pts/1    00:00:00 grep metadata

# lsof -p 5618 | grep unix
python  5618 quantum    4u  unix 0xffff8808312143c0      0t0    18336 /var/lib/quantum/metadata_proxy

# tail -19 /var/log/quantum/quantum-ns-metadata-proxybf5221f4-f797-4f2c-a429-cf84e388f15b.log 
2013-08-28 17:46:11    ERROR [quantum.agent.metadata.namespace_proxy] Unexpected error.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/quantum/agent/metadata/namespace_proxy.py", line 81, in __call__
    req.body)
  File "/usr/lib/python2.6/site-packages/quantum/agent/metadata/namespace_proxy.py", line 112, in _proxy_request
    connection_type=UnixDomainHTTPConnection)
  File "/usr/lib/python2.6/site-packages/httplib2/__init__.py", line 1447, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/lib/python2.6/site-packages/httplib2/__init__.py", line 1199, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/python2.6/site-packages/httplib2/__init__.py", line 1173, in _conn_request
    conn.connect()
  File "/usr/lib/python2.6/site-packages/quantum/agent/metadata/namespace_proxy.py", line 55, in connect
    self.sock.connect(cfg.CONF.metadata_proxy_socket)
  File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 167, in connect
    while not socket_connect(fd, address):
  File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 37, in socket_connect
    raise socket.error(err, errno.errorcode[err])
error: [Errno 2] ENOENT

Comment 2 Alvaro Lopez Ortega 2013-11-15 12:06:19 UTC
@twilson, your help would be greatly appreciated here.

Comment 3 Martin Magr 2013-11-18 14:50:55 UTC
Can you please check this again with the most recent Grizzly build? I installed the same architecture as you described, but did not encounter your problem.

Comment 4 Rami Vaknin 2013-11-20 12:52:46 UTC
Grizzly? This bug is targeted at 4.0, so I assume you meant 4.0 (correct me if I'm wrong).

This bug reproduces on the latest 4.0 puddle (2013-11-18.8) on RHEL 6.5 in exactly the same way, with the same exception. The metadata-agent machine still opens a unix domain socket, and by sniffing I can tell it does not try to contact the nova_metadata_ip and nova_metadata_port from its configuration file.
The instance still gets an error 500 page.
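
For reference, the sniffing here just means watching the metadata-agent host for traffic toward the nova metadata API. A minimal sketch, assuming the neutron default nova_metadata_port of 8775 (adjust if metadata_agent.ini overrides it):

# tcpdump -nni any tcp port 8775

No such traffic shows up when an instance requests metadata, which is what the sniffing above refers to.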

Please contact me by email/IRC for my env's details if you need them for more debugging (soon, please).

I would also love to see your env where it was not reproduced.

My current env:
2 compute-only nodes
1 dhcp + metadata agent node
1 node with all the rest - a controller with all the services + ns-metadata-proxy, except for the dhcp/compute/metadata agents.

Comment 5 Martin Magr 2013-11-20 15:24:14 UTC
<mmagr> rvaknin, so ... if I understand it correctly 10.35.160.27 (puma09) is set as metadata host, but puma10 (10.35.160.29) is where the ENOENT tracebacks are located right ... so that's the reason why I didn't see the tracebacks ... I was checking only the metadata host's log
<rvaknin> mmagr, puma09 is the metadata-agent
<rvaknin> mmagr, puma10 is ns-metadata-proxy + nova
<rvaknin> mmagr, you also see the unix socket domain in puma09
<rvaknin> mmagr, while it shouldn't be created at all
<mmagr> rvaknin, I'm not a neutron expert, but I suppose the fix would be setting enable_metadata on hosts which are not supposed to be metadata hosts
<mmagr> rvaknin, I know, I'm not sure why metadata proxy is running on puma10 ... it should not IMHO since it has invalid config there
<rvaknin> mmagr, the proxy should run on puma09
<rvaknin> mmagr, the proxy is the one that sits on the l3 machine because it can be accessed from all the instances due to the qrouter namespace
<rvaknin> mmagr, if you didn't have a proxy - the requests from the proxy won't be able to access the metadata agent
<rvaknin> mmagr, and the agent is the one that pass the requests to nova's metadata server

So after my investigation: opening of the socket is based on the metadata_proxy_socket parameter in /etc/neutron/metadata_agent.ini. I'm not sure if we have to set it explicitly (currently it's commented out, so it defaults to $state_path/metadata_proxy). I tried to set all values in /etc/neutron/metadata_agent.ini on puma09 exactly as they are set on puma10. This stopped the tracebacks and the socket was no longer created, but I'm not sure if this is the correct fix ... since I know almost nothing about Neutron.
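
For reference, the relevant knobs in /etc/neutron/metadata_agent.ini are roughly these (the values below are defaults/placeholders, not the ones from this environment):

[DEFAULT]
# where the agent forwards the requests it receives over the unix socket
nova_metadata_ip = <nova-api-host>
nova_metadata_port = 8775
metadata_proxy_shared_secret = <shared-secret>
# commented out by default, so it falls back to $state_path/metadata_proxy
# metadata_proxy_socket = $state_path/metadata_proxy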

Comment 6 Bob Kukura 2013-12-10 19:07:51 UTC
I'm not very familiar with the metadata proxy in neutron, but I spoke with Terry Wilson about this, and he's willing to look at it.

Comment 8 Martin Magr 2014-01-17 10:55:49 UTC
Mail notification came, but I don't see the comment here, so reposting. (Oh how I looove bugzilla :) )

Product: Red Hat OpenStack
Version: 3.0
Component: openstack-packstack

Terry Wilson <twilson> has asked Martin Magr <mmagr> for
needinfo:
Bug 1002178: Instances can't access quantum-metadata-agent when installing it
in a different machine using packstack
https://bugzilla.redhat.com/show_bug.cgi?id=1002178


------- Additional Comments from Terry Wilson <twilson>
From what I can tell, the metadata agent and the l3 agent must be on the same
host. The l3 agent launches the neutron-ns-metadata-proxy, which communicates
with the metadata agent over a unix socket.

We need to update the documentation to reflect this, but I don't think there is
anything else to do without revamping how those processes communicate.
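
Roughly, the request path is:

instance -> 169.254.169.254 -> neutron-ns-metadata-proxy (spawned by the l3 agent inside the qrouter namespace) -> unix socket ($state_path/metadata_proxy) -> neutron-metadata-agent -> nova metadata API (nova_metadata_ip:nova_metadata_port)

and the unix-socket hop is what ties the proxy and the agent to the same host.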

mmagr: We might consider just removing the METADATA_HOSTS parameter and making
it equal to L3_HOSTS. Is there a way that we can remove the option without
breaking existing answer files?

Comment 9 Martin Magr 2014-01-17 10:58:26 UTC
Extra answer file parameters are not a problem, so removing METADATA_HOSTS from the neutron plugin will simply make packstack ignore it if it stays in the answer file.
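
To illustrate the idea (a hypothetical sketch only, reusing the parameter names from the original report):

CONFIG_QUANTUM_L3_HOSTS=10.35.160.27
# the metadata agent would be installed on the L3 host(s) above;
# a stale CONFIG_QUANTUM_METADATA_HOSTS line left in an old answer file would simply be ignored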

Comment 11 Assaf Muller 2014-02-19 14:53:17 UTC
How about removing the metadata agent/proxy option from the Packstack answer file, but having Packstack install the metadata agent on every node that it installs the L3 agent on?

Comment 12 Terry Wilson 2014-02-20 22:31:39 UTC
Assaf: I have suggested the same thing above, so I agree :).

Comment 13 Martin Magr 2014-04-30 09:58:44 UTC
Note that this issue will be fixed by the implementation of the following blueprint:
https://blueprints.launchpad.net/packstack/+spec/simplification

Relevant patch is here:
https://review.openstack.org/#/c/79999/

Comment 14 yfried 2014-05-18 06:02:36 UTC
Assaf, Terry, Martin,
There's a mode of operation for networks without routers where the MD-proxy is accessed via the dhcp-agent instead of the l3-agent.
Will this be affected if we remove the ability to install the neutron-MD-agent separately?
http://techbackground.blogspot.co.il/2013/06/metadata-via-dhcp-namespace.html
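
For context, that is the mode toggled by the dhcp agent's enable_isolated_metadata option, e.g. in /etc/neutron/dhcp_agent.ini (shown only as an illustration, not as this environment's config):

[DEFAULT]
enable_isolated_metadata = True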

Comment 15 Assaf Muller 2014-05-19 14:38:34 UTC
@Yair: With Packstack's move to compute, controller and network node roles, this bug is no longer relevant. The metadata agent will reside along with both the l3 and dhcp agents on the network node(s).