Red Hat Bugzilla – Bug 1002178
Instances can't access neutron-metadata-agent when installing it in a different machine using packstack
Last modified: 2016-04-26 11:49:48 EDT
Grizzly on rhel6.4
I've installed openstack on 3 bare-metal machines using packstack:
1. Services node, including quantum-server
2. L3+DHCP+Compute (running quantum-ns-metadata-proxy)
Instances can't access the quantum-metadata-agent.
It looks like packstack has not configured quantum-metadata-agent correctly: it still listens on a unix domain socket to receive requests from quantum-ns-metadata-proxy, instead of listening on a regular TCP socket.
Sniffing the quantum-ns-metadata-proxy while sending "curl http://169.254.169.254/openstack" shows that the proxy answers the instance with "500 Internal Server Error", probably because it can't communicate with the quantum-metadata-agent.
# ps -elf | grep metadata
0 S quantum 5618 1 0 80 0 - 65352 ep_pol 13:17 ? 00:00:00 python /usr/bin/quantum-metadata-agent --log-file /var/log/quantum/metadata-agent.log --config-file /usr/share/quantum/quantum-dist.conf --config-file /etc/quantum/quantum.conf --config-file /etc/quantum/metadata_agent.ini
0 S root 28953 28289 0 80 0 - 25812 pipe_w 17:21 pts/1 00:00:00 grep metadata
# lsof -p 5618 | grep unix
python 5618 quantum 4u unix 0xffff8808312143c0 0t0 18336 /var/lib/quantum/metadata_proxy
# tail -19 /var/log/quantum/quantum-ns-metadata-proxybf5221f4-f797-4f2c-a429-cf84e388f15b.log
2013-08-28 17:46:11 ERROR [quantum.agent.metadata.namespace_proxy] Unexpected error.
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/quantum/agent/metadata/namespace_proxy.py", line 81, in __call__
File "/usr/lib/python2.6/site-packages/quantum/agent/metadata/namespace_proxy.py", line 112, in _proxy_request
File "/usr/lib/python2.6/site-packages/httplib2/__init__.py", line 1447, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/lib/python2.6/site-packages/httplib2/__init__.py", line 1199, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/lib/python2.6/site-packages/httplib2/__init__.py", line 1173, in _conn_request
File "/usr/lib/python2.6/site-packages/quantum/agent/metadata/namespace_proxy.py", line 55, in connect
File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 167, in connect
while not socket_connect(fd, address):
File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 37, in socket_connect
raise socket.error(err, errno.errorcode[err])
error: [Errno 2] ENOENT
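For what it's worth, the ENOENT above is simply connect() failing because no process has bound the unix socket path. A minimal standalone sketch (this is not the Neutron code; the path is taken from the lsof output above) reproduces the same error on any Linux host where that socket file is absent:

```python
# Standalone sketch: connecting to a unix socket path that no process
# has bound raises ENOENT, matching the traceback above.
import errno
import socket

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    # Path from the lsof output; on the broken host the ns-metadata-proxy
    # looks for it where no metadata agent has ever created it.
    sock.connect("/var/lib/quantum/metadata_proxy")
except socket.error as exc:  # socket.error is an alias of OSError
    print(errno.errorcode[exc.errno])  # prints ENOENT
finally:
    sock.close()
```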
@twilson, your help would be greatly appreciated here.
Can you please check this again with the most recent Grizzly build? I installed the same architecture as you described, but did not encounter your problem.
Grizzly? This bug is targeted at 4.0, so I assume you meant 4.0 (correct me if I'm wrong).
This bug reproduces on the latest 4.0 puddle (2013-11-18.8) on 6.5 in exactly the same way, with the same exception: the metadata-agent machine still opens a unix domain socket, and sniffing shows it does not try to reach the nova_metadata_ip and nova_metadata_port set in its configuration file.
The instance still gets an error 500 page.
Please contact me by email/irc for my env's details if you need them for more debugging (soon, please).
I would also love to see your env where it was not reproduced.
my current env:
2 compute-only nodes
1 dhcp + metadata agent
1 controller running all the remaining services + ns-metadata-proxy (everything except the dhcp/compute/metadata agents).
<mmagr> rvaknin, so ... if I understand it correctly 10.35.160.27 (puma09) is set as metadata host, but puma10 (10.35.160.29) is where the ENOENT tracebacks are located, right ... so that's the reason why I didn't see the tracebacks ... I was checking only the metadata host's log
<rvaknin> mmagr, puma09 is the metadata-agent
<rvaknin> mmagr, puma10 is ns-metadata-proxy + nova
<rvaknin> mmagr, you also see the unix socket domain in puma09
<rvaknin> mmagr, while it shouldn't be created at all
<mmagr> rvaknin, I'm not a neutron expert, but I suppose the fix would be setting enable_metadata on hosts which are not supposed to be metadata hosts
<mmagr> rvaknin, I know, I'm not sure why metadata proxy is running on puma10 ... it should not IMHO since it has invalid config there
<rvaknin> mmagr, the proxy should run on puma09
<rvaknin> mmagr, the proxy is the one that sits on the l3 machine because it can be accessed from all the instances via the qrouter namespace
<rvaknin> mmagr, if you didn't have a proxy, requests from the instances wouldn't be able to reach the metadata agent
<rvaknin> mmagr, and the agent is the one that passes the requests on to nova's metadata server
So after my investigation: opening of the socket is based on the metadata_proxy_socket parameter in /etc/neutron/metadata_agent.ini. I'm not sure if we have to set it explicitly (currently it's commented out, so it defaults to $state_path/metadata_proxy). I tried setting all values in /etc/neutron/metadata_agent.ini on puma09 exactly as they are set on puma10. This stopped the traceback from being thrown and the socket was no longer created, but I'm not sure this is the correct fix ... since I know almost nothing about Neutron.
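For reference, these are the metadata_agent.ini options involved here, as a sketch; the values below are illustrative, not taken from the affected machines:

```ini
[DEFAULT]
# Where the metadata agent forwards requests: nova's metadata service
nova_metadata_ip = 127.0.0.1
nova_metadata_port = 8775
# Unix socket the agent binds for the ns-metadata-proxy; when commented
# out it defaults to $state_path/metadata_proxy
# metadata_proxy_socket = /var/lib/neutron/metadata_proxy
```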
I'm not very familiar with the metadata proxy in neutron, but spoke with Terry Wilson about this, and he's willing to look at it.
Mail notification came, but I don't see the comment here, so reposting. (Oh how I looove bugzilla :) )
Product: Red Hat OpenStack
Terry Wilson <email@example.com> has asked Martin Magr <firstname.lastname@example.org> for
Bug 1002178: Instances can't access quantum-metadata-agent when installing it
in a different machine using packstack
------- Additional Comments from Terry Wilson <email@example.com>
From what I can tell, the metadata agent and the l3 agent must be on the same
host. The l3 agent launches the neutron-ns-metadata-proxy, which communicates
with the metadata agent over a unix socket.
We need to update the documentation to reflect this, but I don't think there is
anything else to do without revamping how those processes communicate.
mmagr: We might consider just removing the METADATA_HOSTS parameter and making
it equal the L3_HOSTS. Is there a way that we can remove the option without
breaking existing answer files?
Extra answer file parameters are not a problem, so after removing METADATA_HOSTS from the neutron plugin, packstack will simply ignore it if it stays in an answer file.
How about removing the metadata agent/proxy option from the Packstack answer file, but having Packstack install the metadata agent on every node where it installs the L3 agent?
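To illustrate the proposal in answer-file terms (the parameter names below are my recollection of the Grizzly-era Quantum options and should be treated as illustrative):

```ini
# Hosts that get the L3 agent; under the proposal, the metadata agent
# would be installed on each of these hosts as well.
CONFIG_QUANTUM_L3_HOSTS=10.35.160.29
# A separate CONFIG_QUANTUM_METADATA_HOSTS parameter would be dropped;
# if left in an old answer file, Packstack would ignore it.
```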
Assaf: I have suggested the same thing above, so I agree :).
Note that this issue will be fixed by implementation of the following blueprint:
Relevant patch is here:
Assaf, Terry, Martin,
There's a mode of operation for networks without routers where the MD-proxy is accessed via the dhcp-agent instead of the l3-agent.
Will this be affected if we remove the ability to install neutron-MD-agent separately?
@Yair: With Packstack's move to compute, controller, and network node roles, this bug becomes irrelevant. The metadata agent will reside along with both the l3 and dhcp agents on the network node(s).