Bug 1169408
Summary: Neutron router interface port creation fails with radvd >= 2.0 due to blocked router update processing
Product: [Fedora] Fedora
Component: openstack-neutron
Version: rawhide
Hardware: Unspecified
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: low
Priority: low
Reporter: Nir Magnezi <nmagnezi>
Assignee: Ihar Hrachyshka <ihrachys>
QA Contact: Nir Magnezi <nmagnezi>
CC: apevec, chrisw, ihrachys, jlibosva, lhh, lpeer, majopela, nyechiel, oblaut, p, rk, twilson, yeylon
Keywords: MoveUpstream
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2015-01-08 15:45:12 UTC
Bug Blocks: 1046786, 1083891
Attachments:
Created attachment 963326 [details]
port_down
Created attachment 963327 [details]
port_active
Created attachment 963340 [details]
IPv6 plan log
As was said before, the problem is with one of the routers. Other routers work fine. Some log reading below.

When a subnet is attached to the router, the following can be found in the log on the l3 agent side:

2014-11-30 11:29:52.352 2686 DEBUG neutron.agent.l3_agent [req-06a126a5-fac3-4f5c-915e-07cb51da821b None] Got routers updated notification :[u'c480186a-0a2f-4f1c-b0d1-89251760f9cd'] routers_updated /usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py:1763

However, there are no "Starting router update for %s" messages in the log. They were initially shown in the logs some time ago, but then disappeared:

2014-11-27 15:15:54.005 2686 DEBUG neutron.agent.l3_agent [-] Starting router update for c480186a-0a2f-4f1c-b0d1-89251760f9cd _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py:1830
2014-11-27 15:15:57.433 2686 DEBUG neutron.agent.l3_agent [-] Starting router update for c480186a-0a2f-4f1c-b0d1-89251760f9cd _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py:1830
2014-11-27 15:16:03.493 2686 DEBUG neutron.agent.l3_agent [-] Starting router update for c480186a-0a2f-4f1c-b0d1-89251760f9cd _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py:1830
2014-11-27 15:16:33.067 2686 DEBUG neutron.agent.l3_agent [-] Starting router update for c480186a-0a2f-4f1c-b0d1-89251760f9cd _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py:1830
2014-11-27 15:16:47.848 2686 DEBUG neutron.agent.l3_agent [-] Starting router update for c480186a-0a2f-4f1c-b0d1-89251760f9cd _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py:1830

So the update notification is received from the controller, and probably put into the RouterUpdate queue, but then not served properly.

Code-wise, for updates there is a RouterProcessingQueue that provides parallel access to updates via the each_update_to_next_router() generator, which is intended to serialize parallel update requests by giving exclusive read access to the update queue to one of the green threads that serve _process_router_update() calls. The code of the queue and the exclusive accessor seems quite hacky and may indeed contain some race condition that ends up blocking any new updates for some unlucky router (a simplified sketch of this exclusive-consumer pattern appears after this comment). It could also be that the _process_router_update() greenthread pool is full and locked, so no new updates are actually served, but in that case new requests for other routers wouldn't be served properly either. Also, no l3 agent file locks seem to have been set for a long time in /var/lib/neutron/...

Reproduction scenario narrowed down to this:
1. Attach an IPv4 subnet --> port is active
2. Attach an IPv6 RADVD (stateless, stateful, SLAAC) subnet --> port is active
3. From that point on, this router is broken and all attachments stop working; port status remains DOWN.

So it's a radvd thing. Once I downgrade to radvd < 2.0, everything works fine again. Looking at the diff between 1.14 and 2.0, the daemonization code was changed significantly for the 2.0 release.

(In reply to Ihar Hrachyshka from comment #8)
> So it's a radvd thing. Once I downgrade to radvd < 2.0, everything works fine again. Looking at the diff between 1.14 and 2.0, the daemonization code was changed significantly for the 2.0 release.

Hi Ihar,

Nice catch :)
What do you think would be the right way to proceed here? Should we file a bug for radvd or downgrade the version?

Thanks,
Nir

@Nir, we should fix neutron to work properly with all versions of radvd.
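For readers unfamiliar with the mechanism referenced above: the RouterProcessingQueue and each_update_to_next_router() live in neutron's l3 agent, and their actual implementation is not reproduced here. The following is only a minimal, hypothetical sketch (made-up names, standard-library primitives instead of eventlet) of the general "exclusive consumer" pattern the comment describes, to make the suspected failure mode concrete: if the worker that owns a router's update queue never releases ownership, later updates for that router are silently left unprocessed while other routers keep working.

import queue
import threading

_router_queues = {}   # router_id -> Queue of pending updates
_exclusive = set()    # router_ids currently owned by a worker
_lock = threading.Lock()

def router_updated(router_id, payload):
    # Notification handler: enqueue the update for that router.
    with _lock:
        q = _router_queues.setdefault(router_id, queue.Queue())
    q.put(payload)

def process_router_updates(router_id, handle):
    # Worker entry point: drain this router's queue, one worker at a time.
    with _lock:
        if router_id in _exclusive:
            return            # another worker already owns this router
        _exclusive.add(router_id)
    try:
        q = _router_queues.get(router_id)
        while q is not None:
            try:
                payload = q.get_nowait()
            except queue.Empty:
                break
            handle(payload)
    finally:
        with _lock:
            # If this release step is ever skipped (e.g. on some exception
            # path), the router stays marked as "owned" and no future updates
            # for it are processed -- the stuck state suspected above.
            _exclusive.discard(router_id)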
It's as easy as passing an additional '-m syslog' argument to radvd to make it close stderr in all versions of the daemon. I'm working on fixing it upstream. Now that we dropped radvd 2.x from RDO and RHOSP, we may downgrade the bug to Rawhide, the only Red Hat project that includes the new radvd version.
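For illustration only: '-m syslog' is radvd's option for selecting the logging method. The exact command line neutron builds for radvd is not shown in this bug, so the snippet below is a hypothetical sketch with placeholder paths, just to show where the extra argument would go.

# Hypothetical radvd invocation with the workaround; paths are placeholders,
# not taken from neutron's actual radvd driver.
radvd_cmd = [
    'radvd',
    '-C', '/var/lib/neutron/ra/<router_id>.radvd.conf',   # generated RA config
    '-p', '/var/lib/neutron/external/pids/<router_id>.pid.radvd',
    '-m', 'syslog',   # log via syslog so radvd >= 2.0 closes stderr when
                      # daemonizing instead of keeping the agent's fds open
]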
Created attachment 963325 [details]
tests

Description of problem:
=======================
At a certain point while testing neutron for IPv6, neutron ports created for router interfaces remain DOWN upon attachment. This happened twice while running the same test plan, at a similar stage. Since there is no short reproduction scenario, I'll attach files with the tests I executed.

In addition, I noticed that while this issue persists with the router I created, when I create an additional router (without removing the "problematic" one), router interface attachments (and hence port creations) work fine.

I will add information to this bug as we discover more.

Version-Release number of selected component (if applicable):
=============================================================
RHEL-OSP6-Beta: openstack-neutron-2014.2-11.el7ost.noarch

How reproducible:
=================
Reproduced 2 times so far.

Steps to Reproduce:
===================
* See the attached file named: tests
* The steps below summarize the reproduction once the issue occurs in the OpenStack setup (a scripted sketch of these steps follows this description):
1. Create a neutron router
2. Create a network and subnet (I used both IPv4 & IPv6)
3. Attach the subnet to the router

Actual results:
===============
As described in 'Description of problem'. For server.log when port creation fails, see the attached port_down file. You'll notice that it does not reach the 'Attempting to bind port' part.

Expected results:
=================
Should work OK. For server.log when port creation succeeds, see the attached port_active file.

Additional info:
================
Will probably be found in the upcoming comments.
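A scripted version of the reproduction steps above, sketched with python-neutronclient; credentials, the auth URL, names, and CIDRs are placeholders (not from the original report), and the IPv6 subnet uses SLAAC as one of the RA modes mentioned in the comments.

# Sketch of the reproduction steps; values below are illustrative placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

router = neutron.create_router({'router': {'name': 'test-router'}})['router']
net = neutron.create_network({'network': {'name': 'test-net'}})['network']

# IPv4 subnet attaches fine; after the IPv6 subnet served by radvd is attached,
# further attachments to this router reportedly stay DOWN.
v4 = neutron.create_subnet({'subnet': {
    'network_id': net['id'], 'ip_version': 4, 'cidr': '192.0.2.0/24'}})['subnet']
v6 = neutron.create_subnet({'subnet': {
    'network_id': net['id'], 'ip_version': 6, 'cidr': '2001:db8::/64',
    'ipv6_ra_mode': 'slaac', 'ipv6_address_mode': 'slaac'}})['subnet']

neutron.add_interface_router(router['id'], {'subnet_id': v4['id']})
neutron.add_interface_router(router['id'], {'subnet_id': v6['id']})

# Check the router interface ports; the bug shows up as ports stuck in DOWN
# for this router from this point on.
for port in neutron.list_ports(device_id=router['id'])['ports']:
    print(port['id'], port['status'])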