Description of problem:
We have been able to reproduce the VRRP transition flapping behavior in a scaled-down environment. I have also attached a dump of the list of packages installed in this environment. Here are the details:

Hardware Specs:
CPUs: 2
Cores: 16
Enabled Cores: 16
Threads: 32
Available Memory: 393216
Total Memory: 393216
Memory Speed: 1600
Adapters: 1
Interfaces (Total NICs): 8

Environment:
3 Network Nodes
1026 HA Routers
1206 Floating IPs
444 Load Balancer Pools
1052 Networks

How To Reproduce:
1. Start with all 3 network nodes in service.
2. Add the resources to OpenStack as described above. At this point we observed the environment to be fairly stable.
3. Force a failover by stopping all Neutron services and all keepalived processes on one of the network nodes (a reboot would suffice).
4. At this point you may begin to observe a router transition storm.
5. Restart Neutron services on the network node taken out of service in step 3. It is at this step that the environment starts to descend into a massive router transition storm.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Have lots of HA routers.
2. Reboot one of the servers.

Actual results:
A flapping storm occurs in which all HA routers begin flapping between MASTER and BACKUP.

Expected results:
Each HA router should transition to BACKUP or MASTER only once.

Additional info:
Proposed change https://review.openstack.org/#/c/470905/ to fix [1].

[1] https://bugs.launchpad.net/neutron/+bug/1597461
Hi Anil,

The customer had another outage recently. In your opinion, how long will it take to deliver a hotfix?

- Andreas
Hi Andreas,

Three patches need to be backported to OSP7 to fix this bug (one of them is not yet merged upstream; I hope it will be merged soon). In newer branches these patches use RPC calls that do not exist in Kilo, and they also involve large code changes, which makes the backport difficult. I am working on this as a priority and am trying to provide a build within this week.

Thanks,
Anil
The patches below are ready for review. I will ask my team members to review them as a priority:
https://code.engineering.redhat.com/gerrit/#/c/101640/
https://code.engineering.redhat.com/gerrit/#/c/101642/
https://code.engineering.redhat.com/gerrit/#/c/109264/
*** Bug 1461244 has been marked as a duplicate of this bug. ***
While doing code verification, I couldn't find the get_routers_id function in neutron/api/rpc/handlers/l3_rpc.py in the code on the controller. Is there a reason for that?
Code verification: the changes from the Red Hat Engineering Gerrit reviews linked in this bug exist in the code for openstack-neutron-2015.1.4-16.el7ost.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1747
I know that this is closed, but just for future reference, when this issue occurred, the following could be observed:
- In the l3-agent log, many router transition messages are seen:
~~~
# grep 'transitioned to' var/log/neutron/l3-agent.log | tail
2017-02-21 10:28:38.094 74155 INFO neutron.agent.l3.ha [-] Router f979011b-944b-4f49-887d-feace356f9f7 transitioned to master
2017-02-21 10:28:38.426 74155 INFO neutron.agent.l3.ha [-] Router 9392d069-e589-46a8-8979-294c98b03040 transitioned to master
2017-02-21 10:28:43.579 74155 INFO neutron.agent.l3.ha [-] Router 23fbce01-37da-4f1f-8af9-105510086966 transitioned to master
2017-02-21 10:28:43.616 74155 INFO neutron.agent.l3.ha [-] Router 1f8e167b-8987-47b7-b1ba-4de944c986a3 transitioned to master
2017-02-21 10:28:43.654 74155 INFO neutron.agent.l3.ha [-] Router 00476f59-ae81-4dc4-bce1-e5dc3b05f2c3 transitioned to master
2017-02-21 10:28:43.783 74155 INFO neutron.agent.l3.ha [-] Router 1fbbbc96-693a-4e94-bc86-801e27934412 transitioned to master
2017-02-21 10:28:43.803 74155 INFO neutron.agent.l3.ha [-] Router e83726b8-3197-4d34-b4bc-cbfc940399e1 transitioned to master
2017-02-21 10:28:57.864 74155 INFO neutron.agent.l3.ha [-] Router 035db19e-3826-41cf-b40a-d253f36ede84 transitioned to master
2017-02-21 10:28:57.969 74155 INFO neutron.agent.l3.ha [-] Router 34c89792-bb5f-481e-9ec2-fb71409b548e transitioned to master
2017-02-21 10:28:59.003 74155 INFO neutron.agent.l3.ha [-] Router 31be9964-23fe-4b56-a57b-3247c068d7c8 transitioned to master
~~~
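Building on the grep above, one rough way to tell a flapping storm apart from a normal one-time failover is to count the "transitioned to" messages per router, since the expected result is that each router transitions only once. This is a minimal sketch that assumes the log line format shown above; count_flapping_routers is a hypothetical helper written for this note, not part of Neutron.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of Neutron).
# Counts HA router state transitions per router in an l3-agent log and
# prints only the routers that transitioned more than once, highest first.
# Assumes log lines like:
#   ... INFO neutron.agent.l3.ha [-] Router <uuid> transitioned to <state>
count_flapping_routers() {
    # 1. grep -oE keeps only the "Router <uuid> transitioned to <state>" part.
    # 2. awk tallies transitions per router (field 2 is the router UUID)
    #    and prints "<count> <uuid>" for routers seen more than once.
    # 3. sort -rn orders the result by count, descending.
    grep -oE 'Router [0-9a-f-]+ transitioned to [a-z]+' "$1" |
        awk '{ count[$2]++ }
             END { for (r in count) if (count[r] > 1) print count[r], r }' |
        sort -rn
}
```

For example, `count_flapping_routers var/log/neutron/l3-agent.log | head` would list the routers with the most transitions first. Over a long log, legitimate failovers also add up, so the count is most meaningful when the log is first trimmed to the time window of the incident.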