Description of problem:
An OpenStack neutron router was active on two controller nodes simultaneously. After the customer removed the ports from the router, it ran on only one controller node.

Output at time of issue:

# neutron l3-agent-list-hosting-router PUSH_ROUTER
+--------------------------------------+-----------+----------------+-------+----------+
| id                                   | host      | admin_state_up | alive | ha_state |
+--------------------------------------+-----------+----------------+-------+----------+
| f42f49d9-8651-4963-8722-66c8aafa39b3 | ospctrl01 | True           | :-)   | active   |
| e58a950d-e348-4804-a000-9b8ca163fced | ospctrl02 | True           | :-)   | active   |
| 0f55da82-46f1-4354-bece-aadcec1f1924 | ospctrl03 | True           | :-)   | standby  |
+--------------------------------------+-----------+----------------+-------+----------+

Output after issue fixed:

# neutron l3-agent-list-hosting-router PUSH_ROUTER
+--------------------------------------+-----------+----------------+-------+----------+
| id                                   | host      | admin_state_up | alive | ha_state |
+--------------------------------------+-----------+----------------+-------+----------+
| f42f49d9-8651-4963-8722-66c8aafa39b3 | ospctrl01 | True           | :-)   | active   |
| e58a950d-e348-4804-a000-9b8ca163fced | ospctrl02 | True           | :-)   | standby  |
| 0f55da82-46f1-4354-bece-aadcec1f1924 | ospctrl03 | True           | :-)   | standby  |
+--------------------------------------+-----------+----------------+-------+----------+

Version-Release number of selected component (if applicable):
RHEL OSP 7

How reproducible:
This is the first time the issue has been seen.

Steps to Reproduce:
1.
2.
3.

Actual results:
The neutron router was running on two controller nodes, so instances were not reachable via their floating IPs.

Expected results:
The neutron router should run on a single controller node in all cases.

Additional info:
Adding more info in the next private comment.
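For anyone wanting to spot this condition quickly, here is a minimal sketch that counts how many agents report ha_state "active" in the table above. The function name count_active_agents is made up for illustration, and the parsing assumes the column layout shown in this report (ha_state as the fifth column); a result greater than 1 would indicate the problem described here.

```shell
# Count agents whose ha_state cell is exactly "active".
# Reads the table printed by `neutron l3-agent-list-hosting-router` on stdin.
# Assumes the five-column layout shown in this report.
count_active_agents() {
    awk -F'|' '
        NF >= 6 {
            state = $6
            gsub(/[ \t]/, "", state)   # trim the padding around the cell
            if (state == "active") n++
        }
        END { print n + 0 }            # print 0 when nothing matched
    '
}

# Possible usage (PUSH_ROUTER is the router name from this report):
#   neutron l3-agent-list-hosting-router PUSH_ROUTER | count_active_agents
```

Border lines contain no `|` separator and the header cell reads "ha_state", so neither is counted.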
Also, this might or might not be related to an upstream bug currently in flight: https://bugs.launchpad.net/neutron/+bug/1580648. I'm posting this here for future reference.
Lastly, this sounds a bit like https://bugzilla.redhat.com/show_bug.cgi?id=1181592. Miguel, can you take a look at the logs please?
Hey, I need /var/log/messages, /etc/hosts and some other details to confirm jschwarz's theory in Comment 7, which sounds very reasonable. @vikrant, please check that specific bz. Could you post the full sosreport logs for confirmation, please?

Extra details: This happens because keepalived in the qrouter namespace does not have access to the host-defined DNS in /etc/resolv.conf. keepalived tries to resolve the IP address of the current host via DNS and blocks for 60 seconds, which stops VRRP, so another host transitions to MASTER.

You can put a workaround in place with the instructions in https://bugzilla.redhat.com/show_bug.cgi?id=1181592#c12. With the workaround applied, when keepalived tries to resolve the host name, it will be found in /etc/hosts and the DNS query will be avoided.

Best regards.
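To check whether a node already has the /etc/hosts entry that the workaround relies on, something like the following could be used. This is only a sketch: hosts_has_name is a hypothetical helper, and it only inspects the file itself, so it does not prove how keepalived resolves names inside the namespace.

```shell
# Succeed (exit 0) if the given hosts file maps some address to the given name.
# $1 = path to a hosts-format file, $2 = host name to look for
hosts_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }             # drop comments; fields are re-split
        {
            # field 1 is the address, every later field is a name or alias
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Possible usage on a controller node:
#   hosts_has_name /etc/hosts "$(hostname)" || echo "no /etc/hosts entry for $(hostname)"
```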
I would also like to add that the flip-flop transitions don't occur sporadically (i.e. throughout the entire day), but during specific times of the day (mostly 14:00 - 01:00, which can be considered normal depending on the time zone). This supports the idea that some kind of user action on the setup causes a rewrite of keepalived.conf, which in turn makes the process reload the configuration file and then hit the DNS issue. If you could also ask what he was doing at the times the issue was encountered (comment #2), e.g. whether he was adding new VMs, that would also be very helpful.
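A rough way to see how the transitions are spread over the day is to count them per hour. The sketch below assumes syslog-style lines whose third field is an HH:MM:SS timestamp and that keepalived logs the text "Entering MASTER STATE" on each transition; both are assumptions about the log format, and master_transitions_per_hour is a made-up name.

```shell
# Count "Entering MASTER STATE" log lines per hour of day.
# Reads syslog-style lines on stdin; assumes field 3 is HH:MM:SS.
master_transitions_per_hour() {
    awk '
        /Entering MASTER STATE/ {
            split($3, t, ":")          # t[1] is the hour
            count[t[1]]++
        }
        END { for (h in count) print h, count[h] }
    ' | sort
}

# Possible usage:
#   master_transitions_per_hour < /var/log/messages
```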
Any update? The user is hitting the same issue again and again.
Apologies, for some reason I didn't receive email notifications about this Bugzilla. From a brief look at the logs, the flip-flop pattern occurs consistently once every 37-40 seconds, which implies that there might indeed be an issue with the DNS. Miguel, please have a look at the log and let me know what you think.

Also, Anil, can we ask the user to run the command specified in comment #9 on each of the network nodes (specifically also on ospctrl02 and ospctrl03):

dig A $(hostname) | grep -A1 "ANSWER SEC" | tail -n 1 | awk '{print $NF " " $1}' | sed -e 's/.$//g' >>/etc/hosts ; grep $(hostname) /etc/hosts || echo "Failure setting up the hostname entry"

We'll make sure to follow up on this.
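To confirm the 37-40 second pattern on another node, the gap between consecutive transitions can be computed from the log. This is a minimal sketch under the assumption that the log uses syslog-style lines with an HH:MM:SS timestamp in the third field and that each transition is logged with the text "Entering MASTER STATE"; master_transition_gaps is a hypothetical name, and gaps that cross midnight are not handled.

```shell
# Print the gap in seconds between consecutive "Entering MASTER STATE" lines.
# Reads syslog-style lines on stdin; assumes field 3 is HH:MM:SS.
# Gaps that cross midnight come out negative and should be ignored.
master_transition_gaps() {
    awk '
        /Entering MASTER STATE/ {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen) print now - prev
            prev = now
            seen = 1
        }
    '
}

# Possible usage:
#   master_transition_gaps < /var/log/messages
```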