| Summary: | neutron router was active on two controller nodes | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | VIKRANT <vaggarwa> |
| Component: | openstack-neutron | Assignee: | John Schwarz <jschwarz> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Toni Freger <tfreger> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.0 (Kilo) | CC: | adhingra, amuller, chrisw, jlibosva, jschwarz, majopela, nyechiel, srevivo, vaggarwa |
| Target Milestone: | async | Keywords: | ZStream |
| Target Release: | 7.0 (Kilo) | Flags: | vaggarwa: needinfo- |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-12-08 11:32:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
VIKRANT
2016-09-01 09:20:37 UTC
Also, this might or might not be related to an upstream bug currently in flight: https://bugs.launchpad.net/neutron/+bug/1580648. I'm posting this here for future reference.

Lastly, this sounds a bit like https://bugzilla.redhat.com/show_bug.cgi?id=1181592. Miguel, can you take a look at the logs please?

Hey, I need /var/log/messages, /etc/hosts and some other details to confirm jschwarz's theory in Comment 7, which sounds very reasonable. @vikrant, check that specific bz. Could you post the full sosreport logs for confirmation, please?

Extra details: this happens because keepalived in the qrouter namespace does not have access to the host-defined DNS in /etc/resolv.conf. keepalived tries to resolve the IP address of the current host via DNS and locks up for 60 seconds (stopping VRRP, so the other host transitions to MASTER).

You can put a workaround in place with the instructions in https://bugzilla.redhat.com/show_bug.cgi?id=1181592#c12. Then, when keepalived tries to resolve the hostname, it will be found in /etc/hosts and the DNS query will be avoided.

Best regards.

I would also like to add that the flip-flop transitions don't occur sporadically (i.e. throughout the entire day), but during specific times of the day (mostly 14:00 - 01:00, which may correspond to normal working hours, depending on the time zone). This supports the idea that some kind of user action on the setup causes a rewrite of keepalived.conf, which in turn causes the process to reload the configuration file and then hit the DNS issue. If you could also ask what he was doing during the times when the issue was encountered (comment #2), i.e. whether he was adding new VMs, etc., that would also be very helpful.

Any update? The user is hitting the same issue again and again.

Apologies, for some reason I didn't receive email notifications about this Bugzilla.

From a brief look at the logs, it looks like the flip-flop pattern occurs consistently once every 37-40 seconds, which implies that there might indeed be an issue with the DNS. Miguel, please have a look at the log and let me know what you think.

Also, Anil, can we ask the user to run the command specified in comment #9 on each of the network nodes (specifically also on ospctrl02 and ospctrl03):

dig A $(hostname) | grep -A1 "ANSWER SEC" | tail -n 1 | awk '{print $NF " " $1}' | sed -e 's/.$//g' >>/etc/hosts ; grep $(hostname) /etc/hosts || echo "Failure setting up the hostname entry"

We'll make sure to follow up on this. |
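
For reference, the one-liner above appends to /etc/hosts every time it is run, so running it twice leaves duplicate entries. Below is a minimal sketch of an idempotent variant, not the command from comment #9 itself. The helper name `ensure_hosts_entry` and its arguments are hypothetical, introduced only for illustration; it assumes a POSIX shell with `awk` available and leaves the DNS lookup to the caller.

```shell
#!/bin/sh
# Hypothetical helper (not part of keepalived or neutron):
# append "IP NAME" to a hosts file unless NAME is already listed there.
# Arguments: $1 = IP address, $2 = host name, $3 = hosts file (default /etc/hosts).
ensure_hosts_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}

    # Refuse to write an incomplete entry (e.g. when the DNS lookup returned nothing).
    if [ -z "$ip" ] || [ -z "$name" ]; then
        echo "Failure setting up the hostname entry" >&2
        return 1
    fi

    # Look for NAME as a whole field after the address column,
    # skipping comment lines and trailing inline comments.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == n) found = 1
            }
        }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
        return 0    # entry already present, nothing to do
    fi

    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

On a network node it could be called as root with something like `ensure_hosts_entry "$(dig +short A "$(hostname)" | tail -n 1)" "$(hostname)"`, assuming `dig +short` returns the node's address on its last output line. If the lookup returns nothing, the helper prints the same failure message as the original command and leaves the file untouched.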