Bug 1570136
| Summary: | All HA routers go into backup state - becoming unusable | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | kforde |
| Component: | openstack-neutron | Assignee: | Assaf Muller <amuller> |
| Status: | CLOSED EOL | QA Contact: | Toni Freger <tfreger> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 11.0 (Ocata) | CC: | alhernan, amuller, chrisw, jlibosva, kforde, nyechiel, sbaker, srevivo |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | 11.0 (Ocata) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-06-22 12:35:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
kforde
2018-04-20 17:18:14 UTC
Hi Kieran, The keepalived configuration pasted in comment 1 seems correct; router replicas are supposed to be set up with the same configuration. Next time this reproduces, can you please attach sosreports from all controllers but, more importantly, grab someone from the Networking team such as Brian, Jakub, Slawek or Bernard?

Hi Assaf, The router config in comment 1 is for two different routers in the same tenant. That doesn't appear correct to me ... is it? From what I can see, the two separate routers above are put in the same VRRP group, so we have a situation with 6 routers made up of 1 master and 5 backups, instead of 2 separate routers comprised of 1 master + 2 backups each. Maybe I am totally on the wrong track?

Kieran and I had a call this morning. We can confirm there is an issue in vr_id allocation, as the second tenant router has the same vr_id as the first router. There is only a single vr_id allocation for the two routers. We created a third router and it got a new allocation correctly. Fortunately, we do have debug logs from the time the second router was created, so after looking into those we shall hopefully find the cause.

Turns out the logs were not in DEBUG mode, as we recently did a deploy that overwrote our DEBUG mode settings! These logs didn't show anything obvious causing this issue.

We haven't seen a repeat of this for the past 4 days. I will enable debugging again and see if the problem happens again.

OSP11 is now retired, see details at https://access.redhat.com/errata/product/191/ver=11/rhel---7/x86_64/RHBA-2018:1828
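
For anyone who hits the same symptom, the vr_id collision discussed in the comments above (two routers in one tenant ending up in the same VRRP group) can be checked for by comparing the `virtual_router_id` values in each router's keepalived configuration. The following is only a minimal sketch, not something taken from this bug: the function name `find_vrid_collisions`, the router IDs, and the sample config text are hypothetical, and it assumes all configs passed in belong to routers on the same HA network.

```python
import re
from collections import defaultdict

# Capture each vrrp_instance name and the virtual_router_id that follows it.
INSTANCE_RE = re.compile(
    r"vrrp_instance\s+(\S+)\s*\{.*?virtual_router_id\s+(\d+)", re.DOTALL
)


def find_vrid_collisions(configs):
    """Return {vr_id: [router IDs]} for every vr_id used by more than one router.

    configs maps a router ID to the text of that router's keepalived.conf.
    Assumption: every router in configs shares the same HA network.
    """
    by_vrid = defaultdict(set)
    for router_id, text in configs.items():
        for _name, vrid in INSTANCE_RE.findall(text):
            by_vrid[int(vrid)].add(router_id)
    return {
        vrid: sorted(routers)
        for vrid, routers in by_vrid.items()
        if len(routers) > 1
    }


# Hypothetical sample data: router-a and router-b wrongly share vr_id 1.
SAMPLE = {
    "router-a": "vrrp_instance VR_1 {\n    state BACKUP\n    virtual_router_id 1\n}\n",
    "router-b": "vrrp_instance VR_1 {\n    state BACKUP\n    virtual_router_id 1\n}\n",
    "router-c": "vrrp_instance VR_2 {\n    state BACKUP\n    virtual_router_id 2\n}\n",
}

collisions = find_vrid_collisions(SAMPLE)
for vrid, routers in collisions.items():
    print(f"vr_id {vrid} is shared by: {', '.join(routers)}")
```

Any vr_id reported here would mean the listed routers are electing a single master among all of their replicas, which matches the "1 master and 5 backups" picture described above.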