Bug 1574950 - HA router can become master but won't be reported in database
Summary: HA router can become master but won't be reported in database
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
low
medium
Target Milestone: zstream
: ---
Assignee: Slawek Kaplonski
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On: 1563443
Blocks:
Reported: 2018-05-04 12:05 UTC by Jakub Libosvar
Modified: 2021-06-22 00:32 UTC (History)
16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1563443
Environment:
Last Closed:
Target Upstream Version:




Links
System: Launchpad | ID: 1824856 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2019-04-15 17:16:18 UTC

Description Jakub Libosvar 2018-05-04 12:05:11 UTC
If there is a filesystem problem on the node running the L3 agent, and an HA router changes its state but cannot write to its state file because of that problem, the router can become master (holding the floating IPs and sending gratuitous ARPs) without being reported as master in the state file or in the Neutron database.

In addition, the underlying error is swallowed, and the message that gets logged instead claims that parsing the output of "ip monitor" failed, which is not true.
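To illustrate the failure mode described above, here is a minimal Python sketch, assuming a plain-text state file and a hypothetical handle_state_change() function; it is not Neutron's actual keepalived-state-change code. It shows a failed write to the state file being surfaced as its own error, with the real cause, instead of being swallowed or reported as an "ip monitor" parsing failure.

```python
import logging
import os
import tempfile
from typing import Optional, Tuple

LOG = logging.getLogger(__name__)


def write_state_file(state_path: str, state: str) -> None:
    """Persist the new HA state to the state file (hypothetical helper).

    Any OSError (permission denied, read-only or full filesystem,
    missing directory) propagates to the caller instead of being swallowed.
    """
    with open(state_path, "w") as f:
        f.write(state)


def handle_state_change(state_path: str, new_state: str) -> Tuple[str, Optional[str]]:
    """Hypothetical handler: returns (state_to_report, error_message)."""
    try:
        write_state_file(state_path, new_state)
    except OSError as exc:
        # Log the real cause, not a generic "failed to parse ip monitor
        # output" message, and tell the caller the transition failed.
        msg = f"failed to write HA state {new_state!r} to {state_path}: {exc}"
        LOG.error(msg)
        return "error", msg
    return new_state, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(handle_state_change(os.path.join(d, "state"), "master"))
        # The parent directory does not exist, so open() raises FileNotFoundError.
        print(handle_state_change(os.path.join(d, "missing", "state"), "master"))
```

The key design choice is that the write error is caught narrowly (OSError around the file write only), so it cannot be confused with an unrelated parsing problem elsewhere in the monitor.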

Comment 4 Assaf Muller 2018-05-18 13:17:28 UTC
Jakub and I discussed this issue, and I wanted to share our thoughts. When a router replica transitions from standby to active (and in other cases too), keepalived-state-change-monitor may encounter an error; in this case it was caused by a permissions issue in /var/lib/neutron. More generally, we think that under any error condition keepalived-state-change-monitor should notify the L3 agent that an error has occurred. The L3 agent would then put that router replica in an 'ERROR' state and update neutron-server, which would update the DB and the API responses. This would let the operator know that an error happened for that particular router replica and that it needs to be investigated. Bonus points if keepalived-state-change-monitor also sends the actual error message to the agent: we would then extend the RPC format between the agent and the server and add a DB field such as 'error_message' that could be displayed to the operator.
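The proposed flow (monitor reports any error to the agent, the agent marks the replica as errored, the server stores the state plus an error message) can be sketched in Python as below. This is a toy, in-memory model under stated assumptions: NeutronServer, L3Agent, ReplicaRecord, on_keepalived_transition() and the lowercase "error" state are all hypothetical names, and the RPC is modeled as a direct method call. It is not the existing Neutron RPC API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical set of replica states, with the proposed "error" state added.
HA_STATES = {"active", "standby", "error"}


@dataclass
class ReplicaRecord:
    """Hypothetical DB row for one router replica on one host."""
    ha_state: str
    error_message: Optional[str] = None  # the proposed new DB field


@dataclass
class NeutronServer:
    """In-memory stand-in for neutron-server and its database."""
    db: Dict[Tuple[str, str], ReplicaRecord] = field(default_factory=dict)

    def update_ha_state(self, router_id: str, host: str, state: str,
                        error_message: Optional[str] = None) -> None:
        # Proposed RPC payload: the replica state plus an optional error message.
        if state not in HA_STATES:
            raise ValueError(f"unknown HA state: {state}")
        self.db[(router_id, host)] = ReplicaRecord(state, error_message)


@dataclass
class L3Agent:
    host: str
    server: NeutronServer

    def notify(self, router_id: str, state: str,
               error_message: Optional[str] = None) -> None:
        # Forward the monitor's report to the server (an RPC call in reality).
        self.server.update_ha_state(router_id, self.host, state, error_message)


def on_keepalived_transition(agent: L3Agent, router_id: str, new_state: str,
                             apply_transition: Callable[[], None]) -> None:
    """Hypothetical monitor hook: any failure becomes an 'error' report."""
    try:
        apply_transition()  # e.g. write the state file, update the agent
    except Exception as exc:  # deliberately broad: *any* error is reported
        agent.notify(router_id, "error", error_message=str(exc))
        return
    agent.notify(router_id, new_state)
```

Catching every exception in the monitor is intentional here, matching the suggestion that "any error condition" should reach the operator; the error text travels with the state so it can be shown in API responses.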

