Description of problem:
During HA testing of our OSP-d deployed OpenStack, we found that a VIP does not fail over when the interface it resides on is disconnected.
In our environment, the provisioning interface is not bonded; it uses a single NIC. During HA testing, we disabled the switch port (which I guess is equivalent to pulling the cable) of the provisioning interface on the controller hosting the ctlplane VIP, the one the keystone admin endpoint is on.
Sep 28 09:48:26 overcloud-controller-0.localdomain kernel: be2net 0000:11:00.1 enp17s0f1: Link is Down
Sep 28 09:48:26 overcloud-controller-0.localdomain NetworkManager[1344]: <info> (enp17s0f1): link disconnected
Sep 28 09:48:28 overcloud-controller-0.localdomain kernel: be2net 0000:11:00.1 enp17s0f1: Link is Down
And that's pretty much all that happened. The VIP did _not_ fail over and the server did _not_ get fenced. Needless to say, the OverCloud was less than fully operational...
Version-Release number of selected component (if applicable):
OSP8, deployed using OSP-d
openstack-tripleo-heat-templates-0.8.14-18.el7ost.noarch
resource-agents-3.9.5-54.el7_2.16
How reproducible:
Every time
Steps to Reproduce:
1. Pull the cable on the admin interface on the controller hosting the ctlplane VIP
2. Try accessing the keystone admin interface
Additional info:
One way around this might be to set up an ethmonitor resource, as described in https://access.redhat.com/solutions/2044713
Hi David,
Can you elaborate on the way you "disabled" the interface?
The way the interface is disconnected can affect the HA action.
Comment 2 Fabio Massimo Di Nitto
2016-09-29 08:44:19 UTC
The VIP resource agent does not monitor eth status. This is by design and that's why there is an ethmonitor agent.
Also please note that we cannot deploy ethmonitor automatically either. Some environments (for instance virt environments) won't notice a cable pull (the host doesn't propagate eth status to the VM attached to a given eth).
This is expected behaviour that can be changed by using ethmonitor or pingd agent.
Comment 3 Fabio Massimo Di Nitto
2016-09-29 08:45:57 UTC
(In reply to Fabio Massimo Di Nitto from comment #2)
> The VIP resource agent does not monitor eth status. This is by design and
> that´s why there is a ethmonitor agent.
>
> Also please note that we cannot deploy ethmonitor automatically either. Some
> environment (for instance virt environment) won´t notice a cable pull (host
> doesn´t propagate eth status to the VM attached to a given eth).
>
> This is expected behaviour that can be changed by using ethmonitor or pingd
> agent.
Forgot to mention that the ethtool / mii-tool status detection is strictly dependent on the kernel driver. If the kernel driver doesn't export link-status, the output is moot.
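To illustrate the driver dependency, here is a minimal Python sketch of a link-status check. It is not what ethmonitor or IPaddr2 actually do; it only reads the `carrier` file that the Linux kernel exposes under `/sys/class/net/<iface>/`. The function name `link_state` and the `sysfs_root` parameter are hypothetical, introduced here for illustration. Anything other than a clean 1 or 0 is reported as "unknown", since the read can fail (for example when the interface is administratively down or does not exist) and a driver that does not report link changes cannot be trusted either way.

```python
from pathlib import Path


def link_state(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return 'up', 'down', or 'unknown' for a network interface.

    Hypothetical helper: reads the kernel's carrier flag from sysfs.
    """
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        value = carrier.read_text().strip()
    except OSError:
        # Interface missing, or carrier not readable (e.g. interface is
        # administratively down): we cannot say anything about the link.
        return "unknown"
    if value == "1":
        return "up"
    if value == "0":
        return "down"
    return "unknown"


if __name__ == "__main__":
    # enp17s0f1 is the interface name from the log excerpt in this report.
    print(link_state("enp17s0f1"))
```

The three-way result is deliberate: a caller that treats "unknown" the same as "down" would trigger failovers in exactly the environments (VMs, drivers without link reporting) mentioned above.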
There may be corner cases we can't handle, but I think it's reasonable to expect that the IPaddr2 agent can handle common situations where an address has been configured but the interface is not available.
In response to #1: I had help from the network guy, and I believe he did the equivalent of pulling the cable, but inside a blade chassis.
Also, in response to #2 and #3, I agree with Andrew in #4.
Even if link status is not fool-proof, I don't see that it would do any harm to _try_ to take action based on link-status.
Further, from a quick read of the ethmonitor agent man-page, it apparently also can react based on whether an arping is successful or not, which would make it independent of whether link status can be detected.
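As a rough illustration of a check that does not rely on link status, the Python sketch below shells out to the iputils `arping` command and treats a zero exit status as "peer reachable". This is an assumption-laden sketch, not the ethmonitor implementation: the function name `arping_ok`, its parameters, and the injectable `run` argument are hypothetical, and `arping` normally needs root (or CAP_NET_RAW) to run.

```python
import subprocess


def arping_ok(iface, target_ip, timeout_s=5, run=subprocess.run):
    """Return True if a single ARP request to target_ip gets a reply.

    Hypothetical helper. Assumes the iputils arping flags:
      -c 1   wait for one reply
      -w N   give up after N seconds
      -I IF  send on interface IF
    """
    cmd = ["arping", "-c", "1", "-w", str(timeout_s), "-I", iface, target_ip]
    try:
        result = run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # arping is not installed or could not be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address; substitute a real gateway.
    print(arping_ok("enp17s0f1", "192.0.2.1"))
```

The `run` parameter exists only so the check can be exercised without sending real ARP traffic; in normal use it is left at its default.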
Also, I noticed the BZ was moved from OpenStack to RHEL, so I guess it's now about modifying the IPaddr2 agent to include the functionality of ethmonitor. That would of course be fine, and would save us from complicating the OSP cluster layout even further.